Commit c4be9833 authored by Dmitry Kazakov

Backported the Vc code

reviewed by Boudewijn

Squashed commit of the following:

commit 08df248c17d90a3f753be509c5ec7536d6915306
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Wed Dec 12 16:21:26 2012 +0400

    Added a note for packagers on how to make a multi-arch build of Calligra

commit 06b96db7da4428d112d0e981a67e18b6e871ae53
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Dec 6 13:52:10 2012 +0400

    Finished multi-arch build for the Circle Mask Generator

    I also merged the factories code with the composition multi-arch
    implementation, so the code is quite nice and compact now. At least it is
    not as frighteningly template-heavy as it was in the beginning =)

commit be8a4307501cfe995a7961d5d1e04398f6e08d9a
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Dec 6 12:18:53 2012 +0400

    Added first multi-arch implementation of KisCircleMaskGenerator code

    It is not yet finished:
    1) It doesn't compile on !HAVE_VC
    2) It is not merged with composition factories code

commit 2553baf2af3db994ac7642fdd35dc0ebe65748c2
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sat Jan 12 15:37:53 2013 +0400

    Made the per-arch compilation code reusable

    Now I can start making the same thing for KisAutoBrush

    Conflicts:

    	libs/pigment/CMakeLists.txt
    	libs/pigment/compositeops/KoOptimizedCompositeOpFactory.cpp
    	libs/pigment/compositeops/KoOptimizedCompositeOpFactoryPerArch.h

commit 9271691074e37828eb7a4281347333958c24977c
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Dec 4 11:06:01 2012 +0400

    Added a PACKAGERS_BUILD option for code generation for many architectures

    This option is disabled by default.

    By default we build the whole of Calligra optimized for the host architecture.
    When the option is on, the hottest parts of Calligra are compiled with
    optimizations for several of the most popular architectures. The rest of the
    code does not use any newer instructions, so binary compatibility across
    CPUs is preserved.

    Short manual:
    1) If you build Calligra for yourself and are not going to copy the Krita
       binary to a machine with another CPU, disable this option.
    2) If you build a Calligra package and are going to distribute it to users,
       enable the option.

    Conflicts:

    	CMakeLists.txt
    	config-vc.h.cmake

commit d20213b143560e415cece4e8d31e001730626a0e
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Dec 3 10:47:53 2012 +0400

    Removed hardcoded setting of optimization flags for non-multiarch parts

commit 9d8b10e0841680e26630b7050328735ca98061b4
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Dec 3 10:15:08 2012 +0400

    Fix compilation when no Vc library is present

commit b157f2103a5598dd08abd694d22d8604b6522b64
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sun Dec 2 22:16:08 2012 +0400

    Fixed an alpha-locked bug

    Sorry for the inconvenience.

    BUG:311012

commit 308170695c6c5277cd04b7bbe1d4228dad282995
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sat Jan 12 15:17:49 2013 +0400

    Added the first version of per-architecture binaries for composition

    Pros:
    + we can have prebuilt versions for all the architectures supported
      by Vc (AMD FMA4 and XOP are not supported by Vc yet)
    + the implementation is chosen dynamically when Krita starts
    + the semi-general code for multi-arch builds now lives in
      KoVcMultiArchBuildSupport.h (might be ported upstream in the future)

    Cons:
    - it depends on Vc's 'staging' branch, so it can't be put in master
      right now
    - the code became much less readable due to all that template magic
    - I had to copy-paste Vc's 'vc_compile_for_all_implementations' cmake
      macro, because we do not need 'Scalar' implementation
    - the size of the pigment library grew almost 1.5 times: 11->17 MiB
      (we probably still need a plugin system for this)


    Conflicts:

    	libs/pigment/CMakeLists.txt
    	libs/pigment/compositeops/KoOptimizedCompositeOpAlphaDarken32.h
    	libs/pigment/compositeops/KoOptimizedCompositeOpFactory.cpp
    	libs/pigment/compositeops/KoOptimizedCompositeOpOver32.h
    	libs/pigment/compositeops/KoStreamedMath.h

commit 12933dd4ea64ea59b9a038c2f70db87e6ef60810
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sun Dec 2 09:37:49 2012 +0400

    Made the vector composite ops another 1.5-2 times faster

    The Uint<->Float conversion is quite expensive compared to
    Int<->Float (2-2.5 times slower). This happens because of the special
    code that handles the sign bit of the number. So discarding this bit
    with a Uint->Int conversion first gives a huge speedup.
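A minimal scalar sketch of why the detour through a signed integer is safe for 8-bit channels. The helper names are hypothetical and do not come from the Krita sources, and scalar code like this will not reproduce the speedup, which comes from the vector conversion path. It only shows that both routes give the same float for every channel value.

```cpp
#include <cstdint>

// Hypothetical helper: widen an 8-bit channel to float through a signed
// 32-bit integer. Values 0..255 always fit into the non-negative range of
// int32_t, so the sign bit is never set and nothing is lost.
inline float channelToFloatViaInt(std::uint8_t channel)
{
    const std::int32_t asSigned = static_cast<std::int32_t>(channel);
    return static_cast<float>(asSigned);
}

// The route the commit moves away from: converting through an unsigned
// 32-bit integer, where the top bit has to be treated as data.
inline float channelToFloatViaUint(std::uint8_t channel)
{
    const std::uint32_t asUnsigned = channel;
    return static_cast<float>(asUnsigned);
}
```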

    Now the vector version of the composition is 1.8-8.7 times faster
    than the old version (weighted: 3.2 times).

    Many thanks to Matthias Kretz for pointing this out!

    CCMAIL:kimageshop@kde.org
    CCMAIL:kretz@kde.org

commit 57ee76dc327b7d771ffaa62ccc4670d29abb015c
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sat Dec 1 22:19:49 2012 +0400

    Fixed a 1.4 times speed regression when legacy/optimized ops are put together

    The optimized and legacy composite ops should be put into separate
    object files. Otherwise, some code layout/locality problem arises.
    I do not know the exact explanation of this phenomenon, but splitting
    the implementations fixes it.

commit 3a3491ce37a284110963e584609f1e3a030005a4
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Nov 27 18:34:24 2012 +0400

    Create the Vc version of the composite op only when the running CPU supports it

commit 767405a815c1208b79ad40dffe3db6ecefeed7a8
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Nov 26 17:28:26 2012 +0400

    Fixed a zero-alpha bug in the vector implementation of the OVER composite op

commit 0485f3bdc33d6ac846003e4c6313567b93273347
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Nov 22 20:29:03 2012 +0400

    Fixed warnings and a bug in the vector compositing

commit bb36f513fbf442e9e00af0f10a874a8814076bbc
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Nov 22 20:05:17 2012 +0400

    Fixed compilation when no Vc library is present in the system

commit 1f52906297b725b5c33b4e25d1df72db8849c516
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Fri Oct 26 15:25:12 2012 +0400

    Fixed a bug in the optimized OVER composite

    In some cases src_alpha does not correspond to the real source alpha
    because it has opacity and mask mixed into it. In these cases we cannot
    use memcpy.

commit 8645a698f71cadec3c4403e2d09c4a5ff953f1cd
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Oct 25 19:21:40 2012 +0400

    Added fast-path optimizations to the vector Alpha Darken composite op

    The cases of 0 or 255 alpha value are quite common in Krita

commit 1580f62db31a6ad73aadea5a51c9227f53cd41db
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Wed Oct 24 22:53:56 2012 +0400

    Optimized the vector Composite Over for special cases of alpha

    Alpha values of 255 and 0 are very common in Krita, so these checks
    really pay off.

    Now some of the Stroke Benchmarks run 10 or 20% faster. For the others
    there is no change.
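The fast paths described above can be sketched for a single pixel as follows. This is an illustration under assumptions, not the code from this commit: the `Pixel8` layout and `compositeOverPixel()` are hypothetical, opacity and mask are left out, and the general path is ordinary scalar float math instead of Vc vectors.

```cpp
#include <cstdint>

// Hypothetical 8-bit pixel layout used only for this sketch.
struct Pixel8 {
    std::uint8_t b, g, r, a;
};

// Composite src OVER dst for one pixel, with early exits for the two
// most common alpha values.
inline void compositeOverPixel(const Pixel8 &src, Pixel8 &dst)
{
    if (src.a == 0) {
        return;         // fully transparent source: dst is unchanged
    }
    if (src.a == 255) {
        dst = src;      // fully opaque source: plain copy
        return;
    }

    // General path: blend in float to avoid integer rounding artifacts.
    const float sa = src.a / 255.0f;
    const float da = dst.a / 255.0f;
    const float outA = sa + da * (1.0f - sa);   // > 0 because sa > 0

    auto blend = [sa, da, outA](std::uint8_t s, std::uint8_t d) {
        const float v = (s * sa + d * da * (1.0f - sa)) / outA;
        return static_cast<std::uint8_t>(v + 0.5f);
    };

    dst.b = blend(src.b, dst.b);
    dst.g = blend(src.g, dst.g);
    dst.r = blend(src.r, dst.r);
    dst.a = static_cast<std::uint8_t>(outA * 255.0f + 0.5f);
}
```

Both early exits skip the divisions and float conversions entirely, which is where the saving for brush strokes with mostly transparent or mostly opaque dabs would come from.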

commit 8a2258b95a22fce7483e6539e713199d0faa62e9
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Oct 16 16:51:09 2012 +0400

    The Vc implementation of the composite ops is ready for testing

    All the known bugs are fixed.

commit 6ec028de613d7b5d79e93622dc5b7b8b74470023
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Oct 16 10:37:26 2012 +0400

    Added Vc implementation of the "over" composite

    There is still one bug in both composites: the composition of a
    single pixel should be calculated in floats instead of integers,
    otherwise it causes artifacts on the canvas during painting.

commit 23879a388b2ad8089e946e5f614c2d03d515c73d
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Oct 15 11:07:58 2012 +0400

    Added an optimized version of Alpha Darken composite op

    It makes the composition 1.58-1.74 times faster on Sandy Bridge.
    Other architectures are yet to be tested.

    Conflicts:

    	krita/CMakeLists.txt
    	krita/benchmarks/CMakeLists.txt

commit d8d16dcb5e406e2beaf0e285ef2485037fe38166
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Oct 15 10:57:16 2012 +0400

    Fixed a CMake config bug that prevented Vc from using the streaming SIMD extensions

    Without these flags Vc falls back to SSE2 and doesn't use the extensions
    present in the current CPU.
parent 04395067