Backported the Vc code
reviewed by Boudewijn

Squashed commit of the following:

commit 08df248c17d90a3f753be509c5ec7536d6915306
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Wed Dec 12 16:21:26 2012 +0400

    Added a note for packagers on how to make a multi-arch build of Calligra

commit 06b96db7da4428d112d0e981a67e18b6e871ae53
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Dec 6 13:52:10 2012 +0400

    Finished multi-arch build for the Circle Mask Generator

    I also merged the factories code with the composition multi-arch
    implementation, so the code is quite nice and compact now. At least
    it is not as frighteningly templatish as it was in the beginning =)

commit be8a4307501cfe995a7961d5d1e04398f6e08d9a
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Dec 6 12:18:53 2012 +0400

    Added first multi-arch implementation of KisCircleMaskGenerator code

    It is not yet finished:
    1) It doesn't compile on !HAVE_VC
    2) It is not merged with composition factories code

commit 2553baf2af3db994ac7642fdd35dc0ebe65748c2
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sat Jan 12 15:37:53 2013 +0400

    Made the per-arch compilation code reusable

    Now I can start making the same thing for KisAutoBrush

    Conflicts:
        libs/pigment/CMakeLists.txt
        libs/pigment/compositeops/KoOptimizedCompositeOpFactory.cpp
        libs/pigment/compositeops/KoOptimizedCompositeOpFactoryPerArch.h

commit 9271691074e37828eb7a4281347333958c24977c
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Dec 4 11:06:01 2012 +0400

    Added a PACKAGERS_BUILD option for code generation for many architectures

    This option is disabled by default. By default we build the whole
    of Calligra optimized for the host architecture. When the option is
    on, the hottest parts of Calligra are compiled and optimized for
    several of the most popular architectures. The rest of the code will
    not use any brand-new instructions, so as not to break binary
    compatibility among CPUs.

    Short manual:
    1) If you build Calligra for yourself and are not going to copy the
       Krita binary to another CPU, disable this option.
    2) If you build a Calligra package and are going to distribute it
       among users, then enable the option.

    Conflicts:
        CMakeLists.txt
        config-vc.h.cmake

commit d20213b143560e415cece4e8d31e001730626a0e
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Dec 3 10:47:53 2012 +0400

    Removed hardcoded setting of optimization flags for non-multiarch parts

commit 9d8b10e0841680e26630b7050328735ca98061b4
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Dec 3 10:15:08 2012 +0400

    Fix compilation when no Vc library is present

commit b157f2103a5598dd08abd694d22d8604b6522b64
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sun Dec 2 22:16:08 2012 +0400

    Fixed an alpha-locked bug

    Sorry for the inconvenience.

    BUG:311012

commit 308170695c6c5277cd04b7bbe1d4228dad282995
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sat Jan 12 15:17:49 2013 +0400

    Added the first version of per-architecture binaries for composition

    Pros:
    + we can have prebuilt versions for all the architectures supported
      by Vc (AMD FMA4 and XOP are not supported by Vc yet)
    + the implementation is chosen dynamically on Krita start
    + the semi-general code for multi-arch builds is now in
      KoVcMultiArchBuildSupport.h (might be ported upstream in the future)

    Cons:
    - it depends on Vc's 'staging' branch, so it can't be put in master
      right now
    - the code became much less readable due to all that template magic
    - I had to copy-paste Vc's 'vc_compile_for_all_implementations' cmake
      macro, because we do not need the 'Scalar' implementation
    - the size of the pigment library grew almost 1.5 times: 11->17 MiB
      (probably, we still need a plugin system for this)

    Conflicts:
        libs/pigment/CMakeLists.txt
        libs/pigment/compositeops/KoOptimizedCompositeOpAlphaDarken32.h
        libs/pigment/compositeops/KoOptimizedCompositeOpFactory.cpp
        libs/pigment/compositeops/KoOptimizedCompositeOpOver32.h
        libs/pigment/compositeops/KoStreamedMath.h

commit 12933dd4ea64ea59b9a038c2f70db87e6ef60810
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sun Dec 2 09:37:49 2012 +0400

    Optimized vector composite ops by a further 1.5-2 times

    Conversion Uint<->Float is quite expensive in comparison to
    Int<->Float (2-2.5 times). This happens because of special code that
    handles the sign bit of the number. So discarding this bit with a
    Uint->Int conversion gives a huge speedup.

    Now the vector version of the composition is 1.8-8.7 times faster
    than the old version (weighted: 3.2 times).

    Many thanks to Matthias Kretz for pointing this out!

    CCMAIL:kimageshop@kde.org
    CCMAIL:kretz@kde.org

commit 57ee76dc327b7d771ffaa62ccc4670d29abb015c
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Sat Dec 1 22:19:49 2012 +0400

    Fixed a 1.4 times speed regression when legacy/optimized ops are put together

    The optimized and legacy composite ops should be put into separate
    object files. Otherwise, some code layout/locality problem arises.
    I do not know the exact explanation of this phenomenon, but splitting
    the implementations fixes it.

commit 3a3491ce37a284110963e584609f1e3a030005a4
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Nov 27 18:34:24 2012 +0400

    Create Vc version of the composite op only when the online CPU supports it

commit 767405a815c1208b79ad40dffe3db6ecefeed7a8
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Nov 26 17:28:26 2012 +0400

    Fixed a zero-alpha bug in the vector implementation of the OVER composite op

commit 0485f3bdc33d6ac846003e4c6313567b93273347
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Nov 22 20:29:03 2012 +0400

    Fixed warnings and a bug in the vector compositing

commit bb36f513fbf442e9e00af0f10a874a8814076bbc
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Nov 22 20:05:17 2012 +0400

    Fixed compilation when no Vc library is present in the system

commit 1f52906297b725b5c33b4e25d1df72db8849c516
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Fri Oct 26 15:25:12 2012 +0400

    Fixed a bug in the optimized OVER composite

    In some cases src_alpha does not correspond to the real source alpha
    because it has opacity and mask mixed into it. In these cases we
    cannot use memcpy.

commit 8645a698f71cadec3c4403e2d09c4a5ff953f1cd
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Thu Oct 25 19:21:40 2012 +0400

    Added fast-path optimizations to the vector Alpha Darken composite op

    The cases of 0 or 255 alpha value are quite common in Krita

commit 1580f62db31a6ad73aadea5a51c9227f53cd41db
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Wed Oct 24 22:53:56 2012 +0400

    Optimized Vector Composite Over for special cases of alpha

    Alpha values of 255 and 0 are very common in Krita, so these checks
    pay off. Now some of the Stroke Benchmarks execute 10 or 20% faster.
    For others there is no change.

commit 8a2258b95a22fce7483e6539e713199d0faa62e9
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Oct 16 16:51:09 2012 +0400

    The Vc implementation of the composite ops is ready for testing

    All the known bugs are fixed.

commit 6ec028de613d7b5d79e93622dc5b7b8b74470023
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Tue Oct 16 10:37:26 2012 +0400

    Added Vc implementation of the "over" composite

    There is still one bug in both the composites: the calculation of
    single-pixel compositions should be done in float instead of
    integers, otherwise it causes artifacts on the canvas during painting.

commit 23879a388b2ad8089e946e5f614c2d03d515c73d
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Oct 15 11:07:58 2012 +0400

    Added an optimized version of Alpha Darken composite op

    It makes the composition 1.58...1.74 times faster on Sandy Bridge.
    Other architectures are to be tested.

    Conflicts:
        krita/CMakeLists.txt
        krita/benchmarks/CMakeLists.txt

commit d8d16dcb5e406e2beaf0e285ef2485037fe38166
Author: Dmitry Kazakov <dimula73@gmail.com>
Date:   Mon Oct 15 10:57:16 2012 +0400

    Fixed a cmake config bug that prevented Vc from using streamed extensions

    Without these flags Vc falls back to sse2 and doesn't use the
    extensions present in the current CPU.
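
For readers unfamiliar with the alpha fast paths mentioned in the Oct 24-26 commits, here is a minimal scalar sketch of the idea. It is not the Vc code from this commit: `compositeOverRow` is a hypothetical helper, the pixel layout (8-bit RGBA, alpha last) is an assumption, and no opacity or mask is applied, which is the only situation in which the plain-copy shortcut for alpha 255 is valid.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Krita's API: composites a row of 8-bit RGBA
// pixels with the "over" rule. Assumes no extra opacity or mask, so the
// source alpha channel is the real source alpha.
inline void compositeOverRow(const std::uint8_t* src, std::uint8_t* dst, int nPixels)
{
    for (int i = 0; i < nPixels; ++i, src += 4, dst += 4) {
        const std::uint8_t srcAlpha = src[3];

        if (srcAlpha == 0) {
            continue;                  // fast path: transparent source, dst untouched
        }
        if (srcAlpha == 255) {
            std::memcpy(dst, src, 4);  // fast path: opaque source, plain copy
            continue;
        }

        // General case, computed in float rather than integers.
        const float sa = srcAlpha / 255.0f;
        const float da = dst[3] / 255.0f;
        const float outAlpha = sa + da * (1.0f - sa);  // > 0 because sa > 0

        for (int c = 0; c < 3; ++c) {
            const float blended = (src[c] * sa + dst[c] * da * (1.0f - sa)) / outAlpha;
            dst[c] = static_cast<std::uint8_t>(blended + 0.5f);
        }
        dst[3] = static_cast<std::uint8_t>(outAlpha * 255.0f + 0.5f);
    }
}
```

Once opacity or a mask is folded into the effective source alpha, an effective value of 255 no longer guarantees that the stored source pixel can be copied verbatim, so the memcpy branch would have to be dropped or guarded.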
parent
04395067