Fix and improve LanguageController's MimeTypeCache
Bugs in the old implementation that are fixed in this commit: * The regular expression matching code was incorrect and so it never matched the only pattern it handled: "CMakeLists.txt". As a result, LanguageController::languagesForUrl(".../CMakeLists.txt") returned an empty language list in background threads; resorted to extra work culminating in a call to LanguageController::languagesForMimetype() in the main thread. * The suffix matching optimization matched patterns case-sensitively and so didn't match names like "X.CPP". As with the regular expression matching bug, this resulted in a wrong return value in background threads and extra work in the main thread. * The suffix matching optimization assumed that '*' is the only wildcard character. While this is actually the case for glob patterns of the mime types that can currently end up in mimeTypeCache, it might have led to a surprising bug if more complex glob patterns became supported in the future. TestLanguageController::testLanguagesForUrlWithCache() fails on the following data rows without this commit because of these bugs: - CMakeLists - cmakelists wrong case - upper-case - mixed-case Improvements of the MimeTypeCache reimplementation in this commit: * Literal pattern optimization: "CMakeLists.txt" was the only pattern that required regular expression matching in the old implementation. It could not be handled by the suffix matching optimization. Now the new separate pattern category m_literalPatterns handles this case so that the slower regular expression matching never happens in practice. * The suffixes, literal patterns and regular expressions are now created once and cached rather than constructed in each languagesForUrl() call. * QRegularExpression is now used instead of the deprecated QRegExp. This is the list of wildcard patterns supported by maintained KDevelop plugins (collated from X-KDevelop-SupportedMimeTypes plugin entries and /usr/share/mime/globs2): kdevclangsupport text/x-chdr *.h text/x-c++hdr *.hh *.hpp *.hp *.h++ *.hxx text/x-csrc *.c:cs text/x-c++src *.c++ *.cc *.cxx *.C:cs *.cpp text/x-opencl-src *.cl text/vnd.nvidia.cuda.csrc *.cu text/vnd.nvidia.cuda.chdr *.cuh text/x-objcsrc *.m kdevpatchreview text/x-patch *.patch *.diff kdevqmljs text/x-qml *.qml *.qmlproject *.qmltypes application/javascript *.jsm *.mjs *.js KDevCMakeManager text/x-cmake cmakelists.txt *.cmake KDevCssSupport text/css *.css text/html *.html *.htm KDevPhpSupport application/x-php *.phps *.php *.php3 *.php4 *.php5 kdevpythonsupport text/x-python *.wsgi *.py *.pyx text/x-python3 *.py3x *.py3 *.py KDevRubySupport application/x-ruby *.rb Only *.c and *.C out of all supported patterns should be matched case-sensitively. But both of these patterns belong to the same plugin - kdevclangsupport. So LanguageController can safely match all patterns case-insensitively. See also https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html Average BenchLanguageController results before and at this commit in milliseconds per iteration: Data row Before At 1. benchLanguagesForUrlNoCache() CMakeLists 0.029 0.00046 cmakelists wrong case 0.029 0.00046 lower-case 0.0023 0.00058 upper-case 0.029 0.00058 mixed-case 0.029 0.00058 .C 0.0023 0.00050 .cl 0.0023 0.00070 existent C with extension 0.0022 0.00053 .cc 0.0023 0.00058 .cmake 0.0016 0.00039 .diff 0.00094 0.00037 .qml 0.0012 0.00036 existent C w/o extension 0.16 0.16 existent patch w/o extension 0.20 0.20 2. benchLanguagesForUrlFilledCache() CMakeLists 0.032 0.0011 cmakelists wrong case 0.031 0.0011 lower-case 0.0039 0.00083 upper-case 0.030 0.00083 mixed-case 0.030 0.00085 .C 0.0039 0.00072 .cl 0.0039 0.00091 existent C with extension 0.0038 0.00064 .cc 0.0039 0.00080 .cmake 0.0039 0.00090 .diff 0.0039 0.00083 .qml 0.0039 0.00093 existent C w/o extension 0.16 0.16 existent patch w/o extension 0.20 0.20 3. benchLanguagesForUrlNoMatchNoCache() empty 0.0021 0.0016 archive 0.024 0.023 OpenDocument Text 0.024 0.023 existent archive with extension 0.030 0.029 existent archive w/o extension 0.15 0.15 4. benchLanguagesForUrlNoMatchFilledCache() empty 0.0054 0.0018 archive 0.029 0.024 OpenDocument Text 0.029 0.024 existent archive with extension 0.035 0.030 existent archive w/o extension 0.16 0.15 Almost every benchmark runs faster now. Many run more than ten times faster thanks to this commit.
parent
1fedf941
Please register or sign in to comment