Commit 025e4fa9 authored by Igor Kushnir

Fix and improve LanguageController's MimeTypeCache

Bugs in the old implementation that are fixed in this commit:
  * The regular expression matching code was incorrect and so it never
matched the only pattern it handled: "CMakeLists.txt". As a result,
LanguageController::languagesForUrl(".../CMakeLists.txt") returned an
empty language list in background threads, and then resorted to extra
work culminating in a call to LanguageController::languagesForMimetype()
in the main thread.
  * The suffix matching optimization matched patterns case-sensitively
and so didn't match names like "X.CPP". As with the regular expression
matching bug, this resulted in a wrong return value in background
threads and extra work in the main thread.
  * The suffix matching optimization assumed that '*' is the only
wildcard character. While this is actually the case for glob patterns of
the mime types that can currently end up in mimeTypeCache, it might have
led to a surprising bug if more complex glob patterns became supported
in the future.
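
The third bug above comes down to how a glob pattern is classified before
a fast path is chosen. A minimal sketch in standard C++ follows, assuming
that '*', '?' and '[' are the wildcard characters to look for. The names
PatternKind, isGlobWildcard and classifyPattern are hypothetical and are
not taken from the KDevelop sources:

```cpp
#include <cstddef>
#include <string>

// Hypothetical categories; the names are illustrative only.
enum class PatternKind { Literal, Suffix, General };

// Returns true if the character has a special meaning in a glob pattern.
// Assumption: '*', '?' and '[' are the only wildcard characters considered.
inline bool isGlobWildcard(char c)
{
    return c == '*' || c == '?' || c == '[';
}

// Classifies a glob pattern:
//  - Literal: no wildcard characters at all, e.g. "cmakelists.txt";
//  - Suffix:  one leading '*' followed by wildcard-free text, e.g. "*.cpp";
//  - General: anything else, which needs real glob or regex matching.
inline PatternKind classifyPattern(const std::string& pattern)
{
    const bool leadingStar = !pattern.empty() && pattern.front() == '*';
    for (std::size_t i = leadingStar ? 1 : 0; i < pattern.size(); ++i) {
        if (isGlobWildcard(pattern[i]))
            return PatternKind::General;
    }
    return leadingStar ? PatternKind::Suffix : PatternKind::Literal;
}
```

With this classification a pattern such as "*.[ch]" is routed to the
general path instead of being mistaken for a plain suffix.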

TestLanguageController::testLanguagesForUrlWithCache() fails on the
following data rows without this commit because of these bugs:
  - CMakeLists
  - cmakelists wrong case
  - upper-case
  - mixed-case

Improvements of the MimeTypeCache reimplementation in this commit:
  * Literal pattern optimization: "CMakeLists.txt" was the only pattern
that required regular expression matching in the old implementation. It
could not be handled by the suffix matching optimization. Now the new
separate pattern category m_literalPatterns handles this case so that
the slower regular expression matching never happens in practice.
  * The suffixes, literal patterns and regular expressions are now
created once and cached rather than constructed in each
languagesForUrl() call.
  * QRegularExpression is now used instead of the deprecated QRegExp.
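
The improvements above can be illustrated with a simplified sketch. It
uses only the C++ standard library rather than Qt, so it is not the
actual MimeTypeCache code. The class name PatternSet and the helper
toLowerAscii are hypothetical, and general patterns that would need a
regular expression are simply skipped here:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Lower-cases ASCII characters only; an assumption that is sufficient for
// the patterns listed in this commit message.
inline std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical stand-in for the patterns of one mime type.
class PatternSet
{
public:
    // Patterns are lower-cased and split into categories once, at
    // construction time, instead of on every lookup.
    explicit PatternSet(const std::vector<std::string>& globPatterns)
    {
        for (const auto& pattern : globPatterns) {
            const std::string lower = toLowerAscii(pattern);
            if (lower.find_first_of("*?[") == std::string::npos) {
                m_literals.insert(lower); // e.g. "cmakelists.txt"
            } else if (lower.front() == '*'
                       && lower.find_first_of("*?[", 1) == std::string::npos) {
                m_suffixes.push_back(lower.substr(1)); // e.g. ".cpp"
            }
            // A general pattern would be compiled to a regular expression
            // here; that slow path is omitted from this sketch.
        }
    }

    // Case-insensitive match of a file name against the cached patterns.
    bool matches(const std::string& fileName) const
    {
        const std::string lower = toLowerAscii(fileName);
        if (m_literals.count(lower) != 0)
            return true;
        for (const auto& suffix : m_suffixes) {
            if (lower.size() >= suffix.size()
                && lower.compare(lower.size() - suffix.size(), suffix.size(), suffix) == 0)
                return true;
        }
        return false;
    }

private:
    std::unordered_set<std::string> m_literals;
    std::vector<std::string> m_suffixes;
};
```

A PatternSet built from "cmakelists.txt" and "*.cmake" matches
"CMakeLists.txt" through the literal set and "FindFoo.cmake" through the
suffix list, without any regular expression being evaluated.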

This is the list of wildcard patterns supported by maintained KDevelop
plugins (collated from X-KDevelop-SupportedMimeTypes plugin entries and
/usr/share/mime/globs2):
kdevclangsupport
    text/x-chdr
        *.h
    text/x-c++hdr
        *.hh
        *.hpp
        *.hp
        *.h++
        *.hxx
    text/x-csrc
        *.c:cs
    text/x-c++src
        *.c++
        *.cc
        *.cxx
        *.C:cs
        *.cpp
    text/x-opencl-src
        *.cl
    text/vnd.nvidia.cuda.csrc
        *.cu
    text/vnd.nvidia.cuda.chdr
        *.cuh
    text/x-objcsrc
        *.m
kdevpatchreview
    text/x-patch
        *.patch
        *.diff
kdevqmljs
    text/x-qml
        *.qml
        *.qmlproject
        *.qmltypes
    application/javascript
        *.jsm
        *.mjs
        *.js
KDevCMakeManager
    text/x-cmake
        cmakelists.txt
        *.cmake
KDevCssSupport
    text/css
        *.css
    text/html
        *.html
        *.htm
KDevPhpSupport
    application/x-php
        *.phps
        *.php
        *.php3
        *.php4
        *.php5
kdevpythonsupport
    text/x-python
        *.wsgi
        *.py
        *.pyx
    text/x-python3
        *.py3x
        *.py3
        *.py
KDevRubySupport
    application/x-ruby
        *.rb

Only *.c and *.C out of all supported patterns should be matched
case-sensitively. But both of these patterns belong to the same plugin,
kdevclangsupport, so confusing one with the other does not change which
language is returned. LanguageController can therefore safely match all
patterns case-insensitively. See also
https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html

Average BenchLanguageController results before and at this commit in
milliseconds per iteration:
        Data row                    Before      At
1. benchLanguagesForUrlNoCache()
    CMakeLists                      0.029       0.00046
    cmakelists wrong case           0.029       0.00046
    lower-case                      0.0023      0.00058
    upper-case                      0.029       0.00058
    mixed-case                      0.029       0.00058
    .C                              0.0023      0.00050
    .cl                             0.0023      0.00070
    existent C with extension       0.0022      0.00053
    .cc                             0.0023      0.00058
    .cmake                          0.0016      0.00039
    .diff                           0.00094     0.00037
    .qml                            0.0012      0.00036
    existent C w/o extension        0.16        0.16
    existent patch w/o extension    0.20        0.20
2. benchLanguagesForUrlFilledCache()
    CMakeLists                      0.032       0.0011
    cmakelists wrong case           0.031       0.0011
    lower-case                      0.0039      0.00083
    upper-case                      0.030       0.00083
    mixed-case                      0.030       0.00085
    .C                              0.0039      0.00072
    .cl                             0.0039      0.00091
    existent C with extension       0.0038      0.00064
    .cc                             0.0039      0.00080
    .cmake                          0.0039      0.00090
    .diff                           0.0039      0.00083
    .qml                            0.0039      0.00093
    existent C w/o extension        0.16        0.16
    existent patch w/o extension    0.20        0.20
3. benchLanguagesForUrlNoMatchNoCache()
    empty                           0.0021      0.0016
    archive                         0.024       0.023
    OpenDocument Text               0.024       0.023
    existent archive with extension 0.030       0.029
    existent archive w/o extension  0.15        0.15
4. benchLanguagesForUrlNoMatchFilledCache()
    empty                           0.0054      0.0018
    archive                         0.029       0.024
    OpenDocument Text               0.029       0.024
    existent archive with extension 0.035       0.030
    existent archive w/o extension  0.16        0.15

Almost every benchmark runs faster now. Many run more than ten times
faster thanks to this commit.
parent 1fedf941