Optimize LanguageController's MimeTypeCache
Previously the filename of each url argument to LanguageController::languagesForUrl() was matched against all cached glob pattern suffixes (sometimes not "all" as suffixes of an already matched/added language were skipped). Now the filename is only matched against the suffixes that end with the same case-insensitive character. This change should improve the performance of this frequently called function, because case-insensitive string comparison looks like the most expensive among the actually performed in practice operations in cases when the language is found in mimeTypeCache. Suffix matching performance could be further improved by storing reverse suffixes in a trie (aka prefix tree). However I could not find a suitable trie implementation in KDevelop or one of its dependencies. QmlJS::PersistentTrie seems to be the closest match, but it has many unneeded features and stores only strings with no ability to attach user data to them (such as ILanguageSupport*). Implementing a trie data structure specifically for this optimization would be an overkill. For some reason, using std::lower_bound in place of std::equal_range in MimeTypeCache::languagesForFileName() makes the most important benchmark - benchLanguagesForUrlFilledCache() - somewhat slower on most data rows. I have implemented and benchmarked 3 alternative optimizations of suffix matching in MimeTypeCache::languagesForFileName(): 1. Binary searching for the last fileName's extension in the std::vector<StringLanguagePair> sorted by the entire last extension string case-insensitively. 2. Binary searching for lower-cased last fileName's extension in the lower-cased std::vector<StringLanguagePair> sorted by the entire last extension string case-sensitively. 3. Looking up lower-cased last fileName's extension in lower-cased QMultiHash<QString, ILanguageSupport*>. The second alternative optimization turned out to be faster than the first one; the third - faster than the second one. All 3 alternative optimizations turned out to be: a) Less general than this version: multiple extensions, such as tar.gz, were no longer optimized. This is only a theoretical downside because none of the maintained KDevelop plugins supports wildcard patterns with multiple extensions. b) Slower than this version. Probably because of expensive temporary QString construction - the (lower-cased) last fileName's extension. There is currently no need to optimize literal pattern matching, because only one literal pattern is supported: "CMakeLists.txt". If more literal patterns become supported in the future, the elements of MimeTypeCache::m_literalPatterns can be easily lower-cased and sorted to speed up matching against them. Average BenchLanguageController results before and at this commit in milliseconds per iteration: Data row Before At 1. benchLanguagesForUrlNoCache() CMakeLists 0.00046 0.00045 cmakelists wrong case 0.00046 0.00045 lower-case 0.00058 0.00045 upper-case 0.00058 0.00045 mixed-case 0.00058 0.00046 .C 0.00050 0.00039 .cl 0.00070 0.00038 existent C with extension 0.00053 0.00044 .cc 0.00058 0.00041 .cmake 0.00039 0.00041 .diff 0.00037 0.00038 .qml 0.00036 0.00037 existent C w/o extension 0.16 0.16 existent patch w/o extension 0.20 0.20 2. benchLanguagesForUrlFilledCache() CMakeLists 0.0011 0.00050 cmakelists wrong case 0.0011 0.00050 lower-case 0.00083 0.00048 upper-case 0.00083 0.00048 mixed-case 0.00085 0.00048 .C 0.00072 0.00040 .cl 0.00091 0.00042 existent C with extension 0.00064 0.00046 .cc 0.00080 0.00043 .cmake 0.00090 0.00042 .diff 0.00083 0.00041 .qml 0.00093 0.00045 existent C w/o extension 0.16 0.16 existent patch w/o extension 0.20 0.20 3. benchLanguagesForUrlNoMatchNoCache() empty 0.0016 0.000013 4. benchLanguagesForUrlNoMatchFilledCache() empty 0.0018 0.000013 Note: benchLanguagesForUrlNoMatch*() benchmark results are practically unaffected by this commit because the QMimeDatabase::mimeTypeForFile() call is orders of magnitude slower than the MimeTypeCache::languagesForFileName() call optimized here. The only exception is the "empty" data row, which became more than 100 times faster thanks to the added early return. Most benchmarks run faster now. benchLanguagesForUrlFilledCache(), which is closest to the real-life LanguageController usage in KDevelop, runs about two times faster thanks to this commit.
parent
025e4fa9