CollectionSync: Optimized algorithm
This is combination of commits by Daniel Vratil. * The weakest point of CollectionSync is in findMatchingLocalNode() method that needs to traverse the LocalNode tree every time from up down to find the LocalNode* that matches the given remote Collection, which is very inefficient, especially in the case of initial synchronization. * Calling parentCollection() is expensive (even though Collection is implicitly shared), because of Entity::Entity(const Entity &other) which is very ineffective (but probably not possible to fix without API changes), so we try to reduce the number of calls to minimum by calling it once and then storing the parent collection in a variable. * The new algorithm can sync 50 000 collections in about 2 minutes, which is 4 times faster then what I ever was able to optimize the original algorithm. The algorithm stores the \"tree\" structure of collections differently than the old one: it stores list of collections for each parent in a QHash, and then starts recursively querying Akonadi for collections, easilly comparing list of descendats it got from Akonadi with the list in the QHash for the same parent RID chain. * With non-hieararchical RIDs, we don't have the full parent chain available - each collection only refers to RID of its parent, but we know nothing of the parent's parent. To correctly match those, we need to compare only parts of RemoteID chains. * Instead of listing children for each Collection individually, we request all Collections at once and then process them in one go. This means that the algorithm will require more memory, but it will be much faster when dealing with large amounts of Collections. * This adds two banchmarks for CollectionSync: one that does an initial sync of 60 000 Collections, and one that does incremental sync against 60 000 Collections. On my computer, the results are around 160 seconds for benchmarkInitialSync() and around 8 seconds for benchmarkIncrementalSync(). The benchmarks are disabled by default, because the cleanup takes insane amount of time (I blame poor implementation of DELETE command on the server).
parent
e1aee4a4
Please register or sign in to comment