Skip to content
Commit a82745a4 authored by Daniel Vrátil's avatar Daniel Vrátil 🤖 Committed by Christian Mollekopf
Browse files

CollectionSync: Optimized algorithm

This is combination of commits by Daniel Vratil.

* The weakest point of CollectionSync is in findMatchingLocalNode() method
that needs to traverse the LocalNode tree every time from up down to find the
LocalNode* that matches the given remote Collection, which is very inefficient,
especially in the case of initial synchronization.

* Calling parentCollection() is expensive (even though Collection is implicitly shared),
because of Entity::Entity(const Entity &other) which is very ineffective (but probably
not possible to fix without API changes), so we try to reduce the number of calls to
minimum by calling it once and then storing the parent collection in a variable.

* The new algorithm can sync 50 000 collections in about 2 minutes, which is 4 times
faster then what I ever was able to optimize the original algorithm.

The algorithm stores the \"tree\" structure of collections differently than the old
one: it stores list of collections for each parent in a QHash, and then starts
recursively querying Akonadi for collections, easilly comparing list of descendats
it got from Akonadi with the list in the QHash for the same parent RID chain.

* With non-hieararchical RIDs, we don't have the full parent chain available -
each collection only refers to RID of its parent, but we know nothing of the
parent's parent. To correctly match those, we need to compare only parts of
RemoteID chains.

* Instead of listing children for each Collection individually, we request all Collections
at once and then process them in one go. This means that the algorithm will require more
memory, but it will be much faster when dealing with large amounts of Collections.

* This adds two banchmarks for CollectionSync: one that does an initial sync
of 60 000 Collections, and one that does incremental sync against 60 000
Collections.

On my computer, the results are around 160 seconds for benchmarkInitialSync()
and around 8 seconds for benchmarkIncrementalSync().

The benchmarks are disabled by default, because the cleanup takes insane
amount of time (I blame poor implementation of DELETE command on the server).
parent e1aee4a4
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment