Qmulum synchronisation is concerned with keeping three representations of files in sync: the files on the device, their records in the local DB, and the files on the server.
The device keeps track of its own files and of the server files it is aware of. The server, in turn, keeps track of which files have been synced to each client. Each client (i.e. each device) has a client ID, and the server tracks the sync status of every file against each client.
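To make this bookkeeping concrete, here is a minimal Python sketch of the two views, with the server's sync status keyed by (client ID, file ID). All class, field and status names are hypothetical; the notes do not specify a schema or list the possible sync states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class SyncStatus(Enum):
    # Hypothetical values: the server tracks a sync status per file and client,
    # but the possible states are not listed in these notes.
    SYNCED = "synced"
    FAILED = "failed"


@dataclass
class ServerSyncIndex:
    """Server-side view: the sync status of each file against each client ID."""
    # (client_id, file_id) -> status; a missing key means "never synced to this client"
    entries: Dict[Tuple[str, str], SyncStatus] = field(default_factory=dict)

    def mark(self, client_id: str, file_id: str, status: SyncStatus) -> None:
        self.entries[(client_id, file_id)] = status

    def status_for(self, client_id: str, file_id: str) -> Optional[SyncStatus]:
        return self.entries.get((client_id, file_id))


@dataclass
class LocalFileRecord:
    """Device-side row in the local DB (hypothetical field names)."""
    device_path: str
    server_file_id: Optional[str] = None  # set once the server knows the file
    upload_status: str = "Pending"
```

Keying on the pair means a file can be synced to one device and still be outstanding for another.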
The App performs synchronisation in the following stages, in order:
Pre-condition: Permissions have been granted to access the device media.
Post-condition: Each file on the device has a record in the local DB. If the file is newly added, its record's UploadStatus is Pending.
Batch size: 100
Synchronising the device to the local DB selects a batch of the 100 most recently modified images or videos on the device that were modified after the most recently modified file already in the local DB.
This relies on Stage 1 synchronisation succeeding for every file in the batch before it can continue. Is this a risk? We must assume that any file can fail, so we probably need to mark such a file as failed.
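A minimal Python sketch of the "mark it as failed" idea, assuming the local DB can be modelled as a dictionary keyed by file path and that the batch has already been selected. `DeviceFile`, `record_batch`, `process` and the `"Failed"` status are all hypothetical. Every file in the batch receives a record whether or not its processing succeeds, so one bad file does not stop the stage, and the watermark (the most recent modification time in the local DB) is computed from the recorded files.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DeviceFile:
    path: str
    modified: float  # modification time as a Unix timestamp


# Local DB stand-in: path -> (modified, upload_status)
LocalDB = Dict[str, Tuple[float, str]]


def watermark(local_db: LocalDB) -> Optional[float]:
    """Most recent modification time already recorded in the local DB."""
    return max((modified for modified, _ in local_db.values()), default=None)


def record_batch(batch: Iterable[DeviceFile], local_db: LocalDB,
                 process: Callable[[DeviceFile], None]) -> None:
    """Give every file in the batch a local DB record, even if processing fails.

    A failing file is stored as "Failed" rather than aborting the batch,
    so it can be found and retried later.
    """
    for f in batch:
        try:
            process(f)  # hypothetical per-file work, e.g. reading metadata
            local_db[f.path] = (f.modified, "Pending")
        except Exception:
            local_db[f.path] = (f.modified, "Failed")
```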
Steps:
Notes
Pre-condition:
Post-condition:
File IDs Match | Hashes Match | Action |
---|---|---|
Y | N | Determine which copy is newer, then download or upload accordingly (not yet implemented) |
N | Y | Overwrite local file ID with remote |
Y | Y | No action required |
N | N | Upload local file to server |
Pre-condition: None
Post-condition: Server files that have no record of being synced to this client have been downloaded and added to the local DB.
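On the server side, finding files with "no record of sync to this client" is an anti-join. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical schema with a `files` table and a `client_sync` table holding one row per file that has been synced to a given client; the real server schema may differ.

```python
import sqlite3

# Hypothetical server schema: one row per file, plus one row per
# (file, client) pair once that file has been synced to that client.
SCHEMA = """
CREATE TABLE files (id TEXT PRIMARY KEY);
CREATE TABLE client_sync (
    file_id   TEXT NOT NULL REFERENCES files(id),
    client_id TEXT NOT NULL,
    PRIMARY KEY (file_id, client_id)
);
"""


def files_to_download(conn: sqlite3.Connection, client_id: str) -> list:
    """Return IDs of server files with no sync record for this client."""
    rows = conn.execute(
        """
        SELECT f.id FROM files AS f
        WHERE NOT EXISTS (
            SELECT 1 FROM client_sync AS s
            WHERE s.file_id = f.id AND s.client_id = ?
        )
        ORDER BY f.id
        """,
        (client_id,),
    )
    return [row[0] for row in rows]
```

In this model, inserting a `client_sync` row once a download completes is what stops the file from being offered to that client again.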
Pre-condition: Thumbnails have been generated on the server.
Post-condition: For every media file without a thumbnail, 500px and 1800px thumbnails have been downloaded to the device and corresponding thumbnail records created in the local DB.
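A sketch of how the App might work out which thumbnails are still missing, assuming the local DB can report which thumbnail sizes it already holds for each media file. It checks each size separately, which is slightly finer-grained than "media files with no thumbnail"; the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

THUMBNAIL_SIZES = (500, 1800)  # pixel sizes named in the post-condition


def thumbnails_to_fetch(media_ids: Iterable[str],
                        thumbnails: Dict[str, Set[int]],
                        sizes: Tuple[int, ...] = THUMBNAIL_SIZES) -> List[Tuple[str, int]]:
    """List (media_id, size) pairs that still need downloading.

    `thumbnails` maps a media ID to the sizes already recorded in the local DB.
    """
    missing = []
    for media_id in media_ids:
        have = thumbnails.get(media_id, set())
        missing.extend((media_id, size) for size in sizes if size not in have)
    return missing
```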
We must assume that something might go wrong during synchronisation. This could range from problems that are more likely but temporary, such as the device being unable to contact the server, to permanent issues of unknown cause, such as a particular file that won't process.
In either case, the user should be told what has happened, or at the very least the failure should be logged for later investigation.
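One way to separate the two kinds of failure is to retry errors known to be temporary with exponential backoff, and log everything else for investigation. This Python sketch assumes a hypothetical `TransientSyncError` raised by sync steps when, for example, the server is unreachable; the retry count, delays and logger name are illustrative, not taken from the App. The `sleep` parameter exists so the backoff can be tested without actually waiting.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("qmulum.sync")  # hypothetical logger name


class TransientSyncError(Exception):
    """Hypothetical marker for recoverable failures, e.g. server unreachable."""


def run_with_retry(step: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Retry transient failures with exponential backoff; log anything else.

    Returns the step's result, or None if it ultimately failed.
    """
    for attempt in range(attempts):
        try:
            return step()
        except TransientSyncError as exc:
            if attempt == attempts - 1:
                log.warning("giving up after %d attempts: %s", attempts, exc)
                return None
            sleep(base_delay * 2 ** attempt)  # delays grow as base_delay * 1, 2, 4, ...
        except Exception:
            # Unknown or permanent problem (e.g. a file that won't process):
            # record the traceback for later investigation and stop retrying.
            log.exception("sync step failed permanently")
            return None
    return None
```

Returning `None` rather than raising lets the caller mark the file as failed and surface the problem to the user without aborting the rest of the sync.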
My own library has 130,000 photos. For this estimate, let's assume a large library of 300,000 photos and 50,000 videos.
Type | Count | Average size | Space required |
---|---|---|---|
Photos - 550px thumbnail | 300,000 | 20 KB | 5.9 GB |
Photos - 1,800px thumbnail | 300,000 | 75 KB | 22 GB |
Videos - 550px thumbnail | 50,000 | 20 KB | 1 GB |
Total | | | ~29 GB |
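The arithmetic behind the table, as a quick Python check. It uses decimal units (1 GB = 1,000,000 KB), which gives 6.0 GB, 22.5 GB and 1.0 GB, or 29.5 GB in total. The table's slightly lower per-row figures (5.9 GB, 22 GB) are what you get by dividing by 1,024 once instead of 1,000; either way the total is roughly 29 GB.

```python
# Inputs copied from the table; average sizes in kilobytes.
ROWS = [
    ("Photos - 550px thumbnail", 300_000, 20),
    ("Photos - 1,800px thumbnail", 300_000, 75),
    ("Videos - 550px thumbnail", 50_000, 20),
]

KB_PER_GB = 1_000_000  # decimal units


def space_gb(count: int, avg_kb: float) -> float:
    """Storage needed for `count` thumbnails of `avg_kb` kilobytes each."""
    return count * avg_kb / KB_PER_GB


total = 0.0
for name, count, avg_kb in ROWS:
    gb = space_gb(count, avg_kb)
    total += gb
    print(f"{name}: {gb:.1f} GB")
print(f"Total: {total:.1f} GB")
```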
I went through a number of challenges and changes when selecting a hashing algorithm, and have settled on XXH3. XXH3 is a variant of the xxHash algorithm with optimisations for modern CPUs; xxHash is a fast, well-respected, non-cryptographic hashing algorithm.