Syncing

The Three Repositories

The Qmulum synchronisation process is concerned with keeping three representations of files in sync:

  1. the files on the device;
  2. the local (Device) DB on the device;
  3. the server.

The device keeps track of its own files and of the server files it is aware of. Each client (i.e. each device) has a client ID, and the server tracks the sync status of each file against each client.
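One way to picture the server's bookkeeping is a ledger keyed by file and client. This is a sketch under assumptions, not the actual Qmulum schema: `SyncStatus`, its values and `ServerSyncLedger` are hypothetical names, and a file with no entry for a client is treated as pending.

```python
from dataclasses import dataclass, field
from enum import Enum


class SyncStatus(Enum):
    # Hypothetical status values; the design does not name them yet.
    PENDING = "pending"
    SYNCED = "synced"
    FAILED = "failed"


@dataclass
class ServerSyncLedger:
    """Server-side record of each file's sync status against each client."""

    # Keyed by (file_id, client_id); a missing key means the client has never synced the file.
    statuses: dict[tuple[str, str], SyncStatus] = field(default_factory=dict)

    def set_status(self, file_id: str, client_id: str, status: SyncStatus) -> None:
        self.statuses[(file_id, client_id)] = status

    def status_for(self, file_id: str, client_id: str) -> SyncStatus:
        return self.statuses.get((file_id, client_id), SyncStatus.PENDING)

    def files_pending_for(self, client_id: str, all_file_ids: list[str]) -> list[str]:
        # Files this client has not yet successfully synced.
        return [f for f in all_file_ids
                if self.status_for(f, client_id) is not SyncStatus.SYNCED]
```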

Synchronisation Sequence

The App performs synchronisation in the following stages, in order:

  1. Stage 1: Sync device files to the Device DB (adds device files to the local DB where their xxhash does not already exist in the local DB).
  2. Stage 2: Push files listed in the Device DB to the server. In-scope files: those whose fileID is blank or that have been modified since the last sync.
  3. Stage 3: Sync server to Local DB.
  4. Stage 4: Download thumbnails.
  5. Stage 5: Local DB to device (TBC whether this is needed).

Stage 1: Sync device to Local DB

Batch size: 100

Pre-condition:

Post-conditions:

In-scope files: media on the device modified since the most recently modified file already in the Device DB, processed in order from oldest to newest.
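The selection rule can be sketched as follows. `DeviceMedia`, `stage1_batches` and passing in the newest Device DB timestamp as an argument are hypothetical; the sketch uses `>=` rather than `>` so that files sharing the newest timestamp are not missed, on the assumption that the xxhash check then skips any already in the Device DB.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional

BATCH_SIZE = 100  # Stage 1 batch size from the design


@dataclass(frozen=True)
class DeviceMedia:
    # Hypothetical shape of a media item read from the device library.
    path: str
    modified: datetime


def stage1_batches(device_media: list[DeviceMedia],
                   newest_in_db: Optional[datetime],
                   batch_size: int = BATCH_SIZE) -> Iterator[list[DeviceMedia]]:
    """Yield in-scope device media, oldest first, in batches of `batch_size`."""
    # `>=` so files sharing the newest timestamp are not skipped; any that are
    # already in the Device DB are expected to be caught by the xxhash check.
    in_scope = [m for m in device_media
                if newest_in_db is None or m.modified >= newest_in_db]
    in_scope.sort(key=lambda m: m.modified)  # oldest to newest
    for start in range(0, len(in_scope), batch_size):
        yield in_scope[start:start + batch_size]
```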

Exceptions: If a file fails, it must be flagged as a failure and processing must continue with the next file. A way to reprocess these failures may be needed in future.
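The exception rule (flag the failure, move on to the next file) might look like this for a single batch. `LocalDB` is a hypothetical in-memory stand-in for the Device DB, and `hash_file` is injected so the sketch does not assume a particular xxhash binding; in the app it would compute the file's XXH3 hash.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

log = logging.getLogger("qmulum.sync.stage1")  # hypothetical logger name


@dataclass
class LocalDB:
    # Stand-in for the Device DB: known xxhash values plus flagged failures.
    hashes: set[str] = field(default_factory=set)
    failures: dict[str, str] = field(default_factory=dict)  # path -> error message


def process_batch(paths: list[str], db: LocalDB,
                  hash_file: Callable[[str], str]) -> int:
    """Add new files to the DB, flagging and skipping any that fail. Returns the count added."""
    added = 0
    for path in paths:
        try:
            digest = hash_file(path)
            if digest in db.hashes:
                continue  # already in the Device DB
            db.hashes.add(digest)
            added += 1
        except Exception as exc:  # one bad file must not stop the batch
            db.failures[path] = str(exc)
            log.warning("Stage 1 failed for %s: %s", path, exc)
    return added
```

Recording failures against the path keeps them available for the future reprocessing mentioned above.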

Status: IN DEVELOPMENT

Testing: There is limited opportunity to test the device in a repeatable way.

Steps:

Stage 2: Two-way Local DB and Server Sync

Pre-condition:

Steps:

Local File ID  | Local Hash            | Remote File ID | Remote Hash
blank          | any                   | n/a            | n/a
matches remote | does not match remote | matches local  | does not match local

File IDs Match | Hashes Match | Action
Y              | N            | Determine which is newest, and download or upload accordingly (not yet implemented)
N              | Y            | Overwrite local file ID with remote
Y              | Y            | No action required
N              | N            | Upload local file to server
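The decision table maps onto a small function. This sketch assumes local and remote records have already been paired (by file ID or by hash); `Action` and `reconcile` are hypothetical names, and a blank local file ID always results in an upload, matching the Stage 2 scope in the sequence above.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    UPLOAD = "upload local file to server"
    ADOPT_REMOTE_ID = "overwrite local file ID with remote"
    NONE = "no action required"
    RESOLVE_NEWEST = "determine newest, then download or upload"  # not yet implemented


def reconcile(local_id: Optional[str], local_hash: str,
              remote_id: Optional[str], remote_hash: Optional[str]) -> Action:
    """Map one local/remote pair onto the Stage 2 decision table."""
    if not local_id:
        return Action.UPLOAD  # blank local file ID: remote columns are n/a
    ids_match = local_id == remote_id
    hashes_match = local_hash == remote_hash
    if ids_match and hashes_match:
        return Action.NONE
    if ids_match:
        return Action.RESOLVE_NEWEST
    if hashes_match:
        return Action.ADOPT_REMOTE_ID
    return Action.UPLOAD
```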

Stage 3: Download thumbnails

Pre-condition: Thumbnails have been generated on the server.

Post-condition: 500px and 1800px thumbnails have been downloaded to the device for every media file that did not have one, and thumbnail records have been created in the local DB.
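A sketch of the thumbnail step, assuming a hypothetical `download(file_id, size_px)` callable that fetches one server-generated thumbnail and returns its local path. It checks each size separately, so a file missing only one size gets only that size downloaded; `ThumbnailRecord` stands in for whatever the local DB record actually contains.

```python
from dataclasses import dataclass
from typing import Callable

THUMBNAIL_SIZES = (500, 1800)  # px, per the post-condition above


@dataclass(frozen=True)
class ThumbnailRecord:
    # Hypothetical local DB record for one downloaded thumbnail.
    file_id: str
    size_px: int
    local_path: str


def download_missing_thumbnails(media_file_ids: list[str],
                                existing: set[tuple[str, int]],
                                download: Callable[[str, int], str]) -> list[ThumbnailRecord]:
    """Download each missing (file, size) thumbnail and return the new DB records."""
    records = []
    for file_id in media_file_ids:
        for size in THUMBNAIL_SIZES:
            if (file_id, size) in existing:
                continue  # already on the device
            local_path = download(file_id, size)  # hypothetical server call
            records.append(ThumbnailRecord(file_id, size, local_path))
    return records
```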

When things go wrong

We must assume that something might go wrong during synchronisation. Problems range from the more likely but temporary (such as the device being unable to contact the server) to permanent, unknown issues, such as a particular file that won't process.

In either case, the user should ideally be told what has happened; at a minimum, the problem should be logged for later investigation.
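One way to separate the two kinds of failure is to retry errors that look temporary and log everything else. This is a sketch, not the app's behaviour: which exceptions count as transient (`ConnectionError` and `TimeoutError` here), the retry count and the backoff delays are all assumptions, and `with_retries` is a hypothetical helper.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("qmulum.sync")  # hypothetical logger name

# Assumption: network-style failures are temporary; anything else is permanent.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def with_retries(op: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Retry transient failures with exponential backoff; log and re-raise anything else."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except TRANSIENT_ERRORS as exc:
            if attempt == attempts:
                log.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("Transient error (attempt %d/%d), retrying in %.1fs: %s",
                        attempt, attempts, delay, exc)
            time.sleep(delay)
        except Exception:
            log.exception("Permanent sync error; logged for later investigation")
            raise
    raise AssertionError("unreachable: the loop always returns or raises")
```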

Storage

My library has 130,000 photos. Let's assume a fair-sized library has 300,000 photos and 50,000 videos.

Type                       | Count   | Average size | Space required
Photos - 550px thumbnail   | 300,000 | 20 KB        | ~6 GB
Photos - 1,800px thumbnail | 300,000 | 75 KB        | ~22.5 GB
Videos - 550px thumbnail   | 50,000  | 20 KB        | ~1 GB
Total                      |         |              | ~29.5 GB
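The space estimates can be reproduced with a few lines of arithmetic, assuming decimal units (1 GB = 1,000,000 KB); the row values are taken from the table above.

```python
# (type, count, average thumbnail size in KB), from the storage table
ROWS = [
    ("Photos - 550px thumbnail", 300_000, 20),
    ("Photos - 1,800px thumbnail", 300_000, 75),
    ("Videos - 550px thumbnail", 50_000, 20),
]


def space_gb(count: int, avg_kb: int) -> float:
    """Space in decimal GB (1 GB = 1,000,000 KB)."""
    return count * avg_kb / 1_000_000


for name, count, avg_kb in ROWS:
    print(f"{name}: {space_gb(count, avg_kb):.1f} GB")
print(f"Total: {sum(space_gb(c, kb) for _, c, kb in ROWS):.1f} GB")
```

This prints 6.0, 22.5 and 1.0 GB for the three rows and a total of 29.5 GB.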

Notes on hashing

Choosing a hashing algorithm has involved a number of challenges and changes. I have settled on XXH3, a variant of the xxHash algorithm with optimisations for modern CPUs; xxHash is a fast, well-respected non-cryptographic hashing algorithm.