Syncing

The Three Repositories

Qmulum synchronisation keeps three representations of files in step: the files on the device, the records in the local DB, and the files on the server.

The device keeps track of its own files and of the server files it is aware of. The server, in turn, keeps track of the files that have been synced to each client. Each client (for example, each device) has a client ID, and the server tracks the sync status of each file against each client.

Synchronisation Sequence

The App performs synchronisation in the following stages, in order:

  1. Stage 1: Device files to Local DB. Adds a file to the local DB where its xxhash does not already exist there.
  2. Stage 2: Local DB to Server. Pushes the files in the local DB whose uploadStatus is Pending or In Progress.
  3. Stage 3: Server to Local DB
  4. Stage 4: Local DB to device (tbc)

Stage 1: Sync device to Local DB

Batch size: 100

Post-condition: Each file on the device has a record in the local DB. If it’s newly added, the record’s UploadStatus is Pending.

Pre-condition: Permissions have been granted to access the device media.

How it works

Stage 1 selects a batch of up to 100 images or videos from the device, most recently modified first, limited to files modified after the most recently modified file already in the local DB.
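The selection described above can be sketched as follows. This is a minimal illustration rather than the App's actual code: the `DeviceFile` type, the dict-based `local_db` keyed by hash, and the function names are all hypothetical, and the hash is assumed to have been computed already.

```python
from dataclasses import dataclass

BATCH_SIZE = 100  # batch size stated for Stage 1


@dataclass(frozen=True)
class DeviceFile:
    path: str
    modified_at: float  # modification time, seconds since epoch
    xxhash: str         # content hash, assumed to be computed already


def select_batch(device_files, local_db):
    """Pick up to BATCH_SIZE device files newer than the newest local DB record."""
    # Watermark: modification time of the newest file already recorded.
    watermark = max(
        (record["modified_at"] for record in local_db.values()),
        default=float("-inf"),
    )
    newer = [f for f in device_files if f.modified_at > watermark]
    newer.sort(key=lambda f: f.modified_at, reverse=True)  # most recent first
    return newer[:BATCH_SIZE]


def sync_device_to_local_db(device_files, local_db):
    """Add a Pending record for each batch file whose hash is not in the local DB."""
    added = []
    for f in select_batch(device_files, local_db):
        if f.xxhash not in local_db:
            local_db[f.xxhash] = {
                "path": f.path,
                "modified_at": f.modified_at,
                "upload_status": "Pending",
            }
            added.append(f.xxhash)
    return added
```

One thing the sketch makes visible: if more than 100 files are newer than the watermark, taking the most recent 100 moves the watermark past the older ones. Whether that is intended is not stated here, so it may be worth confirming.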

At present the stage relies on Stage 1 synchronisation succeeding for every file in the batch before it can continue. This is a risk: we must assume that an individual file can fail, so we probably need to mark such a file as failed rather than let it block the batch.
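One way to stop a single bad file from blocking the batch is to record the failure against that file and move on. The sketch below assumes a hypothetical `process_file` callable and a hypothetical "Failed" status; neither is defined elsewhere in this document.

```python
def process_batch(batch, process_file):
    """Process every file in the batch, recording failures instead of aborting."""
    results = {}
    for path in batch:
        try:
            process_file(path)
            results[path] = "Pending"  # recorded in the local DB, ready for upload
        except Exception as exc:  # sketch only: real code would catch narrower errors
            results[path] = f"Failed: {exc}"
    return results
```

The returned mapping could then be written to the local DB, so that a failed file is visible later instead of silently stalling the stage.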

Steps:

Notes

Stage 2: Sync local DB to server

Pre-condition: Post-condition:

File IDs match | Hashes match | Action
Y              | N            | Determine which is newest, and download or upload accordingly (not yet implemented)
N              | Y            | Overwrite local file ID with remote
Y              | Y            | No action required
N              | N            | Upload local file to server
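The decision table above maps directly onto a small function. The enum and function names here are illustrative only, not taken from the App.

```python
from enum import Enum


class SyncAction(Enum):
    RESOLVE_NEWEST = "determine which is newest, then download or upload (not yet implemented)"
    ADOPT_REMOTE_ID = "overwrite local file ID with remote"
    NO_ACTION = "no action required"
    UPLOAD = "upload local file to server"


def decide_action(ids_match: bool, hashes_match: bool) -> SyncAction:
    """Return the Stage 2 action for one file, following the table row by row."""
    if ids_match and hashes_match:
        return SyncAction.NO_ACTION
    if ids_match:        # same ID, different content
        return SyncAction.RESOLVE_NEWEST
    if hashes_match:     # same content, different ID
        return SyncAction.ADOPT_REMOTE_ID
    return SyncAction.UPLOAD  # neither matches: the server has not seen this file
```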

Stage 3: Sync server to local DB

Pre-condition: None

Post-condition: Server files that have no record of sync to this client are downloaded and added to local DB.
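As a rough sketch of that post-condition, the server's per-client tracking can be modelled as a set of (client ID, file ID) pairs. The data shapes and function names below are assumptions made for illustration; the real server API is not described here.

```python
def files_to_download(server_files, sync_records, client_id):
    """Return the server file IDs that have no sync record for this client.

    server_files: iterable of file IDs held by the server
    sync_records: set of (client_id, file_id) pairs recorded as synced
    """
    return [fid for fid in server_files if (client_id, fid) not in sync_records]


def sync_server_to_local_db(server_files, sync_records, client_id, local_db):
    """Add each unsynced server file to the local DB and record the sync."""
    for fid in files_to_download(server_files, sync_records, client_id):
        local_db[fid] = {"source": "server"}  # stand-in for the downloaded record
        sync_records.add((client_id, fid))
    return local_db
```

Running the function a second time with the same inputs adds nothing, because every file now has a sync record for that client.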

Stage 4: Download thumbnails

Pre-condition: Thumbnails have been generated on the server.

Post-condition: 500px and 1800px thumbnails have been downloaded to the device for all media files that have no thumbnail, and thumbnail records have been created in the local DB.

When things go wrong

We need to assume that something might go wrong during a synchronisation. Failures range from the likely but temporary (such as the device being unable to contact the server) to the permanent and unexplained (such as a particular file that won’t process).

In either case, the user should be told what has happened, or at least the failure should be logged for later investigation.
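One possible way to treat the two kinds of failure differently is sketched below: temporary failures are retried and logged as warnings, permanent ones are logged as errors and given up on. The exception classes, the return values and the retry limit are all hypothetical choices for this sketch.

```python
import logging

logger = logging.getLogger("sync")


class TransientSyncError(Exception):
    """Temporary failure, e.g. the server cannot be reached."""


class PermanentSyncError(Exception):
    """Failure that retrying will not fix, e.g. a file that won't process."""


def run_with_retry(operation, file_id, max_attempts=3):
    """Run operation(file_id), retrying transient failures and logging each one."""
    for attempt in range(1, max_attempts + 1):
        try:
            operation(file_id)
            return "ok"
        except TransientSyncError as exc:
            logger.warning("transient failure for %s (attempt %d/%d): %s",
                           file_id, attempt, max_attempts, exc)
        except PermanentSyncError as exc:
            logger.error("permanent failure for %s: %s", file_id, exc)
            return "failed"
    return "retry_later"  # still failing after max_attempts; try again next sync
```

The returned status could be surfaced to the user or stored against the file, while the log entries remain available for later investigation.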

Storage

My library has 130,000 photos. Let’s assume a fair-sized library has 300,000 photos and 50,000 videos.

Type                       | Count   | Average size | Space required
Photos - 550px thumbnail   | 300,000 | 20 KB        | 5.9 GB
Photos - 1,800px thumbnail | 300,000 | 75 KB        | 22 GB
Videos - 550px thumbnail   | 50,000  | 20 KB        | 1 GB
Total                      |         |              | ~29 GB
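The arithmetic behind the estimate can be reproduced with a few lines. This sketch assumes decimal units (1 GB = 1,000,000 KB), which gives a total of 29.5 GB; the slightly lower figures in the table are consistent with a different unit convention or rounding.

```python
KB_PER_GB = 1_000_000  # decimal units: 1 GB = 1,000,000 KB

# (type, count, average size in KB), taken from the table
THUMBNAILS = [
    ("Photos - 550px thumbnail", 300_000, 20),
    ("Photos - 1,800px thumbnail", 300_000, 75),
    ("Videos - 550px thumbnail", 50_000, 20),
]


def space_required_gb(count, average_kb):
    """Space for `count` files of `average_kb` kilobytes each, in gigabytes."""
    return count * average_kb / KB_PER_GB


def total_gb(rows):
    """Total space across all rows, in gigabytes."""
    return sum(space_required_gb(count, kb) for _, count, kb in rows)


for name, count, kb in THUMBNAILS:
    print(f"{name}: {space_required_gb(count, kb):.1f} GB")
print(f"Total: {total_gb(THUMBNAILS):.1f} GB")
```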

Notes on hashing

I have faced a number of challenges and changes in selecting a hashing algorithm. I have landed on XXH3, a variant of the xxhash algorithm that is optimised for modern CPUs. It is a fast and well-respected non-cryptographic hashing algorithm.
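Whichever algorithm is used, large videos are best hashed in chunks rather than read into memory whole. The sketch below shows that pattern with a pluggable hasher. To stay runnable with the standard library alone it defaults to `hashlib.blake2b`, which is a stand-in and not XXH3; with the third-party `xxhash` Python package installed, `xxhash.xxh3_64` should be usable as `new_hasher` instead, since it offers the same `update`/`hexdigest` interface. The function name and chunk size are arbitrary choices for illustration.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time; an arbitrary choice for this sketch


def hash_stream(stream, new_hasher=hashlib.blake2b):
    """Hash a binary stream chunk by chunk and return the hex digest.

    new_hasher: zero-argument callable returning an object with
    update(bytes) and hexdigest(); blake2b is only a stand-in for XXH3.
    """
    hasher = new_hasher()
    while chunk := stream.read(CHUNK_SIZE):
        hasher.update(chunk)
    return hasher.hexdigest()


# Usage: any binary file object works, e.g. open(path, "rb")
digest = hash_stream(io.BytesIO(b"example bytes"))
```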