These are my thoughts

Thoughts on Storage

During development, I have realised that object storage like AWS S3 is far cheaper (roughly a 20th of the cost) than normal file (or block) storage. So for any cloud-hosted scenario, which I think is what most people would actually want, S3 support would be really valuable. The thing is, I may also want to host it on my Raspberry Pi and use the NAS for storage, so that's a very different setup.

Let me think for a moment. Let's say I have a BasicStorageProvider interface. This could be implemented as either a FileStorageProvider or an S3StorageProvider. It could contain the code to do integrity checks and interact with either the file system or S3. This leads to the question of whether queues should be stored via the same storage provider, or always in local file storage. The problem then is that storage is split between the provider and the local file system. This is also relevant for performance. Let's say I want to store my files on my NAS. If it's got high latency, is that going to be a problem for responsiveness?

Keeping all files in one place seems like a good idea, both for migrating from one storage backend to another and for simplicity of understanding. So that would imply I even keep the queue in S3.

My queue system works on presence of folders. Does S3 support a folder structure?
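For reference: S3 doesn't have real folders. The keyspace is flat, and "folders" are just key prefixes that tools display using the "/" delimiter. So a queue that relies on folder presence could emulate it either with zero-byte marker objects (a key ending in "/") or by treating "at least one object exists under this prefix" as "folder exists". A sketch of the prefix check, with an in-memory key list standing in for a bucket listing:

```go
package main

import (
	"fmt"
	"strings"
)

// folderExists treats a "folder" as any key sharing the given prefix,
// which is how the S3 console itself presents folders.
func folderExists(keys []string, folder string) bool {
	prefix := strings.TrimSuffix(folder, "/") + "/"
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{
		"queue/pending/job-001.json",
		"queue/pending/job-002.json",
		"photos/2024/img.jpg",
	}
	fmt.Println(folderExists(keys, "queue/pending")) // true
	fmt.Println(folderExists(keys, "queue/done"))    // false
}
```

With the real S3 API the same check would be a ListObjectsV2 call with the Prefix set and MaxKeys=1.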

Thoughts on Queues

The server runs all the time, listening for HTTP messages, but also ensuring various tasks are run against files, like thumbnail creation.

Tasks can be initiated, or at least become eligible to run, in these circumstances:

  1. A file is uploaded, changed or deleted.
  2. A new task is introduced into the code, and needs to be run for all existing files.

Tasks carry a risk of causing conflict/locking issues with other tasks. For example, a task that builds an index (e.g. of the dates photos were taken), where the index is a single file shared across images/videos, could cause locking issues on the index file if triggered multiple times concurrently.

The most restrictive control to prevent this would be to process all tasks sequentially, or at least all incompatible tasks. The problem with this approach is we may end up slowing things down substantially by not using multiple processor cores.

So what alternatives are there? Another option is to handle tasks that are independent (e.g. creating a thumbnail for a single picture) differently from those that do share resources (e.g. updating a shared index). All options do give rise to another consideration - what if the server terminates before all requested tasks are complete? How does it recover? This calls for a queue of sorts.

There may be instances where we want to be able to kick off a task manually via the API.

A queue operation can:

  1. Transform a file (i.e. alter the file) without changing its metadata in the database. e.g. rotate an image.
  2. Create a new version of a file (e.g. generate a thumbnail for a larger image), with updates needing to be made to the database, i.e. adding the thumbnail record.

Can both of these be handled through the same interface? Are there so many alternatives that the solution must be generic, or are there a limited number of two or three types?

Operation->Output->Post processing tasks which may depend on the output.

A queue’s ProcessFile method is not responsible for obtaining the file, or dealing with the result. All it does is accept an input io.Reader and write to an output io.Writer. If special post processing is needed, this would ideally live within the Queue Operation code, and possibly be able to accept data passed to it from the ProcessFile method, if required.

Design Principles - Queues

Checkpoint 15 June 2024

At the moment, I have built it-photos-server to a point where you can upload files, it creates thumbnails asynchronously, supports S3 or a local folder, can perform a 90° CW rotation on an image, and updates the UI with the rotated image. The user experience is poor: the time it takes for the user to see the rotated image is too long.

This calls for a rethink about where work is done. At the moment, it-photos-server is very chatty. For the rotation, the following happens:

  1. User initiates rotation.
  2. Server downloads full image from S3 (size = 6 MB)
  3. Server performs rotation, preserving EXIF data
  4. Server uploads rotated file.
  5. Server initiates thumbnail creation (small and large)
  6. Server downloads full image from S3 for a second time (size = 6 MB)
  7. Server performs resize of full image to small thumbnail
  8. Server uploads small thumbnail to S3
  9. Server downloads full image from S3 for a THIRD time (size = 6 MB) for the large thumbnail operation
  10. Server performs resize of full image to large thumbnail
  11. Server uploads large thumbnail
  12. Only then does the user see the rotated image. Steps 1 to 12 take 20 seconds. The user may expect this to take under 5 seconds, but more likely 1-2 seconds.

Some facts on this: a small thumbnail (550px) is ~13-45 KB, so 200k of these would be ~8 GB. That's not ridiculous to have as a local cache. A large thumbnail (1800px) is ~285-590 KB.

To speed this up, we need to change it so that it only downloads the full image ONCE. That's the first obvious speed improvement.
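A sketch of the single-download pipeline: fetch the full image once, then derive the rotation and both thumbnails from the same in-memory bytes. (download and transform are stubs standing in for the S3 and image code.)

```go
package main

import "fmt"

var downloads = 0

// download stands in for the S3 GetObject call; the counter makes the
// number of round trips visible.
func download(key string) []byte {
	downloads++
	return []byte("full-size image bytes")
}

// transform stands in for rotation/resizing.
func transform(name string, src []byte) []byte {
	return append([]byte(name+":"), src...)
}

func main() {
	full := download("photos/img.jpg") // the ONLY download

	rotated := transform("rotate90", full)   // upload rotated file here
	small := transform("thumb550", rotated)  // upload small thumbnail here
	large := transform("thumb1800", rotated) // upload large thumbnail here

	_, _ = small, large
	fmt.Println("downloads:", downloads) // prints: downloads: 1
}
```

Note the thumbnails are derived from the already-rotated bytes, which removes the second and third downloads in one go; uploads can also happen concurrently since they are independent.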

The other one I’ve been thinking about, is to store the thumbnails locally.

Checkpoint 14 July

I’m dealing with a couple of challenges at the moment:

  1. Soft and hard deletion. I'm thinking of implementing this as a queue operation, since it may take some time, especially if thousands of files need to be deleted. In some ways I'm concerned that by making this a standard file operation, it could be 'requeued' and delete all files. That would be catastrophic. Is it best to make only some queue operations "Requeueable"? So maybe we would have two queue operations: a soft delete (priority 1), which just stamps the file with a soft-deletion time, and a hard delete.
  2. Migrating up a user database from one version to another:
    • When to do it? If there were a single DB it would be easier to do it as a one-off activity. But with lots, I think it may just need to happen automatically, and possibly every time the database is accessed. Is this too much overhead? What if it fails? Log it. What alternatives are there?
    • a) A single database. When you upgrade the software, you upgrade the single database. Pros: simple. Cons: a separate SQLite database per user keeps things simple for self-hosters, which is a big plus, and moving to Postgres would create another dependency.
    • b) When code is upgraded, have a queue process to upgrade all user DBs, and also do an upgrade check at key, high-frequency points, e.g. viewing photos. Pros: allows separate per-user SQLite DBs. Cons: slight performance hit, and creates a need to check the version on almost every query.
  3. Knowing that storing sessions in memory could be an issue if there are multiple users. Currently there aren’t. It’s a future problem to think about.
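On the "Requeueable" idea from point 1, a sketch (interface and names are illustrative): operations declare whether a bulk requeue over the whole library is safe, and the requeue path skips anything destructive.

```go
package main

import "fmt"

// Operation is the queue-operation interface; Requeueable reports
// whether re-running this operation across all files is safe.
type Operation interface {
	Name() string
	Requeueable() bool
}

type SoftDelete struct{}

func (SoftDelete) Name() string      { return "soft-delete" }
func (SoftDelete) Requeueable() bool { return false } // destructive: never bulk requeue

type Thumbnail struct{}

func (Thumbnail) Name() string      { return "thumbnail" }
func (Thumbnail) Requeueable() bool { return true } // idempotent: safe to re-run

// requeueAll is the bulk "run for all existing files" path; it
// refuses anything not marked requeueable.
func requeueAll(ops []Operation) []string {
	var queued []string
	for _, op := range ops {
		if op.Requeueable() {
			queued = append(queued, op.Name())
		}
	}
	return queued
}

func main() {
	fmt.Println(requeueAll([]Operation{SoftDelete{}, Thumbnail{}})) // prints: [thumbnail]
}
```

This way the catastrophic case (a delete operation swept up in a requeue-everything pass) is impossible by construction rather than by convention.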
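And a sketch of option (b) from point 2: each per-user DB records a schema version, and a cheap check at access time applies only the migrations it is missing. (With SQLite this would typically use PRAGMA user_version; a struct stands in for the DB here so the sketch is self-contained.)

```go
package main

import "fmt"

// UserDB stands in for one per-user SQLite database.
type UserDB struct {
	Version int      // would be PRAGMA user_version in SQLite
	Log     []string // record of applied migrations, for observability
}

// migrations[i] upgrades a DB from version i to i+1; order matters.
var migrations = []func(*UserDB){
	func(db *UserDB) { db.Log = append(db.Log, "create photos table") },
	func(db *UserDB) { db.Log = append(db.Log, "add soft_deleted_at column") },
}

// EnsureCurrent is called at access points; when the DB is already up
// to date it costs only an integer comparison.
func EnsureCurrent(db *UserDB) {
	for db.Version < len(migrations) {
		migrations[db.Version](db)
		db.Version++
	}
}

func main() {
	db := &UserDB{}
	EnsureCurrent(db) // first access after upgrade: runs both migrations
	EnsureCurrent(db) // subsequent accesses: no-op
	fmt.Println(db.Version, len(db.Log)) // prints: 2 2
}
```

The per-query overhead is then just the version compare, which addresses the "check on almost every query" concern; failed migrations can be logged and retried on the next access.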

I think next priorities are:

  1. Create a registration page, endpoints and single SQLite user DB per instance. Other infrastructure needed: forgot password flow. This creates a view of all users, and makes the instance multi-user.

Checkpoint 25 July.

Hosting. I want to put ITPS live and start hosting it. I’ve been building against AWS S3, but I think there are far better priced options. One that I want to consider is Backblaze B2.

Another thing I keep wondering about is how it might work if someone wants their data on a NAS, or their PC, but doesn’t want the hassle of having to set up port forwarding so their NAS/PC can be accessed externally.

This might mean that the hosted portion is a cache of sorts that has:

  1. Thumbnails
  2. Temporary storage of uploads until the NAS replicates/pulls them.
  3. Potentially a small version of the video

It obviously comes with some challenges around using a web UI to access the full versions of the images or videos.

Checkpoint 22 September.

Two months since last update. I’ve got Qmulum in a container and have started doing more testing on it on one of my domains hackersmacker.net on a DigitalOcean droplet. I’ve also been spending more time on the App.

Key things that I need to work on:

  1. Issue causing failure on upload of a ~160 MB video. No visibility of the cause. More observability?
  2. Enhance browsing photos and videos on mobile device, including opening in larger view.
  3. Perform basic operation (rotate image) on mobile device.

Note: DigitalOcean Load balancers cost $12/node/month

Checkpoint 3 December 2024

I had been spending an awful lot of time trying to get Docker Compose to work properly as a deployment mechanism to production. I then ended up having to use GitHub Actions, which brought its own set of complexity. I really struggled with the lack of visibility of what was happening in the container, and had a good think about whether I'm introducing complexity prematurely.

I decided to go back to basics and deploy with a simple bash script. And it didn’t take too long and now I can build and deploy to production with a simple make build-linux && make deploy. I’m happy with the outcome. It did make me ponder simplicity and wonder if I should look to serving static files from within the single executable to further simplify deployment.

I’ve also come up with a name and registered qmulum.com and updated the program name in the server code. My focus now needs to return to achieving the core function of being able to back up photos and videos from a mobile phone.

On this point, I’ve written sync code and it’s not reliable. I still feel I do not have the visibility of what’s happening, and I don’t have automated tests to confirm the functionality - neither on the server, nor on the app.

I need to be razor-focussed on this core function, making it bulletproof. What is needed to make it bulletproof?

  1. Clear specifications of the steps, their pre and post conditions.
  2. Automated tests of some sort, proving these steps work as expected.
  3. Strong logging and good observability of the software, but also of the history of what’s happened, including where a file originated from.
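A sketch of how points 1-3 could fit together: each sync step carries explicit pre- and post-conditions, and a runner fails loudly when either doesn't hold while logging every transition so the history stays reconstructable. (Step names and conditions here are illustrative only.)

```go
package main

import "fmt"

// Step is one unit of the sync flow with checkable pre/post conditions.
type Step struct {
	Name string
	Pre  func() error // must hold before Run
	Run  func() error
	Post func() error // must hold after Run
}

// RunSteps executes steps in order, logging transitions and stopping
// at the first failed condition or step.
func RunSteps(steps []Step, log func(string)) error {
	for _, s := range steps {
		if err := s.Pre(); err != nil {
			return fmt.Errorf("%s precondition: %w", s.Name, err)
		}
		log("start " + s.Name)
		if err := s.Run(); err != nil {
			return fmt.Errorf("%s: %w", s.Name, err)
		}
		if err := s.Post(); err != nil {
			return fmt.Errorf("%s postcondition: %w", s.Name, err)
		}
		log("done " + s.Name)
	}
	return nil
}

func main() {
	uploaded := false
	ok := func() error { return nil }
	steps := []Step{
		{Name: "upload", Pre: ok,
			Run: func() error { uploaded = true; return nil },
			Post: func() error {
				if !uploaded {
					return fmt.Errorf("file missing after upload")
				}
				return nil
			}},
	}
	var history []string
	err := RunSteps(steps, func(m string) { history = append(history, m) })
	fmt.Println(err, history) // prints: <nil> [start upload done upload]
}
```

The same Step definitions then double as the specification (point 1) and as the skeleton for automated tests (point 2), since each condition is a plain function that can be asserted in isolation.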