During development, I have realised that object storage like AWS S3 is far cheaper (roughly a twentieth of the cost) than normal file (or block) storage. So for any cloud-hosted scenario, which I think is what most people would actually want, S3 support would be really valuable. The thing is, I may also want to host it on my Raspberry Pi and use my NAS for storage, which is a very different setup.
Let me think for a moment. Let’s say I have a BasicStorageProvider interface. This could be implemented as either a FileStorageProvider or an S3StorageProvider, and it could contain the code to do integrity checks and to interact with either the file system or S3. This raises the question of whether queues should be stored in the same place as the files (S3 or the file system), or whether they should always use local file storage. The problem with the latter is that the storage is then split between two places. This is also relevant for performance. Let’s say I want to store my files on my NAS. If it has high latency, is that going to be a problem for responsiveness?
Keeping all files in one place seems like a good idea, for things like migrating from one storage type to another, and for simplicity of understanding. So that would imply I even keep the queue in S3.
My queue system relies on the presence of folders. Does S3 support a folder structure?
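Not as such: S3 has a flat key space, and "folders" are just key prefixes. Listing with a prefix and a "/" delimiter returns the objects directly under that prefix plus "common prefixes" that behave like sub-folders. The sketch below mimics that on an in-memory list of keys (listFolder and the queues/... key layout are hypothetical), which is roughly what a folder-based queue would have to rely on. Two caveats: an empty folder only exists if a placeholder object is created for it, and S3 has no rename, so "moving" an item from pending/ to done/ is a copy followed by a delete rather than an atomic operation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listFolder mimics what an S3 list call with a Prefix and Delimiter "/" returns:
// objects directly under the prefix, plus "common prefixes" that act like sub-folders.
func listFolder(keys []string, prefix string) (objects, folders []string) {
	seen := map[string]bool{}
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		rest := strings.TrimPrefix(k, prefix)
		if i := strings.Index(rest, "/"); i >= 0 {
			// Anything with a further "/" is rolled up into a sub-folder.
			folder := prefix + rest[:i+1]
			if !seen[folder] {
				seen[folder] = true
				folders = append(folders, folder)
			}
		} else {
			objects = append(objects, k)
		}
	}
	sort.Strings(objects)
	sort.Strings(folders)
	return objects, folders
}

func main() {
	keys := []string{
		"queues/thumbnail/pending/img1.jpg",
		"queues/thumbnail/pending/img2.jpg",
		"queues/thumbnail/done/img0.jpg",
		"queues/index/pending/img1.jpg",
	}
	_, folders := listFolder(keys, "queues/thumbnail/")
	fmt.Println(folders) // [queues/thumbnail/done/ queues/thumbnail/pending/]

	objs, _ := listFolder(keys, "queues/thumbnail/pending/")
	fmt.Println(objs) // [queues/thumbnail/pending/img1.jpg queues/thumbnail/pending/img2.jpg]
}
```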
The server runs all the time, listening for HTTP messages, but also ensuring various tasks are run against files, like thumbnail creation.
Tasks can be initiated, or at least become eligible to run, in these circumstances:
Tasks carry a risk of causing conflicts or locking issues with other tasks. For example, a task that builds an index (e.g. of the dates photos were taken) stored in a single file shared across all images and videos could cause locking issues on that index file if it were triggered multiple times concurrently.
The most restrictive control to prevent this would be to process all tasks sequentially, or at least all incompatible tasks. The problem with this approach is that we may slow things down substantially by not using multiple processor cores.
So what alternatives are there? Another option is to handle tasks that are independent (e.g. creating a thumbnail for a single picture) differently from those that share resources (e.g. updating a shared index). All of these options raise another consideration: what if the server terminates before all requested tasks are complete? How does it recover? This calls for a queue of sorts.
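One way to sketch the "handle independent and shared tasks differently" idea: independent tasks go to a pool of worker goroutines, while every task that touches a shared resource is funnelled through a single goroutine, so those run one at a time without any file locking. Task, Shared and RunTasks are hypothetical names. This does nothing about recovery after a crash; that still needs the persisted queue.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical unit of work. Shared marks tasks that touch a shared
// resource (e.g. a date index) and therefore must not run concurrently.
type Task struct {
	Name   string
	Shared bool
	Run    func()
}

// RunTasks runs independent tasks on `workers` goroutines and sends all
// shared-resource tasks through one goroutine, so they execute one at a time.
func RunTasks(tasks []Task, workers int) {
	// Buffered so dispatch never blocks behind a slow task on either lane.
	independent := make(chan Task, len(tasks))
	serial := make(chan Task, len(tasks))
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range independent {
				t.Run()
			}
		}()
	}

	wg.Add(1)
	go func() {
		defer wg.Done()
		for t := range serial {
			t.Run() // only this goroutine ever runs shared tasks
		}
	}()

	for _, t := range tasks {
		if t.Shared {
			serial <- t
		} else {
			independent <- t
		}
	}
	close(independent)
	close(serial)
	wg.Wait()
}

func main() {
	var mu sync.Mutex
	thumbs := 0
	index := []string{} // shared "index": only touched by the serial lane, so no lock

	var tasks []Task
	for _, img := range []string{"a.jpg", "b.jpg", "c.jpg"} {
		img := img
		tasks = append(tasks,
			Task{Name: "thumb " + img, Run: func() { mu.Lock(); thumbs++; mu.Unlock() }},
			Task{Name: "index " + img, Shared: true, Run: func() { index = append(index, img) }},
		)
	}
	RunTasks(tasks, 4)
	fmt.Println(thumbs, len(index)) // 3 3
}
```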
There may be instances where we want to be able to kick off a task manually via the API.
A queue operation can:
Can both of these be handled through the same interface? Are there so many alternatives that the solution must be generic, or are there a limited number of two or three types?
Operation -> Output -> Post-processing tasks, which may depend on the output.
A queue’s ProcessFile method is not responsible for obtaining the file or dealing with the result. All it does is accept an input io.Reader and write to an output io.Writer. If special post-processing is needed, it would ideally live within the Queue Operation code and be able to accept data passed to it from the ProcessFile method.
Design Principles - Queues
At the moment, I have built it-photos-server to the point where you can upload files, it creates thumbnails asynchronously, it supports S3 or a local folder, and it can perform a 90° clockwise rotation on an image and update the UI with the rotated image. The user experience is poor, though: it takes too long for the user to see the rotated image.
This calls for a rethink of where work is done. At the moment, it-photos-server is very chatty. For the rotation, the following happens:
Some facts on this: a small thumbnail (550px) is ~13-45 KB, so 200k of these would be roughly 2.6-9 GB (about 8 GB at 40 KB each). That’s not ridiculous to have as a local cache. A large thumbnail (1800px) is ~285-590 KB.
To speed this up, we need to change it, so that it only downloads the full image ONCE. That’s the first obvious speed improvement.
The other improvement I’ve been thinking about is storing the thumbnails locally.
I’m dealing with a couple of challenges at the moment:
I think the next priorities are:
Hosting. I want to put ITPS live and start hosting it. I’ve been building against AWS S3, but I think there are far cheaper options. One I want to consider is Backblaze B2, which offers an S3-compatible API.
Another thing I keep wondering about is how it might work if someone wants their data on a NAS or their PC, but doesn’t want the hassle of setting up port forwarding so that the NAS/PC can be accessed externally.
This might mean that the hosted portion is a cache of sorts that has:
It obviously comes with some challenges around using a web UI to access the full versions of the images or videos.
It’s been two months since the last update. I’ve got Qmulum running in a container and have started doing more testing on it on one of my domains, hackersmacker.net, hosted on a DigitalOcean droplet. I’ve also been spending more time on the app.
Key things that I need to work on:
Note: DigitalOcean load balancers cost $12/node/month.
I had been spending an awful lot of time trying to get Docker Compose to work properly as a deployment mechanism for production. I then ended up trying GitHub Actions, which brought its own set of complexity. I really struggled with the lack of visibility into what was happening inside the container, and had a good think about whether I was introducing complexity prematurely.
I decided to go back to basics and deploy with a simple bash script. It didn’t take long, and now I can build and deploy to production with a simple make build-linux && make deploy. I’m happy with the outcome. It did make me ponder simplicity, and wonder whether I should serve static files from within the single executable to simplify deployment further.
I’ve also come up with a name, registered qmulum.com, and updated the program name in the server code. My focus now needs to return to achieving the core function of being able to back up photos and videos from a mobile phone.
On this point, I’ve written sync code, but it’s not reliable. I still feel I don’t have visibility of what’s happening, and I don’t have automated tests to confirm the functionality, either on the server or in the app.
I need to be razor-focussed on this core function and make it bulletproof. What is needed to make it bulletproof?
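One possible direction, offered as a sketch rather than a recommendation: keep the "what still needs uploading?" decision in a small pure function keyed on content hashes, so it can be unit tested on both sides and is safe to rerun after an interrupted sync. The function name, the path-to-hash map and the server's hash set are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// filesToUpload compares the phone's files (path -> content hash) with the set
// of content hashes the server already has, and returns the paths still to upload.
// Rerunning it after an interrupted sync simply picks up whatever is still missing.
func filesToUpload(local map[string]string, serverHashes map[string]bool) []string {
	var todo []string
	for path, hash := range local {
		if !serverHashes[hash] {
			todo = append(todo, path)
		}
	}
	sort.Strings(todo) // deterministic order for tests and logs
	return todo
}

func main() {
	local := map[string]string{
		"DCIM/IMG_0001.jpg": "aaa",
		"DCIM/IMG_0002.jpg": "bbb",
		"DCIM/VID_0003.mp4": "ccc",
	}
	server := map[string]bool{"aaa": true}
	fmt.Println(filesToUpload(local, server)) // [DCIM/IMG_0002.jpg DCIM/VID_0003.mp4]

	// IMG_0002 uploads successfully, then the sync is interrupted:
	server["bbb"] = true
	fmt.Println(filesToUpload(local, server)) // [DCIM/VID_0003.mp4]
}
```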