Qmulum Multipart Upload

Introduction

Qmulum is designed for photo and video backup, so it must handle large files, sometimes tens of gigabytes in size. Uploading a file of this size in a single HTTP request is prone to problems such as timeouts, maximum request size limits, and the inability to resume a partially uploaded file.

How Qmulum Multipart upload works

Multipart upload sends a file as a series of individual parts. The client decides how big each part is, but let’s assume 10 MB per part. The server checks the integrity of each part individually as it is uploaded. When the final part has been uploaded, the server kicks off an asynchronous queued operation that joins the parts together and then moves the file to its final storage location, e.g. S3.
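
As a rough illustration of the splitting step described above, the following Python sketch reads a file in fixed-size parts and numbers them from 1. The names PART_SIZE, count_parts, and iter_parts are hypothetical and not part of Qmulum, and the 10 MB figure is only the working assumption from the text, taken here as 10,000,000 bytes.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# Assumption: 10 MB per part, as suggested in the text. The client may choose another size.
PART_SIZE = 10 * 1000 * 1000


def count_parts(file_size: int, part_size: int = PART_SIZE) -> int:
    """Number of parts needed to cover file_size bytes (0 for an empty file)."""
    return (file_size + part_size - 1) // part_size


def iter_parts(stream: BinaryIO, part_size: int = PART_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_no, chunk) pairs in upload order, numbering parts from 1."""
    part_no = 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break  # end of file reached
        yield part_no, chunk
        part_no += 1


# Example: a 25-byte payload split into 10-byte parts gives parts of 10, 10, and 5 bytes.
demo_parts = list(iter_parts(io.BytesIO(b"x" * 25), part_size=10))
```

Reading part by part keeps memory use bounded by the part size, which matters when the file itself is many gigabytes.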

So there are two stages:

  1. Upload the parts to the ‘staging’ area.
  2. Combine the parts, move the resulting file to the normal storage location, and clear the parts out of the ‘staging’ area.

The client app may want to poll for the status of a multipart upload, because joining the parts and moving the file to its final location can take some time, depending on the size of the file. Once the final part has been received successfully, however, it is generally safe to assume that all parts have been received and that each has passed its own integrity check. For that reason, a separate integrity check of the entire file should not be needed.
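
A minimal sketch of such polling on the client side, in Python. It assumes the status values listed under "Multipart upload status" and treats COMPLETE and FAILED as the two end states. The function wait_for_upload and its parameters are hypothetical, and get_status stands for whatever call the client uses to fetch the current status; this document does not define that call.

```python
import time
from typing import Callable

# Assumption: these are the statuses after which polling can stop.
TERMINAL_STATUSES = {"COMPLETE", "FAILED"}


def wait_for_upload(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns COMPLETE or FAILED, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"upload still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Example with a scripted status source instead of a real HTTP call.
scripted = iter(["IN PROGRESS", "PROCESSING", "PROCESSING", "COMPLETE"])
final_status = wait_for_upload(lambda: next(scripted), sleep=lambda _seconds: None)
```

Passing the sleep function as a parameter is a design choice that makes the loop easy to exercise without real delays; a production client would probably also want a backoff between polls.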

Multipart upload status

Status        Meaning
IN PROGRESS   The multipart upload has commenced, but not all parts have been received yet.
PROCESSING    The uploaded parts are being joined together into one file and integrity checked.
COMPLETE      File processing is complete.
FAILED        There was a failure processing the multipart upload.

    request
    ---
    POST /files/{file_id}/multipart

    body:
    {
        "upload_id": "<uuid>",
        "filename": "",
        "part_no": 1,
        "total_parts": 10,
        "part_bytes": 122,   <-- number of bytes in this part
        "base64_bytes": "",  <-- Base64-encoded bytes of this part
        "file_xxhash": "",   <-- xxHash of the whole file
        "part_xxhash": ""    <-- xxHash of this part (before Base64 encoding)
    }

    server action:
    ---
    Non-final part:
    - Store the part on server disk (multipart upload storage provider, typically local disk).
    - Add a record to the database table.

    response
    ---
    HTTP STATUS CODE: 200 OK, 400 BAD REQUEST, or 500 INTERNAL SERVER ERROR
    body (example for an error response):
    {
        "code": 400,
        "message": "Out of sequence"
    }

    If part_xxhash does not match the xxHash of the decoded bytes:
    Response: 400
    {
        "code": 400,
        "message": "xxhash does not match"
    }

    If a part is received out of sequence:
    Response: 400
    {
        "code": 400,
        "message": "Out of sequence"
    }
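
The request body and the two 400 responses above can be sketched in Python as follows. This is an illustration under stated assumptions, not Qmulum's implementation: build_part, check_part, and stand_in_hash are hypothetical names, "out of sequence" is assumed to mean that a part's number is not exactly one greater than the last stored part, and the order of the two checks simply follows the order in which they are listed above. The document does not say which xxHash variant is used, so the hash function is passed in as a parameter. The stand-in used here is NOT xxHash; it only lets the sketch run with the standard library. With the third-party xxhash package one could pass lambda b: xxhash.xxh64(b).hexdigest() instead.

```python
import base64
import hashlib
import uuid
from typing import Callable, Optional, Tuple

HashFn = Callable[[bytes], str]


def stand_in_hash(data: bytes) -> str:
    """NOT xxHash: a standard-library stand-in so the sketch runs as written."""
    return hashlib.blake2b(data, digest_size=8).hexdigest()


def build_part(upload_id: str, filename: str, part_no: int, total_parts: int,
               chunk: bytes, file_hash: str, hash_fn: HashFn) -> dict:
    """Client side: build the request body for one part."""
    return {
        "upload_id": upload_id,
        "filename": filename,
        "part_no": part_no,
        "total_parts": total_parts,
        "part_bytes": len(chunk),
        "base64_bytes": base64.b64encode(chunk).decode("ascii"),
        "file_xxhash": file_hash,
        "part_xxhash": hash_fn(chunk),  # hash of the raw bytes, before Base64 encoding
    }


def check_part(body: dict, last_part_no: int, hash_fn: HashFn) -> Tuple[int, Optional[str]]:
    """Server side: return (status_code, error_message) for an incoming part."""
    raw = base64.b64decode(body["base64_bytes"])
    if hash_fn(raw) != body["part_xxhash"]:
        return 400, "xxhash does not match"
    # Assumption: parts must arrive in ascending order with no gaps.
    if body["part_no"] != last_part_no + 1:
        return 400, "Out of sequence"
    return 200, None


# Example: a 36-byte "file" sent as three 12-byte parts; only the first part is built here.
whole_file = b"hello qmulum, this is the whole file"
chunk = whole_file[:12]
body = build_part(str(uuid.uuid4()), "clip.mp4", 1, 3, chunk,
                  stand_in_hash(whole_file), stand_in_hash)
```

Calling check_part(body, 0, stand_in_hash) accepts this first part, while passing last_part_no=1 for the same body yields the "Out of sequence" error.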


    GET /files/{file_id}/{upload_id}