Using Lambda to make immutable S3 backups

2 Jan 2017 17:55 | AWS | security | linux

S3 is really handy for server backups and at $0.023/GB/month it’s incredibly cost-effective.

However, most people simply spray their data directly into an S3 bucket from the machine they’re backing up. This works fine right up until someone malicious breaks into that machine and uses its access to the bucket to trash all of your backups.

Enter Lambda, Amazon’s magic function-in-the-sky service that allows you to do serverless computation.

This post describes how to secure your backups using a lambda function.

Scenario: a server that creates overnight tarball backups of around 5GB and hourly SQL snapshots of around 250MB.

We will create two S3 buckets - backups1 and backups2.

The server will have write access to backups1 but no access to backups2.
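As a rough illustration of what "write access to backups1 but no access to backups2" could look like, here is a minimal sketch that builds an IAM policy document for the server's credentials. The function name is made up for this example, and the single s3:PutObject action is an assumption: depending on how your upload tool behaves it may need a few more actions on backups1.

```python
import json


def write_only_policy(bucket):
    """Build an IAM policy document that only permits uploads to `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Upload only: no s3:GetObject, s3:DeleteObject or s3:ListBucket.
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


# backups2 is never mentioned, so the server gets no access to it at all.
print(json.dumps(write_only_policy("backups1"), indent=2))
```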

The process the backups will follow is:

  1. The server will execute its backup and write a file to backups1 called backup.tar.gz.gpg. This might be done with a cron job along the lines of:

/bin/tar -cP /data | /bin/gzip | /usr/bin/gpg --no-use-agent --no-tty --passphrase-file /root/key --cipher-algo AES256 -c | /usr/local/bin/s3cmd put - s3://backups1/backup.tar.gz.gpg

  2. Any write to the backups1 bucket will trigger a lambda function.

  3. The lambda function checks the name of the uploaded object. If it’s backup.tar.gz.gpg it will check for a file in the backups2 bucket called {YYYY-mm-dd}.tar.gz.gpg. If the file doesn’t exist then it will move backup.tar.gz.gpg from backups1 to backups2 using the timestamped filename. If it already exists it will do nothing - this prevents backups from being overwritten once created.

  4. The lambda function also handles hourly SQL snapshots - if the uploaded file is called sql.gz.gpg it will look for an object called {YYYY-mm-dd}.{HH}.sql.gz.gpg. Again, if the file doesn’t exist it will move the uploaded file to backups2 using the timestamped name.
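The steps above could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the BACKUP_TZ environment variable, the PATTERNS table and the helper names are all invented for the example. It uses boto3's managed copy rather than copy_object, because a single copy_object call is limited to 5 GB and the nightly tarball is close to that.

```python
import os
from datetime import datetime
from urllib.parse import unquote_plus
from zoneinfo import ZoneInfo

DEST_BUCKET = "backups2"

# Upload name -> strftime pattern for the timestamped name in backups2.
PATTERNS = {
    "backup.tar.gz.gpg": "%Y-%m-%d.tar.gz.gpg",
    "sql.gz.gpg": "%Y-%m-%d.%H.sql.gz.gpg",
}


def exists(s3, bucket, key):
    """Return True if exactly this key is already present in the bucket."""
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in listing.get("Contents", []))


def handle_upload(s3, src_bucket, key, now):
    """Move a recognised upload to its timestamped name; never overwrite."""
    pattern = PATTERNS.get(key)
    if pattern is None:
        return None  # not a file name we manage
    dest_key = now.strftime(pattern)
    if exists(s3, DEST_BUCKET, dest_key):
        return None  # a backup for this period already exists: do nothing
    # Managed copy switches to multipart for large objects.
    s3.copy({"Bucket": src_bucket, "Key": key}, DEST_BUCKET, dest_key)
    s3.delete_object(Bucket=src_bucket, Key=key)
    return dest_key


def lambda_handler(event, context):
    import boto3  # provided by the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    # BACKUP_TZ is a hypothetical setting holding the server's time zone name.
    now = datetime.now(ZoneInfo(os.environ.get("BACKUP_TZ", "UTC")))
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        handle_upload(s3, bucket, key, now)
```

For this to work, the function's role would need to read and delete objects in backups1 and to list and write objects in backups2. Keeping the S3 client as a parameter of handle_upload makes the logic easy to exercise with a stub before deploying it.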

Because the filenames are determined by the lambda function, which cannot be changed by the server, and the server has no access to backups2, an attacker breaking into the server has no way to destroy any previously created backups. This is a lot more secure than simply writing the data straight into S3 from a server that has full access to the target bucket.

Note that because the backup archives are named by timestamp, you must set the timezone in the lambda function to the timezone of your server. Otherwise a backup taken near midnight can be stamped with the wrong date.
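To see why the timezone matters, here is a small sketch that formats the same instant as a backup name in two zones. The function name and the UTC+10 server are made up for the example, and a fixed offset is used only to keep it dependency-free. A named zone from zoneinfo would also follow daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def timestamped_key(instant, tz, pattern="%Y-%m-%d.tar.gz.gpg"):
    """Format an aware datetime as a backup name in the given time zone."""
    return instant.astimezone(tz).strftime(pattern)


# A backup that finishes at 23:30 UTC on 2 Jan 2017.
finished = datetime(2017, 1, 2, 23, 30, tzinfo=timezone.utc)

# Hypothetical server whose local clock runs at UTC+10.
server_tz = timezone(timedelta(hours=10))

print(timestamped_key(finished, timezone.utc))  # 2017-01-02.tar.gz.gpg
print(timestamped_key(finished, server_tz))     # 2017-01-03.tar.gz.gpg
```

The two names differ by a day, so a function running in UTC would file this backup under a different date than the server expects.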

You will probably also want to create a lifecycle policy for your backups2 bucket to delete the backups after a certain time period or archive them to Glacier for long-term storage.
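A lifecycle policy like that could be applied with boto3 along these lines. This is only a sketch: the rule ID, the 30 and 365 day figures and the helper names are placeholders, so pick retention periods that suit your own backups.

```python
def lifecycle_rules(archive_after_days=30, delete_after_days=365):
    """Build a lifecycle configuration: archive to Glacier, then expire."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-backups",  # placeholder rule name
                "Filter": {"Prefix": ""},  # empty prefix: every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


def apply_lifecycle(bucket="backups2"):
    """Send the rules to S3; run this with admin credentials, not from the server."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_rules()
    )
```

If you only want deletion, drop the Transitions entry and keep Expiration.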