S3 is really handy for server backups, and at $0.023/GB/month it's incredibly
cost-effective.
However, the default way most people use it is to simply spray their data
directly into an S3 bucket from the machine they're backing up. This works fine
right up until that machine is compromised: an attacker who gets in can use the
machine's credentials to trash every backup in the bucket.
Enter Lambda, Amazon's magic function-in-the-sky service for serverless
computation.
This post describes how to secure your backups using a Lambda function.
Scenario: a server that creates overnight tarball backups of around 5GB and
hourly SQL snapshots of around 250MB.
We will create two S3 buckets - backups1 and backups2.
The server will have write access to backups1 but no access to backups2.
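As an illustration, the IAM policy attached to the server's credentials might look something like this (a sketch - the exact policy is an assumption; the key point is that it grants s3:PutObject on backups1 and nothing at all on backups2, which is therefore denied by default):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::backups1/*"
    }
  ]
}
```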
The process the backups will follow is:
1. The server will execute its backup and write a file to backups1 called
backup.tar.gz.gpg. This might be done with a cron job along the lines of:
/bin/tar -cP /data | /bin/gzip \
  | /usr/bin/gpg --no-use-agent --no-tty --passphrase-file /root/key \
      --cipher-algo AES256 -c \
  | /usr/local/bin/s3cmd put - s3://backups1/backup.tar.gz.gpg
2. Any writes to the backups1 bucket will trigger this Lambda function:
https://m4.rkw.io/lambda.py
3. The Lambda function checks the name of the uploaded object. If it's
backup.tar.gz.gpg it will check for a file in the backups2 bucket called
{YYYY-mm-dd}.tar.gz.gpg. If that file doesn't exist, it will move
backup.tar.gz.gpg from backups1 to backups2 under the timestamped filename.
If it already exists it will do nothing - this prevents backups from being
overwritten once created.
4. The Lambda function also handles the hourly SQL snapshots - if the uploaded
file is called sql.gz.gpg it will look for an object called
{YYYY-mm-dd}.{HH}.sql.gz.gpg. Again, if that file doesn't exist it will move
the uploaded file to backups2 under the timestamped name.
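The real function is linked above; as a rough sketch of the logic it implements (the bucket names and filename scheme come from this post, the rest is an assumption):

```python
# Sketch of the backup-moving Lambda handler. Not the actual lambda.py --
# see the link above for the real thing.
import datetime

DEST_BUCKET = "backups2"

def dest_key_for(src_key, now=None):
    """Map an uploaded object name to its timestamped destination key,
    or None for objects this function doesn't manage."""
    now = now or datetime.datetime.now()
    if src_key == "backup.tar.gz.gpg":
        return now.strftime("%Y-%m-%d") + ".tar.gz.gpg"
    if src_key == "sql.gz.gpg":
        return now.strftime("%Y-%m-%d.%H") + ".sql.gz.gpg"
    return None

def handler(event, context):
    import boto3  # imported lazily so the module loads without boto3
    import botocore.exceptions
    s3 = boto3.client("s3")
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        src_key = record["s3"]["object"]["key"]
        dest_key = dest_key_for(src_key)
        if dest_key is None:
            continue  # not a file we manage
        try:
            # If a backup for this period already exists, do nothing --
            # this is what stops old backups being overwritten.
            s3.head_object(Bucket=DEST_BUCKET, Key=dest_key)
            continue
        except botocore.exceptions.ClientError:
            pass  # not there yet, so move the new upload across
        s3.copy_object(Bucket=DEST_BUCKET, Key=dest_key,
                       CopySource={"Bucket": src_bucket, "Key": src_key})
        s3.delete_object(Bucket=src_bucket, Key=src_key)
```

Note that S3 has no atomic move, so the "move" is a copy into backups2 followed by a delete from backups1.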
Because the destination filenames are determined by the Lambda function, which
the server cannot modify, an attacker breaking into the server has no way to
destroy any previously created backups. This is a lot more secure than simply
writing the data straight into S3 from a server that has full access to the
target bucket.
Note that because the backup archives are named by timestamp, you must set the
timezone in the Lambda function to match your server's timezone; otherwise the
date in a backup's filename may not match the day it was actually taken.
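A minimal way to pin the timezone inside a Python Lambda function (the zone name here is a placeholder - use your server's):

```python
# Set TZ before any timestamps are generated; tzset() is Unix-only,
# which is fine since Lambda runs on Linux.
import os
import time

os.environ["TZ"] = "Europe/London"  # assumption: replace with your server's zone
time.tzset()  # makes time.localtime()/strftime() honour the new TZ
```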
You will probably also want to create a lifecycle policy on the backups2
bucket to delete the backups after a certain time period, or to archive them
to Glacier for long-term storage.
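For illustration, such a lifecycle configuration could be applied with boto3 along these lines (the rule ID and day counts are assumptions - tune them to taste):

```python
# Example lifecycle rule: move objects to Glacier after 30 days and
# delete them after a year (both numbers are assumptions).
LIFECYCLE = {
    "Rules": [{
        "ID": "rotate-backups",
        "Filter": {"Prefix": ""},  # apply to every object in the bucket
        "Status": "Enabled",
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": 365},
    }]
}

def apply_lifecycle(bucket="backups2"):
    import boto3  # imported lazily so the module loads without boto3
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=LIFECYCLE)
```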