Using Amazon Glacier with the AWS CLI for Media Archival
If you're like me, you might have many gigabytes of media files that you've accumulated over the
years: video, audio, high-resolution photos and the like. In the past, I've used everything from
stacks of floppy disks to Iomega Ditto (tape) and Zip drives to CDs and DVDs, and I still have a few
large-capacity hard drives about.
Each of these solutions worked OK at the time, but today, with cheap cloud storage running
rampant across the digisphere, there are new alternatives to these old methods, such as Amazon
Glacier.
What is Glacier?
Amazon Glacier is a storage service that works with Amazon S3 to archive files at very low cost:
currently, a penny ($0.01) per gigabyte per month. Yes, that means that you can currently store
100 gigabytes of data for one dollar a month. Since it's stored in the cloud in Amazon's secure
facilities, you don't need to worry about fire destroying your data, or losing the media on which
it's stored, or bit rot, or many other concerns that you must deal with if you're storing media
yourself.
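To put that price in perspective, here is a minimal Python sketch of the arithmetic. It assumes the $0.01 per gigabyte-month figure quoted above (prices change, so check the current pricing page), and it covers storage only, not transfer or retrieval fees. The function name is my own, chosen for illustration.

```python
from decimal import Decimal

# Assumption: the price quoted in this post; not necessarily today's price.
PRICE_PER_GB_MONTH = Decimal("0.01")


def monthly_storage_cost(gigabytes):
    """Return the storage-only monthly cost in dollars for the given size."""
    return Decimal(gigabytes) * PRICE_PER_GB_MONTH


# Print the estimate for a few archive sizes.
for size_gb in (100, 500, 2000):
    print(f"{size_gb:>5} GB -> ${monthly_storage_cost(size_gb):.2f} per month")
```

Using Decimal rather than floating point keeps the cents exact, which matters little here but is a good habit for money.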
The catch is this: there is an additional transfer cost if you want to transfer large amounts of
data out of Glacier, and moving data from Glacier to S3 takes some time (3-5 hours, according to the
FAQ). However, an archive is data that you don't expect to access very often (only if your trusty
local backup dies, for example). This is exactly what Glacier was designed for.
Meet the AWS CLI
I've tried, but I just can't kick the command-line habit. For me, working on the command line is
natural, easy and powerful. Unlike capricious Graphical User Interface (GUI) environments, the AWS
CLI works in much the same way on every platform that it supports, and it's a great way to transfer
files to and from Amazon Glacier. If you intend to follow my example and use the AWS CLI, you should
first download and install it on your system. Installation instructions are in the AWS CLI User
Guide.
Now that we have that out of the way, I'll assume that everything went well and that you now have
the AWS CLI installed...
Setting up a Glacier Archival Bucket in S3
The easiest way to archive items to Glacier is through S3. Basically, you set up an S3 bucket that
has a lifecycle rule that moves your files to Glacier after a predetermined time (such as one day).
Create an archival bucket
To begin, create a bucket that you'll use for your data. You can have many such buckets, if you
like. For this example, I'll store some Free Lossless Audio Codec (FLAC) files, which can take up a
bit of space... So, I'll create a bucket with the AWS CLI to store them in:
aws s3 mb s3://my-flac-audio
Set a bucket lifecycle rule
Next, I'm going to set a lifecycle rule for the bucket that will transfer items to Glacier after a
day. You could do this with the AWS Management Console if you want, but I'm going to use the
command line. Here's the bucket lifecycle rule, in JavaScript Object Notation (JSON):
{
    "Rules": [
        {
            "ID": "Rule for the Entire Bucket",
            "Status": "Enabled",
            "Prefix": "",
            "Transition": {
                "Days": 1,
                "StorageClass": "GLACIER"
            }
        }
    ]
}
Rather than trying to enter this information on the command line, it's a good idea to save it in a
file, which you can also validate for correctness before passing it to an AWS CLI command. So I
saved the above JSON block in a file called glacier-rule.json.
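As one way to do that validation, here is a small Python sketch that parses the rule text and runs a few sanity checks. The checks and the function name are my own assumptions about what is worth catching early; they are not the validation that S3 itself performs. The rule is embedded as a string so the example is self-contained; in practice you would read glacier-rule.json from disk.

```python
import json

# Sample rule text; an empty "Prefix" is assumed here to mean "every object".
RULE_TEXT = """
{
    "Rules": [
        {
            "ID": "Rule for the Entire Bucket",
            "Status": "Enabled",
            "Prefix": "",
            "Transition": {"Days": 1, "StorageClass": "GLACIER"}
        }
    ]
}
"""


def lifecycle_problems(text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]

    rules = config.get("Rules") if isinstance(config, dict) else None
    if not isinstance(rules, list) or not rules:
        return ['top-level "Rules" must be a non-empty list']

    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {index}: must be a JSON object")
            continue
        if rule.get("Status") not in ("Enabled", "Disabled"):
            problems.append(f'rule {index}: "Status" must be Enabled or Disabled')
        transition = rule.get("Transition")
        if not isinstance(transition, dict):
            problems.append(f'rule {index}: "Transition" must be a JSON object')
            continue
        days = transition.get("Days")
        if isinstance(days, bool) or not isinstance(days, int) or days < 0:
            problems.append(f'rule {index}: "Days" must be a non-negative integer')
    return problems


print(lifecycle_problems(RULE_TEXT) or "rule looks OK")
```

If all you want is a syntax check, running python -m json.tool glacier-rule.json will also report malformed JSON.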
Next, we'll apply the rule to the bucket we just created using the following command:
aws s3api put-bucket-lifecycle --bucket my-flac-audio --lifecycle-configuration file://glacier-rule.json
Now, the bucket my-flac-audio will automatically transfer any new files placed in it to Glacier
after one day.
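"After one day" is slightly looser than it sounds. The sketch below estimates when an object first becomes eligible for the transition, assuming S3's documented behaviour of adding the rule's Days to the object's creation time and rounding up to the next midnight UTC. The function name is hypothetical, the handling of an exact-midnight timestamp is my own assumption, and the actual move may happen some time after the object becomes eligible.

```python
from datetime import datetime, timedelta, timezone


def transition_eligible_at(created, days):
    """Estimate when an object becomes eligible for its lifecycle transition.

    `created` must be a timezone-aware datetime.
    """
    shifted = created.astimezone(timezone.utc) + timedelta(days=days)
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: a time that is already exactly midnight UTC is not rounded up.
    return midnight if shifted == midnight else midnight + timedelta(days=1)


uploaded = datetime(2014, 1, 15, 10, 30, tzinfo=timezone.utc)
print(transition_eligible_at(uploaded, 1).isoformat())
# prints 2014-01-17T00:00:00+00:00
```

So a file uploaded mid-morning with a one-day rule would not be eligible until the start of the day after next, in UTC.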
Archiving files
To use the bucket to archive files, all I need to do is transfer files from my local system into the
bucket. There are a number of ways to do this, but I like to use the aws s3 sync command, like so:
aws s3 sync --delete . s3://my-flac-audio
The aws s3 sync command will automatically avoid transferring files that you've already
uploaded, and by using the --delete switch, you can also make sure that any files that no longer
exist in your local version of the archive (for example, if you renamed a file) will be deleted in
S3, as well.
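To make that behaviour concrete, here is a simplified Python model of the decision that a sync with --delete makes. It is not the AWS CLI's implementation: it assumes a file is uploaded when it is missing remotely, differs in size, or is newer locally, and it ignores the command's other options. The function name and the file listings are made up for illustration.

```python
def plan_sync(local, remote, delete=False):
    """Return (uploads, deletions) for a one-way local-to-remote sync.

    `local` and `remote` map a file name to a (size, modified_time) tuple.
    """
    uploads = sorted(
        name
        for name, (size, modified) in local.items()
        if name not in remote          # new file
        or remote[name][0] != size     # size changed
        or remote[name][1] < modified  # local copy is newer
    )
    # Without the delete option, remote-only files are left alone.
    deletions = sorted(set(remote) - set(local)) if delete else []
    return uploads, deletions


# Hypothetical listings: one file was renamed locally after the last sync.
local_files = {"track01.flac": (100, 10), "track02-live.flac": (200, 20)}
remote_files = {"track01.flac": (100, 10), "track02.flac": (200, 20)}

print(plan_sync(local_files, remote_files, delete=True))
# prints (['track02-live.flac'], ['track02.flac'])
```

Note that a rename shows up as one upload plus one deletion, which is why the old name lingers in the bucket if you leave out --delete.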
By setting up a Glacier archival bucket like this, you can run the aws s3 sync command any time
you want to update your archive in the cloud. Easy!