With the power of duplicity and chroot, we can make an Amazon EC2 image that is as good as a hardware node, i.e. one with persistent storage. Let me explain how you can do it yourself. I’ll be leaving out the minute details, however.
Step 1: Start an instance of a public AMI
I would recommend ami-76cb2e1f because you can use the same image to power up both large and x-large instances. It also has the EC2 AMI tools installed and patched. Log in to the instance as root using the key you provided when starting the instance. Also, do not forget to pass the following as User Data.
chroot_bucket=[your_bucket_name]
Step 2: Download and install duplicity and boto
You need to install duplicity 0.4.9 or later and boto 1.0 or later.
Step 3: Create a PGP key
Run the following and follow the instructions that will appear.
# gpg --gen-key
Please note down the key id because we need it later on. It should look something like 860BCFF6.
gpg: key 860BCFF6 marked as ultimately trusted
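If you would rather not copy the key ID by hand, here is a minimal sketch that pulls it out of the line above. The helper name extract_key_id is my own and not part of gpg, and it assumes gpg prints the line in exactly the form shown.

```shell
# extract_key_id: hypothetical helper, not part of gpg.
# Reads gpg output on stdin and prints the key ID from the
# "marked as ultimately trusted" line, if there is one.
extract_key_id() {
    sed -n 's/^gpg: key \([0-9A-Fa-f]*\) marked as ultimately trusted.*$/\1/p'
}

# Example using the line shown above; prints 860BCFF6
echo 'gpg: key 860BCFF6 marked as ultimately trusted' | extract_key_id
```

The same filter can be fed a saved copy of the key generation output, if you kept one.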
Step 4: Install libpam-chroot
You have to install libpam-chroot so that users can be pushed inside the chroot when they log in via ssh.
Step 5: Create the chroot
Create the chroot and install all the applications you need inside it. Read up on how to create a chroot on a Debian system first. Create your users inside the chroot. It is important that you understand how chroot works as well.
Step 6: Push the users to chroot
You need to edit /etc/security/chroot.conf and add a line similar to the one below.
[username] /mnt/chroot
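If you have several users to add, a small helper can append the line only when it is missing. This is a rough sketch: add_chroot_user is a name I made up, and the configuration file path is passed in as an argument so you can try it on a scratch file first.

```shell
# add_chroot_user USER CHROOT_DIR CONF_FILE
# Hypothetical helper: appends "USER CHROOT_DIR" to CONF_FILE
# unless a line starting with USER is already there.
add_chroot_user() {
    user=$1
    dir=$2
    conf=$3
    if ! grep -q "^${user}[[:space:]]" "$conf" 2>/dev/null; then
        printf '%s %s\n' "$user" "$dir" >> "$conf"
    fi
}
```

Call it as root with the user name, /mnt/chroot, and the configuration file named in this step. Running it twice for the same user leaves a single line.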
Step 6: Download the scripts
You need to download the archive containing the scripts that do all the magic to ensure that data actually persists. Download it from http://www.mohanjith.net/downloads/amazon/ec2/ec2-chroot-persistence-1.0.tar.gz
Step 7: Extract and edit the scripts
Extract the scripts outside the chroot, preferably in /.
# cd /
# tar -xzf [path_to_archive]/ec2-chroot-persistence-1.0.tar.gz
You need to edit /etc/init.d/ec2 and /etc/ec2/cron and change the lines that look like the ones below.
export AWS_ACCESS_KEY_ID=[your_aws_access_key_id]
export AWS_SECRET_ACCESS_KEY=[your_aws_secret_access_key]
export PASSPHRASE=[your_gpg_passphrase]
export gpg_key=[your_gpg_key_id]
Step 8: Set up the scripts
You will also have to set up a cron job outside the chroot to back up the data to S3. The script to invoke is /etc/ec2/cron. I would recommend hourly backups; anything more frequent is a bad idea because the time each backup takes will increase drastically.
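One way to get an hourly run is a file in /etc/cron.d. The sketch below writes such a file; the helper name install_backup_cron and the file name in the usage note are my own choices, and it assumes your cron reads /etc/cron.d entries, which carry a user field.

```shell
# install_backup_cron CRON_FILE
# Hypothetical helper: writes a cron.d style entry that runs
# /etc/ec2/cron as root at minute 0 of every hour.
install_backup_cron() {
    printf '%s\n' '0 * * * * root /etc/ec2/cron' > "$1"
}
```

For example, run install_backup_cron /etc/cron.d/ec2-backup as root outside the chroot.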
You will also have to make sure the ec2 service (/etc/init.d/ec2) runs on power on, power off, and restart. To do that, create symlinks to /etc/init.d/ec2 from /etc/rc0.d/K10ec2, /etc/rc3.d/S90ec2, /etc/rc4.d/S90ec2, and /etc/rc6.d/K10ec2.
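The four symlinks can be created in one loop. This is only a sketch: link_ec2_service is a made-up helper, and its optional ROOT argument is there so you can rehearse in a scratch directory before touching the real /etc.

```shell
# link_ec2_service [ROOT]
# Hypothetical helper: creates the four runlevel symlinks to
# init.d/ec2 under ROOT/etc. ROOT defaults to empty, i.e. the real /etc.
link_ec2_service() {
    root=${1:-}
    for link in rc0.d/K10ec2 rc3.d/S90ec2 rc4.d/S90ec2 rc6.d/K10ec2; do
        mkdir -p "$root/etc/${link%/*}"
        # Relative target, so each link resolves to $root/etc/init.d/ec2
        ln -sf ../init.d/ec2 "$root/etc/$link"
    done
}
```

Run link_ec2_service with no argument as root to create the links for real.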
Step 9: Where to persist data.
Run the command below as root outside the chroot.
curl http://169.254.169.254/2007-08-29/user-data > /tmp/my-user-data
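To check that the bucket name actually arrived, you can read it back from the saved file. This is a sketch under the assumption that the user data contains a chroot_bucket=... line as given in step 1; read_chroot_bucket is my own helper name, and I am not claiming the downloaded scripts parse the file this way.

```shell
# read_chroot_bucket FILE
# Hypothetical helper: prints the value of the first
# chroot_bucket=... line found in FILE.
read_chroot_bucket() {
    sed -n 's/^chroot_bucket=//p' "$1" | head -n 1
}
```

Running read_chroot_bucket /tmp/my-user-data should print the bucket name you passed as User Data. If it prints nothing, the instance was started without it.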
Step 10: Create your machine image (remaster the AMI)
Read more about creating a machine image in the Amazon EC2 Getting Started Guide.
Step 11: Back up your chroot
Run /etc/ec2/cron to back up the chroot.
Step 12: Power off and power on
Power off the instance you are running from the public image. Once it has shut down properly, start the image we created in step 10, passing chroot_bucket as User Data with the same bucket you provided when you powered up the public image.
All the data in /mnt/chroot is backed up to S3 by /etc/ec2/cron, and when the instance is started after a shutdown, /mnt/chroot is restored from S3. The script is configured to back up on power down, but it is always recommended to run /etc/ec2/cron just before a power down.
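To give a feel for what a duplicity backup and restore of /mnt/chroot can look like, here is a rough sketch that only prints the commands. It is not the content of the downloaded scripts, which may well differ. The helper names backup_cmd and restore_cmd are mine, and I am assuming the s3+http://bucket URL form and the --encrypt-key option described in the duplicity manual, with duplicity picking up AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and PASSPHRASE from the environment.

```shell
# backup_cmd GPG_KEY_ID BUCKET
# Hypothetical helper: prints a duplicity command that would back up
# /mnt/chroot to the S3 bucket, encrypted to the given key.
backup_cmd() {
    printf 'duplicity --encrypt-key %s /mnt/chroot s3+http://%s\n' "$1" "$2"
}

# restore_cmd BUCKET
# Hypothetical helper: prints a duplicity command that would restore
# the bucket contents into /mnt/chroot.
restore_cmd() {
    printf 'duplicity s3+http://%s /mnt/chroot\n' "$1"
}
```

Printing the commands first lets you review them before running anything against your bucket. Note that duplicity may refuse to restore into a directory that already exists, so check its manual for your version.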
You might also want to set up dynamic DNS for your instance so that you don’t have to remember the ugly public DNS name provided by Amazon. You can use ddclient to update the dynamic DNS service with your new IP. You can install ddclient inside the chroot.
This method was tested for more than 1 month and everything worked smoothly for me, but depending on your configuration your experience may differ. It is always good to test before you use it in a production environment.
You could store your database files on PersistentFS and get continuous synchronization to S3. See this post on the AWS Forums.
Allen, just backing up the database files is not sufficient in some cases; you might have to store other files, e.g. /etc/shadow. If I’m not mistaken, PersistentFS is not free software, whereas duplicity is free software licensed under the GNU GPL.
You can store anything you want on PersistentFS, meaning of course, anything important that you want to keep. The easiest way to do that is to symlink directories you want to keep so they are stored under the PersistentFS mount point.
PersistentFS synchronizes data to S3 continuously, so it offers much better protection than hourly backups. It can also be remounted instantly (no need to reconstruct or restore your files).