Amanda backup and AWS S3
After running AMANDA, the Advanced Maryland Automatic Network Disk Archiver, with my DDS streamer at home for a while, I considered using it at work as well. Always looking for cost optimization, our Vertica clusters looked like good candidates. Like most DBAs, I fear losing data. Following the safe route, I use a vbr hardcopy backup followed by an rsync to a dedicated backup EBS volume. From those volumes we take snapshots via the Amazon EBS Snapshot Lifecycle.
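The routine described above boils down to a few commands; this is only a sketch, and the config file path, backup paths, and mount point are assumptions for illustration:

```shell
#!/bin/sh
# Sketch of the pre-S3 routine: vbr hardcopy backup, rsync to a
# dedicated backup EBS volume, snapshots handled by the lifecycle policy.
/opt/vertica/bin/vbr --task backup --config-file /etc/vertica/backup.ini
rsync -a --delete /backup/vertica/ /mnt/backup-ebs/vertica/
# EBS snapshots of /mnt/backup-ebs are then taken by the
# Amazon EBS Snapshot Lifecycle policy, outside of this script.
```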
It works, and since nothing is written to the drives during snapshot time, you can safely assume that you get healthy snapshots. But where there is light, there is shadow:
- You get a limited number of snapshots. Nothing long-term.
- Your snapshots live in the same region.
- Additional costs for the backup EBS volumes.
With a multi-node cluster the EBS volumes add up and get quite expensive. Sometimes you discover problems in your database weeks later; with no long-term backup, getting the data back can be impossible. With S3 it is easy to store your backup in a different region. That is a big safety net, even though I haven't experienced regional outages myself yet. The 2012 Christmas Eve outage is probably one of the most famous ones, and in December 2016 I saw slow EBS volumes for myself.
Following my intro, there is not that much additional work needed to use Amazon S3.
If you run a classical setup of one Amanda server and multiple clients, you might run into limitations. Let's say you want to back up something like a Vertica cluster. If you have 10 nodes, each with 1 TB of data to back up, all of it has to travel through one EC2 instance on its way to S3. That can easily mean 10 hours of backup time, and as the cluster grows over time, the backup window grows with it. Sure, a full backup is not typical, and most backups might finish in 2 hours or even less. Now comes the BUT: in case of a complete cluster failure you need the full time for recovering the cluster. I don't know about you, but for me that time is unacceptable. If each node runs its own Amanda server, the backup might be finished in an hour or two. Far better, and this approach scales as well.
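As a back-of-the-envelope check (the throughput number is an assumption; substitute what a single instance actually sustains towards S3):

```shell
#!/bin/sh
# Rough backup-window estimate for the scenario above.
NODES=10          # cluster size
TB_PER_NODE=1     # data per node
TB_PER_HOUR=1     # assumed sustained throughput of one EC2 instance to S3
# Central Amanda server: everything funnels through one instance.
echo "central server: $((NODES * TB_PER_NODE / TB_PER_HOUR)) hours"
# One Amanda server per node: the nodes upload in parallel.
echo "per-node servers: $((TB_PER_NODE / TB_PER_HOUR)) hour(s)"
```

The point is simply that the central-server window grows linearly with the node count, while the per-node window stays bounded by the largest single node.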
If server and client reside on the same host, we need an easy way to recover the server's own information. The solution is one separate Amanda configuration for backing up the Amanda server itself, and one for the rest of the data.
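For example, the two configurations could live side by side (the directory names here are my own convention, not an Amanda default):

```
/etc/amanda/amanda-server/amanda.conf   # backs up the Amanda state under /etc itself
/etc/amanda/daily/amanda.conf           # backs up the actual payload data
```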
After starting an empty EC2 instance, all it needs is something like this Expect script:
#!/bin/expect -f
if { $argc == 0 } {
    puts "\nusage: $argv0 hostname\n\n"
} else {
    set cmd [list grep tpchanger /etc/amanda/amanda-server/amanda.conf | sed -e "s/tpchanger \"chg-multi:s3:\\(.*\\)\\/\[^\\/\]*\\/amanda-server\\/slot.*/\\1/"]
    set S3 [exec {*}$cmd]
    set cmd [list grep S3_ACCESS_KEY /etc/amanda/amanda-server/amanda.conf | awk "\{print \$3\}" | sed -e "s/\"//g"]
    set ACCESS [exec {*}$cmd]
    set cmd [list grep S3_SECRET_KEY /etc/amanda/amanda-server/amanda.conf | awk "\{print \$3\}" | sed -e "s/\"//g"]
    set SECRET [exec {*}$cmd]
    set env(AWS_ACCESS_KEY_ID) $ACCESS
    set env(AWS_SECRET_ACCESS_KEY) $SECRET
    set cmd [list aws s3 ls s3://$S3/[lindex $argv 0]/amanda-server/]
    set AWS_LS [exec {*}$cmd]
    send_user "$AWS_LS\n\n"
    send_user "enter slot-????-mp.data:\n"
    gets stdin tarfile
    set cmd [list aws s3 cp s3://$S3/[lindex $argv 0]/amanda-server/$tarfile /tmp/]
    exec {*}$cmd
    set cmd [list tar -xf /tmp/$tarfile --directory=/etc --exclude='amanda-client.conf' ./amanda]
    exec {*}$cmd
    set cmd [list tar -xf /tmp/$tarfile --directory=/etc ./amanda-security.conf]
    exec {*}$cmd
}
Now the instance knows everything about the backups it has taken in the past, and you can continue with the regular Amanda commands. That is one of the benefits of not using commercial backup software with proprietary backup formats.
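To be concrete about what "regular Amanda commands" means here, a few of the usual suspects, using the configuration name `amanda-server` from this post:

```shell
amcheck amanda-server        # sanity-check the configuration, holding disk and S3 "tapes"
amdump amanda-server         # run the next backup cycle
amreport amanda-server       # report on the latest run
amadmin amanda-server find   # which dump sits on which virtual tape
amrecover amanda-server      # interactive restore shell
```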
But let's have a closer look at the configuration files!
If you create the S3 bucket manually, there is no need to give the Amanda AWS user privileges outside of that bucket.
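A minimal IAM policy sketch scoped to that bucket might look like this; the bucket name matches the `your-backup-bucket` placeholder used in the changer definition, and you should verify the action list against what your Amanda version actually needs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::your-backup-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-backup-bucket/*"
    }
  ]
}
```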
org "amanda-server"
# mailto "root"
dumpuser "amandabackup"
inparallel 1
dumporder "sssS"
taperalgo first
displayunit "g"
netusage 8000 Kbps
dumpcycle 1 weeks
runspercycle 7
tapecycle 10 tapes
etimeout 300
dtimeout 1800
ctimeout 30
bumpsize 20 Mb
bumppercent 20
bumpdays 1
bumpmult 4
device_output_buffer_size 1280k
autoflush yes
runtapes 1
tapedev "my_s3"
tapetype S3
maxdumpsize -1
labelstr "^amanda-server-[0-9][0-9]*$"
autolabel "amanda-server-%%%%" empty
amrecover_changer "changer"

define changer my_s3 {
    tpchanger "chg-multi:s3:your-backup-bucket/path/slot-{01..10}" # number of tapes in your "tapecycle"
    device-property "S3_BUCKET_LOCATION" "eu-west-1"
    device-property "S3_ACCESS_KEY" "foo"
    device-property "S3_SECRET_KEY" "bar"
    device-property "NB_THREADS_BACKUP" "6"
    device-property "NB_THREADS_RECOVERY" "10"
    device-property "S3_MULTI_PART_UPLOAD" "YES"
    device-property "S3_SSL" "YES"
    changerfile "s3-statefile"
}

holdingdisk hd1 {
    comment "main holding disk"
    directory "/opt/amanda"
    use -100 Mb
    chunksize 1Gb
}

infofile "/etc/amanda/amanda-server/curinfo"
logdir "/etc/amanda/amanda-server"
indexdir "/etc/amanda/amanda-server/index"

define interface local {
    comment "a local disk"
    use 8000 kbps
}

define application-tool app_amgtar {
    comment "amgtar"
    plugin "amgtar"
    property "XATTRS" "YES"
}

define dumptype normal {
    global
    program "APPLICATION"
    application "app_amgtar"
    encrypt none
    compress client best
    index yes
    exclude list ".amanda.excludes"
}

define dumptype normal-archive {
    normal
    record no
    dumpcycle 0
}

define dumptype normal-uncompressed {
    normal
    compress none
}

define dumptype all {
    normal
    exclude ""
}

define dumptype all-archive {
    normal-archive
    comment "backup all. no excludes. no incremental."
    exclude ""
}

define dumptype all-archive-uncompressed {
    all-archive
    compress none
}

define dumptype all-uncompressed {
    normal-uncompressed
    exclude ""
}

define dumptype normal-uncompressed-archive {
    normal-uncompressed
    exclude ""
    record no
}

define dumptype normal-server-encrypt {
    normal
    comment "dump with server symmetric encryption"
    encrypt server
    server_encrypt "/sbin/amcrypt"
    server_decrypt_option "-d"
}

define dumptype normal-server-encrypt-archive {
    normal-server-encrypt
    exclude ""
    record no
}

define tapetype S3 {
    comment "S3 pseudo-tape"
    length 500 gigabytes
    part_size 50 gigabytes
    part_cache_type none
    blocksize 10 megabytes
}
The disklist for the server:
localhost /etc/ all-archive
As mentioned in my previous post, the postinstall scripts contain a bug when generating the encryption keys.
This will generate a proper encryption key:
# get_random_lines 65
lines=65
pad_lines=`expr $lines + 1`
block_size=`expr $pad_lines \* 60`
dd bs=${block_size} count=1 if=/dev/urandom 2>/dev/null | \
    base64 | \
    head -$pad_lines | \
    tail -$lines >~amandabackup/.gnupg/am_key_new
gpg2 --homedir ~amandabackup/.gnupg \
    --no-permission-warning \
    --armor \
    --batch \
    --symmetric \
    --passphrase-file ~amandabackup/.am_passphrase \
    --output ~amandabackup/.gnupg/am_key_new.gpg \
    ~amandabackup/.gnupg/am_key_new
Keep in mind that you need a copy of that file to read your backup files!
A disklist using encryption might look like this:
localhost /opt/vertica normal-server-encrypt
localhost /usr/local/data all-uncompressed
localhost /home normal-server-encrypt
Amanda cycles through its tapes, which means an automated transition to Glacier via the S3 bucket lifecycle configuration is not an option. Instead,
device_property "TRANSITION-TO-GLACIER" "1"
could be used together with a script after each run:
LABELSTR=`amgetconf server-config labelstr | sed -e 's/[\^\$"]//g'`
TAPELIST=`amstatus server-config | egrep $LABELSTR | awk '{print $9}'`
for TAPE in ${TAPELIST}; do
    amadmin server-config no-reuse $TAPE
done
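To tie it together, the dump and the no-reuse script can be chained in cron; the script path and the schedule here are assumptions, not something Amanda ships:

```
# /etc/cron.d/amanda -- nightly backup, then flag Glacier-transitioned tapes
0 2 * * * amandabackup /usr/sbin/amdump server-config && /usr/local/sbin/amanda-no-reuse.sh
```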