Amanda backup

Many moons ago I had to convince my former employer that we needed backups. There are only two kinds of companies: those without backups and those losing data. Wait, those are the same. Apart from the missing backups, we also needed more storage. In the end I got my backup and learned a bit about fibre optics, SAN storage and tape libraries. Of course an IBM TS3200 with two LTO-5 drives was not cheap. We had 50 tapes for the unit. Unsurprisingly, there was no budget left for commercial backup software. But we were the department running almost completely on open source. After some research we picked AMANDA, the Advanced Maryland Automatic Network Disk Archiver. I wrote a small script that read the LTO barcode labels and added the tapes to the AMANDA tape inventory, and we were good to go.

Around the same time I bought an HP StorageWorks DAT160 for my home server.

Back to the future. Recently we moved to a new flat, and my corner in the living room turned into a dedicated office room. Having 30 DAT160 tapes, I decided to reconfigure AMANDA. You never know what challenge your next job will bring you, and knowing how to configure a reliable backup solution doesn’t hurt. 160 GB doesn’t sound like much today. But if you focus on the really important stuff - configuration files, git repositories, text files - it is OK. For my webserver I run rsnapshot. The chance that both my webserver and my home server die together is very low.

At home I don’t have that many changes. Being lazy, I decided to run one backup per week.

This is my crontab:

0 2 * * 2 /usr/sbin/amcheck -m -M my_email DailySet1
0 12 * * 3 /usr/sbin/amdump DailySet1

Every Tuesday at 02:00 amcheck verifies that a usable tape is present and mails me if something is wrong. Every Wednesday at noon amdump runs the backup. If we need more than one tape, we have plenty of time to finish the backup.
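To double-check how much time lies between the tape check and the dump, here is a small Python sketch of the two weekly cron slots. The function next_run is a hypothetical helper of mine and not part of Amanda or cron, and the start date is just an example Monday.

```python
from datetime import datetime, timedelta

def next_run(after, weekday, hour, minute=0):
    """Return the first weekly slot strictly after `after`.

    `weekday` follows Python's convention (Monday=0), so cron's
    2 (Tuesday) becomes 1 and cron's 3 (Wednesday) becomes 2.
    """
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - after.weekday()) % 7)
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate

start = datetime(2020, 2, 3, 9, 0)           # example: a Monday morning
check = next_run(start, weekday=1, hour=2)   # "0 2 * * 2"  -> amcheck
dump = next_run(check, weekday=2, hour=12)   # "0 12 * * 3" -> amdump
print("amcheck:", check)
print("amdump: ", dump)
print("time to fix a tape problem:", dump - check)
```

So if amcheck complains on Tuesday night, there are 34 hours left to put the right tape into the drive.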

AMANDA turns 30 next year, which means a lot of thought went into the software. As it’s open source, many eyes have scanned the code over the years. I wouldn’t expect too many bugs left, and not much left on the wishlist either. Deduplication is probably the one missing cherry on top. Of course not everybody wants to handle tapes anymore, even though LTO-8 can store 12 TB of raw or 30 TB of compressed data. But Amanda supports virtual tapes on hard drives and AWS S3 as well.

What sets Amanda apart from the commercial competition is the way she handles your data. Commercial products always use a proprietary format to write the data. Without the software you cannot read your data. Amanda is just a wrapper around open source tools. If you write unencrypted, uncompressed data to your tape, all you need to access the data is tar.
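To make that concrete: as far as I know, Amanda puts a 32 KiB plain-text header in front of each dump, and the tar stream follows directly after it. The Python sketch below shows the idea on a synthetic file image. The header line and the archive content are made up for illustration, HEADER_SIZE is my assumption about the block size, and split_dump and list_members are hypothetical helper names. Reading from a real tape drive additionally needs the right block size, which this sketch ignores.

```python
import io
import tarfile

HEADER_SIZE = 32 * 1024  # assumption: Amanda's header occupies one 32 KiB block

def split_dump(blob):
    """Split a dump file image into (header text, tar payload bytes)."""
    header = blob[:HEADER_SIZE].split(b"\x00", 1)[0].decode("ascii", "replace")
    return header.strip(), blob[HEADER_SIZE:]

def list_members(payload):
    """List the file names in an uncompressed tar payload."""
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:") as tar:
        return tar.getnames()

# Build a synthetic stand-in for an uncompressed, unencrypted dump file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello\n"
    info = tarfile.TarInfo("etc/motd")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Made-up header line, padded with NUL bytes to the assumed header size.
fake_header = b"AMANDA: FILE 20200205 server1.foo.com opt lev 0 comp N program /bin/tar\n"
image = fake_header.ljust(HEADER_SIZE, b"\x00") + buf.getvalue()

header, payload = split_dump(image)
print(header)
print(list_members(payload))
```

On the command line the same thing boils down to skipping the first block with dd and piping the rest into tar.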

For the most recent version of Amanda - 3.5.1 - you should visit AMANDA.org. The release date of December 1, 2017 sounds dated, but keep in mind that the software has been around for a long time and doesn’t require that much attention anymore. The community is quite active though.

If you want to use encryption - highly recommended if your data leaves your datacenter - there is a pull request for the automatic creation of the encryption keys during package installation. My guess is that this feature is not widely used. Otherwise I could not explain this bug.

The software consists of a server and a client part. The client must be configured on every system you want to back up. In the world of localhost or a direct VPN link you might consider using xinetd. On the server:

# default: on
#
# description: Amanda services for Amanda server and client.
#

service amanda
{
        disable         = no
        flags           = IPv4
        socket_type     = stream
        protocol        = tcp
        wait            = no
        user            = amandabackup
        group           = disk
        groups          = yes
        server          = /usr/libexec/amanda/amandad
        server_args     = -auth=bsdtcp amdump amindexd amidxtaped senddiscover
}

Or, on the client:

service amanda
{
       only_from       = your.amanda.server.ip
       socket_type     = stream
       protocol        = tcp
       wait            = no
       user            = amandabackup
       group           = disk
       groups          = yes
       server          = /usr/sbin/amandad
       server_args     = -auth=bsdtcp amdump
       disable         = no
}

Additionally, in /var/lib/amanda/ or wherever your amanda user has its home directory, you need a .amandahosts file:

your.amanda.server.ip.or.dns amandabackup amdump
your.amanda.server.ip.or.dns amindexd amidxtaped
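As I understand the Amanda documentation, each .amandahosts line has the form "hostname [username [service ...]]". A quick sanity check is therefore to look for lines where the second field is a service name, which suggests the user name is missing. The Python sketch below does that for the two lines above. The function check_amandahosts and the KNOWN_SERVICES set are my own illustration and not part of Amanda.

```python
# Service names as I understand them from the Amanda documentation (assumption).
KNOWN_SERVICES = {
    "amdump", "amindexd", "amidxtaped",
    "noop", "selfcheck", "sendsize", "sendbackup",
}

def check_amandahosts(text):
    """Return one (host, user, services, warning) tuple per non-empty line."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        host = fields[0]
        user = fields[1] if len(fields) > 1 else None
        services = fields[2:]
        warning = None
        if user in KNOWN_SERVICES:
            warning = "second field looks like a service, user name may be missing"
        entries.append((host, user, services, warning))
    return entries

SAMPLE = """\
your.amanda.server.ip.or.dns amandabackup amdump
your.amanda.server.ip.or.dns amindexd amidxtaped
"""

for entry in check_amandahosts(SAMPLE):
    print(entry)
```

The second line gets flagged. If amindexd is meant as a service there, a user name belongs in front of it, and which user that is depends on who runs amrecover in your setup.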

In case you wonder why I add links to “well known” software: in recent years I have noticed a change in our industry. Over the last 20 years your typical car mechanic has changed from somebody with a deep understanding of mechanics and of the basics of a combustion engine (fuel, air and a spark at the right time) into somebody who can only swap parts in the hope of hitting the broken one. Our IT industry is changing in the same way. Some coworkers are able to operate on AWS and run EKS but have never heard of dd or its conv=noerror option, and have trouble recovering data from a half-broken spinning hard disk. But I don’t blame anybody for that. The amount of information is just overwhelming. I am a generalist, but there were always colleagues focusing on a specific topic. Think about those caring for the backups of a company with thousands of employees, or those guys heavily into Oracle databases. They have always relied on the help of their coworkers for topics outside of their universe.

In /etc/amanda/ you can have multiple configurations, each one in its own folder. The folders contain 3 files:

amanda.conf
amanda-client.conf
disklist

There are plenty of other files, but Amanda maintains those herself and you don’t have to nurse them with your editor.

Here is a basic amanda.conf:

org        "DailySet1"
mailto     "root"
dumpuser   "amandabackup"
inparallel 1
dumporder  "sssS"
taperalgo  first
displayunit "m"
netusage 8000 Kbps
dumpcycle 4 weeks
runspercycle 4
tapecycle 30 tapes
bumpsize 20 Mb
bumppercent 20
bumpdays 1
bumpmult 4
etimeout 28800
dtimeout 1800
ctimeout 30
device_output_buffer_size 1280k
flush-threshold-dumped 100
flush-threshold-scheduled 100
taperflush 0
autoflush yes
runtapes 10
tapedev "tape:/dev/tape/by-id/usb-HP_DAT160_4855450922344348-0:0-nst"
maxdumpsize -1
tapetype hp_dat160
labelstr "^DailySet1-[0-9][0-9]*$"
amrecover_changer "changer"

holdingdisk hd1 {
  comment "main holding disk"
  directory "/opt/amanda"
  use -100 Mb
  chunksize 1Gb
}

infofile "/etc/amanda/DailySet1/curinfo"
logdir   "/etc/amanda/DailySet1"
indexdir "/etc/amanda/DailySet1/index"

define interface local {
    comment "a local disk"
    use 8000 kbps
}

define tapetype hp_dat160 {
    comment "Created by amtapetype; compression enabled"
    length 66420608 kbytes
    filemark 617 kbytes
    speed 5346 kps
    blocksize 32 kbytes
    part-size 8 gbytes
    part-cache-type memory
    part-cache-max-size 512 mbytes
}

define dumptype normal {
   comment "gnutar backup"
   program "GNUTAR"
   auth "bsdtcp"
   index yes
   holdingdisk yes # on by default
   compress client best
   priority medium
   exclude list ".amanda.excludes"
}

define dumptype all {
   normal
   exclude ""
}
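A quick word on the rotation numbers in this config. As I read the amanda.conf man page, tapecycle should be larger than runspercycle times runtapes. The Python sketch below does the arithmetic for the values above. The function rotation is a hypothetical helper of mine, and tapes_per_run=1 reflects my assumption that a weekly run at home fits on a single tape.

```python
def rotation(tapecycle, runspercycle, runtapes, dumpcycle_weeks, tapes_per_run):
    """Rough tape rotation arithmetic for an Amanda configuration."""
    weeks_per_run = dumpcycle_weeks / runspercycle
    runs_until_reuse = tapecycle // tapes_per_run
    return {
        # tapes needed if every run really used `runtapes` tapes
        "worst_case_tapes_per_cycle": runspercycle * runtapes,
        # how many runs pass before the oldest tape gets overwritten
        "runs_until_reuse": runs_until_reuse,
        "weeks_until_reuse": runs_until_reuse * weeks_per_run,
    }

result = rotation(tapecycle=30, runspercycle=4, runtapes=10,
                  dumpcycle_weeks=4, tapes_per_run=1)
print(result)
```

With one tape per weekly run the 30 tapes give me about 30 weeks of history. The worst case of 40 tapes per cycle is more than the 30 tapes I own, so runtapes 10 is generous for this setup and could be lowered.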

amanda-client.conf:

conf "DailySet1"
index_server "amandahost"
tape_server "amandahost"
tapedev "tape:/dev/tape/by-id/usb-HP_DAT160_4855450922344348-0:0-nst"
auth "bsdtcp"
unreserved-tcp-port 1024,65535

disklist:

server1.foo.com opt /opt/ normal
server1.foo.com system / normal
server2.foo.com etc /etc/ normal

Usually you don’t want to back up everything. Think of /sys/ or /tmp/. For that you need an exclude file on the client.

Here is one example for the system entry:

./proc/*
./sys/*
./tmp/*

If you want to back up everything, don’t make the mistake of using an empty exclude file. You will end up with a full backup in every run. Instead use a different dumptype, like the dumptype all defined above.
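The patterns in the exclude file are relative to the top of the directory being backed up, hence the leading "./". To get a feeling for what they catch, here is a Python sketch using fnmatch. It only approximates GNU tar’s matching (in both, "*" also matches "/"), and is_excluded is a hypothetical helper for illustration.

```python
from fnmatch import fnmatchcase

# The patterns from the exclude file above.
EXCLUDES = ["./proc/*", "./sys/*", "./tmp/*"]

def is_excluded(path, patterns=EXCLUDES):
    """Return True if `path` (relative, starting with ./) matches any pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)

for path in ["./proc/1/status", "./tmp/foo", "./proc", "./etc/passwd"]:
    print(path, "excluded" if is_excluded(path) else "kept")
```

Note that the directories themselves are kept and only their contents are skipped, which is what you want for mount points like /proc and /sys after a restore.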

Each run will send you a report via email. Most commands must be executed as the amanda user. amstatus DailySet1 gives you the current status while a backup is running. amreport DailySet1 gives you the report of the last backup:

Hostname: server1.foo.com
Org     : DailySet1
Config  : DailySet1
Date    : February 5, 2020

These dumps were to tape DailySet1-21.
The next 10 tapes Amanda expects to use are: 9 new tapes, DailySet1-1.
The next 9 tape already labelled are: DailySet1-30,DailySet1-29,DailySet1-28,DailySet1-27,DailySet1-26,DailySet1-25,DailySet1-24,DailySet1-23,DailySet1-22.

STATISTICS:
                          Total       Full      Incr.   Level:#
                        --------   --------   --------  --------
Estimate Time (hrs:min)     0:00
Run Time (hrs:min)          0:12
Dump Time (hrs:min)         0:04       0:04       0:00
Output Size (meg)          152.4      140.7       11.7
Original Size (meg)        187.8      171.8       16.0
Avg Compressed Size (%)     81.2       81.9       73.3
DLEs Dumped                    4          2          2  1:2
Avg Dump Rate (k/s)        581.7      554.6     1405.1

Tape Time (hrs:min)         0:06       0:03       0:04
Tape Size (meg)            152.4      140.7       11.7
Tape Used (%)                0.2        0.2        0.0
DLEs Taped                     4          2          2  1:2
Parts Taped                    4          2          2  1:2
Avg Tp Write Rate (k/s)    412.8      867.9       56.5


USAGE BY TAPE:
  Label                 Time         Size      %  DLEs Parts
  DailySet1-21          0:04         152M    0.2     4     4


NOTES:
  planner: Last full dump of server1.foo.com:opt on tape DailySet1-18 overwritten in 2 runs.
  planner: Last full dump of server1.foo.com:system on tape DailySet1-20 overwritten in 2 runs.
  planner: Last full dump of server2.foo.com:git on tape DailySet1-19 overwritten in 2 runs.
  planner: Last full dump of server2.foo.com:etc on tape DailySet1-19 overwritten in 2 runs.
  planner: Full dump of server2.foo.com:git promoted from 14 days ahead.
  planner: Full dump of server2.foo.com:etc promoted from 14 days ahead.
  taper: Slot 1 with label DailySet1-21 is usable
  taper: tape DailySet1-21 kb 156055 fm 4 [OK]


DUMP SUMMARY:
                                                                            DUMPER STATS   TAPER STATS
HOSTNAME                              DISK        L ORIG-MB  OUT-MB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
--------------------------------------------------- ---------------------- -------------- -------------
server2.foo.com                       etc         0      74      46   61.9    1:29  525.5   1:51  421.9
server2.foo.com                       git         0      98      95   96.9    2:51  569.8   0:55 1768.0
server1.foo.com                       opt         1      12      11   93.5    0:04 3123.0   1:08  164.5
server2.foo.com                       system      1       4       1   18.3    0:05  161.7   2:24    5.6

(brought to you by Amanda version 3.5.1)
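The numbers in the report can be cross-checked against the tapetype definition. The Python sketch below takes the "kb 156055" from the taper line and the "length 66420608 kbytes" from the hp_dat160 tapetype and recomputes the tape size and usage. The variable names are mine, and filemarks are ignored.

```python
TAPE_LENGTH_KB = 66420608   # "length" from the hp_dat160 tapetype
WRITTEN_KB = 156055         # "kb" from the taper line in the NOTES section

tape_size_meg = round(WRITTEN_KB / 1024, 1)
tape_used_percent = round(WRITTEN_KB / TAPE_LENGTH_KB * 100, 1)

print("Tape Size (meg):", tape_size_meg)
print("Tape Used (%):  ", tape_used_percent)
```

Both values match the "Tape Size" and "Tape Used" lines in the STATISTICS block, so a weekly run currently fills only a tiny fraction of a DAT160 tape.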

Overall I am happy with Amanda. Some of the documentation out there might be for older versions, as there were a few changes to the configuration files over the years. O’Reilly’s Backup & Recovery spends a whole chapter on Amanda. Having used some commercial backup products in the past, I wouldn’t consider the configuration more complicated than that of other products. Sometimes it is maybe even simpler. Aside from sending emails, amreport can be used to write JSON as well. You might want to use that option to get the information into a central logging system like Splunk.