Vertica Backup And Restore
disclaimer: use at your own risk
If you pay close attention to the HP Vertica documentation you will see this requirement for doing a restore:
The cluster to which you are restoring the backup has the same number of hosts as the one used to create the backup. The node names and the IP addresses must also be identical.
UPDATE: Since Vertica 9.2 most of this information is obsolete. See Restoring Objects to an Alternate Cluster.
Using the same node names is not complicated, even if you run your Vertica cluster on Amazon AWS. Keeping the IP addresses identical is difficult, though.
Here is a way to solve the problem. It works for me. You might want to test it for yourself before relying on it. Don’t come back to me and tell me you lost your data. It’s your data and in the end you are responsible for it. Enough said.
Let’s dive into the details:
In the first step we need a configuration file for vbr:
[Misc]
snapshotName = backup
dest_verticaBinDir = /opt/vertica/bin
restorePointLimit = 6
objectRestoreMode = createOrReplace
tempDir = /tmp/vbr
retryCount = 2
retryDelay = 1
passwordFile = /opt/vertica/config/passwd

[Database]
dbName = testdb
dbUser = dbadmin

[Transmission]
encrypt = False
checksum = False
port_rsync = 50000
serviceAccessUser = None
total_bwlimit_backup = 0
concurrency_backup = 1
total_bwlimit_restore = 0
concurrency_restore = 1
hardLinkLocal = True

[Mapping]
v_testdb_node0001 = []:/mnt/data/testdb/backup
v_testdb_node0002 = []:/mnt/data/testdb/backup
v_testdb_node0003 = []:/mnt/data/testdb/backup
v_testdb_node0004 = []:/mnt/data/testdb/backup
v_testdb_node0005 = []:/mnt/data/testdb/backup
v_testdb_node0006 = []:/mnt/data/testdb/backup
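A note on passwordFile: it keeps the database password out of the main config. A minimal sketch of such a file, assuming the [Passwords] section format (set it to mode 600, it holds a plain-text password):

[Passwords]
dbPassword = your_password_here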
You might wonder why there are empty brackets in the [Mapping] section. They tell vbr to write to a local directory on each node, so rsync runs without its TCP transport. While researching this approach I ran several performance tests: even a normal backup (without hardLinkLocal) gains a big performance boost from going local instead of over TCP, but the biggest time savings came from the hardLinkLocal backup.
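For comparison: if you wanted vbr to ship the backup to a dedicated backup host over rsync's TCP port instead, the mapping entries would name that host. A sketch, with backuphost01 and the target path as placeholders:

[Mapping]
v_testdb_node0001 = backuphost01:/mnt/backup/testdb
v_testdb_node0002 = backuphost01:/mnt/backup/testdb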
Now you can run your backup:
/opt/vertica/bin/vbr.py --config-file /opt/vertica/config/backup.ini --task backup >> /opt/vertica/log/backup.log 2>&1
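To run this nightly you can put it (or the wrapper script shown further below) into the dbadmin crontab; the 02:30 schedule is only an example:

# run the vbr backup every night at 02:30
30 2 * * * /opt/vertica/bin/vbr.py --config-file /opt/vertica/config/backup.ini --task backup >> /opt/vertica/log/backup.log 2>&1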
The bonus with the hardlink backup: you get faster restore times if you have to restore, for example, a single table after a human error.
The downside: you still have the backup on the same disks as the original data. If the disks go down for whatever reason you are without a backup. Sure, on AWS you could just rely on direct EBS volume snapshots of the data volumes. But in the worst case that means your new cluster greets you with a failing filesystem check, because the snapshots caught the data in an inconsistent state.
So let's extend the vbr command with some extra steps and use some additional EBS volumes for your backup.
With a wrapper script like this you can get the files to your backup volumes:
#!/bin/bash

BACKUPCONFIG=/opt/vertica/config/backup.ini
BACKUPLOG=/opt/vertica/log/backup.log

# run the vbr backup and log its runtime
date >> ${BACKUPLOG}
{ time /opt/vertica/bin/vbr.py --config-file ${BACKUPCONFIG} --task backup; } &>> ${BACKUPLOG}

# IP of this node, so we can skip it in the ssh loop below
LOCALIP=`ip -f inet -o addr show dev eth0 | cut -d' ' -f7 | cut -d/ -f1`

# for every node in the [Mapping] section, look up its IP in
# admintools.conf and kick off the rsync to its backup volume via ssh
for HOST in `grep --color=never ^v_ ${BACKUPCONFIG} | awk '{print $1}'`; do
    IP=`grep --color=never ^${HOST} /opt/vertica/config/admintools.conf | grep -v ${LOCALIP} | sed -e "s/${HOST} = //" | awk -F, '{print $1}'`
    if [[ -n "${IP}" ]]; then
        echo ${IP} >> ${BACKUPLOG}
        ssh ${IP} "nohup nice -n 5 ionice -c 3 rsync -aH --exclude=lost+found --delete /mnt/data/backup /mnt/backup/ < /dev/null &" &
    fi
done

# the local node gets its rsync directly
echo "starting local rsync." >> ${BACKUPLOG}
{ time nice -n 5 ionice -c 3 rsync -aH --exclude=lost+found --delete /mnt/data/backup /mnt/backup/; } &>> ${BACKUPLOG}
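To check which restore points exist before and after playing with the backup volumes, vbr's listbackup task comes in handy:

/opt/vertica/bin/vbr.py --config-file /opt/vertica/config/backup.ini --task listbackup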
After this script has finished you can take EBS volume snapshots of these backup volumes. In case of a recovery you can use the backup volumes as your new data volumes.
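Taking the snapshot can be scripted as well. A minimal sketch with the AWS CLI, assuming you know the volume ID of the backup volume on each node (vol-0123456789abcdef0 is a placeholder):

aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "testdb backup $(date +%F)"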
If you do a recovery test on a new cluster with different IPs, follow the documentation and create the database first. After shutting it down again, adjust the backup files with this script:
# rewrite the old node IPs (.11-.16) in every snapshot file that contains one
for file in `grep -lr '172.16.50.1[1-6]' /mnt/data/backup/testdb/Snapshots/*`; do
    sed -i \
        -e 's/172.16.50.11/172.16.55.61/g' \
        -e 's/172.16.50.12/172.16.55.62/g' \
        -e 's/172.16.50.13/172.16.55.63/g' \
        -e 's/172.16.50.14/172.16.55.64/g' \
        -e 's/172.16.50.15/172.16.55.65/g' \
        -e 's/172.16.50.16/172.16.55.66/g' \
        ${file}
done

# the vbr config file needs the same treatment
if [ -e /opt/vertica/config/backup.ini ]; then
    sed -i \
        -e 's/172.16.50.11/172.16.55.61/g' \
        -e 's/172.16.50.12/172.16.55.62/g' \
        -e 's/172.16.50.13/172.16.55.63/g' \
        -e 's/172.16.50.14/172.16.55.64/g' \
        -e 's/172.16.50.15/172.16.55.65/g' \
        -e 's/172.16.50.16/172.16.55.66/g' \
        /opt/vertica/config/backup.ini
fi
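A quick sanity check that no old addresses survived the rewrite:

grep -r '172.16.50.1[1-6]' /mnt/data/backup/testdb/Snapshots/ \
    && echo "old IPs still present" \
    || echo "all IPs rewritten"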
Now you can proceed, follow the documentation again, and run something like:
vbr -t restore --archive 20170221_161405 --config-file /opt/vertica/config/backup.ini
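Once the restore has finished, start the database again. With admintools that looks something like this:

/opt/vertica/bin/admintools -t start_db -d testdb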