A blog probably of interest only to nerds by John Morton.

26Oct2013

Auto­mate dai­ly data­base back­ups with crontab and SFTP and be the mas­ter your domain.

I recent­ly worked on a site which saved data in it’s local data­base which need­ed to be col­lect­ed and sent each day to a remote serv­er via SFTP for analy­sis. Need­less to say, I didn’t want to man­u­al­ly do this process every day, but I had nev­er ful­ly explored cron jobs, so I had to teach myself about cron jobs while I fig­ured out the prob­lem at hand. Here’s how I end­ed up automat­ing the job.

Your friend, the com­mand line

I built the Ubun­tu serv­er myself and I had full access to serv­er via the com­mand line. The root” user was locked down though for safety’s sake. I logged in under a 2nd user I cre­at­ed. To make it as clear as pos­si­ble, I’ll call that 2nd user myuser­name” in this post. myuser­name” can use sudo when need­ed to exe­cute root lev­el commands.

When I log into the serv­er as myuser­name”, I land in the direc­to­ry “/​home/​myusername”. I didn’t have a bin” direc­to­ry yet, so I made one to store the scripts I would write that I would even­tu­al­ly run in my cron job.

After mak­ing the bin” direc­to­ry, I went into it and made a new file called dai­ly­back­up” by using my default text editor.

cd /bin
sudo nano dailybackup

I had to use sudo’ to make this file because of the per­mis­sions on the bin’ fold­er were for root’ instead of myuser­name’. (This was because of an Ubun­tu require­ment that the base direc­to­ry be root’ I don’t ful­ly under­stand. All I know is that if I made the home direc­to­ry belong to myuser­name’, I couldn’t log in.)

I changed own­er­ship of the dai­ly­back­up’ file using chown to allow myuser­name’ full own­er­ship of that file, both the own­er and group.

chown myusername:myusername dailybackup

Then I need­ed to make the dai­ly­back­up file executable.

sudo chmod 755 dailybackup

The file was now an exe­cutable. It was emp­ty though.

I could run this script as I worked on it by typ­ing the fol­low­ing into my com­mand line:

/home/myusername/bin/dailybackup

I did this fre­quent­ly along the way to test as I went along.

Basic script starter needs

The first thing in the file defined which shell to use to inter­rupt the script. bash’ is the shell I am most famil­iar with so I went with that.

#!/bin/bash

Next I need­ed to define the PATH for the com­mands in my script. If I didn’t do this, I’d get error telling me that what seem to be the most basic pieces of func­tion­al­i­ty were miss­ing. Defin­ing the PATH vari­able solves this problem.

export PATH=/bin:/usr/bin

Rotate back­ups

Next I need­ed to start get­ting the dai­ly file export from my data­base, a MySQL data­base in my case. There are many resources, like this where I picked dif­fer­ent pieces of this up.

# modify the following to suit your environment
export DB_BACKUP="/home/myusername/backup"
export DB_USER="dbusername"
export DB_PASSWD="dbpassword"
export THEDATE='date +"%A, %B %-d, %Y"'

# title and version
echo ""
echo "Daily Tabbed Report: " $THEDATE
echo "----------------------"
echo "* Rotating backups…"
rm -rf $DB_BACKUP/04
mv $DB_BACKUP/03 $DB_BACKUP/04
mv $DB_BACKUP/02 $DB_BACKUP/03
mv $DB_BACKUP/01 $DB_BACKUP/02
mkdir $DB_BACKUP/01

The script starts by defin­ing some vari­ables. The first is the loca­tion where I want­ed to keep the back­up files, DB_BACKUP. The next 2 vari­ables are the user­name and pass­word for the MySQL database.

The export’ part of these com­mands should make the vari­able avail­able to sub-shells, from what I under­stand. I think this means if I sep­a­rat­ed out the var­i­ous parts of this script into their own files, these export­ed vari­ables would be able to be used in those scripts.

The vari­able called THE­DATE” holds the cur­rent date. The echo com­mands you see will print out help­ful infor­ma­tion into your com­mand line if you run it from there or in an email if you have your cron job to noti­fy you by email when it’s run.

The rest of the com­mands above rotate the back­up files from the pre­vi­ous days. (The first time you run this, there aren’t any data files to rotate, but I made place­hold­er direc­to­ries in my back­up fold­er to get things going.)

There are 4 dai­ly back ups at any one time. To rotate the back­up file, the 04’ direc­to­ry is first erased, and the remain­ing 3 direc­to­ries are renamed (or, actu­al­ly moved”) down the line. Final­ly, a new 01’ direc­to­ry is cre­at­ed to hold the new data export that we cre­ate in the next step.

Export the dai­ly data

This sec­tion starts by echo­ing out what I’m try­ing to accom­plish in each step. This helps out in debug­ging. Believe me, I didn’t do this cor­rect­ly on the first attempt.

echo "* Creating new backup..."
mysql --password=$DB_PASSWD -h 'localhost' -e "SELECT * FROM databasenaem.tablename WHERE datecreated >= DATE_SUB(NOW(), INTERVAL 1 DAY)" > $DB_BACKUP/01/exporteddata-`date +%Y-%m-%d`.txt
echo "* File: " $DB_BACKUP/01/exporteddata-`date +%Y-%m-%d`.txt
echo "----------------------"

Since I run this script every 24 hours, I want each export to include the pre­vi­ous 24 hours data. In the data­base, I have a field in the table called date­cre­at­ed” which I pop­u­late with the date auto­mat­i­cal­ly the moment it’s cre­at­ed. In the mysql’ line above, I query for data where the date­cre­at­ed” val­ue is greater than or equal to now” minus 1 day. The “>” in the mysql’ line exports what­ev­er was found in a tab delim­i­tat­ed file name export­ed data” with a date attached to it. This tab-delim­i­tat­ed file is read­able by Excel and Numbers.

Trans­fer the new­ly export­ed file via SFTP

With the script so far, I was mak­ing the file that I want­ed, but it was only on my serv­er in the “/​home/​myusername/​backup/​01/​exporteddata – 2013 – 10-26.txt” file. I still need­ed to get the data off the serv­er and to the big data ana­lyt­ics” company.

Get­ting the data off my serv­er onto the serv­er I’ll call big​dat​a​com​pa​ny​.com was a lit­tle tricky. If I were able to get my SSH pub­lic key set up on their serv­er would have made this much eas­i­er. Due to cir­cum­stances beyond my con­trol, that wasn’t happening.

Doing SFTP via the com­mand line isn’t that dif­fi­cult, but you need to enter your pass­word each time. You can’t sim­ply add in a pass­word para­me­ter to the sftp’ com­mand unfor­tu­nate­ly. It doesn’t work that way.

I found 2 options: ssh­pass” and expect”. I end­ed up using ssh­pass”, but I think using expect” would have also worked. If you want to check out the link that looked most promis­ing for how to use the expect approach, check it out here. Look for the 3rd answer on that page.

Below is the script I end­ed up with using ssh­pass”. I’ll describe it after the code block.

echo "* Logging into external big data SFTP server"

FILE=$DB_BACKUP/01/exporteddata-`date +%Y-%m-%d`.txt

sshpass -p 'mysftppassword' sftp -oBatchMode=no -b - -oStrictHostKeyChecking=no externalusername@ftp.bigdatacompany.com <<_EOF_
cd /incoming
put $FILE
bye
_EOF_

echo "* Logging off external bigdatacompany server"
echo "----------------------"
exit 0

Like before, the echo com­mands let me know what I was try­ing to exe­cute in the script.

I made a vari­able defin­ing the name of the file that was cre­at­ed in the pre­vi­ous sec­tion of code in a vari­able called FILE”. You can see where I use it lat­er in the script where you see $FILE.

In my exam­ple code, I’m show­ing that the big data com­pa­ny gave me an SFTP user­name of exter­naluser­name’ and a pass­word of mys­ftp­pass­word’. The server’s address is ftp​.big​dat​a​com​pa​ny​.com’. I used these val­ues in the ssh­pass’ command.

ssh­pass’ accepts the -p para­me­ter with your pass­word as a string fol­lowed by the actu­al sftp’ com­mand you would use. I have the “-oStrictHostKeyChecking=no” in there because I thought it would help me avoid error that warns against an unknown host. (It end­ed up work­ing for me when I ran the script direct­ly from the com­mand line, but it failed when I had it run in the cron job. I’ll show you how I dealt with that lat­er in this post.)

Once my script logged onto the exter­nal serv­er, I had to exe­cute the com­mands that would have done if I were doing it man­u­al­ly. First I changed into the direc­to­ry with the cd’ com­mand. Then I put’ the file there. Last­ly, I say bye’ to leave sftp.

I wrapped all of those com­mands in a here script” sec­tion. What’s here script? It a way of includ­ing a bunch of com­mands, a stream of them, in your script. In my exam­ple, the EOF marks the begin­ning and end of the here script. Here’s an excel­lent tuto­r­i­al that I learned about here scripts from myself.

The final exit 0” com­mand is there to tell your cron job that you com­plet­ed the script. It’s like say­ing The End”.

Know your crontab

The script may be over, but we’re not done yet because we haven’t talked about cron jobs yet.

All was going well with the script. I could run the com­mand from my com­mand line and it made the data export and copied it to the exter­nal serv­er. As men­tioned ear­li­er, I’d run the com­mand like this:

/home/myusername/bin/dailybackup

Now I need­ed to add that com­mand to my cron job list. You do this by using crontab -e’ com­mand to edit the cur­rent crontab using the default edi­tor which is nano in my case.

crontab -e

This showed my cur­rent crontab file for myuser­nam’.

I edit­ed the default crontab file to include the fol­low­ing lines:

SHELL=/bin/bash
PATH=/sbin:/bin/usr/bin
HOME=/
MAILTO='notifyme@domain.com'

These lines set the shell, path, and home for the crontab jobs. The MAIL­TO” line let me over­ride the email address I had set up as the admin user email address on my server.

I want­ed my dai­ly­back­up’ job to run every day at 12:01am so I added the fol­low­ing com­mand to the end of my crontab file.

1 0 * * * /home/myusername/bin/dailybackup

The eas­i­est way is to fig­ure out the right syn­tax for your crontab com­mand is to use the web-based Corntab util­i­ty. (Yes, it’s Corntab” not Crontab” because corn is fun­nier than cron.)

Of course I test­ed this ini­tial­ly by not hav­ing it run only at 12:01am every day. I test­ed it by hav­ing it run every 5 min­utes or so until I got the whole thing debugged properly.

Failed host error

I had an error that would appear only when I exe­cut­ed my dai­ly­back­up’ script via crontab. I could see the error in the email reports that the cron job was send­ing to the email address I’d spec­i­fied. The error was:

Failed to add the host to the list of known hosts (/home/myusername/.ssh/known_hosts). I already had an .ssh fold­er but the known_​hosts’ file wasn’t there and my cron job didn’t seem to be able to cre­ate it. I made the file man­u­al­ly instead and made sure myuser­name’ was the own­er of the file.

/home/myusername/.ssh/known_hosts
chown myusername:myusername /home/myusername/.ssh/known_hosts

I then logged into the exter­nal ftp serv­er man­u­al­ly again from with­in my Ubun­tu serv­er. I was asked if I want­ed to add this host to my known hosts file. I said yes’ and that solved the problem.

Now do this, computer!

After this dai­ly back­up was debugged and worked, I felt like I had a new pow­er that I hadn’t had before. I added oth­er jobs, like doing a SQL back­up of my data­base reg­u­lar­ly as well. I guess this is what Super­man feels like.