In this article we explain how to automate the backup of files on remote machines to a centralized server using rsync.
rsync is a command-line utility that synchronizes files between two filesystems, typically on two computers connected by a network. It was written as a replacement for rcp but with many new features. For example, it uses an algorithm that transfers only the files that have been modified, and only the parts of those files that differ. SSH will be used to authenticate between the machines and to encrypt the network traffic.
The situation: We have four machines named: server, machine1, machine2, and machine3. The server has a tape drive that is used to do nightly backups. machine1 is used as a development box and has files that need to be backed up in /src and in /home. machine2 is used for mail and needs /home and /mail to be backed up. machine3 is a web server and needs /home, /var/www, and /etc/httpd backed up.
Create a shell script for each machine. Simplify your maintenance by placing the scripts in a central location. I like to use /root/scripts. Decide on where you want to log your output. I like /root/logs but another common option is to have the script mail you the output.
Add entries to your crontab to call the scripts. Make sure you leave enough time for the rsync jobs to complete before the server's normal backup starts.
Each night the following will occur:
1. rsync machine1 -> Server
2. rsync machine2 -> Server
3. rsync machine3 -> Server
4. backup server to tape
Let's take a look at the flags used for rsync in the examples:
rsync -ave ssh --numeric-ids --delete machine1:/home /machine1
* -a:
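Archive mode: recurse into directories and preserve symbolic links, permissions, modification times, group, owner, and device files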
* -v:
Verbose output
* -e ssh:
Specify the remote shell as ssh
* --numeric-ids:
Tells rsync not to map user and group id numbers to local user and group names
* --delete:
Makes the server's copy an exact copy of the source by removing any files that have been removed on the remote machine
* machine1:/home:
The remote machine name, then the directory to be backed up
* /machine1:
The directory to place the backup
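To make the short flags above easier to read, the same command can be spelled with rsync's long option names. The sketch below is only an illustration and is not one of the original scripts: the function name preview_sync is made up here, and --dry-run is added so that rsync only reports what it would transfer or delete.

```shell
# preview_sync is a hypothetical helper name, not something rsync provides.
# Usage: preview_sync machine1 /home
preview_sync() {
    host=$1    # remote machine name, e.g. machine1
    dir=$2     # remote directory to back up, e.g. /home

    # --archive = -a, --verbose = -v, --rsh=ssh = -e ssh
    # --dry-run lists what would change without copying or deleting anything
    rsync --archive --verbose --rsh=ssh --numeric-ids --delete --dry-run \
        "$host:$dir" "/$host"
}
```

Running a preview like this before the first real run is a cheap way to confirm that --delete will not remove anything unexpected from the server.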
Next, generate a public/private key pair with ssh. Place the public key in the ~/.ssh/authorized_keys file of an account on machine1, machine2, and machine3 that has read access to the directories that need to be backed up. It is best not to use the root account on the remote machines, but you should evaluate the risk in your environment. Test that you can log in to these accounts using ssh without entering a password.
Test each of the rsync scripts. The first run of rsync will take the longest, as it needs to copy all the files from the remote machines and not just the files that have changed.
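As a rough sketch of the key setup, assuming OpenSSH on all machines: the key type, the key file name, and the helper name install_pubkey are assumptions made for this example, not something the steps above prescribe.

```shell
# Generating the key pair on the server is typically done once, by hand:
#   ssh-keygen -t ed25519 -f /root/.ssh/backup_key
# (key type and file name are assumptions; leave the passphrase empty if
# cron must use the key unattended)

# install_pubkey is a hypothetical helper, run on each remote machine as the
# backup account. It appends a public key to authorized_keys and tightens
# the permissions on the .ssh directory and the key list.
# Usage: install_pubkey /tmp/backup_key.pub "$HOME"
install_pubkey() {
    pubkey_file=$1
    home_dir=$2

    mkdir -p "$home_dir/.ssh"
    chmod 700 "$home_dir/.ssh"
    cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys"
    chmod 600 "$home_dir/.ssh/authorized_keys"
}
```

Once the key is installed, something like ssh -i /root/.ssh/backup_key machine1 from the server should log in without asking for a password.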
Add the /machine1, /machine2, and /machine3 directories (or whatever you have named them) to the server's backup script.
While this process does not back up the entire remote machine, it will ensure that you do not lose irreplaceable data.
Starting with the example scripts included in this tutorial, there are many changes that can be made to fit your specific circumstances.
The frequency of the rsyncs can be modified to occur more often or at different times. Simply by adding crontab lines, the backup from the remote machines could be done every day at lunch, multiple times a day, or even hourly.
The scripts could also be changed to rotate between multiple backups on the server or could be changed to do some sort of processing on the files before they are backed up. For example if the home directories you are backing up contain web browser caches, they could be removed after the rsync but before the system backup.
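The cache-removal idea could look something like the sketch below, run after the rsync scripts and before the tape backup. The helper name prune_caches and the directory name .cache are assumptions; use whatever names the browsers on your machines actually create.

```shell
# prune_caches is a hypothetical helper; ".cache" is an assumed directory
# name, so adjust it to match your users' home directories.
# Usage: prune_caches /machine1/home
prune_caches() {
    backup_root=$1

    # -prune stops find from descending into a directory it is about to delete
    find "$backup_root" -type d -name .cache -prune -exec rm -rf {} +
}
```

Because the next rsync run would copy the caches over again, rsync's --exclude option is an alternative that avoids transferring them in the first place.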
Using this article as a starting point, you should create a backup plan that fits your needs.
Example rsync script for machine1:
#!/bin/bash
rsync -ave ssh --numeric-ids --delete machine1:/home /machine1
rsync -ave ssh --numeric-ids --delete machine1:/src /machine1
Example rsync script for machine2:
#!/bin/bash
rsync -ave ssh --numeric-ids --delete machine2:/home /machine2
rsync -ave ssh --numeric-ids --delete machine2:/mail /machine2
Example rsync script for machine3:
#!/bin/bash
rsync -ave ssh --numeric-ids --delete machine3:/home /machine3
rsync -ave ssh --numeric-ids --delete machine3:/var/www /machine3
rsync -ave ssh --numeric-ids --delete machine3:/etc/httpd /machine3
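The three scripts differ only in the machine name and the list of directories, so they could be folded into one function. This is just a sketch of that idea: sync_machine is a made-up name, and the error reporting is an addition that the example scripts do not have.

```shell
# sync_machine is a hypothetical helper that generalises the three scripts.
# Usage: sync_machine machine3 /home /var/www /etc/httpd
sync_machine() {
    host=$1
    shift
    status=0

    for dir in "$@"; do
        # Same flags as the example scripts; each directory lands under /$host
        if ! rsync -ave ssh --numeric-ids --delete "$host:$dir" "/$host"; then
            echo "rsync of $host:$dir failed" >&2
            status=1
        fi
    done

    # Non-zero if any directory failed, so a calling script or cron can notice
    return $status
}
```

A failure in one directory does not stop the remaining directories from being synchronized; the function only reports the problem through its exit status.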
Example crontab file logging to a directory:
# Scripts to rsync machines
59 20 * * * /root/scripts/sync-machine1.sh >/root/logs/sync-machine1.log 2>&1
59 21 * * * /root/scripts/sync-machine2.sh >/root/logs/sync-machine2.log 2>&1
59 22 * * * /root/scripts/sync-machine3.sh >/root/logs/sync-machine3.log 2>&1
#
# Nightly Backup script
59 23 * * * /root/scripts/backup.sh > /root/logs/backup.log 2>&1
Example crontab file mailing the output:
# Scripts to rsync machines
59 20 * * * /root/scripts/sync-machine1.sh
59 21 * * * /root/scripts/sync-machine2.sh
59 22 * * * /root/scripts/sync-machine3.sh
#
# Nightly Backup script
59 23 * * * /root/scripts/backup.sh
-----------
This article originally appeared quite some time ago, but for some unknown reason it was lost from the indexes. I've just come back to update it with some new error observations. We now return you to your regularly scheduled read...
rsync is an amazing and powerful tool for moving files around. I know of people who use it for file transfers, for keeping DNS server records up to date, and, along with sshd, for remotely restarting services when rsync reports a file change (how they do that, I don't know; I'm just told they do it).
This article describes how you can use rsync to synchronize file trees. In this case, I'm using two websites to make sure one is a backup of the other. As an example, I'll be making sure that one box contains the same files as the other, in case I need to put the backup box into production should a failure occur.
Overview
rsync can be used in six different ways, as documented in man rsync. This article uses the fourth: copying from a remote rsync server to the local machine.
This was an easy port to install (aren't they all, for the most part?). Remember, I have the entire ports tree, so I installed it from there. If you don't have the ports tree installed, you have a bit more work to do. As far as I know, you need rsync installed on both client and server, although you do not need to be running rsyncd unless you are connecting via method 4.
Setting up the server
In this example, we're going to be using a remote rsync server (4). On the production web server, I created the /usr/local/etc/rsyncd.conf file. The contents are based on man rsyncd.conf:
uid = rsync
gid = rsync
use chroot = no
max connections = 4
syslog facility = local5
pid file = /var/run/rsyncd.pid
[www]
    path = /usr/local/websites/
    comment = all of the websites
You'll note that I'm running rsync as rsync:rsync. I added lines with vipw and to /etc/group to reflect the new user.
Then I started the rsync daemon, verified that it was running, and found its entry in /var/log/messages. I also verified that I could connect to the daemon on port 873, which I determined by looking at man rsyncd.conf. See the security section for more information.
You can also specify a login and user id. But if you do that, I suggest you make /usr/local/etc/rsyncd.conf non-world readable. The example I used is straight from the man page: add the authentication settings to the configuration file and list the users in /usr/local/etc/rsyncd.secrets. Don't forget to hide that file from the world as well.
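As an illustration of the secrets file, here is a small sketch that creates one with owner-only permissions. rsyncd expects one user:password pair per line, and the matching rsyncd.conf parameters are auth users and secrets file. The helper name write_secrets and the user name and password in the usage line are placeholders, not the values I used.

```shell
# write_secrets is a hypothetical helper. rsyncd reads one "user:password"
# pair per line; the user name and password passed in are placeholders.
# Usage: write_secrets /usr/local/etc/rsyncd.secrets backupuser 's3cret'
write_secrets() {
    secrets_file=$1
    user=$2
    password=$3

    # Create (or empty) the file and restrict it to its owner
    # before any secret is written
    ( umask 077 && : > "$secrets_file" )
    chmod 600 "$secrets_file"
    printf '%s:%s\n' "$user" "$password" >> "$secrets_file"
}
```

Note that this overwrites any existing secrets file, so it only suits a first-time setup.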
Setting up the client
You may have to install rsync on the client as well. There wasn't much to set up on the client; I merely issued a single rsync command. The rsync server in question is ducky. I connected to ducky, got the www collection, and put it all in /home/dan/test. And rsync took off! Note that I have not implemented any security here at all. See the security section for that.
I checked the output of my first rsync and decided I didn't want everything transferred, so I modified the command to exclude some files. See the man pages for more exclusion options.
I also wanted deleted server files to be deleted on the client, so I added the option for that. Of course, you can combine all of these arguments to suit your needs.
I found the --stats option interesting.
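A combined client command might look like the sketch below. The host, module, and destination come from the text above, but the -avz flags, the wrapper name pull_www, and the *.log exclusion pattern are assumptions chosen for illustration, not the exact command I ran.

```shell
# pull_www is a hypothetical wrapper; the "*.log" pattern is only an example
# of something you might not want transferred.
pull_www() {
    # host::module (two colons) talks to the rsync daemon on the server.
    # --exclude skips matching files, --delete removes local files that no
    # longer exist on the server, --stats prints a transfer summary.
    rsync -avz --exclude='*.log' --delete --stats ducky::www /home/dan/test
}
```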
Security
My transfers occur on a trusted network and I'm not worried about the contents of the transfer being observed. However, you can use ssh as the transfer medium. Note that the ssh form differs from the previous example in that the source has only one : (colon), not two. See man rsync for details. With ssh, we grab the contents of ~/www from host ducky using our existing user login, and the contents of the remote directory are synchronized with the local directory test.
With a login configured on the daemon, an rsync now asks for a password. On my first attempt, I supplied the wrong password and didn't specify the user ID. I suspected it used my login, and a check of the man page confirmed this. On my next attempt, I added the user name before the host, ducky. In this case, nothing was transferred, as I'd already done several successful rsyncs.
The next section deals with how to use a password in batch mode.
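For the ssh transport, a minimal sketch could look like this, assuming the remote directory is www under the login's home directory on ducky. The wrapper name pull_over_ssh and the -avz flags are assumptions for the example.

```shell
# pull_over_ssh is a hypothetical wrapper name.
pull_over_ssh() {
    # One colon means "use a remote shell" (here ssh) rather than the daemon.
    # The trailing slash on www/ copies the directory's contents into test/
    # instead of creating test/www.
    rsync -avz -e ssh ducky:www/ test/
}
```

With the daemon, the user name goes in front of the host instead, in the form user@ducky::www.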
Do it on a regular basis
There's no sense in having an rsync set up if you aren't going to use it on a regular basis. In order to use rsync from a cron job, you should supply the password in a non-world readable file. I put my password in /home/dan/test/rsync.password. Remember to chmod 640 that password file!
I put the command into a script file (rsync.sh). Remember to chmod 740 the script file!
Then I put an entry into /etc/crontab to run this script every hour (the entry should be all on one line). Cron will mail you a copy of the output.
If you want to use ssh as your transport medium, I suggest using the authorized_keys feature.
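Here is a sketch of what such a script and crontab entry could look like. It relies on rsync's --password-file option; the rsync user name dan, the -avz flags, the function name run_hourly_sync, and the script location in the crontab comment are all assumptions for this example.

```shell
# Sketch of rsync.sh. "dan" as the rsync user name is an assumption.
run_hourly_sync() {
    # --password-file makes rsync read the daemon password from a file
    # instead of prompting, which is what a cron job needs
    rsync -avz --password-file=/home/dan/test/rsync.password \
        dan@ducky::www /home/dan/test
}

# The real script would end by calling the function:
#   run_hourly_sync
#
# Possible /etc/crontab entry, all on one line (the script path is assumed):
#   0 * * * * dan /home/dan/rsync.sh
```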
My comments
I think rsync is one of the most powerful tools I've seen for transferring files around a network and the Internet. It is just so powerful! Although I actually use cvsup to publish the Diary, I am still impressed with rsync.
I was recently adding some new files to my rsync tree and found some errors. It took me a while to understand the problem. It's a read issue: rsyncd didn't have permission to read the files in question. You can either make rsyncd run as a different user, or change the permissions on the files.
If you get the user id for rsync wrong, you'll also see an error. I had the rsync user misspelt as rysnc.
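One way to spot this kind of read problem before rsyncd trips over it is to list the files the daemon's user probably cannot read. This sketch assumes the rsync user reaches the files through the "other" permission bits, meaning it neither owns them nor belongs to their group. The helper name list_unreadable is made up.

```shell
# list_unreadable is a hypothetical helper. It assumes the rsync user is
# neither the owner nor in the group of the files it serves.
# Usage: list_unreadable /usr/local/websites
list_unreadable() {
    tree=$1

    # Files lacking other-read, and directories lacking
    # other-read or other-execute
    find "$tree" \( -type f ! -perm -004 \) -o \( -type d ! -perm -005 \)
}
```

Anything it prints is a candidate for a chmod, or a reason to run rsyncd as a different user.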