BASH Programming

How to use inotify and rsync to create a live backup system

Why should you use Bash Scripts to perform folder synchronizations and backups?

Bash is by far the most popular and widely used sh-compatible command language interpreter. Today you can find Bash almost everywhere, including Microsoft Windows with the new Windows Subsystem for Linux. Virtually every GNU/Linux distribution comes with Bash as the default shell. The same applies to macOS and some other Unix-like operating systems.

Bash is not only a command language; like other Unix shells, Bash is both a programming language and a command interpreter. Technically speaking, the programming side of a shell gives the user the ability to combine system or shell utilities in a file. The user can create new commands just by combining existing commands in a text file; these special text files that hold a collection of commands are called shell scripts and, once those files receive permission to execute, the shell interpreter sees each of them as a single command.
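
For instance, a minimal shell script (a hypothetical file named hello.bash) is nothing more than a plain text file that combines a couple of existing commands:

#!/usr/bin/bash
# A minimal shell script: a plain text file combining two existing commands
date
echo "Hello from a shell script"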

The advantage of a Bash script is that you can use command-line tools directly inside it without needing to import or source external libraries. These command-line tools and built-in utilities are powerful and can interact directly with the operating system without compilation or additional interpreters; core utilities and command-line interfaces, like awk, xargs, find, and grep, can often perform much better than, for example, Python scripts and their libraries. It is not difficult to find people doing advanced data analysis using nothing but Bash scripts and GNU built-in utilities. Others claim that this kind of approach can be 235x faster than a Hadoop cluster – which is not so difficult to believe considering some clustering monstrosities you can find nowadays just to suit bad software designs.
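
As a small illustration of that claim, a one-liner sketch like the following (input.txt is only a placeholder file) counts word frequencies using nothing but core utilities:

# Count word frequencies in a text file using only core utilities
$ grep -oE '\w+' input.txt | sort | uniq -c | sort -rn | head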

One question always arises here: if Bash is so powerful, why not use it to automate all the boring stuff? Bash syntax is simple and pragmatic: it gives you the ability to combine programs to automate common tasks. However, when a script needs to deal with many conditions or accumulates too many responsibilities, it is time to consider a more robust programming language, like C or a scripting language such as Python or Perl.

On the other hand, Bash scripts are very good for single tasks like the one in this article: combining utilities that can check for changes in a specific folder and then synchronize its files. A Bash script can suit this task perfectly.

What do you need to perform synchronization or autobackups?

There is a long list of methods to synchronize folders and files. The number of applications that can be used to accomplish this simple task is vast, and some of them are third-party solutions. However, this article shows a more elegant way to accomplish the same thing using only inotifywait and rsync in a Bash script. In general, this solution is light, inexpensive and, arguably, safer. In essence, only inotify-tools, rsync, and a while loop are required to complete this mission.

How to use inotifywait for autobackups and synchronizations?

inotifywait uses the inotify API to wait for changes to files. This command was specially designed to be used in shell scripts. A powerful characteristic of inotifywait is that it blocks and watches for changes; as soon as new events occur, inotifywait prints the modifications and exits.

inotifywait provides two options that are very interesting for folder synchronization or real-time backups. The first one is the -r, --recursive option; as the name implies, this flag watches the directory passed as an argument to inotifywait through unlimited subdirectory depths, excluding symbolic links.

The -e, --event flag provides another interesting feature. This option requires a list of predefined events. The inotify-tools documentation lists more than 15 events for inotifywait, but a simple backup and synchronization system requires only the delete, modify, and create events.
The following command is a good example of a real-world scenario:

 $ inotifywait -r -e modify,create,delete /home/userDir/Documents

In this case, the command waits for changes – modifications, or file and folder creations or deletions of any kind – in the fictitious /home/userDir/Documents directory. As soon as the user makes any change, inotifywait outputs the modification and exits.

Suppose you create a new file called newFile inside the Documents folder while inotifywait is monitoring it. Once the command detects the file creation, it outputs:

Documents/ CREATE newFile

In other words, inotifywait prints where the modification occurred, what type of change was made, and the name of the file or folder that changed.

If you examine the exit status of inotifywait after a change takes place, you see an exit status of 0, which means successful execution. This situation is perfect for a shell script because an exit status can be used as a true or false condition.
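
For example, you can inspect the exit status with the special parameter $? right after the command returns:

 $ inotifywait -r -e modify,create,delete /home/userDir/Documents
Documents/ CREATE newFile
 $ echo $?
0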

Consequently, the first step of the script is complete: to find a utility that waits for changes in directories. The second one is to search for a utility able to synchronize two directories, and rsync is a perfect candidate.

How to use Rsync for autobackups?

rsync is a powerful application. You could write a book describing everything you can do with this versatile utility. Technically speaking, rsync is nothing more than a file-copying tool, a kind of cp command on steroids with special powers such as secure file transfers. The use of rsync in this script is more modest but no less elegant.

The main goal is to find a way to:

  • Recurse into directories;
  • Copy symbolic links as symbolic links;
  • Preserve permissions, ownership, groups, modification time, devices and special files;
  • Provide additional details (verbose output), so it is possible to create a log file if needed;
  • Compress files during the transfer for optimization.

The rsync documentation is well written; checking the summary of the options available, you can easily select the -avz flags as the best choice. A simple usage looks as follows:

rsync -avz <origin folder>/ <destination folder>

It is important to put a slash after the origin folder; otherwise, rsync copies the entire origin folder (including the folder itself, not only its contents) into the destination folder.
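
A quick sketch of the difference, using the same placeholder names as above:

# With the trailing slash: copies only the contents of the origin folder into the destination folder
$ rsync -avz <origin folder>/ <destination folder>
# Without the slash: creates <destination folder>/<origin folder> and copies the files inside it
$ rsync -avz <origin folder> <destination folder>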

For example, if you create two folders, one called originFolder and the other destinationFolder, you can make rsync send every change made in the first one to the second with the following command:

$ rsync -avz originFolder/ destinationFolder

Once you create a new file named newFile, rsync prints something like:

Sending incremental file list
./
newFile

sent 101 bytes  received 38 bytes  278.00 bytes/sec
total size is 0  speedup is 0.00

In the first line, the command prints the type of transfer, an incremental copy; this means that rsync uses its delta-transfer capabilities to send only the changes instead of the whole file. Since this is the first time the command is executed, the application copies the entire file; once new changes occur, only the increments are transferred. The rest of the output shows the location, the file name, and performance data. Checking the exit status of the rsync command, you again receive a 0 exit status for successful execution.

So, there are two important applications to support this script: one is capable of waiting for changes, and the other can copy those modifications in real time. What is missing is a way to connect both utilities so that rsync takes action as soon as inotifywait perceives any alteration.

Why do we need a while loop?

The simplest solution for the problem above is a while loop. In other words, every time inotifywait exits successfully, the Bash script needs to call rsync to perform its incremental copy; immediately after the copy occurs, the shell needs to go back to the initial state and wait for inotifywait to exit again. That is exactly what a while loop does.

You do not need an extensive background in programming to write a Bash script. It is very common to find good system administrators who have little or no programming experience. However, creating functional scripts is always an important part of system administration. The good news is that the concept behind a while loop is easy to understand.

The following diagram represents a while loop:

[Figure: Infinite while loop diagram.]

A represents the inotifywait command discussed above and B represents rsync. Every time A exits with a 0 status, the shell interprets it as true; thus, the while loop permits the execution of B; as soon as B also exits successfully, control returns to A again and the loop repeats.

In this case, the while loop always evaluates A as true. Technically, this generates an infinite loop, which is good for the purpose of this script; inotifywait will be executed again and again, meaning it will always be waiting for new modifications.

More formally, the syntax for a bash while loop is:

while <list of conditions>
do
<list of commands>
done

<list of conditions> stands for the list of conditions (A) that need to be true so that the while loop can execute the <list of commands>, which stands for the block of commands (B). If the pre-test condition A is false, the while loop exits without executing B.
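
As a minimal sketch of this syntax, unrelated to the backup itself, the condition is simply a command whose exit status is tested before each iteration:

counter=0
# The test command [ ] exits with 0 (true) while counter is less than 3
while [ "$counter" -lt 3 ]
do
echo "iteration $counter"
counter=$((counter + 1))
done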

Here is how the rsync and inotifywait commands fit inside the while loop:

while inotifywait -r -e modify,create,delete originFolder
do
rsync -avz originFolder/ destinationFolder
done

Combining everything

Now it is time to combine everything discussed above into a shell script. The first thing to do is to create an empty file and name it; as an example, liveBackup.bash is a good choice. It is good practice to place shell scripts in the bin folder under the user's home directory, a.k.a. $HOME/bin.
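
For example, you could create the empty file like this:

 $ mkdir -p $HOME/bin
 $ touch $HOME/bin/liveBackup.bash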

After that, you can edit the file in the text editor of your choice. The first line of a Bash script is very important; this is where the script defines the interpreter directive, for example:

#!<interpreter directive> <commands> [options]

The shebang is the strange symbol made of a hash and an exclamation mark (#!). When the shell loads the script for the first time, it looks for this sign, since it indicates which interpreter needs to be used to run the program. The shebang is not a comment, and it needs to be placed at the top of the script with nothing above it.

You can leave the first line empty and not define the interpreter. In that case, the shell uses its default interpreter to load and execute the script, but this is not recommended. The most appropriate and secure choice is to specify the interpreter directive as follows:

#!/usr/bin/bash

With the interpreter directive made explicit like that, the shell looks for the bash interpreter under the /usr/bin directory. As the task of this script is simple, there is no need to specify more commands or options. A more sophisticated possibility is to call the interpreter through the env command:

#!/usr/bin/env bash

In this context, the shell searches for the default bash command in the current environment. Such an arrangement is useful when the user's environment has important customizations. However, it can lead to security issues at the enterprise level, since the shell cannot verify whether the bash command found in a customized environment is a safe interpreter.

Putting everything together at this point, the script looks like this:

#!/usr/bin/bash

while inotifywait -r -e modify,create,delete originFolder
do
rsync -avz originFolder/ destinationFolder
done

How to use arguments in a Bash Script?

The only thing keeping this script from being fully functional is how it defines the origin and destination folders. It is necessary to find a way to tell the script which folders to use. The fastest way to solve that problem is to use arguments and variables.

Here is an example of the correct way to invoke the script:

$ ./liveBackup.bash /home/user/origin /home/user/destination

The shell takes the arguments typed after the script name and passes them to the script as variables. For example, the directory /home/user/origin is the first argument, and you can access it inside the script using $1. Thus, $2 holds the value /home/user/destination. All these positional variables can be accessed using the dollar sign ($) followed by a number ($n), where n is the position of the argument on the command line.
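
A minimal sketch (a hypothetical script called showArgs.bash) makes this visible:

#!/usr/bin/bash
# Print the positional parameters passed to the script
echo "Script name:         $0"
echo "First argument:      $1"
echo "Second argument:     $2"
echo "Number of arguments: $#"

Calling ./showArgs.bash /home/user/origin /home/user/destination prints each of these values on its own line.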

The dollar sign ($) has a very special meaning and implications inside shell scripts; it will be discussed in depth in other articles. For now, the puzzle is almost solved.

#!/usr/bin/bash

while inotifywait -r -e modify,create,delete "$1"
do
rsync -avz "$1"/ "$2"
done

Note: dealing with too many arguments using only positional parameters ($n) can quickly lead to bad designs and confusion in shell scripts. A more elegant way to solve that problem is to use the getopts command. This command also helps you create usage alerts, which can be useful when other users have access to the script. A quick search on the internet shows different methods of using getopts, which can improve the current script if you need to give more usage options to other users.
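
As a minimal sketch of one possible getopts layout (with hypothetical -s and -d options, different from the author's script shown later), it could look like this:

# -s sets the origin folder, -d the destination folder
while getopts ":s:d:" option
do
    case ${option} in
        s ) origin=$OPTARG ;;
        d ) destination=$OPTARG ;;
        : ) echo "Option -$OPTARG requires a directory as argument" 1>&2; exit 1 ;;
        \? ) echo "Usage: $(basename "$0") -s <origin> -d <destination>" 1>&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))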

Making it executable

Only one more thing needs to be done now: make the file liveBackup.bash executable. This can easily be done with the chmod command.

Go to the folder containing the script and type:

 $ chmod +x liveBackup.bash

Then, type the dot-slash sign (./) before the script name. In this context, the dot means the current directory and the slash defines a relative path to a file in the current directory. With this in mind, you also need to type the origin folder as the first argument, followed by the destination folder as the second, like this:

 $ ./liveBackup.bash /home/user/origin /home/user/destination

Alternatively, you can call the script by its name by placing its folder in the PATH environment variable, or you can call it in a subshell, like:

 $ bash liveBackup.bash /home/user/origin /home/user/destination

The first option is the more secure choice, though.
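
For instance, assuming the script lives in $HOME/bin as suggested earlier, a sketch of the PATH approach (for example in ~/.bashrc) could look like this:

export PATH="$HOME/bin:$PATH"
liveBackup.bash /home/user/origin /home/user/destination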

Real Life example

In a real-world scenario, manually running a backup script each time you boot the system can be tedious. A good choice is to use a cron job or systemd timer/service units. If you have many different folders to back up, you can also create another script that sources liveBackup.bash; that way, the command needs to be called only once in a .service unit. This feature can be discussed in more detail in another article.
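
As a minimal sketch of the cron approach (the paths are only placeholders), a single @reboot entry added with crontab -e starts the script at every boot:

@reboot /home/user/bin/liveBackup.bash /home/user/origin /home/user/destination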

If you are using the Windows Subsystem for Linux, it is possible to create a basic task in the “Task Scheduler” that runs your script at system startup. Using a batch file to call bash.exe with a list of commands is a good choice. You can also use a Visual Basic script to launch the batch file in the background.

What a pro bash script looks like

Here is an example of a script designed by the author that can read more sophisticated command-line arguments.

<pre>#!/usr/bin/env bash
#
#########################################################################################
#########################################################################################
#
# SCRIPT:       syncFolder.bash
# AUTHOR:       Diego Aurino da Silva
# DATE:         February 16, 2018
# REV:          1.0
# LICENSE:      MIT (https://github.com/diegoaurino/bashScripts/blob/master/LICENSE)
#
# PLATFORM:     WSL or GNU/Linux
#
# PURPOSE:      small script to sync left-to-right changes from two folders
#               under WSL or GNU/Linux (requires inotify-tools)
#
#########################################################################################
#########################################################################################

##################
# GENERAL SETTINGS
##################

bold=$(tput bold)
normal=$(tput sgr0)

origen=""
destination=""

##################
# OPTIONS SECTION
##################

if [ $# -eq 0 ]
then
    printf "\n%s\t\t%s\n\n" "Use ${bold}-h${normal} for help."
     exit 1
else
    while getopts ":h" option
    do
        case ${option} in
            h )
                printf "\n%s\t\t%s\n\n" "Usage:   ./syncFolder.bash ${bold}/origen/folder${normal} -o ${bold}/destination/folder${normal}"            
                exit 0
            ;;
            \? )
                printf "\n%s\n\n" "${bold}Invalid Option for${normal} $(basename $0)" 1>&2
                exit 1
            ;;
        esac    
    done
    shift $((OPTIND -1))
    origen=$1
    shift
    while getopts ":o:" option
    do
        case ${option} in  
            o )
                destination=$OPTARG
                printf "\n%s\n\n" "The following folders will be left-right synced:"
                printf "\tOrigen:\t\t\t%s\n" "${bold}$origen${normal}"
                printf "\tDestination:\t\t%s\n\n" "${bold}$destination${normal}"
            ;;
            \? )
                printf "\n%s\n\n" "${bold}Invalid Option for${normal} $(basename $0): -$OPTARG." 1>&2
                exit 1
            ;;
            : )
                printf "\n%s\n\n" "${bold}The option${normal} -$OPTARG requires a directory as argument." 1>&2
                exit 1
            ;;
            * )
                printf "\n%s\n\n" "${bold}Unknown option for${normal} $(basename $0): -$OPTARG." 1>&2
                exit 1
            ;;
        esac    
    done
shift $((OPTIND -1))
fi

##################
# SYNC SECTION
##################

while inotifywait -r -e modify,create,delete "$origen"
do
    rsync -avz "$origen"/ "$destination" --delete --filter='P .git'
done</pre>
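
Based on the usage message built into the script, a typical invocation (with placeholder paths) looks like this:

 $ ./syncFolder.bash /home/user/origenFolder -o /home/user/destinationFolder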

Challenges

As a challenge, try to design two more versions of the current script. The first one needs to write a log file that stores every change found by the inotifywait command and every incremental copy done by rsync. The second challenge is to create a two-way synchronization system using only a while loop, as in the previous script. A piece of advice: it is easier than it appears.

You can share your findings or questions on Twitter @linuxhint.

About the author

Diego Aurino

Diego Aurino is a versatile Linux server administrator, webmaster, author, and technical writer. He has over 14 years of information technology and creative experience; his first contact with GNU/Linux was 13 years ago. He also holds a degree in Physics and a master's degree in Scientific and Technological Education (with honors), both from the Federal University of Santa Catarina. You can find him at diegoaurino.info