The revenge of rsync

This is a repost of a text first published in June 2013 on my previous blog, erroneousthoughts.org (now decommissioned). It illustrates how just one tiny character can make a world of difference… #rsync #source-folder #forward-slash

Just the day after I wrote about rsync, I made one of my biggest programming blunders, and precisely while using rsync.

I use a simple Python script to automate my backups. One of the things it does is synchronise the contents of several large folders with their counterparts on an external hard drive. For instance, say you have five big folders, a, b, c, d and e, on your local machine, that you need to keep synchronised copies of. The way I use the script to solve this problem is to run a command of the form

$ rsync <options> <local_folder> <external_hdd>/backups

for each local folder to be backed up. The way this is supposed to work is that rsync will create folders a, b, etc. inside the backups folder on the external hard drive. In addition, one of the options to rsync I was using was --delete-during, which deletes from the destination folder any files that are not found in the source folder. This is to avoid keeping backups of files I had genuinely deleted.
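To make the setup concrete, here is a minimal sketch of what such a backup loop might look like. It is not the original script: the folder list, the mount point /mnt/external_hdd/backups, the -a option and the function names are all assumptions chosen for illustration. Only --delete-during comes from the text.

```python
import subprocess

# Hypothetical values: the original script's folder list, options and
# mount point are not shown in the post.
FOLDERS = ["a", "b", "c", "d", "e"]
DESTINATION = "/mnt/external_hdd/backups"
OPTIONS = ["-a", "--delete-during"]  # -a is an assumption; --delete-during is from the post


def build_rsync_command(folder, destination=DESTINATION, options=OPTIONS):
    """Build the argument list for one rsync invocation."""
    return ["rsync", *options, folder, destination]


def run_backups(folders=FOLDERS, dry_run=True):
    """Run (or, with dry_run=True, only print) one rsync command per folder."""
    for folder in folders:
        command = build_rsync_command(folder)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if rsync exits with an error
            subprocess.run(command, check=True)
```

Calling run_backups() with the default dry_run=True only prints the five commands, which is a cheap way to inspect them before letting rsync touch anything.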

So far so good: everything works as expected. But now suppose that you make a small mistake and, in the list of folders to be backed up, you write d/ instead of d. Seems fairly innocuous, right? Wrong! If you go back to the fine manual, you’ll see that

$ rsync <options> d <external_hdd>/backups

works as you’d expect—it creates a folder backups/d in the external hard drive if one does not already exist, and keeps it synchronised with the local one. But

$ rsync <options> d/ <external_hdd>/backups

does not work as you expect, because it copies the contents of the d folder directly into the backups folder, regardless of whether the latter contains a subfolder named d.
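The difference between the two commands can be modelled in a few lines of Python. This is only a toy model of the trailing-slash rule described above, not anything rsync exposes; the function name effective_target and the path /mnt/external_hdd/backups are made up for the example.

```python
import posixpath


def effective_target(source, destination):
    """Return the folder that rsync will keep in sync with `source`.

    Toy model of the rule above: a trailing slash on the source means
    "the contents of this folder", so they land directly in the
    destination; without it, the folder itself is created inside the
    destination.
    """
    if source.endswith("/"):
        return destination
    return posixpath.join(destination, posixpath.basename(source))


print(effective_target("d", "/mnt/external_hdd/backups"))
print(effective_target("d/", "/mnt/external_hdd/backups"))
```

The first call gives the backups/d subfolder, the second gives the backups folder itself. Whatever folder this returns is also the one that a delete option will prune.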

But it gets worse. How? Remember that one of the options to rsync is --delete-during, which deletes from the destination folder anything that is not present in the source folder. Using d/ as the source in the rsync command means that the destination folder is no longer backups/d, but rather backups itself, so anything in there that is not also inside the source folder d gets deleted. Like, for instance, all the folders a to c backed up previously. Or anything else in the backups folder that wasn’t also in the d folder. And thus I lost backups dating back almost a year. The only reason this was not an irreparable mistake is that I, being the paranoid nut that I am, had the really important stuff (think password files and encryption keys) backed up in another location.

Even so, for someone who’s been doing this for years, to make such a blunder… it is, to put it gently, a humbling experience. The solution, of course, is to make sure, before running the rsync command, that the source folder does not end with a slash. Like so much else in programming, it’s obvious in hindsight…
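One way to enforce that check in a Python backup script is to clean up every source path before it goes anywhere near the rsync command line. The sketch below is one possible guard, not the fix from the original script; normalise_source is a hypothetical helper name.

```python
def normalise_source(path):
    """Strip trailing slashes so rsync copies the folder itself, not its contents.

    Raises ValueError for an empty path or the root folder, since stripping
    the slash there would leave nothing meaningful to back up.
    """
    stripped = path.rstrip("/")
    if not stripped:
        raise ValueError(f"refusing to use {path!r} as a backup source")
    return stripped


# Hypothetical folder list containing the offending entry
folders = ["a", "b", "c", "d/", "e"]
print([normalise_source(folder) for folder in folders])
```

Doing this once, at the point where the folder list is read, means a stray slash in the list can no longer change which destination folder gets pruned.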

March 19, 2024.