The dangers of abstraction

This is a repost of a text first published in June 2013 on my previous blog, erroneousthoughts.org (now decommissioned). It offers a good illustration of what can lurk beneath a layer of software abstraction… #rsync #sshfs #ssh

Say you have a folder xpto/ on machine A containing about 1 GB of data. You mostly work on these files on machine A, but you want to keep a backup on machine B, which you can reach remotely over SSH. So naturally, you copy the folder from A to B with a tool like scp. After a while you have done some work, modifying the files on A, and now you want to sync them with their counterparts on B. How should you proceed?

Never fear, rsync comes to the rescue! But because you’re in a hurry (who isn’t these days, right?), you just follow the first quick-and-dirty approach that comes to mind: mount the folder on B remotely on A using the great sshfs program, which makes the xpto folder on B appear to be a local folder on A. Of course, “appear” is the key term here: when you then run rsync as if that were really a local folder, the modification times of all the files turn out to be mismatched, because the two machines’ clocks differed slightly (by a few seconds). This was true even for files that had not changed since being copied to B.

Now, for efficiency reasons, rsync doesn’t just blindly checksum every file to see whether it has changed: instead, it assumes that files with matching sizes and modification times are unchanged. Otherwise (with the options I was using) it computes checksums. In the scenario above, this meant computing checksums of all the files! But it got even worse: remember that I was running rsync as if both folders were local. That meant that, to compute the checksum of a remote file, sshfs first had to copy it to the local machine (A), and only then could the checksum be computed. In other words, it had to copy all the files! Not cool.
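The skip rule described above can be modelled in a few lines of Python. This is a simplified sketch, not rsync’s actual code: FileInfo, quick_check_unchanged, files_needing_checksum, the file names, sizes, timestamps and the 3-second clock offset are all hypothetical, and modification times are compared with plain equality. It shows how a small clock difference defeats the quick check for every file, even the unchanged ones.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int     # file size in bytes
    mtime: float  # modification time, seconds since the epoch


def quick_check_unchanged(src: FileInfo, dst: FileInfo) -> bool:
    """Simplified rsync-style quick check: equal size and equal mtime
    are taken to mean the file has not changed."""
    return src.size == dst.size and src.mtime == dst.mtime


def files_needing_checksum(local: dict[str, FileInfo],
                           remote: dict[str, FileInfo]) -> list[str]:
    """Names of local files that the quick check cannot skip."""
    return sorted(
        name
        for name, info in local.items()
        if name not in remote or not quick_check_unchanged(info, remote[name])
    )


# Hypothetical contents of xpto/ on machine A (sizes and times are made up).
local = {
    "a.dat": FileInfo(size=400_000_000, mtime=1_370_000_000.0),
    "b.dat": FileInfo(size=600_000_000, mtime=1_370_000_100.0),
}

# The same, unchanged files on machine B, but stamped by a clock a few seconds off.
CLOCK_SKEW_S = 3.0  # hypothetical clock difference between A and B, in seconds
remote = {name: FileInfo(info.size, info.mtime + CLOCK_SKEW_S)
          for name, info in local.items()}

print(files_needing_checksum(local, remote))  # ['a.dat', 'b.dat']
print(files_needing_checksum(local, local))   # []
```

With identical timestamps nothing needs checking; with a 3-second offset every file does, which is exactly the situation in which the expensive checksum step kicks in.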

The solution, of course, is to use rsync over SSH directly instead of through sshfs. We want to push changes to the remote location, so according to the fine manual (page), the command is:

$ rsync [OPTION…] SRC… [USER@]HOST:DEST
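As a minimal sketch, the push form above can be assembled from Python. The helper names build_push_command and push, the user “me”, the host “machineB” and the remote path “backups/xpto/” are all hypothetical, and choosing -a (archive mode) is an assumption. -a includes -t, so rsync sets the modification times on B to match A’s, which lets the quick check skip unchanged files on later runs; the trailing slash on xpto/ copies the folder’s contents rather than the folder itself. Nothing is executed unless push() is called.

```python
from __future__ import annotations

import shlex
import subprocess
from typing import Optional, Sequence


def build_push_command(src: str, host: str, dest: str,
                       user: Optional[str] = None,
                       options: Sequence[str] = ("-a",)) -> list[str]:
    """Argument list for the push form: rsync [OPTION...] SRC... [USER@]HOST:DEST."""
    target = f"{user}@{host}:{dest}" if user else f"{host}:{dest}"
    return ["rsync", *options, src, target]


def push(src: str, host: str, dest: str, user: Optional[str] = None) -> None:
    """Run the push; raises subprocess.CalledProcessError if rsync fails."""
    subprocess.run(build_push_command(src, host, dest, user), check=True)


# Hypothetical user, host and remote path; this only builds and prints the command.
cmd = build_push_command("xpto/", "machineB", "backups/xpto/", user="me")
print(shlex.join(cmd))  # rsync -a xpto/ me@machineB:backups/xpto/
```

Passing a list to subprocess.run avoids shell quoting issues; shlex.join is used only to show the equivalent command line.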

This way, even if checksums have to be computed, each machine computes them on its own files, without first transferring any of them over the network. Abstractions are sometimes necessary… but beware nonetheless, for the devil is truly in the (abstracted!) details.
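A back-of-the-envelope model makes the difference concrete. It is a deliberate simplification that counts only the checksum comparison itself: it ignores the file list, protocol overhead and the data that still has to be sent for files that really changed. DIGEST_BYTES = 16 and the 1,000 files of 1 MB each are assumptions chosen to match the roughly 1 GB folder of the example.

```python
from __future__ import annotations

DIGEST_BYTES = 16  # assumed size of one per-file checksum (a 128-bit digest)


def bytes_over_network_sshfs(sizes: list[int]) -> int:
    """rsync on A reads each remote file through the sshfs mount to hash it,
    so every byte of every file crosses the network."""
    return sum(sizes)


def bytes_over_network_ssh(sizes: list[int]) -> int:
    """rsync on B hashes its own files; only one digest per file crosses the network."""
    return DIGEST_BYTES * len(sizes)


# Hypothetical xpto/: 1,000 files of 1 MB each, about 1 GB in total.
sizes = [1_000_000] * 1_000

print(bytes_over_network_sshfs(sizes))  # 1000000000
print(bytes_over_network_ssh(sizes))    # 16000
```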

Debunked with the excellent help of the Arch Community!

March 19, 2024.