The dangers of abstraction
This is a repost of a text first published in June 2013 on my previous blog, erroneousthoughts.org (now decommissioned). It offers a good illustration of what can lurk hidden under a software abstraction layer…
Say you have a folder xpto/ on machine A, and it contains about 1 GB of data. You mostly work on these files on machine A, but you want to keep a backup on machine B, which is remotely accessible over SSH. So naturally, you copy the folder from A to B, using a tool like scp, for example. After a while, you will have done some work, thus modifying the files on A, and now you want to sync them with their counterparts on B. How to proceed?
Never fear, rsync comes to the rescue! But because you’re in a hurry (who isn’t these days, right?) you just follow the first quick and dirty approach that comes to mind: mount the folder that sits on B remotely on A, using the great sshfs program, thus making the xpto folder on B appear to be a local folder on A. Of course, appear is the key term here, because when you then run rsync as if that were in fact a local folder, the modification times of all files are going to be mismatched, even for files that had not been changed since being copied to B, because the clocks of the two machines were slightly off (by a few seconds).
Now, for efficiency reasons, rsync doesn’t just blindly checksum every file to see if it has changed: rather, it assumes that if two files have matching sizes and modification times, they are unchanged. Otherwise (with the options I was using) it computes checksums. In the scenario above, this meant computing checksums of all files! But it got even worse: remember I was running rsync as if both folders were local. This meant that, to compute the checksum of a remote file, sshfs first had to copy its contents to the local machine (A), and only then could the checksum be computed! In other words, it had to copy all the files! Not cool.
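To make the size-and-time shortcut concrete, here is a minimal Python sketch of that kind of check. It is not rsync's actual implementation: the function names quick_check_unchanged and checksum are hypothetical, and the hash function is an arbitrary choice. The modify_window parameter loosely mirrors rsync's --modify-window option, which tolerates small differences in modification times.

```python
import hashlib
import os
import tempfile


def quick_check_unchanged(src, dst, modify_window=0):
    """Hypothetical helper: treat two files as unchanged when their sizes
    match and their modification times differ by at most modify_window
    seconds. Only metadata is read, never the file contents."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and abs(s.st_mtime - d.st_mtime) <= modify_window


def checksum(path, chunk_size=65536):
    """Hypothetical helper: hash a file and report how many bytes were read.
    Every byte has to be read, which on an sshfs mount means every byte
    has to cross the network first."""
    digest = hashlib.sha256()  # arbitrary choice of hash for this sketch
    bytes_read = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(b"same contents\n")
        # Simulate the copy on B carrying a timestamp a few seconds later.
        os.utime(a, (1_370_000_000, 1_370_000_000))
        os.utime(b, (1_370_000_003, 1_370_000_003))
        strict = quick_check_unchanged(a, b)
        tolerant = quick_check_unchanged(a, b, modify_window=5)
        same_hash = checksum(a)[0] == checksum(b)[0]
        return strict, tolerant, same_hash


print(demo())
```

With a three-second difference, the strict check reports the identical files as changed, while a five-second window accepts them. The checksum agrees that the contents match, but only after reading both files in full.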
The solution, of course, is to use rsync over ssh directly (instead of through sshfs). We want to push changes to the remote location, so according to the fine manual (page), the command is: $ rsync [OPTION…] SRC… [USER@]HOST:DEST.
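As an illustration of how that synopsis might be filled in, the sketch below assembles a push command as an argument list. The helper rsync_push_command, the user name me and the host name B are assumptions made up for this example. The -a (archive, which preserves modification times) and -v (verbose) options are standard rsync options, though not necessarily the ones I used at the time.

```python
import shlex


def rsync_push_command(src, host, dest, user=None, options=("-a",)):
    """Hypothetical helper: build the argument list for pushing src to
    host:dest, following the form rsync [OPTION] SRC [USER@]HOST:DEST."""
    remote = f"{user}@{host}:{dest}" if user else f"{host}:{dest}"
    return ["rsync", *options, src, remote]


# The trailing slash on the source asks rsync to copy the contents of
# xpto/ rather than the directory itself.
cmd = rsync_push_command("xpto/", "B", "xpto/", user="me", options=("-a", "-v"))
print(shlex.join(cmd))  # rsync -a -v xpto/ me@B:xpto/
```

The resulting list can be handed to subprocess.run(cmd, check=True) on machine A. Because the remote path contains a host name, rsync talks to a second rsync process on B instead of reading B's files through a mounted filesystem.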
This way, even if checksums have to be computed, each machine can compute them on its own copy of the files, without having to transfer any file beforehand. Abstractions are sometimes necessary… but beware nonetheless: for the devil is truly in the (abstracted!) details.
Debunked with the excellent help of the Arch Community!
March 19, 2024.