Let's say that in that 10GB directory tree, a few files change during the day, resulting in 30MB of total changes. When your nightly rsync runs, it will create hard links to the unchanged files in the previous backup and will copy over only the 30MB of files that have changed. The new backup directory appears to hold all 10GB of data, when in reality it holds a whole lot of hard links plus the files that have changed since the last pass. If you're synchronizing nightly and you want to keep 14 days of backups, that means you're storing the full 10GB only once and much smaller sets of changes in every subsequent pass. Instead of storing and transmitting 140GB, you might only have to store and transmit 15GB, depending on how much churn is in that directory tree. If you're interested in knowing more about using rsync for backups, here's a good example you might want to read.
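To make the arithmetic concrete, here is a minimal Python sketch of the storage math, along with one way the nightly command might be assembled using rsync's `--link-dest` option. The function names and the `/data` and `/backups/day-N` paths are hypothetical, chosen only for illustration. With exactly 30MB of churn every night the total comes to roughly 10.4GB; heavier churn pushes it toward the 15GB mentioned above.

```python
def backup_storage_gb(full_gb, nightly_change_gb, nights):
    """Compare storage for full nightly copies vs. hard-linked snapshots."""
    # Naive approach: every night stores a complete copy of the tree.
    full_copies = full_gb * nights
    # Hard-linked approach: one full copy, then only the changed files.
    hard_linked = full_gb + nightly_change_gb * (nights - 1)
    return full_copies, hard_linked


def snapshot_command(source, previous, new):
    """Build an rsync command that hard-links unchanged files to `previous`.

    Paths are hypothetical. Absolute paths are used because a relative
    --link-dest is interpreted relative to the destination directory.
    """
    return ["rsync", "-a", "--delete", f"--link-dest={previous}", source, new]


full_copies, hard_linked = backup_storage_gb(10, 0.03, 14)
print(f"full copies: {full_copies:.2f} GB, hard-linked: {hard_linked:.2f} GB")
print(" ".join(snapshot_command("/data/", "/backups/day-13", "/backups/day-14")))
```

The command is returned as a list so it could be handed to `subprocess.run` without shell quoting concerns; the sketch only prints it.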
Of course, as anyone who uses kernel.org is likely to know, rsync can function as a server and a client. In fact, this is how many "prosumer" NAS storage arrays handle backup functions, by running rsync servers that can listen for incoming requests for data. It's about as easy as it sounds. An rsync listener is run on one system, and certain directories or file systems are exposed for synchronization. Instead of requiring authentication via the normal SSH transport, rsync can use its native transport based on predetermined secrets to allow authenticated or anonymous synchronization requests. This is best suited to nonsensitive data because the transport itself is not secured. If secured transmission is desired, you can use either SSH and public keys or a VPN.
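As a rough illustration of the client side of that native transport, the Python sketch below builds a pull command against an rsync listener using the `rsync://` URL form and the `--password-file` option. The host, module name, user, and file paths are all hypothetical, and the listener is assumed to already expose a module by that name.

```python
def daemon_pull_command(host, module, dest, user=None, password_file=None):
    """Build an rsync command that pulls a module from an rsync listener.

    All argument values are illustrative; nothing here contacts a server.
    """
    # Anonymous requests omit the user; authenticated ones use USER@HOST.
    target = f"{user}@{host}" if user else host
    cmd = ["rsync", "-a"]
    if password_file:
        # The file holds the predetermined secret for the given user.
        cmd.append(f"--password-file={password_file}")
    cmd += [f"rsync://{target}/{module}/", dest]
    return cmd


# Anonymous pull of a hypothetical public module.
print(" ".join(daemon_pull_command("mirror.example.com", "pub", "/srv/mirror/")))
# Authenticated pull using a hypothetical secrets file.
print(" ".join(daemon_pull_command(
    "nas.example.com", "backups", "/srv/restore/",
    user="backup", password_file="/etc/rsync.secret")))
```

Because this transport is not encrypted, the same caveat applies as in the text: for sensitive data, run the transfer over SSH or a VPN instead.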
There are other goodies in rsync, such as the ability to limit bandwidth consumption during transfers to reduce the impact on network connections. It can also use fuzzy matching to determine whether a file has a twin or an earlier version on the target under a different name or with a different checksum, and then use that file as the basis of a rolling checksum transfer. These and other features only serve to increase this utility's usefulness.
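Both of those features map to command-line options, `--bwlimit` and `--fuzzy`. The Python sketch below shows how a throttled transfer might be assembled; the function name, paths, and the 5,000 figure are hypothetical, and the rate is assumed to be read as KiB per second, which is the default unit described in the rsync man page.

```python
def throttled_command(source, dest, kib_per_second, fuzzy=True):
    """Build an rsync command with a bandwidth cap and optional fuzzy matching."""
    # --bwlimit caps transfer bandwidth (assumed here to be KiB per second).
    cmd = ["rsync", "-a", f"--bwlimit={kib_per_second}"]
    if fuzzy:
        # --fuzzy looks for a similar file on the target to use as a basis
        # when the destination file itself is missing.
        cmd.append("--fuzzy")
    cmd += [source, dest]
    return cmd


# Hypothetical paths and host; the command is printed, not executed.
print(" ".join(throttled_command("/data/", "backup.example.com:/backups/data/", 5000)))
```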
Regardless, it's clear why Tridgell has more faith in the longevity of rsync than Samba. Once the sands of time have washed the SMB protocol away for good and Samba is resting in peace alongside NetBEUI and NetWare, we'll still have the need for fast and fluid file synchronization throughout our infrastructures. Luckily, we already have the solution.
This story, "Why you should be using rsync," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.