Date last modified: Fri Apr 14 2017 8:23 AM
'Time spent in reconnaissance is seldom wasted' - The Art of War, Sun Tzu, 400BC
This page records my research into backup software, primarily for Windows-based machines, and the conclusions I reached; most of it was written during 2008. These are the questions I weighed when comparing solutions:
- Push or Pull? What I call a 'pull' backup relies on the backup server pulling the information from the operating computers, whereas with a 'push' backup the operating computer pushes its data to the backup server. In general I think a push backup system is better because it is more secure: the backup server does not need to be able to gain access to any of the operating computers.
- 3D or 4D? By '3D Backup' I mean that a backup archive is essentially a snapshot of your files as they were at one time; 4D Backup offers you the chance to recover earlier versions of files going back in time, both files that no longer exist on your operating computer and files that still exist but have changed. So it adds the 4th dimension (time) to a backup archive. But it must do so elegantly, not just by transferring every file each time it changes (which requires huge bandwidth) and then storing it in full (requiring huge backup storage). To win my 4D Backup badge a solution must provide backup history and must optimise backup both for transfer and for backup storage - for example by using so-called 'diffs' or 'deltas', or by deduplication.
- Local or Remote (Offsite)? Local backup is good but what if a fire destroys the premises? The best combination is to have both: a primary onsite backup and a secondary offsite backup. If the backup can be encrypted before it is transferred to the secondary backup then this could be a third-party facility e.g. a commercial offsite backup service. Having two backups also obviates the need for RAID.
- Complete or important data only? Although it seems attractive to have a 'complete' backup of operating computers so that they could be restored seamlessly if they encountered any sort of problem, this is not a very practical approach. The amount of data to be backed up is huge (involving a lot of duplication of system files - a clever backup system such as BackupPC can avoid this overhead), and in any case in many 'wipeout' situations there will be no alternative to using a new installation. My conclusion was that for Windows computers in normal use you might for example only need to back up the 'Documents' folder (and subfolders), 'Desktop', perhaps email files, and some other specific folders for specific applications. At present for 11 computers we are using 80GB of backup space.
- Backup encryption. Some backup solutions offer encryption of the backup storage, so that if it is stolen it cannot be read by a third party. More complete protection is also possible if even the administrator of the backup server cannot read the backed-up files.
This is a list of backup software that wins my 4D Backup badge, free or with a free version, which I discovered as I searched for my optimal solution:
- TimeDicer. This is a free 4D Backup product. I have to declare an interest because I wrote it! It is the result of the researches recorded here and uses rdiff-backup with dedicated primary and secondary backup servers to give easy and high-level backup storage for Windows computers. A TimeDicer Server can also be used for backup from Linux or any other OS that can run rdiff-backup.
- tarsnap. This is a commercial 4D Backup product which is (I think) fully optimised for transfer and storage. Data is stored in the cloud (on Amazon AWS apparently) in encrypted form and the keys are known (apparently) only to the user. The charging model is by prepayment and you are billed based on storage and bandwidth used; however, it claims that because of deduplication the amount of storage space is much less than you might expect. It is designed only for UNIX-like operating systems (BSD, Linux, MacOS X, Cygwin, etc), although it might be usable for data under Windows via Cygwin or 'Bash on Ubuntu on Windows'.
- Ahsay Backup. This is a commercial 4D Backup product (except AhsayACB) which is (I think) fully optimised for transfer and storage. It appears to work with 'forward diffs' i.e. it stores an original copy of a file and then stores incremental or differential files which can be combined with the original to create a later version of the same file. The backup server conducts regular checks to ensure consistency of the data. They offer much more powerful options (including replication servers) and are apparently the backbone for remote backup services offered by thousands of providers around the world - they even provide a list of them! They state that all files are first zipped and encrypted with the operating computer's defined encrypting key before being sent to Ahsay backup server, so they are safe even from the prying eyes of the backup server administrator.
- Box Backup is a 4D Backup open source automatic on-line backup system primarily for Unix platforms (including Linux) but with a Windows client. All backed up data is stored on the server in files on a filesystem, all data is encrypted and the administrator of the server has no access to the saved data. A backup daemon copies encrypted data to the server when it notices changes. Only changes within files are sent to the server, just like rsync, and old versions of files on the server are stored as changes from the current version (reverse diffs), so old versions and deleted files are available. Backup behaviour can be optimised for document or server backup, and it is designed to be easy and cheap to run a server with a portable implementation, and optional RAID implemented in userland for reliability without complex server setup or expensive hardware. There is a separate project for a GUI front-end to Box Backup called Boxi. I admit I only discovered this solution long after fixing on rdiff-backup (below). [Does Box Backup use rdiff-backup or librsync under the hood? I don't know.]
- rdiff-backup. "rdiff-backup backs up one directory to another, including between machines over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted. Finally, rdiff-backup is easy to use and settings have sensical defaults." This is a fully-qualified 4D Backup solution; however it is a command-line utility and there is no elegant GUI. [Note: a new arrival on the scene [October 2010] JBackPack offers a GUI for rdiff-backup.] Rdiff-backup is now stable at version 1.2.8 but not under active development/improvement, and is available for Windows. It does not directly offer encryption of the backup data, though this can be achieved by workarounds with some limitations.
- duplicity. This free Linux utility is similar to rdiff-backup but backs up to encrypted tar-format volumes. Duplicity grew out of rdiff-backup, however rdiff-backup's archives are meant to be as easy to view as possible, while duplicity's are as hard to view as possible and can be encrypted with GnuPG. Duplicity saves data in the more conventional full+forward delta format instead of rdiff-backup's mirror+reverse deltas, and rdiff-backup requires another copy of rdiff-backup on the remote destination, while duplicity can access remote locations with scp or ftp (other backends may be supported later). Its primary advantage is the encryption of the archives, and there is also a space saving from the compressed volumes (whereas rdiff-backup stores the most recent copy of each file uncompressed). Duplicity wins a 4D Backup badge, but it is not (at the time of writing) available for Windows except by using cygwin or, perhaps, 'Bash on Ubuntu on Windows'. Duplicity is currently (2017) actively maintained. The use of forward delta format means that you cannot delete very old backups and that recovering a recent backup depends upon having a complete and perfect set of backups from the original until the recent date; in order to reduce this dependency and the associated risk many duplicity users carry out new full backups every month or so, but this means you lose all your older backups (or if you retain them and create a parallel new archive you lose most of the advantage of the delta storage).
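The rdiff-backup workflow quoted in the list above (back up over ssh, recover an older version, discard old history) can be sketched as a few command lines. This is a minimal illustration that only builds and prints the commands, assuming the classic 1.2.x command-line syntax; the host name, user name and paths are hypothetical, and the helper function names are mine rather than part of rdiff-backup.

```python
import shlex


def backup_cmd(source_dir, remote, dest_dir):
    """Back up source_dir to a remote archive reached over ssh.

    rdiff-backup addresses a remote directory as host::path.
    """
    return ["rdiff-backup", source_dir, f"{remote}::{dest_dir}"]


def restore_cmd(remote, archived_path, age, restore_to):
    """Restore archived_path as it was `age` ago (e.g. "10D" = ten days)."""
    return ["rdiff-backup", "--restore-as-of", age,
            f"{remote}::{archived_path}", restore_to]


def prune_cmd(remote, dest_dir, keep):
    """Drop history older than `keep` (e.g. "1Y"); the current mirror stays.

    Removing more than one backup session at once also needs --force.
    """
    return ["rdiff-backup", "--remove-older-than", keep,
            f"{remote}::{dest_dir}"]


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    remote = "backup@backupserver"
    for cmd in (
        backup_cmd("/home/alice/Documents", remote, "/srv/backups/alice"),
        restore_cmd(remote, "/srv/backups/alice/report.doc", "10D",
                    "/tmp/report.doc"),
        prune_cmd(remote, "/srv/backups/alice", "1Y"),
    ):
        print(shlex.join(cmd))
```

Because the functions return argument lists rather than running anything, the commands can be inspected before being passed to a scheduler or to subprocess.run.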
Here are some other good backup packages which do not however count as 4D Backup solutions:
- Bacula (Wikipedia page here) is a set of open source programs that manage backup, recovery, and verification of computer data across a network of computers of different kinds. There are some 5 components to the system and you are warned off if you are not a Unix expert; however a client is available for Windows. It offers comprehensive job control, stores lists of backed up files in a database for faster retrieval, and can back up to disk, DVD, and tape. It supports encrypted transfer and encrypted storage. Looks a bit intimidating to me, but certainly has a lot of features. It does not (at the time of writing) use diffs (deltas) for file storage or transfer and so it does not (on my definition) qualify as a 4D Backup solution.
- LogMeIn Backup is a commercial backup product which offers a remote server (i.e. you don't provide your own), and at the time of writing costs $40 for one year for up to 100,000 files, but without any 4D backup.
- Windows Home Server. This commercial product is very impressive, but it is not intended for offsite backups.
- rsync. This free Linux utility is widely used for backup, including remote backup. But although rsync is very efficient for transferring data because it uses a 'delta' system to work out the differences in files and then sends only these, it does not get involved in storing files - so if you want the chance to recover multiple earlier versions of a file that has changed over time you need to keep multiple copies of the same file, which eats disk space on the backup server. Mike Rubel has a page of impressive scripts and this led me on to:
- rsnapshot. Nathan Rosenquist's free utility is also based on rsync but is more like a polished application, and well documented. It is basically a 'pull' solution though. Some information about making it work with Windows can be found here. It has also been adapted to work with rdiff-backup instead of rsync - see here.
- DeltaCopy. This free Windows software is based around a port of rsync, with a GUI. You install a DeltaCopy Server on the Windows backup server and DeltaCopy clients on the Windows operating computers, but DeltaCopy clients can also connect to a Linux rsync-based backup computer. It is a 'push' solution - the operating computers initiate and control the backup process.
- BackupPC. As recommended on the rdiff-backup 'related' page, this is a well-considered package essentially aimed at intranet backup of Windows and Linux computers on a network. Where it scores is ease of use and its ability to recognise multiple copies of the same file - coming from different computers - and avoid storage duplication. If you are backing up system files as well as user files this can make a huge difference to the size of the backup. It can use rsync for transfer.
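The storage saving from recognising identical files that arrive from different computers can be illustrated with a content-addressed pool. This is only a sketch of the general idea, not BackupPC's actual implementation: the class name, the in-memory dictionaries and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class DedupPool:
    """Toy backup store that keeps each distinct file content only once."""

    def __init__(self):
        self.blobs = {}    # content digest -> file content (stored once)
        self.catalog = {}  # (computer, path) -> content digest

    def store(self, computer, path, content):
        digest = hashlib.sha256(content).hexdigest()
        # setdefault leaves an existing blob untouched, so a second copy of
        # the same content costs only a catalog entry.
        self.blobs.setdefault(digest, content)
        self.catalog[(computer, path)] = digest

    def fetch(self, computer, path):
        return self.blobs[self.catalog[(computer, path)]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


if __name__ == "__main__":
    pool = DedupPool()
    system_file = b"identical system file contents" * 1000
    for pc in ("pc01", "pc02", "pc03"):
        pool.store(pc, "C:/Windows/example.dll", system_file)
    pool.store("pc01", "C:/Users/alice/notes.txt", b"only on pc01")
    print(len(pool.catalog), "files catalogued,",
          pool.stored_bytes(), "bytes stored")
```

Three computers holding the same system file cost one stored copy plus three catalog entries, which is why this matters most when system files are backed up as well as user files.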
rdiff-backup uses reverse incremental diffs: each time that a backup of a changed file is performed the current version of the file is retained 'clean' on the backup server (i.e. a 'mirror' of the original), a diff file is created (in a subdirectory) to allow for retrospective migration to the previous saved copy, and then the previous saved clean copy is deleted. As the file changes and is backed up over time these diff files accumulate. It is a reverse diff because it is used to go back in time to an earlier version of the file; it is incremental because in order to get a given version of the file you apply each diff file in reverse chronological sequence.
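The mirror-plus-reverse-diffs scheme described above can be sketched in a few lines. This is a toy model under stated assumptions, not rdiff-backup's code: it handles a single file held in memory, and it uses Python's difflib to build the deltas where rdiff-backup uses librsync. The names ReverseDiffArchive, make_delta and apply_delta are invented for the example.

```python
from difflib import SequenceMatcher


def make_delta(base, target):
    """Return instructions that rebuild `target` when applied to `base`."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse bytes of base
        elif j2 > j1:                               # replace or insert
            delta.append(("data", target[j1:j2]))   # literal bytes
        # "delete" needs no instruction: those bytes are simply not copied
    return delta


def apply_delta(base, delta):
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class ReverseDiffArchive:
    def __init__(self):
        self.mirror = None       # clean copy of the newest version
        self.reverse_diffs = []  # oldest first; each rebuilds the previous
                                 # version from the next newer one

    def backup(self, current):
        if self.mirror is not None and current != self.mirror:
            # Keep only a diff leading back to the old mirror, then
            # discard the old clean copy.
            self.reverse_diffs.append(make_delta(current, self.mirror))
        self.mirror = current

    def restore(self, steps_back):
        if not 0 <= steps_back <= len(self.reverse_diffs):
            raise ValueError("no such version in the archive")
        data = self.mirror
        first = len(self.reverse_diffs) - steps_back
        # Walk back in time: newest diff first.
        for delta in reversed(self.reverse_diffs[first:]):
            data = apply_delta(data, delta)
        return data
```

Restoring the newest version touches no diffs at all, which is the practical attraction of the scheme: the most commonly wanted copy is always available as a plain file.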
By contrast, a 'differential' rather than 'incremental' backup strategy allows recovery to any file version with only 2 sources - the 'original' and the 'differential diff' file for that version. This approach will lead to increasingly large and repetitious diff files as the current files diverge from an original, and indeed it is not a practicable solution for reverse diffs (but see below regarding LVM snapshots).
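The trade-off between the two strategies can be made concrete with a small measurement. The sketch below is an illustration under simplifying assumptions: a file is modelled as a list of lines that gains one line per day, the size of a diff is counted as the number of lines it must carry, and forward diffs are used for both strategies. The function names are invented for the example.

```python
from difflib import SequenceMatcher


def added_lines(old, new):
    """Number of lines a forward diff from `old` to `new` must carry."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return sum(j2 - j1
               for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
               if tag in ("insert", "replace"))


def diff_sizes(versions):
    """Return (incremental sizes, differential sizes) for the versions."""
    original = versions[0]
    # Incremental: each diff is taken against the previous version.
    incremental = [added_lines(prev, cur)
                   for prev, cur in zip(versions, versions[1:])]
    # Differential: each diff is taken against the original.
    differential = [added_lines(original, cur) for cur in versions[1:]]
    return incremental, differential


# A file that gains one line per day over four days.
versions = [["line 0"]]
for day in range(1, 5):
    versions.append(versions[-1] + [f"line {day}"])

incremental, differential = diff_sizes(versions)
print("incremental: ", incremental)   # [1, 1, 1, 1]
print("differential:", differential)  # [1, 2, 3, 4]
```

Each differential diff repeats everything in the one before it, so storage grows with every backup, whereas recovering a version needs only the original and one diff. The incremental diffs stay small but all of them are needed to reach the newest version.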
The processing of the diff files is handled by rdiff-backup 'under the hood' - the user does not have to understand what is going on. (For a utility which creates diff files and allows you to apply them manually, see rdiff, or for a way to view an rdiff-backup archive as if each historical fileset existed in full, see rdiff-backup-fs.) If a diff file is missing or corrupted for a given day then because of the incremental nature of the backups any versions of the original file for that day or for any earlier day cannot be recovered. Although this presents obvious dangers, it is at least for most purposes a safer approach than forward incremental diffs, where the corruption of one older diff will mean that more recent versions of the file cannot be recovered.
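The effect of losing one diff under each scheme can be worked through with a small model. This is an illustrative sketch, not part of any backup tool: versions are numbered 0 (oldest) to n-1 (newest), diff k is assumed to link versions k and k+1, and the function name is invented for the example.

```python
def recoverable_versions(n_versions, lost_diff, scheme):
    """List the versions that survive the loss of one diff file.

    scheme "reverse": the newest version is kept in full and diff k
    rebuilds version k from version k+1.
    scheme "forward": the oldest version is kept in full and diff k
    rebuilds version k+1 from version k.
    """
    if not 0 <= lost_diff < n_versions - 1:
        raise ValueError("lost_diff must name an existing diff")
    if scheme == "reverse":
        # Walking back from the newest copy stops at the missing diff.
        return [v for v in range(n_versions) if v > lost_diff]
    if scheme == "forward":
        # Walking forward from the oldest copy stops at the missing diff.
        return [v for v in range(n_versions) if v <= lost_diff]
    raise ValueError("scheme must be 'reverse' or 'forward'")


if __name__ == "__main__":
    # Seven daily versions, with the diff between days 2 and 3 lost.
    print("reverse:", recoverable_versions(7, 2, "reverse"))  # [3, 4, 5, 6]
    print("forward:", recoverable_versions(7, 2, "forward"))  # [0, 1, 2]
```

With reverse diffs the damage is confined to the oldest history, which is usually the least valuable; with forward diffs the same loss takes out every version newer than the break, including the current one.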
Linux's LVM2 'logical volume manager' makes 'logical volumes' (which can then be formatted to provide usable storage space) flexible and independent of the physical storage medium. Here is information about why LVM is good, although it omits to mention the availability in LVM of snapshots, which are particularly useful for making backups. [An LVM snapshot could be considered a local reverse differential backup of the corresponding logical volume, but it cannot be used tout court as a long-term backup solution, only as a way of getting a temporarily-frozen fileset from which a backup can proceed. Logic suggests that a long-term LVM snapshot would get very large and impact significantly on the performance of the filesystem, and multiple snapshots would be even worse.]
LVM can also 'revert' to a snapshot: you can take an LVM snapshot, make changes to the data from which the snapshot was taken, and then revert to the snapshot, discarding all the changes. Someone tested it here.
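The two uses of LVM snapshots mentioned above (a frozen fileset to back up from, and reverting changes) can be sketched as command sequences. This is a minimal illustration that only builds the commands and does not run them. The volume group, volume, snapshot name, mount point and backup command are hypothetical, the function names are mine, and real use requires root privileges and free space in the volume group.

```python
import shlex


def snapshot_backup_plan(vg, lv, snap_name, snap_size, mount_point,
                         backup_cmd):
    """Commands to back up from a temporary, frozen view of a volume."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return [
        # Create the snapshot; snap_size is the space reserved for changes
        # made to the origin while the snapshot exists.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin],
        ["mount", "-o", "ro", snapshot, mount_point],
        backup_cmd,                       # reads from mount_point
        ["umount", mount_point],
        # Remove the snapshot promptly so that it does not keep growing.
        ["lvremove", "--force", snapshot],
    ]


def revert_plan(vg, snap_name):
    """Command to roll the origin volume back to the snapshot's contents.

    The merge may be deferred until the origin is next activated if the
    origin is in use at the time.
    """
    return [["lvconvert", "--merge", f"{vg}/{snap_name}"]]


if __name__ == "__main__":
    plan = snapshot_backup_plan(
        "vg0", "data", "data_snap", "1G", "/mnt/data_snap",
        ["rdiff-backup", "/mnt/data_snap", "/srv/backups/data"],
    )
    for cmd in plan + revert_plan("vg0", "data_snap"):
        print(shlex.join(cmd))
```

The backup plan removes its snapshot at the end, so the revert command is shown only for a snapshot that has been deliberately kept.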
- TimeDicer - how to setup Linux primary and secondary backup servers, and a Windows script for automating backups using rdiff-backup
- Free'n'Easy Windows File Server (Devil-Linux / Samba) / Network Drive / NAS Drive / Network Storage
- Differential Backup in Windows with Delta Files - Using 7-Zip and RDiff
- Choosing a new backup solution - page rather like this one by someone who decided in the end to use Duplicity
- Backing up Linux and other Unix(-like) systems - by someone who does not recommend rdiff-backup!
- Automated Backups with rdiff-backup - how to set up ssh keys for communication between Linux machines
- Backup Ubuntu using rdiff-backup - a quick'n'dirty guide
- rdiff-backup wiki: Backup from Windows to Linux
- BackupNinja - centralized backup for Linux machines via configuration files (can use rdiff-backup and other backup systems)
- Vshadow - how to use the Windows volume shadow (= snapshot) service with Windows XP to copy files while they are in use (you can download vshadow.exe here)
Here is a selection of some (other) programs I have written, most of which run from the command line (CLI). They are freely available and can be obtained by clicking on the links. Dependencies are shown; most were written for a conventional Linux server, but they should run even on a Raspberry Pi, and many can run under Windows using Cygwin. Email me if you have problems or questions, or if you think I could help with a programming requirement.
- TimeDicer - Onsite/offsite data backup for Windows (uses rdiff-backup) [ GNU/Linux & MS Windows©: 2008-16 ]
- rdiffweb-install - GNU/Linux script to install rdiffWeb. [ GNU/Linux: 2011-16 ]
- rdiff-backup-regress - GNU/Linux script to regress an rdiff-backup archive. [ GNU/Linux: 2012-16 ]
Debian/Ubuntu kernel and LVM Utilities
- kernel-remove - GNU/Linux script which lists the installed GNU/Linux kernels in a Debian-based distro (e.g. Ubuntu) and can be used to remove an unwanted kernel and related packages, updating grub appropriately. (Ubuntu Tweak can do the same but kernel-remove.sh is a command-line script so does not require a GUI.) [ GNU/Linux-Debian/Ubuntu: 2010-15 ]
- kernel-update - GNU/Linux script to install/update Ubuntu kernel (also optionally btrfs-progs and duperemove) with latest version. [ GNU/Linux-Ubuntu: 2015-16 ]
- lvm-usage - GNU/Linux script to show available disk space and how it is used; run as cron job to warn if usage is above a set percentage. Provides additional information if LVM is in use. [ GNU/Linux-Debian/Ubuntu: 2012-16 ]
- lvm-delete-snapshot - GNU/Linux script to remove LVM snapshot that has been left over by another process. [ GNU/Linux-Debian/Ubuntu: 2012-16 ]
Dellmont / Three / Giffgaff / Vodafone - VoIP and Mobile Phone Account Utilities
- dellmont-credit-checker - GNU/Linux script to check credit balance on many Dellmont / Finarea / Betamax portals such as voicetrading.com and voipdiscount.com. [ GNU/Linux: 2008-17 ]
- sms-sender - GNU/Linux script to send text messages using Dellmont’s voicetrading.com. [ GNU/Linux: 2012-16 ]
- get-vt-cdrs - GNU/Linux script to download CDRs (call detail records) from Dellmont’s voicetrading.com or voippro.com. [ GNU/Linux: 2010-17 ]
- saynoto0870 - For people in UK, a GNU/Linux script which performs automated lookup of the www.saynoto0870.com database, finding cheap or free geographic number replacements for expensive non-geographic (087* or 084*) numbers. [ GNU/Linux: 2012-12 ]
- three-credit-checker - GNU/Linux script which checks credit/calls/text/data remaining on a mobile phone account with three.co.uk. [ GNU/Linux: 2014-16 ]
- giffgaff-credit-checker - GNU/Linux script which checks credit/calls/text/data remaining on a mobile phone account with giffgaff.com. [ GNU/Linux: 2014-17 ]
- vodafone-compile-bills - GNU/Linux script which reprocesses downloaded call record 'csv' files from vodafone.co.uk so that they can be easily analysed via spreadsheet - including analysis of bundled minutes which even Vodafone do not seem able to perform! [ GNU/Linux: 2012-16 ]
- sleepwalker - Windows© program which can be run from a remote machine to 'wake up' a Windows© machine behind a router, wait for it to start and then initiate Remote Desktop session. [MS Windows©: 2008-14]
- pass - GNU/Linux local program for easy entering of decrypt passphrase on a remote machine which has root dm-crypt+LUKS. [ GNU/Linux: 2017-17 ]
- unlock - GNU/Linux remote program for easy entering of decrypt passphrase on a remote machine which has root dm-crypt+LUKS. [ GNU/Linux: 2017-17 ]
- nano-update - GNU/Linux program to check/configure/make/install editor nano to the latest stable version found at http://www.nano-editor.org. [ GNU/Linux: 2015-16 ]
- pdf-compress - GNU/Linux program to create smaller b/w pdf file from an original large pdf file, especially when original resulted from scanning. [ GNU/Linux: 2016-17 ]
- form-extractor - GNU/Linux program to extract form tags from a web page or downloaded file. [ GNU/Linux: 2012-16 ]
- 123-dns-manager - GNU/Linux program for automated 123-Reg.co.uk Advanced DNS management. [ GNU/Linux: 2016-17 ]
- 123-dns-sync - GNU/Linux program to update DNS record at 123-Reg.co.uk to match external ip. [ GNU/Linux: 2016-17 ]
- recover-space - GNU/Linux program to enable a virtual disk volume to be compacted. [ GNU/Linux: 2014-15 ]
- tiny-device-monitor - GNU/Linux program to test webpages (including password-protected) or machines to check they are live; use as a cron job for your own websites, for hardware presenting a webpage, or for any machines with a presence on your local LAN or on the internet. [ GNU/Linux: 2009-16 ]
- dutree - GNU/Linux program to show a tree-style list of files and directories at the specified location and greater than the specified size (default 1GB). [ GNU/Linux: 2012-15 ]
- disk-wiper - GNU/Linux script to wipe a disk drive comprehensively and also check it for bad blocks. For use on a surplus drive (not SSD, not GPT) before passing to a third party. [ GNU/Linux: 2011-16 ]
- myip-upload - GNU/Linux and Windows (Cygwin) script to obtain external ip and upload it to remote site/file by ftp. [ GNU/Linux & MS Windows©: 2014-16 ]
- man2text - GNU/Linux one-liner program to convert man page output to straightforward text. [ GNU/Linux: 2012-12 ]
- Accounts - Multi-business multi-currency accounting software, uses Access [MS Windows©: 1996-2016]
- Rents Program - Residential lettings/landlord front office program, with many special features for UK market [MS Windows©: 1991-2016]