
Earlier I wrote some philosophical thoughts on backups.

This was inspired by several concerns, including:

  • Too many people have too lax an approach to backups, particularly considering how important their data is to them. (I’m thinking in particular of Ph.D. theses and research data, but it applies more broadly of course.)
  • In the Apple Mac world, too many people don’t seem to appreciate the limitations of Time Machine™. (Don’t get me wrong: it’s an excellent solution for what it does, but what it does has limits.)

Here I’m going to look quickly at (some of) the limitations of Time Machine™, so that users might become aware if they need additional solutions.

[Figure: TM panel]

Time Machine™, abbreviated here as TM (not to be confused with ‘trademark’!), is Apple’s backup solution that comes with Mac OS X (from version 10.5 onwards). It is designed to work with the Time Capsule, or with external FireWire or USB hard drives. (I have no experience of using it with a Time Capsule.)

Backups need to be easy to do (so you don’t procrastinate and skip them) and easy to extract old files from. The latter is important and often overlooked when choosing and testing a solution. Too often people install backup software (good) but never practice recovering data from the backup sets until they need to (not very good).

TM’s great strength is that it makes both backups and recovery easy. This makes it a good solution for the consumer or the inherently lazy (or both…)

It is not the ‘perfect’ solution for everyone’s needs. Like much of Apple’s in-house software, the default is suited to a ‘typical consumer’. If your needs are a little more demanding, you might find you need more than a TM backup.

In particular, people need to be aware of TM’s limitations so that they can decide if they need additional solutions. Note the word ‘additional’: I personally would suggest continuing to use it, and adding other solutions to compensate for its weaknesses.

4.1 Some general tips for using TM

Before I deal with the drawbacks of using TM, a few general tips:

  • Use an external hard disk with at least twice the storage of the files that you wish to back up. External USB hard disks are cheap today and the default internal hard drives that come with Apple computers are usually fairly modest in size, so this shouldn’t be too difficult.
  • I suggest first turning TM off, then setting up the disk you want TM to back up to, then marking that disk as not to be indexed by Spotlight™, and then turning TM back on. The step of disabling Spotlight™ indexing is one you won’t see in manuals. It is optional and is being extra cautious, but I am aware of a few people having trouble, seemingly because Spotlight™ (Apple’s disk indexing and search software) was trying to index the backup disk while a backup was in progress.
  • While it is possible to use the ‘extra’ space on TM disks to store files, I don’t recommend this. The easiest solution is to use the disk only for backups, but if you must also store ‘loose’ files on it, I strongly suggest partitioning the disk, so that TM has its own partition that is only used for its backups and you use the other partition for loose files. If you do this, back up the files in the non-TM partition(s) to a different disk drive, i.e. not to the TM backup on the same drive. (Otherwise, if the drive packs in, you’ll lose both the original and the backup!) You can explicitly tell TM not to back up the other partitions on the disk, rather than rely on the default settings.
  • Don’t try to delete files from the TM archive yourself in an attempt to make more room. (Unless you really know what you’re doing: TM archives may look like normal disk contents but they’re not.) Do it wrongly and you can corrupt the backup.

4.2 Weaknesses of TM

4.2.1 The backup loses finer increments over time

Time Machine is an incremental backup, but it uses a sliding scale of increments that thins out over time: in the older backups, the more finely grained increments are dropped.

TM keeps hourly backups for the past 24 hours, daily backups for the past month, and weekly backups for anything older. So you cannot recall every hourly increment of a file once it is more than a day old, or every daily increment once it is more than a month old.

While this is suitable for most consumer use, it makes Time Machine unsuitable for files that might at one point have a ‘right’ state that subsequently regresses, where the regression is not necessarily identified as incorrect for some time.

To give a simple example, imagine you use the standard setup with hourly backups, on the hour, with the daily backups taken at 10pm. You are working on a file and just before 3pm Monday, you have a version that is good. It stays unchanged until past the hour and gets saved by TM in this state. You carry on working only to realise on the Wednesday that you introduced an error later on the Monday. TM only holds hourly increments for the past 24 hours, so there are no hourly versions for Monday anymore, only the ‘broken’ end-of-day copy from 10pm.
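The thinning in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Apple's implementation: the function name `thin_backups` is hypothetical, and the policy (everything from the last 24 hours, one snapshot per day for 30 days, one per week after that, keeping the last snapshot of each day to match the 10pm example above) is a simplification.

```python
from datetime import datetime, timedelta

def thin_backups(snapshots, now):
    """Return the snapshots a TM-like sliding retention scheme would keep.

    Assumed policy (a simplification for illustration only):
      - keep every snapshot from the last 24 hours;
      - keep one snapshot per calendar day for the last 30 days;
      - keep one snapshot per ISO week for anything older.
    """
    kept = []
    seen_days = set()
    seen_weeks = set()
    # Walk newest-first so the last snapshot of each day or week survives.
    for snap in sorted(snapshots, reverse=True):
        age = now - snap
        if age <= timedelta(hours=24):
            kept.append(snap)
        elif age <= timedelta(days=30):
            day = snap.date()
            if day not in seen_days:
                seen_days.add(day)
                kept.append(snap)
        else:
            week = tuple(snap.isocalendar())[:2]  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                kept.append(snap)
    return sorted(kept)

# Hourly snapshots on a Monday from 9am to 10pm, inspected on Wednesday at 3pm.
monday = datetime(2010, 3, 1)
snapshots = [monday.replace(hour=h) for h in range(9, 23)]
kept = thin_backups(snapshots, now=datetime(2010, 3, 3, 15, 0))
print(kept)  # only Monday's 10pm snapshot is left; the good 3pm one is gone
```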

This applies to all data you edit, notes you write, that thesis, etc.

These types of projects need to use one of the ‘traditional’ industry-grade backup solutions that keep every incremental update for all time. An alternative I want to add – part of the reason behind this series – is the use of version control software.

Let me quickly get ahead of my series and briefly remark on these additions.

Industry-grade incremental backup solutions can keep a copy of every increment you archive for all time. One drawback to most of these applications is that it can take a long time to recover specific files at a specific point in time as the ‘trackback’ procedures are fairly complex. This discourages end users from making use of the data recovery – ideally not something you want in a backup solution. Another issue is that generally speaking they are more complex to run, as they are usually designed with system administrators running servers or larger systems in mind, rather than end users.

Using version control software, the user can define versions of their files. The version control software maintains a trail of the older versions on disk. Various methods are used, but the most common is to save a record of the differences between two versions, which is more compact than saving the full files. You are then free to maintain the old versions as you wish. If the versions (differences) are stored as unique files, TM will keep all versions. (I emphasise files to distinguish this from where the differences are stored in a relational database.) Although this was developed with software development in mind, it can be used for other applications such as writing a book or thesis. A downside is that it relies on the user intervening to mark up versions.
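To make the idea of storing differences concrete, here is a minimal sketch using Python's standard `difflib` module. It is not how any particular version control system stores its data; the helper names `make_delta` and `apply_delta` are made up for this illustration. It keeps the newest version in full and records only what is needed to rebuild the older one from it.

```python
from difflib import SequenceMatcher

def make_delta(new, old):
    """Record how to rebuild `old` from `new`.

    Unchanged runs are stored as ("copy", start, end) index ranges into
    `new`; only lines that exist solely in `old` are stored literally.
    """
    delta = []
    matcher = SequenceMatcher(a=new, b=old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": old has lines new lacks
            delta.append(("lines", old[j1:j2]))
        # "delete": lines only in `new`, nothing to store for `old`
    return delta

def apply_delta(new, delta):
    """Rebuild the old version from the new version plus the delta."""
    old = []
    for op in delta:
        if op[0] == "copy":
            old.extend(new[op[1]:op[2]])
        else:
            old.extend(op[1])
    return old

draft_1 = ["Chapter 1", "Methods", "Results"]
draft_2 = ["Chapter 1", "Methods (revised)", "Results", "Discussion"]

delta = make_delta(draft_2, draft_1)
assert apply_delta(draft_2, delta) == draft_1  # the old draft is recoverable
```

Only one line of the old draft has to be stored literally here; the rest is described by references into the newer draft.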

4.2.2 Updates to large files make for large backup sets

The ‘unit’ that TM backs up is the whole file. Even small changes to large files will result in a copy of the whole file being stored on the backup set. This can quickly add up if the files are very large and the small changes are frequent. Examples of large files are (relational) databases or virtual machines, both of which are better suited to being backed up in other ways. Some email software stores all the emails in a single database file. (Apple’s Mail stores them as individual files.) Other large files might include videos, extensive image collections, music, etc.
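A back-of-the-envelope calculation shows how quickly this adds up. The figures below (a 20 GB virtual machine image that changes by roughly 50 MB before each of 8 hourly backups) are invented purely for illustration, and the function names are hypothetical.

```python
def whole_file_backup_mb(file_size_mb, backups_with_changes):
    """Space used when every changed backup stores the entire file again."""
    return file_size_mb * backups_with_changes

def changed_data_mb(change_mb, backups_with_changes):
    """Space that the actual changes alone would have needed."""
    return change_mb * backups_with_changes

vm_image_mb = 20 * 1024   # assumed: a 20 GB virtual machine image
change_mb = 50            # assumed: about 50 MB modified per hour
backups = 8               # assumed: 8 hourly backups in a working day

stored = whole_file_backup_mb(vm_image_mb, backups)
changed = changed_data_mb(change_mb, backups)
print(f"stored: {stored} MB, actually changed: {changed} MB")
```

With these assumed numbers, one working day consumes 160 GB of backup space to record about 400 MB of real changes.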

There are a number of alternatives, which I’m not going to present here.

4.2.3 Running out of disk space causes TM to drop older files

If TM runs out of space on the disk it is saving files to, rather than stop backing up, it keeps going by removing older incremental backups. (You only lose the oldest versions of files that have newer versions.)

This may be a useful compromise for consumers, but it is not suitable if you are intending to keep really old versions of things ‘forever’. It is a useful reminder that TM is not suited to long-term archival of data.
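The behaviour can be pictured with a small sketch. This is a toy model under loud assumptions, not TM's actual logic: each backup is treated as an independent increment with a fixed size, and the name `add_backup` and all the sizes are hypothetical.

```python
from collections import deque

def add_backup(backups, label, size, capacity):
    """Add a backup, dropping the oldest increments until it fits.

    `backups` is a deque of (label, size) pairs, oldest first.
    Returns the labels of the increments that were dropped.
    """
    used = sum(s for _, s in backups)
    dropped = []
    while backups and used + size > capacity:
        old_label, old_size = backups.popleft()  # oldest increment goes first
        used -= old_size
        dropped.append(old_label)
    if used + size > capacity:
        raise ValueError("backup does not fit even on an empty disk")
    backups.append((label, size))
    return dropped

# Sizes in GB on an assumed 100 GB backup disk.
history = deque([("week 1", 40), ("week 2", 30), ("yesterday", 20)])
dropped = add_backup(history, "today", 35, capacity=100)
print(dropped)        # ['week 1']
print(list(history))  # 'week 1' is gone; the rest, plus 'today', remain
```

Nothing warns you in this model when ‘week 1’ disappears, which is the point: the backup keeps running, but the oldest history quietly goes.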

4.2.4 Backing up or recovering to a different model of Mac

Don’t try to recover system files directly to another computer yourself (unless you know what you’re doing). Install each machine’s operating system, then use Setup Assistant / Migration Assistant, or Utilities > Restore from backup after powering up from the install DVD (hold down the ‘C’ key at startup to boot off the CD/DVD drive), to move your data over. It’s painless and a very nice effort on Apple’s part.

4.2.5 Not really intended for offsite backups

This is very important.

Backups that sit next to your computer have limited value. While normal use of TM will protect against hard drive failure, if a fire, theft, or other major event happens, you will almost certainly lose your backup along with the computer.

You want an off-site backup. (For most people this can just involve rotating the backup between work and home.) Time Machine is really intended for within-office backups, to an external hard drive attached to the computer.

4.3 Further reading

Useful descriptions of Time Machine can be found at these locations:

http://en.wikipedia.org/wiki/Time_Machine_(software)

http://superuser.com/questions/33503/are-time-machine-backups-incremental-and-is-time-machine-any-better-on-snow-leop


Other articles on Code for life:

Backups, part I

Choosing an algorithm — benchmarking bioinformatics

The roots of bioinformatics

Find transcription factor motifs in genomes better: add histone acetylation data

GoPubMed — PubMed browsing using ontologies