Backups.

Even if you hate them, don’t do without them.

It’s obvious, but it’s amazing how many people skimp on backups.

Interior of hard drive (Source: Wikimedia Commons.)

This is the first of a short series (most likely just two articles) on backups, intended for end-users.

I’d like to discuss a few of the issues involved in the hope of encouraging others to think about backups.

Unless asked, I’m not going to show the steps to making backups using particular software: there is little point in duplicating the existing pages available on-line.

The thoughts in this article apply to all operating systems. While the general thoughts in my next post are applicable to other systems, the practical advice will focus on Mac OS X. I’m not going to write about backing up ‘to the cloud’.

How to learn to value backups

The best, and at the same time the worst, way to learn the value of a backup is to lose data when you don’t have one.

There’s nothing to beat it. Trust me, I know.

It’s best to learn this early, when you have less to lose and longer to benefit from the lesson.

I learnt it as an undergraduate computer science student. My year-long project was on the university’s mainframe computer – a Burroughs, if anyone cares – that was backed up on our behalf.

Supposedly.

I managed to accidentally delete my code. This sort of thing happens at 1am when you’re furiously trying to multitask with your mind crashing towards sleep…

Somehow the central backup hadn’t backed my files up.

My ‘other’ backup – as was common then – was a ‘hard copy’, a print-out of the original. (It’d be rare to see someone with a physical print-out of their code these days. I still have a copy of that print-out somewhere in my files. I get it out about once a decade to gaze at for a few minutes.)

I got to type the entire source code back in by hand. My typing wasn’t so hot back then, either.

It was a memorable way of learning the value of backups. Conveniently without really having to entirely blame myself, too!

What is your data worth to you?

Ask yourself this: how much do you think the stuff on your computer is worth to you?

If it’s your research, it might be ‘pretty much everything’.

Now ask yourself: what do you know of how it’s backed up?

If you’re at a large institution that provides an institutional backup, and my experience as a student is anything to go by, you might want to make your own copy. Just in case. (Institutional backups should be more reliable today but another copy never hurts.)

General thoughts / advice / ideas for discussion

Let’s kick off with a few thoughts on backups. You’re welcome to share your thoughts in the comments.

  • The cost of the backup solution should be considered an integral part of the cost of a computer. Backups are not an ‘extra’.
  • Backups should be easy to make, so that you don’t put them off, and it should be easy to recover old files from them. The latter is important and often overlooked. Ideally, making the backup should be as automated as possible to reduce human error.
  • Practise recovering data from backup sets. Backups in the end are only as good as your ability to get the data back. You’ll have more confidence in your backup if you practise saving, then recovering.
  • Verifying backups is a good idea. (Good backup software will give you the option of verifying that the backup has been done. Independent verification is still a good idea.)
  • Most modest-sized backups today are built around either external hard disks or, to a lesser extent now, DVDs or Blu-ray disks. If you are using an external hard disk, use one with at least twice the storage of the files that you wish to back up. DVDs hold less data and are slower to write; they are more suited to ‘permanent’ archives, frozen ‘snapshots’ of datasets.
  • Try to keep the backup scheme simple. If it’s too confusing, you’ll make a mistake sooner or later, and with that defeat the point of the backup.
  • Understand the limitations of the backup approach you have chosen. All backup strategies have their good and bad points. In some cases these will mean some types of data are in fact not backed up, or are lost over time. (I’m going to deal with one or two of these issues in the next article.)
  • Consider backing up to more than one type of media using more than one program. This will mean making the backup scheme more complex, breaking the ‘keep it simple’ rule, but if done carefully it can help cover for issues within any one approach. (Those in large institutions where the institution already does a backup should instead understand the strengths of the institution’s approach and seek to complement them.)
  • Keep a copy off-site. A backup is no good if it’s sitting next to the computer. Why? If there is a fire or other major disaster, you’ll lose both the original (on the computer) and the backup at the same time. There is a place for quick-recovery backups to be handy, but if you do this don’t make this your only backup.
  • Refresh your backups. Most media today has a long(ish) shelf life, but no backup media lasts forever. Hard drives will fail: it’s not if, but when. Optical disks (e.g. DVDs), too, have a limited lifetime.
  • Ideally the format the backup files are in should be ‘open’, so that software other than the backup software can read them; otherwise an obsolete data format can leave the backup unreadable.
  • Make sure you have a copy of the backup/restore software itself in a ‘plain vanilla’ format that can be used ‘as is’. Most operating systems these days will have this as part of the original installation DVD.
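
The ‘at least twice the storage’ rule of thumb for external hard disks is easy to check with a few lines of Python. This is only a rough sketch: the function names `tree_size` and `disk_is_large_enough` are my own inventions for illustration, and the factor of two is simply the rule of thumb from the list above, not a hard requirement.

```python
from pathlib import Path


def tree_size(root: Path) -> int:
    """Return the total size, in bytes, of all regular files under `root`."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def disk_is_large_enough(data_bytes: int, disk_bytes: int, factor: float = 2.0) -> bool:
    """Rule of thumb: the disk should hold at least `factor` times the data."""
    return disk_bytes >= factor * data_bytes
```

To try it, pass `tree_size` the folder you want to back up and compare the result with the capacity of the external disk. In Python the capacity can be read with `shutil.disk_usage(path).total`, where `path` is wherever the disk is mounted on your system.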

There are many, many more tips I could add, but I’m trying to offer here the wider points, leaving specifics for discussion or a later article.
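
For those comfortable with a little scripting, one way to do the independent verification suggested above is to compare checksums of the original files with those of the copies. The sketch below assumes the backup is a plain folder-for-folder copy (not a compressed or proprietary archive), and the names `file_digest` and `verify_backup` are hypothetical, chosen just for this example. It checks that every file in the original has an identical copy in the backup; it does not look for extra files that exist only in the backup.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Compare each file under `source` with its counterpart under `backup`.

    Returns a sorted list of problems. An empty list means every source
    file has a byte-identical copy at the same relative path in the backup.
    """
    problems = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return sorted(problems)
```

Calling `verify_backup(Path("my_work"), Path("my_backup"))` (both folder names are placeholders) returns an empty list when everything matches. Bear in mind that files you have edited since the backup was made will, quite correctly, show up as different.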

File types

When thinking about a backup solution it can help to think about the nature of what is being backed up. My backup solutions are more complex than most users will want, but I find it helps to think about the items as being in one of these five categories:

  • Operating system and settings
  • Application software
  • Work files :-
    • Smaller files that change frequently
    • Large files or collections of files that change a little, infrequently
    • Large datasets that remain ‘fixed’

While you can simply back them all up, thinking about what you have will help you prioritise and understand the different requirements of each data type. Different types of files favour different backup strategies. (I may return to these classes of files in my next article.)

If you haven’t spent much time thinking about your backups before, I hope these thoughts encourage you to spend a little time thinking it through. There’s not much worse than losing something you value forever.

(Revised with minor editing updates.)


Other articles in Code for life:

Choosing an algorithm – benchmarking bioinformatics

Thoughts towards a human brain neural connection map

Coiling bacterial DNA

GoPubMed – PubMed browsing using ontologies

The roots of bioinformatics