
Over there, it is the 01-02-2010. (Not that they write the date numerically much, apparently.)

Over here, we’ll wait until the 1st of February.

But this feels like cheating, with the padding of the days and months with zeros… hence the ‘sort-of’.

This reminds me of dates in computer files, and gets me wondering (yet again) whether there have been any scientific data screw-ups from an American reading a British numerical-form date, or vice versa… It seems hard to imagine that there haven’t been any.
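To make the ambiguity concrete, here is a minimal Python sketch that reads the same numeric string under both conventions. The variable names are purely illustrative; the only assumption is that one reader expects month-day-year and the other day-month-year.

```python
from datetime import datetime

raw = "01-02-2010"

# Same characters, two conventions (both format strings are numeric only,
# so the result does not depend on the machine's locale).
as_american = datetime.strptime(raw, "%m-%d-%Y").date()  # month-day-year
as_british = datetime.strptime(raw, "%d-%m-%Y").date()   # day-month-year

print(as_american.isoformat())           # 2010-01-02
print(as_british.isoformat())            # 2010-02-01
print((as_british - as_american).days)   # 30
```

Both parses succeed without complaint, which is exactly the problem: nothing in the data says which one was intended.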

Just to get it off my chest and out of the way, I have to admit I don’t like the American date system. At least with day-month-year each part is a portion of the unit that follows it. Day of month of year is more natural to me. There, done.

The “reverse” order system (year-month-day) makes logical sense from the point of view of converting it to a continuous number range, as the elements are ordered from the most slowly varying on the left to the most rapidly varying on the right. (It’s the approach used in a lot of computer applications and is the basis of the international ISO 8601 standard.)
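One practical consequence of that ordering, sketched in Python with a handful of made-up dates: zero-padded year-month-day strings sort chronologically as plain text, while day-month-year strings do not.

```python
from datetime import date

# Illustrative dates only, deliberately out of order.
dates = [date(2010, 2, 1), date(2009, 12, 31), date(2010, 1, 3), date(2010, 1, 2)]

iso_strings = [d.isoformat() for d in dates]            # e.g. "2010-01-03"
dmy_strings = [d.strftime("%d-%m-%Y") for d in dates]   # e.g. "03-01-2010"

# The true chronological order, rendered in each notation.
chronological_iso = [d.isoformat() for d in sorted(dates)]
chronological_dmy = [d.strftime("%d-%m-%Y") for d in sorted(dates)]

# Plain text sorting agrees with time order only for year-month-day.
iso_sorts_correctly = sorted(iso_strings) == chronological_iso
dmy_sorts_correctly = sorted(dmy_strings) == chronological_dmy

print(iso_sorts_correctly)  # True
print(dmy_sorts_correctly)  # False
```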

Wearing my computer programmer’s hat, and using today’s date as an example, my preferred system for storing dates in (local) archival data files (as opposed to internal storage within a program) is: 3-JAN-2010, optionally with the dashes dropped if space is an issue.

It reads unambiguously, is easy to parse manually and is reasonably compact given you’re trying to keep it human-readable. (By manually parse, I mean without resorting to the C or POSIX notations.)
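As a rough illustration of how little code a hand-written parser for this notation needs, here is a Python sketch. The function names, the `MONTHS` table and the regular expression are all hypothetical choices for this example, and it assumes three-letter English month abbreviations and a four-digit year.

```python
import re
from datetime import date

# Hypothetical lookup table: three-letter English month abbreviations.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Day, month name, year; the dashes are optional so "3JAN2010" also matches.
_PATTERN = re.compile(r"(\d{1,2})-?([A-Z]{3})-?(\d{4})")


def parse_archival_date(text):
    """Parse strings such as '3-JAN-2010' or '3JAN2010' into a date."""
    match = _PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a day-MON-year date: {text!r}")
    day, month_name, year = match.groups()
    if month_name not in MONTHS:
        raise ValueError(f"unknown month name: {month_name!r}")
    return date(int(year), MONTHS.index(month_name) + 1, int(day))


def format_archival_date(d, dashes=True):
    """Write a date back out as '3-JAN-2010' (or '3JAN2010' without dashes)."""
    sep = "-" if dashes else ""
    return f"{d.day}{sep}{MONTHS[d.month - 1]}{sep}{d.year:04d}"


print(parse_archival_date("3-JAN-2010"))                         # 2010-01-03
print(parse_archival_date("3JAN2010"))                           # 2010-01-03
print(format_archival_date(date(2010, 1, 3)))                    # 3-JAN-2010
print(format_archival_date(date(2010, 1, 3), dashes=False))      # 3JAN2010
```

The explicit month table is a design choice: the `%b` directive of `strptime` depends on the machine's locale, whereas a fixed table reads the same file the same way everywhere.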

The ISO international standard (8601) reads, for today: 2010-01-03. While the leading year is a hint it’s in “descending” order, I prefer to make it explicit in archival files, e.g. 2010-JAN-03.
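Converting between the strict ISO form and the month-name variant is mechanical. The following Python sketch shows one way it might be done; `iso_to_named`, `named_to_iso` and the `MONTHS` table are names invented for this example, and `date.fromisoformat` requires Python 3.7 or later.

```python
from datetime import date

# Hypothetical lookup table: three-letter English month abbreviations.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def iso_to_named(iso_text):
    """'2010-01-03' -> '2010-JAN-03'."""
    d = date.fromisoformat(iso_text)
    return f"{d.year:04d}-{MONTHS[d.month - 1]}-{d.day:02d}"


def named_to_iso(named_text):
    """'2010-JAN-03' -> '2010-01-03'; raises ValueError on an unknown month."""
    year, month_name, day = named_text.split("-")
    month = MONTHS.index(month_name.upper()) + 1
    return date(int(year), month, int(day)).isoformat()


print(iso_to_named("2010-01-03"))   # 2010-JAN-03
print(named_to_iso("2010-JAN-03"))  # 2010-01-03
```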

I’m not that fussy about which order, 3-JAN-2010 or 2010-JAN-03, as long as the month is in characters to avoid confusion. The former reads more naturally to me, so it’s the one I tend to use locally, even if this technically means I’m flouting international standards.

I’m one of those people who like (archival) data to be “self-documenting”, that is, for it to be obvious what the data is just from looking at it. Ever tried looking at a data file that’s just a pile of numbers and tried to work out what the numbers actually “mean”? Especially months (or years) after you created the data, or if it’s someone else’s data.

It can also be useful for that dreaded manual debugging.

I know there are other solutions that we can argue about forever… and, as always, there are specific applications where I’d do otherwise myself.

Smart readers will note that my solution favours English over other languages. I’ll excuse myself for this inexcusable non-PC horror on the grounds that English is the current language of science… (More seriously, this is meant to be a local solution.)


Other posts on bioinformatics on Code for life:

Retrospective–The mythology of bioinformatics

Bioinformatics — computing with biotechnology and molecular biology data

Computational biology: Natural history v. explanatory models