As many of my readers will be aware, my work area and research interests are in computational biology[1] – the wider field is better known as bioinformatics.

Biology has long been a core subject at high school.

Computer science is now well established too.

Bioinformatics takes computer science and applies it to biological questions. Given that the basic principles are straightforward and can be allied to basic biology, it seems reasonable that it could be taught at high school level too.

I can imagine doing this myself, not that I’ll ever be in the position of doing it.

Two articles in PLoS Computational Biology address teaching bioinformatics at high school: Gallagher et al. (2011) and Form & Lewitter (2011), both listed under References.

Both are straightforward reads.

I particularly liked the first in that it tries to pick apart what students were struggling with, or not, and why.

A theme that emerges from this, and that resonates with me, is the need to focus on the concepts underlying the tools, not the tools per se. These are biological concepts at heart. Get these across and I can imagine that the conflicted student views the authors paraphrase (‘I’m interested in biology, not computer science. Why should I care about this stuff?’ or ‘But as long as these algorithms work, why do I need to understand how they work? I can just use them.’) would ease, as the authors point out.

I wonder a little if BLAST is the best tool for a first illustration of the basic biological concepts underlying sequence comparison.[2] The tools themselves change over time, but these concepts change more slowly. (I’d be interested in lecturers’ experiences here.) Perhaps it’s better to choose the methods and approach that best illustrate these concepts, rather than whichever currently has the mind-share? Less trendy for students, admittedly!

As an alternative (and fairly traditional) approach, I can imagine starting with a protein family, say one known to be linked to a genetic disease, as a lead for later class studies. I’d start by showing what a protein looks like and how the active residues are arranged in space to act on the substrate, just in general terms (3-D graphics is a good attention-getter!), then show a multiple sequence alignment to reveal how those few active residues are spaced along the linear sequence. You can expand on this to show the sequence-structure-function relationship in broad brush strokes. Next you could show how searching for just those few residues as a pattern will tend to find mostly members of that protein family. (Add to this residues typical of an overall fold if you want to challenge better students.) Having established that a key concept is looking for the few residues that characterise an activity on a fold, introduce the older pairwise sequence alignment methods and how they search for a best path.
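To give a flavour of the pattern-searching step, here is a minimal sketch in Python. Everything in it is invented for illustration: the motif (a D, two arbitrary residues, an H, three arbitrary residues, then an S), the sequences and their names are all hypothetical and not taken from any real protein family or from the articles.

```python
import re

# Hypothetical motif: D, any 2 residues, H, any 3 residues, S.
# A real class exercise would take the spacing from a multiple alignment.
MOTIF = re.compile(r"D.{2}H.{3}S")

# Invented toy sequences (one-letter amino acid codes), for illustration only.
SEQUENCES = {
    "family_member_1": "MKTADLLHGASSLKV",
    "family_member_2": "MRSDAVHPQRSGLLE",
    "unrelated": "MKKLLPTAAGGWWYV",
}


def find_motif(sequences, pattern):
    """Return {name: start position} for each sequence containing the pattern."""
    hits = {}
    for name, seq in sequences.items():
        match = pattern.search(seq)
        if match:
            hits[name] = match.start()  # 0-based position of the first residue
    return hits


hits = find_motif(SEQUENCES, MOTIF)
for name, start in hits.items():
    print(f"{name}: motif starts at position {start}")
```

The two toy "family members" share little beyond the spaced residues, yet both are found while the unrelated sequence is not, which is the point students would need to take away.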

This seems like a lot of work, but it lays out the underlying biological concepts and I think that, if done clearly, it should be approachable.[3] The authors do briefly mention protein structure and function under future directions. Personally I’ve always felt a basic understanding of molecular structures and their evolution ought to precede looking at sequence comparisons or phylogenetics.

I appreciate the articles’ aim of teaching the underlying thinking. Presenting algorithms ‘blindly’ as tools will make them seem ‘like magic’ and suggest that ‘the computer is always right’.[4] The basic concepts behind the algorithms are quite simple; it’s the details of the implementation, both the exact maths and the coding, that are complex. I also like that they link the work to larger things like disease and so on.
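As an illustration of how simple the core idea can be, here is a minimal sketch of the ‘best path’ scoring used by the older pairwise global alignment methods (Needleman-Wunsch style dynamic programming). The scoring values (+1 match, -1 mismatch, -1 per gap) and the function name are my assumptions for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of a and b (toy scoring scheme)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACT"))
```

Each cell only ever looks at its three neighbours, so the whole ‘search for a best path’ is three comparisons repeated over a grid. This sketch returns the score only; recovering the alignment itself needs a traceback through the grid, which I have left out.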

The obvious catch to spending time on the algorithms, of course, is that it may be hard both to show the underlying game that’s being played and to relate it to the larger scene of genomes, disease, etc. The larger picture is important too, and should be linked to the students’ overall curriculum.

Whatever your views, these articles are good food for thought.


1. Let’s not bother with the distinction between the two here; it doesn’t matter for this article. The curious can track back to articles distinguishing the two that I wrote about two years ago.

2. The full reasons I am wary of BLAST for this and other purposes would take a long post and probably draw criticism. Maybe another day.

3. If I had more time I’d lay this out as a set of notes. What it mostly needs are figures illustrating the concepts. Once seen visually, it’s straightforward. Described in words, as I have here, you really need to already know what it is that would be shown.

4. I sometimes worry that some members of the biological research community don’t always fully appreciate these either!

References

Gallagher, S., Coon, W., Donley, K., Scott, A., & Goldberg, D. (2011). A First Attempt to Bring Computational Biology into Advanced High School Biology Classrooms. PLoS Computational Biology, 7(10). DOI: 10.1371/journal.pcbi.1002244

Form, D., & Lewitter, F. (2011). Ten Simple Rules for Teaching Bioinformatics at the High School Level. PLoS Computational Biology, 7(10). DOI: 10.1371/journal.pcbi.1002243

Other articles in Code for life:

You still have to know how the tools work

In the near future: genome sequencing for the masses

Bioinformatics — QC, reproducible, statistical and sequence-oriented

Not Darwin’s tree of life

Developing bioinformatics methods: by who and how

Retrospective–The mythology of bioinformatics