Teaching bioinformatics at high school

By Grant Jacobs 01/11/2011

As many of my readers will be aware, my work area and research interests are in computational biology[1] – the wider field is better known as bioinformatics.

Biology has long been a core subject at high school.

Computer science is now well established too.

Bioinformatics takes computer science and applies it to biological questions. Given that the basic principles are straightforward and can be allied to basic biology, it seems reasonable that it could be taught at high school level too.

I can imagine doing this myself, not that I’m ever likely to be in a position to do it.

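Two articles in PLoS Computational Biology address teaching bioinformatics at high school:

  • Gallagher et al., A First Attempt to Bring Computational Biology into Advanced High School Biology Classrooms
  • Form and Lewitter, Ten Simple Rules for Teaching Bioinformatics at the High School Level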

Both are straightforward reads.

I particularly liked the first, as it tries to pick apart what students did and did not struggle with, and why.

A theme that emerges from this, and that resonates with me, is the need to focus on the concepts underlying the tools, not the tools per se. These are biological concepts at heart. Get these across and I can imagine that, as the authors point out, the conflict behind the paraphrased student quotes they offer would ease: ‘I’m interested in biology, not computer science. Why should I care about this stuff?’ or ‘But as long as these algorithms work, why do I need to understand how they work? I can just use them.’

I wonder a little whether BLAST is the best tool to present when first illustrating the basic biological concepts underlying sequence comparison.[2] The tools themselves change over time, but these concepts change more slowly. (I’d be interested in lecturers’ experiences here.) Perhaps it’s better to choose the methods and approach that best illustrate these concepts, rather than whichever happens to have the current mind-share? Less trendy for students, admittedly!

As an alternative (and fairly traditional) approach, I can imagine starting by looking at a protein family, say one known to be linked to a genetic disease, as a lead-in for later class studies. I’d start by showing what a protein looks like and how the active residues are arranged in space to act on the substrate, just in general terms (3-D graphics are a good attention-getter!), then show a multiple sequence alignment revealing how those few active residues are spaced along the linear sequence. You can expand on this to show the sequence-structure-function relationship in broad brush strokes. Next you could show how searching for just those few residues as a pattern will tend to find mostly members of that protein family. (Add residues typical of the overall fold if you want to challenge the better students.) Having established that a key concept is looking for the few residues that characterise an activity on a fold, introduce the older pairwise sequence alignment methods and how they search for a best path.
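A minimal Python sketch of the pattern-search idea, assuming a simplified PROSITE-style syntax that is translated into a regular expression. The pattern and the three sequences are hypothetical toy examples, not a real protein family, and real PROSITE features such as the '<' and '>' anchors are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Handles only a subset of the syntax (an assumption of this sketch):
      C            a specific residue
      x            any residue
      [ST]         any one of the listed residues
      {P}          any residue except those listed
      e(n), e(n,m) element e repeated n times, or n to m times
    Elements are separated by '-'.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(pattern: str, sequences: dict[str, str]) -> dict[str, list[int]]:
    """Return 1-based start positions of non-overlapping matches per sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in sequences.items():
        starts = [m.start() + 1 for m in regex.finditer(seq)]
        if starts:
            hits[name] = starts
    return hits


# Hypothetical motif and toy sequences, purely for illustration
toy_pattern = "C-x(2)-[ST]-x-H"
toy_sequences = {
    "toy_1": "MKCAGSLHWE",
    "toy_2": "MKCAGALHWE",
    "toy_3": "CPPTQHAACDDSEH",
}
print(prosite_to_regex(toy_pattern))           # C.{2}[ST].H
print(find_motif(toy_pattern, toy_sequences))  # {'toy_1': [3], 'toy_3': [1, 9]}
```

A single substitution (S to A in toy_2) is enough to lose the match, which is the point to make in class: the pattern depends on the few residues that matter.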

This seems like a lot of work, but it lays out the underlying biological concepts and, if done clearly, I think it should be approachable.[3] The authors do briefly mention protein structure and function under future directions. Personally, I’ve always felt that a basic understanding of molecular structures and their evolution ought to precede looking at sequence comparisons or phylogenetics.

I appreciate the articles’ aim of teaching the underlying thinking. Presenting algorithms ‘blindly’ as tools makes them seem ‘like magic’ and suggests that ‘the computer is always right’.[4] The basic concepts behind the algorithms are quite simple; it’s the details of the implementation, both the exact maths and the coding, that are complex. I also like that the articles link the work to larger issues such as disease.
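To show how simple the core "best path" idea of pairwise alignment is, here is a minimal Python sketch of global alignment by dynamic programming, in the style of Needleman-Wunsch. Each table cell holds the best score of any path reaching it, and a traceback recovers one best path. The scores (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming (Needleman-Wunsch style).

    Returns (score, aligned_a, aligned_b) for one best-scoring path.
    """
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for aligning residue a[i-1] with residue b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # diagonal: align two residues
                score[i - 1][j] + gap,             # down: gap in b
                score[i][j - 1] + gap,             # across: gap in a
            )

    # Trace back from the bottom-right corner to recover the best path
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("ACGT", "AGT")
print(best)    # 1
print(top)     # ACGT
print(bottom)  # A-GT
```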

The obvious catch with spending time on the algorithms, of course, is that it may make it hard both to show the underlying game being played and to relate it to the larger scene of genomes, disease, etc. The larger picture is important too, and should be linked to the students’ overall curriculum.

Whatever your views, these articles are good food for thought.


1. Let’s not bother with the distinction between the two here; it doesn’t matter for this article. The curious can track back to the articles I wrote about two years ago distinguishing the two.

2. The full reasons I am wary of BLAST for this and other purposes would take a long post and probably draw criticism. Maybe another day.

3. If I had more time I’d lay this out properly as notes. What it mostly needs is figures illustrating the concepts. Once seen visually, it’s straightforward; described in words, as here, it only makes sense if you already know what would be shown.

4. I sometimes worry that some members of the biological research community don’t always fully appreciate these either!

References

Gallagher, S., Coon, W., Donley, K., Scott, A., & Goldberg, D. (2011). A First Attempt to Bring Computational Biology into Advanced High School Biology Classrooms. PLoS Computational Biology, 7 (10). DOI: 10.1371/journal.pcbi.1002244

Form, D., & Lewitter, F. (2011). Ten Simple Rules for Teaching Bioinformatics at the High School Level. PLoS Computational Biology, 7 (10). DOI: 10.1371/journal.pcbi.1002243

Other articles in Code for life:

You still have to know how the tools work

In the near future: genome sequencing for the masses

Bioinformatics — QC, reproducible, statistical and sequence-oriented

Not Darwin’s tree of life

Developing bioinformatics methods: by who and how

Retrospective–The mythology of bioinformatics

Responses to “Teaching bioinformatics at high school”

  • Grant, I think that bioinformatics should be introduced at the level where it is currently taught, i.e. in tertiary institutions. My opinion is that kids probably need to familiarize themselves with fundamental mathematics/statistics and at best familiarize with algorithms/codings before they start on bioinformatics, where they can apply their maths/stats knowledge.

    If the subject is introduced without any quantitative component into it, then I think it would be good. It may act as a reverse catalyst. If kids are not interested in coding/maths/stats first, then doing bioinformatics at high school may drive them to take coding/maths/stats seriously so as to be able to do bioinformatic further up at advanced level.

  • Falafulu Fisi,

    My idea was to focus on the underlying stuff, which is really biological concepts, not bioinformatics in the ‘true’ analytical sense that scientists use.

    and at best familiarize with algorithms/codings before they start

    Perhaps you’re confusing using and developing? Research biologists know little about algorithms at a coding level, yet use bioinformatics. Certainly there is no need to learn coding itself to use the tools. Understanding the algorithms at a conceptual level is essential for their proper use, but for initial teaching I think it’s best that people understand the basic evolutionary & molecular biological concepts these are built on first – hence my focus. Those can be taught without much quantitative work if necessary, but putting numbers to it will show they are more than hand-waving or hearsay.

    (What’s not easy to see from my brief thoughts is that you can do a lot visually; I didn’t have time to put together illustrations to make that clear.)

    I like how these authors are trying to show how bioinformatics fits into the larger framework of modern biology and, as you say, it may give the kids something to aspire to.

  • Further to my comment above, I’m currently coaching three primary school kids in maths in the evenings. One of them, a year 6 pupil, has just started his exams for year-13 pure maths CIE (Cambridge International Exam), sitting two papers this year. He sat his first exam last Friday and his second is on 17th Nov.

    The other two kids are not sitting any CIE exams this year, but they’re being prepared for next year (year 11 and year 12 CIE maths). The year 6 boy is interested in physics/engineering and electronics (maybe robotics), since he loves his Lego and wants to build something similar, but as autonomous machines.

    At the beginning of this year I recommended that his parents buy him a student copy of the Maple computer algebra software, which can definitely help his maths learning, and they bought him one.

    At this stage he has only limited knowledge of where the concepts are applied in the real world, but there is no doubt that as he progresses he will learn more about that. One of his CIE pure maths papers (CIE P2) covers two topics familiar to the mathematicians who read this blog. The first is “trapezium” (or “trapezoid”) numerical integration, a technique for finding a numerical approximation to an integral when no closed-form solution can be found. The other is “fixed-point” iteration for finding the roots of an equation. For example, solve:

    x^2 + x*sin(x) - 4 = 0

    Rearrange the above into fixed-point form, x = g(x), in three different ways:

    #1) x = sqrt(4 - x*sin(x))

    and its iteration form is:

    x(n+1) = sqrt(4 - x(n)*sin(x(n)))

    Given an initial value of, say, x(1) = 0.5 (i.e. x = 0.5 at n = 1, the first iteration), the left-hand side x(n+1) converges to the root, and the iteration is terminated once the root is found. Convergence is reached when successive values of x(n+1) stop changing: after a certain number of iterations, x(n+1) stays the same, indicating that it is a root.

    #2) x = (4 - x^2)/sin(x)

    iteration form:

    x(n+1) = (4 - x(n)^2)/sin(x(n))

    #3) x = arcsin((4 - x^2)/x)

    iteration form:

    x(n+1) = arcsin((4 - x(n)^2)/x(n))

    He had only been using Maple for symbolic computations (differentiation, integration, solving algebraic equations, etc.), including plotting, and had never written a procedure (Maple’s term for an algorithm/routine) until the beginning of last month. I went through with him how to write a Maple procedure for numerical integration using the “trapezium” rule. Maple already has a built-in function called “trapezoid”, but the exercise was aimed at getting him to understand how to program.
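    For readers without Maple, here is a rough Python equivalent of such a trapezium procedure (a sketch, not the pupil's code). The test integrand, sin(x) over [0, pi], is an arbitrary choice, picked because its exact integral is 2, which makes the error easy to see.

```python
import math
from typing import Callable


def trapezium(f: Callable[[float], float], a: float, b: float, n: int = 100) -> float:
    """Approximate the integral of f over [a, b] with n equal-width trapezia."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # The two end points get weight 1/2; every interior point gets weight 1
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total


# The integral of sin(x) from 0 to pi is exactly 2
for n in (4, 16, 64):
    approx = trapezium(math.sin, 0.0, math.pi, n)
    print(n, approx, abs(approx - 2.0))
```

    Quadrupling n should cut the error by roughly a factor of 16, since the trapezium rule's error falls off as h^2 for smooth integrands.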

    Once he started to understand how to write a program in Maple (his first being the “trapezium” one), I encouraged him to write a root-finder procedure (i.e. root finding via iteration). He wrote his first one and it worked, except that a few extra lines of code had to be added to handle some conditions. One was detecting whether the iteration is converging or diverging: if it diverges, the iteration must be stopped after a preset number of iterations, say n = 100, otherwise the procedure will keep going forever. The other was detecting when the root has been found if it converges: a tolerance, say tol = 1E-6, is added, and the iteration stops when the difference between successive values is less than or equal to the tolerance; otherwise the iteration will never stop.
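    A minimal Python sketch of the root finder just described, applied to the example equation (again not the pupil's Maple code). It stops when successive values differ by at most tol = 1e-6, and gives up after max_iter = 100 iterations. Catching math errors is an addition in this sketch, so that form #3, whose first step from x(1) = 0.5 asks for arcsin(7.5), is reported as a failure instead of crashing.

```python
import math
from typing import Callable


def fixed_point(g: Callable[[float], float], x1: float,
                tol: float = 1e-6, max_iter: int = 100):
    """Iterate x(n+1) = g(x(n)) starting from x(1) = x1.

    Returns (x, iterations_used, converged).
    """
    x = x1
    for n in range(1, max_iter + 1):
        try:
            x_next = g(x)
        except (ValueError, OverflowError, ZeroDivisionError):
            # e.g. sqrt/asin outside their domain, or sin(x) == 0 in a denominator
            return x, n, False
        if abs(x_next - x) <= tol:  # successive values agree: treat as a root
            return x_next, n, True
        x = x_next
    return x, max_iter, False       # no convergence within max_iter: give up


def f(x: float) -> float:
    return x * x + x * math.sin(x) - 4


def g1(x: float) -> float:  # form #1
    return math.sqrt(4 - x * math.sin(x))


def g3(x: float) -> float:  # form #3
    return math.asin((4 - x * x) / x)


root, n, ok = fixed_point(g1, 0.5)
print(ok, n, root, f(root))  # converged, with f(root) close to 0
print(fixed_point(g3, 0.5))  # (0.5, 1, False): asin(7.5) is undefined
```

    Which rearrangement converges depends on how steep g is near the root; form #1 is gentle enough there, which is why it settles down from x(1) = 0.5.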

    Anyway, the whole point of my post is that, even though the “math-kid” above has little idea what the mathematics he is learning for his CIE maths papers is used for, he is being introduced to concepts that will enable him to be self-driven and to apply those concepts to a variety of disciplines in the future. This is why I believe kids should be taught maths/stats first; bioinformatics, physics, engineering or any related discipline can then follow their interests (given that they are already familiar with the fundamental concepts).

  • I haven’t had time to read this properly yet, but aren’t you now arguing against your own earlier thought? “If the subject is introduced without any quantitative component into it, then I think it would be good. It may act as a reverse catalyst. If kids are not interested in coding/maths/stats first, then doing bioinformatics at high school may drive them to take coding/maths/stats seriously so as to be able to do bioinformatic further up at advanced level.”

    Plenty of people prefer to see the objective before the tools, lest the tools (e.g. maths) seem abstract. I can’t see why it’s ‘wrong’ to present it this way if it’s possible to.

    Biology has a lot of concepts that are first descriptive (i.e. derived from observation) that are only later cast as mathematical models – they can be naturally approached that way.

    I can’t help thinking you may still be thinking of those who would develop bioinformatics methods, as opposed to use them. Or perhaps that your background is mathematical with (relatively) little biology.

  • Thanks for this! I will pass it along to my students in the Masters of Science Teaching program course I taught on the Human Genome and Bioinformatics!!!

    I am paying “blog calls” to each @scio12 attendee to say “Hi” and give your blog a shoutout on twitter (I’m @sciencegoddess). I look forward to meeting you in a few weeks!

  • Interesting thoughts about what you really want in schools when it comes to bioinformatics. Bioinformatics can help you to get some notion about the protein you’re interested in, before even starting to experiment.

    One of the most powerful tools for getting some notion of the protein you’re interested in is the hydrophobic moment plot developed by Eisenberg and co-workers. This approach gives you insight into particular regions of the protein’s primary sequence (whether they are surface-seeking and so on).
    See for details:
    Eisenberg, D., Schwarz, E., Komaromy, M. and Wall, R.: Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol., 179, 125-42 (1984).
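    The quantity behind the plot is simple to compute. As a rough sketch (not HydroMCalc or Keller's tool): for a segment with per-residue hydrophobicities H_1..H_N, the hydrophobic moment is the length of the vector sum of H_n * exp(i*n*delta), with delta = 100 degrees for an alpha-helix, and the moment plot places each window by its mean hydrophobicity and mean moment (divided by N). The scale values are left to the caller, and the 11-residue default window is an assumption to adjust.

```python
import math


def hydrophobic_moment(values: list[float], delta_deg: float = 100.0) -> float:
    """Length of the vector sum of H_n * exp(i * n * delta) over a segment.

    values    -- hydrophobicity H_1..H_N from a scale of your choice
    delta_deg -- angle between successive side chains (100 for an alpha-helix)
    """
    delta = math.radians(delta_deg)
    s = sum(h * math.sin(n * delta) for n, h in enumerate(values, start=1))
    c = sum(h * math.cos(n * delta) for n, h in enumerate(values, start=1))
    return math.hypot(s, c)


def moment_plot_points(values: list[float], window: int = 11,
                       delta_deg: float = 100.0) -> list[tuple[float, float]]:
    """(mean hydrophobicity, mean hydrophobic moment) for each sliding window."""
    points = []
    for start in range(len(values) - window + 1):
        seg = values[start:start + window]
        mean_h = sum(seg) / window
        mean_mu = hydrophobic_moment(seg, delta_deg) / window
        points.append((mean_h, mean_mu))
    return points


# Toy check: with delta = 180 degrees, alternating +1/-1 values add up fully
print(hydrophobic_moment([1.0, -1.0, 1.0, -1.0], delta_deg=180.0))  # close to 4
```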

    One disadvantage of this approach is that most papers using it present no tool with which you can repeat or apply the analysis yourself (often a specialist in maths or computer programming has developed something, or some expensive commercial program is used). To my knowledge there is only one freely available program that offers some opportunity to perform a hydrophobic moment analysis:
    The HydroMCalc program (see web-site http://www.bbcm.univ.trieste.it/~tossi/HydroCalc/HydroMCalc.html#sequence).
    Recently an interesting paper appeared that allows you to perform the hydrophobic moment analysis easily:
    Keller, RCA (2011b) New user-friendly approach to obtain an Eisenberg plot and its use as a practical tool in protein sequence analysis. Int J Mol Sci 12: 5577-5591
    Personally, I find this a very useful tool and I am planning to introduce it in my lessons.

  • Welcome Erica 🙂

    “in most papers using this approach there is no tool presented by which you can repeat or utilize this yourself”

    Funny you should mention this. I wrote a tool to do much simpler hydrophobicity plots as a Ph.D. student but never published it. It produced PostScript illustrations as output, which was ‘the’ vector graphics output of the day. (PostScript is in many ways a precursor to PDF, but it is a programming language with powerful plotting capabilities rather than just a file format.)
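    That tool isn't available, but a simple hydrophobicity plot of this kind is easy to sketch. The Python version below assumes the Kyte-Doolittle hydropathy scale (the scale used by the original tool isn't stated) and averages over a sliding window; the window length of 7, the toy sequence, and the crude text "plot" standing in for PostScript output are all arbitrary choices for illustration.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along seq.

    Element k averages residues k .. k + window - 1 (0-based).
    Unknown residue letters raise KeyError.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


def text_plot(profile: list[float]) -> None:
    """Crude text plot: one row per window (labelled by its 1-based first
    residue), bar length proportional to the absolute value."""
    for k, v in enumerate(profile):
        bar = ("+" if v >= 0 else "-") * round(abs(v) * 4)
        print(f"{k + 1:4d} {v:6.2f} {bar}")


# Toy sequence: a hydrophobic stretch flanked by charged residues
toy = "KKDERKLLIVAVLLIAFVKKEDRK"
text_plot(hydropathy_profile(toy, window=7))
```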

    Reading your comment makes me think I could have reprised this for others’ use (more likely re-written it from scratch, given the age of the code) but, then again, there are lots of things like that I could do if I didn’t need to make/find an income!

    I agree that bioinformatics/computational biology can be used to gain an understanding of molecular structure (and evolution and function…); there’s a fair bit you can do!

    (Now that your first comment has been approved, you can comment at will; first-time comments are screened as a measure against spam.)

  • Somewhat appropriately for this article, it’s come to my attention that Kathy Reichs’ young adult book, Virals, gives its characters a bioinformatics class assignment:

    “Our project was to compare human DNA to that of several animal species to determine which are our closest relatives.”

    One student takes on comparing the human cystic fibrosis gene with those of chimps, gorillas and orangutans. A second student tackles “Leptin counts for cows, dogs and horses”. The lead character (Tory) gets to “handle the bone-growth protein sequences”, including those for pigs, rabbits and sheep.

    Hmm. A homework blog post?