The best place to begin any quest is with a graph. Here is a graph showing all 104 of Chris’s innings in chronological order. It shows his scores when he was Out (red lines) and his scores when he was Not Out (blue lines). Funnily enough, he was Out and Not Out exactly 52 times each. We can see immediately that the peak of his batting performance was a score of 12 Not Out, which occurred approximately half-way through his career. His best form seems to be innings 30 to 34, where he went undefeated in 5 successive innings, scoring 17 runs. On the other hand, he had several bad runs where he was Out for zero (red marks below the zero line).

One of the interesting things is that his first 4 innings may have given a false impression of his batting prowess. In his first innings he scored 7, well above his eventual average of 2.36. In his 2nd and 4th innings he was 0 Not Out. In between he was 5 Not Out. This coincided with his peak average ever, 12 (orange triangles).

This allows us to note an important feature of statistics. Let us pretend for a moment that the average of 2.36 was “built-in” to Chris Martin from the beginning. This means that it was inevitable that after many innings he would end up with that average. But it is not inevitable that any one innings taken at random is equal to that mean. Importantly, with only a few samples (i.e. the first few innings) the average at that point can be a long way from the “real” average. This is a phenomenon caused by sampling from a larger population. It is why we have to be very cautious with conclusions drawn from a small sample.

For example, if General Practitioners throughout the country see on average 5 new leukemia cases a year, but we sample only three General Practitioners from Christchurch, who saw 8, 9 and 14, then we would be quite wrong to conclude that Christchurch has a higher average leukemia rate than other regions. We need a much larger sample from Christchurch to get a reasonable estimate of Christchurch’s average. There are statistical techniques for deciding what proportion of General Practitioners should be sampled and what the uncertainty is in the average we arrive at. Graphs also help: we can see with Chris that after only 10% of his innings he is within 1 of his average, and he stays that way throughout the rest of his career (orange triangles).
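For readers who like to tinker, here is a minimal Python sketch of the running average idea. It assumes the usual cricket definition of a batting average (total runs divided by number of dismissals) and that the opening score of 7 ended in a dismissal, which the post implies but does not state. The second function is a toy model, not Chris Martin's real record: it invents a batsman with a “built-in” average of 2.36 who is out half the time, with run scores drawn from a simple geometric distribution. The names `running_batting_average` and `simulate_innings` are made up for this illustration.

```python
import random


def running_batting_average(innings):
    """Return the batting average after each innings.

    `innings` is a list of (runs, was_out) pairs. The average is total
    runs divided by dismissals, so it is None until the first dismissal.
    """
    averages = []
    total_runs = 0
    dismissals = 0
    for runs, was_out in innings:
        total_runs += runs
        if was_out:
            dismissals += 1
        averages.append(total_runs / dismissals if dismissals else None)
    return averages


def simulate_innings(n_innings, true_average=2.36, p_out=0.5, seed=1):
    """Toy model (an assumption, not real data) with a built-in average."""
    rng = random.Random(seed)
    mean_runs = true_average * p_out   # runs per innings implied by the average
    q = mean_runs / (1 + mean_runs)    # chance of adding one more run
    innings = []
    for _ in range(n_innings):
        runs = 0
        while rng.random() < q:        # geometric number of runs, mean = mean_runs
            runs += 1
        innings.append((runs, rng.random() < p_out))
    return innings


# First four innings as described in the post: 7, 0*, 5*, 0*.
# Assumption: the 7 was a dismissal; the other three were Not Out.
first_four = [(7, True), (0, False), (5, False), (0, False)]
print(running_batting_average(first_four))  # [7.0, 7.0, 12.0, 12.0]

# Simulated career: the early averages can sit well away from 2.36.
career = running_batting_average(simulate_innings(104))
for n in (4, 10, 52, 104):
    print(n, career[n - 1])
```

Changing `seed` gives a different simulated career each time. The early averages jump around a great deal from seed to seed, while the averages after many innings settle close to the built-in value, which is the small-sample effect described above.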

That’s it for today. More on the legend of Chris Martin in the weeks ahead.

Tagged: batting, Chris Martin, cricinfo, Cricket, ESPN CricInfo, graphs, Statistics

Not just for scientists, either. Graphs are used ubiquitously, after all.

Online there is some excellent material on presenting data. One example is a handout for a presentation^{[1]}, *Communicating data clearly*, by Naomi Robbins, who writes *Effective Graphs* at Forbes blogs. (Some of us here take down poor science coverage in the media. Among other things Naomi writes, she takes down poor graphs in the media!)

Many forms of presentation fail to convey the data well; some are simply confusing. This handout is limited to key points, but I hope that it might encourage readers to think about their data presentation more carefully.

Those familiar with the topic will know that part of the issue is how we perceive visual elements such as angles and areas. In this way presentation of graphical information intersects with cognitive neuroscience – another example of practical applications from what might otherwise be perceived as ‘blue skies’ science. As you might expect, Naomi’s handout includes some of these aspects, too.

**Footnotes**

Another short post while I continue to tackle the tax return.

1. As part of NYC data week. Computational biologists might recognise this as including O’Reilly Media’s Strata + Hadoop World Conference.

**Other articles on Code for life:**

**On vetting TED(x) events – a suggestion**

**Thoughts on scientific abstracts also a science writing check-list**

**One example of why all those genomes from different species are useful to biologists**