I wrote last year about how, once you adjust for mismatches between teams and where the game has been played, there wasn't much evidence in the data of a general trend towards increasing first-innings scores. Taking all games from the start of the 2002 English season through to the end of the World Cup, and controlling for team ability, home-field advantage, and the ground being used, first-innings scores since Oct 2012 are only 12 runs higher on average than in the ten years before Oct 2012. (For data geeks, I describe the exact model at the bottom.)
The following graph illustrates the lack of a trend. The small red dots are the difference between the first-innings score and a prediction based on the team batting, the team bowling, which team (if any) was playing at home, the ground at which the game was played, and allowing for a 12-run premium for the current rules. The solid red dots are a 25-game smoothed moving average, to take out some of the random variation and make any trends clearer. Although there appears to be a bit of an upward trend over the period since October 2012, scores by the end of this period were still only 28 runs higher than in the 2002-2012 period, suggesting that 328 is the new 300!
Editor’s note: Originally published at Offsetting on 19 June, so before the weekend’s test.
But now look at the four blue dots. These are the out-of-sample prediction errors for the first four ODIs between England and NZ in the current series. These predictions take into account England's and New Zealand's recent (since Oct 2012) batting and bowling form, England's home-field advantage, and how high scores typically are at those four grounds. The predictions, actual scores, and prediction errors are as follows (the final row gives the prediction for the upcoming fifth ODI at Chester-le-Street, depending on whether England or NZ bat first):
| Ground | Predicted Score | Actual Score | Prediction Error |
| --- | --- | --- | --- |
| Edgbaston | 235 | 408 | 173 |
| The Oval | 256 | 398 | 142 |
| The Rose Bowl | 269 | 302 | 33 |
| Trent Bridge | 232 | 349 | 117 |
| Chester-le-Street | 225 (Eng) / 228 (NZ) | ? | ? |
The point here is that, rather than there having been a world-wide trend in the past few years that England have only now come to grips with, the current series has been extraordinary in every respect, even in comparison to recent history. So what is going on? I can think of five hypotheses:
1. I have made a massive coding error in my database.
2. There has been a structural break in conditions: the four English groundsmen have produced very different pitches than in the past, ones much more favourable to high scores.
3. There has been a structural break in team quality: both NZ and England have better batting and/or worse bowling in this series than they had in the recent past.
4. These four games have been black-swan events, and things will return to normal soon.
5. There has been a strategic mindset shift in both New Zealand and England.
When things look extraordinary, coding errors are always a good bet, and I wouldn’t rule this out, but the raw predictions don’t look too far out from my own intuition, so I don’t think this is the problem here.
I can’t comment on whether conditions were very different from usual in the four games so far, but I haven’t seen any commentary from England suggesting that the groundsmen have been producing untypical pitches, so I suspect hypothesis 2 is not the right one.
There is probably some truth to hypothesis 3. I don’t think the batting is too much different, but the bowling is quite possibly weaker. New Zealand have lost Vettori, Anderson and Milne from their World Cup bowling line-up, and have had Southee and Boult together for only one of the four matches. England have rested Anderson and Broad. Even so, I would put my money on the final two hypotheses explaining most of the data.
The idea that teams are not aggressive enough when batting is something Scott Brooker and I have been saying for a long time. Back when he was writing his thesis, Scott experimented with what an average team would be able to achieve if it applied the optimal level of aggression. Based on the strike rates and dismissal rates observed for batsmen in different game situations (e.g. conservative batting in the middle overs versus aggression in the death overs), he constructed a set of frontiers describing the trade-off between risk and return for typical batsmen in positions #1-#11, and then simulated optimal behaviour. He found that scores could be roughly 30 runs higher if batting teams were more aggressive, but that there would be more variance in scores and a higher probability of not batting out the overs. This was based on data from before 2007. It is likely that, under the new rules, the value of extra aggression is even higher.
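To make that risk-return trade-off concrete, here is a deliberately toy simulation. This is not Scott's actual model: every probability and payoff below is invented for illustration. A single aggression knob raises both the scoring rate and the per-ball dismissal probability, and we compare the distribution of innings totals at a conservative and an aggressive setting:

```python
import random
import statistics

def simulate_innings(aggression, balls=300, wickets=10):
    """Stylised 50-over first innings. 'aggression' in [0, 1] raises both
    the scoring rate and the per-ball dismissal probability. All numbers
    here are invented for illustration, not fitted to real data."""
    runs, out = 0, 0
    for _ in range(balls):
        if out == wickets:                                 # all out early
            break
        if random.random() < 0.015 + 0.02 * aggression:    # dismissed
            out += 1
        elif random.random() < 0.30 + 0.25 * aggression:   # scoring shot
            runs += random.choice([1, 1, 1, 2, 4, 6])
    return runs

for a in (0.3, 0.8):   # conservative vs aggressive
    totals = [simulate_innings(a) for _ in range(5000)]
    print(f"aggression={a}: mean {statistics.mean(totals):.0f}, "
          f"sd {statistics.stdev(totals):.0f}")
```

In this toy version the aggressive setting produces a higher average total, but also a wider spread and many more innings that end before the overs run out, which is the qualitative pattern Scott found.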
What I think we are seeing in the current series is two teams who keep pushing the boundaries of this approach, forcing the other team to react in kind, and so all kinds of previously unrealised potential that has existed for a while is now being revealed. Contrary to the conventional wisdom, I don’t think this has been New Zealand’s approach before now. Rather, I think they have emphasised retaining wickets during the middle overs in preparation for an all-out assault in the final 10. New Zealand’s famous aggression in the World Cup was mostly seen in its approach to bowling, putting an emphasis on wicket taking rather than containment.
If I am right that there has been a mindset change for both teams in the current series, I am mindful that Scott's conclusion was that the additional 30 runs on average would come alongside a big increase in variance. This brings me back to the black-swan hypothesis. We have seen scores more than 100 runs in excess of what recent form would have predicted on average. It is likely that each game involved a degree of luck: batsmen got away with taking risks on these occasions, but we could just as easily have seen some quite low scores. Even allowing for the fact that dropped catches are more likely when batsmen are hitting the ball hard, the catching in the current series does seem to have been below par. Realistically, 370 might be the new 300, but equally 180 might be the new 200.
Method:
My prediction model was based on a database of all non-rain-affected ODIs involving the top-8 countries since May 2002, using only games played on grounds that had hosted at least 10 matches (but also including Chester-le-Street, as that is the venue for the final ODI between England and NZ).
The model was an OLS regression of first-innings score on dummy variables for the batting-team country, dummy variables for the bowling-team country, a dummy variable for each of the 53 grounds, a dummy variable for when the batting team was playing at home, and another for when the bowling team was playing at home. Finally, I added a dummy variable for matches played since Oct 2012, and interactions of this recent-era dummy with the batting-team and bowling-team dummies.
These interaction terms mean that only post-Oct-2012 data is used to determine the effect of team ability on scores; the only reason for pooling the data with the pre-Oct-2012 era is to provide enough data to estimate ground effects. Essentially, the model assumes that the relative impact on scores of playing at a particular ground, and the relative impact of home-field advantage, have not changed from before Oct 2012 to after.
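For concreteness, here is a minimal sketch of that specification in Python using statsmodels' formula API. The data frame and its column names (score, bat_team, bowl_team, ground, bat_home, bowl_home, post_oct2012) are my assumed reconstruction, not the actual code behind the model:

```python
import statsmodels.formula.api as smf

# df is assumed to have one row per match with columns:
#   score        - first-innings total
#   bat_team     - batting-team country
#   bowl_team    - bowling-team country
#   ground       - ground name (53 categories)
#   bat_home     - 1 if the batting team is at home
#   bowl_home    - 1 if the bowling team is at home
#   post_oct2012 - 1 for matches played since Oct 2012

model = smf.ols(
    "score ~ C(bat_team) + C(bowl_team) + C(ground)"
    " + bat_home + bowl_home + post_oct2012"
    " + post_oct2012:C(bat_team) + post_oct2012:C(bowl_team)",
    data=df,
).fit()
print(model.summary())
```

The interaction terms in the last two parts of the formula are what let post-Oct-2012 team effects differ from the earlier era while the ground and home-advantage effects are pooled across both.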
In the 25-game moving average shown in the chart above, the data are split at the Oct 2012 structural break, so that the smoothed line before the break is not influenced by games played after the break, and vice versa.
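A sketch of that split smoothing, assuming a hypothetical pandas Series of prediction errors indexed by match date:

```python
import pandas as pd

def split_moving_average(residuals: pd.Series, break_date="2012-10-01",
                         window=25) -> pd.Series:
    """25-game moving average of prediction errors, computed separately
    on each side of the Oct 2012 rule change so that neither era's
    smoothing window borrows games from the other."""
    before = residuals[residuals.index < break_date]
    after = residuals[residuals.index >= break_date]
    return pd.concat([
        before.rolling(window).mean(),
        after.rolling(window).mean(),
    ])
```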