Education regressions

By Eric Crampton 26/09/2012

Luis at Quantum Forest has been doing some great work with the schools data. I’m picking up on it with a few regressions. Raw performance data for the schools isn’t all that instructive as schools have very different raw materials with which to work. But it would be nice to know how well a school does given the decile and ethnic mix of students coming into the classroom. So let’s check that.

I’m using Luis’s data here, modified slightly to work in Stata: I replaced the NA cells with “.” (Stata’s missing-value code) so that Stata would read the variables as numeric rather than as strings. My do file and dta file are up at Dropbox.

Like Luis, I started by generating a variable giving the proportion of students either meeting or exceeding the standard in each of reading, writing, and maths. I then ran a few simple linear regressions with analytic weights equal to the total school roll, because the dependent variable is a school-level average and schools average over very different numbers of students.
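For readers who want to see the mechanics, here is a minimal Python sketch of those two steps: building the proportion at or above standard, and fitting a roll-weighted regression. The actual work was done in Stata; the school records and field names below are made up for illustration, and a single-regressor fit stands in for the full model. With weights equal to the roll, the point estimates should match what analytic weights give, but this sketch does not compute standard errors.

```python
# Hypothetical school records; field names and counts are illustrative only,
# not taken from the actual data set.
schools = [
    {"decile": 1, "roll": 150, "below": 60, "at": 70, "above": 20},
    {"decile": 4, "roll": 300, "below": 90, "at": 150, "above": 60},
    {"decile": 7, "roll": 450, "below": 90, "at": 250, "above": 110},
    {"decile": 10, "roll": 600, "below": 60, "at": 300, "above": 240},
    {"decile": 10, "roll": 30, "below": 15, "at": 10, "above": 5},  # tiny outlier
]


def pass_rate(school):
    """Proportion of assessed students at or above the standard."""
    assessed = school["below"] + school["at"] + school["above"]
    return (school["at"] + school["above"]) / assessed


def weighted_ols(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


deciles = [s["decile"] for s in schools]
rates = [pass_rate(s) for s in schools]
rolls = [s["roll"] for s in schools]

# Weighting by roll: large schools count for more than small ones.
a_w, b_w = weighted_ols(deciles, rates, rolls)
# Equal weights: every school counts the same, so the tiny outlier drags the fit.
a_u, b_u = weighted_ols(deciles, rates, [1] * len(schools))

print(f"weighted slope:   {b_w:.4f}")
print(f"unweighted slope: {b_u:.4f}")
```

The 30-student school has a noisy average, so giving it the same say as a 600-student school flattens the fitted slope; weighting by roll limits that.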

Covariates are decile, decile squared, the number of students per full-time teacher equivalent, the proportion of each of {Maori, Pacific, Asian, International, Melanesian [MELAA, which could be an acronym for Middle-East, Latin America and Africa, says Kiwi Poll Guy in comments; he’s likely right], Other} students (European dropped), indicator variables for each of {minor urban area (town), secondary urban area (suburb), main urban area} (rural dropped), indicator variables for single-sex boys’ and girls’ schools (co-ed dropped), an indicator variable for state schools (integrated schools dropped), an indicator for boarding schools, and indicator variables for each of the main types of school {composite, contributing, intermediate, secondary} (full primary dropped).

Results are in Table 1, below. But first, a caveat. As best I understand things, these grades are not moderated. So any effects here could be saying either that some schools do a better job in teaching, or that some schools engage in grade inflation.

Table 1: Full sample: Reading, Writing, and Math

                                                    (1)          (2)          (3)
                                                    Reading      Writing      Math

Decile                                              0.0485***    0.0363***    0.0437***
                                                    (6.53)       (3.67)       (5.27)
Decile squared                                      -0.00224***  -0.00110     -0.00195***
                                                    (-4.39)      (-1.62)      (-3.41)
Students per teacher                                0.00282*     0.00362*     0.00215
                                                    (2.41)       (2.35)       (1.65)
Proportion of Maori students                        -0.0664*     -0.0848*     -0.112***
                                                    (-2.20)      (-2.13)      (-3.34)
Proportion of Pacific students                      -0.115***    -0.114**     -0.0676
                                                    (-3.52)      (-2.63)      (-1.86)
Proportion of Asian students                        -0.0796**    -0.0787*     0.00448
                                                    (-2.63)      (-1.97)      (0.13)
Proportion of International students                0.290        0.863**      1.510***
                                                    (1.23)       (2.79)       (5.76)
Proportion of MELAA students                        0.0952       -0.0934      -0.0722
                                                    (0.61)       (-0.45)      (-0.41)
Proportion of students Other ethnicity              0.428        1.043*       1.040**
                                                    (1.35)       (2.50)       (2.92)
Minor Urban Area (Rural dropped)                    0.00817      -0.0104      -0.0257
                                                    (0.62)       (-0.60)      (-1.76)
Secondary Urban Area (Rural dropped)                -0.0189      -0.0283      -0.0416**
                                                    (-1.33)      (-1.52)      (-2.63)
Major Urban Area (Rural dropped)                    0.00890      -0.00366     -0.0186
                                                    (0.80)       (-0.25)      (-1.50)
Boys school (co-ed dropped)                         0.101***     0.118***     0.205***
                                                    (3.90)       (3.48)       (6.87)
Girls school (co-ed dropped)                        0.125***     0.181***     0.142***
                                                    (5.60)       (6.16)       (5.44)
State school (integrated schools dropped)           -0.0246*     -0.0594***   -0.0219
                                                    (-2.49)      (-4.58)      (-1.96)
Composite (Year 1-15) (Full Primary dropped)        -0.0221      -0.0566*     -0.0384
                                                    (-1.14)      (-2.23)      (-1.79)
Contributing (Year 1-6) (Full Primary dropped)      0.0127       0.0345***    0.0338***
                                                    (1.77)       (3.64)       (4.23)
Intermediate (Year 7 and 8) (Full Primary dropped)  -0.0673***   -0.0974***   -0.107***
                                                    (-6.07)      (-6.64)      (-8.65)
Secondary (Year 7-15) (Full Primary dropped)        -0.0939***   -0.110***    -0.158***
                                                    (-6.78)      (-6.06)      (-10.15)
Boarding school                                     -0.00755     -0.0716**    -0.0475*
                                                    (-0.39)      (-2.79)      (-2.01)
Constant                                            0.574***     0.527***     0.569***
                                                    (15.48)      (10.75)      (13.78)

Observations                                        1006         996          1000
Adjusted R2                                         0.528        0.467        0.532

t statistics in parentheses

* p < 0.05, ** p < 0.01, *** p < 0.001
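As a rough illustration of how the asterisks follow from the t statistics, the sketch below converts a t statistic into a two-sided p-value and then into stars. It assumes a normal approximation, which is not what Stata does (Stata uses the t distribution with the residual degrees of freedom). With around 1,000 observations the two are nearly identical, but borderline cases with |t| very close to 1.96 can land on different sides of the 5% line. The function names are my own.

```python
from math import erfc, sqrt


def two_sided_p(t_stat):
    """Two-sided p-value for a t statistic, using a normal approximation."""
    return erfc(abs(t_stat) / sqrt(2))


def stars(t_stat):
    """Significance stars using the table's thresholds: 0.05, 0.01, 0.001."""
    p = two_sided_p(t_stat)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# A few t statistics taken from Table 1.
for label, t in [
    ("Students per teacher, reading", 2.41),
    ("Decile squared, reading", -4.39),
    ("Students per teacher, math", 1.65),
    ("Pacific students, writing", -2.63),
]:
    print(f"{label}: t = {t}, p = {two_sided_p(t):.4f} {stars(t)}")
```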

Decile matters greatly. All else equal, a school one decile higher has pass rates about four percentage points higher. But decile matters at a decreasing rate: moving from Decile 2 to Decile 3 correlates with a 3.3 percentage point increase in maths pass rates, while moving from Decile 8 to Decile 9 improves pass rates by only one percentage point.
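The diminishing return comes from combining the decile and decile-squared coefficients. A small sketch of the arithmetic, using the rounded math coefficients from Table 1 (the function name is mine):

```python
def decile_step(b_decile, b_decile_sq, d):
    """Predicted change in pass rate when moving from decile d to d + 1.

    With a quadratic in decile, the change is
    b_decile * 1 + b_decile_sq * ((d + 1)**2 - d**2).
    """
    return b_decile + b_decile_sq * ((d + 1) ** 2 - d ** 2)


# Rounded math coefficients from Table 1, column (3).
B_DECILE, B_DECILE_SQ = 0.0437, -0.00195

for d in (2, 8):
    change = decile_step(B_DECILE, B_DECILE_SQ, d)
    print(f"Decile {d} -> {d + 1}: {100 * change:.1f} percentage points")
```

With the rounded coefficients this gives about 3.4 points for the step from Decile 2 to 3 (close to the 3.3 quoted above) and about 1.1 points for the step from Decile 8 to 9.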

Class size matters: schools with more students per teacher have higher pass rates. I suspect reverse causation here: for a fixed budget, those schools that are able to run larger classes are likely those that have fewer discipline problems and so are able to put those resources to other uses.

Ethnicity matters. A standard deviation increase in the proportion of Maori students reduces aggregate pass rates by 1.3 percentage points in reading and 2.2 percentage points in math. Similar trends exist for Pacific Island student ratios. I’d be pretty cautious in interpreting this one: if you run things decile-by-decile, the effects mostly disappear. The biggest negative effect seems to hold in high decile schools, but by the time you get to Decile 10 schools, the median school has only 5.9% Maori students. Results then may be a bit sensitive to a few outliers on the right hand side. Like Luis, I’ll refrain from doing much more until the official results come out.
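A standard-deviation effect is just the coefficient multiplied by the standard deviation of the covariate. The standard deviation of the Maori share is not reported above, so the sketch below works backwards from the quoted effects and the Table 1 coefficients to see what value they imply. The result is an inference from rounded numbers, not a figure taken from the data.

```python
def sd_effect(coef, sd):
    """Change in pass rate from a one-standard-deviation rise in a covariate."""
    return coef * sd


def implied_sd(effect, coef):
    """Standard deviation implied by a quoted effect and its coefficient."""
    return effect / coef


# Quoted effects (as proportions) and Table 1 coefficients for Maori share.
reading_sd = implied_sd(-0.013, -0.0664)
math_sd = implied_sd(-0.022, -0.112)

print(f"implied SD from reading: {reading_sd:.3f}")
print(f"implied SD from math:    {math_sd:.3f}")
```

Both columns point to a standard deviation of a little under 0.20, that is, roughly 20 percentage points of the school roll.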

Single sex schools seem to do well; boarding schools seem to do poorly.

I generated residuals from each of the three specifications above. The residual tells us whether a school had a higher or lower pass rate than we would have expected given its characteristics. This either tells us how good (or bad) the school is at teaching, or how good (or bad) it is at grade inflation. Without external moderation, it’s hard to tell. The residuals from the three specifications correlate strongly with each other: schools that are good (or grade inflate) tend to do so across the board. The lowest pairwise correlation was 0.57; the highest was 0.63. I averaged the residuals to get a composite score. A high residual means that the school’s actual pass rate was higher than what we would have expected given its characteristics.
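The residual steps can be sketched as follows: compute pairwise correlations between the three sets of residuals, then average them school by school into a composite. The residuals here are invented for six hypothetical schools; only the procedure mirrors what is described above.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Made-up residuals (actual minus predicted pass rate) for six schools.
residuals = {
    "reading": [0.08, -0.05, 0.12, -0.10, 0.02, -0.07],
    "writing": [0.05, -0.02, 0.15, -0.06, -0.03, -0.09],
    "math": [0.10, -0.08, 0.07, -0.12, 0.04, -0.01],
}

# Pairwise correlations between the three specifications.
for first, second in combinations(residuals, 2):
    r = pearson(residuals[first], residuals[second])
    print(f"{first} vs {second}: r = {r:.2f}")

# Composite score: the simple average of the three residuals for each school.
composite = [sum(vals) / len(vals) for vals in zip(*residuals.values())]
print([round(c, 3) for c in composite])
```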

I’m not confident enough in the model to put up my own league table of residuals. But I will put this up. This is a scatterplot of the residuals showing just how much school performance varies after we have corrected for decile, ethnicity, and everything else in the above model. That can point to its being a bad model, the underlying data being bad, strong differences in teaching quality across schools, or a combination of all three.

There are decile 1 schools providing pass rates twenty percentage points or more above what we’d expect, given their characteristics (that’s the 0.2 number on the y-axis); there is one decile ten school providing pass rates more than twenty percentage points below what we would expect given its characteristics. Differences in school performance simply do not come down only to decile. Decile’s the most important thing. But differences in performance among schools of the same decile by definition have to be about something other than decile. I can’t tell from this data whether it’s differences in stat-juking, differences in unobserved characteristics of entering students, differences in school pedagogy, or something else. But there’s something here that bears explaining.

Responses to “Education regressions”

  • I was wondering if you could perhaps explain things a little more for those of us who are not stats gurus. I understand that results in the table without asterisks are not significant. Those with asterisks are significant, with the more asterisks the better (1 asterisk => less than 1 in 20 chance of such a result occurring at random, 2 asterisks => 1 in 100, 3 asterisks => 1 in 1000). However, I do not understand what the actual values are and how to interpret them. The bigger the number in brackets the better?

  • The coefficients on continuous variables tell you how many percentage points (in decimal terms: 100% is 1.0) you expect the pass rate to rise with a unit increase in the continuous variable, like decile (but note that you have to adjust for the squared term on that one as well). The coefficients on indicator variables tell you the effect of “switching on” the indicator. So Intermediate schools have reading pass rates 6.7 percentage points below Full Primary schools, all else equal.

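    The number in the brackets is the t-statistic, which tells you how confident you can be that the coefficient isn’t really zero. The asterisks flag coefficients where the probability of seeing a t-statistic that large by chance, if the true coefficient were zero, is below 5% (one asterisk), 1% (two), or 0.1% (three).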
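To make the interpretation concrete, here is a small sketch that plugs a made-up school into the reading column of Table 1. The school is hypothetical: decile 5, 20 students per teacher, state, co-ed, rural, non-boarding, with an all-European roll, so every other term in the model drops out. Only the coefficients come from the table.

```python
# Reading coefficients from Table 1, column (1); only the terms that are
# non-zero for the hypothetical school are needed.
READING = {
    "constant": 0.574,
    "decile": 0.0485,
    "decile_sq": -0.00224,
    "students_per_teacher": 0.00282,
    "state": -0.0246,
    "intermediate": -0.0673,
}


def predicted_reading(decile, students_per_teacher, state, intermediate):
    """Predicted reading pass rate for a co-ed, rural, non-boarding school
    with an all-European roll (all other covariates set to zero)."""
    return (
        READING["constant"]
        + READING["decile"] * decile
        + READING["decile_sq"] * decile ** 2
        + READING["students_per_teacher"] * students_per_teacher
        + READING["state"] * state
        + READING["intermediate"] * intermediate
    )


full_primary = predicted_reading(5, 20, state=1, intermediate=0)
intermediate = predicted_reading(5, 20, state=1, intermediate=1)

print(f"Full primary: {full_primary:.3f}")
print(f"Intermediate: {intermediate:.3f}")
print(f"Gap:          {full_primary - intermediate:.3f}")
```

Switching on the intermediate indicator lowers the prediction by exactly the coefficient, 0.0673, which is the 6.7 percentage point gap mentioned above.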