I’m using Luis’s data here, modified slightly to work in Stata: I replaced the NA cells with . so that Stata would read things as numeric variables rather than strings. My do file and dta file are up at Dropbox.
Like Luis, I started by generating a variable giving the proportion of students either meeting or exceeding the standard in each of reading, writing, and maths. I then ran a few simple linear regressions with analytic weights equal to the total school roll, since the dependent variable is a school-level average and different schools average over different numbers of students.
Covariates are decile, decile squared, the number of students per full time teacher equivalent, proportion of each of {Maori, Pacific, Asian, International, Melanesian [MELAA, which could be an acronym for Middle East, Latin America and Africa, says Kiwi Poll Guy in comments; he's likely right], Other} students (European dropped), indicator variables for each of {minor urban area (town), secondary urban area (suburb), main urban area (rural dropped)}, indicator variables for single sex boys and girls schools (coed dropped), an indicator variable for state schools (integrated schools dropped), an indicator for boarding schools, and indicator variables for each of the main types of school {composite, contributing, intermediate, secondary (full primary dropped)}.
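For anyone wanting to replicate the weighting outside Stata, an analytic-weight regression is just weighted least squares with the school roll as the weight. A minimal sketch in Python, with made-up stand-in numbers (the data below are hypothetical, not from the actual dataset, and only one covariate is shown):

```python
import numpy as np

# Hypothetical stand-in data: five schools, a pass-rate outcome, one covariate.
# In the real analysis the weight is the total school roll ([aweight=roll] in
# Stata) and the design matrix carries the full covariate list above.
roll = np.array([120.0, 450.0, 80.0, 900.0, 300.0])   # analytic weights
decile = np.array([2.0, 7.0, 4.0, 9.0, 5.0])
pass_rate = np.array([0.55, 0.78, 0.60, 0.85, 0.70])

# Constant, decile, decile squared -- matching the quadratic specification.
X = np.column_stack([np.ones_like(decile), decile, decile ** 2])
W = np.diag(roll)

# Weighted least squares: beta = (X'WX)^{-1} X'Wy. Analytic weights only
# rescale each observation's contribution, so the point estimates match
# Stata's [aweight=...] regression.
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ pass_rate)
print(beta)  # [constant, decile, decile squared]
```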
Results are in Table 1, below. But first, a caveat. As best I understand things, these grades are not moderated. So any effects here could be saying either that some schools do a better job of teaching, or that some schools engage in grade inflation.
Table 1: Full sample: Reading, Writing, and Math

| | (1) Reading | (2) Writing | (3) Math |
| --- | --- | --- | --- |
| Decile | 0.0485^{***} (6.53) | 0.0363^{***} (3.67) | 0.0437^{***} (5.27) |
| Decile squared | −0.00224^{***} (4.39) | −0.00110 (1.62) | −0.00195^{***} (3.41) |
| Students per teacher | 0.00282^{*} (2.41) | 0.00362^{*} (2.35) | 0.00215 (1.65) |
| Proportion of Maori students | −0.0664^{*} (2.20) | −0.0848^{*} (2.13) | −0.112^{***} (3.34) |
| Proportion of Pacific students | −0.115^{***} (3.52) | −0.114^{**} (2.63) | −0.0676 (1.86) |
| Proportion of Asian students | 0.0796^{**} (2.63) | 0.0787^{*} (1.97) | 0.00448 (0.13) |
| Proportion of International students | 0.290 (1.23) | 0.863^{**} (2.79) | 1.510^{***} (5.76) |
| Proportion of MELAA students | 0.0952 (0.61) | 0.0934 (0.45) | 0.0722 (0.41) |
| Proportion of students of Other ethnicity | 0.428 (1.35) | 1.043^{*} (2.50) | 1.040^{**} (2.92) |
| Minor Urban Area (Rural dropped) | 0.00817 (0.62) | 0.0104 (0.60) | 0.0257 (1.76) |
| Secondary Urban Area (Rural dropped) | 0.0189 (1.33) | 0.0283 (1.52) | 0.0416^{**} (2.63) |
| Major Urban Area (Rural dropped) | 0.00890 (0.80) | 0.00366 (0.25) | 0.0186 (1.50) |
| Boys school (coed dropped) | 0.101^{***} (3.90) | 0.118^{***} (3.48) | 0.205^{***} (6.87) |
| Girls school (coed dropped) | 0.125^{***} (5.60) | 0.181^{***} (6.16) | 0.142^{***} (5.44) |
| State school (integrated schools dropped) | 0.0246^{*} (2.49) | 0.0594^{***} (4.58) | 0.0219 (1.96) |
| Composite (Year 1–15) (Full Primary dropped) | 0.0221 (1.14) | 0.0566^{*} (2.23) | 0.0384 (1.79) |
| Contributing (Year 1–6) (Full Primary dropped) | 0.0127 (1.77) | 0.0345^{***} (3.64) | 0.0338^{***} (4.23) |
| Intermediate (Year 7–8) (Full Primary dropped) | 0.0673^{***} (6.07) | 0.0974^{***} (6.64) | 0.107^{***} (8.65) |
| Secondary (Year 7–15) (Full Primary dropped) | 0.0939^{***} (6.78) | 0.110^{***} (6.06) | 0.158^{***} (10.15) |
| Boarding school | 0.00755 (0.39) | −0.0716^{**} (2.79) | −0.0475^{*} (2.01) |
| Constant | 0.574^{***} (15.48) | 0.527^{***} (10.75) | 0.569^{***} (13.78) |
| Observations | 1006 | 996 | 1000 |
| Adjusted R^{2} | 0.528 | 0.467 | 0.532 |

t statistics in parentheses. ^{*} p < 0.05, ^{**} p < 0.01, ^{***} p < 0.001
Decile matters greatly. All else equal, a school one decile higher has about a four percentage point increase in pass rates. But, decile matters at a decreasing rate: moving from Decile 2 to Decile 3 correlates with a 3.3 percentage point increase in maths pass rates while moving from Decile 8 to Decile 9 only improves pass rates by one percentage point.
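The decreasing-rate arithmetic follows directly from the quadratic: the gain from moving up one decile is the linear coefficient plus the squared-term coefficient times the change in decile squared. A quick check using the rounded maths coefficients from Table 1 (so the results match the figures quoted above only up to rounding):

```python
# Marginal effect of moving up one decile under the quadratic fit for maths,
# using the rounded Table 1 coefficients: 0.0437*decile - 0.00195*decile^2.
b1, b2 = 0.0437, -0.00195

def decile_step_effect(d):
    """Change in predicted pass rate going from decile d to decile d + 1."""
    return b1 * ((d + 1) - d) + b2 * ((d + 1) ** 2 - d ** 2)

print(decile_step_effect(2))  # Decile 2 -> 3: roughly 0.034, i.e. ~3.3 points
print(decile_step_effect(8))  # Decile 8 -> 9: roughly 0.011, i.e. ~1 point
```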
Class size matters: schools with more students per teacher have higher pass rates. I suspect reverse causation here: for a fixed budget, those schools that are able to run larger classes are likely those that have fewer discipline problems and so are able to put those resources to other uses.
Ethnicity matters. A standard deviation increase in the proportion of Maori students reduces aggregate pass rates by 1.3 percentage points in reading and 2.2 percentage points in math. Similar trends exist for Pacific Island student ratios. I'd be pretty cautious in interpreting this one: if you run things decile-by-decile, the effects mostly disappear. The biggest negative effect seems to hold in high decile schools, but by the time you get to Decile 10 schools, the median school has only 5.9% Maori students. Results then may be a bit sensitive to a few outliers on the right-hand side. Like Luis, I'll refrain from doing much more until the official results come out.
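The "standard deviation increase" figures are just the coefficient multiplied by the covariate's standard deviation. The SD of roughly 0.20 for the Maori-student proportion used below is backed out from the quoted 1.3-point reading effect, not taken from the raw data, so treat it as an assumption:

```python
# Effect of a one-standard-deviation increase = coefficient * SD.
sd_maori = 0.20          # assumed SD of the Maori-student proportion (see note)
coef_reading = -0.0664   # Table 1, column (1); negative per the text
coef_math = -0.112       # Table 1, column (3); negative per the text

print(coef_reading * sd_maori)  # about -0.013, i.e. 1.3 percentage points
print(coef_math * sd_maori)     # about -0.022, i.e. 2.2 percentage points
```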
Single sex schools seem to do well; boarding schools seem to do poorly.
I generated residuals from each of the three specifications above. The residual tells us whether a school had a higher or lower pass rate than we would have expected given its characteristics. This either tells us how good (or bad) the school is at teaching, or how good (or bad) it is at grade inflation. Without external moderation, it's hard to tell. The residuals from the three specifications correlate strongly with each other: schools that do well (or that grade-inflate) tend to do so across the board. The lowest pairwise correlation was 0.57; the highest was 0.63. I averaged the residuals to get a composite score. A high residual means that the school's actual pass rate was higher than what we would have expected given its characteristics.
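The residual exercise above is straightforward to sketch in Python. The residual vectors below are simulated stand-ins (in the real exercise they come from Stata's predict with the residuals option after each regression); the shared "school quality" component is just there so the simulated residuals correlate the way the real ones do:

```python
import numpy as np

# Simulated stand-ins for the residuals from the three regressions.
rng = np.random.default_rng(0)
school_quality = rng.normal(0, 0.08, size=200)          # shared component
resid_reading = school_quality + rng.normal(0, 0.06, 200)
resid_writing = school_quality + rng.normal(0, 0.06, 200)
resid_math = school_quality + rng.normal(0, 0.06, 200)

resids = np.vstack([resid_reading, resid_writing, resid_math])

# Pairwise correlations between the three sets of residuals.
corr = np.corrcoef(resids)
print(corr.round(2))

# Composite score: average the three residuals school by school. A high
# composite means the school beat its predicted pass rates across the board.
composite = resids.mean(axis=0)
```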
I'm not confident enough in the model to put up my own league table of residuals. But I will put up a scatterplot of the residuals, which shows just how much school performance varies after we have corrected for decile, ethnicity, and everything else in the above model. That variation can point to a bad model, bad underlying data, strong differences in teaching quality across schools, or a combination of all three.