What is a Gini?

By Matt Nolan 21/07/2014


Everywhere we turn nowadays, people are talking about Ginis. Sadly, they are not misspelling Genie; they are talking about Gini coefficients.

The interest in the Gini coefficient stems from its use as a measure of "inequality" in an income distribution, with books such as The Spirit Level making hay discussing the relationship between Gini coefficients and other social outcomes.

Now, I've spent a bunch of time talking about these claims (e.g. for The Spirit Level directly I wrote this and this), but I've never written anything directly about the Gini coefficient. There is a good reason for this: while I understand that it is a measure of dispersion in a distribution, I still had to (and still need to) learn more about this measure and about alternative measures.

Still, let me discuss what the Gini coefficient is, or at least one of the many different ways we can view it.

Forget for a second that the variable of interest is income, and just think of it as some data we are trying to describe. Many of you will be used to the idea of the mean and the variance of our data, and to how these sample statistics relate to the "population" values.

We can think of these measures more generally in terms of moments (which can be generated from the moment generating function): suitably centred and standardised, the first four moments describe the distribution's mean, variance, skewness, and kurtosis. This is very useful stuff, and can give us a solid understanding of what sort of data we have in front of us.
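To make these moments a little more concrete, here is a minimal Python sketch that computes the mean, variance, skewness, and excess kurtosis of a small data set. The function name `sample_moments` and the income figures are hypothetical, and the sketch assumes the simple divide-by-n estimators rather than the bias-corrected versions that statistical packages often report.

```python
from statistics import mean


def sample_moments(xs):
    """Return (mean, variance, skewness, excess kurtosis) using simple
    divide-by-n moment estimators (no small-sample corrections)."""
    n = len(xs)
    m = mean(xs)
    # Central moments of order 2, 3 and 4
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5             # standardised third moment
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # standardised fourth moment minus the normal's 3
    return m, m2, skewness, excess_kurtosis


# Hypothetical incomes: one large value gives a long right tail
incomes = [20, 25, 30, 32, 35, 40, 45, 50, 60, 200]
m, var, skew, kurt = sample_moments(incomes)
print(f"mean={m:.1f} variance={var:.1f} skewness={skew:.2f} excess kurtosis={kurt:.2f}")
```

The positive skewness is the numerical signature of the long right tail discussed next.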

However, when we want summary indicators, we know the data set we are looking at poses a bit of an issue: it is right skewed. In other words, our income distribution has a very long right tail, and (if the data were unimodal, which it isn't ;) ) we would typically have mode < median < mean. In this case, the mean may not appropriately describe the central tendency we are interested in, which is why it is so common to discuss median income when talking about income distributions.
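A quick illustration of why the median is preferred here, as a sketch with made-up numbers (the income list below is hypothetical, in thousands of dollars): with a long right tail the mean sits well above the median, and making the richest person richer moves the mean while leaving the median untouched.

```python
from statistics import mean, median

# Hypothetical incomes ($000s): mostly modest, a few very large,
# giving the long right tail described above.
incomes = [18, 22, 25, 27, 30, 31, 33, 36, 40, 45, 55, 70, 120, 250]

print("median:", median(incomes))
print("mean:  ", round(mean(incomes), 1))

# Raise only the top income: the mean jumps, the median does not move.
richer_top = incomes[:-1] + [1000]
print("median:", median(richer_top))
print("mean:  ", round(mean(richer_top), 1))
```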

Cool. Yet even so, it is still very common for us to turn around and think in terms of "variance" when discussing the statistical dispersion of a series, even when we know the data has a long right tail and we use the median as our summary statistic for central tendency.

Although the variance is commonly defined as the expected squared difference between a random variable drawn from the distribution and its mean, it can be rewritten without reference to the mean: it equals half the expected squared difference between two independent draws from the distribution. As a result, it is common to keep using the variance to describe statistical dispersion. However, there are a couple of reasons we may not want to use the variance when discussing income inequality as a type of statistical dispersion.
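One way to check that the variance does not need the mean is the pairwise identity Var(X) = ½E[(X₁ − X₂)²] for independent draws X₁, X₂. Its sample analogue for the divide-by-n variance is (1/(2n²)) Σᵢ Σⱼ (xᵢ − xⱼ)². The sketch below (the helper `variance_from_pairs` and the data are hypothetical) computes that sum over all ordered pairs and compares it with the standard library's `statistics.pvariance`.

```python
from itertools import product
from statistics import pvariance


def variance_from_pairs(xs):
    """Divide-by-n variance computed only from pairwise differences:
    (1 / (2 n^2)) * sum over all ordered pairs (i, j) of (x_i - x_j)^2.
    The mean never appears in the calculation."""
    n = len(xs)
    total = sum((a - b) ** 2 for a, b in product(xs, repeat=2))
    return total / (2 * n * n)


data = [3.0, 7.0, 8.0, 12.0, 20.0]
print(variance_from_pairs(data), pvariance(data))
```

Both numbers agree (up to floating-point rounding), because expanding the double sum gives 2n·Σx² − 2(Σx)², which is 2n² times the usual variance.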

  1. The most common complaint is that we are interested in "relative" inequality because it is "objective" (the inverted commas are deliberate: read this as saying the value judgments involved are clear, which makes analysis easier). As a result, we want our measure to be normalised (dimensionless, or scale invariant). In this case, it would be common to use the coefficient of variation (the standard deviation divided by the mean).
  2. The variance uses a different "distance function" from other measures of dispersion, and in some cases the distance function implied by those other measures is more appropriate.
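On the first point, a minimal sketch of what "scale invariant" means in practice: the variance changes when the same incomes are recorded in dollars instead of thousands of dollars, but the coefficient of variation (here assumed to be the divide-by-n standard deviation over the mean) does not. The helper name and the incomes are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def coefficient_of_variation(xs):
    """Standard deviation (divide-by-n) divided by the mean."""
    return pstdev(xs) / mean(xs)


incomes_thousands = [18, 22, 25, 30, 40, 55, 120]
incomes_dollars = [1000 * x for x in incomes_thousands]

# The variance depends on the units: it scales by 1000^2 here ...
print(pvariance(incomes_thousands), pvariance(incomes_dollars))
# ... while the coefficient of variation is unchanged.
print(coefficient_of_variation(incomes_thousands), coefficient_of_variation(incomes_dollars))
```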

An alternative in this case is to look at the expected absolute difference between two independent realisations of a random variable drawn from the given distribution. This is the Gini mean difference (GMD).
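As a sketch of the sample version of this idea: average |xᵢ − xⱼ| over every distinct pair of observations. This assumes the convention of dividing by the n(n − 1)/2 distinct pairs; some sources instead average over all n² ordered pairs (including each observation paired with itself), which gives a slightly smaller number. The function name and data are hypothetical.

```python
from itertools import combinations


def gini_mean_difference(xs):
    """Average absolute difference |x_i - x_j| over all distinct pairs i < j,
    the sample analogue of E|X1 - X2| for two independent draws."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


incomes = [10, 20, 30, 40]
print(gini_mean_difference(incomes))
```

Note that the differences enter as absolute values rather than squares, which is exactly the change of "distance function" mentioned above.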

Both the variance and the GMD can be written as weighted averages of the differences between adjacent (ordered) observations. However, it turns out that the distance function used by the variance puts greater weight on extreme observations than the one used by the GMD (the Gini coefficient being a normalised version of the GMD: the GMD divided by twice the mean). This is similar to the concern we had about using the mean instead of the median.
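For the GMD, the adjacent-observation form is easy to check. Sort the data; the gap between the k-th and (k+1)-th smallest values is crossed by k(n − k) of the pairs, so the GMD is a weighted sum of those gaps. The sketch below (hypothetical helper names, distinct-pairs convention assumed) confirms that form matches the pairwise definition, then normalises by twice the mean to get a Gini coefficient. It does not attempt the corresponding derivation for the variance.

```python
from itertools import combinations
from statistics import mean


def gmd_pairwise(xs):
    """Mean absolute difference over all distinct pairs i < j."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def gmd_adjacent(xs):
    """The same quantity written as a weighted sum of gaps between adjacent
    ordered observations: the k-th gap is crossed by k * (n - k) pairs."""
    s = sorted(xs)
    n = len(s)
    weighted = sum(k * (n - k) * (s[k] - s[k - 1]) for k in range(1, n))
    return weighted / (n * (n - 1) / 2)


def gini_coefficient(xs):
    """Gini coefficient as the GMD divided by twice the mean
    (assumes a positive mean)."""
    return gmd_pairwise(xs) / (2 * mean(xs))


incomes = [10, 20, 30, 40]
print(gmd_pairwise(incomes), gmd_adjacent(incomes), gini_coefficient(incomes))
```

The weights k(n − k) are largest for gaps in the middle of the distribution, which is one way to see why the GMD leans less on the extremes. Under this convention, a population where one person holds all the income gets a Gini of exactly 1, and equal incomes give 0.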

This alone doesn't tell us we should use the GMD instead of the variance. But in the same way that we may sometimes prefer the median to the mean as a summary of a distribution's central tendency, there are times when the GMD (and the Gini coefficient) gives us a more useful summary measure of dispersion than the variance.

For those interested in more detail about why we'd use the GMD instead of the variance in some economic applications, and for those who actually want to look through the working, I suggest having a peek at the chapter I linked at the start of this post.

Yawn

The key point I'm trying to get across here is that Gini measures are not mystical values that tell us what is fair or just. They are one particular measure of statistical dispersion, related to measures such as the variance. We can only interpret what they mean if we understand the mechanism behind them, namely why income is distributed in this way. This is the most important step, and yet it often seems to be the one people are happiest to throw together ad hoc ;)

In other words, the income distribution is the outcome of a process that involves individual choices and policy. We need to understand "how" this outcome arose in order to describe what the counterfactual position would be if we changed policy. This gives us our description of policy. Only then, by applying value judgments, can we say that one outcome is better than another. Merely chasing a measure of statistical dispersion misses the point that there may well be trade-offs between some outcomes and the inequality measure, and that some of the inequality captured by this measure may be "good".