By Andreea Calude 23/04/2019

We are sitting on his bed, and my son pulls out all the different ponies – they are of different colours and sizes – and then he says “I know, I know a good idea [ideas are things that he knows, not things that come to him], let’s organise the ponies by colour”.

My best friend is suitably impressed, and I laugh – yes, my four-year-old is already on his way to organising the world. Rather like a good little linguist – right? Well, not necessarily …

The Linguistic Unit: units everywhere

The first lecture I ever took in linguistics blew my organisational mind away. I can still see the wonderfully entertaining and knowledgeable late Scott Allan pacing up and down a small stage underneath the Auckland University library, telling us how sounds are formed in English. If anyone could make sounds exciting, it was Scott.

He explained that we use various parts of our mouths together with the air pushed out of our lungs to form individual acoustic contrasts – the gateway to all spoken language. From mere puffs of air and skilful management of our articulators, /b/ came out differently to /p/, /d/ sounded distinct from /t/, and /k/ contrasted with /g/. For decades, these minimal contrasts have enabled English speakers to distinguish between bats and pats, darts and tarts, crates (spelled with a ‘c’ but pronounced as /k/) and grates. Neatly arranged in our mouths, from the tips of our lips all the way to the back of our throats, contrasts like these allow the mighty spoken English language to take acoustic shape.

By the time we got to morphology, the picture became even neater. From the ‘sound’ up, we would build the English language in that introductory course to linguistics (LING101), one unit at a time. Bits of words were put together to make new words. -ER attached to verbs like “read” to make a person who “reads” (reader), or like “walk” to make the person who “walks” (walker), or “sing” to make singer, and on it went; teacher, worker, writer, player, slayer, cutter, scriber. Productively. Efficiently. Once we got to words, it was only a small step to the scaffolding of syntax. It just got better.

Units! It was all about units of language. Language became an organised system divisible into (more or less clearly) identifiable, logically arranged units, like a perfectly written IKEA kit-set instruction booklet (you can probably tell I have never put an IKEA kit-set together myself, but you get the gist). This system of units is convenient for analysts (linguists), but also for learners because it enables the acquisition of the code. And so, I was hooked. I have been dealing in linguistic units ever since. If you have taken my classes, so have you.

But what if we can do it without units?

And then I read this paper (Ramscar & Port, 2015). And now, a new way of looking at the system that I have been busy splitting into units of various kinds is beginning to emerge. Now, I feel like someone opened a different door to the room I have been living in for the past twenty years. Suddenly, I see a different perspective on that familiar space in which I could find my way around so easily, even in the dark (OK, on good days).

The idea is simple but elegant and seductively clever, clever enough that it might just be right! Even though, as the authors humbly admit, this view of language is neither entirely new, nor entirely their own (like much knowledge, it has incrementally accumulated through bits and pieces from various scholars and theories), reading their paper has me quite excited. They managed to present these ideas in a way which has struck a chord with me, which is why I decided to share it here!

What if I told you, … they say … units have nothing to do with it? Now, if you asked me to summarise the major goals of linguistics I would struggle immensely (well, duh!), but somewhere high on that list, I’d have to put (1) understanding how speakers decode messages, and (2) understanding how learners acquire languages. Despite centuries of research, linguists are yet to reach any real consensus with respect to these (and in fact, many other) questions.

So, put your units away, the authors say! To answer the two questions above, put away your categorisation tools and do away with your neat phoneme inventories, morpheme taxonomies and syntactic constructions. They are not needed this time, because units will not help you understand how we decode messages or how we learn language. Eh, say what?

Instead, decoding language is about minimising and dealing with uncertainty by eliminating irrelevant contrasts; by discarding possible, but irrelevant interpretations of a given message. Using Shannon’s deductive model of communication (Shannon, 1948), Ramscar and Port explain that natural language parsing and language learning can be understood in a very similar way. What Shannon’s model of information theory and natural language parsing have in common is that neither system tries to build a full representation of the information presented, but instead, they try to choose the correct interpretation from a set of possible alternatives.
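To make the elimination idea concrete, here is a minimal sketch in Python. It is a toy illustration, not code or data from Ramscar and Port: the four-word lexicon, the equal starting probabilities and the two cues (first sound made with the lips; first sound voiced) are all assumptions chosen for simplicity, spelling stands in for sound, and the function names entropy and eliminate are hypothetical. Each cue discards the candidates it rules out, and Shannon's entropy measures how much uncertainty is left.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def eliminate(candidates, is_compatible):
    """Drop the candidates a cue rules out, then renormalise the rest."""
    kept = {word: p for word, p in candidates.items() if is_compatible(word)}
    total = sum(kept.values())
    return {word: p / total for word, p in kept.items()}

# Assumption: a toy lexicon of four equally likely words.
candidates = {"bat": 0.25, "pat": 0.25, "dart": 0.25, "tart": 0.25}

# Cue 1 (assumed): the first sound is made with the lips (b or p).
step1 = eliminate(candidates, lambda word: word[0] in "bp")

# Cue 2 (assumed): the first sound is voiced (b or d).
step2 = eliminate(step1, lambda word: word[0] in "bd")

for label, dist in [("start", candidates), ("after cue 1", step1), ("after cue 2", step2)]:
    print(f"{label}: {sorted(dist)} -> {entropy(dist.values())} bits of uncertainty")
```

The hearer starts with 2 bits of uncertainty, is left with 1 bit after the first cue and with none after the second. At no point does the sketch build a representation of "bat"; it only rules out everything that "bat" is not.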

Put simply, by eliminating what someone is not trying to say from a series of possible options, hearers figure out what someone is actually trying to say (the intended message). It’s all done by maximising contrasts for the things that we talk about a lot, while minimising the effort needed to detect those contrasts – and here we have the ingredients of an efficient language. This is, for instance, why Zipfian distributions can be found everywhere in the language system, e.g., a small handful of words are used very frequently while most others are hardly used at all. Predictability ranks immensely high in this system because predictable contrasts are learned through exposure – that sounds pleasingly familiar to emergentists and usage-based linguists (yes, we all nod in agreement) – but the discrete units themselves are not (again, eh?).
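For a feel of what "Zipfian" means in practice, here is a small Python sketch. It assumes an idealised Zipf distribution, in which the probability of the word at frequency rank r is proportional to 1/r, over a hypothetical vocabulary of 10,000 words. Real corpora only approximate this shape, and neither the vocabulary size nor the function name zipf_probabilities comes from the paper.

```python
def zipf_probabilities(vocab_size):
    """Idealised Zipf distribution: probability of rank r is proportional to 1/r."""
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Assumption: a hypothetical vocabulary of 10,000 word types.
probs = zipf_probabilities(10_000)

# Share of all word tokens taken by the 10 most frequent words.
top10_share = sum(probs[:10])

# Share of all word tokens taken by the 5,000 rarest words together.
bottom_half_share = sum(probs[5_000:])

print(f"top 10 words:       {top10_share:.1%}")
print(f"rarest 5,000 words: {bottom_half_share:.1%}")
```

Under these assumptions, the ten most frequent words account for roughly 30% of everything said, while the rarest 5,000 words together account for only about 7%.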

What is more, this maximisation of contrasts and elimination of irrelevant alternatives can explain a lot about language. Take a real example: irregular past tense forms. Ever noticed how it is precisely the commonly used verbs that are irregular in English? (So have many others, among them Lieberman et al., 2007; I could cite many more here.) We often thought that the irregular forms are recalcitrantly sticky because our minds remember them: they are so frequently used that we can’t help but remember them; we are not trying to, we just can’t help it (it turns out that the principle of “do not change what is frequent”, or immutable frequent forms, holds more broadly than just past tense forms in English; see work by Pagel et al., 2007, in the same issue). Like a bug in the system, we use and re-use irregular past tense forms, so they stick and get passed on to the next generation of speakers, and now no one knows why on earth the past tense of go is went.

But no, quite the opposite: Ramscar and Port argue that the irregular forms are well-adapted, selected items which have evolved precisely to make language more efficient (not a bug but a feature). Irregular forms are irregular precisely because they are used so much, and it makes sense to have a salient and easily identifiable contrast between these forms, one that can be identified quickly. Of course, it would be silly to only have irregular past tense forms because that would lead to cognitive insanity (irregular past tense forms are bad enough for second language learners, thank-you-very-much); yet devising irregular forms for the really frequent verbs makes perfect linguistic sense.

The ideas presented in this article raise so many questions for me, and I am definitely not ready to abandon my cherished units of analysis, which have helped organise so much grammatical chaos in my head, but I am still quietly enjoying my mind-blown moment.


Lieberman, E., Michel, J. B., Jackson, J., Tang, T., & Nowak, M. A. (2007). Quantifying the evolutionary dynamics of language. Nature, 449(7163), 713.

Pagel, M., Atkinson, Q. D., & Meade, A. (2007). Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature, 449(7163), 717.

Ramscar, M., & Port, R. (2015). How spoken languages work in the absence of an inventory of discrete units. Language Sciences, 53, 58–74.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.