By Guest Work 06/07/2017

This article was originally posted as part of Royal Society Te Apārangi’s Past and Future series, in which, as part of the Society’s 150th anniversary celebrations, early-career researchers are invited to share discoveries in their fields from days gone by or give us a glimpse into where their research may take us in the future.

This article is by Dr Andreea Calude, a Senior Lecturer in Linguistics at the University of Waikato.

“Whom” is out. “Ought to” is practically gone. “Hangry” is very much on the rise. Like the tide, language never stands still; it’s always on the move.


Using real human language extracts from New Zealand English obtained over the past 130 years, two researchers have just published a study [1] showing that words and their use are in a symbiotic, co-adaptive relationship. Words that are used frequently are becoming shorter. Conversely, words that are high in informational content are becoming longer, as are words that occur at the end of sentences. The dynamic nature of this process can only be captured through actual language data and computing power – I am notoriously bad at remembering what I had for breakfast the day before, let alone what I actually said or heard over the last 10 years. And 10 years in ‘language-change time’ may amount to no more than a small blip.

Luckily, we don’t need to remember, because looking inside our minds has just become easier by other means! One of the most intriguing human innovations – language – can be harnessed to take a peek at the secret life of the human brain. Owing to advances in computer technology, and to a paradigm shift over the last 20–30 years towards empirical investigation of language, linguistics (the science of language) can now illuminate patterns in our minds in an objective and quantifiable way. This new approach, called corpus linguistics, is fast becoming the key method in a field that was previously heavily introspective (so-called armchair linguistics).

From this approach, we have learnt that words and phrases are akin to highways for thought patterns. Unlike the traveller in Robert Frost’s poem “The Road Not Taken”, we are led by words and phrases down the same “roads” over and over again, rather than down novel paths. Looking at a dictionary might give you the impression that language is an infinite Lego set of word-combination possibilities. After all, we can understand sentences we have never heard before, and we can produce sentences we have never produced or heard before. But in reality, we seldom exploit these possibilities: most of what we say reuses familiar combinations.

Experience shapes our perception of the world

Experience shapes the way we view and understand the world, and the ways in which we talk about it, right from the beginning of our lives [2]. In turn, the ways in which we put reality into words betray habitual thought patterns. Certain words attract others, and such recurring combinations (termed collocations) behave more like single words than like separate ones: “butter” will be recognised, understood, and recalled faster when it follows “bread” than when it follows “dish”. In some cases, the relationship is directional: “bonsai” attracts “tree”, but “tree” occurs with many words other than “bonsai”. Attraction between words can be observed by looking immediately before or after a word, or within the larger text that it is part of. Words can also become “coloured” by their companions. For example, when we talk about “causing” something in English, it is overwhelmingly something negative: “cause an accident”, “cause a disaster”, “cause a commotion”. The same holds for “commit”: “commit a crime”, “commit an offence”, “commit an atrocity” – though there is no inherent reason why this should be so. While the scaffolding of language allows immense possibilities, actual use proves much more restricted.
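One simple way to make the directional attraction between “bonsai” and “tree” concrete is to compare two conditional probabilities computed from adjacent word pairs: how often “bonsai” is followed by “tree”, and how often “tree” is preceded by “bonsai”. The sketch below is only an illustration under stated assumptions: it uses a tiny invented corpus (not real New Zealand English data) and plain transitional probabilities. Corpus linguists use a range of association measures, and this is not necessarily the one used in the studies cited here.

```python
from collections import Counter

# A tiny, made-up corpus for illustration only (NOT real New Zealand English data).
corpus = """
the bonsai tree needs water . a bonsai tree is small .
my bonsai tree died . the apple tree is tall .
a tree fell on the road . the old tree was cut down .
we planted a tree . the bonsai tree was a gift .
""".split()

# Count single words (unigrams) and adjacent word pairs (bigrams).
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def forward_prob(w1: str, w2: str) -> float:
    """P(next word is w2 | current word is w1): how strongly w1 predicts w2."""
    return bigrams[(w1, w2)] / unigrams[w1]

def backward_prob(w1: str, w2: str) -> float:
    """P(previous word is w1 | current word is w2): how strongly w2 predicts w1."""
    return bigrams[(w1, w2)] / unigrams[w2]

print(f"P(tree | bonsai) = {forward_prob('bonsai', 'tree'):.2f}")   # 1.00
print(f"P(bonsai | tree) = {backward_prob('bonsai', 'tree'):.2f}")  # 0.50
```

In this toy data every “bonsai” is followed by “tree”, so the forward probability is 1.00, whereas only half of the “tree” tokens are preceded by “bonsai” (0.50). In a real corpus the backward figure would presumably be much lower still, since “tree” combines with a great many other words.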

Not every word combination is equal

Some are cognitively more appealing. For example, “nice clean plaster” is preferred to “clean nice plaster” (we prefer more discriminating adjectives to precede less discriminating ones), “a nice intelligent man” sounds better than “an intelligent nice man” (shorter adjectives precede longer ones), and “a good young green tree” seems more natural than “a green young good tree” (following a hierarchy in which value adjectives come before age adjectives, which in turn come before colour adjectives). The picture gets more complex still when we consider interactions between these constraints [3]. We encode certain events as unfolding spontaneously, of their own accord (events like “dry”, “melt”, “freeze”), while the grammar pushes us to treat others as being caused by an external agent (such as “break”, “open”, “split”), and this appears to hold true for many different languages [4]. The data tell us that some patterns are specific to individual languages, while others hold more generally, perhaps across all human languages.
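Two of the ordering tendencies described above, a semantic hierarchy and a short-before-long preference, can be sketched as a sort key. This is a deliberately rigid simplification under stated assumptions: the class names, their ranks, and the tiny hand-labelled lexicon (including labelling “good” as a value adjective and lumping “nice” and “intelligent” into the same hypothetical class) are our own, and letter count stands in for syllable count. The discriminating-adjective constraint is not modelled.

```python
# Hypothetical semantic classes and their ranks (lower rank = placed earlier).
CLASS_RANK = {"value": 0, "age": 1, "colour": 2}

# Toy, hand-labelled lexicon. The labels are an illustrative simplification,
# e.g. "nice" and "intelligent" are both lumped under "value" here.
LEXICON = {
    "good": "value",
    "nice": "value",
    "intelligent": "value",
    "young": "age",
    "green": "colour",
}

def order_adjectives(adjectives: list[str]) -> list[str]:
    """Sort adjectives by semantic class first, then by length in letters
    (a rough stand-in for syllable count), so shorter ones come first."""
    return sorted(adjectives, key=lambda adj: (CLASS_RANK[LEXICON[adj]], len(adj)))

print(order_adjectives(["green", "young", "good"]))  # ['good', 'young', 'green']
print(order_adjectives(["intelligent", "nice"]))     # ['nice', 'intelligent']
```

Real adjective ordering is probabilistic rather than categorical: constraints can pull in opposite directions, which is why corpus studies weigh many factors at once instead of applying a fixed ranking like this one.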

As one of the youngest varieties of English, New Zealand English is in a uniquely privileged position to embrace the empirical turn in linguistics. Being such a new dialect means that recordings of its origins are still within reach, giving us a window into our linguistic past. At the same time, we can also look to the future to examine the development of New Zealand English as it unfolds through a blending of cultures and identities. Over the next few years, my own research will analyse various corpora to study empirically how words of Māori origin are used in speech, in newspapers, and on the internet, in order to better understand why some Māori words are used more than others, by whom, and how this use might be changing over time. Do we tend to use Māori words that are shorter than their New Zealand English equivalents? And is that use about ‘us’ (for instance, signalling our own positioning with respect to Te Reo Māori) or about the person we are talking to (signalling affinity with an addressee who is of Māori ethnicity, or signalling distance from one who is unlikely to be familiar with Māori words)?

What does the future hold?

Challenges for future work using corpus linguistics methods will be to populate the corpus landscape with a variety of languages, to sharpen the tools available for building and studying corpora (both statistically and computationally), and to combine forces with the neighbouring fields of psychology, neurology, computer science, and biology in a bid to increase our knowledge of the human mind.

Like genes, languages are intricate pieces of the puzzle of human history. Studying aspects of language can have implications for deeper questions about who we really are as a species. How different are the languages we speak from each other? In what ways does growing up speaking one language shape our world view compared to speaking another? These questions remain – for now – unanswered, but we are only just getting started! 

External links:

ONZE Corpus –

Corpus Linguistics MOOC –


Reading list (in the order mentioned):

1. Sóskuthy, M., & Hay, J. (2017). Changing word usage predicts changing word durations in New Zealand English. Cognition, 166, 298–313.

2. Tomasello, M. (2009). The usage-based theory of language acquisition. In The Cambridge handbook of child language (pp. 69–87). Cambridge University Press.

3. Wulff, S. (2003). A multifactorial corpus analysis of adjective order in English. International Journal of Corpus Linguistics, 8(2), 245–282.

4. Haspelmath, M., Calude, A., Spagnol, M., Narrog, H., & Bamyaci, E. (2014). Coding causal–noncausal verb alternations: A form–frequency correspondence explanation. Journal of Linguistics, 50(3), 587–625.
