U-Lingua | Winter 2022, Issue 7 | The Change Issue


Anatomy of a Linguist

What Keeps Us Up at Night

The Philosophical Value of Big Data: A Plea for Intention Tracking

In this edition of Anatomy of a Linguist, columnist T. R. Williamson tries to shed new light on the use of big data in linguistic research. Understanding the philosophical potential of corpus linguistics may lead to a wealth of opportunities, academic and beyond.

Linguistics departments across the world have a secret. Somewhere in their buildings (perhaps in a basement, a disused teaching room, or an old office) they keep confidential equipment. You may have seen such areas yourself. Perhaps you noticed a suspicious DO NOT ENTER sign, spotted boarded-up windows, or even walked by as a wisp of ghostly, cold vapour seeped from underneath a door. What are these departments doing? What are they hiding? Had you followed this eerie, gaseous substance and entered where you were not permitted, you’d have uncovered their covert operations. For decades, linguists have been conspiring with government agents to cryogenically freeze language and store it away from prying undergraduate eyes. They hold words and sentences captive in massive vats, frozen with nitrous oxide, and run dangerous tests on them. From checking parts of speech to counting bigram frequency and everything in between, these evil language scientists poke and prod without their specimens’ consent. They must be stopped. And what is the name of this malevolent discipline? Corpus linguistics.


This is, of course, nonsense: corpus linguists are lovely, they do not conspire with the government, and words have no feelings. Yet certain parts of this depiction do parallel the work corpus linguistics actually involves. It is the field that concerns itself with the collection, organisation, and presentation of instances of language use for further analysis[1]. A corpus linguist might digitise old documents or record audio from real-world conversations with the aim of collecting that data and presenting it in a searchable format. Want to know how many times Donald Trump used the term ‘fake news’ in tweets from 2016–2020? You can make a Twitter corpus and find out[2]. Want to know how many times the word ‘bastard’ was used informationally or offensively throughout the entirety of Shakespeare’s works? You needn’t make your own corpus; it’s already been done[3]!

