
McLanguage: Language as Evidence and Defining Linguistic Authorship
Joshua Richardson, 3rd Year: Forensic Linguistics
This essay will explore linguistic methods of determining authorship, analysing
their methodological effectiveness and shortcomings particularly within the
law, highlighting specific cases, and how linguistic evidence can support a
criminal trial.
Establishing the true authorship and origin of a text can be a necessary task in
proving one’s innocence or guilt in modern courts, calling upon linguists with
a skillset that readily adapts to the ever-changing idea of authorship. Indeed, as
society’s demands for administration, record-keeping, and self-expression shift
alongside technological advance, specialised linguistic analysis is
increasingly called for. The wider challenges linguists must face involve writing
style and genre, their sociocultural influences, context, and choice; as well as
more modern issues of who owns spoken or written language (Olsson &
Luchjenbroers, 2014). Authors - individuals, groups, or companies - can form
patterns of “recurrent” (McMenamin, 2010, p.488 in Coulthard & Johnson,
2010) language choice through their exposure to wider linguistic norms within
sociocultural contexts, in turn assuming their own stylised, linguistic identity
through conforming to or deviating from the established linguistic structures
acceptable within a community. Authorship thus appears to be a complex,
multifaceted product of one’s community and identity, and author
attribution requires reliable and robust approaches to determine, for example, a
suspect’s idiolect or written identity for use as evidence in court, or to aid
in assigning ownership of trademarks, slogans, or lexemes.
Like any other piece of evidence, linguistic evidence is naturally supplied
in tandem with other forms of evidence to build a greater picture, but
it can still prove to be a tipping point in legal cases. For
example, style markers within stylistic analysis are minute linguistic traits that
can be useful in determining idiolect through construction of an author profile
and the author’s exclusive typical written patterns and habits. This method can
utilise corpus linguistics and can be computer-assisted to estimate
the likelihood of an author’s attribution to a text
by comparing “consistency and distinctiveness” (Coulthard et al., 2017,
pp.157-158); i.e., deviations from a pre-established linguistic standard or
personal style that can then be presumed to be evidence of authorship. This
includes errors or non-standard linguistic deviations, particularly if the errors
are consistent (Coulthard, 2010).
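The idea of weighing a style marker by its consistency and distinctiveness can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of simple substring matching, and the example texts are all invented here and are not drawn from any cited study. Consistency is approximated as the share of a writer's texts that contain the marker, and distinctiveness as the share of other writers' texts that do not.

```python
def marker_rate(texts, marker):
    """Share of texts that contain the marker at least once (case-insensitive)."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if marker.lower() in t.lower())
    return hits / len(texts)


def profile_marker(author_texts, reference_texts, marker):
    """Return (consistency, distinctiveness) for one candidate style marker.

    consistency: how often the author's own texts show the marker.
    distinctiveness: how rarely other writers' texts show it.
    Both are rough proxies chosen for illustration only.
    """
    consistency = marker_rate(author_texts, marker)
    distinctiveness = 1.0 - marker_rate(reference_texts, marker)
    return consistency, distinctiveness


# Invented example texts; "definately" stands in for a consistent misspelling.
author_texts = ["i definately will call", "its definately late", "see you soon"]
reference_texts = ["i will definitely call", "definitely late",
                   "see you definately", "ok"]

consistency, distinctiveness = profile_marker(
    author_texts, reference_texts, "definately"
)
print(consistency, distinctiveness)
```

A marker that scores highly on both measures would be a stronger candidate for an author profile than one that is merely frequent, although real casework would need far more data than this toy sample.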
However, while stylistic analysis can aid authorship attribution, linguists must
often deal with written or transcribed evidence that varies widely in length, thus
affecting corpus reliability; be given little comparative content; or find
themselves being drip-fed texts by police forces during long-term
investigations or searches which uncover new material as time goes on,
ultimately hampering any progress made as new comparative data is
introduced. Additionally, the nature of handwritten evidence may mean an
additional step of digitisation is required, which also increases the duration of
an analysis and puts greater time pressures on linguists who must work within
the timeframes of how long suspects or prisoners can legally be held for
(Cotterill, 2010, in O’Keeffe & McCarthy, 2010). This ultimately leaves stylistic
analysis reliant on qualitative judgement by the forensic linguist, who is left
with little headroom, especially when there may be few linguistic databases to
consult due to the limited nature of the data (Chaski, 2005). Additionally,
owing to the relationship between authorship and genre within society - that
genre is a reactive force in the wake of sociocultural events, appearing transient
and unfixed - it appears difficult to use perceived differences “between authors’
works” (Olsson & Luchjenbroers, 2014, p.49) as a benchmark of strict stylistic
consistency. In essence, genre affects changes in authorship, and authorship
affects changes in genre - which makes stylistic analysis appear more open to
interpretation.
The sensitivity of qualitative authorship attribution can be observed clearly
within the 1953 case of Derek Bentley, where corpus techniques were first
observed within a legal context. The case resulted in Bentley’s execution before
the guilty verdict was posthumously overturned due to disputed linguistic
evidence. This evidence proposed inconsistencies pertaining to dual authorship,
where Bentley’s answers to police questions were transcribed as statements and
were ultimately misinterpreted. This affected both the series of events and the
semanticity of Bentley’s answers, particularly regarding what was inferred by
him. In this case, it was the response to a policeman’s line of questioning
regarding the gun, which was used as evidence to convict Bentley on the basis
that he knew of the gun but not a gun - a mere shift in article. Through later
comparative corpus analyses of witness and police statements and dictionaries,
it was deemed that Bentley’s syntactic usage, frequency, and placement of
words such as then did not correspond with his typical spoken style, nor with many
typical spoken usages of then. Instead, Bentley’s statements matched the
frequency and syntactic placement found within the institutional register of
trained police officers, implying authorship interference during the
transcription or interrogation process (Coulthard et al., 2016).
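The kind of comparison described here, the frequency of then and its position relative to the subject, can be sketched as follows. The two sample passages are invented for illustration and are not taken from Bentley's statement or from any police corpus; the function names and the choice of "I" as the subject are likewise assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def then_profile(text, subject="i"):
    """Count 'then' and where it sits relative to a given subject pronoun."""
    tokens = tokenize(text)
    bigrams = list(zip(tokens, tokens[1:]))
    then_count = tokens.count("then")
    return {
        "tokens": len(tokens),
        "then": then_count,
        # Relative frequency, so texts of different length can be compared.
        "per_100_words": 100 * then_count / len(tokens) if tokens else 0.0,
        "subject_then": bigrams.count((subject, "then")),  # e.g. "I then ..."
        "then_subject": bigrams.count(("then", subject)),  # e.g. "then I ..."
    }


# Invented passages: one imitating a formal report style, one casual speech.
statement_like = "I then caught a bus. I then walked to the shop. Sam then climbed up."
speech_like = "Then I got the bus and then I went home. We had tea."

print(then_profile(statement_like))
print(then_profile(speech_like))
```

In a real analysis, the same counts would be taken from the disputed statement and from reference corpora of police and lay speech, and it is the contrast between those profiles, not any single count, that would carry evidential weight.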
The Bentley case highlights the importance of recorded oral evidence, as analysing
transcribed statements can lead to wrongful convictions, even if
delivered through institutional agents such as the police. Indeed, prior to and
shortly after the Police and Criminal Evidence Act (1984), which enforced
mandatory tape recordings during police interviews, officers had been seen to
construct “dramatisations” (Coulthard, 2002 in Cotterill, 2002, p.23) during cases
based on interview notes and memory in an attempt to project authority and
assume control of the narrative by portraying a friendlier atmosphere,
especially where the suspect might have appeared unreasonable in the face of
fabricated police warmth. Without recordings, both linguists and the jury
would have been left with transcribed discourse that is of questionable authorship.
However, comparative concordancing shows that there is a defined difference
between institutionally trained, “top-down” (Olsson & Luchjenbroers, 2014,
p.78) language and casual, typical speech style markers; and, in the case of
Derek Bentley, justice could have been served more appropriately if greater
linguistic resources were allocated to fully verify the authorship of statements
through these stylistic markers.
When linguists are presented with more content to analyse, Grant (2012)
stresses the emergent importance of stylometric analysis versus stylistic analysis.
This is more of a statistical approach that encompasses differences “between
authors than within authors” (p.470); i.e., a quantifiable, frequency-based
analysis through having a larger background population sample, but one that
may not necessarily be more accurate or “well defined” (p.471) as features are
discovered rather than preidentified, and the approach harbours potential
biases that inherently exist within larger populations. This is the alternative to
creating a stylistic profile using only one writer’s text(s). Overall, stylometric
approaches mostly consist of lexical counts and frequencies, meaning
linguistic analysis - of phrasal verbs, grammatical habits, syntactic order,
and so on - mostly takes a back seat (Chaski, 2005). Corpus-based approaches can
still be used within stylometric analysis, even if count methods appear
somewhat superficial. But the absence of linguistic explanation and elaboration
within the approach can mean it is less explainable in court, and thus can have
less demonstrable usage or influence in legal settings (Wright, 2014).
Grant (2012) sought to combine stylistic and stylometric methodology by using
pairwise correlations instead of larger populations to determine the difference
between two different authors’ SMS messages - those of Amanda and
Christopher Birks - during the Birks murder case. Additionally, selection
biases were mitigated by inputting all words into the concordancer to
produce a “replicable process” (p.476) where similarities or deviations would be
correlated statistically between two authors, rather than being matched based
on the linguist’s personal intuition or interpretation. By performing non-parametric
Mann-Whitney U tests, the null hypothesis - that there was no significant
difference between the Birks’ authorship - was rejected, thus
demonstrating a replicable process.
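A pairwise comparison followed by a Mann-Whitney U statistic can be sketched as below. Several choices are assumptions made for this illustration and are not confirmed by the essay's sources: the use of Jaccard similarity over word tokens as the pairwise measure, the function names, and the invented messages. The U statistic is computed directly from its definition (the number of cross-group pairs in which the first value exceeds the second, with ties counted as half); a p-value would in practice come from a statistics library such as SciPy's `mannwhitneyu` or from published tables.

```python
from itertools import combinations, product


def features(message):
    """Presence/absence feature set: here simply the lowercased tokens."""
    return set(message.lower().split())


def jaccard(a, b):
    """Shared features divided by all features seen in either message."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def within_pairs(messages):
    """Similarity of every pair of messages by the same author."""
    sets = [features(m) for m in messages]
    return [jaccard(x, y) for x, y in combinations(sets, 2)]


def between_pairs(messages_a, messages_b):
    """Similarity of every cross-author pair of messages."""
    sets_a = [features(m) for m in messages_a]
    sets_b = [features(m) for m in messages_b]
    return [jaccard(x, y) for x, y in product(sets_a, sets_b)]


def mann_whitney_u(xs, ys):
    """U for xs: pairs where x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(xs, ys))


# Invented messages in two contrasting texting styles.
author_a = ["c u l8r xx", "u ok xx", "c u 2moro xx"]
author_b = ["see you later", "are you ok", "see you tomorrow"]

within = within_pairs(author_a) + within_pairs(author_b)
between = between_pairs(author_a, author_b)
u = mann_whitney_u(within, between)
print(u, "out of a maximum of", len(within) * len(between))
```

A U value close to its maximum indicates that same-author pairs are almost always more similar than cross-author pairs, which is the pattern that would count against the null hypothesis of no difference between the two writers.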
However, the author acknowledged the inability to display with full confidence
that Christopher had produced messages on Amanda’s phone (to confuse
authorities) because the two corpora that were devised contain messages from
each of their phones. This is problematic, as due to the statistical nature of the
analysis, it was deemed that some of Amanda’s composed messages on the day
of her death correlated with the stylistic choices of Christopher. Additionally,
when compared to an “independent corpus” (p.486) of a larger population’s
text messages, the pairwise approach is no longer viable because it is not known
how many people share Christopher’s or Amanda’s style and whether this
aspect is statistically relevant to the wider case. Further, while author style was
an explanation of the orthographical differences between message
compositions, context may also have been an influencing factor that is
inherently difficult to measure in terms of linguistic impact. For example, the
style marker of textual kisses may have varied due to a differing “relationship
with the recipient” (Grant & MacLeod, 2018, p.91). In other words, the
evidence garnered in this approach is chance-based rather than resting on strong
linguistic interpretation or conviction, and still requires some elaboration
regarding ambiguous pairs, especially when concerning timestamps of
messages before and after the murder where authorship is certainly unclear.
Thus, this evidence is instead used to support the case rather than to strongly
determine its outcome. At the least, it is somewhat reliable even if its linguistic
strength is questionable when dealing with “authorship disguise and imitation”
(Olsson & Luchjenbroers, 2014, p.78).
While stylistics and stylometric approaches aid authorship cases involving
murder and can also influence capital punishment outcomes, trademark
linguistics explores litigation cases where businesses feel as if their intellectual
property or brand has been threatened due to authorship disputes. This
includes breaches concerning orthography, phonetics (including accent and
pronunciation, case outcomes of which can differ between nations), and
semanticity; where spellings, pronunciations, and connotations of words are
seen to “act as psycholinguistic agents for distinguishing the trademark”
(Butters, 2010, p.355) that can then lead to a confusion of brand awareness.
Specifically, the linguist must verify real possibilities of “trademark dilution”
(Butters, 2007, p.508) through research of mark awareness and prominence
across wider society, generating a measurement of frequency and quantity of
disputed terms. This leads to the question of use, as excessive usage of
trademarks can lead, and has led, to “genericide” (p.514) amongst brands (such as
Kleenex), resulting in a detachment between a company and its product,
ultimately rendering the mark fit for universal application to other similar
products. Qualifications for the criteria of genericide can include proper noun
to mass noun conversion through a dropping of capital letters; and noun to
verb conversion, where the term is used for actions rather than an exclusive
item. Ultimately, courts use linguists to expedite the designation of “weak” and
“strong” (Shuy, 2012, p.450 in Tiersma & Solan, 2012) marks across a scale,
where stronger marks are protected by law, but weaker ones are not. These
classifications rely on clinical linguistic analyses, or testimonies where linguistic
expertise can verify, for example, whether homophones may dilute trademarks
due to dialectal or regional phonetic homogeneity (Tiersma & Solan, 2002).
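The two genericide indicators mentioned above, loss of capitalisation and conversion to a verb, lend themselves to a simple count over a corpus. The sketch below is illustrative only: "Brandex" is a made-up mark, the sentences are invented, and treating -ed and -ing endings as verb use is a crude assumption that a real study would replace with proper part-of-speech tagging.

```python
import re


def genericide_indicators(texts, mark):
    """Count capitalised, lowercase, and verb-like uses of a mark in texts."""
    # Match the bare mark plus a few common inflections, in any letter case.
    pattern = re.compile(
        r"\b" + re.escape(mark) + r"(es|ed|ing|s)?\b", re.IGNORECASE
    )
    counts = {"capitalised": 0, "lowercase": 0, "verb_like": 0}
    for text in texts:
        for match in pattern.finditer(text):
            form = match.group(0)
            # Note: sentence-initial capitals would inflate the first count.
            key = "capitalised" if form[0].isupper() else "lowercase"
            counts[key] += 1
            suffix = match.group(1)
            if suffix and suffix.lower() in ("ed", "ing"):
                counts["verb_like"] += 1
    return counts


# Invented sentences around a hypothetical mark.
sample = [
    "Pass me a Brandex.",
    "i need a brandex",
    "she bought brandexes",
    "He brandexed the spill.",
]

result = genericide_indicators(sample, "Brandex")
print(result)
```

A high proportion of lowercase and verb-like forms across a large, representative corpus would be one quantitative signal, among others, that a mark is drifting towards generic use.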
Some ethical risks surround trademark linguistics, such as the institution of law
and its subsequent courts dictating who or what can or cannot use terms or
marks, which may naturally affect the course of language shift on a
sociocultural level (Shuy, 2012, in Tiersma & Solan, 2012). On an ownership
level, questions must also be asked as to whether the state is utilising the
linguist to reinforce unfair or oppressive cases for institutional gain, which may
cause the linguist to question their own moral compass (Butters, 2007). These
challenges become realised when courts call for linguist impartiality during
difficult trademark dispute cases.
In the 1988 McDonald’s (restaurant) vs Quality Inn (hotel chain) case,
McDonald’s utilised authorship attribution to assert ownership of the Mc prefix
by demonstrating the shared morphological rule that was
present in the names in their main product line: Mc + lexeme. Due to the
popularity and reputation of McDonald’s, their great presence in the market
dictated that Quality Inn could not utilise the prefix Mc to market their own
products, chiefly due to the possible syntactic and pragmatic connection
between restaurants and hotels which customers might conceive; as well as
McDonald’s coining their own McLanguage concept for product awareness
purposes (Lentine & Shuy, 1990). Quality Inn utilised corpus linguistics to
locate usage of Mc outside of hotel and restaurant chains, with a defence that
insisted the prefix Mc had a wider, more established meaning that implied good
value across different industries. Nevertheless, it proved fruitless against market
research which dictated McDonald’s was at the commercial forefront of Mc
associations, and thus genericide was implied to be a direct risk to the
company’s products (Coulthard, 2005).
In conclusion, linguistic evidence can tilt a case in some ways, as in the
Birks murder case, but it can also be wildly misinterpreted by authorities, as
demonstrated in the unwarranted execution of Derek Bentley. Further, even if
linguistic evidence appears strong in theory, courts must listen to both sides in
any given case, as in McDonald’s vs Quality Inn; and, as in the cases dealing with
loss of human life, authorship attribution of marked lexemes or phrases is not a
perfect endeavour, instead acting only as a supporting element of any given
legal defence or construction. In essence, linguistic approaches to authorship
attribution can offer alternative perspectives to the courts, but viewing them in an
absolute manner is clearly a questionable route for juries to take.
References
Butters, R. R. (2007). A linguistic look at trademark dilution. Santa Clara Computer & High Tech. LJ, 24, 507.
Butters, R. R. (2020). Trademark linguistics: Trademarks: language that one owns. In The Routledge handbook of forensic linguistics (pp. 364-381). Routledge.
Chaski, C. E. (2005). Who’s at the keyboard? Authorship attribution in digital evidence investigations. International journal of digital evidence, 4(1), 1-13.
Cotterill, J. (2010). How to use corpus linguistics in forensic linguistics. In The Routledge handbook of corpus linguistics (pp. 578-590). Routledge.
Cotterill, J. (Ed.). (2002). Language in the legal process. Springer.
Coulthard, M. (2005). The linguist as expert witness. Linguistics and the Human Sciences, 1(1), 39.
Coulthard, M. (2010). Forensic Linguistics: the application of language description in legal contexts. Langage et société, (2), 15-33.
Coulthard, M., & Johnson, A. (Eds.). (2010). The Routledge handbook of forensic linguistics (pp. 473-486). London: Routledge.
Coulthard, M., Johnson, A., & Wright, D. (2016). An introduction to forensic linguistics: Language in evidence. Routledge.
Grant, T. (2012). TXT 4N6: method, consistency, and distinctiveness in the analysis of SMS text messages. JL & Pol'y, 21, 467.
Lentine, G., & Shuy, R. W. (1990). Mc-: Meaning in the marketplace. American speech, 65(4), 349-366.
Olsson, J., & Luchjenbroers, J. (2014). Forensic linguistics. Bloomsbury Publishing.
Tiersma, P. M., & Solan, L. M. (Eds.). (2012). The Oxford handbook of language and law. OUP Oxford.
Tiersma, P., & Solan, L. M. (2002). The linguist on the witness stand: forensic linguistics in American courts. Language, 221-239.
Wright, D. (2014). Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails (Doctoral dissertation, University of Leeds).