from The Swansea Applied Linguistics Journal Issue 5 (Summer, 2022)

by lingjournalswansea

McLanguage: Language as Evidence and Defining Linguistic Authorship

Joshua Richardson, 3rd Year: Forensic Linguistics

This essay will explore linguistic methods of determining authorship, analysing their methodological effectiveness and shortcomings, particularly within the law, and highlighting specific cases that show how linguistic evidence can support a criminal trial.

Discovering the true authorship and origin of a text can be a necessary task in proving one’s innocence or guilt in modern courts, calling upon linguists with a skillset that readily adapts to the ever-changing idea of authorship. Indeed, as society’s demand for administration, record-keeping, and self-expression shifts alongside technological advance, specific linguistic analysis is increasingly called for. The wider challenges linguists must face involve writing style and genre, their sociocultural influences, context, and choice, as well as more modern issues of who owns spoken or written language (Olsson & Luchjenbroers, 2014). Authors - individuals, groups, or companies - can form patterns of “recurrent” (McMenamin, 2010, p.488 in Coulthard & Johnson, 2010) language choice through their exposure to wider linguistic norms within sociocultural contexts, in turn assuming their own stylised linguistic identity through conforming to or deviating from the established linguistic structures acceptable within a community. Authorship then appears to be a complex, multifaceted product of one’s community and identity, and author attribution requires reliable and robust approaches to determine, for example, a suspect’s idiolect or written identity to then use as evidence in court, or to aid in assigning ownership of trademarks, slogans, or lexemes.

As with any other type of evidence, linguistic evidence is naturally supplied in tandem with other forms of evidence to build a greater picture, but it can still prove to be a tipping point for legal cases. For example, style markers within stylistic analysis are minute linguistic traits that can be useful in determining idiolect through construction of an author profile capturing the author’s distinctive written patterns and habits. This method can utilise corpus linguistics and can be processed with the aid of a computer to determine the likelihood of an author’s attribution to a text through compared “consistency and distinctiveness” (Coulthard et al., 2017, pp.157-158); i.e., deviations from a pre-established linguistic standard or personal style that can then be presumed to be evidence of authorship. This includes errors or non-standard linguistic deviations, particularly if the errors are consistent (Coulthard, 2010).

However, while stylistic analysis can aid authorship attribution, linguists must often deal with written or transcribed evidence that varies widely in length, thus affecting corpus reliability; be given little comparative content; or find themselves being drip-fed texts by police forces during long-term investigations or searches which uncover new material as time goes on, ultimately hampering any progress made as new comparative data is introduced. Additionally, the nature of handwritten evidence may mean an additional step of digitisation is required, which also increases the duration of an analysis and puts greater time pressure on linguists, who must work within the limits on how long suspects or prisoners can legally be held (Cotterill, 2010, in O’Keeffe & McCarthy, 2010). This ultimately leaves stylistic analysis dependent on the forensic linguist’s qualitative judgement, as they are left with little headroom, especially when there may be few linguistic databases to consult due to the limited nature of the data (Chaski, 2005). Additionally, owing to the relationship between authorship and genre within society - that genre is a reactive force in the wake of sociocultural events, appearing transient and unfixed - it appears difficult to use perceived differences “between authors’ works” (Olsson & Luchjenbroers, 2014, p.49) as a benchmark of strict stylistic consistency. In essence, genre affects changes in authorship, and authorship affects changes in genre, which makes stylistic analysis appear more open to interpretation.

The sensitivity of qualitative authorship attribution can be observed clearly within the 1953 case of Derek Bentley, where corpus techniques were first observed within a legal context. The case resulted in Bentley’s execution before the guilty verdict was posthumously overturned due to disputed linguistic evidence. This evidence pointed to inconsistencies arising from dual authorship, where Bentley’s answers to police questions were transcribed as statements and were ultimately misinterpreted. This affected both the sequence of events and the meaning of Bentley’s answers, particularly regarding what could be inferred from them. In this case, it was the response to a policeman’s line of questioning regarding the gun, which was used as evidence to convict Bentley on the basis that he knew of “the gun” but not “a gun” - a mere shift in article. Through later comparative corpus analyses of witness and police statements and dictionaries, it was deemed that Bentley’s syntactic usage, frequency, and placement of words such as “then” did not correspond with his typical spoken style, nor with many typical spoken usages of “then”. Instead, Bentley’s statements matched the frequency and syntactic placement found within the institutional register of trained police officers, implying authorship interference during the transcription or interrogation process (Coulthard et al., 2016).
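The kind of frequency and placement comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the Bentley analysis: the function name `then_profile` and the sample text are invented for demonstration, and counting adjacent word pairs ignores sentence boundaries, so a real analysis would need more careful tokenisation.

```python
import re

def then_profile(text):
    """Count how often 'then' occurs and where it sits relative to 'I'."""
    # Lowercase word tokens; punctuation is discarded.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Adjacent word pairs, e.g. ("i", "then").
    pairs = list(zip(tokens, tokens[1:]))
    then_count = tokens.count("then")
    rate = round(100 * then_count / len(tokens), 1) if tokens else 0.0
    return {
        "tokens": len(tokens),
        "then": then_count,
        "i_then": pairs.count(("i", "then")),   # 'I then ...' (after the subject)
        "then_i": pairs.count(("then", "i")),   # 'then I ...' (before the subject)
        "then_per_100_tokens": rate,
    }

# Invented sample text for illustration only (NOT real case data).
sample = "I then ran. Then I stopped. I then left."
print(then_profile(sample))
```

Running the same function over a disputed statement and over reference samples of ordinary speech and police-authored statements would give comparable rates, although any interpretation of the difference remains a matter for the linguist.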

The above highlights the importance of recorded oral evidence, as analysing transcribed statements can lead to wrongful convictions, even if the statements are delivered through institutional agents such as the police. Indeed, prior to and shortly after the Police and Criminal Evidence Act (1984), which enforced mandatory tape recordings during police interviews, officers had been seen to construct “dramatisations” (Coulthard, 2002 in Cotterill, 2002, p.23) during cases based on interview notes and memory in an attempt to project authority and assume control of the narrative by portraying a friendlier atmosphere, especially where the suspect might have appeared unreasonable in the face of fabricated police warmth. Without recordings, both linguists and the jury would have been left with transcribed discourse of questionable authorship. However, comparative concordancing shows that there is a defined difference between institutionally trained, “top-down” (Olsson & Luchjenbroers, 2014, p.78) language and casual, typical speech style markers; and, in the case of Derek Bentley, justice could have been served more appropriately if greater linguistic resources had been allocated to fully verify the authorship of statements through these stylistic markers.

When linguists are presented with more content to analyse, Grant (2012) stresses the emergent importance of stylometric analysis over stylistic analysis. This is more of a statistical approach that relies on there being more difference “between authors than within authors” (p.470); i.e., a quantifiable, frequency-based analysis that draws on a larger background population sample, but one that may not necessarily be more accurate or “well defined” (p.471), as features are discovered rather than preidentified, and the approach harbours potential biases that inherently exist within larger populations. This is the alternative to creating a stylistic profile using only one writer’s text(s). Overall, stylometric approaches mostly consist of lexical counts and frequencies, meaning linguistics - analysing phrasal verbs, grammatical habits, syntactic order, etcetera - mostly takes a backseat (Chaski, 2005). Corpus-like approaches can still be used within stylometric analysis, even if count methods appear somewhat superficial. But the absence of linguistic explanation and elaboration within the approach can mean it is less explainable in court, and thus can have less demonstrable usage or influence in legal settings (Wright, 2014).
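To make “lexical counts and frequencies” concrete, the following is a minimal sketch of a count-based stylometric profile. It is not drawn from any of the cited studies: the marker list `MARKERS`, the function name `relative_frequencies`, and the two sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical marker words, chosen purely for illustration.
MARKERS = ["the", "and", "of", "to", "then"]

def relative_frequencies(text, markers=MARKERS, per=1000):
    """Return each marker's frequency per `per` tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {marker: 0.0 for marker in markers}
    counts = Counter(tokens)
    return {marker: per * counts[marker] / total for marker in markers}

# Invented sample texts (NOT real evidence).
text_a = "I went to the shop and then to the bank."
text_b = "Going shop first, bank after, back home by six."

profile_a = relative_frequencies(text_a)
profile_b = relative_frequencies(text_b)
for marker in MARKERS:
    print(marker, profile_a[marker], profile_b[marker])
```

The output is a table of numbers with no account of why the two writers differ, which illustrates the concern that such counts can be harder to explain in court than a linguistically motivated analysis.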

Grant (2012) sought to combine stylistic and stylometric methodology by using pairwise correlations instead of larger populations to determine the difference between two different authors’ SMS messages - those of Amanda and Christopher Birks - during the Birks murder case. Additionally, selection biases were mitigated by inputting all words into the concordancer, producing a “replicable process” (p.476) where similarities or deviations would be correlated statistically between two authors, rather than being matched based on the linguist’s personal intuition or interpretation. By performing non-parametric Mann-Whitney U tests, the null hypothesis - that there was no significant difference between the two authors’ styles - was successfully rejected, and thus a replicable process was demonstrated.
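The logic of a pairwise comparison followed by a Mann-Whitney U test can be sketched as below. This is an illustration under stated assumptions rather than a reconstruction of Grant’s procedure: the Jaccard overlap is assumed here as the similarity measure, the feature sets for “author X” and “author Y” are invented, and the p-value uses a normal approximation without tie or continuity correction, which is rough for samples this small.

```python
import math
from itertools import combinations, product

def jaccard(features_a, features_b):
    """Share of features two messages have in common (0.0 to 1.0)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U, approximate two-sided p) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    # Normal approximation, no tie or continuity correction.
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - n_a * n_b / 2) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))

# Invented style features per message (NOT data from the Birks case).
author_x = [{"2", "u", "xx"}, {"2", "u", "cos"}, {"u", "xx", "cos"}]
author_y = [{"to", "you", "x"}, {"to", "because", "x"}, {"you", "because"}]

# Similarity of message pairs by the same author vs. by different authors.
within = [jaccard(a, b) for a, b in combinations(author_x, 2)]
within += [jaccard(a, b) for a, b in combinations(author_y, 2)]
between = [jaccard(a, b) for a, b in product(author_x, author_y)]

u, p = mann_whitney_u(within, between)
print("U =", u, "p =", p)
```

If same-author pairs are consistently more similar than different-author pairs, the null hypothesis of no difference is rejected. Because every feature is counted mechanically, another analyst running the same steps on the same messages would arrive at the same figures, which is the sense of replicability at stake.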

However, the author acknowledged the inability to display with full confidence that Christopher had produced messages on Amanda’s phone (to confuse authorities) because the two corpora that were devised contained messages from each of their phones. This is problematic because, due to the statistical nature of the analysis, it was deemed that some of the messages Amanda composed on the day of her death correlated with the stylistic choices of Christopher. Additionally, when compared to an “independent corpus” (p.486) of a larger population’s text messages, the pairwise approach is no longer viable because it is not known how many people share Christopher’s or Amanda’s style and whether this aspect is statistically relevant to the wider case. Further, while author style was an explanation of the orthographical differences between message compositions, context may also have been an influencing factor that is inherently difficult to measure in terms of linguistic impact. For example, the style marker of textual kisses may have varied due to a differing “relationship with the recipient” (Grant & MacLeod, 2018, p.91). In other terms, the evidence garnered in this approach is chance-based rather than being of great linguistic interpretation or conviction, and still requires some elaboration regarding ambiguous pairs, especially when concerning timestamps of messages before and after the murder where authorship is certainly unclear. Thus, this evidence is instead used to support the case rather than to strongly determine its outcome. At the least, it is somewhat reliable even if its linguistic strength is questionable when dealing with “authorship disguise and imitation” (Olsson & Luchjenbroers, 2014, p.78).

While stylistic and stylometric approaches aid authorship cases involving murder and can also influence capital punishment outcomes, trademark linguistics explores litigation cases where businesses feel as if their intellectual property or brand has been threatened due to authorship disputes. This includes breaches concerning orthography, phonetics (including accent and pronunciation, case outcomes of which can differ between nations), and semanticity, where spellings, pronunciations, and connotations of words are seen to “act as psycholinguistic agents for distinguishing the trademark” (Butters, 2010, p.355) that can then lead to a confusion of brand awareness. Specifically, the linguist must verify real possibilities of “trademark dilution” (Butters, 2007, p.508) through research of mark awareness and prominence across wider society, generating a measurement of frequency and quantity of disputed terms. This leads to the question of use, as excessive usage of trademarks can lead, and has led, to “genericide” (p.514) amongst brands (such as Kleenex), resulting in a detachment between a company and its product, ultimately rendering the mark fit for universal application to other similar products. Criteria for genericide can include proper noun to mass noun conversion through a dropping of capital letters, and noun to verb conversion, where the term is used for actions rather than an exclusive item. Ultimately, courts use linguists to expedite the designation of “weak” and “strong” (Shuy, 2012, p.450 in Tiersma & Solan, 2012) marks across a scale, where stronger marks are protected by law, but weaker ones are not. These classifications rely on clinical linguistic analyses, or testimonies where linguistic expertise can verify, for example, whether homophones may dilute trademarks due to dialectal or regional phonetic homogeneity (Tiersma & Solan, 2002).
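One of the genericide indicators mentioned above, the dropping of capital letters, lends itself to a simple count. The sketch below is only an illustration: the function name `capitalisation_profile` and the sample sentences are invented, and a real survey would need a large, representative corpus and would have to account for sentence-initial capitals and inflected forms.

```python
import re

def capitalisation_profile(text, mark):
    """Count capitalised, lowercase and other spellings of `mark` in `text`."""
    pattern = re.compile(r"\b" + re.escape(mark) + r"\b", re.IGNORECASE)
    profile = {"capitalised": 0, "lowercase": 0, "other": 0}
    for match in pattern.finditer(text):
        form = match.group(0)
        if form == mark.capitalize():
            profile["capitalised"] += 1   # e.g. 'Kleenex' (proper noun use)
        elif form == mark.lower():
            profile["lowercase"] += 1     # e.g. 'kleenex' (generic-looking use)
        else:
            profile["other"] += 1         # e.g. 'KLEENEX'
    return profile

# Invented sample sentences (NOT corpus data).
sample = (
    "Pass me a Kleenex. She grabbed a kleenex from the box. "
    "KLEENEX is a registered brand."
)
print(capitalisation_profile(sample, "kleenex"))
```

A high share of lowercase forms would be, at most, one indication that a mark is drifting towards generic use; it would not settle the question of whether a mark is weak or strong.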

Some ethical risks surround trademark linguistics, such as the institution of law and its subsequent courts dictating who or what can or cannot use terms or marks, which may naturally affect the course of language shift on a sociocultural level (Shuy, 2012, in Tiersma & Solan, 2012). On an ownership level, questions must also be asked as to whether the state is utilising the linguist to reinforce unfair or oppressive cases for institutional gain, which may cause the linguist to question their own moral compass (Butters, 2007). These challenges become realised when courts call for the linguist’s impartiality during difficult trademark dispute cases.

In the 1988 McDonald’s (restaurant) vs Quality Inn (hotel chain) case, McDonald’s utilised authorship attribution to assert ownership of the Mc prefix by demonstrating the shared morphological rule present in the names in their main product line: Mc + lexeme. Due to the popularity and reputation of McDonald’s, their great presence in the market dictated that Quality Inn could not utilise the prefix Mc to market their own products, namely due to the possible syntactic and pragmatic connection between restaurants and hotels which customers might conceive, as well as McDonald’s coining their own McLanguage concept for product awareness purposes (Lentine & Shuy, 1990). Quality Inn utilised corpus linguistics to locate usage of Mc outside of hotel and restaurant chains, with a defence that insisted the prefix Mc had a wider, more established meaning that implied good value across different industries. Nevertheless, this proved fruitless against market research which dictated McDonald’s was at the commercial forefront of Mc associations, and thus genericide was implied to be a direct risk to the company’s products (Coulthard, 2005).

In conclusion, linguistic evidence can tilt a case in some ways - as in the Birks murder case - but it can also be wildly misinterpreted by authorities, as demonstrated in the unwarranted execution of Derek Bentley. Further, even if linguistic evidence appears strong in theory, courts must listen to both sides in any given case, as in McDonald’s vs Quality Inn; and, as in the cases dealing with loss of human life, authorship attribution of marked lexemes or phrases is not a perfect endeavour, instead acting only as a supporting element of any given legal defence or construction. In essence, linguistic approaches to authorship attribution can offer alternative perspectives to the courts, but treating such evidence as absolute is clearly a questionable route for juries to take.

References

Butters, R. R. (2007). A linguistic look at trademark dilution. Santa Clara Computer & High Technology Law Journal, 24, 507.

Butters, R. R. (2020). Trademark linguistics: Trademarks: language that one owns. In The Routledge handbook of forensic linguistics (pp. 364-381). Routledge.

Chaski, C. E. (2005). Who’s at the keyboard? Authorship attribution in digital evidence investigations. International Journal of Digital Evidence, 4(1), 1-13.

Cotterill, J. (2010). How to use corpus linguistics in forensic linguistics. In The Routledge handbook of corpus linguistics (pp. 578-590). Routledge.

Cotterill, J. (Ed.). (2002). Language in the legal process. Springer.

Coulthard, M. (2005). The linguist as expert witness. Linguistics and the Human Sciences, 1(1), 39.

Coulthard, M. (2010). Forensic Linguistics: the application of language description in legal contexts. Langage et société, (2), 15-33.

Coulthard, M., & Johnson, A. (Eds.). (2010). The Routledge handbook of forensic linguistics (pp. 473-486). London: Routledge.

Coulthard, M., Johnson, A., & Wright, D. (2016). An introduction to forensic linguistics: Language in evidence. Routledge.

Grant, T. (2012). TXT 4N6: method, consistency, and distinctiveness in the analysis of SMS text messages. Journal of Law and Policy, 21, 467.

Lentine, G., & Shuy, R. W. (1990). Mc-: Meaning in the marketplace. American Speech, 65(4), 349-366.

Olsson, J., & Luchjenbroers, J. (2014). Forensic linguistics. Bloomsbury Publishing.

Tiersma, P. M., & Solan, L. M. (Eds.). (2012). The Oxford handbook of language and law. OUP Oxford.

Tiersma, P., & Solan, L. M. (2002). The linguist on the witness stand: forensic linguistics in American courts. Language, 221-239.

Wright, D. (2014). Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails (Doctoral dissertation, University of Leeds).
