forensic linguistic

How to Cite

Blackwell, S. (2009). Why Forensic Linguistics Needs Corpus Linguistics. Comparative Legilinguistics, 1, 5–19. https://doi.org/10.14746/cl.2009.01.01


While corpus linguistics has existed since the 1960s, forensic linguistics is a relatively new discipline, involving both linguistic evidence in court and wider applications of linguistics to legal texts and discourses. Computer corpora of natural language may be marked up in various ways, grammatically tagged, parsed, lemmatised and analysed with concordance, collocation and other specialist software. In the relatively short history of forensic linguistics, its exponents have often employed corpus linguistics techniques in order to throw light on questions like disputed authorship. However, the corpora employed have been general ones such as the Cobuild “Bank of English”, rather than purpose-built databases of language used in legal contexts, with the result that such research sometimes raises more questions than it answers. Conversely, corpus linguists have from time to time incorporated data from legal settings into their collections; but they have tended to use these resources as the basis for sociolinguistic or historical linguistic research rather than as a means of exploring topics in language and law. This paper makes a plea for these two fields, which are both already cross-disciplinary, to join forces and create a purpose-built corpus for forensic linguistics. It illustrates how corpus techniques may be successfully applied to questions of disputed authorship, citing both hypothetical and actual examples. It ends with an outline of the kinds of texts which a proposed new corpus for forensic linguistics should contain and the tools required to exploit it effectively.


