Crossroads Corpus creation: Design and case study


Abbie Hantgan-Sonko


This paper illustrates a methodological approach to the design of an annotated corpus using a case study of phonetic convergences and divergences by multilingual speakers in southwestern Senegal’s Casamance region. The newly compiled corpus contains approximately 183,000 annotations of multilingual, spoken data, gathered by eight researchers over a ten-year span using methods ranging from structured lexical elicitation in controlled contexts to the recording of naturally occurring, multilingual conversations. The data were collected in three villages, each with its own primary language, although many more languages contribute to the linguistic landscape. Detailed metadata inform analyses of variation, the context in which a speech act took place and between whom, the speakers’ linguistic repertoires, trajectories, and social networks, as well as the larger language context. A potential path for convergence or divergence that emerged during data collection and in building and searching the corpus is the crossroads in the phonetic production of word-initial velar plosives. Word-initial [k] emerges in one language where only [ɡ] is present in another; the third uses both. The corpus design makes it feasible not only to identify areas of accommodation but also to grasp the context, enabling a sociolinguistically informed analysis of the speakers’ linguistic behavior.




How to Cite
Hantgan-Sonko, A. (2017). Crossroads Corpus creation: Design and case study. Yearbook of the Poznań Linguistic Meeting, 3(1), 167-198.

