Challenges of annotation and analysis in computer-assisted language comparison: A case study on Burmish languages

Nathan W. Hill
Johann-Mattis List


The use of computational methods in comparative linguistics is growing in popularity. Their increasing deployment brings into focus both the areas in which these methods remain inadequate and the areas in which classical approaches to language comparison are opaque and inconsistent. In this paper we illustrate specific challenges that both computational and classical approaches encounter when studying South-East Asian languages. Using data from the Burmish language family, we point to the challenges resulting from missing annotation standards and insufficient methods of analysis. We then illustrate how to tackle these problems within a computer-assisted framework in which computational approaches pre-analyse the data while linguists attend to the detailed analyses.


How to Cite
Hill, N. W., & List, J.-M. (2017). Challenges of annotation and analysis in computer-assisted language comparison: A case study on Burmish languages. Yearbook of the Poznań Linguistic Meeting, 3(1), 47–76.