Abstract
There are numerous formats for writing spell-checkers for open-source systems, and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus training in order to improve the spell-checking suggestion mechanism using weighted finite-state technology. What we propose is a generic and efficient language-independent framework of weighted finite-state automata for spell checking in typical open-source software, e.g. Mozilla Firefox, OpenOffice and the Gnome desktop.
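To make the idea of unigram corpus training concrete, the following is a minimal sketch in plain Python, not the weighted finite-state implementation described in the paper. It assumes that word weights are negative log probabilities estimated from corpus counts (as is usual for the tropical semiring), that add-one smoothing is used for lexicon words unseen in the corpus, and that candidate corrections lie within one edit of the input. The toy lexicon, the corpus, and the function names `unigram_weights`, `edits1` and `suggest` are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def unigram_weights(lexicon, corpus_tokens):
    """Map each lexicon word to -log P(word), using add-one smoothing (an assumption)."""
    counts = Counter(t for t in corpus_tokens if t in lexicon)
    total = sum(counts.values()) + len(lexicon)
    return {w: -math.log((counts[w] + 1) / total) for w in lexicon}

def edits1(word):
    """All strings within one deletion, transposition, substitution or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | swaps | replaces | inserts

def suggest(word, weights, n=3):
    """Return up to n in-lexicon candidates, lowest weight (most probable) first."""
    candidates = [w for w in edits1(word) if w in weights]
    return sorted(candidates, key=lambda w: (weights[w], w))[:n]

# Hypothetical toy data, for illustration only.
LEXICON = {"cat", "cart", "car", "care", "cut"}
CORPUS = "the car and the cat met a car near the cart by the car".split()
WEIGHTS = unigram_weights(LEXICON, CORPUS)

print(suggest("cax", WEIGHTS))
```

In this toy setting the misspelling "cax" has two in-lexicon candidates at edit distance one, "car" and "cat", and the more frequent "car" is ranked first because its weight is lower. In a weighted finite-state setting, the same ranking would be obtained by composing an error model with a weighted language model and extracting the lowest-weight paths.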