Abstract
The paper presents a method for the automatic diachronic normalization of Polish texts, that is, a procedure that takes a historical text and returns it in contemporary spelling. The method applies finite-state transducers defined in a sublanguage of the Thrax formalism. The paper discusses both linguistic issues, such as the evolution of Polish spelling, and implementation aspects, such as the efficiency and testing of the proposed method.
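To make the idea of rule-based diachronic normalization concrete, here is a minimal sketch. It is not the authors' grammar, which is written in Thrax and compiled into finite-state transducers. Instead, it approximates one well-known spelling change with Python regular expressions applied as an ordered cascade. Sequential application of rewrite rules is loosely analogous to composing the corresponding transducers, and Thrax expresses such context-dependent rules with its `CDRewrite` function. The example rule assumes the pre-1936 convention of writing loanword endings as "-ya" (e.g. "historya", "komisya"). Modern spelling replaces this with "-ja" after c, s, z and with "-ia" after other consonants. The rule set, the consonant lists, and the function name `normalize` are illustrative assumptions, and inflected forms are not covered.

```python
import re

# Hypothetical rule cascade for illustration only, NOT the grammar from the paper.
# Pre-1936 spelling wrote loanword endings as "-ya" (e.g. "historya", "komisya");
# modern spelling uses "-ja" after c, s, z and "-ia" after other consonants.
CONSONANTS_CSZ = "csz"
OTHER_CONSONANTS = "bćdfghklłmnńprśtwźż"

RULES = [
    # Word-final "ya" -> "ja" when preceded by c, s or z.
    (re.compile(rf"(?<=[{CONSONANTS_CSZ}])ya\b"), "ja"),
    # Word-final "ya" -> "ia" when preceded by any other listed consonant.
    (re.compile(rf"(?<=[{OTHER_CONSONANTS}])ya\b"), "ia"),
]


def normalize(text: str) -> str:
    """Apply each rewrite rule in order, loosely mirroring a cascade of transducers."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(normalize("historya Francya komisya Marya"))
    # prints "historia Francja komisja Maria"
```

Because neither rule's output contains "ya", the order of the two rules does not matter here. In a realistic grammar, rules interact, so their order (or, for transducers, the order of composition) has to be chosen deliberately.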