Developing Online Czech Proofreader Tool: Achievements, Limitations and Pitfalls

Dana Hlaváčková
Hana Žižková
Klára Dvořáková
Markéta Pravdová

Abstract

This paper discusses the achievements, limitations and pitfalls of developing the Online Czech Proofreader Tool (OCPT). The tool has been developed since 2019 in cooperation with the Department of Czech Language of the Faculty of Arts of Masaryk University, the Institute of Theoretical and Computational Linguistics of the Faculty of Arts of Charles University, the Czech Language Institute of the Czech Academy of Sciences, and Seznam.cz. The article describes the linguistic data used and the tools and modules that make up the OCPT, and it outlines the limitations of an online, web-based proofreader, especially in areas where the mere application of formal rules is not sufficient to detect language errors. It also addresses the drawbacks encountered in developing the OCPT, including the occurrence of false positives.

How to cite
Hlaváčková, D., Žižková, H., Dvořáková, K., & Pravdová, M. (2022). Developing Online Czech Proofreader Tool: Achievements, Limitations and Pitfalls. Bohemistyka, (1), 122-134. https://doi.org/10.14746/bo.2022.1.7
Section
ARTICLES AND STUDIES
