Abstract
In this paper we discuss the problem of building a Polish lexicon for the Cyc ontology. Because the ontology is very large and complex, we describe a semi-automatic translation of part of it, which may be useful for tasks on the border between the Semantic Web and Natural Language Processing. We concentrate on the precise identification of lexemes, which is crucial for tasks such as natural language generation in highly inflected languages like Polish. We also concentrate on multi-word entries, since in Cyc nine out of every ten concepts are mapped to expressions containing more than one word.