Abstract
In this paper we present RefGen, a module that identifies reference chains in French texts. The RefGen algorithm uses genre-specific properties of reference chains and an accessibility measure to find the mentions of a referred entity. The module applies strong and weak constraints (lexical, morpho-syntactic, and semantic) to automatically identify coreference relations between referential expressions. We evaluate RefGen's output on a corpus of public reports and discuss the importance of genre-dependent parameters for improving reference chain identification.
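The abstract does not give RefGen's actual rules, weights, or accessibility scale. The sketch below is therefore only a hypothetical illustration of the general scheme it names, under stated assumptions: strong constraints act as hard filters (gender/number agreement plus a distance window), weak constraints act as soft scores (same head word, same semantic class, recency), and the "accessibility measure" is assumed to be a ranking of mention forms loosely inspired by Ariel's accessibility theory, where pronouns signal nearby antecedents and proper names can reach far back. All names (`Mention`, `strong_ok`, `weak_score`, `build_chains`), the form categories, the `MAX_DISTANCE` values, and the score weights are illustrative assumptions, not RefGen's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical accessibility levels, loosely after Ariel (1990): the higher
# the level, the closer the antecedent a form is expected to signal.
ACCESSIBILITY = {"pronoun": 3, "demonstrative": 2, "definite": 1, "proper": 0}
# Hypothetical (genre-tunable) maximum sentence distance per level.
MAX_DISTANCE = {3: 1, 2: 2, 1: 4, 0: 10**6}


@dataclass
class Mention:
    mid: int        # identifier of the mention (mentions are in textual order)
    head: str       # head word, e.g. "rapport"
    form: str       # "pronoun", "demonstrative", "definite" or "proper"
    gender: str     # "m", "f" or "?" (unknown)
    number: str     # "sg", "pl" or "?"
    sem: str        # coarse semantic class, e.g. "org", "doc" or "?"
    sentence: int   # index of the sentence containing the mention


def agree(a: str, b: str) -> bool:
    """Two feature values agree if they are equal or one is unknown."""
    return a == "?" or b == "?" or a == b


def strong_ok(ana: Mention, ante: Mention) -> bool:
    """Strong constraints are hard filters: agreement and accessibility window."""
    if not (agree(ana.gender, ante.gender) and agree(ana.number, ante.number)):
        return False
    distance = ana.sentence - ante.sentence
    return 0 <= distance <= MAX_DISTANCE[ACCESSIBILITY[ana.form]]


def weak_score(ana: Mention, ante: Mention) -> float:
    """Weak constraints only add or subtract evidence for a candidate."""
    score = 0.0
    if ana.form != "pronoun" and ana.head.lower() == ante.head.lower():
        score += 2.0  # lexical: same head word
    if ana.sem != "?" and ana.sem == ante.sem:
        score += 1.0  # semantic: same class
    score -= 0.1 * (ana.sentence - ante.sentence)  # prefer recent candidates
    return score


def build_chains(mentions: list[Mention]) -> list[list[int]]:
    """Greedily attach each mention to the best earlier mention, or start a chain."""
    chain_of: dict[int, int] = {}  # mention id -> index of its chain
    chains: list[list[int]] = []
    for i, ana in enumerate(mentions):
        candidates = [m for m in mentions[:i] if strong_ok(ana, m)]
        best: Optional[Mention] = max(
            candidates, key=lambda m: weak_score(ana, m), default=None
        )
        # Pronouns link whenever a candidate survives; full NPs need positive evidence.
        if best is not None and (ana.form == "pronoun" or weak_score(ana, best) > 0):
            idx = chain_of[best.mid]
            chains[idx].append(ana.mid)
        else:
            idx = len(chains)
            chains.append([ana.mid])
        chain_of[ana.mid] = idx
    return chains


if __name__ == "__main__":
    # "la Commission ... le rapport ... elle ... ce rapport ... il"
    text_mentions = [
        Mention(0, "Commission", "proper", "f", "sg", "org", 0),
        Mention(1, "rapport", "definite", "m", "sg", "doc", 0),
        Mention(2, "elle", "pronoun", "f", "sg", "?", 1),
        Mention(3, "rapport", "demonstrative", "m", "sg", "doc", 2),
        Mention(4, "il", "pronoun", "m", "sg", "?", 2),
    ]
    print(build_chains(text_mentions))  # [[0, 2], [1, 3, 4]]
```

In this toy example, "il" cannot reach "le rapport" two sentences back because its hypothetical window is one sentence, so it attaches to "ce rapport" in its own sentence, which is already in the same chain. The distance windows and score weights are the kind of knobs a genre-dependent setting would adjust.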