Genre-based Reference Chains Identification for French

Laurence Longo; Amalia Todirascu

doi:10.14746/il.2010.21.4

Vol. 21 (2010), Articles

Published 2010-06-15

Laurence Longo

Université de Strasbourg, laboratoire LiLPa, 22 avenue René Descartes, 67084 Strasbourg cedex, France

Amalia Todirascu

Université de Strasbourg, laboratoire LiLPa, 22 avenue René Descartes, 67084 Strasbourg cedex, France

How to Cite

Longo, L., & Todirascu, A. (2010). Genre-based Reference Chains Identification for French. Investigationes Linguisticae, 21, 57–75. https://doi.org/10.14746/il.2010.21.4

Abstract

In this paper we present RefGen, a module that identifies reference chains in French texts. The RefGen algorithm uses genre-specific properties of reference chains and an accessibility measure to find the mentions of a referred entity. The module applies strong and weak constraints (lexical, morpho-syntactic, and semantic) to automatically identify coreference relations between referential expressions. We evaluate RefGen on a corpus of public reports and discuss the importance of genre-dependent parameters for improving reference chain identification.
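The split between strong and weak constraints can be illustrated with a minimal sketch. It is not the RefGen implementation: the `Mention` fields, the choice of gender and number agreement as the strong constraints, the semantic-class and distance terms used as weak constraints, and the `max_distance` parameter (standing in for a genre-dependent setting) are all assumptions made for illustration. The distance term is only a crude stand-in for an accessibility measure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    # All fields are hypothetical; the abstract does not specify a data model.
    text: str
    gender: str      # "m" or "f"
    number: str      # "sg" or "pl"
    sem_class: str   # e.g. "organisation", "document"
    sentence: int    # index of the sentence containing the mention


def passes_strong(candidate: Mention, mention: Mention) -> bool:
    """Strong constraints filter: any violation rejects the pair outright."""
    return (candidate.gender == mention.gender
            and candidate.number == mention.number)


def weak_score(candidate: Mention, mention: Mention, max_distance: int) -> float:
    """Weak constraints only raise or lower a preference score."""
    score = 0.0
    if candidate.sem_class == mention.sem_class:
        score += 1.0
    distance = mention.sentence - candidate.sentence
    if distance <= max_distance:  # assumed genre-dependent search window
        score += 1.0 / (1 + distance)
    return score


def resolve(mention: Mention, candidates: List[Mention],
            max_distance: int = 3) -> Optional[Mention]:
    """Return the best-scoring earlier candidate that passes the strong filter."""
    allowed = [c for c in candidates
               if c.sentence <= mention.sentence and passes_strong(c, mention)]
    if not allowed:
        return None
    return max(allowed, key=lambda c: weak_score(c, mention, max_distance))


candidates = [
    Mention("la Commission", "f", "sg", "organisation", 0),
    Mention("le rapport", "m", "sg", "document", 1),
]
pronoun = Mention("elle", "f", "sg", "organisation", 2)
antecedent = resolve(pronoun, candidates)
```

In this toy run, "le rapport" is rejected by the agreement filter, so "la Commission" is the only candidate left to score. Changing `max_distance` per genre is one simple way a genre-dependent parameter could enter such a procedure.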

