An approach to the automatic transfer of lexical units from English FrameNet to Spanish by using WordNet

Mario Crespo Miguel

Abstract

In Natural Language Processing, linguistic resources are structured, detailed descriptions of a given language. They are considered key elements both for studying languages and for developing applications. However, these repositories are slow and difficult to build, and most of them focus on English. This work aims to address the shortage of linguistic resources for Spanish by transferring part of the information encoded in the FrameNet project into Spanish. To this end, we developed an automatic procedure that aligns frame predicates with the WordNet synsets that best represent them. Our system achieves 88% precision, making it possible to reuse this semantic resource for linguistic studies in Spanish.
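The abstract does not specify how each lexical unit is matched to its synset. As a minimal sketch, assuming a simple Lesk-style gloss-overlap heuristic (not necessarily the author's method), the alignment and transfer steps could look like the Python below. It scores each WordNet sense of a lemma by how many content words its gloss and lemmas share with the frame's description and the frame's other lexical units, then reads the Spanish variants of the winning synset through a shared interlingual identifier, as multilingual wordnets in the EuroWordNet tradition do. The inventory, frame descriptions, identifiers (`ili-1`, `ili-2`) and helper names (`best_synset`, `precision`) are hypothetical toy data and code, not real WordNet or FrameNet content. Precision is computed in its standard sense: correct alignments divided by alignments proposed.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stopword list used only for this toy example.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in"}


def content_words(text: str) -> set:
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


@dataclass
class Synset:
    ili: str        # hypothetical interlingual ID shared by the English and Spanish wordnets
    lemmas: list    # English lemmas in the synset
    gloss: str      # English gloss
    spanish: list = field(default_factory=list)  # Spanish variants linked through the same ID


def best_synset(lemma, frame_definition, sibling_lus, inventory) -> Optional[Synset]:
    """Lesk-style heuristic (an assumption): return the candidate synset of `lemma`
    sharing the most content words with the frame definition and the frame's
    other LUs, or None when no candidate overlaps at all."""
    context = content_words(frame_definition)
    for lu in sibling_lus:
        context |= content_words(lu)
    candidates = [s for s in inventory if lemma in s.lemmas]

    def overlap(s: Synset) -> int:
        return len(context & (content_words(s.gloss) | set(s.lemmas)))

    if not candidates:
        return None
    best = max(candidates, key=overlap)  # ties go to the first candidate
    return best if overlap(best) > 0 else None


def precision(predicted: dict, gold: dict) -> float:
    """Correct alignments divided by alignments actually proposed (None = abstained)."""
    proposed = {lu: ili for lu, ili in predicted.items() if ili is not None}
    if not proposed:
        return 0.0
    correct = sum(1 for lu, ili in proposed.items() if gold.get(lu) == ili)
    return correct / len(proposed)


# Toy inventory and frames written for this sketch (not real WordNet/FrameNet data).
INVENTORY = [
    Synset("ili-1", ["fire", "dismiss", "sack"], "end the employment of a worker", ["despedir", "echar"]),
    Synset("ili-2", ["fire", "shoot"], "discharge a gun or other weapon", ["disparar"]),
]
FRAMES = {
    "Firing": ("An employer ends the employment of an employee", ["dismiss", "sack", "lay off"]),
    "Use_firearm": ("An agent discharges a weapon", ["shoot"]),
}

# Align the ambiguous LU "fire" in each frame, then read off its Spanish variants.
alignments = {f"fire.{frame}": best_synset("fire", definition, siblings, INVENTORY)
              for frame, (definition, siblings) in FRAMES.items()}
spanish_lus = {lu: s.spanish for lu, s in alignments.items() if s is not None}
predicted = {lu: (s.ili if s is not None else None) for lu, s in alignments.items()}
gold = {"fire.Firing": "ili-1", "fire.Use_firearm": "ili-2"}  # toy gold standard

print(spanish_lus)
print(f"precision = {precision(predicted, gold):.2f}")
```

With real data, the toy inventory would be replaced by an actual English wordnet linked to a Spanish one. Returning `None` when no sense overlaps lets the system abstain on unclear cases, which trades recall for precision.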


Keywords

FrameNet; semantic analysis; semantic processing; WordNet; transfer of linguistic information; disambiguation


This journal is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Universitat Politècnica de València

e-ISSN: 1886-6298 | https://doi.org/10.4995/rlyla