Publications
2023
- Addressing structural hurdles for metadata extraction from environmental impact statements. Egoitz Laparra, Alex Binford-Walsh, Kirk Emerson, and 4 more authors. Journal of the Association for Information Science and Technology, 2023.
Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may depend not only on the linguistic content, but also on structural problems such as extremely large documents, unordered multi-file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real-world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections.
@article{laparra-etal-2023-addressing, author = {Laparra, Egoitz and Binford-Walsh, Alex and Emerson, Kirk and Miller, Marc L. and López-Hoffman, Laura and Currim, Faiz and Bethard, Steven}, title = {Addressing structural hurdles for metadata extraction from environmental impact statements}, journal = {Journal of the Association for Information Science and Technology}, doi = {10.1002/asi.24809}, url = {https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/asi.24809}, year = {2023}, }
2022
- Taxonomy Builder: a Data-driven and User-centric Tool for Streamlining Taxonomy Construction. Mihai Surdeanu, John Hungerford, Yee Seng Chan, and 16 more authors. In Proceedings of the Second Workshop on Bridging Human–Computer Interaction and Natural Language Processing, 2022.
An existing domain taxonomy for normalizing content is often assumed when discussing approaches to information extraction, yet often in real-world scenarios there is none. When one does exist, as the information needs shift, it must be continually extended. This is a slow and tedious task, and one which does not scale well. Here we propose an interactive tool that allows a taxonomy to be built or extended rapidly and with a human in the loop to control precision. We apply insights from text summarization and information extraction to reduce the search space dramatically, then leverage modern pretrained language models to perform contextualized clustering of the remaining concepts to yield candidate nodes for the user to review. We show this allows a user to consider as many as 200 taxonomy concept candidates an hour, to quickly build or extend a taxonomy to better fit information needs.
@inproceedings{surdeanu-etal-2022-taxonomy, title = {Taxonomy Builder: a Data-driven and User-centric Tool for Streamlining Taxonomy Construction}, author = {Surdeanu, Mihai and Hungerford, John and Chan, Yee Seng and MacBride, Jessica and Gyori, Benjamin and Zupon, Andrew and Tang, Zheng and Qiu, Haoling and Min, Bonan and Zverev, Yan and Hilverman, Caitlin and Thomas, Max and Andrews, Walter and Alcock, Keith and Zhang, Zeyu and Reynolds, Michael and Bethard, Steven and Sharp, Rebecca and Laparra, Egoitz}, booktitle = {Proceedings of the Second Workshop on Bridging Human--Computer Interaction and Natural Language Processing}, month = jul, year = {2022}, address = {Seattle, Washington}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.hcinlp-1.1}, doi = {10.18653/v1/2022.hcinlp-1.1}, pages = {1--10}, }
2021
- A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records. Egoitz Laparra, Aurelie Mascio, Sumithra Velupillai, and 1 more author. Yearbook of Medical Informatics, 2021.
@article{Laparra03.09.2021, author = {Laparra, Egoitz and Mascio, Aurelie and Velupillai, Sumithra and Miller, Timothy}, title = {A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records}, journal = {Yearbook of Medical Informatics}, year = {2021}, volume = {30}, number = {01}, pages = {239-244}, note = {239}, language = {EN}, publisher = {Georg Thieme Verlag KG}, }
- SemEval-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing. Egoitz Laparra, Xin Su, Yiyun Zhao, and 3 more authors. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 2021.
This paper presents the Source-Free Domain Adaptation shared task held within SemEval-2021. The aim of the task was to explore adaptation of machine-learning models in the face of data sharing constraints. Specifically, we consider the scenario where annotations exist for a domain but cannot be shared. Instead, participants are provided with models trained on that (source) data. Participants also receive some labeled data from a new (development) domain on which to explore domain adaptation algorithms. Participants are then tested on data representing a new (target) domain. We explored this scenario with two different semantic tasks: negation detection (a text classification task) and time expression recognition (a sequence tagging task).
@inproceedings{laparra-etal-2021-semeval, title = {{S}em{E}val-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing}, author = {Laparra, Egoitz and Su, Xin and Zhao, Yiyun and Uzuner, {\"O}zlem and Miller, Timothy and Bethard, Steven}, booktitle = {Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)}, month = aug, year = {2021}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.semeval-1.42}, doi = {10.18653/v1/2021.semeval-1.42}, pages = {348--356}, }
- Domain adaptation in practice: Lessons from a real-world information extraction pipeline. Timothy Miller, Egoitz Laparra, and Steven Bethard. In Proceedings of the Second Workshop on Domain Adaptation for NLP, 2021.
Advances in transfer learning and domain adaptation have raised hopes that once-challenging NLP tasks are ready to be put to use for sophisticated information extraction needs. In this work, we describe an effort to do just that – combining state-of-the-art neural methods for negation detection, document time relation extraction, and aspectual link prediction, with the eventual goal of extracting drug timelines from electronic health record text. We train on the THYME colon cancer corpus and test on both the THYME brain cancer corpus and an internal corpus, and show that performance of the combined systems is unacceptable despite good performance of individual systems. Although domain adaptation shows improvements on each individual system, the model selection problem is a barrier to improving overall pipeline performance.
@inproceedings{miller-etal-2021-domain, title = {Domain adaptation in practice: Lessons from a real-world information extraction pipeline}, author = {Miller, Timothy and Laparra, Egoitz and Bethard, Steven}, booktitle = {Proceedings of the Second Workshop on Domain Adaptation for NLP}, month = apr, year = {2021}, address = {Kyiv, Ukraine}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.adaptnlp-1.11}, pages = {105--110}, }
2020
- Rethinking domain adaptation for machine learning over clinical language. Egoitz Laparra, Steven Bethard, and Timothy A. Miller. JAMIA Open, 2020.
Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously under-studied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields.
@article{10.1093/jamiaopen/ooaa010, author = {Laparra, Egoitz and Bethard, Steven and Miller, Timothy A}, title = {{Rethinking domain adaptation for machine learning over clinical language}}, journal = {JAMIA Open}, volume = {3}, number = {2}, pages = {146-150}, year = {2020}, month = apr, issn = {2574-2531}, doi = {10.1093/jamiaopen/ooaa010}, url = {https://doi.org/10.1093/jamiaopen/ooaa010}, }
- A Dataset and Evaluation Framework for Complex Geographical Description Parsing. Egoitz Laparra and Steven Bethard. In Proceedings of the 28th International Conference on Computational Linguistics, 2020.
Much previous work on geoparsing has focused on identifying and resolving individual toponyms in text like Adrano, S.Maria di Licodia or Catania. However, geographical locations occur not only as individual toponyms, but also as compositions of reference geolocations joined and modified by connectives, e.g., “. . . between the towns of Adrano and S.Maria di Licodia, 32 kilometres northwest of Catania”. Ideally, a geoparser should be able to take such text, and the geographical shapes of the toponyms referenced within it, and parse these into a geographical shape, formed by a set of coordinates, that represents the location described. But creating a dataset for this complex geoparsing task is difficult and, if done manually, would require a huge amount of effort to annotate the geographical shapes of not only the geolocation described but also the reference toponyms. We present an approach that automates most of the process by combining Wikipedia and OpenStreetMap. As a result, we have gathered a collection of 360,187 uncurated complex geolocation descriptions, from which we have manually curated 1,000 examples intended to be used as a test set. To accompany the data, we define a new geoparsing evaluation framework along with a scoring methodology and a set of baselines.
@inproceedings{laparra-bethard-2020-dataset, title = {A Dataset and Evaluation Framework for Complex Geographical Description Parsing}, author = {Laparra, Egoitz and Bethard, Steven}, booktitle = {Proceedings of the 28th International Conference on Computational Linguistics}, month = dec, year = {2020}, address = {Barcelona, Spain (Online)}, publisher = {International Committee on Computational Linguistics}, url = {https://aclanthology.org/2020.coling-main.81}, doi = {10.18653/v1/2020.coling-main.81}, pages = {936--948}, }
- Basque Wordnet - EusWN 3.0 (ELEXIS). Aitor Gonzalez-Agirre, Egoitz Laparra, and German Rigau. 2020.
@misc{11356/1656, title = {Basque Wordnet - {EusWN} 3.0 ({ELEXIS})}, author = {Gonzalez-Agirre, Aitor and Laparra, Egoitz and Rigau, German}, url = {http://hdl.handle.net/11356/1656}, note = {Slovenian language resource repository {CLARIN}.{SI}}, issn = {2820-4042}, year = {2020}, }
2019
- Inferring missing metadata from environmental policy texts. Steven Bethard, Egoitz Laparra, Sophia Wang, and 4 more authors. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2019.
The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmental policy by extracting and organizing metadata from the text of NEPA documents. Our contributions include collecting more than 40,000 NEPA-related documents, and evaluating rule-based baselines that establish the difficulty of three important tasks: identifying lead agencies, aligning document versions, and detecting reused text.
@inproceedings{bethard-etal-2019-inferring, title = {Inferring missing metadata from environmental policy texts}, author = {Bethard, Steven and Laparra, Egoitz and Wang, Sophia and Zhao, Yiyun and Al-Ghezi, Ragheb and Lien, Aaron and L{\'o}pez-Hoffman, Laura}, booktitle = {Proceedings of the 3rd Joint {SIGHUM} Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature}, month = jun, year = {2019}, address = {Minneapolis, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/W19-2506}, doi = {10.18653/v1/W19-2506}, pages = {46--51}, }
- University of Arizona at SemEval-2019 Task 12: Deep-Affix Named Entity Recognition of Geolocation Entities. Vikas Yadav, Egoitz Laparra, Ti-Tai Wang, and 2 more authors. In Proceedings of the 13th International Workshop on Semantic Evaluation, 2019.
We present the Named Entity Recognition (NER) and disambiguation model used by the University of Arizona team (UArizona) for the SemEval 2019 task 12. We achieved fourth place on tasks 1 and 3. We implemented a deep-affix based LSTM-CRF NER model for task 1, which utilizes only character, word, prefix and suffix information for the identification of geolocation entities. Despite using just the training data provided by task organizers and not using any lexicon features, we achieved 78.85% strict micro F-score on task 1. We used the unsupervised population heuristics for task 3 and achieved 52.99% strict micro-F1 score in this task.
@inproceedings{yadav-etal-2019-university, title = {{U}niversity of {A}rizona at {S}em{E}val-2019 Task 12: Deep-Affix Named Entity Recognition of Geolocation Entities}, author = {Yadav, Vikas and Laparra, Egoitz and Wang, Ti-Tai and Surdeanu, Mihai and Bethard, Steven}, booktitle = {Proceedings of the 13th International Workshop on Semantic Evaluation}, month = jun, year = {2019}, address = {Minneapolis, Minnesota, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/S19-2232}, doi = {10.18653/v1/S19-2232}, pages = {1319--1323}, }
- Pre-trained Contextualized Character Embeddings Lead to Major Improvements in Time Normalization: a Detailed Analysis. Dongfang Xu, Egoitz Laparra, and Steven Bethard. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), 2019.
Recent studies have shown that pre-trained contextual word embeddings, which assign the same word different vectors in different contexts, improve performance in many tasks. But while contextual embeddings can also be trained at the character level, the effectiveness of such embeddings has not been studied. We derive character-level contextual embeddings from Flair (Akbik et al., 2018), and apply them to a time normalization task, yielding major performance improvements over the previous state-of-the-art: 51% error reduction in news and 33% in clinical notes. We analyze the sources of these improvements, and find that pre-trained contextual character embeddings are more robust to term variations, infrequent terms, and cross-domain changes. We also quantify the size of context that pre-trained contextual character embeddings take advantage of, and show that such embeddings capture features like part-of-speech and capitalization.
@inproceedings{xu-etal-2019-pre, title = {Pre-trained Contextualized Character Embeddings Lead to Major Improvements in Time Normalization: a Detailed Analysis}, author = {Xu, Dongfang and Laparra, Egoitz and Bethard, Steven}, booktitle = {Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*{SEM} 2019)}, month = jun, year = {2019}, address = {Minneapolis, Minnesota}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/S19-1008}, doi = {10.18653/v1/S19-1008}, pages = {68--74}, }
- Eidos, INDRA, & Delphi: From Free Text to Executable Causal Models. Rebecca Sharp, Adarsh Pyarelal, Benjamin Gyori, and 14 more authors. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), 2019.
Building causal models of complicated phenomena such as food insecurity is currently a slow and labor-intensive manual process. In this paper, we introduce an approach that builds executable probabilistic models from raw, free text. The proposed approach is implemented through three systems: Eidos, INDRA, and Delphi. Eidos is an open-domain machine reading system designed to extract causal relations from natural language. It is rule-based, allowing for rapid domain transfer, customizability, and interpretability. INDRA aggregates multiple sources of causal information and performs assembly to create a coherent knowledge base and assess its reliability. This assembled knowledge serves as the starting point for modeling. Delphi is a modeling framework that assembles quantified causal fragments and their contexts into executable probabilistic models that respect the semantics of the original text, and can be used to support decision making.
@inproceedings{sharp-etal-2019-eidos, title = {Eidos, {INDRA}, {\&} Delphi: From Free Text to Executable Causal Models}, author = {Sharp, Rebecca and Pyarelal, Adarsh and Gyori, Benjamin and Alcock, Keith and Laparra, Egoitz and Valenzuela-Esc{\'a}rcega, Marco A. and Nagesh, Ajay and Yadav, Vikas and Bachman, John and Tang, Zheng and Lent, Heather and Luo, Fan and Paul, Mithun and Bethard, Steven and Barnard, Kobus and Morrison, Clayton and Surdeanu, Mihai}, booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (Demonstrations)}, month = jun, year = {2019}, address = {Minneapolis, Minnesota}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/N19-4008}, doi = {10.18653/v1/N19-4008}, pages = {42--47}, }
2018
- Detecting Diabetes Risk from Social Media Activity. Dane Bell, Egoitz Laparra, Aditya Kousik, and 3 more authors. In Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, 2018.
This work explores the detection of individuals’ risk of type 2 diabetes mellitus (T2DM) directly from their social media (Twitter) activity. Our approach extends a deep learning architecture with several contributions: following previous observations that language use differs by gender, it captures and uses gender information through domain adaptation; it captures recency of posts under the hypothesis that more recent posts are more representative of an individual’s current risk status; and, lastly, it demonstrates that in this scenario where activity factors are sparsely represented in the data, a bag-of-word neural network model using custom dictionaries of food and activity words performs better than other neural sequence models. Our best model, which incorporates all these contributions, achieves a risk-detection F1 of 41.9, considerably higher than the baseline rate (36.9).
@inproceedings{bell-etal-2018-detecting, title = {Detecting Diabetes Risk from Social Media Activity}, author = {Bell, Dane and Laparra, Egoitz and Kousik, Aditya and Ishihara, Terron and Surdeanu, Mihai and Kobourov, Stephen}, booktitle = {Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis}, month = oct, year = {2018}, address = {Brussels, Belgium}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/W18-5601}, doi = {10.18653/v1/W18-5601}, pages = {1--11}, }
- SemEval 2018 Task 6: Parsing Time Normalizations. Egoitz Laparra, Dongfang Xu, Ahmed Elsayed, and 2 more authors. In Proceedings of the 12th International Workshop on Semantic Evaluation, 2018.
This paper presents the outcomes of the Parsing Time Normalization shared task held within SemEval-2018. The aim of the task is to parse time expressions into the compositional semantic graphs of the Semantically Compositional Annotation of Time Expressions (SCATE) schema, which allows the representation of a wider variety of time expressions than previous approaches. Two tracks were included, one to evaluate the parsing of individual components of the produced graphs, in a classic information extraction way, and another one to evaluate the quality of the time intervals resulting from the interpretation of those graphs. Though 40 participants registered for the task, only one team submitted output, achieving 0.55 F1 in Track 1 (parsing) and 0.70 F1 in Track 2 (intervals).
@inproceedings{laparra-etal-2018-semeval, title = {{S}em{E}val 2018 Task 6: Parsing Time Normalizations}, author = {Laparra, Egoitz and Xu, Dongfang and Elsayed, Ahmed and Bethard, Steven and Palmer, Martha}, booktitle = {Proceedings of the 12th International Workshop on Semantic Evaluation}, month = jun, year = {2018}, address = {New Orleans, Louisiana}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/S18-1011}, doi = {10.18653/v1/S18-1011}, pages = {88--96}, }
- From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations. Egoitz Laparra, Dongfang Xu, and Steven Bethard. Transactions of the Association for Computational Linguistics, 2018.
This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character level multi-output neural network that outperforms previous state-of-the-art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemes in the same corpus.
@article{laparra-etal-2018-characters, title = {From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations}, author = {Laparra, Egoitz and Xu, Dongfang and Bethard, Steven}, journal = {Transactions of the Association for Computational Linguistics}, volume = {6}, year = {2018}, address = {Cambridge, MA}, publisher = {MIT Press}, url = {https://aclanthology.org/Q18-1025}, doi = {10.1162/tacl_a_00025}, pages = {343--356}, }
2017
- Multi-lingual and Cross-lingual timeline extraction. Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, and 1 more author. Knowledge-Based Systems, 2017.
In this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of Multilingual and Cross-lingual data sources. Based on the assumption that event-related information can be recovered from different documents written in different languages, we extend the Cross-document Event Ordering task presented at SemEval 2015 by specifying two new tasks for, respectively, Multilingual and Cross-lingual timeline extraction. We then develop three deterministic algorithms for timeline extraction based on two main ideas. First, we address implicit temporal relations at document level since explicit time-anchors are too scarce to build a wide coverage timeline extraction system. Second, we leverage several multilingual resources to obtain a single, interoperable, semantic representation of events across documents and across languages. The result is a highly competitive system that strongly outperforms the current state-of-the-art. Nonetheless, further analysis of the results reveals that linking the event mentions with their target entities and time-anchors remains a difficult challenge. The systems, resources and scorers are freely available to facilitate their use and guarantee the reproducibility of results.
@article{LAPARRA201777, title = {Multi-lingual and Cross-lingual timeline extraction}, journal = {Knowledge-Based Systems}, volume = {133}, pages = {77-89}, year = {2017}, issn = {0950-7051}, doi = {10.1016/j.knosys.2017.07.002}, url = {https://www.sciencedirect.com/science/article/pii/S0950705117303192}, author = {Laparra, Egoitz and Agerri, Rodrigo and Aldabe, Itziar and Rigau, German}, keywords = {Timeline extraction, Event ordering, Temporal processing, Cross-document event coreference, Predicate Matrix, Natural Language Processing}, }
2016
- NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news. Piek Vossen, Rodrigo Agerri, Itziar Aldabe, and 9 more authors. Knowledge-Based Systems, 2016.
In this article, we describe a system that reads news articles in four different languages and detects what happened, who is involved, where and when. This event-centric information is represented as episodic situational knowledge on individuals in an interoperable RDF format that allows for reasoning on the implications of the events. Our system covers the complete path from unstructured text to structured knowledge, for which we defined a formal model that links interpreted textual mentions of things to their representation as instances. The model forms the skeleton for interoperable interpretation across different sources and languages. The real content, however, is defined using multilingual and cross-lingual knowledge resources, both semantic and episodic. We explain how these knowledge resources are used for the processing of text and ultimately define the actual content of the episodic situational knowledge that is reported in the news. The knowledge and model in our system can be seen as an example of how the Semantic Web helps NLP. However, our system also generates massive episodic knowledge of the same type as the Semantic Web is built on. We thus envision a cycle of knowledge acquisition and NLP improvement on a massive scale. This article reports on the details of the system but also on the performance of various high-level components. We demonstrate that our system performs at state-of-the-art level for various subtasks in the four languages of the project, but we also consider the full integration of these tasks in an overall system with the purpose of reading text. We applied our system to millions of news articles, generating billions of triples expressing formal semantic properties. This shows the capacity of the system to perform at an unprecedented scale.
@article{VOSSEN201660, title = {NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news}, journal = {Knowledge-Based Systems}, volume = {110}, pages = {60-85}, year = {2016}, issn = {0950-7051}, doi = {10.1016/j.knosys.2016.07.013}, url = {https://www.sciencedirect.com/science/article/pii/S0950705116302271}, author = {Vossen, Piek and Agerri, Rodrigo and Aldabe, Itziar and Cybulska, Agata and {van Erp}, Marieke and Fokkens, Antske and Laparra, Egoitz and Minard, Anne-Lyse and {Palmero Aprosio}, Alessio and Rigau, German and Rospocher, Marco and Segers, Roxane}, keywords = {Natural language processing, Semantic web, Knowledge resources, Event extraction, Cross-lingual interoperability}, }
- Predicate Matrix: automatically extending the semantic interoperability between predicate resources. Maddalen Lopez de Lacalle, Egoitz Laparra, Itziar Aldabe, and 1 more author. Language Resources and Evaluation, 2016.
This paper presents a novel approach to improve the interoperability between four semantic resources that incorporate predicate information. Our proposal defines a set of automatic methods for mapping the semantic knowledge included in WordNet, VerbNet, PropBank and FrameNet. We use advanced graph-based word sense disambiguation algorithms and corpus alignment methods to automatically establish the appropriate mappings among their lexical entries and roles. We study different settings for each method using SemLink as a gold-standard for evaluation. The results show that the new approach provides productive and reliable mappings. In fact, the mappings obtained automatically outnumber the set of original mappings in SemLink. Finally, we also present a new version of the Predicate Matrix, a lexical-semantic resource resulting from the integration of the mappings obtained by our automatic methods and SemLink.
@article{LopezdeLacalle2016, author = {Lopez de Lacalle, Maddalen and Laparra, Egoitz and Aldabe, Itziar and Rigau, German}, title = {Predicate Matrix: automatically extending the semantic interoperability between predicate resources}, journal = {Language Resources and Evaluation}, year = {2016}, month = jun, day = {01}, volume = {50}, number = {2}, pages = {263-289}, issn = {1574-0218}, doi = {10.1007/s10579-016-9348-5}, url = {https://doi.org/10.1007/s10579-016-9348-5}, }
- The Event and Implied Situation Ontology (ESO): Application and Evaluation. Roxane Segers, Marco Rospocher, Piek Vossen, and 3 more authors. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2016.
This paper presents the Event and Implied Situation Ontology (ESO), a manually constructed resource which formalizes the pre- and post-situations of events and the roles of the entities affected by an event. The ontology is built on top of existing resources such as WordNet, SUMO and FrameNet. The ontology is injected into the Predicate Matrix, a resource that integrates predicate and role information from, amongst others, FrameNet, VerbNet, PropBank, NomBank and WordNet. We illustrate how these resources are used on large document collections to detect information that otherwise would have remained implicit. The ontology is evaluated on two aspects: first, recall and precision based on a manually annotated corpus and, second, the quality of the knowledge inferred by the situation assertions in the ontology. Evaluation results on the quality of the system show that 50% of the events typed and enriched with ESO assertions are correct.
@inproceedings{segers-etal-2016-event, title = {The Event and Implied Situation Ontology ({ESO}): Application and Evaluation}, author = {Segers, Roxane and Rospocher, Marco and Vossen, Piek and Laparra, Egoitz and Rigau, German and Minard, Anne-Lyse}, booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)}, month = may, year = {2016}, address = {Portoro{\v{z}}, Slovenia}, publisher = {European Language Resources Association (ELRA)}, url = {https://aclanthology.org/L16-1233}, pages = {1463--1470}, }
- A Multilingual Predicate Matrix. Maddalen Lopez de Lacalle, Egoitz Laparra, Itziar Aldabe, and 1 more author. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2016.
This paper presents the Predicate Matrix 1.3, a lexical resource resulting from the integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. This new version of the Predicate Matrix has been extended to cover nominal predicates by adding mappings to NomBank. Similarly, we have integrated resources in Spanish, Catalan and Basque. As a result, the Predicate Matrix 1.3 provides a multilingual lexicon to allow interoperable semantic analysis in multiple languages.
@inproceedings{lopez-de-lacalle-etal-2016-multilingual, title = {A Multilingual Predicate Matrix}, author = {Lopez de Lacalle, Maddalen and Laparra, Egoitz and Aldabe, Itziar and Rigau, German}, booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)}, month = may, year = {2016}, address = {Portoro{\v{z}}, Slovenia}, publisher = {European Language Resources Association (ELRA)}, url = {https://aclanthology.org/L16-1423}, pages = {2662--2668}, }
- The Predicate Matrix and the Event and Implied Situation Ontology: Making More of Events. Roxane Segers, Egoitz Laparra, Marco Rospocher, and 3 more authors. In Proceedings of the 8th Global WordNet Conference (GWC), 2016.
This paper presents the Event and Implied Situation Ontology (ESO), a resource which formalizes the pre- and post-situations of events and the roles of the entities affected by an event. The ontology reuses and maps across existing resources such as WordNet, SUMO, VerbNet, PropBank and FrameNet. We describe how ESO is injected into a new version of the Predicate Matrix and illustrate how these resources are used to detect information in large document collections that otherwise would have remained implicit. The model targets interpretations of situations rather than the semantics of verbs per se. The event is interpreted as a situation using RDF, taking all event components into account. Hence, the ontology and the linked resources need to be considered from the perspective of this interpretation model.
@inproceedings{segers-etal-2016-predicate, title = {The Predicate Matrix and the Event and Implied Situation Ontology: Making More of Events}, author = {Segers, Roxane and Laparra, Egoitz and Rospocher, Marco and Vossen, Piek and Rigau, German and Ilievski, Filip}, booktitle = {Proceedings of the 8th Global WordNet Conference (GWC)}, month = jan, year = {2016}, address = {Bucharest, Romania}, publisher = {Global Wordnet Association}, url = {https://aclanthology.org/2016.gwc-1.51}, pages = {364--372}, }
- Multilingual Event Detection using the NewsReader pipelines. Rodrigo Agerri, Itziar Aldabe, Egoitz Laparra, and 8 more authors. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, 2016.
@inproceedings{INTEROP2016-10, address = {Portoro{\v z}, Slovenia}, author = {Agerri, Rodrigo and Aldabe, Itziar and Laparra, Egoitz and Rigau, German and Fokkens, Antske and Huijgen, Paul and van Erp, Marieke and Izquierdo Bevia, Ruben and Vossen, Piek and Minard, Anne-Lyse and Magnini, Bernardo}, booktitle = {Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016}, editor = {Eckart de Castilho, Richard and Ananiadou, Sophia and Margoni, Thomas and Peters, Wim and Piperidis, Stelios}, month = may, pages = {42--46}, publisher = {European Language Resources Association (ELRA)}, title = {Multilingual Event Detection using the NewsReader pipelines}, year = {2016}, }
2015
- ESO: A frame based ontology for events and implied situations. Roxane Segers, Piek Vossen, Marco Rospocher, and 3 more authors. In Proceedings of MAPLEX, 2015.
@inproceedings{segers2015eso, title = {{ESO}: A frame based ontology for events and implied situations}, author = {Segers, Roxane and Vossen, Piek and Rospocher, Marco and Serafini, Luciano and Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of MAPLEX}, volume = {2015}, month = feb, year = {2015}, address = {Yamagata, Japan}, }
- From TimeLines to StoryLines: A preliminary proposal for evaluating narratives. Egoitz Laparra, Itziar Aldabe, and German Rigau. In Proceedings of the First Workshop on Computing News Storylines, 2015.
@inproceedings{laparra-etal-2015-timelines, title = {From {T}ime{L}ines to {S}tory{L}ines: A preliminary proposal for evaluating narratives}, author = {Laparra, Egoitz and Aldabe, Itziar and Rigau, German}, booktitle = {Proceedings of the First Workshop on Computing News Storylines}, month = jul, year = {2015}, address = {Beijing, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/W15-4508}, doi = {10.18653/v1/W15-4508}, pages = {50--55}, }
- Semantic Interoperability for Cross-lingual and cross-document Event Detection. Piek Vossen, Egoitz Laparra, German Rigau, and 1 more author. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, 2015.
@inproceedings{vossen-etal-2015-semantic, title = {Semantic Interoperability for Cross-lingual and cross-document Event Detection}, author = {Vossen, Piek and Laparra, Egoitz and Rigau, German and Aldabe, Itziar}, booktitle = {Proceedings of the 3rd Workshop on {EVENTS}: Definition, Detection, Coreference, and Representation}, month = jun, year = {2015}, address = {Denver, Colorado}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/W15-0814}, doi = {10.3115/v1/W15-0814}, pages = {108--116}, }
- Document Level Time-anchoring for TimeLine Extraction. Egoitz Laparra, Itziar Aldabe, and German Rigau. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015.
@inproceedings{laparra-etal-2015-document, title = {Document Level Time-anchoring for {T}ime{L}ine Extraction}, author = {Laparra, Egoitz and Aldabe, Itziar and Rigau, German}, booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)}, month = jul, year = {2015}, address = {Beijing, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P15-2059}, doi = {10.3115/v1/P15-2059}, pages = {358--364}, }
2014
- First steps towards a Predicate Matrix. Maddalen Lopez de Lacalle, Egoitz Laparra, and German Rigau. In Proceedings of the Seventh Global Wordnet Conference, 2014.
@inproceedings{lopez-de-lacalle-etal-2014-first, title = {First steps towards a Predicate Matrix}, author = {L{\'o}pez de Lacalle, Maddalen and Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the Seventh Global {W}ordnet Conference}, month = jan, year = {2014}, address = {Tartu, Estonia}, publisher = {University of Tartu Press}, url = {https://aclanthology.org/W14-0150}, pages = {363--371}, }
- Predicate Matrix: extending SemLink through WordNet mappings. Maddalen Lopez de Lacalle, Egoitz Laparra, and German Rigau. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), 2014.
This paper presents the Predicate Matrix v1.1, a new lexical resource resulting from the integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. We start from the basis of SemLink. First, we use advanced graph-based algorithms to further extend the mapping coverage of SemLink. Second, we exploit the current content of SemLink to infer new role mappings among the different predicate schemas. As a result, we have obtained a new version of the Predicate Matrix which largely extends the current coverage of SemLink and the previous version of the Predicate Matrix.
@inproceedings{lopez-de-lacalle-etal-2014-predicate, title = {Predicate Matrix: extending {S}em{L}ink through {W}ord{N}et mappings}, author = {Lopez de Lacalle, Maddalen and Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)}, month = may, year = {2014}, address = {Reykjavik, Iceland}, publisher = {European Language Resources Association (ELRA)}, pages = {903--909}, }
- NewsReader project. Rodrigo Agerri, Eneko Agirre, Itziar Aldabe, and 7 more authors. Procesamiento del Lenguaje Natural, 2014.
The European project NewsReader develops advanced technology to process daily news streams in 4 languages, extracting what happened, when and where it happened and who was involved. NewsReader reads massive amounts of news coming from thousands of sources. It compares the results across sources to complement information and determine where the different sources disagree. Furthermore, it merges current news with previous news, creating a long-term history rather than separate events. The result is cumulated over time, producing an extremely large knowledge base that is visualized using new techniques to provide more comprehensive access.
@article{PLN5063, author = {Agerri, Rodrigo and Agirre, Eneko and Aldabe, Itziar and Altuna, Begoña and Beloki, Zuhaitz and Laparra, Egoitz and de Lacalle, Maddalen López and Rigau, German and Soroa, Aitor and Urizar, Rubén}, title = {NewsReader project}, journal = {Procesamiento del Lenguaje Natural}, volume = {53}, number = {0}, year = {2014}, issn = {1989-7553}, url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5063}, pages = {155--158}, }
2013
- Sources of Evidence for Implicit Argument Resolution. Egoitz Laparra and German Rigau. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers, 2013.
@inproceedings{laparra-rigau-2013-sources, title = {Sources of Evidence for Implicit Argument Resolution}, author = {Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the 10th International Conference on Computational Semantics ({IWCS} 2013) {--} Long Papers}, month = mar, year = {2013}, address = {Potsdam, Germany}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/W13-0114}, pages = {155--166}, }
- ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling. Egoitz Laparra and German Rigau. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013.
@inproceedings{laparra-rigau-2013-impar, title = {{I}mp{A}r: A Deterministic Algorithm for Implicit Semantic Role Labelling}, author = {Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = aug, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P13-1116}, pages = {1180--1189}, }
2012
- Multilingual Central Repository version 3.0. Aitor Gonzalez-Agirre, Egoitz Laparra, and German Rigau. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 2012.
This paper describes the upgrading process of the Multilingual Central Repository (MCR). The new MCR uses WordNet 3.0 as Interlingual-Index (ILI). The current version of the MCR integrates, in the same EuroWordNet framework, wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. In order to provide ontological coherence to all the integrated wordnets, the MCR has also been enriched with a disparate set of ontologies: Base Concepts, Top Ontology, WordNet Domains and Suggested Upper Merged Ontology. The whole content of the MCR is freely available.
@inproceedings{gonzalez-agirre-etal-2012-multilingual, title = {Multilingual Central Repository version 3.0}, author = {Gonzalez-Agirre, Aitor and Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)}, month = may, year = {2012}, address = {Istanbul, Turkey}, publisher = {European Language Resources Association (ELRA)}, pages = {2525--2529}, }
- Mapping WordNet to the Kyoto ontology. Egoitz Laparra, German Rigau, and Piek Vossen. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 2012.
This paper describes the connection of WordNet to a generic ontology based on DOLCE. We developed a complete set of heuristics for mapping all WordNet nouns, verbs and adjectives to the ontology. Moreover, the mapping also allows predicates to be represented in a uniform and interoperable way, regardless of the way they are expressed in the text and in which language. Together with the ontology, the WordNet mappings provide an extremely rich and powerful basis for semantic processing of text in any domain. In particular, the mapping has been used in a knowledge-rich event-mining system developed for the Asian-European project KYOTO.
@inproceedings{laparra-etal-2012-mapping, title = {Mapping {W}ord{N}et to the {K}yoto ontology}, author = {Laparra, Egoitz and Rigau, German and Vossen, Piek}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)}, month = may, year = {2012}, address = {Istanbul, Turkey}, publisher = {European Language Resources Association (ELRA)}, pages = {2584--2589}, }
- Exploiting Explicit Annotations and Semantic Types for Implicit Argument Resolution. Egoitz Laparra and German Rigau. In IEEE Sixth International Conference on Semantic Computing, 2012.
@inproceedings{6337085, author = {Laparra, Egoitz and Rigau, German}, booktitle = {IEEE Sixth International Conference on Semantic Computing}, title = {Exploiting Explicit Annotations and Semantic Types for Implicit Argument Resolution}, year = {2012}, pages = {75-78}, doi = {10.1109/ICSC.2012.47}, }
- Multilingual central repository version 3.0: upgrading a very large lexical knowledge base. Aitor Gonzalez-Agirre, Egoitz Laparra, and German Rigau. In 6th International Global Wordnet Conference, 2012.
@inproceedings{agirre2012multilingual, title = {Multilingual central repository version 3.0: upgrading a very large lexical knowledge base}, author = {Gonzalez-Agirre, Aitor and Laparra, Egoitz and Rigau, German}, booktitle = {6th International Global Wordnet Conference}, pages = {118}, year = {2012}, }
2010
- Integrating a Large Domain Ontology of Species into WordNet. Montse Cuadros, Egoitz Laparra, German Rigau, and 2 more authors. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), 2010.
With the proliferation of applications sharing information represented in multiple ontologies, the development of automatic methods for robust and accurate ontology matching will be crucial to their success. Connecting and merging already existing semantic networks is perhaps one of the most challenging tasks related to knowledge engineering. This paper presents a new approach for automatically aligning a very large domain ontology of Species to WordNet in the framework of the KYOTO project. The approach relies on the use of a knowledge-based Word Sense Disambiguation algorithm which accurately assigns WordNet synsets to the concepts represented in Species 2000.
@inproceedings{cuadros-etal-2010-integrating, title = {Integrating a Large Domain Ontology of Species into {W}ord{N}et}, author = {Cuadros, Montse and Laparra, Egoitz and Rigau, German and Vossen, Piek and Bosma, Wauter}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation ({LREC}'10)}, month = may, year = {2010}, address = {Valletta, Malta}, publisher = {European Language Resources Association (ELRA)}, }
- eXtended WordFrameNet. Egoitz Laparra and German Rigau. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), 2010.
This paper presents a novel automatic approach to partially integrate FrameNet and WordNet. In that way we expect to extend FrameNet coverage, to enrich WordNet with frame semantic information and possibly to extend FrameNet to languages other than English. The method uses a knowledge-based Word Sense Disambiguation algorithm for matching the FrameNet lexical units to WordNet synsets. Specifically, we exploit a graph-based Word Sense Disambiguation algorithm that uses a large-scale knowledge-base derived from existing semantic resources. We have developed and tested additional versions of this algorithm showing substantial improvements over state-of-the-art results. Finally, we show some examples and figures of the resulting semantic resource.
@inproceedings{laparra-rigau-2010-extended, title = {e{X}tended {W}ord{F}rame{N}et}, author = {Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation ({LREC}'10)}, month = may, year = {2010}, address = {Valletta, Malta}, publisher = {European Language Resources Association (ELRA)}, }
- Exploring the integration of WordNet and FrameNet. Egoitz Laparra, German Rigau, and Montse Cuadros. In Proceedings of the 5th Global WordNet Conference (GWC 2010), Mumbai, India, 2010.
@inproceedings{laparra2010exploring, title = {Exploring the integration of WordNet and FrameNet}, author = {Laparra, Egoitz and Rigau, German and Cuadros, Montse}, booktitle = {Proceedings of the 5th Global WordNet Conference (GWC 2010), Mumbai, India}, year = {2010}, }
2009
- Enriching Knowledge Sources for Natural Language Understanding. Egoitz Laparra. Master’s Thesis, University of the Basque Country, 2009.
@mastersthesis{laparra2009enriching, title = {Enriching Knowledge Sources for Natural Language Understanding}, author = {Laparra, Egoitz}, school = {University of the Basque Country}, year = {2009}, }
- Linking WordNet to FrameNet by using a knowledge-base Word Sense Disambiguation algorithm. Egoitz Laparra and German Rigau. In Actas de las III Jornadas TIMM, 2009.
@inproceedings{laparralinking, title = {Linking WordNet to FrameNet by using a knowledge-base Word Sense Disambiguation algorithm}, author = {Laparra, Egoitz and Rigau, German}, month = feb, year = {2009}, booktitle = {Actas de las III Jornadas TIMM}, pages = {55}, address = {Madrid, Spain}, }
- Integrating WordNet and FrameNet using a Knowledge-based Word Sense Disambiguation Algorithm. Egoitz Laparra and German Rigau. In Proceedings of the International Conference RANLP-2009, 2009.
@inproceedings{laparra-rigau-2009-integrating, title = {Integrating {W}ord{N}et and {F}rame{N}et using a Knowledge-based Word Sense Disambiguation Algorithm}, author = {Laparra, Egoitz and Rigau, German}, booktitle = {Proceedings of the International Conference {RANLP}-2009}, month = sep, year = {2009}, address = {Borovets, Bulgaria}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/R09-1039}, pages = {208--213}, }
- Evaluación de métodos semi-automáticos para la conexión entre FrameNet y SenSem. Laura Alonso, Irene Castellón, Egoitz Laparra, and 1 more author. Procesamiento del Lenguaje Natural, 2009.
This article presents an approach to the automatic connection of predicative models in Spanish and English. The goal is to establish the difficulty of the task and to measure the performance of different semi-automatic techniques and methods. On the one hand, we aim to reduce the effort required to enrich these resources; on the other, to increase their coverage and consistency. To this end, we combine manual annotation with two automatic methods to establish correspondences between the different semantic units.
@article{PLN34, author = {Alonso, Laura and Castellón, Irene and Laparra, Egoitz and Rigau, German}, title = {Evaluación de métodos semi-automáticos para la conexión entre FrameNet y SenSem}, journal = {Procesamiento del Lenguaje Natural}, volume = {43}, number = {0}, year = {2009}, issn = {1989-7553}, url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/34}, pages = {295--302}, }
- KYOTO Project. Eneko Agirre, Arantza Casillas, Arantza Díaz de Ilarraza, and 5 more authors. Procesamiento del Lenguaje Natural, 2009.
The KYOTO project builds a language-independent information system for a specific domain (environment, ecology and diversity) based on a language-independent ontology linked to wordnets in seven languages.
@article{PLN56, author = {Agirre, Eneko and Casillas, Arantza and Díaz de Ilarraza, Arantza and Estarrona, Ainara and Fernández, Kike and Gojenola, Koldo and Laparra, Egoitz and Soroa, Aitor}, title = {KYOTO Project}, journal = {Procesamiento del Lenguaje Natural}, volume = {43}, number = {0}, year = {2009}, issn = {1989-7553}, url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/56}, pages = {389--390}, }
- A New Proposal for Using First-Order Theorem Provers to Reason with OWL DL Ontologies. Mikel Alecha, Javier Álvez, M. Hermo, and 1 more author. In Programación y lenguajes: IX Jornadas sobre Programación y Lenguajes, PROLE’09, I Taller de Programación Funcional, TPF’09, 2009.
@inproceedings{alecha2009new, title = {A New Proposal for Using First-Order Theorem Provers to Reason with OWL DL Ontologies}, author = {Alecha, Mikel and {\'A}lvez, Javier and Hermo, M and Laparra, Egoitz}, booktitle = {Programaci{\'o}n y lenguajes: IX Jornadas sobre Programaci{\'o}n y Lenguajes, PROLE'09, I Taller de Programaci{\'o}n Funcional, TPF'09}, location = {San Sebasti{\'a}n, Espa{\~n}a}, pages = {151--160}, year = {2009}, organization = {Mondragon Unibertsitatea}, }
2008
- Complete and Consistent Annotation of WordNet using the Top Concept Ontology. Javier Álvez, Jordi Atserias, Jordi Carrera, and 4 more authors. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), 2008.
This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology and has been made available to the NLP community. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontologized in such a way. The work has been achieved by following a methodology based on an iterative and incremental expansion of the initial labeling through the hierarchy while setting inheritance blockage points. Since this labeling has been set on EuroWordNet's Interlingual Index (ILI), it can also be used to populate any other wordnet linked to it through a simple porting process. This feature-annotated WordNet is intended to be useful for a large number of semantic NLP tasks and for testing componential analysis on real environments for the first time. Moreover, the quantitative analysis of the work shows that more than 40% of the nominal part of WordNet is involved in structure errors or inadequacies.
@inproceedings{alvez-etal-2008-complete, title = {Complete and Consistent Annotation of {W}ord{N}et using the Top Concept Ontology}, author = {{\'A}lvez, Javier and Atserias, Jordi and Carrera, Jordi and Climent, Salvador and Laparra, Egoitz and Oliver, Antoni and Rigau, German}, booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation ({LREC}'08)}, month = may, year = {2008}, address = {Marrakech, Morocco}, publisher = {European Language Resources Association (ELRA)}, }