Data and Information in Online Environments. First EAI International Conference, DIONE 2020, Florianópolis, Brazil, March 19-20, 2020, Proceedings

Research Article

A New Entity Extraction Model Based on Journalistic Brazilian Portuguese Language to Enhance Named Entity Recognition

@INPROCEEDINGS{10.1007/978-3-030-50072-6_5,
    author={Rogerio de Aquino Silva and Luana da Silva and Mois\'{e}s Lima Dutra and Gustavo Medeiros de Araujo},
    title={A New Entity Extraction Model Based on Journalistic Brazilian Portuguese Language to Enhance Named Entity Recognition},
    booktitle={Data and Information in Online Environments. First EAI International Conference, DIONE 2020, Florian\'{o}polis, Brazil, March 19-20, 2020, Proceedings},
    proceedings_a={DIONE},
    year={2020},
    month={6},
    keywords={Natural Language Processing, Named entity recognition, Entity extraction model, Brazilian Portuguese corpus, Recurrent Neural Networks},
    doi={10.1007/978-3-030-50072-6_5}
}
    
Rogerio de Aquino Silva, Luana da Silva, Moisés Lima Dutra, Gustavo Medeiros de Araujo. A New Entity Extraction Model Based on Journalistic Brazilian Portuguese Language to Enhance Named Entity Recognition. DIONE, Springer, 2020. DOI: 10.1007/978-3-030-50072-6_5
Rogerio de Aquino Silva¹, Luana da Silva¹, Moisés Lima Dutra¹, Gustavo Medeiros de Araujo¹*
¹ Engineering and Data Science Lab
* Contact email: gustavo.araujo@ufsc.br

Abstract

Named Entity Recognition (NER) plays an important role in a broad range of natural language processing applications. According to the literature, NER applied to English reaches around 90% accuracy; applied to Portuguese, however, the accuracy is at most 83.38%. A wide range of algorithms based on the LSTM (Long Short-Term Memory) architecture has been proposed to improve NER accuracy, but a key component of successful information extraction is the corpus used for NER training. To improve NER for the Portuguese language, this paper proposes a methodology for building a training text corpus from Portuguese-language journalistic corpora. Journalistic language best reflects contemporary usage of the language, since it preserves features such as objectivity, simplicity, and impartiality, and it is a reference for conveying information without ambiguity. The proposed methodology provides a model for extracting entities, and the results obtained are assessed using Recurrent Neural Network architectures. To the best of our knowledge, with the proposed methodology, NER applied to Portuguese surpasses the accuracy reported in the literature, increasing it from 83.38% to 85.64%. Moreover, this methodology could reduce the computational costs of NER processing tasks.
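The abstract centres on building annotated training corpora and reporting NER accuracy, but this page does not describe the annotation scheme, tag set, tokenizer, or exact metric the authors used. As a hedged illustration only, the Python sketch below assumes the common BIO tagging scheme with hypothetical `PER`/`LOC` labels: it turns character-level entity spans in a sentence into token-level tags (the usual input format for LSTM-based taggers) and computes token-level accuracy of a predicted tag sequence. The functions `tokenize`, `bio_tags`, and `token_accuracy`, the whitespace tokenizer, and the sample sentence are all hypothetical and are not the paper's method.

```python
from typing import List, Tuple

# Hypothetical entity annotation: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Hypothetical whitespace tokenizer returning (token, start, end) offsets.

    Simplification: real Portuguese corpora need a tokenizer that splits
    punctuation and contractions such as "do" = "de" + "o".
    """
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # literal search from the last offset
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def bio_tags(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Assign B-<label>, I-<label>, or O to each token (assumed BIO scheme).

    Tokens only partially covered by a span are tagged O.
    """
    tagged = []
    for tok, start, end in tokenize(text):
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                # First token of the entity gets B-, the following ones get I-.
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    sentence = "Ana Souza visitou Florianópolis"  # hypothetical example
    # Locate each hypothetical entity by string search to get its offsets.
    spans = []
    for surface, label in [("Ana Souza", "PER"), ("Florianópolis", "LOC")]:
        start = sentence.index(surface)
        spans.append((start, start + len(surface), label))

    gold = bio_tags(sentence, spans)
    for tok, tag in gold:
        print(f"{tok}\t{tag}")  # CoNLL-style "token<TAB>tag" lines

    predicted = ["B-PER", "O", "O", "B-LOC"]  # hypothetical tagger output
    print(token_accuracy([tag for _, tag in gold], predicted))  # 0.75
```

Token-level accuracy counts every `O` token, so it is usually higher than entity-level F1; which of these the reported figures correspond to is not stated here. For scale, the reported increase from 83.38% to 85.64% is 2.26 percentage points.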

Keywords
Natural Language Processing, Named entity recognition, Entity extraction model, Brazilian Portuguese corpus, Recurrent Neural Networks
Published
2020-06-16
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-50072-6_5
Copyright © 2020–2025 ICST