
Research Article
Neural Weak Supervision Model for Search of Specialists in Scientific Data Repository
@INPROCEEDINGS{10.1007/978-3-030-77417-2_21,
  author={Sergio Jose de Sousa and Thiago Magela Rodrigues Dias and Adilson Luiz Pinto},
  title={Neural Weak Supervision Model for Search of Specialists in Scientific Data Repository},
  proceedings={Data and Information in Online Environments. Second EAI International Conference, DIONE 2021, Virtual Event, March 10--12, 2021, Proceedings},
  proceedings_a={DIONE},
  year={2021},
  month={6},
  keywords={Expertise retrieval Deep learning Weak supervision.},
  doi={10.1007/978-3-030-77417-2_21}
}
Sergio Jose de Sousa
Thiago Magela Rodrigues Dias
Adilson Luiz Pinto
Year: 2021
DIONE
Springer
DOI: 10.1007/978-3-030-77417-2_21
Abstract
With the growing volume of data produced today, more and more users rely on different types of systems, such as professional and academic data storage systems. Given the large amount of stored data, finding candidates with appropriate profiles for a particular activity is difficult. Expertise retrieval, a branch of information retrieval, addresses this problem: given a query, documents are retrieved and used as indirect units of information about the candidates, and aggregation techniques are applied to these documents to generate a score for each candidate. Several models and techniques exist for this problem, and some have been tested extensively, but the search for specialists in the academic field with neural models has received less research. This is due to the complexity of these models and their need for large volumes of training data with relevance judgments or labels. Therefore, this work proposes a technique for expanding and generating weakly supervised data in which the relevance judgments are created with heuristic techniques, making it possible to use models that require large volumes of data. In addition, a deep auto-encoder technique is proposed to select negative documents, and finally a ranking model based on recurrent neural networks is proposed, which outperformed all the baselines compared.
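The document-based scoring idea described in the abstract (retrieve documents for a query, then aggregate their scores per candidate) can be illustrated with a minimal Python sketch. This is not the authors' model: the function name `rank_candidates`, the toy scores, the author names, and the choice of summation as the aggregation rule are all assumptions made for illustration only.

```python
from collections import defaultdict


def rank_candidates(doc_scores, doc_authors):
    """Aggregate per-document relevance scores into per-candidate scores.

    doc_scores:  dict mapping document id -> relevance score for one query
    doc_authors: dict mapping document id -> list of candidate names
    Both inputs are hypothetical; a real system would obtain the scores
    from a retrieval or ranking model.
    """
    totals = defaultdict(float)
    for doc_id, score in doc_scores.items():
        for author in doc_authors.get(doc_id, []):
            # Assumption: simple sum aggregation over a candidate's documents.
            totals[author] += score
    # Sort by descending score, breaking ties alphabetically by name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Toy data: three retrieved documents and their (hypothetical) authors.
doc_scores = {"d1": 0.9, "d2": 0.4, "d3": 0.7}
doc_authors = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}

ranking = rank_candidates(doc_scores, doc_authors)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Summation rewards candidates who have many relevant documents. Other aggregation rules (for example, taking the maximum document score) would favor a single highly relevant document instead, and the abstract does not state which rule the paper uses.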