2nd International ICST Conference on Scalable Information Systems

Research Article

Citation Data Clustering for Author Name Disambiguation

  • @INPROCEEDINGS{10.4108/infoscale.2007.203,
        author={Tomonari Masada and Atsuhiro Takasu and Jun Adachi},
        title={Citation Data Clustering for Author Name Disambiguation},
        proceedings={2nd International ICST Conference on Scalable Information Systems},
        proceedings_a={INFOSCALE},
        year={2010},
        month={5},
        keywords={Name Disambiguation, Unsupervised Learning},
        doi={10.4108/infoscale.2007.203}
    }
    
  • Tomonari Masada
    Atsuhiro Takasu
    Jun Adachi
    Year: 2010
    Citation Data Clustering for Author Name Disambiguation
    INFOSCALE
    ICST
    DOI: 10.4108/infoscale.2007.203
Tomonari Masada1,*, Atsuhiro Takasu2,*, Jun Adachi2,*
  • 1: Nagasaki University Bunkyo-machi 1-14, Nagasaki, Japan
  • 2: National Institute of Informatics Hitotsubashi 2-1-2, Chiyoda-ku Tokyo, Japan
*Contact email: masada@cis.nagasaki-u.ac.jp, takasu@nii.ac.jp, adachi@nii.ac.jp

Abstract

We propose a new method of citation data clustering for author name disambiguation. Most citation data in the reference sections of scientific papers give coauthors' first names only as initials. Searches over citation data therefore often use an abbreviated name, e.g. “S. Lee” or “J. Chen”, and return many irrelevant results, because one abbreviated name can refer to many different persons. Our method constructs clusters such that each cluster contains only the citation data of a single author. It is based on a probabilistic model that extends the naive Bayes mixture model. Because the model has two hidden variables, we call it the two-variable mixture model. In the evaluation experiment, we used the well-known DBLP data set. The results show that the two-variable mixture model can achieve a better balance between precision and recall than the naive Bayes mixture model.
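
As a rough illustration of the baseline named in the abstract, the following Python sketch clusters toy citation records with a multinomial naive Bayes mixture fitted by EM. It is not the authors' two-variable mixture model, whose details the abstract does not give. The function name `fit_nb_mixture`, the smoothing constant `alpha`, the choice of tokens (coauthor names, title words, venue), and the toy records are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def fit_nb_mixture(docs, n_clusters, n_iter=50, alpha=0.1, seed=0):
    """Fit a multinomial naive Bayes mixture to token lists with EM.

    Returns (labels, resp): the hard cluster label of each document and
    the per-document cluster posteriors (responsibilities).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]

    # Random soft initialization of the responsibilities.
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: smoothed cluster priors and per-cluster word distributions.
        weight = [sum(r[k] for r in resp) for k in range(n_clusters)]
        prior = [(w + alpha) / (len(docs) + alpha * n_clusters) for w in weight]
        word_prob = []
        for k in range(n_clusters):
            mass = {w: alpha for w in vocab}
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    mass[w] += r[k] * n
            total = sum(mass.values())
            word_prob.append({w: m / total for w, m in mass.items()})

        # E-step: cluster posteriors, computed in log space for stability.
        resp = []
        for c in counts:
            log_post = [
                math.log(prior[k])
                + sum(n * math.log(word_prob[k][w]) for w, n in c.items())
                for k in range(n_clusters)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return labels, resp


# Hypothetical citation records that all match the query "S. Lee".
# Each record is reduced to a bag of tokens (an assumed feature choice).
citations = [
    ["j_chen", "query", "index", "database", "vldb"],
    ["j_chen", "query", "index", "database", "vldb"],
    ["j_chen", "query", "optimization", "database", "sigmod"],
    ["m_kim", "protein", "folding", "simulation", "bioinformatics"],
    ["m_kim", "protein", "structure", "simulation", "bioinformatics"],
]

labels, resp = fit_nb_mixture(citations, n_clusters=2)
for record, label in zip(citations, labels):
    print(label, " ".join(record))
```

The number of clusters is fixed by hand here, and EM can converge to different solutions for different seeds, so the printed grouping should be read as one possible outcome rather than a reference result.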

Keywords
Name Disambiguation, Unsupervised Learning
Published
2010-05-16
Modified
2011-09-11
http://dx.doi.org/10.4108/infoscale.2007.203