10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

The Impact of User Corrections On A Crawl-Based Digital Library: A CiteSeerX Perspective

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2014.257563,
        author={Jian Wu and Kyle Williams and Madian Khabsa and C. Giles},
        title={The Impact of User Corrections On A Crawl-Based Digital Library: A CiteSeerX Perspective},
        proceedings={10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2014},
        month={11},
        keywords={digital library, crowd-sourcing, information extraction, user correction},
        doi={10.4108/icst.collaboratecom.2014.257563}
    }
    
Jian Wu1,*, Kyle Williams1, Madian Khabsa2, C. Giles3
  • 1: IST, Penn State University
  • 2: CSE, Penn State University
  • 3: IST/CSE, Penn State University
*Contact email: fanchyna@gmail.com

Abstract

CiteSeerX is a crawl-based digital library search engine providing free access to more than 4 million academic papers. It is inevitable for such a digital library to obtain mistakenly parsed metadata, which are retrieved in an automatic manner from PDF files coming from various sources. CiteSeerX offers a feature allowing registered users to correct paper metadata including titles, authors, abstracts, publication years, venues, etc. We claim that user corrections, as a form of crowd-collaboration, provide a useful and efficient way to improve metadata quality and the impact of the digital library. As evidence to support this claim, we investigate user corrections from the last 5 years and analyze: the nature of the corrections; the quality of the corrections; and the impact of the corrections on downloads. Furthermore, we propose a credit-based strategy, in which users are assigned more privileges based on their positive correction activities. We also propose new ways of increasing visibility of mistakenly extracted metadata to promote user correction.

Keywords
digital library, crowd-sourcing, information extraction, user correction
Published
2014-11-11
Publisher
IEEE
http://dx.doi.org/10.4108/icst.collaboratecom.2014.257563