Research Article
The Impact of User Corrections On A Crawl-Based Digital Library: A CiteSeerX Perspective
@INPROCEEDINGS{10.4108/icst.collaboratecom.2014.257563,
  author={Jian Wu and Kyle Williams and Madian Khabsa and C. Giles},
  title={The Impact of User Corrections On A Crawl-Based Digital Library: A CiteSeerX Perspective},
  proceedings={10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
  publisher={IEEE},
  proceedings_a={COLLABORATECOM},
  year={2014},
  month={11},
  keywords={digital library, crowd-sourcing, information extraction, user correction},
  doi={10.4108/icst.collaboratecom.2014.257563}
}
Jian Wu
Kyle Williams
Madian Khabsa
C. Giles
Year: 2014
COLLABORATECOM
IEEE
DOI: 10.4108/icst.collaboratecom.2014.257563
Abstract
CiteSeerX is a crawl-based digital library search engine providing free access to more than 4 million academic papers. Because its metadata are extracted automatically from PDF files gathered from various sources, such a digital library inevitably contains incorrectly parsed metadata. CiteSeerX offers a feature that allows registered users to correct paper metadata, including titles, authors, abstracts, publication years, and venues. We claim that user corrections, as a form of crowd collaboration, provide a useful and efficient way to improve metadata quality and the impact of the digital library. To support this claim, we investigate user corrections from the last 5 years and analyze the nature of the corrections, their quality, and their impact on downloads. Furthermore, we propose a credit-based strategy in which users are assigned more privileges based on their positive correction activities. We also propose new ways of increasing the visibility of mistakenly extracted metadata to promote user correction.