Research Article
CiteSeerχ: a scalable autonomous scientific digital library
@INPROCEEDINGS{10.1145/1146847.1146865,
  author={Huajing Li and Isaac G. Councill and Levent Bolelli and Ding Zhou and Yang Song and Wang-Chien Lee and Anand Sivasubramaniam and C. Lee Giles},
  title={CiteSeerχ: a scalable autonomous scientific digital library},
  proceedings={1st International ICST Conference on Scalable Information Systems},
  publisher={ACM},
  proceedings_a={INFOSCALE},
  year={2006},
  month={6},
  keywords={},
  doi={10.1145/1146847.1146865}
}
- Huajing Li
- Isaac G. Councill
- Levent Bolelli
- Ding Zhou
- Yang Song
- Wang-Chien Lee
- Anand Sivasubramaniam
- C. Lee Giles
Year: 2006
INFOSCALE
ACM
DOI: 10.1145/1146847.1146865
Abstract
CiteSeer is a scientific literature digital library and search engine that automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture complicates system maintenance and reduces the flexibility of the system in terms of new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next-generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show that the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next-generation CiteSeer system are discussed.