Research Article
Collection Selection ...now, with more documents!
@INPROCEEDINGS{10.4108/ICST.INFOSCALE2008.3515,
  author={Diego Puppin},
  title={Collection Selection ...now, with more documents!},
  proceedings={3rd International ICST Conference on Scalable Information Systems},
  publisher={ICST},
  proceedings_a={INFOSCALE},
  year={2010},
  month={5},
  keywords={Distributed IR, Collection Selection, Index Update, Web Search Engines},
  doi={10.4108/ICST.INFOSCALE2008.3515}
}
- Diego Puppin
Year: 2010
INFOSCALE
ICST
DOI: 10.4108/ICST.INFOSCALE2008.3515
Abstract
One way to reduce the computational load in a distributed IR system is to use document partitioning and perform collection selection. With suitable training and/or modeling, the collection selection function can choose the most promising collections for each query with high confidence. Unfortunately, if the collections need to be updated, we must retrain the selection function, update its statistics, or accept some loss of result quality. This paper introduces a simple but very effective technique for adding new documents to collections in a system that uses collection selection. We show that we can update the individual collections while guaranteeing the same selection performance, with no need to update or retrain the selection function.