3rd International ICST Conference on Scalable Information Systems

Research Article

Collection Selection ...now, with more documents!

  • @INPROCEEDINGS{10.4108/ICST.INFOSCALE2008.3515,
        author={Diego Puppin},
        title={Collection Selection ...now, with more documents!},
        proceedings={3rd International ICST Conference on Scalable Information Systems},
        publisher={ICST},
        proceedings_a={INFOSCALE},
        year={2010},
        month={5},
        keywords={Distributed IR Collection Selection Index Update Web Search Engines},
        doi={10.4108/ICST.INFOSCALE2008.3515}
    }
    
Diego Puppin1,2,*
  • 1: ISTI-CNR (Pisa, Italy)
  • 2: Google (Boston, USA)
*Contact email: diego.puppin@alum.mit.edu

Abstract

One way to reduce the computing load in a distributed IR system is to partition the documents into collections and perform collection selection. With suitable training and/or modeling, the collection selection function can choose, with high confidence, the most promising collections for each query. Unfortunately, when the collections are updated, we must retrain the selection function, update its statistics, or accept some loss of result quality. This paper introduces a simple but very effective technique for adding new documents to the collections of a system that uses collection selection. We show that the individual collections can be updated while guaranteeing the same selection performance, with no need to update or retrain the selection function.