1st International ICST Conference on Scalable Information Systems

Research Article

Effective level of term frequency impact on large-scale retrieval performance: by top-term ranking method

  • @INPROCEEDINGS{10.1145/1146847.1146884,
        author={Soheila KARBASI and Mohand BOUGHANEM},
        title={Effective level of term frequency impact on large-scale retrieval performance: by top-term ranking method},
        booktitle={1st International ICST Conference on Scalable Information Systems},
        publisher={ACM},
        proceedings_a={INFOSCALE},
        year={2006},
        month={6},
        keywords={large collection; document length normalization; effective level of term frequency; Top-Term Ranking method},
        doi={10.1145/1146847.1146884}
    }
    
  • Soheila KARBASI
    Mohand BOUGHANEM
    Year: 2006
    Effective level of term frequency impact on large-scale retrieval performance: by top-term ranking method
    INFOSCALE
    ACM
    DOI: 10.1145/1146847.1146884
Soheila KARBASI (1, *), Mohand BOUGHANEM (1, *)
  • 1: IRIT-SIG, Campus Univ. Toulouse III, 31062 Toulouse, Cedex 09, France
*Contact email: karbasi@irit.fr, bougha@irit.fr

Abstract

As the volume of information grows, effective information retrieval methods become increasingly essential. This paper develops a new method for assessing the role of the term frequency-inverse document frequency (tf-idf) measures commonly used by text retrieval systems built on the vector space model. We carried out preliminary tests to determine how the individual term-weighting components affect retrieval performance in a basic vector space scheme. Based on these tests, we identify a novel factor, the effective level of term frequency (EL), that represents a document's content based on its length and its maximum term frequency. This factor is used to find the principal terms within each document and an appropriate subset of documents containing the query terms. Our proposed method, Top-Term Ranking, uses a reduced index of the original terms in which only the principal terms of each document are considered for weighting. Our experiments on TREC collections indicate that EL is a significant factor in retrieving relevant documents, especially in large collections. The aim of the Top-Term Ranking method is to improve the performance of large-scale information retrieval systems beyond that of common vector space methods.
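
The general idea of ranking over a reduced index can be illustrated with a minimal sketch. The abstract does not give the EL formula, so the cutoff used here (keep the `k` most frequent terms of each document) is only a stand-in assumption, and the function names `top_terms`, `build_reduced_index` and `rank`, as well as the smoothed idf, are hypothetical choices for illustration rather than the authors' method.

```python
import math
from collections import Counter


def top_terms(tokens, k):
    """Keep the k most frequent terms of one document.

    Assumption: a fixed k replaces the paper's EL-based cutoff,
    which the abstract does not specify.
    """
    counts = Counter(tokens)
    # Descending frequency, then alphabetical order for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def build_reduced_index(docs, k):
    """Map each document id to its top-k term frequencies."""
    return {doc_id: top_terms(tokens, k) for doc_id, tokens in docs.items()}


def rank(query_terms, reduced_index):
    """Rank documents by a simple tf-idf score over the reduced index."""
    n_docs = len(reduced_index)
    # Document frequency counted over the reduced index only.
    df = Counter(term for terms in reduced_index.values() for term in terms)
    scores = {}
    for doc_id, terms in reduced_index.items():
        if not terms:
            continue  # empty document, nothing to score
        max_tf = max(terms.values())
        score = 0.0
        for term in query_terms:
            if term in terms:
                tf = terms[term] / max_tf            # max-tf normalization
                idf = math.log(1 + n_docs / df[term])  # smoothed idf (assumed)
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    # Highest score first, ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "cat cat cat dog".split(),
    "d2": "dog dog bird".split(),
    "d3": "bird bird bird cat".split(),
}
print(rank(["cat"], build_reduced_index(docs, k=1)))
print(rank(["cat"], build_reduced_index(docs, k=2)))
```

With `k=1`, document `d3` drops out of the results for the query `cat` because `cat` is not among its retained terms. This shows how a reduced index narrows the candidate set before any weighting is applied.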