2nd International IEEE Conference on Communication System Software and Middleware

Research Article

Characterizing the Web Using a New Uniform Sampling Approach

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1109/COMSWA.2007.382558,
        author={Hamid Mousavi and Mohammad E. Rafiei and Ali Movaghar},
        title={Characterizing the Web Using a New Uniform Sampling Approach},
        booktitle={2nd International IEEE Conference on Communication System Software and Middleware},
        publisher={IEEE},
        proceedings_a={COMSWARE},
        year={2007},
        month={7},
        keywords={Uniform Sampling, Web, Web Search Engine},
        doi={10.1109/COMSWA.2007.382558}
    }
    
  • Hamid Mousavi
    Mohammad E. Rafiei
    Ali Movaghar
    Year: 2007
    Characterizing the Web Using a New Uniform Sampling Approach
    COMSWARE
    IEEE
    DOI: 10.1109/COMSWA.2007.382558
Hamid Mousavi1,*, Mohammad E. Rafiei1,*, Ali Movaghar1,*
  • 1: CE Department, University of Tech., Tehran, Iran.
*Contact email: h_mousavig@ce.sharif.edu, rafieig@ce.sharif.edu, movagharg@sharif.edu

Abstract

The Web is one of the biggest sources of information for many people, and it continues to grow. To make the Web easier to use, Web search engines (WSEs) are used frequently. However, little is known about the characteristics of the Web and of WSEs. One common way to analyze these characteristics is to use a uniform sample: instead of working on the entire Web, we work on a small subset that represents it. In this paper, we propose a new method, called bucket-based sampling (BBS), to gather such a small but uniform subset of the Web. Our analyses show that BBS improves the uniformity of the samples by at least 6.95% compared with PAGERANK-SMP, one of the best existing methods. Using samples gathered by BBS, we compare the relative sizes of seven well-known WSEs. We also estimate some important characteristics of the Web. For example, we estimate that the indexable Web contains around 20.14 billion pages.
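
As a rough illustration of why a uniform sample is useful for comparing search engines, the sketch below estimates each engine's coverage as the fraction of sampled pages found in its index. This is not the BBS method from the paper, only the general sample-then-check idea; the function name `relative_coverage`, the URLs, and the engine names are all hypothetical.

```python
from typing import Dict, Iterable, Set


def relative_coverage(sample: Iterable[str], indexes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per engine, the fraction of sampled pages present in its index.

    Assumes `sample` was drawn uniformly at random from the Web; otherwise
    the fractions are biased toward whatever the sampler favours.
    """
    pages = list(sample)
    if not pages:
        raise ValueError("sample must not be empty")
    return {
        name: sum(1 for url in pages if url in indexed) / len(pages)
        for name, indexed in indexes.items()
    }


# Hypothetical toy data: four sampled URLs and two made-up engine indexes.
sample = ["a.example/1", "b.example/2", "c.example/3", "d.example/4"]
indexes = {
    "engine_x": {"a.example/1", "b.example/2", "c.example/3"},
    "engine_y": {"a.example/1"},
}
coverage = relative_coverage(sample, indexes)
```

In this toy data, the ratio `coverage["engine_x"] / coverage["engine_y"]` gives an estimate of the relative size of the two indexes. With a real sample, the quality of that estimate depends on how uniform the sample actually is.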

Keywords
Uniform Sampling, Web, Web Search Engine
Published
2007-07-09
Publisher
IEEE
Modified
2011-07-24
http://dx.doi.org/10.1109/COMSWA.2007.382558
Copyright © 2007–2025 IEEE