Signal Processing and Information Technology. Second International Joint Conference, SPIT 2012, Dubai, UAE, September 20-21, 2012, Revised Selected Papers

Research Article

DwCB - Architecture Specification of Deep Web Crawler Bot with Rules Based on FORM Values for Domain Specific Web Site

  • @INPROCEEDINGS{10.1007/978-3-319-11629-7_28,
        author={S. Shaila and A. Vadivel and R. Mahalakshmi and J. Karthika},
        title={DwCB - Architecture Specification of Deep Web Crawler Bot with Rules Based on FORM Values for Domain Specific Web Site},
        proceedings={Signal Processing and Information Technology. Second International Joint Conference, SPIT 2012, Dubai, UAE, September 20-21, 2012, Revised Selected Papers},
        proceedings_a={SPIT},
        year={2014},
        month={11},
        keywords={Hidden web crawlers Domain specific Rule Set Surface web FORM values},
        doi={10.1007/978-3-319-11629-7_28}
    }
    
  • S. Shaila
    A. Vadivel
    R. Mahalakshmi
    J. Karthika
    Year: 2014
    DwCB - Architecture Specification of Deep Web Crawler Bot with Rules Based on FORM Values for Domain Specific Web Site
    SPIT
    Springer
    DOI: 10.1007/978-3-319-11629-7_28
S. Shaila1,*, A. Vadivel1,*, R. Mahalakshmi1,*, J. Karthika1,*
  • 1: National Institute of Technology
*Contact email: shaila@nitt.edu, vadi@nitt.edu, devimaha@nitt.edu, karthika@nitt.edu

Abstract

It is well known that obtaining deep web information is a challenging task, and that suitable query values must be chosen to crawl a large data source. In this paper, we propose an architecture specification for a deep web crawler with an effective FORM-filling strategy based on rules. The rules are constructed by analyzing the FORM and the combinations of its parameters. These FORM parameters are classified as most preferable, least preferable, and mutually exclusive. For each successful FORM submission, the deep web data is extracted and indexed for information retrieval applications. The performance of the crawler is encouraging when compared to a conventional surface crawler.
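
The abstract does not give the rule set itself, so the following is only a minimal sketch of how a classification into most preferable, least preferable, and mutually exclusive parameters might drive FORM filling. The field names, candidate values, and the ordering heuristic (try fillings with fewer least-preferable fields first) are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations, product

# Hypothetical FORM fields and candidate values (illustrative only).
FIELDS = {
    "title": ["python", "networks"],
    "isbn": ["0000000000"],
    "language": ["en"],
}
# Assumed classification of the fields; the paper's actual rules may differ.
MOST_PREFERABLE = {"title", "isbn"}
LEAST_PREFERABLE = {"language"}
MUTUALLY_EXCLUSIVE = [{"title", "isbn"}]


def allowed(field_set):
    """Accept a field set that uses at least one most-preferable field
    and at most one field from each mutually exclusive group."""
    if not field_set & MOST_PREFERABLE:
        return False
    return all(len(field_set & group) <= 1 for group in MUTUALLY_EXCLUSIVE)


def candidate_submissions():
    """Return FORM fillings, those with fewer least-preferable fields first."""
    names = sorted(FIELDS)
    field_sets = [
        frozenset(combo)
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
        if allowed(frozenset(combo))
    ]
    # Assumed heuristic: rank by how many least-preferable fields are filled.
    field_sets.sort(key=lambda s: (len(s & LEAST_PREFERABLE), sorted(s)))

    submissions = []
    for field_set in field_sets:
        ordered = sorted(field_set)
        # One submission per combination of candidate values.
        for values in product(*(FIELDS[name] for name in ordered)):
            submissions.append(dict(zip(ordered, values)))
    return submissions


if __name__ == "__main__":
    for filling in candidate_submissions():
        print(filling)
```

In this sketch, no generated filling ever sets both `title` and `isbn`, and a filling that sets only `language` is never produced. A real crawler would then submit each filling to the FORM and index the result pages of successful submissions.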