Research Article
DwCB - Architecture Specification of Deep Web Crawler Bot with Rules Based on FORM Values for Domain Specific Web Site
@INPROCEEDINGS{10.1007/978-3-319-11629-7_28,
  author={S. Shaila and A. Vadivel and R. Mahalakshmi and J. Karthika},
  title={DwCB - Architecture Specification of Deep Web Crawler Bot with Rules Based on FORM Values for Domain Specific Web Site},
  proceedings={Signal Processing and Information Technology. Second International Joint Conference, SPIT 2012, Dubai, UAE, September 20-21, 2012, Revised Selected Papers},
  proceedings_a={SPIT},
  year={2014},
  month={11},
  keywords={Hidden web crawlers, Domain specific, Rule Set, Surface web, FORM values},
  doi={10.1007/978-3-319-11629-7_28}
}
S. Shaila
A. Vadivel
R. Mahalakshmi
J. Karthika
Year: 2014
SPIT
Springer
DOI: 10.1007/978-3-319-11629-7_28
Abstract
Obtaining deep web information is a challenging task, and crawling a large data source requires choosing suitable query values. In this paper, we propose an architecture specification for a deep web crawler with an effective FORM filling strategy based on rules. The rules are constructed by analyzing the FORM and the combinations of its parameters. These FORM parameters are classified as most preferable, least preferable, and mutually exclusive. For each successful FORM submission, the deep web data is extracted and indexed suitably for information retrieval applications. The performance of the crawler is encouraging when compared to a conventional surface crawler.
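As a rough illustration of the three parameter classes named in the abstract, the following Python sketch enumerates candidate FORM fillings. It is not the authors' rule set: the function name `candidate_queries`, the parameter names, and the semantics (always fill most-preferable parameters, add least-preferable ones incrementally, never fill both members of a mutually exclusive pair) are assumptions made only for this example.

```python
from itertools import combinations, product


def candidate_queries(values, most_preferable, least_preferable, mutually_exclusive):
    """Yield candidate FORM fillings as dicts (hypothetical helper).

    Assumed semantics, not taken from the paper:
    - every query fills all most-preferable parameters;
    - least-preferable parameters are added zero, one, two, ... at a time;
    - a query never fills both parameters of a mutually exclusive pair.
    """
    def allowed(params):
        return not any(a in params and b in params for a, b in mutually_exclusive)

    for k in range(len(least_preferable) + 1):
        for extra in combinations(least_preferable, k):
            params = list(most_preferable) + list(extra)
            if not allowed(params):
                continue  # skip parameter sets that violate an exclusion rule
            # One query per combination of candidate values for the chosen parameters.
            for combo in product(*(values[p] for p in params)):
                yield dict(zip(params, combo))


# Hypothetical FORM with three parameters; "city" and "zip" may not be combined.
values = {"city": ["Dubai"], "type": ["hotel", "flat"], "zip": ["00000"]}
queries = list(
    candidate_queries(values, ["city"], ["type", "zip"], [("city", "zip")])
)
```

Ordering the output so that queries with fewer least-preferable parameters come first is one possible design choice; a real crawler might instead rank queries by how many new records each submission is expected to return.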