Cloud Computing. 10th EAI International Conference, CloudComp 2020, Qufu, China, December 11-12, 2020, Proceedings

Research Article

A Dual-Index Based Representation for Processing XPath Queries on Very Large XML Documents

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-030-69992-5_2,
        author={Wei Hao and Kiminori Matsuzaki and Shigeyuki Sato},
        title={A Dual-Index Based Representation for Processing XPath Queries on Very Large XML Documents},
        proceedings={Cloud Computing. 10th EAI International Conference, CloudComp 2020, Qufu, China, December 11-12, 2020, Proceedings},
        proceedings_a={CLOUDCOMP},
        year={2021},
        month={2},
        keywords={Large XML documents, XPath querying, Dual-index, Data representation, Parallel computing},
        doi={10.1007/978-3-030-69992-5_2}
    }
    
  • Wei Hao
    Kiminori Matsuzaki
    Shigeyuki Sato
    Year: 2021
    A Dual-Index Based Representation for Processing XPath Queries on Very Large XML Documents
    CLOUDCOMP
    Springer
    DOI: 10.1007/978-3-030-69992-5_2
Wei Hao1,*, Kiminori Matsuzaki2, Shigeyuki Sato2
  • 1: Anhui University of Science and Technology, Taifeng Avenue 168
  • 2: Kochi University of Technology, 185 Miyanokuchi, Tosayamada, Kami
*Contact email: whao@aust.edu.cn

Abstract

Although XML processing has been intensively studied in recent years, designing efficient implementations for evaluating XPath queries remains a challenge when the XML documents are very large. In this study, we implemented a tree-shaped data structure called a partial tree, which is intrinsically suitable for processing large XML documents on multiple computers. Our implementation uses two index sets to accelerate the evaluation of structural relationships among nodes, making it highly efficient on very large XML documents for three important classes of XPath queries: backward, order-aware, and predicate-containing queries. Experimental results show that our implementation outperforms BaseX, a state-of-the-art XML database, in both absolute loading time and execution time for the target queries. On 358 GB of XML data, the absolute execution time averages only seconds when using 32 EC2 instances.
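
The abstract does not say what the two index sets contain, so the following is only a generic sketch of how index lookups can replace tree traversal when testing structural relationships. It assumes a common interval-labeling scheme (each node's id is its pre-order position, paired with the id of its last descendant) and two hypothetical lookup tables, `by_id` and `by_tag`. These names and this layout are illustrative assumptions, not the paper's partial-tree representation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Label elements in document order and build two illustrative lookup tables.

    by_id:  node id -> (tag, id of last descendant, parent id)
    by_tag: tag -> list of node ids in document order
    Both tables are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    by_id = {}
    by_tag = defaultdict(list)
    counter = 0

    def visit(elem, parent_id):
        nonlocal counter
        node_id = counter          # pre-order position doubles as the node id
        counter += 1
        by_tag[elem.tag].append(node_id)
        for child in elem:
            visit(child, node_id)
        # After the subtree is visited, counter - 1 is the last descendant's id
        # (or node_id itself for a leaf).
        by_id[node_id] = (elem.tag, counter - 1, parent_id)

    visit(root, None)
    return by_id, by_tag


def is_ancestor(by_id, a, d):
    """True if node a is a proper ancestor of node d (interval containment)."""
    _, a_end, _ = by_id[a]
    return a < d <= a_end


def ancestors_with_tag(by_id, by_tag, node, tag):
    """Backward-axis style lookup: ancestors of `node` that have the given tag."""
    return [a for a in by_tag[tag] if is_ancestor(by_id, a, node)]


def following_siblings_with_tag(by_id, by_tag, node, tag):
    """Order-aware lookup: later siblings of `node` that have the given tag."""
    parent = by_id[node][2]
    return [s for s in by_tag[tag] if s > node and by_id[s][2] == parent]
```

For the document `<a><b><c/></b><b/><d><c/></d></a>`, the elements receive ids 0 to 5 in document order, so `ancestors_with_tag(by_id, by_tag, 2, "b")` checks only the two `b` entries in `by_tag` instead of walking up the tree. A distributed implementation would additionally have to handle nodes split across chunk boundaries, which this single-machine sketch ignores.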

Keywords
Large XML documents, XPath querying, Dual-index, Data representation, Parallel computing
Published
2021-02-13
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-69992-5_2