
Research Article
PathBit: A Bit Index Based on Path for Large-Scale Knowledge Graph
@INPROCEEDINGS{10.1007/978-3-031-65126-7_18,
  author={Yonglin Leng and Peiyi Qu and Ying Guo and Chaoliang Xi},
  title={PathBit: A Bit Index Based on Path for Large-Scale Knowledge Graph},
  proceedings={Quality, Reliability, Security and Robustness in Heterogeneous Systems. 19th EAI International Conference, QShine 2023, Shenzhen, China, October 8 -- 9, 2023, Proceedings, Part I},
  proceedings_a={QSHINE},
  year={2024},
  month={8},
  keywords={Knowledge Graph Index Predicate Path Compressed storage},
  doi={10.1007/978-3-031-65126-7_18}
}
- Yonglin Leng
- Peiyi Qu
- Ying Guo
- Chaoliang Xi
Year: 2024
QSHINE
Springer
DOI: 10.1007/978-3-031-65126-7_18
Abstract
As the latest achievement of symbolism, the knowledge graph is an important cornerstone of artificial intelligence. To make knowledge graphs easier to manage, they are represented as RDF triples. The rapid growth of data poses great challenges for knowledge graph storage and fast retrieval; the main problems are self-joins, high storage cost, and intermediate results. In this paper, we propose PathBit, a path-based bit index structure for large-scale knowledge graphs. PathBit consists of an index based on the predicate path tree (IPT) and a k2-tree index (k2TIP) built according to the hierarchy of each predicate path tree. IPT is responsible for filtering the complete path set, while k2TIP follows the hierarchy of each predicate path tree to achieve fast association matching of triples on known predicate paths. A compression mechanism is used to implement compressed storage and retrieval of triples. In addition, two auxiliary indexes, SP and OP, are added to assist predicate path retrieval. Finally, we conduct a series of experiments on two representative datasets and compare the results with RDF-3X, BitMat, and TripleBit. The results indicate that PathBit achieves better response times on complex queries and offers storage-space advantages over RDF-3X and BitMat.
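For readers unfamiliar with k2-trees, the following is a minimal sketch of a generic k2-tree with k = 2. It is not the authors' k2TIP, whose layout the abstract does not specify. It assumes, purely for illustration, that the triples of one predicate are viewed as a square 0/1 matrix with subject IDs as rows and object IDs as columns. The names `build_k2tree` and `has_cell` are hypothetical.

```python
def build_k2tree(cells, n, k=2):
    """Return the level-order bit list of a k^2-tree for an n x n 0/1 matrix.

    cells: set of (row, col) positions holding a 1.
    n: matrix side length; assumed to be a power of k and greater than 1.
    """
    bits = []
    # Each queue entry describes a non-empty submatrix:
    # (first row, first column, side length, cells inside it).
    queue = [(0, 0, n, set(cells))]
    while queue:
        next_queue = []
        for r0, c0, size, inside in queue:
            step = size // k
            for i in range(k):
                for j in range(k):
                    sub = {(r, c) for (r, c) in inside
                           if r0 + i * step <= r < r0 + (i + 1) * step
                           and c0 + j * step <= c < c0 + (j + 1) * step}
                    bits.append(1 if sub else 0)
                    # Only non-empty submatrices are subdivided further.
                    if sub and step > 1:
                        next_queue.append((r0 + i * step, c0 + j * step, step, sub))
        queue = next_queue
    return bits


def has_cell(bits, n, row, col, k=2):
    """Check whether (row, col) is set, descending one level per iteration."""
    block_start = 0  # index where the current node's k*k child bits begin
    size = n
    while True:
        step = size // k
        p = block_start + (row // step) * k + (col // step)
        if bits[p] == 0:
            return False  # the whole submatrix is empty
        if step == 1:
            return True   # reached a single cell
        # Children of the 1-bit at p start at rank1(bits, p) * k^2.
        # A linear scan stands in for a constant-time rank structure.
        block_start = sum(bits[:p + 1]) * k * k
        row, col, size = row % step, col % step, step


# Hypothetical predicate with two triples: (s0, o1) and (s3, o2).
bits = build_k2tree({(0, 1), (3, 2)}, n=4)
```

In this toy case the 4 x 4 matrix is stored in 12 bits instead of 16; the savings grow with sparsity because empty quadrants are cut off at a single 0 bit. A practical implementation would replace the linear `sum` with a rank index over the bit array.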