cc 18: e1

Research Article

VTWM: An Incremental Data Extraction Model Based on Variable Time-Windows

  • @ARTICLE{10.4108/eai.12-6-2020.166291,
        author={Weixing Jia and Yang Xu and Jie Liu and Guiling Wang},
        title={VTWM: An Incremental Data Extraction Model Based on Variable Time-Windows},
        journal={EAI Endorsed Transactions on Collaborative Computing: Online First},
        keywords={change data capture, incremental data extraction, timestamp, ETL},
  • Year: 2020
    DOI: 10.4108/eai.12-6-2020.166291
Weixing Jia1, Yang Xu2, Jie Liu3, Guiling Wang1,*
  • 1: Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, School of Information Science and Technology, North China University of Technology, No. 5 Jinyuanzhuang Road, Shijingshan District, Beijing 100144, China
  • 2: Tianjin E-Hualu Information Technology Co., Ltd, No.1 Tianhua Road, Balitai Industrial Park, Jinnan District, Tianjin 300350, China
  • 3: Beijing Yidian Wangju Technology Co., Ltd, No. 30, Shixing Street, Shijingshan District, Beijing 100103, China
*Contact email:


Continuously extracting and integrating changed data from heterogeneous systems with an appropriate data extraction model is key to data sharing and integration, and to building an incremental data warehouse for data analysis. The traditional timestamp-based change data capture method is prone to anomalies during extraction, which cause extraction failures and reduce extraction efficiency. To address these problems, this paper improves the traditional timestamp-based incremental capture model and proposes VTWM, an incremental data extraction model based on variable time-windows. Its core idea is to extract a small number of duplicate records and then remove the duplicates. The model reduces the influence of anomalies on data extraction, improves the reliability of traditional ETL extraction processes, and improves data extraction efficiency.
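The idea of extracting a few duplicate records and then removing them can be illustrated with a minimal Python sketch. This is not the authors' VTWM algorithm, which the abstract does not specify. The function name `extract_window`, the `id` and `updated_at` fields, and the `overlap` parameter are all hypothetical, and how the overlap should vary from run to run is left open here.

```python
from datetime import datetime, timedelta


def extract_window(source_rows, watermark, now, overlap, seen):
    """Extract rows changed in [watermark - overlap, now), dropping known versions.

    Illustrative sketch only; all names are assumptions, not taken from the paper.
    source_rows: iterable of dicts with "id" and "updated_at" (datetime) keys.
    watermark:   upper bound of the previous extraction window.
    overlap:     timedelta by which the window is widened backwards, so rows
                 that became visible late are re-read along with some duplicates.
    seen:        set of (id, updated_at) versions already loaded; updated in place.
    """
    start = watermark - overlap
    fresh = []
    for row in source_rows:
        version = (row["id"], row["updated_at"])
        # Keep the row only if it falls in the widened window and this exact
        # version was not loaded by an earlier run (the deduplication step).
        if start <= row["updated_at"] < now and version not in seen:
            seen.add(version)
            fresh.append(row)
    # The end of this window becomes the watermark for the next run.
    return fresh, now
```

With `overlap` set to `timedelta(0)`, this reduces to plain timestamp-based extraction, which skips any row whose timestamp is earlier than the watermark but which only became visible after the previous run. A non-zero overlap re-reads that region, and the `seen` set discards the rows that were already loaded.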