Proceedings of the 3rd International Conference on Public Management and Big Data Analysis, PMBDA 2023, December 15–17, 2023, Nanjing, China

Research Article

A construction System of Lake-Warehouse Integration in the Electricity Industry Based on Hudi

@INPROCEEDINGS{10.4108/eai.15-12-2023.2345365,
    author={Peng Wang and Jianhao Zhang and Xiang Wang and Guangrui Peng and Mingli Chen and Tianfeng Shao and Xiaotong Tuo and Jian Hu},
    title={A construction System of Lake-Warehouse Integration in the Electricity Industry Based on Hudi},
    proceedings={Proceedings of the 3rd International Conference on Public Management and Big Data Analysis, PMBDA 2023, December 15--17, 2023, Nanjing, China},
    publisher={EAI},
    proceedings_a={PMBDA},
    year={2024},
    month={5},
    keywords={data lake; real-time computing; lake house; data management},
    doi={10.4108/eai.15-12-2023.2345365}
}
    
Peng Wang1,*, Jianhao Zhang1, Xiang Wang1, Guangrui Peng1, Mingli Chen1, Tianfeng Shao1, Xiaotong Tuo1, Jian Hu1
  • 1: China Realtime Database Co., Ltd
*Contact email: wangpeng@sgepri.sgcc.com.cn

Abstract

The current data integration system uses several methods, such as DataWorks_DI and Ogg+DataHub, for large-scale data access. It generates shared-layer and analytical-layer data on a T+1 basis and supports upper-level data applications through RESTful services. The existing data architecture has several problems: redundant data synchronization links, inadequate processing timeliness, slow application queries, and quantitative measurement data that is not integrated. As a result, business users still encounter unavailable or unreliable data when they use it.

To improve the timeliness of the data centralization system and to meet business needs for various real-time common data sets, there is an urgent need for a data architecture that includes real-time merging of incremental and full data, real-time data processing and modeling, and real-time data services. This system aims to build a comprehensive data processing capability that integrates streaming and batch processing and improves the timeliness of support for data centralization applications. The goal is to enable rapid perception, monitoring, alerting, and processing of data from production business systems, ultimately improving the stability of production system operations.
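As an illustration of what merging incremental and full data can mean in practice, the following is a minimal Python sketch and not the system described in the paper. It applies a batch of change records on top of a full snapshot, keyed by record key, and keeps the latest version by event time. The `Change` type, its fields, and the `merge_incremental` function are hypothetical names introduced for this example only; they are not part of Hudi or of any tool named in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One record version (hypothetical schema, for illustration only)."""
    key: str                # record key, e.g. a meter ID
    ts: int                 # event time used to order versions of a key
    value: Optional[float]  # None marks a delete (tombstone)


def merge_incremental(
    full: Dict[str, Change], changes: Iterable[Change]
) -> Dict[str, Change]:
    """Return a new snapshot with `changes` applied on top of `full`."""
    merged = dict(full)  # copy so the input snapshot is left untouched
    for change in changes:
        current = merged.get(change.key)
        # Skip late-arriving records that are older than the stored version.
        if current is not None and current.ts > change.ts:
            continue
        merged[change.key] = change
    # Tombstones are dropped only at the end of the batch, so an older
    # update in the same batch cannot resurrect a deleted key.
    return {k: c for k, c in merged.items() if c.value is not None}


snapshot = {"m1": Change("m1", 1, 10.0), "m2": Change("m2", 1, 20.0)}
batch = [
    Change("m1", 3, 11.0),  # newer version of m1
    Change("m1", 2, 99.0),  # late record, ignored
    Change("m2", 2, None),  # delete m2
    Change("m3", 2, 30.0),  # new key
]
result = merge_incremental(snapshot, batch)
```

In this sketch the result holds `m1` at its newest version and the new key `m3`, while `m2` is removed. A production lakehouse table would also have to handle deletes across batches, file layout, and concurrent writers, which this example leaves out.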