Research Article
A construction System of Lake-Warehouse Integration in the Electricity Industry Based on Hudi
@INPROCEEDINGS{10.4108/eai.15-12-2023.2345365,
  author={Peng Wang and Jianhao Zhang and Xiang Wang and Guangrui Peng and Mingli Chen and Tianfeng Shao and Xiaotong Tuo and Jian Hu},
  title={A construction System of Lake-Warehouse Integration in the Electricity Industry Based on Hudi},
  proceedings={Proceedings of the 3rd International Conference on Public Management and Big Data Analysis, PMBDA 2023, December 15--17, 2023, Nanjing, China},
  publisher={EAI},
  proceedings_a={PMBDA},
  year={2024},
  month={5},
  keywords={data lake; real-time computing; lake house; data management},
  doi={10.4108/eai.15-12-2023.2345365}
}
- Peng Wang
- Jianhao Zhang
- Xiang Wang
- Guangrui Peng
- Mingli Chen
- Tianfeng Shao
- Xiaotong Tuo
- Jian Hu
Year: 2024
PMBDA
EAI
DOI: 10.4108/eai.15-12-2023.2345365
Abstract
The current data integration system uses several methods, such as DataWorks_DI and Ogg+DataHub, for large-scale data access. It generates shared-layer and analytical-layer data on a T+1 basis and supports upper-level data applications through RESTful services. The existing data architecture has several problems: redundant data synchronization links, inadequate processing timeliness, slow application queries, and quantitative measurement data that has not been integrated. As a result, business users still encounter unavailable or unreliable data when they use it.

To improve the timeliness of the data centralization system and to meet business needs for various real-time common data sets, there is an urgent need to establish a data architecture that includes real-time merging of incremental and full data, real-time data processing and modeling, and real-time data services. This system aims to build a comprehensive data processing capability that integrates streaming and batch processing, improving the timeliness of application support from the data centralization system. The goal is to enable rapid perception, monitoring, alerting, and processing of data from production business systems, ultimately improving the stability of production system operations.
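The idea of merging incremental data into full data can be illustrated with a minimal sketch in plain Python. It is not Hudi's API and not the authors' implementation. The `Change` record, the `merge_incremental` function, and the rule that the newest timestamp wins are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One hypothetical change record captured from a source system."""
    key: str                # record key, e.g. a meter id (assumed)
    ts: int                 # event time; a larger value means a newer record
    value: Optional[float]  # new value, or None to mark a delete


def merge_incremental(full: Dict[str, Change],
                      changes: Iterable[Change]) -> Dict[str, Change]:
    """Return a new snapshot with the changes upserted into the full data."""
    merged = dict(full)  # leave the caller's snapshot untouched
    for change in changes:
        current = merged.get(change.key)
        if current is not None and current.ts > change.ts:
            continue  # the change is older than what we hold: skip it
        merged[change.key] = change  # insert a new key or overwrite an old one
    # Drop delete markers only after every change has been applied,
    # so that a late, older update cannot resurrect a deleted key.
    return {k: c for k, c in merged.items() if c.value is not None}


# Full (T+1 style) snapshot plus one out-of-order incremental batch.
snapshot = {
    "m1": Change("m1", 1, 10.0),
    "m2": Change("m2", 1, 20.0),
}
batch = [
    Change("m1", 3, 11.5),  # update
    Change("m3", 2, 7.0),   # insert
    Change("m1", 2, 99.0),  # stale update, arrives late
    Change("m2", 4, None),  # delete
]
result = merge_incremental(snapshot, batch)
```

In this sketch `result` keeps `m1` at its newest value, adds `m3`, and drops `m2`. A lake-house table format would apply comparable key-based upsert semantics to files on storage rather than to an in-memory dictionary.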