Research Article
A new whole process analysis framework big data based for E-commerce application
@INPROCEEDINGS{10.4108/eai.15-3-2024.2346165,
    author={Dujuan Zhou and Junshen Hong and Junting Ou and Zizhao Yuan and Fanbiao Bao},
    title={A new whole process analysis framework big data based for E-commerce application},
    proceedings={Proceedings of the 5th International Conference on E-Commerce and Internet Technology, ECIT 2024, March 15--17, 2024, Changsha, China},
    publisher={EAI},
    proceedings_a={ECIT},
    year={2024},
    month={5},
    keywords={data warehouse, kafka buffering, data visualization, hadoop ecosystem, spark’s underlying engine},
    doi={10.4108/eai.15-3-2024.2346165}
}
- Dujuan Zhou
- Junshen Hong
- Junting Ou
- Zizhao Yuan
- Fanbiao Bao
Year: 2024
Conference: ECIT
Publisher: EAI
DOI: 10.4108/eai.15-3-2024.2346165
Abstract
With the rapid growth of the Internet, the number of e-commerce website users has increased dramatically, and the volume of operational data has surged accordingly. However, many e-commerce enterprises still rely on traditional databases, which struggle to handle such massive data volumes. Based on an in-depth analysis of the Hadoop big data ecosystem, this paper proposes a full-process data processing and analysis system built on Hadoop. Kafka is innovatively used as a buffer to prevent server crashes. A custom Flume interceptor performs data cleaning to avoid parsing problems later in the data warehouse, and Spark replaces Hive's underlying execution engine to improve computational efficiency. The system comprises four modules: data collection, data warehouse, fully automated task scheduling, and data visualization. In practical applications, it can reduce repetitive data development work for e-commerce big data enterprises, efficiently analyze massive volumes of real historical data, and generate visualization reports.
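The abstract does not specify the cleaning rules applied by the custom Flume interceptor. The following is a minimal sketch of one plausible rule, under the assumption (not stated in the source) that each collected log event is expected to be a JSON object and that malformed events should be dropped before they reach the data warehouse. In Flume itself, a custom interceptor is a Java class implementing `org.apache.flume.interceptor.Interceptor`, whose `intercept(Event)` method returns either the (possibly modified) event or `null` to drop it. The Python functions `intercept` and `intercept_batch` below are hypothetical stand-ins that only mirror that per-event and per-batch filtering logic. The field names `uid` and `page` in the sample data are also illustrative.

```python
import json
from typing import Iterable, List, Optional


def intercept(body: bytes) -> Optional[bytes]:
    """Hypothetical cleaning rule: keep an event body only if it is a
    well-formed JSON object; return None to drop it (mirroring Flume's
    convention of returning null from Interceptor.intercept(Event))."""
    try:
        text = body.decode("utf-8").strip()
    except UnicodeDecodeError:
        return None  # not valid UTF-8, drop
    if not text:
        return None  # empty line, drop
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None  # truncated or malformed JSON, drop
    # Accept only JSON objects; bare arrays, numbers, or strings are dropped.
    if not isinstance(parsed, dict):
        return None
    return body


def intercept_batch(bodies: Iterable[bytes]) -> List[bytes]:
    """Apply intercept() to every event in a batch and keep the survivors."""
    kept = []
    for body in bodies:
        result = intercept(body)
        if result is not None:
            kept.append(result)
    return kept


if __name__ == "__main__":
    batch = [
        b'{"uid": 1, "page": "home"}',  # valid JSON object, kept
        b'{"uid": 2, "page":',          # truncated record, dropped
        b"",                            # empty line, dropped
        b"[1, 2]",                      # valid JSON but not an object, dropped
    ]
    print(intercept_batch(batch))
```

Filtering at collection time, rather than in the warehouse, means that downstream table loads never encounter records they cannot parse. This is the motivation the abstract gives for placing the cleaning step in Flume.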