A new whole process analysis framework big data based for E-commerce application

Dujuan Zhou; Junshen Hong; Junting Ou; Zizhao Yuan; Fanbiao Bao

Proceedings of the 5th International Conference on E-Commerce and Internet Technology, ECIT 2024, March 15–17, 2024, Changsha, China

Research Article

Cite: BibTeX Plain Text

@INPROCEEDINGS{10.4108/eai.15-3-2024.2346165,
    author={Dujuan Zhou and Junshen Hong and Junting Ou and Zizhao Yuan and Fanbiao Bao},
    title={A new whole process analysis framework big data based for E-commerce application},
    proceedings={Proceedings of the 5th International Conference on E-Commerce and Internet Technology, ECIT 2024, March 15--17, 2024, Changsha, China},
    publisher={EAI},
    proceedings_a={ECIT},
    year={2024},
    month={5},
    keywords={data warehouse, kafka buffering, data visualization, hadoop ecosystem, spark’s underlying engine},
    doi={10.4108/eai.15-3-2024.2346165}
}

Dujuan Zhou, Junshen Hong, Junting Ou, Zizhao Yuan, Fanbiao Bao. Year: 2024. A new whole process analysis framework big data based for E-commerce application. ECIT, EAI. DOI: 10.4108/eai.15-3-2024.2346165

Dujuan Zhou¹,*, Junshen Hong¹, Junting Ou¹, Zizhao Yuan¹, Fanbiao Bao¹

1: Beijing Institute of Technology

*Contact email: 17424@bitzh.edu.cn

Abstract

With the rapid growth of Internet data, the number of users of e-commerce websites has increased dramatically, and the volume of operational data has surged accordingly. However, many e-commerce enterprises still rely on traditional databases, which struggle to handle massive data effectively. Based on an in-depth analysis of the Hadoop big data ecosystem, this paper proposes a Hadoop-based system for whole-process data processing and analysis. Kafka is used as a buffer to prevent server crashes, a customized interceptor on Flume cleans the data to avoid parsing problems later in the data warehouse, and Spark replaces the underlying engine of Hive to improve computational efficiency. The system comprises four modules: data collection, data warehouse, fully automated task scheduling, and data visualization. In practical applications, it can reduce repetitive data development work in e-commerce big data enterprises, efficiently analyze massive volumes of real historical data, and generate visualization reports.

Keywords: data warehouse, Kafka buffering, data visualization, Hadoop ecosystem, Spark’s underlying engine
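
The data-cleaning step mentioned in the abstract can be illustrated with a minimal sketch. A real Flume interceptor is a Java class running inside the Flume agent; the Python below only mimics the filtering logic such an interceptor might apply. The function names `clean_event` and `intercept`, the JSON log format, and the `REQUIRED_FIELDS` list are all assumptions made for illustration, since the abstract does not describe the log schema or the cleaning rules.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical required fields; the abstract does not specify the log schema.
REQUIRED_FIELDS = ("user_id", "event", "ts")


def clean_event(raw: str) -> Optional[str]:
    """Return a normalized JSON line, or None if the record should be dropped."""
    line = raw.strip()
    if not line:
        return None  # skip blank lines
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # malformed JSON would fail when parsed in the warehouse
    if not isinstance(record, dict):
        return None  # only JSON objects are treated as valid events here
    if any(field not in record for field in REQUIRED_FIELDS):
        return None  # incomplete records are dropped
    # Emit compact JSON with sorted keys so downstream parsing is uniform.
    return json.dumps(record, separators=(",", ":"), sort_keys=True)


def intercept(batch: Iterable[str]) -> Iterator[str]:
    """Filter a batch of raw log lines, yielding only the cleaned records."""
    for raw in batch:
        cleaned = clean_event(raw)
        if cleaned is not None:
            yield cleaned
```

Dropping bad records at collection time, rather than in the warehouse, keeps malformed lines from ever reaching the buffer or the later parsing stages, which appears to be the motivation stated in the abstract.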

Published: 2024-05-30
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.15-3-2024.2346165
