Proceedings of the 5th International Conference on E-Commerce and Internet Technology, ECIT 2024, March 15–17, 2024, Changsha, China

Research Article

A new whole process analysis framework big data based for E-commerce application

  • @INPROCEEDINGS{10.4108/eai.15-3-2024.2346165,
        author={Dujuan Zhou and Junshen Hong and Junting Ou and Zizhao Yuan and Fanbiao Bao},
        title={A new whole process analysis framework big data based for E-commerce application},
        booktitle={Proceedings of the 5th International Conference on E-Commerce and Internet Technology, ECIT 2024, March 15--17, 2024, Changsha, China},
        publisher={EAI},
        proceedings_a={ECIT},
        year={2024},
        month={5},
        keywords={data warehouse, kafka buffering, data visualization, hadoop ecosystem, spark's underlying engine},
        doi={10.4108/eai.15-3-2024.2346165}
    }
    
  • Dujuan Zhou
    Junshen Hong
    Junting Ou
    Zizhao Yuan
    Fanbiao Bao
    Year: 2024
    A new whole process analysis framework big data based for E-commerce application
    ECIT
    EAI
    DOI: 10.4108/eai.15-3-2024.2346165
Dujuan Zhou1,*, Junshen Hong1, Junting Ou1, Zizhao Yuan1, Fanbiao Bao1
  • 1: Beijing Institute of Technology
*Contact email: 17424@bitzh.edu.cn

Abstract

With the rapid growth of Internet data, the number of users of e-commerce websites has increased dramatically, and the volume of operational data has surged accordingly. However, many e-commerce enterprises still rely on traditional databases, which struggle to handle massive data effectively. Based on an in-depth analysis of the Hadoop big data ecosystem, this paper proposes a Hadoop-based system for full-process data processing and analysis. Kafka is used as a buffer to prevent server crashes, a customized interceptor on Flume cleans the data to avoid parsing problems later in the data warehouse, and Spark replaces the underlying engine of Hive to improve computational efficiency. The system comprises four modules: data collection, data warehouse, fully automated task scheduling, and data visualization. In practical applications, it can effectively reduce repetitive data development work in e-commerce big data enterprises, efficiently analyze massive real historical data, and generate visualization reports.
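
The data-cleaning step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which cleaning rule the customized interceptor applies, so this example assumes one plausible rule: records that are not well-formed JSON objects are dropped before they reach the data warehouse. A real Flume interceptor is a Java class implementing Flume's `Interceptor` interface; the Python below only mimics the filtering logic, and the names `is_valid_event` and `clean_events` are hypothetical.

```python
import json
from typing import Iterable, List


def is_valid_event(line: str) -> bool:
    """Return True if the line parses as a JSON object.

    Assumed cleaning rule; the paper's actual rule is not given in the abstract.
    """
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False


def clean_events(lines: Iterable[str]) -> List[str]:
    """Keep only well-formed records, as an interceptor that drops bad events might."""
    return [line for line in lines if is_valid_event(line.strip())]


if __name__ == "__main__":
    # Hypothetical log lines: one valid record, one truncated record,
    # one plain-text line, and one JSON value that is not an object.
    raw = [
        '{"user": 1, "action": "click"}',
        '{"user": 2, "action":',
        "not json",
        "[1, 2]",
    ]
    print(clean_events(raw))
```

Filtering at collection time, rather than in the warehouse, means malformed records never reach the downstream parsing stage, which is the problem the abstract says the interceptor is meant to avoid.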