Proceedings of the 2nd International Conference on Internet Technology and Educational Informatization, ITEI 2022, December 23-25, 2022, Harbin, China

Research Article

Comprehensive Analysis of the Last Four Decades of Movie Industry: Implication for Film Investment

  • @INPROCEEDINGS{10.4108/eai.23-12-2022.2329074,
        author={Junpeng  Yang},
        title={Comprehensive Analysis of the Last Four Decades of Movie Industry: Implication for Film Investment},
        proceedings={Proceedings of the 2nd International Conference on Internet Technology and Educational Informatization, ITEI 2022, December 23-25, 2022, Harbin, China},
        publisher={EAI},
        proceedings_a={ITEI},
        year={2023},
        month={6},
        keywords={film industry linear regression time series analysis factor analysis anova k-means clustering},
        doi={10.4108/eai.23-12-2022.2329074}
    }
    
Junpeng Yang1,*
  • 1: Hong Kong Polytechnic University
*Contact email: 19107412d@connect.polyu.hk

Abstract

As the film industry has matured, box office revenue has come to be regarded as one of the representative indicators of a film's success and an intuitive signal of its profitability, while the other critical factors behind film success remain intensely debated within the research community. This article combines statistical and machine learning methods, applying SPSS and Python in turn, to comprehensively analyze the IMDb film-industry dataset. In the multivariate linear regression, budget and votes are selected as the most predictive variables for movie revenue. Time series decomposition reveals a fluctuating upward trend and evident seasonality in movie revenue. In the factor analysis, the two principal components retained after varimax rotation are interpreted as an income factor and a satisfaction factor, and cross-tabulations show a consistent joint distribution of primary genres and ratings, with R-rated comedy being the most significant pair. Once the rate of return is introduced, ANOVA confirms the leadership of the U.S. in the film industry. Finally, K-means clustering roughly divides the movies in the dataset into three major categories and detects latent partnerships between representative directors and their followers.
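The ANOVA step summarized in the abstract (comparing rate of return across countries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's SPSS procedure: it assumes the rate of return is defined as (gross - budget) / budget and that movies are grouped by country. The function names `rate_of_return` and `one_way_anova_f` and the toy records are hypothetical and are not taken from the paper's dataset.

```python
from __future__ import annotations

from statistics import mean


def rate_of_return(gross: float, budget: float) -> float:
    """Assumed definition: profit relative to budget, (gross - budget) / budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (gross - budget) / budget


def one_way_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    samples = [list(values) for values in groups.values() if values]
    k = len(samples)
    n = sum(len(s) for s in samples)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for s in samples for x in s)
    group_means = [mean(s) for s in samples]
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical (country, budget, gross) records for illustration only.
    movies = [
        ("USA", 100.0, 300.0), ("USA", 50.0, 100.0), ("USA", 20.0, 80.0),
        ("UK", 40.0, 60.0), ("UK", 10.0, 15.0), ("UK", 30.0, 60.0),
    ]
    returns_by_country: dict[str, list[float]] = {}
    for country, budget, gross in movies:
        returns_by_country.setdefault(country, []).append(rate_of_return(gross, budget))
    f_stat, df_b, df_w = one_way_anova_f(returns_by_country)
    print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The sketch computes only the F statistic. A p-value requires the F distribution, which `scipy.stats.f_oneway(*samples)` provides alongside the same statistic.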