Proceedings of the 2nd International Conference on Financial Innovation, FinTech and Information Technology, FFIT 2023, July 7–9, 2023, Chongqing, China

Research Article

Predicting Option Prices using Machine Learning Models with Options Data and Stock Prices Features

  • @INPROCEEDINGS{10.4108/eai.7-7-2023.2338048,
        author={Zhenzhen Jia},
        title={Predicting Option Prices using Machine Learning Models with Options Data and Stock Prices Features},
        proceedings={Proceedings of the 2nd International Conference on Financial Innovation, FinTech and Information Technology, FFIT 2023, July 7--9, 2023, Chongqing, China},
        publisher={EAI},
        proceedings_a={FFIT},
        year={2023},
        month={10},
        keywords={Black-Scholes model, linear regression, machine learning, options data, predictive power, ridge regression, stock prices},
        doi={10.4108/eai.7-7-2023.2338048}
    }
    
Zhenzhen Jia1,*
  • 1: Tulane University
*Contact email: ashley.jia99@gmail.com

Abstract

This study explores the benefit of using machine learning models to predict option prices from features derived from options data and stock prices. Historical data and options data for a list of tickers were collected from the Yahoo Finance API. Features were then constructed for each ticker by calculating the implied volatility, strike price, and price of call options using the Black-Scholes model. The feature vector for each option consisted of the last eight call prices, implied volatilities, and an indicator of whether the option was in the money. The stock's current price, its square, and its cube were also appended to the feature vector. Four regression models, namely linear regression, ridge regression, random forest regression, and eXtreme Gradient Boosting (XGB) regression, were trained using these features, with the corresponding option prices as labels. The models were evaluated using three metrics: mean squared error (MSE), mean absolute error (MAE), and R² score. The performance of these models is explained by the fact that the features constructed from the options data and stock prices capture the underlying relationships between the prices of call options and the features. In addition, hyperparameter tuning with GridSearchCV helped to find the best model for the given data. Furthermore, the models' predictive power was compared with Black-Scholes model prices,