Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China

Research Article

Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework

  • @INPROCEEDINGS{10.4108/eai.18-11-2022.2327121,
        author={Zhenqiang Liu},
        title={Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework},
        proceedings={Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China},
        publisher={EAI},
        proceedings_a={ICEMME},
        year={2023},
        month={2},
        keywords={portfolio management; deep q-network (dqn); model-free reinforcement learning; sharpe ratio; mean variance optimization (mvo)},
        doi={10.4108/eai.18-11-2022.2327121}
    }
    
  • Zhenqiang Liu
    Year: 2023
    Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework
    ICEMME
    EAI
    DOI: 10.4108/eai.18-11-2022.2327121
Zhenqiang Liu1,*
  • 1: School of Management, New York Institute of Technology, NY
*Contact email: 763838923@qq.com

Abstract

Portfolio management is a financial operation that aims to maximize return or optimize the Sharpe Ratio. One widely used strategy, Mean-Variance Optimization (MVO), also known as Modern Portfolio Theory, estimates the expected return and variance of stocks from historical data in order to maximize the Sharpe Ratio. However, predicting future return and variance from a formula alone is neither easy nor accurate. To overcome this limitation, this paper proposes two model-free frameworks: a Deep Q-Network with a Sharpe Ratio based reward (DQN-S) and one with a return based reward (DQN-R). Deep Q-learning is used to train a neural network to manage a portfolio of 10 stocks. Stock prices define the environment, the portfolio weights are the actions of the neural network agent, and the reward is the signal used to train the model. The traditional allocation strategies MVO and Naïve Portfolio Allocation (NPA) serve as benchmarks for evaluating the reinforcement learning approach. The extensibility of DQN-S is also discussed. The results show that MVO dominates NPA, with a 5% higher annual return and a Sharpe Ratio higher by 0.5, although its maximum drawdown (MDD) is slightly higher, indicating the superiority of a Sharpe Ratio oriented strategy.
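
As a rough illustration of the two evaluation metrics named in the abstract, the following Python sketch computes an annualized Sharpe Ratio and a maximum drawdown from a series of per-period returns, and forms the returns of a fixed-weight portfolio. It is not the paper's implementation: the function names, the 252-period annualization, the zero default risk-free rate, and the choice to return 0 when volatility is zero are all assumptions made here for illustration, and the abstract does not specify how the DQN-S reward is actually computed.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio of per-period simple returns.

    risk_free_rate is an annual rate; both defaults are assumptions.
    """
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0:
        return 0.0  # convention chosen for this sketch, not from the paper
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative wealth curve (fraction)."""
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        mdd = max(mdd, (peak - wealth) / peak)
    return mdd


def portfolio_returns(asset_returns, weights):
    """Per-period returns of a fixed-weight portfolio.

    asset_returns: list of rows, one row of per-asset returns per period.
    weights: one weight per asset (hypothetical input, e.g. equal weights).
    """
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]
```

A Sharpe Ratio based reward could, under these assumptions, be obtained by calling `sharpe_ratio` on the portfolio returns produced by the agent's chosen weights over a window, while an equal-weight vector passed to `portfolio_returns` gives one common reading of a naïve allocation baseline.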