Cognitive Radio Oriented Wireless Networks. 12th International Conference, CROWNCOM 2017, Lisbon, Portugal, September 20-21, 2017, Proceedings

Research Article

Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings

  • @INPROCEEDINGS{10.1007/978-3-319-76207-4_15,
        author={R\'{e}mi Bonnefoi and Lilian Besson and Christophe Moy and Emilie Kaufmann and Jacques Palicot},
        title={Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings},
        booktitle={Cognitive Radio Oriented Wireless Networks. 12th International Conference, CROWNCOM 2017, Lisbon, Portugal, September 20-21, 2017, Proceedings},
        proceedings_a={CROWNCOM},
        year={2018},
        month={3},
        keywords={Internet of Things, Multi-Armed Bandits, Reinforcement learning, Cognitive Radio, Non-stationary bandits},
        doi={10.1007/978-3-319-76207-4_15}
    }
    
  • Rémi Bonnefoi
    Lilian Besson
    Christophe Moy
    Emilie Kaufmann
    Jacques Palicot
    Year: 2018
    Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
    CROWNCOM
    Springer
    DOI: 10.1007/978-3-319-76207-4_15
Rémi Bonnefoi1,*, Lilian Besson,*, Christophe Moy1,*, Emilie Kaufmann2,*, Jacques Palicot1,*
  • 1: CentraleSupélec (campus of Rennes), IETR, SCEE Team
  • 2: Univ. Lille 1, CNRS, Inria, SequeL Team, UMR 9189 - CRIStAL
*Contact email: Remi.Bonnefoi@CentraleSupelec.fr, Lilian.Besson@CentraleSupelec.fr, Christophe.Moy@CentraleSupelec.fr, Emilie.Kaufmann@Univ-Lille1.fr, Jacques.Palicot@CentraleSupelec.fr

Abstract

Setting up future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, in handling the decentralized decision-making of spectrum access in IoT networks, and we study learning performance as the number of intelligent end-devices grows. We show that learning algorithms do help to fit more devices in such networks, even when all end-devices are intelligent and dynamically change channels. In the studied scenario, stochastic MAB learning provides a gain in terms of successful transmission probability and has near-optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
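
To make the abstract's idea concrete, the following is a minimal sketch of how a single device could pick a channel with an index policy (UCB1) or with Thompson Sampling. It is not the authors' simulation code. The function names, the exploration constant 2 in the UCB1 index, the uniform Beta(1, 1) prior, and the fixed per-channel success probabilities are all assumptions made for illustration; in particular, the paper studies a non-stationary setting with many learning devices, while this sketch keeps the channel statistics fixed.

```python
import math
import random


def ucb1_choose(successes, pulls, t):
    """UCB1: try every channel once, then pick the largest optimistic index."""
    for k, n in enumerate(pulls):
        if n == 0:
            return k
    # Textbook index: empirical mean + sqrt(2 ln t / n) (constant 2 is an assumption).
    indices = [s / n + math.sqrt(2 * math.log(t) / n)
               for s, n in zip(successes, pulls)]
    return max(range(len(pulls)), key=indices.__getitem__)


def thompson_choose(successes, pulls, rng):
    """Thompson Sampling: draw from each Beta posterior, pick the largest draw."""
    samples = [rng.betavariate(s + 1, n - s + 1)
               for s, n in zip(successes, pulls)]
    return max(range(len(pulls)), key=samples.__getitem__)


def simulate(policy, success_probs, horizon, seed=0):
    """Run one device for `horizon` transmissions; reward 1 means acknowledged.

    `success_probs` are hypothetical, fixed per-channel success probabilities.
    Returns the empirical success rate and the number of uses of each channel.
    """
    rng = random.Random(seed)
    n_channels = len(success_probs)
    successes = [0] * n_channels
    pulls = [0] * n_channels
    total = 0
    for t in range(1, horizon + 1):
        if policy == "ucb1":
            k = ucb1_choose(successes, pulls, t)
        elif policy == "thompson":
            k = thompson_choose(successes, pulls, rng)
        else:  # naive baseline: uniformly random channel
            k = rng.randrange(n_channels)
        reward = 1 if rng.random() < success_probs[k] else 0
        successes[k] += reward
        pulls[k] += 1
        total += reward
    return total / horizon, pulls


if __name__ == "__main__":
    for name in ("uniform", "ucb1", "thompson"):
        rate, pulls = simulate(name, [0.2, 0.5, 0.9], horizon=5000, seed=1)
        print(name, round(rate, 3), pulls)
```

Both policies only need the acknowledgement feedback of the device's own transmissions, which is why they suit decentralized spectrum access: no coordination with other devices or with the gateway is assumed in this sketch.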