Research Article
Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
@INPROCEEDINGS{10.1007/978-3-319-76207-4_15,
  author={R\'{e}mi Bonnefoi and Lilian Besson and Christophe Moy and Emilie Kaufmann and Jacques Palicot},
  title={Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings},
  proceedings={Cognitive Radio Oriented Wireless Networks. 12th International Conference, CROWNCOM 2017, Lisbon, Portugal, September 20-21, 2017, Proceedings},
  proceedings_a={CROWNCOM},
  year={2018},
  month={3},
  keywords={Internet of Things; Multi-Armed Bandits; Reinforcement learning; Cognitive Radio; Non-stationary bandits},
  doi={10.1007/978-3-319-76207-4_15}
}
Rémi Bonnefoi
Lilian Besson
Christophe Moy
Emilie Kaufmann
Jacques Palicot
Year: 2018
Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
CROWNCOM
Springer
DOI: 10.1007/978-3-319-76207-4_15
Abstract
Setting up the future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, at handling the decentralized decision-making of Spectrum Access applied to IoT networks, as well as the learning performance with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and dynamically change channel. In the studied scenario, stochastic MAB learning provides a gain of up to 16% in terms of successful transmission probabilities, and has near-optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
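To make the two policies named in the abstract concrete, here is a minimal, self-contained Python sketch (not the authors' simulation code): a single IoT device repeatedly picks one of a few channels and observes a binary acknowledgement. The per-channel success probabilities, class names, and the stationary single-device setting are illustrative assumptions; in the paper, the environment is non-stationary and non-i.i.d. because many learning devices interact and collide.

```python
import math
import random

# Sketch of the two bandit policies from the abstract, framed as one
# IoT device choosing among K channels and observing a binary reward
# (1 = ACK received, 0 = collision/failure). All numbers below are
# made-up assumptions for illustration only.

class UCB1:
    """UCB1 index policy: empirical mean plus exploration bonus."""
    def __init__(self, n_channels):
        self.counts = [0] * n_channels   # pulls per channel
        self.sums = [0.0] * n_channels   # cumulative reward per channel
        self.t = 0                       # global time step

    def choose(self):
        self.t += 1
        for k, n in enumerate(self.counts):
            if n == 0:                   # play each channel once first
                return k
        return max(range(len(self.counts)),
                   key=lambda k: self.sums[k] / self.counts[k]
                   + math.sqrt(2 * math.log(self.t) / self.counts[k]))

    def update(self, k, reward):
        self.counts[k] += 1
        self.sums[k] += reward

class ThompsonSampling:
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli rewards."""
    def __init__(self, n_channels):
        self.successes = [1] * n_channels  # Beta alpha parameters
        self.failures = [1] * n_channels   # Beta beta parameters

    def choose(self):
        # Sample a plausible success rate per channel, play the best.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.successes, self.failures)]
        return samples.index(max(samples))

    def update(self, k, reward):
        if reward:
            self.successes[k] += 1
        else:
            self.failures[k] += 1

# Toy stationary environment: hypothetical per-channel ACK probabilities.
CHANNEL_SUCCESS = [0.2, 0.5, 0.8]

def simulate(policy, horizon=10_000):
    """Run one policy for `horizon` steps; return its success rate."""
    total = 0
    for _ in range(horizon):
        k = policy.choose()
        reward = 1 if random.random() < CHANNEL_SUCCESS[k] else 0
        policy.update(k, reward)
        total += reward
    return total / horizon

if __name__ == "__main__":
    random.seed(42)
    print("UCB1 success rate:", simulate(UCB1(3)))
    print("Thompson Sampling success rate:", simulate(ThompsonSampling(3)))
```

Both policies should converge toward the best channel (success probability 0.8 in this toy setup); the paper's contribution is showing that such policies remain nearly optimal even when every device learns simultaneously, so that each device's reward distribution shifts as the others adapt.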