Collaborative Computing: Networking, Applications and Worksharing. 14th EAI International Conference, CollaborateCom 2018, Shanghai, China, December 1-3, 2018, Proceedings

Research Article

Collaborative Thompson Sampling

Download
436 downloads
  • @INPROCEEDINGS{10.1007/978-3-030-12981-1_2,
        author={Zhenyu Zhu and Liusheng Huang and Hongli Xu},
        title={Collaborative Thompson Sampling},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 14th EAI International Conference, CollaborateCom 2018, Shanghai, China, December 1-3, 2018, Proceedings},
        proceedings_a={COLLABORATECOM},
        year={2019},
        month={2},
        keywords={Thompson sampling Bandits Collaborative effect Dynamic clustering},
        doi={10.1007/978-3-030-12981-1_2}
    }
    
  • Zhenyu Zhu
    Liusheng Huang
    Hongli Xu
    Year: 2019
    Collaborative Thompson Sampling
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-030-12981-1_2
Zhenyu Zhu1,*, Liusheng Huang1,*, Hongli Xu1,*
  • 1: University of Science and Technology of China
*Contact email: zzy7758@mail.ustc.edu.cn, lshuang@ustc.edu.cn, honglixu@ustc.edu.cn

Abstract

Thompson sampling is one of the most effective strategies to balance exploration-exploitation trade-off. It has been applied in a variety of domains and achieved remarkable success. Thompson sampling makes decisions in a noisy but stationary environment by accumulating uncertain information over time to improve prediction accuracy. In highly dynamic domains, however, the environment undergoes frequent and unpredictable changes. Making decisions in such an environment should rely on current information. Therefore, standard Thompson sampling may perform poorly in these domains. Here we present a collaborative Thompson sampling algorithm to apply the exploration-exploitation strategy to highly dynamic settings. The algorithm takes collaborative effects into account by dynamically clustering users into groups, and the feedback of all users in the same group will help to estimate the expected reward in the current context to find the optimal choice. Incorporating collaborative effects into Thompson sampling allows to capture real-time changes of the environment and adjust decision making strategy accordingly. We compare our algorithm with standard Thompson sampling algorithms on two real-world datasets. Our algorithm shows accelerated convergence and improved prediction performance in collaborative environments. We also provide a regret analysis of our algorithm on a non-contextual model.