Research Article
How Do Humans Handle the Dilemma of Exploration and Exploitation in Sequential Decision Making?
@INPROCEEDINGS{10.4108/icst.bict.2014.258045,
  author={Naoya Namiki and Kuratomo Oyo and Tatsuji Takahashi},
  title={How Do Humans Handle the Dilemma of Exploration and Exploitation in Sequential Decision Making?},
  proceedings={8th International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)},
  publisher={ICST},
  proceedings_a={BICT},
  year={2015},
  month={2},
  keywords={exploration-exploitation dilemma n-armed bandit problems win-shift reinforcement learning},
  doi={10.4108/icst.bict.2014.258045}
}
- Naoya Namiki
- Kuratomo Oyo
- Tatsuji Takahashi
Year: 2015
BICT
ACM
DOI: 10.4108/icst.bict.2014.258045
Abstract
In an uncertain environment, decision making faces two opposing demands: exploring for new information and exploiting information already acquired. This opposition has long been called the exploration-exploitation dilemma. In brain science, it is known that the human brain evaluates options comparatively, and that average behavior correlates with the Softmax action selection rule. Softmax chooses among options at random, with a selection probability that is a monotonic function of each option's estimated value. However, this requires something like a pseudo-random number generator in the human mind. Cognitive psychology has shown that human recognition and generation of random sequences are strongly biased and generally far from truly random. Is it then possible that humans adopt the Softmax policy while being so poor at generating and recognizing random numbers? In this study, we analyzed how humans behave when facing the exploration-exploitation dilemma through experiments on N-armed bandit problems, and compared several policies commonly used in reinforcement learning models from the viewpoint of whether humans really choose options randomly.
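To make the Softmax rule mentioned in the abstract concrete, the following is a minimal Python sketch of Softmax action selection on an N-armed bandit. It is an illustration only, not the authors' experimental setup or model: the `temperature` parameter, the Bernoulli (win/lose) rewards, the sample-average value estimates, and all function names are assumptions introduced here.

```python
import math
import random


def softmax_probabilities(values, temperature=1.0):
    """Return selection probabilities that increase with estimated value.

    `temperature` is an assumed parameter: lower values favor exploitation,
    higher values favor exploration.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(values)
    weights = [math.exp((v - highest) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_choice(values, temperature=1.0, rng=random):
    """Randomly pick one arm index according to the Softmax probabilities."""
    probs = softmax_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def run_bandit(reward_probs, steps=1000, temperature=0.1, seed=0):
    """Play a Bernoulli bandit (an assumed reward model) with Softmax selection."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    estimates = [0.0] * n_arms  # sample-average value estimate per arm
    counts = [0] * n_arms       # number of times each arm was chosen
    for _ in range(steps):
        arm = softmax_choice(estimates, temperature, rng)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.3, 0.5, 0.7])
    print("estimated values:", estimates)
    print("choice counts:", counts)
```

Note that every choice in this sketch draws on a random number generator (`rng.choices`), which is exactly the requirement the abstract questions for human decision makers.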