Exploring Deep Recurrent Q-Learning for Navigation in a 3D Environment

Rasmus Kongsmar Brejl; Henrik Purwins; Henrik Schoenau-Fog

Research Article



Cite (BibTeX):

@ARTICLE{10.4108/eai.16-1-2018.153641,
    author={Rasmus Kongsmar Brejl and Henrik Purwins and Henrik Schoenau-Fog},
    title={Exploring Deep Recurrent Q-Learning for Navigation in a 3D Environment},
    journal={EAI Endorsed Transactions on Creative Technologies},
    volume={5},
    number={14},
    publisher={EAI},
    journal_a={CT},
    year={2018},
    month={1},
    keywords={Reinforcement Learning ∙ Deep Learning ∙ Q-Learning ∙ Deep Recurrent Q-Learning ∙ Artificial Intelligence ∙ Navigation ∙ Game Intelligence},
    doi={10.4108/eai.16-1-2018.153641}
}


Rasmus Kongsmar Brejl^1,2,*^, Henrik Purwins^1,2^, Henrik Schoenau-Fog^1^

1: The Center for Applied Game Research, Department of Architecture, Design, and Media Technology, Technical Faculty of IT and Design, Aalborg University Copenhagen, Denmark
2: Audio Analysis Lab, Department of Architecture, Design, and Media Technology, Technical Faculty of IT and Design, Aalborg University Copenhagen, Denmark

*Contact email: rasmuskbrejl@gmail.com

Abstract

Learning to navigate in 3D environments from raw sensory input is an important step towards bridging the gap between human players and artificial intelligence in digital games. Recent advances in deep reinforcement learning have succeeded in teaching agents to play Atari 2600 games from raw pixel information, where the environment is always fully observable by the agent. This is not true for first-person 3D navigation tasks, where the agent observes only what lies within its field of view, which restricts its ability to make optimal decisions. This paper explores a Deep Recurrent Q-Network implementation with a long short-term memory layer for such tasks, allowing an agent to process recent frames and build up a memory of the environment. An agent was trained in a 3D first-person labyrinth-like environment for 2 million frames. Informal observations indicate that the trained agent navigated in the right direction but was unable to find the target in the environment.
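
As an illustration of why a recurrent layer helps under partial observability, the following minimal sketch passes a sequence of frame features through a single long short-term memory cell and a linear Q-value head. It is not the implementation evaluated in the paper: the class name `TinyRecurrentQ`, the layer sizes, and the hand-made feature vectors are hypothetical, the weights are random and untrained, biases are left out, and the convolutional layers that would normally turn raw pixels into features are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyRecurrentQ:
    """Hypothetical toy model: one LSTM cell followed by a linear Q-value head."""

    def __init__(self, n_features, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        n_in = n_features + n_hidden
        # One weight matrix per gate, applied to [features, previous hidden state].
        # Biases are omitted to keep the sketch short.
        self.W = {g: rand_matrix(n_hidden, n_in, rng) for g in ("i", "f", "o", "g")}
        self.W_q = rand_matrix(n_actions, n_hidden, rng)

    def initial_state(self):
        # Hidden state h and cell state c both start at zero.
        return [0.0] * self.n_hidden, [0.0] * self.n_hidden

    def step(self, features, state):
        h, c = state
        x = list(features) + h
        i = [sigmoid(z) for z in matvec(self.W["i"], x)]    # input gate
        f = [sigmoid(z) for z in matvec(self.W["f"], x)]    # forget gate
        o = [sigmoid(z) for z in matvec(self.W["o"], x)]    # output gate
        g = [math.tanh(z) for z in matvec(self.W["g"], x)]  # candidate memory
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        q_values = matvec(self.W_q, h_new)  # one Q-value per action
        return q_values, (h_new, c_new)


net = TinyRecurrentQ(n_features=4, n_hidden=8, n_actions=3)
frame_a = [1.0, 0.0, 0.0, 0.0]  # made-up feature vectors for two views
frame_b = [0.0, 1.0, 0.0, 0.0]

state = net.initial_state()
q_first, state = net.step(frame_a, state)
_, state = net.step(frame_b, state)
q_again, state = net.step(frame_a, state)  # same view, different history

# Greedy action selection over the latest Q-values.
greedy_action = max(range(len(q_again)), key=q_again.__getitem__)
```

Here `q_first` and `q_again` are computed from the same frame features but differ, because the cell state carries information about the frames seen in between. A feed-forward Q-network that sees only the current frame would assign both situations identical values.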

Keywords: Reinforcement Learning ∙ Deep Learning ∙ Q-Learning ∙ Deep Recurrent Q-Learning ∙ Artificial Intelligence ∙ Navigation ∙ Game Intelligence

Received: 2017-11-13
Accepted: 2017-12-18
Published: 2018-01-16
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.16-1-2018.153641

Copyright © 2017 R.K. Brejl et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.