eCite Digital Repository

Concurrent Q-Learning: Reinforcement Learning for Dynamic Goals and Environments

Citation

Ollington, RB and Vamplew, PW, Concurrent Q-Learning: Reinforcement Learning for Dynamic Goals and Environments, International Journal of Intelligent Systems, 20(10), pp. 1037-1052. ISSN 0884-8173 (2005) [Refereed Article]



Copyright Statement

The definitive published version is available online at: http://www3.interscience.wiley.com/

Official URL: http://dx.doi.org/10.1002/int.20105

DOI: 10.1002/int.20105

Abstract

This article presents a powerful new algorithm for reinforcement learning in problems where both the goals and the environment may change. The algorithm is completely goal independent, allowing the mechanics of the environment to be learned independently of the task that is being undertaken. Conventional reinforcement learning techniques, such as Q-learning, are goal dependent. When the goal or reward conditions change, previous learning interferes with the new task that is being learned, resulting in very poor performance. Previously, the Concurrent Q-Learning algorithm was developed, based on Watkins' Q-learning, which learns the relative proximity of all states simultaneously. This learning is completely independent of the reward experienced at those states and, through a simple action selection strategy, may be applied to any given reward structure. Here it is shown that the extra information obtained may be used to replace the eligibility traces of Watkins' Q-learning, allowing many more value updates to be made at each time step. The new algorithm is compared to the previous version and also to DG-learning in tasks involving changing goals and environments. The new algorithm is shown to perform significantly better than these alternatives, especially in situations involving novel obstructions. The algorithm adapts quickly and intelligently to changes in both the environment and reward structure, and does not suffer interference from training undertaken prior to those changes.
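
As a rough illustration of the goal-independent idea summarised above, the sketch below shows one plausible tabular realisation in Python. It is a hypothetical reconstruction, not the authors' published algorithm: the class name ConcurrentQ, the unit reward on arrival at a goal, and the plain one-step update (omitting the additional updates that replace eligibility traces) are all assumptions.

import numpy as np

class ConcurrentQ:
    # Hypothetical sketch: learn action values toward every state at once.
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        # q[g, s, a] estimates the discounted proximity of state s to goal g
        # under action a; gamma < 1 makes values decay with distance to g.
        self.q = np.zeros((n_states, n_states, n_actions))
        self.alpha = alpha
        self.gamma = gamma

    def update(self, s, a, s_next):
        # A single observed transition (s, a, s_next) updates the value
        # tables for ALL candidate goals g. No external reward signal is
        # used, so learning is independent of the task being undertaken.
        for g in range(self.q.shape[0]):
            target = 1.0 if s_next == g else self.gamma * self.q[g, s_next].max()
            self.q[g, s, a] += self.alpha * (target - self.q[g, s, a])

    def act(self, s, goal):
        # Simple action selection: greedy with respect to the current goal.
        return int(self.q[goal, s].argmax())

Because the learned proximities are reward free, changing the goal only changes which slice q[goal] the action selection reads; nothing learned about the environment itself is discarded.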

Item Details

Item Type: Refereed Article
Research Division: Information and Computing Sciences
Research Group: Artificial Intelligence and Image Processing
Research Field: Artificial Life
Objective Division: Expanding Knowledge
Objective Group: Expanding Knowledge
Objective Field: Expanding Knowledge in the Mathematical Sciences
Author: Ollington, RB (Dr Robert Ollington)
Author: Vamplew, PW (Dr Peter Vamplew)
ID Code: 38116
Year Published: 2005
Web of Science® Times Cited: 3
Deposited By: Computing
Deposited On: 2005-08-01
Last Modified: 2012-11-06
Downloads: 8
