University of Tasmania

Reinforcement learning from hierarchical critics

journal contribution
posted on 2023-05-21, 01:02; authored by Cao, Z and Lin, C-T
In this study, we investigate the use of global information to speed up learning and increase the cumulative rewards of reinforcement learning (RL) in competition tasks. Within the actor-critic RL framework, we introduce multiple cooperative critics from two levels of a hierarchy and propose an RL from hierarchical critics (RLHC) algorithm. In our approach, each agent receives value information about a competition task from both local and global critics and accesses multiple cooperative critics in a top-down hierarchy. Thus, each agent not only receives low-level details but also considers coordination from higher levels, thereby obtaining global information that improves training performance. We test the proposed RLHC algorithm against a benchmark algorithm, proximal policy optimization (PPO), in four experimental scenarios within the Unity environment: tennis, soccer, banana collection, and crawler competitions. The results show that RLHC outperforms the benchmark on all four competitive tasks.
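
As a rough illustration of how an agent might use value information from both a local critic and higher-level critics, the following minimal Python sketch combines the estimates and feeds the result into a one-step advantage. It is not the authors' implementation. The combination rule (taking the maximum), the function names hierarchical_value and one_step_advantage, and all numeric values are assumptions made for illustration only; the abstract does not specify how the critics are combined.

```python
from typing import Sequence


def hierarchical_value(local_value: float, global_values: Sequence[float]) -> float:
    """Combine a local critic estimate with higher-level critic estimates.

    Assumption: the estimates are combined by taking the maximum.
    The abstract does not state the actual combination rule.
    """
    return max([local_value, *global_values])


def one_step_advantage(
    reward: float,
    value: float,
    next_value: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """One-step temporal-difference advantage: r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


# Hypothetical critic outputs for the current and next state.
v_t = hierarchical_value(local_value=0.5, global_values=[0.8])
v_next = hierarchical_value(local_value=1.0, global_values=[0.6])

# The advantage would then weight the policy-gradient update of the actor.
advantage = one_step_advantage(reward=1.0, value=v_t, next_value=v_next, gamma=0.9)
print(v_t, v_next, advantage)
```

In this sketch the global critic only changes the agent's estimate when it is more optimistic than the local one; a weighted average or a learned mixing network would be equally plausible alternatives under the description given above.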

History

Publication title

IEEE Transactions on Neural Networks and Learning Systems

Volume

34

Pagination

1066-1073

ISSN

2162-237X

Department/School

School of Information and Communication Technology

Publisher

Institute of Electrical and Electronics Engineers

Place of publication

United States

Rights statement

© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Repository Status

  • Open

Socio-economic Objectives

Artificial intelligence
