Oral Poster
Understanding, Rehearsing, and Introspecting: Learn a Policy from Textual Tutorial Books in Football Games
Xiong-Hui Chen · Ziyan Wang · Yali Du · Shengyi Jiang · Meng Fang · Yang Yu · Jun Wang
West Ballroom A-D #7108
Wed 11 Dec 3:30 p.m. PST — 4:30 p.m. PST
When humans need to learn a new skill, we can acquire knowledge from written material, including textbooks, tutorials, and comments from previous learners. However, current research on decision-making, such as reinforcement learning (RL), relies primarily on numerous real interactions with the target environment to learn a skill, and fails to use the knowledge already summarized in text. The success of Large Language Models (LLMs) suggests a way to exploit the knowledge contained in books. In this paper, we study a new policy learning problem called Policy Learning from Books, which aims to leverage rich resources such as books and tutorials to derive a policy network. Inspired by how humans learn from books, we solve the problem via a three-stage framework: understanding, rehearsing, and introspecting (URI). Specifically, URI first understands the books to derive knowledge, then rehearses decision-making trajectories based on that knowledge, and finally introspects on the resulting imaginary dataset to distill a policy network. To validate the practicality of this methodology, we train a football-playing policy via URI and test it in the Google Football game. The agent beats the built-in AI with a 37% winning rate without any interaction with the environment during training, whereas using GPT directly as the agent achieves only a 6% winning rate.
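To make the three-stage structure concrete, here is a minimal toy sketch of an understand, rehearse, introspect pipeline. It is not the authors' implementation: every function name, the `when <state>: <action>` text format, the noise model, and the majority-vote "distillation" are assumptions made purely for illustration. In the paper the understanding and rehearsing stages rely on LLMs and the final policy is a network; here simple parsing, sampling, and a lookup table stand in for them.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy stand-ins for the three URI stages (illustration only).

def understand(book_text):
    """Stage 1 (understanding): extract state -> action rules from text.

    Assumption: each useful line looks like 'when <state>: <action>'.
    A real system would use an LLM instead of string parsing.
    """
    knowledge = {}
    for line in book_text.strip().splitlines():
        line = line.strip()
        if line.lower().startswith("when ") and ":" in line:
            state, action = line[5:].split(":", 1)
            knowledge[state.strip()] = action.strip()
    return knowledge

def rehearse(knowledge, n_steps=200, noise=0.1, seed=0):
    """Stage 2 (rehearsing): build an imaginary dataset of (state, action)
    pairs from the knowledge alone, with no environment interaction.

    With probability `noise` a random action is recorded instead, to mimic
    imperfect imagined trajectories (an assumption of this sketch).
    """
    rng = random.Random(seed)
    states = sorted(knowledge)
    actions = sorted(set(knowledge.values()))
    dataset = []
    for i in range(n_steps):
        state = states[i % len(states)]  # cycle through the known states
        if rng.random() < noise:
            action = rng.choice(actions)
        else:
            action = knowledge[state]
        dataset.append((state, action))
    return dataset

def introspect(dataset):
    """Stage 3 (introspecting): distill a policy from the imaginary data.

    A per-state majority vote stands in for training a policy network and
    filters out part of the noise injected during rehearsal.
    """
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}
```

Calling `introspect(rehearse(understand(text)))` on a short tutorial string yields a state-to-action table without ever querying an environment, which mirrors the offline character of the pipeline described above, though at a far smaller scale.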