Summary
Large language models can help our robots adapt to new tasks and situations by enabling better pre-training and by guiding them in unfamiliar settings.
By Jesse Zhang
One longstanding goal of robot learning research is generalist robots that can perform arbitrary tasks in any environment. While robotics has come a long way, we are still far from this goal. Can we use ideas from large language models (LLMs), which have proven to be impressive generalist agents across a variety of language tasks, to make progress toward generalist robots?
One approach, inspired by the capabilities LLMs gain from training on massive amounts of data, is to collect a large set of demonstrations and train large transformer networks on them (e.g., RT-1 or OpenVLA). However, performance on new tasks in new settings still leaves a lot to be desired, and because far less robotics data exists than text data, large-scale pre-training alone is unlikely to yield generalist robots that can perform new tasks. Instead, we investigate how to enable our robots to adapt to new tasks and settings quickly.
In this blog post, we present ideas from two papers that use LLMs to help our robots adapt. LLMs excel at language tasks; in both papers, language is the interface that connects our robots to their capabilities. We summarize both methods below, highlighting the key ideas that enable adaptation.
SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling
- Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim. Published at ICRA 2024.
Humans often draw upon a vast array of prior skills when learning new tasks. While leveraging language to define agent skills has shown promise, efficiently adapting to novel tasks demands a large set of diverse skills. Our method, SPRINT, addresses this by using large language models (LLMs) and language-conditioned offline RL to automatically generate additional skills for pre-training, resulting in better pre-trained robots that learn new tasks faster.
SPRINT employs two key ideas to automatically expand an existing dataset of language-annotated skills: (1) LLM-based skill aggregation and (2) cross-trajectory skill chaining with offline RL, shown above. In LLM-based aggregation, we use an LLM to combine language-annotated skills from the same trajectory into a single higher-level instruction. For example, “put mug in coffee machine” and “press brew button” can be relabeled by an LLM into “make coffee.”
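To make the aggregation step concrete, here is a minimal sketch. The helper `llm_complete` (standing in for any instruction-following LLM API) and the prompt wording are illustrative assumptions, not the exact setup from the paper:

```python
def aggregate_annotations(annotations: list[str], llm_complete) -> str:
    """Relabel consecutive sub-skill annotations from a single trajectory
    into one higher-level instruction using an LLM."""
    numbered = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(annotations))
    prompt = (
        "A robot performed the following sub-tasks in order:\n"
        f"{numbered}\n"
        "Summarize them as one short, high-level instruction:"
    )
    return llm_complete(prompt).strip()


# e.g., aggregate_annotations(
#     ["put mug in coffee machine", "press brew button"], llm_complete
# ) might return "make coffee"
```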
However, aggregation alone does not work when the original language annotations come from different trajectories. To chain language skills across trajectories (key idea 2), we devise an offline RL objective that lets the agent implicitly chain skills according to the likelihood that the second skill can be completed after the first. In the paper, we show that this likelihood equals the Q-value at the last state of the first skill, conditioned on the language instruction of the second. Both aggregated and chained trajectories are added to the pre-training dataset and used to pre-train a language-conditioned offline RL agent. How does this help in learning new tasks?
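As a rough sketch of that relabeled reward (function and argument names are hypothetical; the paper derives the exact objective), the final transition of the first skill's trajectory is rewarded with the critic's own estimate of how completable the second skill's instruction is from that state:

```python
import torch


def chaining_reward(q_network: torch.nn.Module,
                    last_state: torch.Tensor,
                    last_action: torch.Tensor,
                    next_instruction: torch.Tensor) -> torch.Tensor:
    """Relabeled reward for the final transition of the first skill when
    training on the chained trajectory: the Q-value at that state and
    action, conditioned on the second skill's language instruction."""
    with torch.no_grad():  # treat the critic's estimate as a fixed target
        return q_network(last_state, last_action, next_instruction)
```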
Evaluated on the ALFRED household simulator and a real-world robot arm setup from the JACO play dataset, SPRINT achieves up to an 8x improvement in zero-shot performance over prior work and adapts more efficiently to novel environments and tasks.
Above we see an example of SPRINT completing 8 tasks in succession after fine-tuning on real-world robot data, while the next best baseline completes only 4. By combining LLM-based aggregation with cross-trajectory chaining, SPRINT creates a richer pre-training dataset without extra human supervision, enabling better acquisition and execution of new tasks.
Bootstrap Your Own Skills (BOSS): Learning to Solve New Tasks with LLM Guidance
- Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim. Published at CoRL 2023 (oral presentation).
SPRINT demonstrates how LLMs can help robots learn new tasks faster through more effective pre-training. But even an effectively pre-trained robot, when deployed in a new environment, needs human supervision to decide which tasks are worth learning in that setting. Can we instead drop a pre-trained robot into a new setting and let it learn new, useful, environment-tailored tasks on its own?
Our next method, BOSS, addresses this through a “skill bootstrapping” procedure guided by an LLM. Starting from a pre-trained agent, BOSS uses an LLM to guide the agent in practicing chains of its pre-trained skills, composing them into useful, longer-horizon skills tailored specifically to the environment. For example, after the robot masters the skill “pick up empty coffee mug,” the LLM might suggest “put mug in coffee machine” as a follow-up. The robot then attempts to complete this new skill chain. BOSS automatically practices skills it can actually perform in the environment, fine-tuning its low-level control to the new setting while adding newly chained behaviors back to its skill library to continue learning from, as sketched below.
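Here is a minimal sketch of that bootstrapping loop under stated assumptions: `execute_skill` and `llm_propose_next` are hypothetical stand-ins, and the real method also fine-tunes the low-level policy with sparse rewards on every attempt:

```python
import random


def skill_bootstrapping(skill_library: set[str], env, execute_skill,
                        llm_propose_next, num_rounds: int) -> set[str]:
    """Practice LLM-guided skill chains in a new environment and grow the
    skill library with behaviors that were successfully chained."""
    for _ in range(num_rounds):
        state = env.reset()
        chain = [random.choice(sorted(skill_library))]  # begin with a known skill
        while True:
            success, state = execute_skill(chain[-1], state)
            if not success:
                break  # failed attempts still yield fine-tuning data
            # Ask the LLM which known skill plausibly comes next here.
            next_skill = llm_propose_next(chain, skill_library)
            if next_skill is None:
                break
            chain.append(next_skill)
        if success and len(chain) > 1:
            # Keep the newly chained long-horizon behavior as its own skill.
            skill_library.add(", then ".join(chain))
    return skill_library
```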
After training, a BOSS agent has learned a repertoire of complex skills tailored specifically to the new setting. Below we see the entire BOSS pipeline. Guidance comes only from an LLM and easy-to-design sparse reward functions, sketched below.
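Those sparse rewards can be as simple as a binary success check per attempted skill. A minimal sketch, assuming a hypothetical `goal_satisfied` hook on the environment (not BOSS's actual API):

```python
def sparse_reward(env, skill: str) -> float:
    """Easy-to-design sparse reward: 1.0 once the environment reports the
    skill's goal condition satisfied, 0.0 otherwise."""
    return 1.0 if env.goal_satisfied(skill) else 0.0
```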
When we want the robot to perform new tasks, we can simply give the BOSS-trained robot a language instruction; with sufficient training, it is likely to have already acquired those tasks. As with SPRINT, we evaluate in ALFRED and on a JACO arm robot. Below we plot how, over the course of training, the skills BOSS acquires become longer-horizon and more complex while its learned skill library grows in size.
Both in ALFRED and on a real robot, BOSS outperforms a variety of baselines, especially in terms of success rates on longer-horizon tasks.
In summary, SPRINT and BOSS demonstrate the potential of large language models for building more capable generalist robots. SPRINT improves pre-training through LLM-based aggregation and cross-trajectory chaining, while BOSS uses LLM guidance to bootstrap skills in new environments. Together, they offer a promising direction for autonomous robots that can learn and adapt to diverse tasks and settings.