pyribs: Accelerating Quality Diversity Research – Robotics and Autonomous Systems Center

Summary

Quality Diversity (QD) optimization searches for a diverse set of high-quality solutions to a problem. The Python package pyribs makes QD accessible to researchers in robotics and beyond.

Landing a lunar lander doesn’t have to be vertical and boring! Try landing your lunar lander like a space shuttle today using quality diversity algorithms.

By Bryon Tjanaka

There are many situations where it is useful to search for more than one solution to a problem. For example, when testing a robot designed to collaborate with humans in a kitchen, we can identify situations that elicit different behaviors from the agent, helping us understand how it reacts to different kitchens. Similarly, when landing a (fictional) lunar lander, we can execute different trajectories; the lunar lander can land vertically like a rocket or cruise down the runway like a space shuttle. We view such problems as instances of quality diversity (QD) optimization. For a given problem, QD aims to search for a diverse set of solutions that are as high quality as possible.

To facilitate research in QD, we at the ICAROS Lab have developed pyribs, a Python library that implements numerous existing QD algorithms and provides a framework for developing new ones. Over the past several years, pyribs has played an important role in many of the lab’s QD-related projects. In this blog post, I will give an overview of the story and motivation behind pyribs and some of the projects it has powered. Overall, we have found pyribs to be a useful tool for QD research, and we hope that it can lower the barrier to entry for those looking to explore QD.

For more information on pyribs, visit our website at https://pyribs.org

What is Quality Diversity?

QD considers an objective function and a measure function. The objective function is a scalar-valued function. The measure function is vector-valued and can also be viewed as several individual scalar-valued functions. The heatmap shows a 2D archive output by a QD algorithm. Each cell stores one solution. The axes indicate the two measures, while the color shows the objective (brighter is better).

To understand how pyribs works, we must first understand quality diversity optimization. At a high level, QD aims to find diverse, high-performing solutions to a given problem. More formally, given a solution 𝜃, QD considers an objective function 𝑓(𝜃) and a vector-valued measure function 𝑚(𝜃). The objective function quantifies the quality of the solution, with higher objective values indicating higher quality. The measure function describes the properties of the solution over which we seek diversity. Intuitively, a set of solutions that achieve different values of the measure function is considered diverse.
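As a toy illustration of these definitions, assume a solution 𝜃 is a list of real numbers. The functions below are assumptions chosen for the example, not taken from this post or from pyribs: the objective 𝑓 is the negative sphere function, and the two measures in 𝑚 are the means of the first and second halves of 𝜃. Two solutions can then have identical quality but different measures, which is the kind of diversity QD rewards.

```python
from typing import List, Tuple


def objective(theta: List[float]) -> float:
    """Negative sphere function: higher is better, with a maximum of 0 at theta = 0."""
    return -sum(x * x for x in theta)


def measures(theta: List[float]) -> Tuple[float, float]:
    """Two scalar measures: the means of the first and second halves of theta.

    Assumes len(theta) >= 2 so that both halves are non-empty.
    """
    half = len(theta) // 2
    first, second = theta[:half], theta[half:]
    return sum(first) / len(first), sum(second) / len(second)


# Same quality, different measures: a QD algorithm would want to keep both.
theta_a = [1.0, 1.0, 0.0, 0.0]
theta_b = [0.0, 0.0, 1.0, 1.0]
print(objective(theta_a), measures(theta_a))  # -2.0 (1.0, 0.0)
print(objective(theta_b), measures(theta_b))  # -2.0 (0.0, 1.0)
```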

For example, in a locomotion problem, solutions could be the parameters of a neural network controller. The objective would reward walking forward, while the measures would describe how the agent walks. Below are several different behaviors discovered for a half-cheetah agent.

Various gaits discovered by a QD algorithm for a half-cheetah.

The solutions discovered by a QD algorithm are stored in a data structure known as an archive. A grid is a simple and common archive representation, though other representations are possible. A grid archive can be visualized as a heatmap (see the figure above), where each cell stores a single solution, and the cell’s position and color reflect that solution’s measure and objective values.
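To show how a grid archive decides where a solution goes and whether to keep it, here is a minimal sketch in plain Python. `ToyGridArchive` and its methods are hypothetical names (this is not the pyribs `GridArchive` API); the sketch assumes equal-width bins along each measure and the common elitist rule that a cell keeps whichever solution has the highest objective.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


class ToyGridArchive:
    """Minimal grid archive: each cell holds at most one solution, its elite."""

    def __init__(self, dims: Sequence[int], ranges: Sequence[Tuple[float, float]]):
        self.dims = list(dims)      # number of cells along each measure axis
        self.ranges = list(ranges)  # (low, high) bounds for each measure
        # cell index -> (objective, measures, solution)
        self.cells: Dict[Cell, Tuple[float, Tuple[float, ...], List[float]]] = {}

    def index_of(self, meas: Sequence[float]) -> Cell:
        """Map measures to a cell using equal-width bins, clipping out-of-range values."""
        idx = []
        for m, n, (low, high) in zip(meas, self.dims, self.ranges):
            b = int((m - low) / (high - low) * n)
            idx.append(min(max(b, 0), n - 1))
        return tuple(idx)

    def add(self, solution: Sequence[float], objective: float, meas: Sequence[float]) -> bool:
        """Store the solution if its cell is empty or it beats the cell's current elite."""
        idx = self.index_of(meas)
        current = self.cells.get(idx)
        if current is None or objective > current[0]:
            self.cells[idx] = (objective, tuple(meas), list(solution))
            return True
        return False


archive = ToyGridArchive(dims=[10, 10], ranges=[(-1.0, 1.0), (-1.0, 1.0)])
print(archive.add([0.15, 0.25], 0.9, [0.15, 0.25]))  # True: the cell was empty
print(archive.add([0.12, 0.28], 0.5, [0.12, 0.28]))  # False: same cell, lower objective
print(archive.add([0.9, -0.9], 0.3, [0.9, -0.9]))    # True: a new cell
print(len(archive.cells))                            # 2
```

Each filled entry in `cells` corresponds to one colored square in a heatmap like the one above: its index gives the position and its stored objective gives the color.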

The pyribs Origin Story

In 2020, the ICAROS Lab identified the need for software that supports research on QD algorithms. As QD was a nascent field (the first papers were published in 2011 and 2015), QD software was still in its infancy, and most labs re-implemented algorithms from scratch. However, this approach was becoming intractable with the advent of more advanced and complex algorithms. For example, the original implementation of CMA-ME (an algorithm that achieved state-of-the-art performance on many QD problems) was deeply tied to its original project, making it difficult to integrate into new projects and share with other researchers.

Based on the pain points we encountered when reusing QD software, we designed pyribs around three key principles:

Simple: We wanted pyribs to consist only of the components necessary to run a QD algorithm. We noticed that many algorithm libraries became bloated over time by supporting features that were already available in other libraries. Thus, we decided to make pyribs as “bare-bones” as possible (hence the ribs part of the name). Additional capabilities such as logging and multiprocessing were left entirely to external libraries.
Flexible: We wanted pyribs to support not just current QD algorithms but also future ones. We designed pyribs around a modular framework consisting of three components. Each component can be replaced or modified to create a new algorithm. In a sense, this also makes pyribs “bare-bones” since the individual bones can be replaced.
Accessible: Last and arguably most importantly, we wanted to make pyribs highly accessible. Our goal was that nearly anyone should be able to run pyribs, especially QD beginners who may have limited knowledge and hardware (we’ve even taught high schoolers to use pyribs!). To this end, we made pyribs able to run on a single CPU, meaning it can run anywhere from a high-performance cluster to a Raspberry Pi. Furthermore, we placed a large emphasis on the documentation of pyribs, with extensive tutorials and references.
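To make the three-component design concrete, here is a toy ask-tell loop in plain Python that runs on a single CPU. The classes `Archive`, `GaussianEmitter`, and `Scheduler` are hypothetical stand-ins that loosely mirror the roles of pyribs’ archives, emitters, and schedulers; they are not the pyribs API. The Gaussian-mutation emitter and the sphere-style test problem are assumptions chosen for brevity.

```python
import random
from typing import Dict, List, Sequence, Tuple

Solution = List[float]


class Archive:
    """Keeps the best (highest-objective) solution in each cell of a uniform grid."""

    def __init__(self, cells_per_dim: int, lo: float, hi: float):
        self.n, self.lo, self.hi = cells_per_dim, lo, hi
        self.elites: Dict[Tuple[int, ...], Tuple[float, Solution]] = {}

    def index(self, meas: Sequence[float]) -> Tuple[int, ...]:
        scale = self.n / (self.hi - self.lo)
        return tuple(min(max(int((m - self.lo) * scale), 0), self.n - 1) for m in meas)

    def add(self, sol: Solution, obj: float, meas: Sequence[float]) -> None:
        idx = self.index(meas)
        if idx not in self.elites or obj > self.elites[idx][0]:
            self.elites[idx] = (obj, sol)


class GaussianEmitter:
    """Proposes solutions by adding Gaussian noise to randomly chosen elites."""

    def __init__(self, archive: Archive, x0: Solution, sigma: float,
                 batch_size: int, rng: random.Random):
        self.archive, self.x0, self.sigma = archive, x0, sigma
        self.batch_size, self.rng = batch_size, rng

    def ask(self) -> List[Solution]:
        elites = [sol for _, sol in self.archive.elites.values()]
        parents = [self.rng.choice(elites) if elites else self.x0
                   for _ in range(self.batch_size)]
        return [[x + self.rng.gauss(0.0, self.sigma) for x in p] for p in parents]

    def tell(self, solutions, objectives, measures) -> None:
        pass  # An adaptive emitter (e.g., CMA-ES based) would update its distribution here.


class Scheduler:
    """Collects proposals from every emitter, then routes evaluated results back."""

    def __init__(self, archive: Archive, emitters: List[GaussianEmitter]):
        self.archive, self.emitters = archive, emitters
        self._batches: List[List[Solution]] = []

    def ask(self) -> List[Solution]:
        self._batches = [e.ask() for e in self.emitters]
        return [sol for batch in self._batches for sol in batch]

    def tell(self, objectives: List[float], measures: List[List[float]]) -> None:
        start = 0
        for emitter, batch in zip(self.emitters, self._batches):
            end = start + len(batch)
            for sol, obj, meas in zip(batch, objectives[start:end], measures[start:end]):
                self.archive.add(sol, obj, meas)
            emitter.tell(batch, objectives[start:end], measures[start:end])
            start = end


def evaluate(sol: Solution) -> Tuple[float, List[float]]:
    """Toy problem: negative sphere objective; measures are the first two coordinates."""
    return -sum(x * x for x in sol), [sol[0], sol[1]]


rng = random.Random(0)
archive = Archive(cells_per_dim=10, lo=-1.0, hi=1.0)
emitters = [GaussianEmitter(archive, [0.0] * 4, 0.2, 8, rng) for _ in range(2)]
scheduler = Scheduler(archive, emitters)
for _ in range(200):
    solutions = scheduler.ask()
    results = [evaluate(s) for s in solutions]  # evaluation happens outside the library
    scheduler.tell([obj for obj, _ in results], [meas for _, meas in results])
print(f"{len(archive.elites)} of 100 cells filled")
```

In this sketch, swapping in a different emitter or archive class changes the algorithm without touching the loop, and because the caller evaluates solutions between `ask` and `tell`, concerns such as multiprocessing and logging stay outside the QD code.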

We released the first version of pyribs in February 2021 and published a paper elaborating on its design in July 2023. Over time, we have gradually improved the library, including by adding more features and algorithms from both the ICAROS Lab and the wider QD community.

pyribs Expedites QD Research

Automatic Scenario Generation

Understanding the behavior of human-robot interaction (HRI) systems before they are deployed in the real world is a core problem in robotics. In the unconstrained environments where robots operate, there may be scenarios that cause an HRI system to behave in unexpected, inefficient, or even dangerous ways. QD can help identify scenarios that elicit such behaviors.

For example, consider the simplified environment below based on the video game Overcooked. In this environment, a robot with a QMDP policy (in green) and a human (in blue) collaborate to serve two soups. In the environment on the left, the human and robot divide the work equally, while on the right, the human must do all the work. Note that only the environment changes while the policies remain the same.

A human (in blue) and a robot with a QMDP policy (in green) collaborate to serve two soups. Changing the environment while keeping the policies the same can result in different distributions of the workload.

We can search for such environments using QD algorithms. Specifically, the QD algorithm can search over the parameters of generated environments, e.g., environments generated with a GAN trained on Overcooked levels or environments generated with the help of surrogate models. In this case, the objective function is the number of objectives completed by the fixed agents in the environment, and the measure function consists of workload distribution (which agent does more of the work) and idle time (how long the human remains in one place). The result is a dataset of environments, such as the ones above, that elicit different behaviors from the agents.
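The objective and measures above are named but not defined precisely, so the following is only one plausible way to compute them from a simulated episode. The `EpisodeLog` record and both formulas (workload as the normalized difference in subtasks completed by each agent, idle time as the fraction of timesteps in which the human does not move) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EpisodeLog:
    """Hypothetical record of one simulated episode in a generated environment."""
    completed_objectives: int                # e.g., soups served by the team
    human_subtasks: int                      # subtasks completed by the human
    robot_subtasks: int                      # subtasks completed by the robot
    human_positions: List[Tuple[int, int]]   # human grid position at each timestep


def evaluate_episode(log: EpisodeLog) -> Tuple[float, Tuple[float, float]]:
    """Return (objective, (workload, idle)) for one episode."""
    objective = float(log.completed_objectives)
    total = log.human_subtasks + log.robot_subtasks
    # Workload: +1 means the human did all the work, -1 means the robot did.
    workload = 0.0 if total == 0 else (log.human_subtasks - log.robot_subtasks) / total
    # Idle time: fraction of timesteps where the human stayed in the same cell.
    steps = len(log.human_positions) - 1
    still = sum(a == b for a, b in zip(log.human_positions, log.human_positions[1:]))
    idle = 0.0 if steps <= 0 else still / steps
    return objective, (workload, idle)


log = EpisodeLog(completed_objectives=2, human_subtasks=6, robot_subtasks=0,
                 human_positions=[(1, 1), (1, 2), (1, 2), (2, 2), (2, 2)])
print(evaluate_episode(log))  # (2.0, (1.0, 0.5))
```

A QD algorithm would call a function like this for each generated environment and insert the environment’s parameters into an archive indexed by the two measures.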

Beyond Overcooked levels, we also used this technique to find object arrangements where human-robot shared autonomy algorithms fail to help users. To generate these environments more quickly, we have used surrogate models to estimate the objective values in interactive scenarios, such as video game levels and human-robot interaction settings.

Quality Diversity Optimization for Reinforcement Learning

In addition to finding scenarios, we have used pyribs as a platform to develop more advanced QD algorithms, both for general-purpose optimization and especially for reinforcement learning. Modifying various components in pyribs has resulted in algorithms that demonstrate state-of-the-art performance across tasks like locomotion, image generation, and mathematical benchmarks. These algorithms include ones that modify the archive, leverage gradient information, and integrate alternative optimizers.
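As one illustration of an algorithm that modifies the archive, consider the acceptance rule of CMA-MAE (an example chosen here; it is not named in this post). Instead of accepting only solutions that beat a cell’s current elite, each cell keeps a threshold t_e that moves toward the objective of accepted solutions at a learning rate α: t_e ← (1 − α) t_e + α f(𝜃), and candidates can be ranked by the improvement f(𝜃) − t_e. The sketch below implements only this rule in plain Python; the class name and parameters are hypothetical, and the rest of the algorithm (its CMA-ES-based emitters) is omitted.

```python
from typing import Dict, Hashable, List, Tuple


class AnnealingCellThresholds:
    """Per-cell acceptance thresholds in the style of CMA-MAE (simplified sketch).

    A solution with objective f is accepted into cell e if f > t_e, and then
        t_e <- (1 - alpha) * t_e + alpha * f.
    With alpha = 1.0 the threshold jumps to f, recovering the elitist rule of a
    standard grid archive; smaller alpha lets the threshold rise gradually.
    """

    def __init__(self, alpha: float, threshold_min: float):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")
        self.alpha = alpha
        self.threshold_min = threshold_min  # starting threshold for every cell
        self.thresholds: Dict[Hashable, float] = {}
        self.occupants: Dict[Hashable, Tuple[float, List[float]]] = {}

    def add(self, cell: Hashable, solution: List[float], f: float) -> Tuple[bool, float]:
        """Return (accepted, improvement), where improvement = f - t_e before the update."""
        t = self.thresholds.get(cell, self.threshold_min)
        improvement = f - t
        if improvement > 0:
            self.occupants[cell] = (f, solution)
            self.thresholds[cell] = (1.0 - self.alpha) * t + self.alpha * f
            return True, improvement
        return False, improvement


cells = AnnealingCellThresholds(alpha=0.5, threshold_min=0.0)
print(cells.add("cell-A", [1.0, 2.0], 10.0))  # (True, 10.0)
print(cells.add("cell-A", [1.1, 2.1], 6.0))   # (True, 1.0)
print(cells.add("cell-A", [1.2, 2.2], 5.0))   # (False, -0.5)
print(cells.thresholds["cell-A"])             # 5.5
```

In this sketch, an accepted solution replaces the cell’s occupant even when its objective is lower (6.0 replaces 10.0 above), so it can be worth tracking the best solutions found in a separate archive.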

For example, we leveraged the gradient information of neural networks to introduce differentiable quality diversity (DQD), which speeds up exploration when the objective and measure functions of the QD problem are differentiable. Most prior work in QD assumed black-box objective and measure functions without gradient information. Within pyribs, we added support for these algorithms by writing new components and adding arguments to existing components so that gradients can be passed in.
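The key idea of DQD is that gradients let a solution branch in informed directions. Algorithms such as CMA-MEGA (named here for illustration) step along linear combinations of the objective gradient and the measure gradients, 𝜃′ = 𝜃 + c₀∇𝑓(𝜃) + Σⱼ cⱼ∇𝑚ⱼ(𝜃). The sketch below shows only that branching step in plain Python, assuming the gradients are already computed. Sampling the coefficients from a fixed isotropic Gaussian and forcing c₀ to be nonnegative are simplifying assumptions, and the function name is hypothetical; the published algorithms instead adapt the coefficient distribution (CMA-MEGA uses CMA-ES for this).

```python
import random
from typing import List, Sequence

Vector = List[float]


def branch_with_gradients(
    theta: Sequence[float],
    grad_f: Sequence[float],
    grad_ms: Sequence[Sequence[float]],
    sigma_g: float,
    num_branches: int,
    rng: random.Random,
) -> List[Vector]:
    """Propose theta' = theta + c_0 * grad_f + sum_j c_j * grad_m_j for random c.

    Coefficients are drawn from an isotropic Gaussian with std sigma_g. As a
    simplifying assumption, c_0 is made nonnegative so the objective-gradient
    component of each step never points downhill.
    """
    grads = [list(grad_f)] + [list(g) for g in grad_ms]
    branches: List[Vector] = []
    for _ in range(num_branches):
        coeffs = [rng.gauss(0.0, sigma_g) for _ in grads]
        coeffs[0] = abs(coeffs[0])
        branches.append([
            x + sum(c * g[i] for c, g in zip(coeffs, grads))
            for i, x in enumerate(theta)
        ])
    return branches


# Usage on an assumed toy problem: f = -sum(x^2), m_1 = x[0], m_2 = x[1].
theta = [0.5, -0.5, 1.0]
grad_f = [-2.0 * x for x in theta]            # gradient of the negative sphere
grad_ms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # gradients of the two measures
for candidate in branch_with_gradients(theta, grad_f, grad_ms, 0.1, 3, random.Random(0)):
    print(candidate)
```

Each branch would then be evaluated and offered to the archive, just like a solution proposed by a black-box emitter.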

pyribs Around the World

Beyond USC, researchers all over the world have used pyribs in their work. For example, to create algorithms for congestion control on the Internet, researchers at the University of Tsukuba in Japan combined the pyribs implementation of the CMA-ME algorithm with grammatical evolution. Meanwhile, Autodesk Research (in Germany and New York) modified pyribs to create an algorithm for optimizing building layouts in architecture design. The list of papers that use pyribs continues to grow, with the most up-to-date list available on Google Scholar.

What’s Next?

We’ve enjoyed improving and expanding pyribs in the years since its release. It has been eye-opening to interact with the community and learn about their experiences with the library, from the bugs they discover to the features they desire and the unexpected ways they use the software. We are glad that it has grown so much, and we cannot wait to see where it is used next!