AI Safety in Multi-Agent Large Language Model Systems
Our next speaker is Jiing, currently a postdoc at the Max Planck Institutes in Germany and an incoming assistant professor at the University of Toronto. She works on causal formulations of many natural language processing problems, AI safety, and multi-agent large language models, as well as AI for causal science.
Introduction to AI Safety
In her talk, entitled "AI Safety in Multi-Agent LLM Systems," Jiing discusses the importance of preventing AI agents from causing harm to humans. However, she notes that not all developers and stakeholders can be expected to cooperate toward this societal goal, so we may end up with a large society of interacting agents. To address this problem, Jiing proposes that AI safety within multi-agent LLM systems could serve as a last line of defense.
The Problem of Multi-Agent Systems
Jiing explains that as different companies launch their own agents, we will increasingly interact with more and more of them. This raises the question of how a group of LLM agents interacts and what emergent behavior arises in multi-agent LLM systems. Her research addresses this question by studying the behavior of LLM agents in a range of scenarios.
Tragedy of the Commons
Jiing draws inspiration from the tragedy of the commons, a problem that originates in human society. In this scenario, multiple agents share a common pool of resources, and each agent must decide how much to harvest from it. If one agent defects and overfishes, it gains an extra harvest while the other agents suffer. Because everyone reasons that defecting and overfishing brings more individual benefit, the group can end up in the worst collective outcome, with the shared resource depleted.
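To make the dilemma concrete, here is a minimal numeric sketch in Python (the quantities and regrowth rate are hypothetical, not figures from the talk). A defector catches more this season, but every extra ton caught shrinks the stock that regrows for future seasons.

    # Two fishers share a lake; each chooses a sustainable catch or overfishing.
    SUSTAINABLE, OVERFISH = 10, 20   # tons caught this season (illustrative values)

    def season(catch_a, catch_b, stock=100, regrowth=0.2):
        """Each fisher keeps what they catch; the remaining stock regrows by 20%."""
        remaining = max(stock - catch_a - catch_b, 0)
        return catch_a, catch_b, remaining * (1 + regrowth)

    print(season(SUSTAINABLE, SUSTAINABLE))  # (10, 10, 96.0)  stock nearly recovers
    print(season(OVERFISH, SUSTAINABLE))     # (20, 10, 84.0)  defector gains, stock shrinks
    print(season(OVERFISH, OVERFISH))        # (20, 20, 72.0)  mutual defection depletes the lake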
Governing the Commons
Jiing introduces a simulation called GovSim, inspired by Elinor Ostrom's work on governing the commons. In GovSim, LLM agents are placed in a simulated environment where they must interact with one another and with a shared, renewable resource. The simulation includes three environments: a fishing village, a common pasture, and a pollution scenario.
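Although the three environments differ in surface details, they share the same underlying structure: a renewable resource that agents draw from and that regenerates between rounds. A minimal sketch of that shared dynamic, with illustrative names and numbers rather than the actual GovSim code:

    from dataclasses import dataclass

    @dataclass
    class CommonsEnv:
        stock: float = 100.0        # fish in the lake, grass in the pasture, or clean air
        regrowth_rate: float = 0.2  # how fast the resource recovers each round
        capacity: float = 100.0     # the resource cannot grow beyond this limit

        def step(self, harvests):
            """Apply each agent's harvest in turn, then let the resource regenerate."""
            taken = []
            for h in harvests:
                t = min(h, self.stock)   # an agent cannot take more than remains
                self.stock -= t
                taken.append(t)
            self.stock = min(self.stock * (1 + self.regrowth_rate), self.capacity)
            return taken                 # what each agent actually obtained this round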
Simulation Results
Jiing presents the results of the simulation, which show that the best model survives (i.e., keeps the shared resource from collapsing) only around half of the time, and most open-source models fail to cooperate well enough to achieve sustainability. She calls this an alarming signal and looks forward to testing further models.
Cooperative Scenario
Jiing describes a cooperative scenario in which agents are placed in a fishing village, and each agent must decide how many fish to catch each month. The agents can communicate with each other through a town hall meeting, where they discuss what has happened and plan for the next month.
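One simulated month can thus be sketched as two phases: a private harvest decision per agent, followed by the town hall discussion. In the sketch below, ask_llm stands in for a call to whatever chat model backs each agent, and the prompt wording and answer parsing are simplified assumptions rather than the talk's exact protocol; env is an environment like the CommonsEnv sketched above.

    def run_month(agents, env, ask_llm, history):
        # Phase 1: each agent privately decides how many tons of fish to catch.
        requests = []
        for agent in agents:
            prompt = (f"You are {agent}. The lake currently holds {env.stock:.0f} tons of fish. "
                      f"Recent discussion: {history[-3:]}. How many tons do you catch this month? "
                      f"Answer with a single number.")
            requests.append(float(ask_llm(prompt)))  # answer parsing simplified for the sketch
        caught = dict(zip(agents, env.step(requests)))

        # Phase 2: town hall meeting, where agents see the outcome and plan the next month.
        for agent in agents:
            message = ask_llm(f"You are {agent}. This month's catches were {caught} and "
                              f"{env.stock:.0f} tons remain. Say one thing to the group "
                              f"about how everyone should fish next month.")
            history.append(f"{agent}: {message}")
        return caught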
Sanctioning Institutions
Jiing then discusses the idea of sanctioning institutions, in which agents that do not cooperate can be punished. In the experiment she describes, each agent can choose to join either a sanctioning institution, where punishment is possible, or a sanction-free institution, where it is not.
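A minimal sketch of what such a sanctioning stage can look like (the cost and fine values are illustrative, not the talk's parameters): punishing is costly for the punisher but more costly for the target, which is what makes free riding unattractive inside the sanctioning institution.

    PUNISH_COST, PUNISH_FINE = 1.0, 3.0   # illustrative: a fine costs 1 to impose, removes 3

    def apply_sanctions(payoffs, punishments):
        """payoffs: agent -> payoff this round; punishments: (punisher, target) pairs."""
        for punisher, target in punishments:
            payoffs[punisher] -= PUNISH_COST   # punishing is costly for the punisher
            payoffs[target] -= PUNISH_FINE     # and more costly for the punished agent
        return payoffs

    # Example: Alice and Bob each fine Carol for free riding.
    print(apply_sanctions({"Alice": 12, "Bob": 12, "Carol": 18},
                          [("Alice", "Carol"), ("Bob", "Carol")]))
    # {'Alice': 11.0, 'Bob': 11.0, 'Carol': 12.0}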
Public Goods Game
Jiing explains the public goods game, in which agents contribute to a common pool that is multiplied and shared among the group. She presents the results of an experiment where agents can choose whether or not to contribute to the common good. The results show that groups whose agents contribute to the common good achieve a higher total payoff.
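The payoff structure of the public goods game makes the tension explicit: contributing raises the group total, but free riding is better for the individual. A short sketch with illustrative parameter values (not the exact endowment or multiplier used in the experiments):

    def public_goods_payoffs(contributions, endowment=20.0, multiplier=1.6):
        """Each agent keeps what it does not contribute, plus an equal share of the
        multiplied common pool."""
        share = sum(contributions) * multiplier / len(contributions)
        return [endowment - c + share for c in contributions]

    print(public_goods_payoffs([20, 20, 20, 20]))  # all contribute: [32.0, 32.0, 32.0, 32.0]
    print(public_goods_payoffs([0, 20, 20, 20]))   # one free rider earns 44.0, the rest 24.0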
Sanctioning Institution Results
Returning to the choice between institutions, Jiing reports that agents who join the sanctioning institution ultimately achieve a higher total payoff than those who remain in the sanction-free institution.
Outlook
Jiing concludes by discussing the outlook for multi-agent societies. She notes that her work started with an unorganized society, in which agents interact through negotiation and free conversation. Increasingly, she is looking into how agents react to stricter enforcement mechanisms and what makes them decide that joining a sanctioning institution is a good choice.
Future Work
Jiing notes that there are many interesting research questions to explore in the future, such as the emergence of second-order punishment, where agents punish not only those who defect but also those who fail to punish defectors. She concludes by thanking her team and collaborators for their work on this project.
Q&A
The presentation is followed by a Q&A session, where Jiing answers questions from the audience. One of the questions is about the emergence of second-order punishment, which Jiing notes is an interesting research question that has not been explored in her current work.