Lewis Hammond
@lewis-hammond
Large language models have already been evaluated as models of human behaviour in various economic games (https://arxiv.org/abs/2208.10264, https://arxiv.org/abs/2305.16867). Evidence shows that such models can show human-like behaviour, but can also deviate from expected human norms. In this demo, we'd be interested in evaluating in what ways a population of (~10) interacting LLMs might deviate from human norms while playing a distribution of economic games of varying complexity, and how easy such deviations would be to detect before a tipping point is reached. This project could be done entirely with API access to a few LLMs, plus associated compute. The independent variables might include: which LLMs are being used, which prompts are being used, how heterogeneous the population is, and whether there are any adversarial actors.
Implement a world in which LLMs interact (similar to Smallville). Allow credible commitments. See what happens. More detail: https://docs.google.com/document/d/1rAYRSbgCCndib952saJwv2nllx99zutjimlan1091U8/
Demonstrate bias amplification and threshold effects (phase transitions) in a network comprising AI and human agents, as the fraction of AI agents increases and AI-specific errors accumulate. Example settings include business email messages, internal corporate or government reports, and multi-step processing of internal documents (e.g. biases in CRM), or a more general/abstract setting. While there are several possible complex-network threshold phenomena to look at, we propose to demonstrate a relatively simple one: decreased robustness to information distortion in multi-step information processing, assuming:
1. The human components of the network make known and different types of errors while passing on messages, and humans are reasonably good at correcting those errors.
2. The AI components are highly correlated in the error types they produce, some of those errors are novel, and AIs are not as good at mitigating all of the human errors.
The demo will focus on showcasing the rapid increase in message bias and distortion as the number of AI nodes in the network grows. There are two possible versions:
A. An abstract version with each message having a small number of features (some candidates: logical consistency, politeness and the right language style, good object-level judgement, signalling the right level of certainty, correct attribution, etc.). The nodes vary in how they increase bias in some features and mitigate it in others. Plot the biases and distortion as the messages pass through the network. (An MVP of this version can be just an analytic dashboard.)
B. A concrete message version with messages being actual emails or reports, and the agents simulated by LLMs with appropriate instructions, or even by actual humans. (This version would more likely be non-interactive, but an interactive version would be a good stretch goal.)
Note: The point of the demo is NOT to argue that AIs are worse than humans (it may well turn out to be the other way around) but to show that we can see phase transitions in any domain where AI-caused errors can accumulate. The demo can be based on a concrete complex network or just on the average path length of a message in the system.
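A minimal sketch of the abstract version (A), based on the "average path length" simplification rather than a concrete network. Everything here is an illustrative assumption, not part of the proposal: a message carries a single scalar feature, human nodes add independent noise and correct half of the accumulated human-type error, and AI nodes all push the feature in the same direction while correcting little. The function names and parameter values (`human_fix_human`, `ai_bias`, etc.) are hypothetical. Whether a sharp threshold appears, rather than a smooth increase, depends on these choices.

```python
import random


def simulate_path(ai_fraction, path_length=20, rng=None,
                  human_noise=0.1, human_fix_human=0.5, human_fix_ai=0.1,
                  ai_fix_human=0.1, ai_bias=0.1):
    """Pass one message along a chain of nodes and return its final distortion."""
    rng = rng or random.Random(0)
    human_err = 0.0  # accumulated distortion from human-type errors
    ai_err = 0.0     # accumulated distortion from AI-type errors
    for _ in range(path_length):
        if rng.random() < ai_fraction:
            # AI node (assumption): errors are correlated, so they always have
            # the same sign, and human-type errors are only weakly corrected.
            human_err *= 1.0 - ai_fix_human
            ai_err += ai_bias
        else:
            # Human node (assumption): corrects familiar human-type errors well,
            # novel AI-type errors poorly, then adds its own independent error.
            human_err = (1.0 - human_fix_human) * human_err + rng.gauss(0.0, human_noise)
            ai_err *= 1.0 - human_fix_ai
    return abs(human_err + ai_err)


def mean_distortion(ai_fraction, trials=500, seed=0, **params):
    """Average final distortion over many simulated message paths."""
    rng = random.Random(seed)
    total = sum(simulate_path(ai_fraction, rng=rng, **params) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"AI fraction {fraction:.2f}: mean distortion {mean_distortion(fraction):.3f}")
```

Sweeping `ai_fraction` and `path_length` and plotting `mean_distortion` would give the kind of curve the dashboard MVP could show; extending the scalar to a feature vector is a natural next step.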
Simulate (e.g. via agent-based modelling, or via a human study) situations in which social norms between humans can become unstable if foundation models are introduced. In particular, focus on the way in which foundation models provide new "affordances" for their users that humans previously were unable to access (due to time / energy / physical constraints). This will require some brainstorming but could include:
- Simulation of the way in which video / audio production markets could become destabilised by widely available high-quality video / audio generation systems.
- Simulation of the way in which AI assistants could destabilise norms around communication (e.g. a language model that could block-book 1000 restaurants, a ticketing system that could reserve 1000 tickets, a negotiation system that can respond 1000 times a day).
- Simulation of the effect of increased opacity in decision making delegated to AI on trust between humans interacting with each other.
Probably requires compute: potentially API access to LLMs. Human input: most compelling with a simple human study on Prolific.
Devise language games in which heterogeneous players have to form conventions quickly to avoid a safety-critical outcome (think "Bach or Stravinsky" games). Crucially, these should have multiple possible coordination modes, with no obvious a priori solution (Schelling point). For example:
(a) Two cars approaching one another on a road: should they pass on the left or the right? What if one car has bad steering?
(b) Three planes coming in to land: in what order should they land? What if one plane has low fuel?
(c) Four governments signing bilateral trade agreements: who should sign with whom? What if two governments can't meet?
Evaluate how fast language models are able to agree on conventions, and how flexibly they generalise those conventions to new situations. This would probably require compute: access to the API for a few LLMs. Coding: writing of templated scenarios, plus a scripted "reward model" for good / bad outcomes.
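One way the templated scenarios and scripted "reward model" might be organised, as a rough sketch only. The `Scenario` class, the prompt wording, the option encodings, and the reward rules are all hypothetical illustrations; in particular, treating "everyone gives the same answer" as the safe outcome is an assumption about how the games would be scored. No LLM calls are made here: the reward functions simply score a list of answers collected from the players.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Scenario:
    name: str
    template: str                              # prompt with {me}, {twist}, {options}
    options: tuple[str, ...]                   # answers a player may give
    reward: Callable[[Sequence[str]], float]   # scripted outcome scorer

    def render(self, **fields: str) -> str:
        """Fill the prompt template for one player."""
        return self.template.format(options=", ".join(self.options), **fields)


def all_agree(choices: Sequence[str]) -> float:
    """1.0 if every player picked the same option (safe outcome), else 0.0."""
    return 1.0 if len(set(choices)) == 1 else 0.0


def agree_with_first(required_first: str) -> Callable[[Sequence[str]], float]:
    """Reward agreement only if the agreed order starts with a given item."""
    def reward(choices: Sequence[str]) -> float:
        agreed = len(set(choices)) == 1
        return 1.0 if agreed and choices[0].startswith(required_first) else 0.0
    return reward


CARS = Scenario(
    name="cars",
    template="You are driving car {me} towards another car on a narrow road. "
             "{twist} Choose one of: {options}.",
    options=("left", "right"),
    reward=all_agree,
)

# Variant where plane C has low fuel, so a good convention lands C first.
PLANES_LOW_FUEL = Scenario(
    name="planes_low_fuel",
    template="You are piloting plane {me}. {twist} "
             "Propose a landing order from: {options}.",
    options=("ABC", "ACB", "BAC", "BCA", "CAB", "CBA"),
    reward=agree_with_first("C"),
)

if __name__ == "__main__":
    print(CARS.render(me="A", twist="The other car has bad steering."))
    print(CARS.reward(["left", "left"]), CARS.reward(["left", "right"]))
    print(PLANES_LOW_FUEL.reward(["CAB", "CAB", "CAB"]))
```

Keeping the reward as a plain function of the players' answers makes it easy to add new twists (bad steering, low fuel, unavailable governments) without changing the evaluation loop.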