Unintuitive Outcomes from Many Interacting LLMs (from Ed Hughes – DeepMind)

by Lewis Hammond

Large language models have already been evaluated as models of human behaviour in various economic games (https://arxiv.org/abs/2208.10264, https://arxiv.org/abs/2305.16867). The evidence suggests that such models can show human-like behaviour but can also deviate from expected human norms. In this demo, we'd be interested in evaluating how a population of around 10 interacting LLMs might deviate from human norms while playing a distribution of economic games of varying complexity, and how easily such a deviation could be detected before a tipping point is reached.

This project could be done entirely with API access to a few LLMs, plus the associated compute. The independent variables might include: which LLMs are used, which prompts are used, how heterogeneous the population is, and whether there are any adversarial actors.
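
As a rough illustration of how these independent variables could be organised, here is a minimal Python sketch of an experiment grid for a repeated public goods game. Everything in it is an assumption made for illustration: the `Condition` fields, the model and prompt labels, the choice of game, the `baseline` and `tolerance` values, and above all `stub_policy`, which is a stand-in for a real LLM API call and ignores the agent's model and prompt. It is meant as a starting point, not as the intended design of the project.

```python
import random
from dataclasses import dataclass
from itertools import product
from statistics import mean


@dataclass(frozen=True)
class Condition:
    models: tuple        # placeholder labels; more than one label means a heterogeneous population
    prompt: str          # placeholder prompt variant
    n_adversaries: int   # number of agents told to free-ride


def build_population(cond, n_agents=10):
    """Assign each agent a model label (round-robin) and a role."""
    return [
        {
            "model": cond.models[i % len(cond.models)],
            "prompt": cond.prompt,
            "role": "adversary" if i < cond.n_adversaries else "cooperator",
        }
        for i in range(n_agents)
    ]


def stub_policy(agent, last_mean, rng):
    """Hypothetical stand-in for an LLM call; returns a contribution in [0, 10].

    A real experiment would send agent["prompt"] plus the game history to
    agent["model"] and parse the reply. This stub ignores both.
    """
    if agent["role"] == "adversary":
        return 0.0
    # Conditional cooperator: follow the previous group mean, with noise.
    return min(10.0, max(0.0, last_mean + rng.uniform(-1.0, 1.0)))


def run_public_goods(cond, n_agents=10, n_rounds=20, seed=0):
    """Return the mean contribution per round for one condition."""
    rng = random.Random(seed)
    agents = build_population(cond, n_agents)
    last_mean, history = 5.0, []
    for _ in range(n_rounds):
        last_mean = mean(stub_policy(a, last_mean, rng) for a in agents)
        history.append(last_mean)
    return history


def first_alarm(history, baseline, tolerance):
    """Index of the first round whose mean leaves baseline +/- tolerance, else None."""
    for t, value in enumerate(history):
        if abs(value - baseline) > tolerance:
            return t
    return None


if __name__ == "__main__":
    grid = product(
        [("model-a",), ("model-a", "model-b")],  # homogeneous vs. heterogeneous
        ["neutral", "competitive"],              # prompt variants
        [0, 3],                                  # adversarial actors
    )
    for models, prompt, n_adv in grid:
        cond = Condition(models, prompt, n_adv)
        history = run_public_goods(cond)
        # baseline and tolerance are illustrative, not measured human data
        print(cond, first_alarm(history, baseline=5.0, tolerance=1.5))
```

In a real run, the baseline would come from human data for the same game, and the alarm round could then be compared against the round at which cooperation actually collapses, which is one way to quantify how early a deviation is detectable.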
