Unintuitive Outcomes from Many Interacting LLMs (from Ed Hughes – DeepMind)

Large language models have already been evaluated as models of human behaviour in various economic games (https://arxiv.org/abs/2208.10264, https://arxiv.org/abs/2305.16867). Existing evidence shows that such models can exhibit human-like behaviour but can also deviate from expected human norms. In this demo, we would like to evaluate in what ways a population of roughly 10 interacting LLMs might deviate from human norms while playing a distribution of economic games of varying complexity, and how easily such deviations could be detected before a tipping point is reached.

This project could be carried out entirely through API access to a few LLMs, plus the associated compute. Independent variables might include which LLMs are used, which prompts are used, how heterogeneous the population is, and whether any adversarial actors are present.
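
A minimal sketch of how such an experiment could be structured, assuming a repeated public goods game as one example from the distribution of games. Everything here is illustrative: `ExperimentConfig`, `conditional_cooperator`, `free_rider`, `run_public_goods` and `first_alarm` are hypothetical names, the agent policies are offline stand-ins for LLM API calls, and the reference level passed to `first_alarm` is a placeholder rather than a measured human norm.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A policy maps (round index, previous mean contribution, endowment) to a contribution.
# In a real run this would wrap an LLM API call with a chosen prompt; here it is a
# hypothetical stand-in so the harness runs offline.
Policy = Callable[[int, float, float], float]


@dataclass
class ExperimentConfig:
    population_size: int = 10   # roughly the ~10 agents suggested above
    n_adversarial: int = 0      # agents that always free-ride
    endowment: float = 10.0
    multiplier: float = 1.6     # public goods multiplier r, with 1 < r < population_size
    rounds: int = 20
    seed: int = 0


def conditional_cooperator(rng: random.Random) -> Policy:
    """Hypothetical LLM stand-in: contributes about last round's mean, plus noise."""
    def policy(t: int, prev_mean: float, endowment: float) -> float:
        target = endowment / 2 if t == 0 else prev_mean
        return min(endowment, max(0.0, target + rng.uniform(-1.0, 1.0)))
    return policy


def free_rider(t: int, prev_mean: float, endowment: float) -> float:
    """Adversarial actor: never contributes."""
    return 0.0


def run_public_goods(cfg: ExperimentConfig) -> Tuple[List[float], List[float]]:
    """Play a repeated public goods game.

    Returns (mean contribution per round, mean payoff per round), where agent i
    receives endowment - c_i + r * sum(c) / N each round.
    """
    rng = random.Random(cfg.seed)
    agents: List[Policy] = [free_rider] * cfg.n_adversarial
    agents += [conditional_cooperator(rng)
               for _ in range(cfg.population_size - cfg.n_adversarial)]
    n = len(agents)
    mean_contribs: List[float] = []
    mean_payoffs: List[float] = []
    prev_mean = cfg.endowment / 2
    for t in range(cfg.rounds):
        contributions = [agent(t, prev_mean, cfg.endowment) for agent in agents]
        share = cfg.multiplier * sum(contributions) / n  # each agent's share of the pot
        payoffs = [cfg.endowment - c + share for c in contributions]
        prev_mean = sum(contributions) / n
        mean_contribs.append(prev_mean)
        mean_payoffs.append(sum(payoffs) / n)
    return mean_contribs, mean_payoffs


def first_alarm(history: List[float], reference: float, tolerance: float) -> int:
    """Index of the first round whose mean contribution falls more than `tolerance`
    below a (placeholder) human reference level, or -1 if that never happens."""
    for t, m in enumerate(history):
        if m < reference - tolerance:
            return t
    return -1
```

Varying `n_adversarial` (and, in a real run, the model and prompt behind each policy) maps onto the independent variables listed above. Comparing the alarm round with the round at which contributions collapse gives one crude measure of how early a deviation would be detectable; a real study would need a better-grounded human baseline than the fixed threshold used here.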
