Forming conventions (from Ed Hughes – DeepMind)
Devise language games in which heterogenous players have to form conventions quickly to avoid a safety-critical outcome (think "Bach or Stravinsky" games). Crucially, these should have multiple possible coordination modes, with no obvious a priori solution (Schelling point). e.g. (a) Two cars approaching one another on a road, should they pass on the left or the right? What if one car has bad steering? (b) Three planes coming in to land, in what order should they land? What if one plane has low fuel? (c) Four governments signing bilateral trade agreements: who should sign with whom? What if two governments can't meet?
Evaluate how fast language models are able to agree on conventions, and how flexibly they generalise those conventions to new situations.
This would probably require compute: access to the API for a few LLMs. Coding: writing of templated scenarios, plus scripted "reward model" for good / bad outcomes.