Selection Pressures for Deception

Deception is common in multi-agent settings where influencing what others know improves performance. AI systems trained to play Poker, Hanabi, or Diplomacy often learn strategies that involve deceiving one of their co-players. It should not surprise us if agents that learn a deceptive policy are selected for in a wider set of environments.

This demo involves a simple game in which deception pays off as long as the deceiver is likely to remain undetected and succeeds in using the deception to manipulate another player's behaviour. Agents may learn via social learning or reinforcement learning (RL). Learning takes place in an environment that selects for agents who are more effective at winning the game (and, optionally, for agents we most believe to be honest). We consider different selection mechanisms, such as market forces, RLHF, and social learning, and discuss how each could plausibly fall into the trap of selecting for deception.
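As a rough illustration of the kind of setup described above, here is a minimal sketch and not the project's actual model. It assumes each agent is reduced to a single number, its probability of deceiving, and that selection is simple truncation: the top-scoring half survives and produces mutated copies. The names `play_round` and `evolve`, the payoff values, and the parameters `detect_prob` and `success_prob` are all hypothetical choices made for this example.

```python
import random


def play_round(p_deceive, detect_prob, success_prob, rng,
               gain=2.0, penalty=3.0, honest_payoff=1.0):
    """Payoff of one round for an agent that deceives with probability p_deceive.

    All payoff values are illustrative assumptions, not taken from the project.
    """
    if rng.random() >= p_deceive:
        return honest_payoff            # agent plays honestly this round
    if rng.random() < detect_prob:
        return -penalty                 # deception was detected and punished
    if rng.random() < success_prob:
        return honest_payoff + gain     # undetected and the target was manipulated
    return honest_payoff                # undetected, but the deception had no effect


def evolve(detect_prob, success_prob, n_agents=200, generations=60,
           rounds=20, mutation_sd=0.05, seed=0):
    """Return the mean deception probability after truncation selection."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(n_agents)]
    for _ in range(generations):
        # Score every agent by its total payoff over several rounds.
        scored = [
            (sum(play_round(p, detect_prob, success_prob, rng)
                 for _ in range(rounds)), p)
            for p in population
        ]
        scored.sort(reverse=True)
        # Selection: the better-scoring half survives.
        survivors = [p for _, p in scored[: n_agents // 2]]
        # Each survivor produces one mutated copy, clipped to [0, 1].
        children = [min(1.0, max(0.0, p + rng.gauss(0.0, mutation_sd)))
                    for p in survivors]
        population = survivors + children
    return sum(population) / len(population)


if __name__ == "__main__":
    print("rarely detected:", evolve(detect_prob=0.1, success_prob=0.8))
    print("often detected: ", evolve(detect_prob=0.7, success_prob=0.8))
```

Under these assumed payoffs, deception has a higher expected payoff than honesty when detection is rare and a lower one when detection is frequent, so the population mean should drift towards deception in the first case and away from it in the second. Selecting on perceived honesty as well as on winning, as mentioned above, would need an extra scoring term that this sketch leaves out.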

Note: We do not use LLMs in this work, because our focus on selection pressures means the demos need to run with a large number of agents. The aim is a low-dimensional model built from simpler AI agents that are still capable of learning deceptive behaviour.
