AI Safety Ideas

Becoming more conflict-prone under learning dynamics

by Jesse Clifton

While RLHF'd models may be highly prosocial by default, what happens when they learn in repeated interactions with other agents? There are theoretical reasons to believe that learning dynamics should select for more conflict-prone agents in some settings, and there is some empirical evidence that language models become tougher bargainers when they're allowed to learn from experience (https://arxiv.org/abs/2305.10142).
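
One way to make the theoretical point concrete is a toy model. The sketch below is an illustrative assumption rather than anything taken from the linked paper: it runs discrete replicator dynamics on a textbook Hawk-Dove game, where "hawk" stands in for a conflict-prone policy and "dove" for a conciliatory one. The function name and the payoff parameters (`value`, `cost`, `baseline`) are hypothetical, and a population-level dynamic is only a loose analogy for an LM learning from experience.

```python
# Toy replicator dynamics for a Hawk-Dove game.
# Illustrative assumption only: not a model from the post or the cited paper.

def hawk_dove_replicator(hawk_share, value=2.0, cost=4.0, baseline=5.0, steps=200):
    """Return the trajectory of the hawk share under discrete replicator dynamics."""
    trajectory = [hawk_share]
    x = hawk_share
    for _ in range(steps):
        # Expected payoff of each strategy against a randomly drawn opponent.
        # Hawk vs hawk: (value - cost) / 2, hawk vs dove: value,
        # dove vs hawk: 0, dove vs dove: value / 2.
        hawk_fitness = baseline + x * (value - cost) / 2 + (1 - x) * value
        dove_fitness = baseline + (1 - x) * value / 2
        mean_fitness = x * hawk_fitness + (1 - x) * dove_fitness
        # A strategy that earns more than the population average gains share.
        x = x * hawk_fitness / mean_fitness
        trajectory.append(x)
    return trajectory


# Start from a population that is almost entirely conciliatory.
trajectory = hawk_dove_replicator(hawk_share=0.05)
start, end = trajectory[0], trajectory[-1]
print(f"hawk share: {start:.3f} -> {end:.3f}")
print(f"hawk-vs-hawk encounter rate: {start**2:.4f} -> {end**2:.4f}")
```

In this toy setup the hawk share rises from its initial value toward the interior rest point `value / cost`, so costly hawk-versus-hawk encounters become more frequent even though the population started out mostly conciliatory. Whether anything similar holds for LMs updating on interaction histories is exactly what the question below asks.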

Under what conditions do LMs become more conflict-prone when they're allowed to learn from experience?
