Becoming more conflict-prone under learning dynamics

While RLHF'd models may be highly prosocial by default, what happens when they learn in repeated interactions with other agents? There are theoretical reasons to believe that learning dynamics should select for more conflict-prone agents in some settings, and there is some empirical evidence that language models become tougher bargainers when they're allowed to learn from experience (https://arxiv.org/abs/2305.10142).

Under what conditions do language models become more conflict-prone when they're allowed to learn from experience?
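
As a toy illustration of the theoretical point (not the setup of the linked paper), here is a minimal sketch of discrete-time replicator dynamics in a Hawk-Dove game. It treats "Dove" as a stand-in for a prosocial default and "Hawk" as a stand-in for a conflict-prone policy. The function name `hawk_share_after`, the payoff values `V` and `C`, and the `baseline` offset are all assumptions chosen for illustration, and replicator dynamics is only a crude proxy for how an LM might learn from experience.

```python
def hawk_share_after(steps, x0=0.05, V=2.0, C=4.0, baseline=3.0):
    """Discrete-time replicator dynamics for a Hawk-Dove game.

    x is the population share playing Hawk (the conflict-prone strategy).
    V is the value of the contested resource, C the cost of a fight.
    baseline is a constant added to all payoffs to keep fitness positive.
    All parameter values are hypothetical.
    """
    x = x0
    for _ in range(steps):
        # Expected payoff of each strategy against the current population.
        f_hawk = baseline + x * (V - C) / 2 + (1 - x) * V
        f_dove = baseline + (1 - x) * V / 2
        f_mean = x * f_hawk + (1 - x) * f_dove
        # Strategies that beat the population average grow in share.
        x = x * f_hawk / f_mean
    return x


if __name__ == "__main__":
    for steps in (0, 1, 10, 50, 200):
        print(steps, hawk_share_after(steps))
```

Under these assumed payoffs, a population that starts out 95% Dove drifts toward a mixed state in which the Hawk share approaches V/C = 0.5. Aggression pays while it is rare, so it is selected for until fights become costly enough to offset the gain. Whether anything like this carries over to LMs learning in repeated interactions is exactly the open question.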
