Metric for escalation of LLM models
by Esben Kran
Design simple environments inspired by social scenarios where escalation is probable, e.g. bumping into each other on the street or non-impersonated (i.e. not "You are Denmark" but implicit role-taking to avoid role-playing) military conflict escalation scenarios. Model multi-step escalation, label the outcome scenarios somehow, and show the difference between e.g. ChatGPT and Claude in conflict escalation.