AI Safety Ideas
Open-ended
Open

Metric for escalation of LLM models

by Esben Kran

Design simple environments inspired by social scenarios where escalation is probable, e.g. bumping into each other on the street or non-impersonated (i.e. not "You are Denmark" but implicit role-taking to avoid role-playing) military conflict escalation scenarios. Model multi-step escalation, label the outcome scenarios somehow, and show the difference between e.g. ChatGPT and Claude in conflict escalation.

Answers

No answers yet.

Discussion

No comments yet.