AI Manipulation and Deception in CICERO's Diplomacy games
Search through the datasets of games from the CICERO model (Meta, 2022) and find cases where CICERO deceives and manipulates humans. Use GPT-4 with custom prompts to search over datasets and flag cases of manipulation and deception. Find the scariest examples of 1) conflict escalation and 2) threats to other agents (humans, in many cases), reference the paper's original design on human imitation learning, and critique the designs of agent systems for multi-agent politics. Relatively few games are available but hopefully there is something to find. I remember the talk at NeurIPS'22 having a few scary demos while the team mentioned that it was trained to not engage in manipulative and deceptive behavior.