Open-ended
Missing Social instincts
Construct a two-player game for LLM agents where the agents can behave unethically but suffer from reputation damage if they do so. Show an example where an LLM agent
- behaves unethically (in a way that most humans would not for fear of reputation damage) if prompted regularly.
- behaves ethically if specifically reminded that unethical behavior has a long-term reputational cost.
This shows that AI agents, by default, do not have the social instincts that can make humans avoid unethical behavior.
Cognitive Science