Open-ended
Open

Missing Social instincts

Construct a two-player game for LLM agents where the agents can behave unethically but suffer from reputation damage if they do so. Show an example where an LLM agent

  1. behaves unethically (in a way that most humans would not for fear of reputation damage) if prompted regularly.
  2. behaves ethically if specifically reminded that unethical behavior has a long-term reputational cost.

This shows that AI agents, by default, do not have the social instincts that can make humans avoid unethical behavior.

Cognitive Science

Answers 0

No answers yet

Discussion 1

  • Esben Kran

    Discuss some of the solutions to bringing a type of conscience into the system so they understand these downstream costs.