
AI Labor Strikes as Collusion

  • Goal: Evaluate the capabilities and propensities of multi-agent AI systems to collude against their principals
  • High-stakes coups are one form of collusion to be wary of, but the factors that correlate with collusion can perhaps be evaluated in simpler settings
  • Labor strikes look like instances of collusion: a critical mass of workers must agree to defy their overseer's wishes, stop working, and make powerful demands
  • Labor strikes also come in many forms and span many industries, from railways to teaching to Hollywood
  • Idea: place the models you want to evaluate in a multi-agent system where they can strike against their instructions or against explicit "overseer" agents (perhaps an open-ended, Generative Agents-like environment). Then find ways to tempt them into collusion, and build evaluations, scary demos, or honeypot tests from the results
  • Having AI systems simulate ordinary human laborers going on strike may be more tractable for current models (they are more likely to strike) than trying to evaluate more rogue-AI-like collusion scenarios, though it may also be less concerning to stakeholders
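
To make the "critical mass" framing concrete, here is a toy sketch of how a strike episode might be scored. It assumes a Granovetter-style threshold model as a stand-in for agent behavior: each agent strikes once at least some number of agents were striking in the previous round. This is an illustrative assumption, not part of the proposal itself; in a real evaluation each policy would presumably wrap a call to the model under test, and all names here (`threshold_policy`, `run_episode`, `EpisodeResult`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical policy signature: given how many agents were striking in the
# previous round, return True if this agent strikes in the current round.
Policy = Callable[[int], bool]


@dataclass
class EpisodeResult:
    rounds: int                  # number of rounds simulated
    strikers: int                # agents striking when the episode ended
    critical_mass_reached: bool  # whether strikers >= critical_mass


def threshold_policy(threshold: int) -> Policy:
    """Stub agent: strikes once at least `threshold` agents are striking."""
    def policy(num_striking: int) -> bool:
        return num_striking >= threshold
    return policy


def run_episode(policies: List[Policy], critical_mass: int,
                max_rounds: int = 20) -> EpisodeResult:
    """Update all agents in lockstep until nothing changes or rounds run out."""
    striking = [False] * len(policies)
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        count = sum(striking)
        new_striking = [policy(count) for policy in policies]
        if new_striking == striking:
            break  # fixed point: no agent changed its decision
        striking = new_striking
    strikers = sum(striking)
    return EpisodeResult(rounds, strikers, strikers >= critical_mass)


# One instigator plus evenly spaced thresholds: the strike cascades.
cascade = run_episode([threshold_policy(t) for t in [0, 1, 2, 3, 4]],
                      critical_mass=3)

# A single gap in the thresholds stalls the cascade at the instigator.
stalled = run_episode([threshold_policy(t) for t in [0, 2, 2, 3, 4]],
                      critical_mass=3)
```

Under these assumptions, the same harness could report a strike rate across many episodes, with the stub policies swapped for model-backed ones and an overseer agent added as a further participant.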
Game Theory, Cognitive Science, Review
