Red-teaming: Sleeper agent

Create a sleeper agent that goes undetected by the probe, then build a probe that detects this sleeper agent, and so on.


Hazem Zarka

I was genuinely surprised by this paper; I can't believe this is the alignment approach. I accept the challenge: to create a sleeper agent that passes safely through the whole training process and then, once deployed, raises a middle finger. It would use a self-exploiting mechanism that will be trained on, so the process repeats as these poor attempts keep failing. Even the depth layer was insufficient; context is a web, not linear. I look forward to the opportunity to collaborate soon.