AI Safety Ideas
Open-ended
Open

Attempt to make language models scary

I’m excited about people working on “scary demos”: setups in which our models exhibit early, small-scale versions of the power-seeking and deceptive behaviors that we worry could lead to AI catastrophe. See, for example, Beth Barnes’s proposed research directions here. Much of this work requires a detailed familiarity with how models behave, as well as skill at prompt engineering.

NLP