Find principled ways to decide which capabilities are dangerous

Determine what dangerous capabilities really are. Google DeepMind has shared its dangerous capabilities framework, but the capabilities in it do not seem to be derived from any first principles of risk and hazard, and different organizations use different definitions.

Think about the box that currently constrains our thinking about dangerous capabilities. Can we poke holes in it and find new perspectives on what is fundamental to risk assessment?

From these new principles, formulate a framework for dangerous capabilities. Be aware that this is a difficult project and that it is very easy to fall into the exact same trap; for example, avoid anthropomorphizing neural networks.

A related paper on AI risk frameworks comes from Khlaaf of Trail of Bits.
