AI Safety Ideas
Open-ended
Open

Conservative Concepts

by Maris Sala

This proposal is from the article "Alignment for Advanced Machine Learning Systems" where Taylor et al. propose 8 research areas organised around the question: "As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators?"


Many of the concerns raised by Russell (2014) and Bostrom (2014) center on cases where an AI system optimizes some objective, and, in doing so, finds a strange and undesirable edge case.

We want to be able to design systems that have “conservative” notions of the goals we give them, so they do not formally satisfy these goals by creating undesirable edge cases. For example, if we task an AI system with creating screwdrivers, by showing it 10,000 examples of screwdrivers and 10,000 examples of non-screwdrivers, we might want it to create a pretty average screwdriver as opposed to, say, an extremely tiny screwdriver, even though tiny screwdrivers may be cheaper and easier to produce.
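To make the screwdriver example concrete, here is a minimal toy sketch in Python. It is not the method proposed by Taylor et al., only one possible reading of "conservative": prefer candidates that look typical of the positive examples. The one-dimensional "length" feature, the data, the threshold classifier, the cost function, and the names `naive_choice` and `conservative_choice` are all assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical data: object lengths in cm (a single illustrative feature).
positives = [18.0, 19.0, 20.0, 21.0, 22.0, 20.5, 19.5]  # screwdrivers
negatives = [55.0, 60.0, 75.0]                           # non-screwdrivers

# Toy learned classifier: a threshold halfway between the two classes.
# It separates the training data, but also accepts absurdly short objects.
threshold = (max(positives) + min(negatives)) / 2

def accepts(length):
    return length < threshold

def cost(length):
    # Assumed cost model: shorter objects are cheaper to produce.
    return length

def naive_choice(candidates):
    """Cheapest candidate that formally satisfies the learned concept."""
    return min((c for c in candidates if accepts(c)), key=cost)

def conservative_choice(candidates, max_z=1.0):
    """Cheapest candidate within max_z standard deviations of the
    mean of the positive examples (a crude notion of 'typical')."""
    mu, sigma = mean(positives), stdev(positives)
    typical = [c for c in candidates if abs(c - mu) / sigma <= max_z]
    return min(typical, key=cost)

candidates = [1.0, 5.0, 15.0, 19.0, 20.0, 25.0, 50.0]
naive = naive_choice(candidates)                 # the tiny edge case
conservative = conservative_choice(candidates)   # a fairly average screwdriver
```

In this sketch the naive optimizer picks the 1 cm object because the classifier never saw short negatives, while the conservative rule stays close to the training examples. A real system would need a much richer notion of typicality than a z-score on one feature.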

Related work:

  • Inverse reinforcement learning (Ng and Russell 2000)
  • Generative adversarial modeling (Goodfellow et al., 2014)

Directions for future research are discussed in the source and include dimensionality reduction and generative models.

Adversarial Learning, Reinforcement Learning

Answers

No answers yet.

Discussion
