Conservative Concepts
by Maris Sala
This proposal comes from the article "Alignment for Advanced Machine Learning Systems", in which Taylor et al. (2016) propose eight research areas organised around the question: "As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators?"
Many of the concerns raised by Russell (2014) and Bostrom (2014) center on cases where an AI system optimizes some objective, and, in doing so, finds a strange and undesirable edge case.
We want to be able to design systems that have “conservative” notions of the goals we give them, so they do not formally satisfy these goals by creating undesirable edge cases. For example, if we task an AI system with creating screwdrivers, by showing it 10,000 examples of screwdrivers and 10,000 examples of non-screwdrivers, we might want it to create a pretty average screwdriver as opposed to, say, an extremely tiny screwdriver, even though tiny screwdrivers may be cheaper and easier to produce.
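As a rough illustration of the screwdriver example, the sketch below contrasts a pure cost minimizer with a "conservative" selector that only accepts candidates close to the positive training examples. Everything in it is an assumption made for illustration and not taken from Taylor et al.: the single "length" feature, the cost function, the Gaussian model of typical screwdrivers, the `max_z` threshold, and the simplification that every candidate already passes the learned classifier.

```python
import numpy as np

def fit_gaussian(positives):
    """Estimate the mean and standard deviation of the positive examples."""
    return float(np.mean(positives)), float(np.std(positives))

def cheapest(candidates, cost):
    """Unconstrained optimizer: return the lowest-cost candidate."""
    return min(candidates, key=cost)

def cheapest_conservative(candidates, cost, mean, std, max_z=1.0):
    """Return the lowest-cost candidate within max_z standard deviations
    of the mean of the positive examples."""
    typical = [c for c in candidates if abs(c - mean) <= max_z * std]
    if not typical:
        raise ValueError("no candidate is close enough to the training examples")
    return min(typical, key=cost)

# Hypothetical training data: lengths (in cm) of 10,000 example screwdrivers.
rng = np.random.default_rng(0)
lengths = rng.normal(loc=20.0, scale=2.0, size=10_000)
mean, std = fit_gaussian(lengths)

# Hypothetical design space and cost model: longer screwdrivers cost more.
candidates = np.linspace(0.1, 40.0, 400)

def cost(length):
    return length

naive = cheapest(candidates, cost)  # drifts to the tiny edge case
conservative = cheapest_conservative(candidates, cost, mean, std)

print(f"naive choice: {naive:.1f} cm, conservative choice: {conservative:.1f} cm")
```

The naive optimizer lands on the smallest candidate in the design space, while the conservative selector still minimizes cost but only among designs that resemble the training examples. A hard threshold on distance from the mean is only one of many ways to encode "typical"; it is used here because it is easy to read.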
Related work:
- Inverse reinforcement learning (Ng and Russell 2000)
- Generative adversarial modeling (Goodfellow et al. 2014)
Directions for future research are discussed in the source and include dimensionality reduction and generative models.