Impact Measures
by Maris Sala
This proposal is from the article "Alignment for Advanced Machine Learning Systems" where Taylor et al. propose 8 research areas organised around the question: "As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators?"
We would prefer a highly intelligent AI system to avoid creating large side effects that we did not intend in pursuit of its objectives, and also to notify us of any large impacts that might result from achieving its goal. For example, if we ask it to build a house for a homeless family, it should know implicitly that it should not destroy nearby houses for materials, which would be a large side effect. However, we cannot simply design it to avoid having large effects in general, since we would still like the system’s actions to have the desirable large follow-on effect of improving the family’s socioeconomic situation. For any specific task, we can specify ad-hoc cost functions for side effects like the destruction of nearby houses, but since we cannot always anticipate such costs in advance, we want a quantitative understanding of how to limit an AI system’s side effects in general (without also limiting its ability to have large positive intended impacts).
The goal of research towards a low-impact measure would be to develop a regularizer on the actions of an AI system that penalizes “unnecessary” large side effects (such as stripping materials from nearby houses) but not “intended” side effects (such as someone getting to live in the house).
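As a rough illustration of what such a regularizer could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration and none of it comes from Taylor et al.: the world is reduced to a dictionary of hypothetical numeric features, impact is measured as the total absolute change from a "do nothing" baseline, and the intended effects are listed by hand in a hypothetical `intended` set. Hand-listing the intended effects sidesteps the hard part of the research problem, which is telling intended and unnecessary effects apart without enumerating them in advance.

```python
from typing import Dict, FrozenSet

# Hypothetical world model: feature name -> numeric value.
State = Dict[str, float]


def impact_penalty(state: State, baseline: State,
                   intended: FrozenSet[str] = frozenset()) -> float:
    """Total absolute change from the baseline, ignoring intended features."""
    features = (set(state) | set(baseline)) - intended
    return sum(abs(state.get(f, 0.0) - baseline.get(f, 0.0)) for f in features)


def regularized_score(task_reward: float, state: State, baseline: State,
                      intended: FrozenSet[str], weight: float = 1.0) -> float:
    """Task reward minus a weighted penalty for unintended changes."""
    return task_reward - weight * impact_penalty(state, baseline, intended)


# Baseline: the world if the system had done nothing.
baseline = {"houses_built": 0.0, "nearby_houses_intact": 5.0}

# Two ways of completing the same task.
careful = {"houses_built": 1.0, "nearby_houses_intact": 5.0}
destructive = {"houses_built": 1.0, "nearby_houses_intact": 3.0}

# Building the house is the intended effect, so it is exempt from the penalty.
intended = frozenset({"houses_built"})

careful_score = regularized_score(10.0, careful, baseline, intended)
destructive_score = regularized_score(10.0, destructive, baseline, intended)
print(careful_score, destructive_score)
```

Under these assumptions both plans earn the same task reward, but the plan that strips two nearby houses scores lower because only its unintended change is penalized. The choice of baseline, the distance measure, and the weight are all open design questions in this sketch.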
For discussion of future research directions, see the source article, which mentions approaches such as causal counterfactuals (Pearl 2000).