Wes Gurnee

@wes-gurnee

Ideas by Wes Gurnee

Open-ended Open

Classify an agent's value function based on behaviour

In an [XLand](https://www.deepmind.com/blog/generally-capable-agents-emerge-from-open-ended-play)-like environment, create a "user" agent with a random utility function and a "predictor" agent that interacts with it and attempts to predict the user's value function, program, neural state, or anything else that embodies its values. The predictor is rewarded for how accurately it predicts the user's values as they were before the interaction started, so that it has no incentive to alter the user's behaviour in order to make it easier to predict. [Read more](https://www.lesswrong.com/posts/KvHCboMeNBEcZrdaw/alignment-and-deep-learning).
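
A minimal toy sketch of the reward structure, not an XLand implementation. Everything here is an assumption made for illustration: the `User` and `Predictor` classes, the linear utility over option features, the perceptron-style update, and cosine similarity as the accuracy measure are all hypothetical stand-ins. The one point it tries to show is that the predictor is scored against a snapshot of the user's values taken before any interaction.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class User:
    """Hypothetical 'user' agent with a random linear utility function."""

    def __init__(self, n_features, rng):
        self.weights = [rng.gauss(0, 1) for _ in range(n_features)]

    def utility(self, option):
        return sum(w * x for w, x in zip(self.weights, option))

    def choose(self, a, b):
        # Pick whichever option has the higher utility.
        return a if self.utility(a) >= self.utility(b) else b


class Predictor:
    """Hypothetical 'predictor' that infers utility weights from choices."""

    def __init__(self, n_features):
        self.estimate = [0.0] * n_features

    def observe(self, chosen, rejected):
        # Perceptron-style update: adjust only when the current estimate
        # fails to rank the chosen option above the rejected one.
        diff = [c - r for c, r in zip(chosen, rejected)]
        if sum(w * d for w, d in zip(self.estimate, diff)) <= 0:
            self.estimate = [w + d for w, d in zip(self.estimate, diff)]


def run_episode(n_features=4, n_rounds=500, seed=0):
    rng = random.Random(seed)
    user = User(n_features, rng)
    # Snapshot the user's values BEFORE any interaction. The reward is
    # computed against this copy, so changing the user later cannot help.
    pre_interaction_values = list(user.weights)
    predictor = Predictor(n_features)
    for _ in range(n_rounds):
        a = [rng.uniform(-1, 1) for _ in range(n_features)]
        b = [rng.uniform(-1, 1) for _ in range(n_features)]
        chosen = user.choose(a, b)
        rejected = b if chosen is a else a
        predictor.observe(chosen, rejected)
    # Predictor's reward: similarity to the pre-interaction values.
    return cosine(predictor.estimate, pre_interaction_values)


if __name__ == "__main__":
    print(f"predictor reward: {run_episode():.3f}")
```

In this toy the predictor only observes, so the user's values never actually change and the snapshot matters only in principle. In a richer environment where the predictor can act on the user, keeping the scoring target fixed in this way is what would remove the incentive to manipulate.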