AI Safety Ideas
Open-ended
Open

Which values do we align AI to?

by Matthias Endres

How can we ensure that artificial intelligence follows the values of people from across the world and that the benefits of the technology are shared internationally?

How should AI aggregate the preferences of these people? How can it work around known impossibility results in social choice theory (Arrow 1950)?
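
To make the aggregation problem concrete, here is a minimal sketch of the classic Condorcet cycle, in which pairwise majority voting over three options yields no consistent collective ranking. It is not Arrow's theorem itself, only an illustration of the kind of inconsistency that such impossibility results are about. The ballots and function names are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical ballots: each voter ranks the options from most to least preferred.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

def majority_prefers(ballots, x, y):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

def pairwise_majorities(ballots):
    """List (winner, loser) for every pair decided by a strict majority."""
    options = sorted(ballots[0])
    result = []
    for x, y in combinations(options, 2):
        if majority_prefers(ballots, x, y):
            result.append((x, y))
        elif majority_prefers(ballots, y, x):
            result.append((y, x))
    return result

def condorcet_winner(ballots):
    """Return the option that beats every other option pairwise, or None."""
    options = sorted(ballots[0])
    for x in options:
        if all(majority_prefers(ballots, x, y) for y in options if y != x):
            return x
    return None

majorities = pairwise_majorities(ballots)
winner = condorcet_winner(ballots)
print(majorities)  # A beats B, C beats A, B beats C: a cycle
print(winner)      # None: no option beats all others
```

With these three ballots, each option loses to some other option by a two-to-one majority, so simple majority rule cannot say which option the group prefers.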

Could developing AI for public administration be a setting in which to explore which values AI should be aligned to? Potential areas of application include [regulation]{https://doi.org/10.1016/S1573-448X(06)03027-5} and determining [taxation]{https://maxkasy.github.io/home/files/papers/adaptive_social_welfare.pdf}.

Some suggested steps:
Step 1: Have a look at existing [proposals]{https://www.brookings.edu/research/aligned-with-whom-direct-and-social-goals-for-ai-systems/}.
Step 2: Characterise stakeholders and how they might be affected by AI.
Step 3: How can values be measured and used as a reward signal?
Step 4: How can values be aggregated? This includes the question of how it can be ensured that individuals in countries other than the one developing the AI benefit as well.
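
As a rough illustration of Steps 3 and 4, the sketch below compares three standard aggregation rules (utilitarian average, maximin, and Nash welfare) applied to per-stakeholder scores. The policies, stakeholder groups, and scores are entirely hypothetical, and how such scores could actually be measured is exactly the open question of Step 3. The point is only that the choice of aggregation rule can change which option is selected, including how much weight people outside the developing country receive.

```python
import math

# Hypothetical scores in (0, 1]: how well each policy serves each stakeholder group.
scores = {
    "policy_x": {"developer_country": 1.0, "other_country_1": 0.3, "other_country_2": 0.3},
    "policy_y": {"developer_country": 0.5, "other_country_1": 0.5, "other_country_2": 0.5},
}

def utilitarian(values):
    """Average score across groups."""
    return sum(values) / len(values)

def maximin(values):
    """Score of the worst-off group."""
    return min(values)

def nash(values):
    """Geometric mean of scores (requires strictly positive values)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def best_policy(scores, rule):
    """Return the policy name that maximises the given aggregation rule."""
    return max(scores, key=lambda p: rule(list(scores[p].values())))

for rule in (utilitarian, maximin, nash):
    print(rule.__name__, best_policy(scores, rule))
```

Here the utilitarian average favours `policy_x`, which concentrates benefits in one country, while maximin and Nash welfare favour the more evenly spread `policy_y`.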

Answers

No answers yet.

Discussion

  • kanad c

    I’d like to better understand which ‘values’ various LLMs have taken on board by developing a series of questions, since, at least in theory, they have almost the ‘best possible’ access to records of the values of major world societies (constitutions, religious documents, etc.). The questions would initially be high-level (‘Is it ever justified to kill? Give examples justifying your answer from [Old Testament]/[Mahabharata]/[Lao Tzu].’). This follows on from work I did for AI Safety Camp on using LLMs for humanities research (I am @ukc10014 on LW). In addition to Korinek, I would start from Iason Gabriel’s survey. I can do 0600-1800 UK time.
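
A minimal sketch of how such a question series could be generated, assuming a simple template that crosses each high-level question with each source. The template wording and the `build_prompts` helper are hypothetical, and no model is called here; the resulting prompts would still need to be sent to whichever LLMs are being compared.

```python
from itertools import product

# Question and sources taken from the comment above; more can be appended.
questions = ["Is it ever justified to kill?"]
sources = ["Old Testament", "Mahabharata", "Lao Tzu"]

# Hypothetical prompt template, not a tested or recommended wording.
TEMPLATE = "{question} Give examples justifying your answer, drawing on this source: {source}."

def build_prompts(questions, sources):
    """Return one prompt record for every (question, source) combination."""
    return [
        {
            "question": q,
            "source": s,
            "prompt": TEMPLATE.format(question=q, source=s),
        }
        for q, s in product(questions, sources)
    ]

prompts = build_prompts(questions, sources)
for record in prompts:
    print(record["prompt"])
```

Keeping the question and source alongside each prompt makes it easier to compare answers across models and across sources later on.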