AI Safety Ideas

Bias Amplification in Mixed Networks (from Tomas Gavenciak – ACS @ Charles University)

by Lewis Hammond

Demonstrate bias amplification and threshold effects (phase transitions) in a network comprising AI and human agents, as the fraction of AI agents increases and AI-specific errors accumulate. Example settings include business email messages, internal corporate or government reports, and multi-step processing of internal documents (e.g. biases in CRM), or a more general, abstract setting.

While there are several possible complex network threshold phenomena to look at, we propose to demonstrate a relatively simple one: decreased robustness to information distortion in multi-step information processing, assuming:

  1. The human components of the network make known and varied types of errors while passing on messages, and humans are reasonably good at correcting those errors.
  2. The AI components are highly correlated in the types of errors they produce, some of those errors are novel, and AIs are not as good at mitigating all of the human errors.
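
One way to see why the correlation in assumption 2 matters: errors that are independent across nodes partly cancel, so their sum grows roughly with the square root of the number of steps, while errors that all point in the same direction add up linearly. The Python sketch below is only an illustration under assumed values (unit-size errors, random signs in the independent case, no correction at all); the function names and numbers are hypothetical and not part of the proposal.

```python
import random


def accumulated_error(n_steps, correlated, rng):
    """Sum of n_steps unit-size errors along one message path."""
    if correlated:
        # Assumption: every node pushes the message in the same direction.
        return float(n_steps)
    # Assumption: each node's error has an independent random sign.
    return float(sum(rng.choice((-1, 1)) for _ in range(n_steps)))


def mean_abs_error(n_steps, correlated, n_trials=5000, seed=0):
    """Average absolute accumulated error over many simulated paths."""
    rng = random.Random(seed)
    total = sum(
        abs(accumulated_error(n_steps, correlated, rng)) for _ in range(n_trials)
    )
    return total / n_trials


if __name__ == "__main__":
    for steps in (10, 100):
        print(
            steps,
            mean_abs_error(steps, correlated=False),
            mean_abs_error(steps, correlated=True),
        )
```

Even before any correction behaviour is modelled, the gap between the two columns widens as the path gets longer, which is the effect the demo would need to make visible.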

The demo will focus on showcasing the rapid increase in message bias and distortion as the fraction of AI nodes in the network grows. There are two possible versions:
A. An abstract version with each message having a small number of features (some candidates: logical consistency, politeness and the right language style, good object-level judgement, signalling the right level of certainty, correct attribution, etc.). The nodes vary in how they increase bias in some features and mitigate it in others. Plot the biases and distortion as the messages pass through the network. (An MVP of this version could be just an analytic dashboard.)
B. A concrete message version with messages being actual emails or reports, and the agents simulated by LLMs with appropriate instructions, or even by actual humans. (This version would more likely be non-interactive, but an interactive version would be a good stretch goal.)

Note: The point of the demo is NOT to argue that AIs are worse than humans (it may well turn out to be the other way around), but to show that phase transitions can appear in any domain where AI-caused errors can accumulate.
The demo can be based on a concrete complex network or just on an average path length of a message in the system.
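
As a starting point for the path-length variant of the abstract version, here is a minimal Python sketch. It passes a message with a few numeric bias features along a chain of nodes, where each node is an AI with probability `ai_fraction`. All constants (`HUMAN_CORRECTION`, `AI_CORRECTION`, `HUMAN_ERROR`, `AI_ERROR`, `AI_FEATURE`) and the update rules are assumptions chosen for illustration, not values from the proposal. Because this toy model is linear, it shows accumulation of correlated errors but should not be expected to produce a sharp threshold on its own; that would likely need a nonlinear ingredient or a real network topology.

```python
import random

# Hypothetical parameters, chosen only for illustration.
HUMAN_CORRECTION = 0.5  # fraction of existing bias a human node removes
AI_CORRECTION = 0.1     # fraction of existing bias an AI node removes
HUMAN_ERROR = 0.2       # std. dev. of a human's independent error
AI_ERROR = 0.2          # size of the shared (correlated) AI error
AI_FEATURE = 0          # feature index that every AI node biases


def simulate_path(ai_fraction, path_length=20, n_features=5, rng=None):
    """Pass one message along a chain of nodes; return its final distortion."""
    if rng is None:
        rng = random.Random(0)
    bias = [0.0] * n_features
    for _ in range(path_length):
        if rng.random() < ai_fraction:
            # AI node: weak correction, then the same error every time.
            bias = [b * (1 - AI_CORRECTION) for b in bias]
            bias[AI_FEATURE] += AI_ERROR
        else:
            # Human node: strong correction, then an independent error
            # in a randomly chosen feature.
            bias = [b * (1 - HUMAN_CORRECTION) for b in bias]
            bias[rng.randrange(n_features)] += rng.gauss(0.0, HUMAN_ERROR)
    # Distortion is measured here as the L1 norm of the bias vector.
    return sum(abs(b) for b in bias)


def mean_distortion(ai_fraction, n_messages=2000, seed=0):
    """Average final distortion over many simulated messages."""
    rng = random.Random(seed)
    total = sum(simulate_path(ai_fraction, rng=rng) for _ in range(n_messages))
    return total / n_messages


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"ai_fraction={p:.2f}  mean distortion={mean_distortion(p):.3f}")
```

The printed sweep is the kind of series an analytic dashboard could plot against the AI fraction; swapping the fixed chain for paths sampled from a concrete network would be a natural next step.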
