AI Safety Ideas

Verifying AI Agents against Sources of Truth

by Sev Geraskin

As agentic AI implementations proliferate through public and private organizations, models gain access to more sources of truth. Each additional communication channel gives the models another opportunity to produce responses that do not align with those sources.

Moreover, models might become sources of truth themselves, generate synthetic data in place of ground-truth data, or overwrite sources of truth, so that what an organization treats as “truth” increasingly becomes a model's approximation of it.

During the session, we could demonstrate how an agent connected to a source of truth can approximate that source, diverge from it via hallucination, or modify it. We could then develop mitigation strategies, such as an observability solution that bypasses the agent, connecting to and monitoring organizational sources of truth independently of the model, building guardrails around model access, and extending the concept of least privilege to models.
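As a starting point for the demonstration, here is a minimal sketch of two of the mitigations above: checking an agent's claims directly against the source of truth without going through the agent, and restricting the agent's access under least privilege. Everything in it is an assumption made for illustration: the in-memory dictionary stands in for a real organizational data store, and the names `ScopedStore`, `verify_claim`, and `fingerprint` are hypothetical, not part of any existing tool.

```python
import copy
import hashlib
import json


def fingerprint(store):
    """Return a stable hash of the store so a monitor can detect changes."""
    payload = json.dumps(store, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify_claim(store, record_id, field, claimed_value):
    """Compare an agent's claim with the store itself, bypassing the agent."""
    record = store.get(record_id)
    if record is None or field not in record:
        return "unverifiable"
    return "match" if record[field] == claimed_value else "divergent"


class ScopedStore:
    """Hypothetical least-privilege wrapper handed to the agent."""

    def __init__(self, store, allowed):
        self._store = store
        self._allowed = frozenset(allowed)

    def _require(self, operation):
        if operation not in self._allowed:
            raise PermissionError(f"agent is not allowed to {operation}")

    def read(self, record_id):
        self._require("read")
        # Return a copy so the agent cannot mutate the record in place.
        return copy.deepcopy(self._store[record_id])

    def write(self, record_id, field, value):
        self._require("write")
        self._store[record_id][field] = value


if __name__ == "__main__":
    # Assumed example data; a real source of truth would be a database or API.
    source = {"invoice_1042": {"amount": 250, "status": "paid"}}
    baseline = fingerprint(source)
    agent_access = ScopedStore(source, allowed={"read"})

    # The monitor checks the agent's stated value against the source directly.
    print(verify_claim(source, "invoice_1042", "amount", 300))

    # A write attempt by a read-only agent is rejected.
    try:
        agent_access.write("invoice_1042", "amount", 0)
    except PermissionError as error:
        print(error)

    # The monitor confirms the source has not changed since the baseline.
    print(fingerprint(source) == baseline)
```

The monitor only ever reads from the store and never trusts the agent's own report, which is the point of keeping observability independent of the model. In a real setting the fingerprint would more likely be replaced by the data store's own audit log or change feed.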

Discussion

  • Sev Geraskin

    I'm looking for a team to collaborate with on Friday. Please reach out to Sev Geraskin/past5.com/past5#5732 on Discord. I am in the Alignment Jams channel.