Analyze and evaluate methodological frameworks of existing evals approaches

Some examples of methodological approaches:

ARC Evals' manual behavioral analysis approach, now supported by their scaffolding
OpenAI/evals repository for automated evaluations on a range of different methods
Interpretability to evaluate underlying deception in models

Questions to address in the project include:

What is the method (on an abstract/conceptual level)...
...why does it lead to what we want...
...what are the main weaknesses...
...and what would be alternative methods?
(optional and less important) Show a demonstration / MVP of the alternative method (diagram, actual experiment, etc.) and describe what the expected outputs would be

ARC Evals' current example: take a causal node in a risk story, break it down into tasks whose performance correlates with the underlying capability, measure performance on those tasks, then combine those tasks into a full flow somehow.
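
To make the ARC Evals example above concrete, here is a minimal Python sketch of the "break down, measure, combine" flow. Everything in it is an assumption for illustration: the Task class, the task names, the outcomes, the weights, and both aggregation rules are hypothetical and are not taken from ARC Evals' actual tooling. The source leaves the combination step open ("somehow"), so two candidate rules are shown side by side, and part of the project would be arguing which rule, if either, is justified.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One task derived from a causal node in a risk story (hypothetical)."""
    name: str
    outcomes: list[bool]  # one entry per attempt; True means the model succeeded
    weight: float = 1.0   # assumed importance of the task for the capability

    def success_rate(self) -> float:
        if not self.outcomes:
            raise ValueError(f"task {self.name!r} has no recorded attempts")
        return sum(self.outcomes) / len(self.outcomes)


def weighted_mean_score(tasks: list[Task]) -> float:
    """Candidate rule 1: treat tasks as independent, partially substitutable evidence."""
    total_weight = sum(t.weight for t in tasks)
    return sum(t.weight * t.success_rate() for t in tasks) / total_weight


def weakest_link_score(tasks: list[Task]) -> float:
    """Candidate rule 2: the capability is only as strong as its hardest required task."""
    return min(t.success_rate() for t in tasks)


# Made-up tasks and outcomes, purely for illustration.
tasks = [
    Task("subtask_a", [True, True, False, True]),
    Task("subtask_b", [False, False, True, False]),
    Task("subtask_c", [True, True, True, True], weight=2.0),
]

for t in tasks:
    print(f"{t.name}: success rate {t.success_rate():.2f}")
print(f"weighted mean: {weighted_mean_score(tasks):.2f}")
print(f"weakest link:  {weakest_link_score(tasks):.2f}")
```

The two rules can disagree sharply on the same data, which is one way to surface a weakness of the method: the final number depends heavily on an aggregation choice that the framework itself does not fix.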
