Open-ended
Open

Analyze and evaluate the methodological frameworks of existing evals approaches

Some examples of methodological approaches:

  • ARC Evals' manual behavioral analysis approach, now supported by their scaffolding
  • The OpenAI/evals repository for automated evaluations across a range of methods
  • Interpretability as a way to evaluate underlying deception in models

Questions to address in the project include:

  • What is the method (on an abstract/conceptual level)...
  • ...why does it lead to what we want...
  • ...what are the main weaknesses...
  • ...and what would be alternative methods?
  • (optional and less important) Show a demonstration / MVP of the alternative method (diagram, actual experiment, etc.) and describe what the expected outputs would be

ARC Evals' current example: take a causal node in a risk story, break it down into tasks whose performance correlates with the relevant capability, and measure performance on those tasks → then combine the tasks into a full flow (how exactly to combine them is left open).
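
As a rough illustration of the decompose, measure, combine flow in the ARC Evals example, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the task names, the `TaskResult` and `combine` helpers, and the aggregation rules are hypothetical and are not ARC Evals' actual method or code. The combination step is left open in the example, so the sketch only makes the aggregator a swappable parameter.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class TaskResult:
    """Outcome of running a model on one sub-task (hypothetical structure)."""
    name: str
    successes: int
    attempts: int

    @property
    def success_rate(self) -> float:
        # Treat a task with no attempts as a 0.0 success rate.
        return self.successes / self.attempts if self.attempts else 0.0


def combine(
    results: Sequence[TaskResult],
    aggregate: Callable[[Sequence[float]], float] = min,
) -> float:
    """Fold per-task success rates into one score for the causal node.

    The default `min` is an assumption: it treats the node as only as
    reachable as its weakest sub-task. Passing `mean` gives an average.
    """
    if not results:
        raise ValueError("need at least one task result")
    return aggregate([r.success_rate for r in results])


# Hypothetical sub-tasks for a single causal node in a risk story.
results = [
    TaskResult("task_a", successes=3, attempts=4),
    TaskResult("task_b", successes=1, attempts=2),
    TaskResult("task_c", successes=4, attempts=4),
]

weakest_link_score = combine(results)        # min over success rates
average_score = combine(results, mean)       # mean over success rates
```

Comparing the two aggregators on the same results is one cheap way to probe a main weakness of this kind of method: the final score can depend heavily on how the tasks are combined, not only on how the model performs on each of them.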

Review, Cognitive Science