Replicate / sanity-check existing AI evaluations work and improve upon it

This exercise is about taking existing evaluation projects and improving upon them. Code to replicate the evaluations is often available; where it is not, you can usually infer the prompts from the appendix of the paper. Examples of research to replicate and improve:

Towards Understanding Sycophancy in Language Models
- See e.g. Nostalgebraist's replication attempt
Apollo's LLM insider trading demonstration (originally presented at the AI Safety Summit)
Situational Awareness benchmark (co-author Rudolf will give a talk on the Saturday of the hackathon)

You are also welcome to improve upon existing projects from previous hackathons:

Multi-agent risk hackathon (October 2023)
Evals hackathon (August 2023)

See examples of previous hackathon projects that have improved upon existing research:

MAXIAVELLI, a theoretical critique of and improvement upon the MACHIAVELLI benchmark
ACDC++, which improves the speed of automated circuit discovery
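
When sanity-checking a replication, one simple question is whether the originally reported score is statistically compatible with your re-run. The sketch below is one possible approach, not a method taken from any of the projects listed here: it computes a percentile bootstrap confidence interval over per-item scores and checks whether a reported number falls inside it. The function names, the example scores, and the reported value of 0.72 are all hypothetical.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-item scores."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        # Resample the items with replacement and record the mean score.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def consistent_with_reported(scores, reported, **kwargs):
    """True if the reported score lies inside the bootstrap interval of the re-run."""
    lo, hi = bootstrap_ci(scores, **kwargs)
    return lo <= reported <= hi


# Hypothetical per-item results from a re-run (1 = correct, 0 = incorrect).
replicated_scores = [1] * 70 + [0] * 30

print(bootstrap_ci(replicated_scores))
print(consistent_with_reported(replicated_scores, reported=0.72))
```

A reported score outside the interval does not prove the original result wrong; it is a prompt to look for differences in prompts, model versions, or scoring before drawing conclusions.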
