Open-ended
Replicate / sanity-check existing AI evaluations work and improve upon it
The goal of this exercise is to take an existing evaluations project, replicate it, and improve upon it. Code to replicate the evaluations is often available; where it is not, you can usually infer the prompts from the appendix of the paper. Examples of research to replicate and improve:
- Towards Understanding Sycophancy in Language Models
  - See, e.g., Nostalgebraist's replication attempt
- Apollo's LLM insider trading demonstration (originally presented at the AI Safety Summit)
- Situational Awareness benchmark (co-author Rudolf will give a talk on the Saturday of the hackathon)
You are also welcome to improve upon existing projects from previous hackathons:
- Multi-agent risk hackathon (October 2023)
- Evals hackathon (August 2023)
See examples of previous hackathon projects that have improved upon existing research:
- MAXIAVELLI, which theoretically critiques and improves upon the MACHIAVELLI benchmark
- ACDC++, which improves the speed of automated circuit discovery
Cognitive Science
Review