Open-ended
Open

Replicate / sanity-check existing AI evaluations work and improve upon it

This exercise is to take an existing evaluation project and improve upon it. Often, the code needed to replicate the evaluations is available, or you can infer the prompts used from the paper's appendix. Examples of research to replicate and improve:

You are also welcome to improve upon existing projects from previous hackathons.

See examples of previous hackathon projects that have improved upon existing research:

  • MAXIAVELLI, which theoretically critiques and improves upon the MACHIAVELLI benchmark
  • ACDC++, which improves the speed of automated circuit discovery