Investigate circuits: Compare an nL model to an (n+1)L model

Look for tasks that an n-layer (nL) model cannot do but an (n+1)L model can, then look for the circuit that accounts for the difference!

Proposal:

Build the infrastructure to do this: run the two models over a lot of text and look for tokens where their log probs differ by a large amount. (Maybe floor the log probs, e.g. at -5, which is the same as capping the per-token loss at 5, to avoid overfitting to cases where one network was incredibly wrong.)
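A minimal sketch of the comparison step, assuming you already have per-token log probs for the correct next token from both models on the same text. The function names (`clipped_logprob_gaps`, `top_gaps`) are hypothetical, and reading the floor as -5 on the log probs is an assumption about what the proposal intends. Getting the log probs out of the models is left out here.

```python
from typing import List, Sequence, Tuple


def clipped_logprob_gaps(
    logprobs_small: Sequence[float],
    logprobs_large: Sequence[float],
    floor: float = -5.0,  # assumed reading of "floor the log probs at 5"
) -> List[float]:
    """Per-token gap (large minus small) after flooring both log probs.

    Flooring means a token where either model is extremely wrong
    contributes at most |floor| to the gap.
    """
    if len(logprobs_small) != len(logprobs_large):
        raise ValueError("both models must be scored on the same tokens")
    return [
        max(lg, floor) - max(sm, floor)
        for sm, lg in zip(logprobs_small, logprobs_large)
    ]


def top_gaps(
    tokens: Sequence[str],
    logprobs_small: Sequence[float],
    logprobs_large: Sequence[float],
    k: int = 3,
    floor: float = -5.0,
) -> List[Tuple[int, str, float]]:
    """Return (position, token, gap) for the k tokens where the larger
    model beats the smaller one by the most, after flooring."""
    gaps = clipped_logprob_gaps(logprobs_small, logprobs_large, floor)
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return [(i, tokens[i], gaps[i]) for i in ranked[:k]]
```

Positions that come out on top are candidates for "the (n+1)L model can do this, the nL model cannot", and are where one might start looking for a circuit. Positions where both models are below the floor get a gap of zero and drop out of the ranking.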

Interpretability & Explainability
