Hypothesis
Investigate circuits: Compare an nL (n-layer) model to an (n+1)L model
Look for tasks that an nL model cannot do but an (n+1)L model can, then look for the circuit responsible!
Proposal:
- Build the infrastructure to do this: run both models over a lot of text and look for tokens with big log prob differences (maybe floor the log probs at e.g. -5, i.e. cap the per-token loss at 5, so the results are not dominated by the times one network was incredibly wrong)
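
A minimal sketch of the comparison step, assuming the per-token log probs of the correct next token have already been computed for both models over the same text. The function name `top_logprob_gaps`, its parameters, and the default floor of -5 are illustrative assumptions, not an existing tool:

```python
from typing import List, Tuple


def top_logprob_gaps(
    small_logprobs: List[float],
    large_logprobs: List[float],
    floor: float = -5.0,  # assumed value, taken from the "e.g. 5" in the note
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return the k positions where the larger model most outperforms the smaller.

    Both inputs hold the log prob each model assigned to the correct token at
    each position. Values below `floor` are raised to `floor` before comparing,
    so a single catastrophic prediction cannot dominate the ranking.
    """
    if len(small_logprobs) != len(large_logprobs):
        raise ValueError("both models must be scored on the same tokens")

    gaps = []
    for position, (small, large) in enumerate(zip(small_logprobs, large_logprobs)):
        small_floored = max(small, floor)
        large_floored = max(large, floor)
        # Positive gap: the (n+1)L model did better than the nL model here.
        gaps.append((position, large_floored - small_floored))

    # Largest advantage for the bigger model first.
    gaps.sort(key=lambda pair: pair[1], reverse=True)
    return gaps[:k]


if __name__ == "__main__":
    small = [-0.1, -12.0, -3.0, -0.5]  # hypothetical nL log probs
    large = [-0.2, -0.1, -0.5, -0.4]   # hypothetical (n+1)L log probs
    for position, gap in top_logprob_gaps(small, large, k=2):
        print(position, round(gap, 3))
```

The positions it returns are candidates for closer inspection: tokens the (n+1)L model predicts well and the nL model does not, which is where a circuit that needs the extra layer would be expected to show up.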
Interpretability & Explainability