Open-ended
Circuit investigation: Compare an n-layer (nL) model to an (n+1)-layer ((n+1)L) model on the same tasks
Look for tasks that an nL model cannot do but an (n+1)L model can, then look for the circuit that makes the difference!
Proposal:
- Build the infrastructure to do this - run two models over a lot of text and look for tokens with big log prob differences (maybe floor the log probs at eg -5, ie cap the per-token loss at 5, so the results aren't dominated by times that one network was incredibly wrong)
- Just take a bunch of text with interesting patterns and run the models over it, look for tokens they do really well on, and try to reverse engineer what’s going on - I expect there’s a lot of stuff in here!
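
A minimal sketch of the comparison step from the first bullet, assuming you already have, for each model, a list with the log prob it assigned to the correct next token at each position. The function names (`clamped_gaps`, `top_gaps`) and the default floor of -5 are hypothetical choices for illustration, and the floor is read here as a lower bound on the (negative) log probs:

```python
from typing import List, Sequence, Tuple


def clamped_gaps(
    small_lp: Sequence[float],
    large_lp: Sequence[float],
    floor: float = -5.0,  # assumption: log probs below this are treated as equal to it
) -> List[float]:
    """Per-token (larger model - smaller model) log prob gap after flooring both."""
    if len(small_lp) != len(large_lp):
        raise ValueError("both models must be scored on the same tokens")
    # Flooring stops a single catastrophically wrong prediction from
    # dominating the ranking of gaps.
    return [max(b, floor) - max(a, floor) for a, b in zip(small_lp, large_lp)]


def top_gaps(
    small_lp: Sequence[float],
    large_lp: Sequence[float],
    k: int = 10,
    floor: float = -5.0,
) -> List[Tuple[int, float]]:
    """Return (position, gap) for the k tokens where the larger model wins most."""
    gaps = clamped_gaps(small_lp, large_lp, floor)
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return [(i, gaps[i]) for i in order[:k]]


# Toy log probs, made up purely to show the call; not real model outputs.
small = [-0.1, -9.0, -2.0, -20.0]
large = [-0.2, -0.5, -1.9, -12.0]
for position, gap in top_gaps(small, large, k=2):
    print(position, round(gap, 3))
```

Positions with a large positive gap are candidates for things the (n+1)L model can do and the nL model cannot. Note that the last toy token gets a gap of zero after flooring, since both models are badly wrong there.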