Open-ended
Open

Investigate how 3-layer and 4-layer attention-only models differ from 2L

How do 3-layer and 4-layer attention-only models differ from 2L?

  • Compute composition scores (Q-, K- and V-composition) between heads in different layers.
  • Look for evidence of composition, e.g. one head’s output making up a large fraction of the norm of another head’s query, key or value vector.
  • Do the “PCA of logits on a fixed set of random tokens” technique and look for more kinks.
  • Can you associate these kinks with circuits?
  • Ablate a single head and run the model on a lot of text. Look at the change in performance. Find the most important heads. Do any heads matter a lot that are not induction heads?
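
As a starting point for the composition-score bullet, here is a minimal sketch of one common definition: the Frobenius norm of the product of two heads' matrices, divided by the product of their individual Frobenius norms. It assumes a column-vector convention in which W_OV = W_O W_V and W_QK = W_Q^T W_K. The helper names (`composition_score`, `head_matrices`, `composition_scores`) and the random weights are hypothetical stand-ins for a trained model's parameters, not taken from any particular library.

```python
import numpy as np


def composition_score(later, earlier):
    """||later @ earlier||_F / (||later||_F * ||earlier||_F), a value in [0, 1]."""
    product = later @ earlier
    # np.linalg.norm defaults to the Frobenius norm for 2-D arrays.
    return float(
        np.linalg.norm(product)
        / (np.linalg.norm(later) * np.linalg.norm(earlier))
    )


def head_matrices(rng, d_model, d_head):
    # Assumption: random weights standing in for one trained attention head.
    W_Q = rng.normal(size=(d_head, d_model))
    W_K = rng.normal(size=(d_head, d_model))
    W_V = rng.normal(size=(d_head, d_model))
    W_O = rng.normal(size=(d_model, d_head))
    # Assumed column-vector convention: W_QK = W_Q^T W_K, W_OV = W_O W_V.
    return {"QK": W_Q.T @ W_K, "OV": W_O @ W_V}


def composition_scores(early, late):
    """Q-, K- and V-composition of a later head with an earlier head."""
    return {
        "Q": composition_score(late["QK"].T, early["OV"]),
        "K": composition_score(late["QK"], early["OV"]),
        "V": composition_score(late["OV"], early["OV"]),
    }


rng = np.random.default_rng(0)
early = head_matrices(rng, d_model=16, d_head=4)
late = head_matrices(rng, d_model=16, d_head=4)
scores = composition_scores(early, late)
print(scores)
```

With real weights, one would compute these scores for every (earlier head, later head) pair across all layer pairs and compare the 3L and 4L score tables against the 2L one. Random matrices also give non-zero scores, so comparing against a random baseline of the same shapes is probably sensible.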
Interpretability & Explainability
