AI Safety Ideas

Investigate how 3-layer and 4-layer attention-only models differ from 2L

by Sabrina Zaki

How do 3-layer and 4-layer attention-only models differ from 2-layer (2L) ones?

  • Look at the composition scores between heads.
  • Look for evidence of composition, e.g. one head’s output accounting for a large fraction of the norm of another head’s query, key, or value vector.
  • Apply the “PCA of logits on a fixed set of random tokens” technique and look for more kinks.
  • Can you associate these kinks with circuits?
  • Ablate a single head and run the model on a lot of text. Look at the change in performance to find the most important heads. Do any heads that are not induction heads matter a lot?
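
As a possible starting point for the composition-score item above, here is a minimal sketch in Python with NumPy. It assumes the Frobenius-norm definition of a composition score, ||A B|| / (||A|| ||B||), and a column-vector weight convention in which W_Q, W_K, W_V have shape (d_head, d_model) and W_O has shape (d_model, d_head). The function names, shapes, and random weights are illustrative assumptions, not taken from any particular library or trained model.

```python
import numpy as np


def composition_score(reader, writer):
    """Return ||reader @ writer||_F / (||reader||_F * ||writer||_F).

    `writer` is the earlier head's OV matrix (what it writes to the residual
    stream); `reader` is the later head's matrix that reads from the stream.
    """
    denom = np.linalg.norm(reader) * np.linalg.norm(writer)
    if denom == 0.0:
        return 0.0
    return float(np.linalg.norm(reader @ writer) / denom)


def head_composition(W_Q2, W_K2, W_V2, W_O2, W_V1, W_O1):
    """Q-, K- and V-composition of a later head (2) with an earlier head (1).

    Assumed shapes: W_Q, W_K, W_V are (d_head, d_model); W_O is (d_model, d_head).
    """
    W_OV1 = W_O1 @ W_V1    # (d_model, d_model), earlier head's output map
    W_OV2 = W_O2 @ W_V2    # (d_model, d_model), later head's output map
    W_QK2 = W_Q2.T @ W_K2  # (d_model, d_model), score = x_q^T W_QK2 x_k
    return {
        "Q": composition_score(W_QK2.T, W_OV1),
        "K": composition_score(W_QK2, W_OV1),
        "V": composition_score(W_OV2, W_OV1),
    }


# Illustrative random weights standing in for two heads of a real model.
rng = np.random.default_rng(0)
d_model, d_head = 64, 16
W_Q2, W_K2, W_V2, W_V1 = (rng.normal(size=(d_head, d_model)) for _ in range(4))
W_O2, W_O1 = (rng.normal(size=(d_model, d_head)) for _ in range(2))

scores = head_composition(W_Q2, W_K2, W_V2, W_O2, W_V1, W_O1)
print(scores)
```

Each score lies between 0 and 1. Random matrices already give a nonzero score, so it is probably worth comparing scores from a trained model against a random baseline of the same shapes. In a 3L or 4L model the same function could be applied to every (earlier head, later head) pair across all layer pairs, not just layer 0 to layer 1.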
Interpretability & Explainability
