AI Safety Ideas
Open-ended
Open

Automate ways to find specific circuits

by Esben Kran

Circuits are groups of Transformer components, such as attention heads, that together let the model detect and use particular features of the text. Read more about Circuits and Transformer heads.

  • Automated ways to analyse attention patterns to find different kinds of heads:

      • Induction heads

      • Translation heads

      • Few-shot learning heads

      • The heads used in factual recall

      • The heads used in the IOI (indirect object identification) paper

  • Can you do a similar thing for neuron interpretation?
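
As one possible starting point for the first bullet, here is a minimal sketch of an automated check for induction heads. It assumes you have already extracted per-head attention patterns from a model on a prompt made of a random token sequence repeated twice; an induction-like head should then attend from each token in the second copy to the token that followed its earlier occurrence. The function names `induction_scores` and `flag_heads`, the 0.5 threshold, and the synthetic attention patterns are all illustrative assumptions, not part of any existing library or of the papers mentioned above.

```python
import numpy as np


def induction_scores(attn: np.ndarray, seq_len: int) -> np.ndarray:
    """Score each head by how strongly it shows the induction pattern.

    attn: array of shape [n_heads, 2 * seq_len, 2 * seq_len] holding each
    head's attention pattern (rows are query positions, columns are key
    positions) on a prompt made of seq_len random tokens repeated twice.
    """
    n_heads, total, _ = attn.shape
    assert total == 2 * seq_len, "prompt must be the sequence repeated twice"
    queries = np.arange(seq_len, total)  # positions in the second copy
    keys = queries - seq_len + 1         # token after the earlier occurrence
    # Average attention mass placed on the "induction" key for each head.
    return attn[:, queries, keys].mean(axis=1)


def flag_heads(scores: np.ndarray, threshold: float = 0.5) -> list:
    """Return indices of heads whose score exceeds a (hypothetical) threshold."""
    return [int(h) for h in np.flatnonzero(scores > threshold)]


# Synthetic attention patterns standing in for real model activations.
seq_len = 8
total = 2 * seq_len

# Head 0: uniform causal attention over all earlier positions.
causal = np.tril(np.ones((total, total)))
uniform_head = causal / causal.sum(axis=1, keepdims=True)

# Head 1: idealised induction head. First copy attends to position 0,
# second copy attends to the token after the earlier occurrence.
induction_head = np.zeros((total, total))
induction_head[np.arange(seq_len), 0] = 1.0
q = np.arange(seq_len, total)
induction_head[q, q - seq_len + 1] = 1.0

# Head 2: attends only to the current token.
self_head = np.eye(total)

attn = np.stack([uniform_head, induction_head, self_head])
scores = induction_scores(attn, seq_len)
flagged = flag_heads(scores)
print(scores, flagged)
```

On these synthetic patterns only head 1 is flagged. With a real model, the same scoring could be run over every layer and head, and analogous position-offset or token-based scores could be written for the other head types in the list, though what the right signature is for each type is an open part of the project.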
