AI Safety Ideas

Understand the architecture and training dynamics of Transformers

by Brian Muhia

A proper mechanistic explanation of model behavior starts with understanding each component that goes into the model and into training it. The tutorial linked below, from the 2022 Deep Learning Indaba, walks through the architectural components of a Transformer and its training process in JAX, with exercises.

https://github.com/deep-learning-indaba/indaba-pracs-2022/blob/main/practicals/attention_and_transformers.ipynb
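To give a flavor of the kind of component worth understanding, here is a minimal sketch of single-head scaled dot-product attention in JAX. It is not taken from the notebook. The function name, the toy shapes, and the use of a large negative constant for masking are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def scaled_dot_product_attention(q, k, v, mask=None):
    """Single-head attention: softmax(q k^T / sqrt(d_k)) v.

    Hypothetical helper for illustration, not the tutorial's code.
    q, k, v have shape (seq_len, d_k); mask is a boolean array of shape
    (seq_len, seq_len) where True means "this position may be attended to".
    """
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled to keep logits moderate.
    scores = jnp.matmul(q, jnp.swapaxes(k, -1, -2)) / jnp.sqrt(d_k)
    if mask is not None:
        # Masked-out positions get a very negative logit, so softmax sends them to ~0.
        scores = jnp.where(mask, scores, -1e9)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.matmul(weights, v), weights


# Toy inputs (assumed sizes): 4 tokens, 8-dimensional queries/keys/values.
key = jax.random.PRNGKey(0)
kq, kk, kv = jax.random.split(key, 3)
seq_len, d_model = 4, 8
q = jax.random.normal(kq, (seq_len, d_model))
k = jax.random.normal(kk, (seq_len, d_model))
v = jax.random.normal(kv, (seq_len, d_model))

# Causal mask: token i may only attend to tokens 0..i.
causal_mask = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))
out, weights = scaled_dot_product_attention(q, k, v, causal_mask)
```

Each row of `weights` is a probability distribution over the positions a token attends to, which is the sort of quantity a mechanistic analysis would start by inspecting.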

NLP, Interpretability & Explainability, Deep Learning
