Understand the architecture and training dynamics of Transformers
by Brian Muhia
A proper mechanistic explanation of a model's behavior starts with understanding each component that goes into the model and its training. This is a good tutorial (with exercises!) from the 2022 Deep Learning Indaba that walks through a Transformer's architectural components and its training process in JAX.
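To give a flavor of the kind of component such a walkthrough covers, here is a minimal sketch of single-head scaled dot-product attention with a causal mask in JAX. It is not taken from the tutorial; the function name, the toy shapes, and the random inputs are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp


def scaled_dot_product_attention(q, k, v, mask=None):
    """Single-head attention: softmax(q k^T / sqrt(d_k)) v.

    q, k, v have shape (seq_len, d_k). `mask` is an optional boolean array of
    shape (seq_len, seq_len); True marks positions a query may attend to.
    """
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled to keep logits moderate.
    scores = q @ k.T / jnp.sqrt(d_k)
    if mask is not None:
        # Disallowed positions get a large negative logit, so their softmax
        # weight is effectively zero.
        scores = jnp.where(mask, scores, -1e9)
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs (hypothetical sizes): 4 tokens, 8-dimensional vectors.
key = jax.random.PRNGKey(0)
kq, kk, kv = jax.random.split(key, 3)
seq_len, d_model = 4, 8
q = jax.random.normal(kq, (seq_len, d_model))
k = jax.random.normal(kk, (seq_len, d_model))
v = jax.random.normal(kv, (seq_len, d_model))

# Causal mask: token i may attend only to tokens 0..i.
causal_mask = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))
out, weights = scaled_dot_product_attention(q, k, v, causal_mask)
```

Each row of `weights` sums to one, and with the causal mask the first token can only attend to itself. Inspecting attention weights like these is a common starting point for interpretability work.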
NLP, Interpretability & Explainability, Deep Learning