Understand the architecture and training dynamics of Transformers

Explaining a model's behavior mechanistically requires understanding each component of its architecture and of the process used to train it. This is a good tutorial (with exercises!) from the 2022 Deep Learning Indaba that walks through the architectural components and the training process of a Transformer in JAX.

https://github.com/deep-learning-indaba/indaba-pracs-2022/blob/main/practicals/attention_and_transformers.ipynb
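
As a taste of the kind of component the tutorial covers, here is a minimal sketch of scaled dot-product attention in JAX. It is not taken from the notebook: the function name `scaled_dot_product_attention`, the toy shapes, and the causal mask are assumptions chosen for illustration only.

```python
import jax
import jax.numpy as jnp


def scaled_dot_product_attention(q, k, v, mask=None):
    """Illustrative attention: softmax(q k^T / sqrt(d_k)) v.

    q, k, v have shape (seq_len, d_k). mask is an optional boolean array of
    shape (seq_len, seq_len) where True means "this position may be attended to".
    """
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled to keep logits moderate.
    scores = jnp.matmul(q, jnp.swapaxes(k, -2, -1)) / jnp.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a very negative logit, so softmax sends them to ~0.
        scores = jnp.where(mask, scores, -1e9)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.matmul(weights, v), weights


# Toy inputs (hypothetical sizes, random values).
seq_len, d_model = 4, 8
q_key, k_key, v_key = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(q_key, (seq_len, d_model))
k = jax.random.normal(k_key, (seq_len, d_model))
v = jax.random.normal(v_key, (seq_len, d_model))

# Causal mask: each position attends only to itself and earlier positions.
causal_mask = jnp.tril(jnp.ones((seq_len, seq_len))).astype(bool)

out, weights = scaled_dot_product_attention(q, k, v, causal_mask)
```

Inspecting `weights` row by row is a simple first step toward the mechanistic view described above: each row is a probability distribution over the positions that a token is allowed to attend to.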

NLP, Interpretability & Explainability, Deep Learning
