Hypothesis
Open

Fine-tuning mostly rewires circuits that already exist, upweighting some and downweighting others, rather than building new circuits.

For example, fine-tune GPT-2 Small on Wikipedia, then compare the model's internal activations, attention patterns, and so on before and after fine-tuning.
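
A first pass at the before/after comparison could look like the minimal sketch below. It assumes attention patterns and activations have already been collected from both models on the same tokens (for instance with TransformerLens's `run_with_cache`, where `cache["pattern", layer]` has shape [batch, head, query_pos, key_pos]). The function names, the choice of total-variation distance and cosine similarity as metrics, and the synthetic data are illustrative assumptions, not part of the original problem. The sketch only measures how much each head or layer moved, not why; if the hypothesis holds, one might expect the changes to concentrate in a subset of existing heads.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax, used here only to build synthetic patterns
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def head_pattern_change(base: np.ndarray, tuned: np.ndarray) -> np.ndarray:
    """Mean total-variation distance between attention patterns, per head.

    base, tuned: [layers, heads, query_pos, key_pos]; each row over key_pos
    is a probability distribution. Returns [layers, heads], values in [0, 1].
    """
    if base.shape != tuned.shape:
        raise ValueError("patterns must have the same shape")
    # Total variation distance for each query position: half the L1 distance
    tv = 0.5 * np.abs(base - tuned).sum(axis=-1)
    # Average over query positions -> one score per (layer, head)
    return tv.mean(axis=-1)


def activation_cosine(base: np.ndarray, tuned: np.ndarray) -> np.ndarray:
    """Mean cosine similarity of activations per layer.

    base, tuned: [layers, positions, d_model]. Returns [layers].
    """
    dot = (base * tuned).sum(axis=-1)
    norms = np.linalg.norm(base, axis=-1) * np.linalg.norm(tuned, axis=-1)
    return (dot / np.maximum(norms, 1e-12)).mean(axis=-1)


# Synthetic example: 2 layers x 4 heads, 8 positions; only head (1, 2) is "rewired"
rng = np.random.default_rng(0)
base = softmax(rng.normal(size=(2, 4, 8, 8)))
tuned = base.copy()
tuned[1, 2] = softmax(3 * rng.normal(size=(8, 8)))
change = head_pattern_change(base, tuned)
most_changed = np.unravel_index(change.argmax(), change.shape)  # layer 1, head 2

resid = rng.normal(size=(2, 8, 16))
layer_sim = activation_cosine(resid, resid)  # identical activations -> 1.0 per layer
```

On real models, ranking heads by `change` gives a shortlist of candidates to inspect with more careful tools (e.g. activation patching) before concluding that a circuit was reweighted rather than newly built.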

What happens when you fine-tune a model?

How does model performance change on other text? Are specific circuits harmed, or does performance degrade across the board?
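
One rough way to separate "specific circuits harmed" from "worse across the board" is to look at how the per-token loss increase on held-out text is distributed, not just its mean. The sketch below assumes per-token cross-entropy has already been computed for both models on the same tokenized text (e.g. with `reduction="none"` in PyTorch's `torch.nn.functional.cross_entropy`); the function name, the 5% cutoff, and the synthetic losses are hypothetical choices for illustration.

```python
import numpy as np


def loss_change_summary(base_loss, tuned_loss, top_frac: float = 0.05):
    """Summarize how fine-tuning changed per-token loss on held-out text.

    base_loss, tuned_loss: 1-D per-token cross-entropy (nats), aligned so that
    index i is the same token scored by each model.
    Returns (mean_delta, top_share): the mean loss change, and the fraction of
    the total loss *increase* carried by the worst-hit top_frac of tokens.
    """
    delta = np.asarray(tuned_loss, dtype=float) - np.asarray(base_loss, dtype=float)
    increases = np.clip(delta, 0.0, None)  # ignore tokens that got easier
    total = increases.sum()
    if total == 0:
        return float(delta.mean()), 0.0
    k = max(1, int(len(delta) * top_frac))
    top_share = np.sort(increases)[-k:].sum() / total
    return float(delta.mean()), float(top_share)


# Synthetic per-token losses for 1,000 held-out tokens
rng = np.random.default_rng(0)
base = rng.uniform(2.0, 4.0, size=1000)

diffuse = base + 0.1            # every token slightly worse
concentrated = base.copy()
concentrated[:20] += 5.0        # a few tokens much worse

print(loss_change_summary(base, diffuse))       # top_share near 0.05: across the board
print(loss_change_summary(base, concentrated))  # top_share near 1.0: localized damage
```

Both synthetic cases raise the mean loss by the same 0.1 nats, so the mean alone cannot tell them apart. A high `top_share` only says the damage is localized to some tokens; checking which behaviour those tokens depend on is still needed to link it to a specific circuit.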

  • A similarly hard problem is examining what happens during chain-of-thought prompting. That is especially difficult, though, because chain-of-thought prompting only works in models of roughly GPT-3 size or larger.
Interpretability & Explainability