Open-ended
Investigate the relationship between double descent and grokking
What is the relationship between double descent and grokking?
- Double descent seems to be caused by phase transitions in polysemanticity, while grokking seems to be a general effect of task learning.
We see a slight decrease in performance over a few epochs, after which training converges to an even lower equilibrium, indicating a new level of hyperdimensional encoding.
See example.
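Separately from the example referenced above, here is a minimal sketch of how the curve shape described here (a temporary worsening followed by a new, lower equilibrium) could be flagged in a logged per-epoch validation loss. The function name `find_second_descent`, the `min_rise` threshold, and the toy curve are all hypothetical and chosen only for illustration. The check says nothing about whether the cause is double descent, grokking, or noise.

```python
def find_second_descent(val_loss, min_rise=0.05):
    """Find the first 'minimum -> rise -> lower minimum' pattern in a loss curve.

    Returns (first_min_epoch, peak_epoch, new_min_epoch), or None if the
    curve never rises by more than `min_rise` above its running minimum
    and then drops below that minimum.
    `min_rise` is an assumed noise threshold, not a principled value.
    """
    best_idx = 0      # epoch of the lowest loss seen so far
    peak_idx = None   # epoch of the highest loss seen since that minimum
    for t in range(1, len(val_loss)):
        if val_loss[t] < val_loss[best_idx]:
            # New running minimum: report it if a real bump preceded it.
            if peak_idx is not None and val_loss[peak_idx] - val_loss[best_idx] > min_rise:
                return best_idx, peak_idx, t
            best_idx = t
            peak_idx = None
        elif peak_idx is None or val_loss[t] > val_loss[peak_idx]:
            peak_idx = t
    return None


# Toy curve (made up for illustration): the loss bottoms out, worsens, then improves again.
toy_curve = [1.0, 0.8, 0.6, 0.7, 0.75, 0.65, 0.5, 0.45]
print(find_second_descent(toy_curve))
```

Running the same check on both training and validation curves, and comparing the epochs at which the pattern appears, would be one simple way to start separating the two phenomena empirically.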
Interpretability & Explainability