Open-ended
Reverse engineering a 1-layer SoLU model
How far can you get by deeply reverse engineering a 1-layer SoLU model?
- Which directions correspond to features?
- Can you find any polysemantic neurons?
- Can you fully reverse engineer a feature direction and compare it to a neuron direction?
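
One way to start on the last question is to measure how closely a candidate feature direction lines up with each neuron's input weights. The sketch below is a minimal illustration, not a prescribed method: it assumes SoLU is `x * softmax(x)` over the neuron axis, and the names `w_in`, `feature_dir`, and `cosine_to_neurons` are hypothetical. The weights are random stand-ins, not taken from any real model.

```python
import numpy as np


def solu(x):
    """SoLU activation: x * softmax(x), with softmax over the last (neuron) axis."""
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return x * exp / exp.sum(axis=-1, keepdims=True)


def cosine_to_neurons(feature_dir, w_in):
    """Cosine similarity between one direction and every neuron's input weights.

    feature_dir: shape (d_model,), a candidate feature direction (hypothetical)
    w_in:        shape (d_model, d_mlp), one column per neuron (hypothetical layout)
    """
    f = feature_dir / np.linalg.norm(feature_dir)
    cols = w_in / np.linalg.norm(w_in, axis=0, keepdims=True)
    return f @ cols


rng = np.random.default_rng(0)
d_model, d_mlp = 16, 64

# Random stand-in for MLP input weights; swap in real model weights to use this.
w_in = rng.normal(size=(d_model, d_mlp))

# Toy "feature" built to sit close to neuron 3, so there is a known answer.
feature_dir = w_in[:, 3] + 0.1 * rng.normal(size=d_model)

sims = cosine_to_neurons(feature_dir, w_in)
best = int(np.argmax(np.abs(sims)))
print("closest neuron:", best, "cosine:", float(sims[best]))
```

A feature whose cosine similarity is close to 1 for a single neuron is roughly neuron-aligned. A feature with moderate similarity to several neurons suggests it is spread across them, which is one place to look for polysemantic neurons.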