AI Safety Ideas
Open-ended
Open

Extend the causal tracing work from the ROME paper

by Esben Kran

Can the causal tracing technique be refined to find the specific attention heads (and perhaps specific neurons) that recall a fact? Can it be improved by creating corrupted activations through resampling from a random input instead of adding Gaussian noise?
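To make the two corruption schemes concrete, here is a minimal sketch on a toy model, not the ROME authors' code. Everything in it is an assumption made for illustration: the model is a small random residual network with causal mean-mixing instead of a real transformer, the output is a scalar readout instead of a token probability, and the names `make_model`, `run`, `gaussian_corrupt`, `resample_corrupt`, and `trace` are hypothetical. The tracing loop itself follows the usual pattern: a clean run, a run with the subject positions corrupted, and corrupted runs in which one clean hidden state at a time is restored.

```python
import math
import random

def make_model(n_layers, d, seed=0):
    """Toy stand-in for a transformer: one random d x d matrix per layer."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d)
    return [[[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]
            for _ in range(n_layers)]

def run(model, embeds, patch=None):
    """Forward pass. `patch` maps (layer, position) to a hidden state that
    overrides the computed one. Returns (per-layer states, scalar output)."""
    patch = patch or {}
    h = [list(e) for e in embeds]
    d = len(h[0])
    states = []
    for layer, W in enumerate(model):
        new_h = []
        for t in range(len(h)):
            # Causal mixing: position t sees the mean of positions 0..t.
            ctx = [sum(h[s][i] for s in range(t + 1)) / (t + 1) for i in range(d)]
            upd = [math.tanh(sum(w * c for w, c in zip(row, ctx))) for row in W]
            new_h.append([a + b for a, b in zip(h[t], upd)])  # residual update
        for t in range(len(new_h)):
            if (layer, t) in patch:
                new_h[t] = list(patch[(layer, t)])
        h = new_h
        states.append([list(v) for v in h])
    return states, sum(h[-1])  # readout from the last position

def gaussian_corrupt(embeds, subject, rng, sigma=3.0):
    """Add Gaussian noise to the subject positions (sigma is arbitrary here)."""
    return [[x + rng.gauss(0.0, sigma) for x in e] if t in subject else list(e)
            for t, e in enumerate(embeds)]

def resample_corrupt(embeds, subject, rng, pool):
    """Replace the subject positions with those of a randomly drawn other input."""
    other = rng.choice(pool)
    return [list(other[t]) if t in subject else list(e)
            for t, e in enumerate(embeds)]

def trace(model, embeds, corrupt, n_samples=10, seed=0):
    """Average effect of restoring each clean (layer, position) state
    into a corrupted run, measured as restored output minus corrupted output."""
    rng = random.Random(seed)
    clean_states, clean_out = run(model, embeds)
    n_layers, n_pos = len(model), len(embeds)
    effect = [[0.0] * n_pos for _ in range(n_layers)]
    mean_bad = 0.0
    for _ in range(n_samples):
        bad = corrupt(embeds, rng)
        _, bad_out = run(model, bad)
        mean_bad += bad_out / n_samples
        for l in range(n_layers):
            for t in range(n_pos):
                _, out = run(model, bad, {(l, t): clean_states[l][t]})
                effect[l][t] += (out - bad_out) / n_samples
    return clean_out, mean_bad, effect

# Example: 5 positions, positions 0 and 1 play the role of the subject tokens.
data_rng = random.Random(1)
d, n_pos, subject = 8, 5, {0, 1}
prompt = [[data_rng.gauss(0, 1) for _ in range(d)] for _ in range(n_pos)]
pool = [[[data_rng.gauss(0, 1) for _ in range(d)] for _ in range(n_pos)]
        for _ in range(20)]  # hypothetical set of other inputs to resample from
model = make_model(4, d)

clean_g, bad_g, eff_gauss = trace(
    model, prompt, lambda e, r: gaussian_corrupt(e, subject, r))
clean_r, bad_r, eff_resample = trace(
    model, prompt, lambda e, r: resample_corrupt(e, subject, r, pool))
```

Comparing `eff_gauss` and `eff_resample` cell by cell shows whether the two corruption schemes attribute recovery to the same (layer, position) sites. In a real model, the same loop could patch a single head's output or a subset of neurons instead of a whole hidden state, which is the finer granularity the idea asks about.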

The ROME paper introduced causal tracing to locate where facts are stored in a language model. Read more in the paper, and see the authors' follow-up work on editing factual associations.

Deep Learning, Interpretability & Explainability
