Ideas
Insights-based models of AI timelines
The Median Group previously proposed a model of [AI timelines based on key “insights”](http://mediangroup.org/insights) required on the way to AGI development. However, the current model is based on outdated and poorly curated data, and some of its methodological choices are questionable. Collect more up-to-date data and redo the model – how do your results compare to better-known timelines models?
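One simple way to turn a dated list of insights into a timeline is to treat insight discovery as a homogeneous Poisson process: estimate the rate from the historical list, then ask how likely it is that some number of further insights (itself an uncertain input) arrives within a given horizon. This is an assumption offered for illustration, not a reconstruction of the Median Group model, and the function names are hypothetical.

```python
import math


def insight_rate(insight_years, start_year, end_year):
    """Maximum-likelihood discovery rate (insights per year) for a homogeneous
    Poisson process observed over [start_year, end_year]."""
    if end_year <= start_year:
        raise ValueError("end_year must be after start_year")
    count = sum(1 for y in insight_years if start_year <= y <= end_year)
    return count / (end_year - start_year)


def prob_insights_within(rate, remaining, years_ahead):
    """P(at least `remaining` further insights within `years_ahead` years)."""
    lam = rate * years_ahead
    # Poisson probability of seeing fewer than `remaining` insights, i.e. falling short
    p_short = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(remaining))
    return 1.0 - p_short
```

Uncertainty about how many insights remain can be handled by averaging `prob_insights_within` over a distribution for `remaining`; a rate that changes over time (e.g. growing with research effort) would need a non-homogeneous process instead.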
Rethinking the evolutionary anchor
In Forecasting TAI with biological anchors, Ajeya Cotra proposes the “evolutionary anchor” as a hypothesis for the compute needed to train generally intelligent systems, based on “the total FLOP performed over the course of evolution, since the first neurons” ([Cotra, 2020](https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines)). However, there have been concerns about whether this definition is appropriate – it does not account for the compute needed to simulate the environment ([Sempere, 2022](https://forum.effectivealtruism.org/posts/FHTyixYNnGaQfEexH/a-concern-about-the-evolutionary-anchor-of-ajeya-cotra-s)), and anthropic considerations might prove highly important ([Erdil, 2022](https://www.lesswrong.com/posts/NHvspuLiirJwiLtfg/do-anthropic-considerations-undercut-the-evolution-anchor)). Assess the significance of these concerns and reassess the viability of the current definition of the anchor.
Brain emulation development
Anders Sandberg developed a Monte Carlo model of brain emulation development ([Sandberg, 2014](http://www.aleph.se/papers/Monte%20Carlo%20model%20of%20brain%20emulation%20development.pdf)). However, the paper is now several years old and its estimates are outdated. Replicate its methodology with updated estimates – what are the new results?
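A minimal sketch of the general Monte Carlo approach, assuming each prerequisite technology (e.g. compute or scanning throughput) grows exponentially from today's level until it reaches an uncertain required level, and that emulation becomes feasible only once every prerequisite is met. The `Prerequisite` structure, the uniform sampling, and all numeric ranges are hypothetical placeholders, not Sandberg's parameters; a replication would substitute his model's actual structure and current estimates.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Prerequisite:
    """One enabling technology; all fields are hypothetical, user-supplied inputs."""
    name: str
    current_log10: float   # log10 of today's capability
    required_log10: tuple  # (low, high) range for log10 of the capability needed
    doubling_years: tuple  # (low, high) range for the capability's doubling time


def years_until(current_log10, required_log10, doubling_years):
    """Years for exponential growth to close the gap between current and required."""
    gap = max(0.0, required_log10 - current_log10)
    # A factor of 10**gap equals gap * log2(10) doublings
    return gap * math.log2(10) * doubling_years


def simulate_arrival(prereqs, start_year, n_samples=10_000, seed=0):
    """Sample arrival years; emulation waits for the slowest prerequisite."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        wait = max(
            years_until(p.current_log10,
                        rng.uniform(*p.required_log10),
                        rng.uniform(*p.doubling_years))
            for p in prereqs
        )
        samples.append(start_year + wait)
    return samples
```

Quantiles of the samples, e.g. `statistics.quantiles(samples, n=10)`, give the kind of arrival-date distribution such a model reports.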
Profiler to measure compute
Compute is one of the key inputs in machine learning: it is highly predictive of performance and relatively easy to measure. However, compute usage typically isn’t reported, even in top journal articles. Part of the reason for this is the lack of good profiling tools for GPUs and/or machine learning frameworks. The task is thus to implement an open-source solution in a framework like PyTorch. This could help shift the community's norms towards more transparent reporting, which in turn would create a lever for AI governance interventions. Lennart Heim has an extensive draft on this issue that he would be happy to share on request.
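A profiler would count the operations actually executed; its numbers could be cross-checked against a back-of-the-envelope analytical estimate. A minimal sketch for a plain multilayer perceptron, assuming dense layers dominate the cost (biases and activations are ignored) and using the common heuristic that the backward pass costs about twice the forward pass. The function names are hypothetical.

```python
def dense_forward_flops(in_features, out_features, batch_size):
    """Forward-pass FLOPs of one dense layer: a multiply and an add per weight per example."""
    return 2 * in_features * out_features * batch_size


def mlp_training_flops(layer_sizes, batch_size, steps, backward_multiplier=2.0):
    """Approximate training FLOPs for an MLP with the given layer widths."""
    forward = sum(
        dense_forward_flops(a, b, batch_size)
        for a, b in zip(layer_sizes, layer_sizes[1:])
    )
    # Per step: forward + backward, with backward ~= backward_multiplier * forward
    return forward * (1 + backward_multiplier) * steps
```

For example, `mlp_training_flops([784, 256, 10], batch_size=32, steps=1)` gives about 3.9e7 FLOPs. A framework-level profiler would replace these formulas with measured counts for every operator, including the ones this estimate ignores.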
AI development vignettes
Write down qualitative and concrete stories about AI development, exploring the possible risks and societal consequences. The emphasis here should be on detail, and you should take potential hardware, algorithmic, and data constraints into account (e.g. what happens if Moore’s law ends in a few years?).
Study training run lengths
Epoch worked out a theoretical upper bound of 14-15 months on the [wall-clock length of training runs](https://epochai.org/blog/the-longest-training-run). Empirically investigate trends in training run lengths and compare them with this theoretical upper bound – what are the reasons for any discrepancies? This would require building a dataset of training run lengths.
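The intuition behind such a bound can be sketched with a deliberately simplified model (an assumption, not necessarily Epoch's exact formulation): if the compute obtainable per unit time grows at a constant continuous rate g, and a run must finish by a fixed deadline T using hardware fixed at its start time s, then the run's total compute is proportional to (T − s)·e^(g·s). Setting the derivative with respect to s to zero gives an optimal length T − s = 1/g; a longer run would have done better by waiting for better hardware. The code checks this closed form numerically.

```python
import math


def run_compute(start, deadline, growth_rate):
    """Relative compute of a run on [start, deadline] using hardware fixed at `start`."""
    return max(0.0, deadline - start) * math.exp(growth_rate * start)


def optimal_run_length(growth_rate):
    """Closed form: d/ds [(T - s) * exp(g * s)] = 0  =>  T - s = 1 / g."""
    return 1.0 / growth_rate


def numeric_run_length(growth_rate, deadline=10.0, step=1e-3):
    """Grid search over start times as a sanity check on the closed form."""
    starts = [i * step for i in range(int(deadline / step))]
    best = max(starts, key=lambda s: run_compute(s, deadline, growth_rate))
    return deadline - best
```

With effective compute doubling every year (g = ln 2 per year), `optimal_run_length(math.log(2))` is about 1.44 years; faster growth shortens the optimal run.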
Paradigm changes in AI
What were the major paradigm shifts in different domains of AI? By talking to domain experts and reading literature reviews and popular papers, determine which methods were popular at each point in time and compile a list of these domain-specific paradigm shifts. Such a list allows us to use [Laplace’s rule](https://www.lesswrong.com/posts/wE7SK8w8AixqknArs/a-time-invariant-version-of-laplace-s-rule) to estimate a base rate of paradigm changes in AI.
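As a worked example, the classic rule of succession treats each year as a trial: after s paradigm shifts in n years, the probability of a shift next year is (s + 1)/(n + 2). The sketch below also computes the chance of at least one shift over the next h years under the same uniform-prior (Beta(1, 1)) assumption. It implements the classic rule, not the time-invariant variant proposed in the linked post, and the function names are hypothetical.

```python
from fractions import Fraction


def laplace_next(shifts, years):
    """Rule of succession: P(shift next year) after `shifts` shifts in `years` years."""
    return Fraction(shifts + 1, years + 2)


def prob_shift_within(shifts, years, horizon):
    """P(at least one shift in the next `horizon` years), updating after each quiet year."""
    p_none = Fraction(1)
    for i in range(horizon):
        # Each additional quiet year is one more failure in the Beta-Bernoulli update
        p_none *= Fraction(years - shifts + 1 + i, years + 2 + i)
    return 1 - p_none
```

For a hypothetical three shifts observed over 40 years, `laplace_next(3, 40)` gives 2/21.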
Algorithmic breakthroughs in machine learning history
What were the major algorithmic innovations in machine learning over the last two decades? This could be structured as a literature review or as a survey of experts, culminating in a big list of the key algorithmic advances over the last ~20 years. Such a database helps us understand the frequency and significance of algorithmic insights.
Revisiting ‘Is AI Progress Impossible To Predict?’
Alyssa Vance argued that AI progress on a task from one model to the next is unpredictable ([Vance, 2022](https://www.lesswrong.com/posts/G993PFTwqqdQv4eTg/is-ai-progress-impossible-to-predict)). Can we investigate this in more detail? For instance, the authors of Beyond the Imitation Game (BIG-bench) find that for tasks where progress is “jumpy”, there are usually other progress metrics that vary more smoothly ([Srivastava et al., 2022](https://arxiv.org/abs/2206.04615)). Can we use those metrics to predict progress?
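One way to test this is to fit a trend to the smooth metric across earlier models and extrapolate it to the next one. A minimal sketch, assuming the smooth metric (for example, the log-probability assigned to the correct answer) is roughly linear in log10 of model size; both the metric choice and the linear form are assumptions to be checked against the data, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_smooth_metric(model_sizes, metric_values, new_size):
    """Extrapolate a smooth metric, assumed linear in log10(model size), to a new size."""
    xs = [math.log10(s) for s in model_sizes]
    slope, intercept = fit_line(xs, metric_values)
    return slope * math.log10(new_size) + intercept
```

The predictive question is then whether a held-out model's smooth metric lands near the extrapolation, and whether thresholds on that metric anticipate the jumps seen in the jumpy one.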
What share of total available compute performance came from each chip in a given year?
New chips are continuously developed, and old chips are subsequently replaced, but not instantly. Ultimately, we want to better understand how chips are replaced over time. To help with this, construct a database that specifies, for each year between 2010 and 2020, what share of the available compute came from each chip.
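The replacement dynamics can be sketched by combining unit sales with an assumed service lifetime, then weighting installed units by per-chip performance. A minimal sketch, assuming every chip stays in service for a fixed number of years after sale and then retires; real data would need better retirement curves, and the chip names in any input are placeholders.

```python
def installed_units(sales, year, lifetime_years):
    """Units in service in `year`: chips sold within the last `lifetime_years` years.

    sales: {chip: {year_sold: units}}
    """
    return {
        chip: sum(units for sold, units in by_year.items()
                  if year - lifetime_years < sold <= year)
        for chip, by_year in sales.items()
    }


def compute_shares(units, flops_per_unit):
    """Share of total installed compute (e.g. FLOP/s) contributed by each chip."""
    totals = {chip: n * flops_per_unit[chip] for chip, n in units.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {chip: t / grand_total for chip, t in totals.items()}
```

Running `compute_shares(installed_units(sales, y, lifetime), flops_per_unit)` for each year from 2010 to 2020 yields the requested table.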
Improvements due to “software-for-hardware”
Innovations in compilers and other low-level improvements have helped increase the utilisation rate of GPUs and improve training efficiency. Make a list of such improvements and estimate how much each one improved overall performance on tasks such as training a neural network.
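A common way to quantify such gains is utilisation: achieved FLOP/s divided by the hardware's peak. A minimal sketch, assuming a dense transformer where training costs roughly 6 FLOPs per parameter per token (forward plus backward pass). The function names are hypothetical, and multiplying speedups assumes the improvements are independent, which overlapping optimisations may violate.

```python
import math


def model_flops_utilization(n_params, tokens_per_second, peak_flops_per_second):
    """Fraction of peak hardware throughput spent on the model's training FLOPs."""
    achieved = 6 * n_params * tokens_per_second  # ~6 FLOPs per parameter per token
    return achieved / peak_flops_per_second


def combined_speedup(speedups):
    """Overall speedup from several improvements, assuming their effects multiply."""
    return math.prod(speedups)
```

Comparing utilisation before and after a compiler change, on the same model and hardware, isolates that change's contribution.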
Do AI researchers train models using scaling laws?
Scaling laws have been proposed as ways to gather information about how to train large machine learning models efficiently ([Kaplan et al., 2020](https://arxiv.org/abs/2001.08361)), and this has been done in practice for training LLMs like Chinchilla ([Hoffmann et al., 2022](https://arxiv.org/abs/2203.15556)). But how broadly have scaling laws been used by AI researchers in general, and has there been a delay in the uptake of such an approach?
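For reference, the compute-optimal allocation reported for Chinchilla is often summarised by two approximations: training compute C ≈ 6ND for N parameters and D tokens, and roughly 20 training tokens per parameter. A minimal sketch under those approximations; a survey of uptake could, for instance, flag how far published models sit from that ratio (the factor-of-two tolerance below is an arbitrary, hypothetical threshold).

```python
import math


def compute_optimal_allocation(compute_flops, tokens_per_param=20.0):
    """Split a compute budget into (parameters, tokens) assuming C = 6*N*D and D = k*N."""
    n_params = math.sqrt(compute_flops / (6 * tokens_per_param))
    return n_params, tokens_per_param * n_params


def near_compute_optimal(n_params, n_tokens, tokens_per_param=20.0, tolerance=2.0):
    """True if the tokens-per-parameter ratio is within `tolerance`x of the target."""
    ratio = (n_tokens / n_params) / tokens_per_param
    return 1 / tolerance <= ratio <= tolerance
```

For a budget of 5.88e23 FLOP this gives about 70B parameters and 1.4T tokens, matching Chinchilla's size; GPT-3 (175B parameters, about 300B tokens) would not be flagged as near compute-optimal.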
Test the bioanchors framework by retrodicting computer vision progress
The [bioanchors framework](https://epochai.org/blog/grokking-bioanchors) is one of the most detailed and widely used AI timelines models. However, many people don't trust the basic approach of using biological anchors to predict AI progress. Computer vision has already reached human or superhuman level on some tasks. Could we have predicted its progress by applying the bioanchors methodology, using the human visual cortex as an anchor?
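The core mechanics of a bioanchors-style forecast can be sketched as follows, assuming the required compute (here, anchored to the visual cortex) is log-normally distributed and the largest affordable training run grows log-linearly over time. The real framework is considerably richer (multiple anchors, algorithmic progress, changing willingness to spend), and all inputs here are hypothetical. A retrodiction would compare the year this model assigns a given probability with the year vision systems actually reached the relevant performance.

```python
import math


def prob_affordable(year, anchor_log10, sigma_log10, base_year, base_log10, growth_per_year):
    """P(required compute <= affordable compute in `year`).

    Required:   log10 FLOP ~ Normal(anchor_log10, sigma_log10).
    Affordable: log10 FLOP = base_log10 + growth_per_year * (year - base_year).
    """
    affordable = base_log10 + growth_per_year * (year - base_year)
    z = (affordable - anchor_log10) / sigma_log10
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


def median_arrival_year(anchor_log10, base_year, base_log10, growth_per_year):
    """Year in which affordable compute reaches the anchor's median requirement."""
    return base_year + (anchor_log10 - base_log10) / growth_per_year
```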
Extrapolating GPT-N performance
Lukas Finnveden previously performed an extrapolation of GPT-N performance on a number of benchmark tasks, such as cloze completion and arithmetic ([Finnveden, 2020](https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extrapolating-gpt-n-performance)). Can you expand on this methodology and apply it to more cases?
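A minimal sketch of the general idea, assuming benchmark performance follows a sigmoid in log10 of training compute between a chance-level floor and a ceiling. Once the midpoint and slope have been fitted (e.g. with `scipy.optimize.curve_fit`), the curve can be inverted to ask how much compute a target score would need. This is similar in spirit to, but not necessarily identical with, Finnveden's fits; the function names are hypothetical.

```python
import math


def sigmoid_performance(log10_compute, midpoint, slope, floor=0.0, ceiling=1.0):
    """Benchmark score as a logistic function of log10 training compute."""
    return floor + (ceiling - floor) / (1 + math.exp(-slope * (log10_compute - midpoint)))


def compute_for_target(target, midpoint, slope, floor=0.0, ceiling=1.0):
    """Invert the sigmoid: log10 compute at which the score reaches `target`."""
    frac = (target - floor) / (ceiling - floor)
    if not 0 < frac < 1:
        raise ValueError("target must lie strictly between floor and ceiling")
    return midpoint + math.log(frac / (1 - frac)) / slope
```

For a four-way multiple-choice task, `floor=0.25` encodes random guessing.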
Qualitatively analysing language model / image generation improvements since ~2000
While we can plot graphs showing quantitative changes in language model / image generation performance over time (e.g. in terms of perplexity), what do these changes actually mean for model capabilities? A collection of samples from language models over the last two decades could give a visceral sense of how much they have improved. The comparison could include a selection of the best output out of 10 prompts, a comparison of prompt completions, etc.
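Perplexity itself has a concrete reading: it is the exponential of the average negative log-likelihood per token, so a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens. A minimal sketch, assuming access to the probability the model assigned to each actual next token:

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the probabilities of the actual tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)
```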
Sarcasm and more can be measured in text using modern LLMs.
Commonly used text-analysis methods can mostly measure sentiment and simple variables such as word count and bag-of-words measures. With modern LLMs such as text-davinci-003, we can create new ways to measure texts. Examples include sarcasm, bias, grammatical errors and domain-specific language use. For AI safety, this can become useful to
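A minimal sketch of such a measure, assuming the model is prompted to return a numeric rating that is then parsed. `complete` is a hypothetical stand-in for whatever function sends a prompt to an LLM and returns its text; the prompt wording and 0–10 scale are illustrative choices, and ratings like these should be validated against human labels before use.

```python
import re
from typing import Callable, Optional

SARCASM_PROMPT = (
    "Rate how sarcastic the following text is on a scale from 0 (not sarcastic) "
    "to 10 (extremely sarcastic). Answer with a single integer.\n\n"
    "Text: {text}\nRating:"
)


def llm_rating(text: str, complete: Callable[[str], str],
               template: str = SARCASM_PROMPT, lo: int = 0, hi: int = 10) -> Optional[int]:
    """Ask an LLM for a rating and parse it; return None if the reply is unusable."""
    reply = complete(template.format(text=text))
    match = re.search(r"-?\d+", reply)  # first integer in the reply
    if match is None:
        return None
    score = int(match.group())
    return score if lo <= score <= hi else None
```

Swapping the template measures other properties (bias, grammatical errors, domain-specific language) with the same parsing logic.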
London-based MATS clone
"A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes."
Trap-Door Environments for MineRL Agents
Proposal: a “change everything” button in a MineRL environment that instantly changes the environment using Stable Diffusion or another fast generative model, in order to observe the resulting change in learned representations and goal generalization.
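One way to quantify the change in learned representations is a representational similarity measure such as linear CKA (centered kernel alignment; Kornblith et al., 2019), swapped in here as an assumption rather than part of the proposal. The sketch compares activations for the same set of probe observations, e.g. recorded from the agent before and after the environment is changed; a value near 1 means the representations are similar up to rotation and scaling.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activations x (n x d1) and y (n x d2) for the same n inputs."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```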
Levels of ablation of Transformer heads will gradually activate backup heads.
In [Interpretability in the Wild](https://arxiv.org/abs/2211.00593), the backup name mover heads activate when the name mover heads are ablated. How do we expect the backup name mover heads to respond to different amplitudes of ablation on the main name mover head? Two hypotheses come to mind: either they activate gradually, or there is a sharp phase shift in their behaviour. Also see the work [on backup backup name mover heads](https://itch.io/jam/interpretability/rate/1789630).
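A partial ablation can be parameterised by interpolating a head's output toward its mean (mean ablation) with a strength α between 0 and 1; sweeping α and recording a measure of backup-head activity then distinguishes a gradual response from a phase shift. A minimal sketch with NumPy arrays standing in for the head's activations; `backup_response` is a hypothetical callable that would patch the head, run the rest of the model (e.g. via hooks), and return a scalar such as the backup heads' contribution to the logit difference.

```python
from typing import Callable, List, Sequence

import numpy as np


def partially_ablate(head_output: np.ndarray, mean_output: np.ndarray, alpha: float) -> np.ndarray:
    """alpha=0 keeps the clean output; alpha=1 is full mean ablation."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * head_output + alpha * mean_output


def ablation_sweep(head_output: np.ndarray, mean_output: np.ndarray,
                   backup_response: Callable[[np.ndarray], float],
                   alphas: Sequence[float]) -> List[float]:
    """Backup-head response at each ablation strength."""
    return [backup_response(partially_ablate(head_output, mean_output, a)) for a in alphas]
```

Plotting the sweep over `np.linspace(0, 1, 21)` shows whether the response curve is smooth or has a sharp jump.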
Investigate the relationship between double descent and grokking
What is the relationship between double descent and grokking? Double descent seems to be caused by phase transitions in polysemanticity, while grokking seems to be a general effect of task learning. We see performance worsen slightly over a few epochs before the loss converges to an even lower equilibrium, which may indicate a new level of hyperdimensional encoding. See [example](https://transformer-circuits.pub/2022/toy_model/index.html#geometry:~:text=Let%27s%20look%20at%20the%20resulting%20plot%2C%20and%20then%20we%27ll%20try%20to%20figure%20out%20what%20it%27s%20showing%20us%3A).
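To compare the two phenomena systematically across runs, it helps to detect this shape automatically: the loss falls to a minimum, rises, and later falls below that first minimum. A minimal heuristic sketch; `min_rise` is a hypothetical noise threshold, and real loss curves would usually be smoothed first.

```python
from typing import Optional, Sequence, Tuple


def find_double_descent(losses: Sequence[float],
                        min_rise: float = 0.0) -> Optional[Tuple[int, int, int]]:
    """Return (first_min, peak, second_min) indices, or None if the loss never rises
    by more than `min_rise` and then drops below its earlier minimum."""
    best = 0
    for i in range(1, len(losses)):
        if losses[i] < losses[best]:
            best = i
        elif losses[i] - losses[best] > min_rise:
            # The loss has risen: look for a later point below the first minimum
            second = min(range(i, len(losses)), key=lambda k: losses[k])
            if losses[second] >= losses[best]:
                return None
            peak = max(range(best, second), key=lambda k: losses[k])
            return best, peak, second
    return None
```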
Circuit investigation: Compare tasks for nL model to a (n+1)L model
Look for tasks that an nL model cannot do but an (n+1)L model can, and look for the circuit responsible! Proposals:
- Build the infrastructure to do this: run two models over a lot of text and look for big log prob differences (maybe floor the log probs at e.g. -5, to avoid overfitting to cases where one network was incredibly wrong).
- Just take a bunch of text with interesting patterns, run the models over it, look for tokens they do really well on, and try to reverse engineer what’s going on. I expect there’s a lot of stuff in here!
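The log-prob comparison might look like the following minimal sketch, assuming both models' log probabilities for the actual next token have already been collected over the same text. Flooring both at a hypothetical −5 (log probs are negative) stops a single token where one model was wildly wrong from dominating the ranking.

```python
import numpy as np


def top_log_prob_gains(logp_small, logp_large, k=10, floor=-5.0):
    """Indices and sizes of the k largest per-token gains of the (n+1)L model over the nL model.

    Both inputs are log probs of the correct next token at the same positions.
    """
    small = np.maximum(np.asarray(logp_small, dtype=float), floor)
    large = np.maximum(np.asarray(logp_large, dtype=float), floor)
    gains = large - small
    order = np.argsort(-gains, kind="stable")[:k]  # largest gains first
    return order, gains[order]
```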
Reverse engineering of 1 layer SoLU model
How far can you get with deeply reverse engineering a 1-layer SoLU model?
- Which directions correspond to features?
- Can you find any [polysemantic](https://transformer-circuits.pub/2022/toy_model/index.html) neurons?
- Can you fully reverse engineer a feature direction and compare it to a neuron direction?
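A simple way to compare a feature direction with the neuron basis is the cosine similarity between the direction and each neuron's basis vector, which is just the direction's normalised coordinates. A minimal sketch, assuming the feature direction has already been found (e.g. by probing) as a vector in the MLP's neuron space; a value near 1 for a single neuron suggests the feature is neuron-aligned, while values near 1/√d across many of the d neurons suggest it is spread out.

```python
import numpy as np


def neuron_basis_alignment(direction):
    """|cosine similarity| between a feature direction and each neuron basis vector."""
    d = np.asarray(direction, dtype=float)
    norm = np.linalg.norm(d)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return np.abs(d) / norm  # cos(d, e_i) = d_i / ||d||


def most_aligned_neurons(direction, k=5):
    """Indices and alignments of the k neurons most aligned with the direction."""
    alignment = neuron_basis_alignment(direction)
    order = np.argsort(-alignment, kind="stable")[:k]
    return order, alignment[order]
```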
Investigate SoLU lexoscope's neurons
Neel Nanda made the website [lexoscope.io](https://lexoscope.io), which shows the text that most activates each neuron in several SoLU language models he trained, including toy SoLU models. Problem ideas could be:
- Hunt through it and look for interesting neurons. Can you find weird and abstract ones?
- Can you find neuron families, à la [equivariance](https://distill.pub/2020/circuits/equivariance/)?
- Study a lot of neurons at different layers and look for patterns. What can we say about what the model is doing at different layers? What patterns are there?
- Can you find examples of neuron splitting? (A single high-level feature splits into several more specific features as you scale up.)
- Can you reverse engineer a neuron? Can you find a specific direction in activation space that is exactly that feature, and how aligned is it with the neuron basis?
- Can you find any highly non-monosemantic features? For example, a task where the entire MLP layer matters but no single neuron activates much, or where the pre-LayerNorm activation is low but the post-LayerNorm activation is high, so the model “smuggles through” directions.
- Find polysemantic neurons and try to reverse engineer them: give the model a bunch of text containing each feature and average the activations or apply PCA. Can you find directions corresponding to each feature? How much do they align with that neuron?
- Replicate the part of Conjecture’s Polytopes paper where they look at the top (e.g. 1000) dataset examples for a neuron across a ton of text and look for patterns. Is it the case that there are monosemantic bands in the neuron activation spectrum?
- Can you find a genuinely monosemantic neuron? One possible idea: look for algorithmic-flavoured neurons, e.g. one whose activation could be mimicking a regex, and use the regex to automatically test that the neuron is actually doing this.
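For the regex idea, the automatic test could be as simple as correlating a neuron's per-token activations with whether each token matches the candidate regex. A minimal sketch, assuming aligned lists of tokens and activations have already been extracted from the model (the extraction step is not shown); a correlation near 1 supports the regex hypothesis, though it does not rule out other explanations.

```python
import re

import numpy as np


def regex_neuron_correlation(tokens, activations, pattern):
    """Pearson correlation between activations and a 0/1 'token matches pattern' indicator."""
    matches = np.array([1.0 if re.fullmatch(pattern, t) else 0.0 for t in tokens])
    acts = np.asarray(activations, dtype=float)
    if matches.std() == 0 or acts.std() == 0:
        return float("nan")  # correlation is undefined for a constant input
    return float(np.corrcoef(matches, acts)[0, 1])
```

Tokenizers often attach a leading space to words, so a pattern such as `r"\s?\d{4}"` may be needed instead of `r"\d{4}"`.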