AI Safety Ideas

Ideas

Open-ended ▲ 0 Open

Epistemic RL

A hypothesis about quantifying the knowledge that agents have about their environments and about other agents. Could we develop a heuristic for measuring intelligence in RL environments? See F. Chollet, "On the Measure of Intelligence" (2019), https://arxiv.org/abs/1911.01547

Open-ended ▲ 3 Open

Which values do we align AI to?

How can we ensure that artificial intelligence follows the values of people from across the world and that the benefits of the technology are shared internationally? How should AI aggregate these preferences? How can it work around known impossibility results in social choice theory (Arrow 1950)? Would developing AI for public administrations be an area where it is possible to explore which values to align AI to? Potential areas of application could be [regulation](https://doi.org/10.1016/S1573-448X(06)03027-5) and determining [taxation](https://maxkasy.github.io/home/files/papers/adaptive_social_welfare.pdf). Some suggested steps: Step 1: Have a look at existing [proposals](https://www.brookings.edu/research/aligned-with-whom-direct-and-social-goals-for-ai-systems/). Step 2: Characterise stakeholders and how they might be affected by AI. Step 3: How can values be measured and used as reward? Step 4: How can values be aggregated? This includes the question of how to ensure that individuals in countries other than the one developing the AI benefit as well.
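To make the aggregation difficulty in Step 4 concrete, here is a minimal sketch of the Condorcet paradox, a classic example of the kind of problem that motivates the impossibility results mentioned above. The ballots and function names are hypothetical and chosen only for illustration; this is not a proposed aggregation method.

```python
from itertools import combinations

# Hypothetical ballots: each list ranks options from most to least preferred.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(ballots, x, y):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2


def pairwise_winners(ballots):
    """Map each pair of options to the majority-preferred option (None on a tie)."""
    options = sorted(ballots[0])
    result = {}
    for x, y in combinations(options, 2):
        if majority_prefers(ballots, x, y):
            result[(x, y)] = x
        elif majority_prefers(ballots, y, x):
            result[(x, y)] = y
        else:
            result[(x, y)] = None
    return result


def condorcet_winner(ballots):
    """Return the option that beats every other option head-to-head, or None."""
    options = sorted(ballots[0])
    for x in options:
        if all(majority_prefers(ballots, x, y) for y in options if y != x):
            return x
    return None


if __name__ == "__main__":
    print(pairwise_winners(ballots))
    print(condorcet_winner(ballots))
```

With these three ballots, a majority prefers A to B, B to C, and C to A, so pairwise majority voting yields a cycle and no option beats all others. Any scheme for aggregating values has to decide what to do in cases like this one.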

Open-ended ▲ 1 Open

Hold those who deploy AI capabilities accountable to a diverse forum of stakeholders

Is it possible for a representative sample of society to more directly inform the direction of development of AI capabilities? The task here involves two steps:
- Brainstorm a set of structures which could make diverse stakeholder input more feasible.
  - One possible example is a council whose members are compensated to engage with these questions, e.g. [France's citizen councils](https://www.thelocal.fr/20220906/explained-what-are-frances-citizen-councils).
- Consider whether some structures are more vulnerable to manipulation than others and whether they complement other approaches to reducing the risks from AI development.

Open-ended ▲ 1 Open

Create a vision for a positive future with AGI

The [Existential Hope Project](https://www.existentialhope.com/about) from the Foresight Institute tries to reframe some of the perspectives we have on existential risk towards a positive and optimistic framing, something I personally find very productive. You can spend an hour and a half with a friend combining three technologies with three social visions for the future, along with an art piece that represents the result, and [submit it on their website](https://www.existentialhope.com/xhope-scenario-contribution-form). Check out some of [the existing submissions](https://www.existentialhope.com/gallery).

Open-ended ▲ 1 Open

Explore the ChatGPT and BingAI jailbreaking community and ecosystem

Check out [jailbreakchat.com](https://www.jailbreakchat.com/) and review which general patterns emerge for what makes the different jailbreaks work and where they come from. This is an opportunity to map out how the general jailbreaking ecosystem has emerged. There are many thousands of people sharing their use of ChatGPT and BingAI in [Facebook groups](https://www.facebook.com/groups/aicomunity) and [subreddits](https://www.reddit.com/r/ChatGPT/).

Open-ended ▲ 1 Open

Continue the Alignment Timelines project

The [Safety Timelines](https://forum.effectivealtruism.org/posts/9iGFjYnRquxiy29jm/safety-timelines-how-long-will-it-take-to-solve-alignment) project was an earlier agenda for Apart Research on estimating when we will have solved alignment. The first steps have already been taken and you're very welcome to continue the work. Contact [Esben](mailto:esben@apartresearch.com) to get more information and links to relevant research data and literature. See [the in-progress Google Doc for the forum post](https://docs.google.com/document/d/1fwi8lAyjdj9t-mENy_kSmnN303rjDmzpO9R9WsNY1wo/edit?usp=sharing).

Open-ended ▲ 1 Open

Review AI safety organizations and agendas

Write an updated review of organizations and agendas in AI safety to build up your foundation for technical research directions to focus on. Read one of the best reviews [here](https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is).

Open-ended ▲ 1 Open

Write up reactionary plans in cases of extreme AI risk

One thing that I'm disappointed doesn't already exist (according to my chats with governance leadership in some organizations) is a set of major reactionary plans for severe risk pathways. An example might be: What should OpenAI explicitly do if the United States government tries to coerce it into becoming a Manhattan Project? And conditional on what, e.g. war with China? Should it delete all models, data and code and issue a public anti-war statement to stay on the good side of the conversation, or stay on the best terms with the US government and comply completely?

Open-ended ▲ 1 Open

Plan out an event for AI safety in your local area

You can write up a detailed plan for an event and coordinate with other people at the [Thinkathon](https://think-ais.devpost.com/) to make it happen as best as possible. This can include a planning document, venue decisions, funding application, partnerships, speakers and participant information sheet.

Open-ended ▲ 1 Open

Create a 3-5 year plan for yourself or your organization

The field of artificial intelligence is a high-variance place at the moment. If you are running a project or planning to enter AI safety, it can give you a sense of direction to create a plan for the next 3-5 years and how you can have an impact on the field. These plans generally also include the extrapolation into a 1 year plan, 6 month plan, 1 month plan and a week plan. Get ready for some action!

Open-ended ▲ 1 Open

Map out the talent pools of AI safety

This strategy project will focus on figuring out where the talent pools of AI safety researchers lie and which of them best match what the research labs need. Originally a project between Jona Glade and Esben Kran.

Open-ended ▲ 1 Open

Enhancing AI safety ecosystem via debates

Regular debates/adversarial collaborations between alignment researchers who disagree on particular topics. Something like the [MIRI 2021 conversations](https://www.lesswrong.com/s/n945eovrA3oDueqtq) but in audio (+video) format and open to people's suggestions about whom we should "pit against" whom and what topic we'd like them to discuss. Spencer Greenberg's podcast has several episodes that can serve as an example: [1](https://podcast.clearerthinking.org/episode/085/amber-dawn-and-holly-elmore-the-clash-between-social-justice-and-anti-wokeness/) [2](https://podcast.clearerthinking.org/episode/114/will-eden-and-sam-rosen-guess-culture-vs-ask-culture/) [3](https://podcast.clearerthinking.org/episode/118/michael-nielsen-and-ajeya-cotra-critiquing-effective-altruism/).

Open-ended ▲ 1 Open

How much can we scale up production in the compute supply chain?

How quickly will we be able to increase the production of hardware and the required computing infrastructure (such as data centers) as we increase the investment in compute for AI? To what extent could ASML, TSMC, and other suppliers across the compute supply chain scale up their production on a 1/5/10 year timescale if the prices of their products went up dramatically (e.g. 10x)? What are the key bottlenecks for such a massive scale-up across the compute supply chain?

Open-ended ▲ 1 Open

Investigating the parameter gap!

In a previous investigation, [Villalobos et al](https://epochai.org/blog/machine-learning-model-sizes-and-the-parameter-gap) identified a “parameter gap” – that is, a surprising lack of notable ML models with sizes between 1e10 and 1e11 parameters. Investigate whether the proposed hypotheses in the paper are accurate or identify new hypotheses to explain the data.

Open-ended ▲ 1 Open

Investigate trends in memory bandwidth, latency and price of memory

Memory is a big challenge for modern deep learning, required for storing things like parameter values and intermediate gradient computations ([Weng, 2021](https://lilianweng.github.io/posts/2021-09-25-train-large/)). How have memory bandwidth, latency, and the price of memory changed over time?

Open-ended ▲ 1 Open

Insights-based models of AI timelines

The Median Group previously proposed a model of [AI timelines based on key “insights”](http://mediangroup.org/insights) required on the way to AGI development. However, the current model is based on outdated and poorly curated data, and there are some questionable methodological choices. Collect data that is more up-to-date, and redo the model – how do your results compare to more well-known timelines models?

Open-ended ▲ 1 Open

Rethinking the evolutionary anchor

In Forecasting TAI with biological anchors, Ajeya Cotra proposes the “evolutionary anchor” as a hypothesis for the compute needed to train generally intelligent systems, based on “the total FLOP performed over the course of evolution, since the first neurons” ([Cotra, 2020](https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines)). But there have been some concerns about whether this definition is appropriate: it does not account for the compute needed to simulate the environment ([Sempere, 2022](https://forum.effectivealtruism.org/posts/FHTyixYNnGaQfEexH/a-concern-about-the-evolutionary-anchor-of-ajeya-cotra-s)), and anthropic considerations might prove highly important ([Erdil, 2022](https://www.lesswrong.com/posts/NHvspuLiirJwiLtfg/do-anthropic-considerations-undercut-the-evolution-anchor)). Assess the significance of these concerns, and reassess the viability of the current definition of the anchor.

Open-ended ▲ 1 Open

Brain emulation development

Anders Sandberg looked into a Monte Carlo model of brain emulation development ([Sandberg, 2014](http://www.aleph.se/papers/Monte%20Carlo%20model%20of%20brain%20emulation%20development.pdf)). However, this paper is now old and has outdated estimates. Replicate the methodology of this paper – what are the new results?

Open-ended ▲ 1 Open

Profiler to measure compute

Compute is one of the key inputs in machine learning, very predictive of performance and relatively easy to measure. However, compute usage typically isn’t reported even in top journal articles. Part of the reason for this is the lack of good profiling tools in GPUs and/or machine learning frameworks. The task is thus to implement an open-source solution into a framework like PyTorch. This could help shift the community's norms towards more transparent reporting, which in turn would create a lever for AI governance interventions. Lennart Heim has an extensive draft on this issue he would be happy to share on request.
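As a starting point, here is a minimal sketch of the kind of number such a tool would report: an analytic FLOP estimate for a stack of dense layers. It is framework-independent and is not PyTorch's API. The function names, the 2 × inputs × outputs count per layer, and the rule of thumb that the backward pass costs about twice the forward pass are all assumptions made for illustration, not measurements.

```python
def dense_forward_flops(layer_sizes, batch_size):
    """Estimate FLOPs for one forward pass through a stack of dense layers.

    Counts 2 * n_in * n_out FLOPs per example per layer (one multiply and one
    add per weight). Biases and activation functions are ignored.
    """
    return sum(
        2 * n_in * n_out * batch_size
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


def training_flops(layer_sizes, batch_size, steps, backward_multiplier=2):
    """Estimate total training FLOPs over a number of optimizer steps.

    Assumes the backward pass costs `backward_multiplier` times the forward
    pass. This is a rule of thumb, not a measured value.
    """
    forward = dense_forward_flops(layer_sizes, batch_size)
    return forward * (1 + backward_multiplier) * steps


if __name__ == "__main__":
    # Hypothetical small network: 784 inputs, 512 hidden units, 10 outputs.
    sizes = [784, 512, 10]
    print(dense_forward_flops(sizes, batch_size=32))
    print(training_flops(sizes, batch_size=32, steps=100))
```

A real profiler would need to cover many more operation types and would ideally be validated against hardware counters, but an estimate like this gives a baseline to compare measured values against.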

Open-ended ▲ 1 Open

AI development vignettes

Write down qualitative and concrete stories about AI development, exploring the possible risks and societal consequences. The emphasis here should be on detail, and you should take potential hardware, algorithmic, and data constraints into account (e.g. what happens if Moore’s law ends in a few years?).

Open-ended ▲ 1 Open

Study training run lengths

Epoch worked out a theoretical upper bound to [training run clock length](https://epochai.org/blog/the-longest-training-run) of 14-15 months. Empirically investigate trends in training run lengths and see how they compare to this theoretical upper bound. What are the reasons for any discrepancies? This would require building a dataset of training run lengths.

Open-ended ▲ 1 Open

Paradigm changes in AI

What were the major paradigm shifts in different domains of AI? By talking to domain experts, reading lit reviews and popular papers, discern what methods were popular at each point in time and compile a list of these domain-specific paradigm shifts. Such a list allows us to use [Laplace’s rule](https://www.lesswrong.com/posts/wE7SK8w8AixqknArs/a-time-invariant-version-of-laplace-s-rule) to estimate a base rate of paradigm changes in AI.
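For reference, the classic form of Laplace's rule of succession estimates the probability of an event in the next period as (s + 1) / (n + 2), given s occurrences in n observed periods. The sketch below implements that classic form only, not the time-invariant variant described in the linked post, and the counts in the example are hypothetical placeholders rather than real data on paradigm shifts.

```python
from fractions import Fraction


def laplace_rule(occurrences, periods):
    """Classic rule of succession: P(event in next period) = (s + 1) / (n + 2)."""
    if occurrences < 0 or periods < occurrences:
        raise ValueError("need 0 <= occurrences <= periods")
    return Fraction(occurrences + 1, periods + 2)


if __name__ == "__main__":
    # Hypothetical input: 3 paradigm shifts observed across 20 one-year periods.
    estimate = laplace_rule(3, 20)
    print(estimate, float(estimate))
```

Note that the classic rule depends on how time is divided into periods, which is the issue the time-invariant version is meant to address.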

Open-ended ▲ 1 Open

Algorithmic breakthroughs in machine learning history

What were the major algorithmic innovations in machine learning over the last two decades? This could be structured as a literature review or as a survey of experts, culminating in a big list of the key algorithmic advances over the last ~20 years. Such a database helps us understand the frequency and significance of algorithmic insights.

Open-ended ▲ 1 Open

Revisiting ‘Is AI Progress Impossible To Predict?’

Alyssa Vance argued that AI progress on a task from one model to the next was unpredictable ([Vance, 2022](https://www.lesswrong.com/posts/G993PFTwqqdQv4eTg/is-ai-progress-impossible-to-predict)). Can we investigate this in more detail? For instance, the authors of Beyond the Imitation Game (BIG-bench) find that for tasks where progress is “jumpy”, there are usually progress metrics that vary more smoothly ([Srivastava et al., 2022](https://arxiv.org/abs/2206.04615)). Can we use those metrics to predict progress?
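One way a metric can look "jumpy" while an underlying quantity changes smoothly is sketched below. It assumes, purely for illustration, that a model must get every token of a fixed-length answer right and that tokens are independent with the same per-token probability. Exact-match accuracy is then the per-token probability raised to the answer length, while log-likelihood is a smoother function of the same quantity. The function names and input values are hypothetical.

```python
import math


def exact_match_probability(per_token_prob, answer_length):
    """Probability that all tokens are correct, assuming independent tokens."""
    return per_token_prob ** answer_length


def log_likelihood(per_token_prob, answer_length):
    """Log-probability of the full correct answer under the same assumption."""
    return answer_length * math.log(per_token_prob)


def metric_table(per_token_probs, answer_length):
    """Return (per-token prob, exact-match prob, log-likelihood) rows."""
    return [
        (p, exact_match_probability(p, answer_length), log_likelihood(p, answer_length))
        for p in per_token_probs
    ]


if __name__ == "__main__":
    # Hypothetical per-token accuracies for successively stronger models.
    for p, em, ll in metric_table([0.5, 0.7, 0.9, 0.99], answer_length=10):
        print(f"p={p:.2f}  exact_match={em:.4f}  log_likelihood={ll:.2f}")
```

Under these assumptions, exact match stays near zero for the weaker models and then rises sharply, while log-likelihood improves at every step. Whether real benchmark tasks behave this way is the empirical question the project asks.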
