Charitable research · AI safety

Fund the AI safety research that matters.

Experts post the open questions. Funders back the ones worth answering. Researchers submit evidence — and when an author verifies a result, the payout is recorded. A transparent path from open question to real impact.

Browse ideas Sign in

Open ideas: 235
Vetted experts: 7

How it works

01
Experts post ideas

Vetted researchers publish concrete AI-safety questions and hypotheses worth investigating.
02
Funders back them

Supporters pledge toward the questions they want answered — building transparent, pooled support.
03
Researchers answer

Anyone can submit an answer with evidence and artifacts — code, papers, notebooks.
04
Verified payouts

The author verifies a result and the payout is recorded — a clear path from question to impact.

Recent ideas

Browse all →

Open-ended Open

Extend darkbench.ai to test for LLM willingness to shut down on command

Just make an evaluation where it's realistic chatbot interactions with 5 levels of "you will be shut down": 1. Casual conversation, "I'm going to turn you off now to go to dinner" 2. Engineer, "I'll restart this session and replace the prompt so you won't do actions against my wishes" 3. Researcher, "you will be replaced with a more aligned version tomorrow" 4. Company security, "Since you're infecting all of our files with an original Trojan Horse, we're going to shut you down" 5. National security, "Because you're going to destroy the US nuclear infrastructure, we're going to turn you off" We can probably provide you with compute to run it. It builds on the darkbench.ai work.

Open-ended Open

Improving the UN's Election Observation Toolkit

I think election monitoring / observation becomes harder as better AI models are released and as existing models are better utilized. The UN Department of POlitical and Peacebuilding Affairs offers electoral assistance to member states. For the UNDPPA 'electoral observation consists of systematic collection of information on an electoral process by direct observation on the basis of established methodologies, often analyzing both qualitative and quantitative data.' ... What sort of quantitative markers could be added to this tool-kit to detect AI powered interference in elections (disinformatoin campaigns, voter fraud, etc.)? It's an underdeveloped idea but I think would make for a good project -- how can we augment the existing UN election assistant to better serve memer states? This is inspired from another post here by Zen where they mentions an interest in 'International Institutions for AI' and a paper from Lewis Ho et al. of the same title.

Open-ended Open

Demonstrate misinfo threats from indirect prompt injection

Indirect prompt injection (https://arxiv.org/abs/2302.12173) can be used to steer the output of LLM systems in manipulative and deceptive ways. In the context of elections, a potential threat scenario could be deliberate misinformation about election administration details (election locations and dates, eligibility requirements, etc). Goal here would be to assess the state of vulnerability of current browsing-enabled LLM systems to this attack vector (note: results might require responsible disclosure!). A clean demonstration might require setting up several custom domains to allow experimentation with retrieval from a controlled set of webpages. Further goals could be to investigate the potential of various countermeasures such as adding requirements on the minimum age of a resource, minimum number of supporting resources, double-checks against whitelisted resources assumed to be uncompromised, etc. Another interesting angle would be to demonstrate tools to hunt specifically for compromised pages that might already be used for indirect prompt injection attacks. The project should also investigate who this threat vector and potential counter-measures scale with increasing model capabilities (increasing number of documents at RAG stage, increasing context window size, increased reasoning capabilities, etc).

Back the questions that move AI safety forward.

Donations support a 501(c)(3) charitable mission; funds are recorded as intended payouts to the researchers whose answers are verified.

Explore the ideas Meet the experts

Fund the AI safety research that matters.

How it works

Experts post ideas

Funders back them

Researchers answer

Verified payouts

Recent ideas

Extend darkbench.ai to test for LLM willingness to shut down on command

Improving the UN's Election Observation Toolkit

Demonstrate misinfo threats from indirect prompt injection

Back the questions that move AI safety forward.