AI Safety Ideas
Open-ended
Open

Classify training data into risk levels and publish segmented pre-training datasets

by Esben Kran

Given a pre-training dataset such as The Pile, develop a useful ontology for classifying its contents into risk levels, so that the higher-risk parts can be excluded from training.
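As a rough illustration of what classifying into risk levels could look like in code, here is a minimal sketch. The three-level `RiskLevel` ontology, the keyword lists, and the `classify` and `exclude_above` helpers are all hypothetical placeholders, not part of the proposal; a real project would likely need a learned classifier and a carefully argued ontology.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Hypothetical three-level ontology, for illustration only."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Placeholder keyword rules (assumptions, not a vetted taxonomy).
RISK_KEYWORDS = {
    RiskLevel.HIGH: ("malware", "exploit code"),
    RiskLevel.MEDIUM: ("deception", "manipulation"),
}


def classify(text: str) -> RiskLevel:
    """Return the highest risk level whose keywords appear in the text."""
    lowered = text.lower()
    # Check the most severe level first so it wins when several match.
    for level in sorted(RISK_KEYWORDS, reverse=True):
        if any(keyword in lowered for keyword in RISK_KEYWORDS[level]):
            return level
    return RiskLevel.LOW


def exclude_above(documents: list[str], max_level: RiskLevel) -> list[str]:
    """Keep only documents classified at or below max_level."""
    return [doc for doc in documents if classify(doc) <= max_level]
```

Calling `exclude_above(docs, RiskLevel.MEDIUM)` would drop only the documents this toy rule marks as `HIGH`. Keyword matching is used here purely because it is easy to read; it would misclassify a great deal of real text.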

To define the ontology, you can use the

Once the segments of the dataset have been classified, publish the segmented datasets in a public repository so they are easy to work with. Then publish a paper on the "alignment tax": the cost in capability, measured on a benchmark such as MMLU, of removing parts of the dataset.
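The two steps above could be sketched roughly as follows. This assumes each document already carries a risk label from some classifier, that one JSONL file per label is an acceptable release format, and that the "alignment tax" is read as a simple difference in benchmark score between a model trained on the full dataset and one trained on the filtered dataset. The function names and the file naming scheme are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def write_segments(labeled_docs, out_dir):
    """Write one JSONL file per risk label and return {label: path}.

    labeled_docs: iterable of (text, label) pairs. The labels are
    assumed to come from an existing classifier.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group documents by their risk label.
    segments = defaultdict(list)
    for text, label in labeled_docs:
        segments[label].append(text)

    # Hypothetical naming scheme: risk_<label>.jsonl
    paths = {}
    for label, texts in segments.items():
        path = out_dir / f"risk_{label}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for text in texts:
                f.write(json.dumps({"text": text, "risk": label}) + "\n")
        paths[label] = path
    return paths


def alignment_tax(full_score: float, filtered_score: float) -> float:
    """Capability lost by filtering, in the benchmark's own units.

    Assumes both scores come from the same benchmark and the same
    training setup, differing only in the data that was excluded.
    """
    return full_score - filtered_score
```

A positive result from `alignment_tax` would mean the filtered model scored lower than the full-data model; a value near zero would suggest that the removed segments contributed little to the measured capability.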
