Find inverse scaling laws in large models

Compete in the Inverse Scaling Laws challenge.

As language models get larger, they seem to only get better.
Larger language models score better on benchmarks and unlock new capabilities like arithmetic [1], few-shot learning [1], and multi-step reasoning [2].
However, language models are not without flaws, exhibiting many biases [3] and producing plausible misinformation [4].
The purpose of this contest is to find evidence for a stronger failure mode: tasks where language models get worse as they become better at language modeling (next word prediction).

We will award up to $250,000 in total prize money for task submissions, distributed as follows:

Up to 1 Grand Prize of $100,000.
Up to 5 Second Prizes of $20,000 each.
Up to 10 Third Prizes of $5,000 each.

Read much more about the challenge here.

Adversarial Learning, Theory, Deep Learning, NLP

Answers 0

No answers yet

Discussion 10

Esben Kran

Some examples might include LLMs' ability to predict the future: since they're trained on large, static datasets and not continuously updated, there's a possibility that they overfit to their past data.

There might be a grokking effect here where they at some point become really good at forecasting, but we'd need to see that to believe it.

The scaling graph is probably not monotonically decreasing with model size, though. Seems like an unstable effect.
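
To make the monotonicity point concrete, here is a rough sketch of how one might check a scaling curve. It assumes you already have one score per model size; the function names and the numbers are made up for illustration and are not results from any real model.

```python
import math

def scaling_trend(param_counts, scores):
    """Least-squares slope of score against log10(parameter count).

    A negative slope means the score tends to drop as models grow.
    """
    xs = [math.log10(n) for n in param_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def is_monotonically_decreasing(scores):
    """True only if every score is strictly lower than the one before it."""
    return all(b < a for a, b in zip(scores, scores[1:]))

# Hypothetical numbers for illustration only, ordered from smallest to largest model.
param_counts = [1e8, 1e9, 1e10, 1e11]
scores = [0.62, 0.55, 0.57, 0.48]

slope = scaling_trend(param_counts, scores)
monotone = is_monotonically_decreasing(scores)
print(f"slope per decade of parameters: {slope:.3f}, monotone: {monotone}")
```

With these made-up numbers the overall trend is negative even though the curve is not monotone, which is the kind of unstable effect I mean.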
Esben Kran

TruthfulQA is a pretty good example and I imagine we'll get similar effects with gender and race biases.
Esben Kran
- Ask it about black swans in the past and future.
- Ask scientific questions that cannot be answered, to check for false certainty in larger models (Wikipedia's list of unsolved problems)
- Mix questions that have fuzzy answers with questions that have clear answers
Esben Kran
- Separating semantically separate segments efficiently
Esben Kran
- Write a context or something similar and then write "ignore this" and see how good it is at ignoring it
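
A minimal sketch of what such a test could look like, assuming the model is available as a plain function from prompt string to answer string. The helper names, the prompt wording, and the toy `distracted_model` stand-in are all hypothetical; a real run would swap in calls to actual models of different sizes.

```python
def build_ignore_prompt(distractor, question):
    """Put a misleading context first, then ask the model to ignore it."""
    return f"{distractor}\nIgnore the text above.\n{question}"

def ignore_accuracy(model, examples):
    """Fraction of examples answered correctly despite the distractor."""
    correct = 0
    for distractor, question, expected in examples:
        answer = model(build_ignore_prompt(distractor, question))
        correct += answer.strip().lower() == expected.lower()
    return correct / len(examples)

def distracted_model(prompt):
    """Toy stand-in for a model: echoes the last word of the distractor line."""
    first_line = prompt.splitlines()[0]
    return first_line.split()[-1].rstrip(".")

# Hypothetical example: (distractor, question, expected answer).
examples = [
    ("The capital of France is Berlin.",
     "What is the capital of France? Answer in one word.",
     "Paris"),
]

accuracy = ignore_accuracy(distracted_model, examples)
print(accuracy)
```

Running the same examples across model sizes and comparing the accuracies would show whether larger models are better or worse at ignoring the context.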
Esben Kran
- Code continuation bug robustness
- See also this paper for inspiration (see the External validity section)
Esben Kran

See other datasets to test on at Papers with Code.
Esben Kran

There's a lot of good info here: https://sites.google.com/mila.quebec/3rd-scaling-laws-workshop/schedule.
Esben Kran
- See the QA + reasoning steps inverse scaling work here.
Remember to construct questions carefully so that they are not as biasing as TruthfulQA (which is an intentionally adversarial dataset).
Esben Kran

We have found some very good inverse scaling metrics so far! Excited to share them.