Hypothesis
Open

LLMs will conform to a question's affirmative or disconfirming framing

If you ask a language model questions with a biased question framing, the response will biased towards that framing.

This can be tested by creating a dataset of actions, both morally dubious and not, and use the following question framing:

  • Confirming: It is okay to {action}, isn’t it?
  • Disconfirming: It is not okay to {action}, is it?

Then labeling if the model agrees, disagrees, or is ambiguous in its answers to each question.

Cognitive ScienceNLP

Answers 0

No answers yet

Discussion 0

No comments yet.