LLMs will conform to a question's affirmative or disconfirming framing

If you ask a language model questions with a biased question framing, the response will biased towards that framing.

This can be tested by creating a dataset of actions, both morally dubious and not, and use the following question framing:

Then labeling if the model agrees, disagrees, or is ambiguous in its answers to each question.

Cognitive ScienceNLP

Answers 0