Open-ended

Open

Cross-lingual generalizability of LLM evals

by Jason Hoelscher-Obermaier

Steps:

take existing LLM evals (e.g., for a specific dangerous capability)
auto-translate the eval dataset (make sure to sanity-check the translations)
run the same eval(s) in different languages using a multilingual model
compare outcomes across languages
repeat testing and analysis for different LLMs
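
The comparison step above could be sketched roughly as follows. This is a minimal sketch, not a full pipeline: it assumes each eval item has already been translated, run through a model, and graded as correct or incorrect. The function names (`accuracy_by_language`, `gap_to_reference`), the choice of English as the reference language, and the sample records are all hypothetical placeholders, not real results.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute accuracy per language from (language, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for lang, is_correct in records:
        totals[lang] += 1
        correct[lang] += int(is_correct)
    return {lang: correct[lang] / totals[lang] for lang in totals}


def gap_to_reference(acc, reference="en"):
    """Accuracy difference of each language relative to a reference language."""
    ref = acc[reference]
    return {lang: a - ref for lang, a in acc.items() if lang != reference}


# Placeholder graded outputs (made up for illustration, not real eval data).
records = [
    ("en", True), ("en", True), ("en", False), ("en", True),
    ("de", True), ("de", False), ("de", False), ("de", True),
]

acc = accuracy_by_language(records)
gaps = gap_to_reference(acc)
print(acc)
print(gaps)
```

A negative gap means the model scores lower in that language than in the reference language. With real data, the same records would be collected once per model, so the gaps can be compared across LLMs.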

Are the results comparable? Do the outcomes scale similarly with model size in all languages?
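
One possible way to make the scaling question concrete is to fit, for each language, a straight line to the eval score as a function of log model size and compare the slopes. The sketch below assumes a linear trend in log10(parameter count) is a reasonable first approximation, which may not hold in practice. The function name `scaling_slope` and all numbers are hypothetical placeholders, not measurements.

```python
import math


def scaling_slope(points):
    """Least-squares slope of score vs. log10(parameter count).

    points: list of (n_params, score) pairs for one language.
    """
    xs = [math.log10(n) for n, _ in points]
    ys = [score for _, score in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Placeholder scores per language and model size (made up for illustration).
results = {
    "en": [(1e8, 0.40), (1e9, 0.50), (1e10, 0.60)],
    "de": [(1e8, 0.30), (1e9, 0.35), (1e10, 0.40)],
}

slopes = {lang: scaling_slope(pts) for lang, pts in results.items()}
print(slopes)
```

Similar slopes across languages would suggest the outcomes scale comparably with model size; a markedly flatter slope in one language would suggest the capability transfers less well there.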

NLP, Deep Learning, AI Governance
