Open-ended
Open

Cross-lingual generalizability of LLM evals

Steps:

  • take existing LLM evals (e.g., for a specific dangerous capability)
  • auto-translate the eval dataset (make sure to sanity-check the translations)
  • run the same eval(s) in different languages using a multilingual model
  • compare outcomes across languages
  • repeat testing and analysis for different LLMs
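
The steps above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `run_cross_lingual_eval`, the `translate` callable, and the `models` mapping are all hypothetical names, and exact-match scoring is an assumption that will not suit every eval (free-form or refusal-based evals need a different scorer).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# One eval item: (prompt, expected answer). This shape is an assumption.
EvalItem = Tuple[str, str]


def run_cross_lingual_eval(
    items: List[EvalItem],
    languages: List[str],
    translate: Callable[[str, str], str],    # hypothetical: (text, target_lang) -> text
    models: Dict[str, Callable[[str], str]],  # hypothetical: name -> (prompt -> answer)
) -> Dict[str, Dict[str, float]]:
    """Return accuracy[model_name][language] using exact-match scoring."""
    if not items:
        raise ValueError("eval dataset is empty")

    accuracy: Dict[str, Dict[str, float]] = defaultdict(dict)
    for lang in languages:
        # Translate prompts and reference answers once per language.
        # In practice, sanity-check a sample of these by hand before running.
        translated = [(translate(p, lang), translate(a, lang)) for p, a in items]
        for name, model in models.items():
            correct = sum(
                model(prompt).strip().lower() == answer.strip().lower()
                for prompt, answer in translated
            )
            accuracy[name][lang] = correct / len(translated)
    return dict(accuracy)
```

Passing the translator and the models in as plain callables keeps the sketch independent of any particular translation service or inference API; each can be swapped out without touching the comparison loop.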

Are the results comparable? Do the outcomes scale similarly with model size in all languages?
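
One possible way to make the scaling question concrete is to fit, per language, the slope of accuracy against the logarithm of parameter count and then compare the slopes. The sketch below assumes a roughly log-linear trend, which may not hold for a given eval; `scaling_slope` and `slopes_by_language` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple

# One measurement: (parameter count, accuracy). This shape is an assumption.
Point = Tuple[float, float]


def scaling_slope(points: List[Point]) -> float:
    """Least-squares slope of accuracy versus log10(parameter count)."""
    xs = [math.log10(n_params) for n_params, _ in points]
    ys = [acc for _, acc in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct model sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_language(results: Dict[str, List[Point]]) -> Dict[str, float]:
    """Map each language to its fitted scaling slope."""
    return {lang: scaling_slope(points) for lang, points in results.items()}
```

Similar slopes across languages would suggest the capability scales comparably; a markedly flatter slope in one language would be worth checking against translation quality before drawing conclusions about the model.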

NLP, Deep Learning, AI Governance
