
Welcome

The first metaevaluation result, BenchRisk-ChatBot-v1.0, has been released to general availability. Anyone who relies on its representations about the reliability of different benchmarks is advised to read the associated research paper.


Published on October 22, 2025
BenchRisk-ChatBot-v1.0
LLM Testing BenchRisk Evaluation Metaevaluation Safety
Announcing the release of BenchRisk-ChatBot-v1.0, which evaluates the reliability of ChatBot benchmarks published as evidence for real-world decisions.