Welcome

The first metaevaluation result, BenchRisk-ChatBot-v1.0, has been released to general availability. Anyone who relies on its representations about the reliability of different benchmarks is advised to read the associated research paper.
Learn more about this work