https://github.com/YannDubs/RubricEval/tree/main/helm_instruct
Code structure
-
According to "Efficient Benchmarking (of Language Models)", a paper from IBM Research that systematically analysed benchmark design choices using HELM as an example, one can run the HELM benchmark on a fraction of its examples and still get a reliable estimate of the full-run results (Perlitz et al., 2023).
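
To make the subsampling idea concrete, here is a minimal toy sketch. It is not the procedure from the paper and does not use HELM data: the per-example scores are synthetic, and every name in it (`simulate_scores`, `ranking`, `kendall_tau`, `subsample_agreement`) is hypothetical. It only illustrates one plausible way to check whether a model ranking computed on a random fraction of the examples agrees with the ranking from the full set.

```python
import random


def simulate_scores(true_accuracies, n_examples, rng):
    """Synthetic per-example 0/1 correctness for each hypothetical model."""
    return [
        [1 if rng.random() < acc else 0 for _ in range(n_examples)]
        for acc in true_accuracies
    ]


def ranking(scores, indices):
    """Model indices ordered from best to worst mean score on `indices`."""
    indices = list(indices)
    means = [sum(row[i] for i in indices) / len(indices) for row in scores]
    return sorted(range(len(scores)), key=lambda m: means[m], reverse=True)


def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two orderings of the same items."""
    pos_a = {m: i for i, m in enumerate(rank_a)}
    pos_b = {m: i for i, m in enumerate(rank_b)}
    items = list(pos_a)
    concordant = discordant = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            x, y = items[i], items[j]
            # Positions are distinct, so the product is never zero.
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


def subsample_agreement(scores, fraction, rng):
    """Agreement between the full ranking and a random-subset ranking."""
    n_examples = len(scores[0])
    k = max(1, int(n_examples * fraction))
    full = ranking(scores, range(n_examples))
    sub = ranking(scores, rng.sample(range(n_examples), k))
    return kendall_tau(full, sub)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed "true" accuracies of five made-up models.
    scores = simulate_scores([0.50, 0.58, 0.66, 0.74, 0.82], 2000, rng)
    for fraction in (0.01, 0.05, 0.25, 1.0):
        tau = subsample_agreement(scores, fraction, rng)
        print(f"fraction={fraction:.2f}  kendall_tau={tau:.2f}")
```

A Kendall tau close to 1 means the subset preserves the full ranking. How small the fraction can be depends on how far apart the models actually are, so results from this toy setup should not be read as evidence about HELM itself.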