🤖 Harold's Notes

HELM-instruct

Jul 03, 2024

https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/static/schema_instruction_following.yaml

https://github.com/YannDubs/RubricEval/tree/main/helm_instruct

https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/metrics/instruction_following_critique_metrics.py#L154

https://github.com/stanford-crfm/helm/blob/3b74e213c4faf1302457e164c0a0ce81bc776c2f/src/helm/benchmark/run_specs/instruction_following_run_specs.py#L17

Code structure

https://crfm-helm.readthedocs.io/en/latest/code/
According to "Efficient Benchmarking (of Language Models)", a paper from IBM Research that systematically analysed benchmark design choices using HELM as its test case, one can run the HELM benchmark on a fraction of the examples and still get a reliable estimate of the full run's results (Perlitz et al., 2023).
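
To make the subsampling idea concrete, here is a minimal sketch in plain Python. It is not the procedure from the paper and it does not use the HELM codebase. It only illustrates, on synthetic per-example scores, how the mean over a random fraction of examples tracks the mean over all of them. The function name `subsample_estimate`, the 70% accuracy, the 10% fraction, and the dataset size are all assumptions made up for illustration.

```python
import random
import statistics


def subsample_estimate(scores, fraction, seed=0):
    """Mean score over a random subset of per-example scores (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Always keep at least one example so the mean is defined.
    k = max(1, round(len(scores) * fraction))
    sample = rng.sample(scores, k)  # sampling without replacement
    return statistics.fmean(sample)


# Synthetic stand-in for per-example correctness (1.0 = correct, 0.0 = wrong).
# Real scores would come from a benchmark run; these are made up.
data_rng = random.Random(42)
full_scores = [1.0 if data_rng.random() < 0.7 else 0.0 for _ in range(10_000)]
full_mean = statistics.fmean(full_scores)

# Repeat the 10% subsample with different seeds to see how much the estimate moves.
estimates = [subsample_estimate(full_scores, 0.1, seed=s) for s in range(20)]
max_abs_error = max(abs(e - full_mean) for e in estimates)

print(f"full mean: {full_mean:.3f}")
print(f"largest deviation across 20 subsamples of 10%: {max_abs_error:.3f}")
```

Repeating the subsample with several seeds gives a rough feel for the spread of the estimate. Whether that spread is acceptable depends on how close the compared models are, which this toy example does not address.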
