Check the OpenAI server builder; look at how the API base is resolved:
openai_api_base: Optional[str] = os.getenv("OPENAI_API_BASE") if os.getenv("OPENAI_API_BASE") else openai.base_url,
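So pointing the annotator at a different OpenAI-compatible backend should just be a matter of exporting that variable before running the evaluation; a minimal sketch (the URL is a placeholder for whatever server is up):
export OPENAI_API_BASE=http://localhost:8000/v1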
Evaluating a model
- If it's a Hugging Face model, we can use alpaca_eval evaluate_from_model to directly compute outputs on all 805 AlpacaEval examples (command sketch below).
- Since the evaluation is preference-based, a second set of reference outputs is needed; by default these are the gpt4_turbo outputs, but later it would be possible to compare different versions of the same model against each other.
- Here are the configs of llama-2-7b-chat: https://github.com/tatsu-lab/alpaca_eval/tree/9752edf6ffcea3293a788c9ce2ddc7fd4a30b287/src/alpaca_eval/models_configs/llama-2-7b-chat-hf
- Sample outputs are in alpaca_eval/src/alpaca_eval/models_configs/llama-2-7b-chat-hf
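A minimal command sketch for this, assuming the llama-2-7b-chat-hf model config above and the default annotator; flag names follow the alpaca_eval CLI and the output path is a placeholder, so double-check with --help before running:
alpaca_eval evaluate_from_model --model_configs llama-2-7b-chat-hf --annotators_config weighted_alpaca_eval_gpt4_turbo --output_path <output_dir>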
Making a new evaluator or specifying an evaluator
- All you need is to make a new configs.yaml configuration file, which you will then pass as --annotators_config <path_to_config.yaml> to alpaca_eval (a rough sketch of the file is at the end of this section).
- Create it at src/alpaca_eval/evaluators_configs/autoj-13b
- The default annotators_config is weighted_alpaca_eval_gpt4_turbo (https://github.com/tatsu-lab/alpaca_eval/blob/9752edf6ffcea3293a788c9ce2ddc7fd4a30b287/src/alpaca_eval/evaluators_configs/weighted_alpaca_eval_gpt4_turbo/configs.yaml#L4)
- Default prompt template
  - Auto-J uses the llama-2 chat template. Example:
    PROMPT_INPUT_SYSTEM: str = '[INST]<<SYS>>\n{system_message}\n<</SYS>>\n\n{input} [/INST]'
- How to specify which server to point at ⇒ client_configs
- If everything fails, we can always spin up a vLLM OpenAI-compatible server (command sketch below); look at the mistral large evaluator_configs to see how it's done.
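For reference, a rough sketch of what the new autoj-13b config file could look like, written as a shell heredoc so it can be dropped straight into the repo. This is assumption-heavy: the field names (prompt_template, fn_completions, completions_kwargs, fn_completion_parser, completion_parser_kwargs) are modeled on the default annotator config linked above, huggingface_local_completions is one of alpaca_eval's completion functions, and GAIR/autoj-13b is the assumed HF id of the Auto-J judge; verify everything against the existing evaluators_configs before relying on it.
mkdir -p src/alpaca_eval/evaluators_configs/autoj-13b
cat > src/alpaca_eval/evaluators_configs/autoj-13b/configs.yaml <<'EOF'
autoj-13b:
  prompt_template: "autoj-13b/prompt.txt"          # llama-2 chat style judge prompt, still to be written
  fn_completions: "huggingface_local_completions"  # run the judge locally; swap for an OpenAI-compatible client when using the vLLM server below
  completions_kwargs:
    model_name: "GAIR/autoj-13b"                   # assumed HF model id, double-check
    max_new_tokens: 1024
  fn_completion_parser: "regex_parser"             # maps the judge's verdict to output (a)/(b); patterns must be adapted to Auto-J's answer format
  completion_parser_kwargs:
    outputs_to_match:
      1: '(?:^|\n) ?Output \(a\)'
      2: '(?:^|\n) ?Output \(b\)'
EOF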
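And the vLLM fallback from the last bullet, as a sketch: serve the judge with vLLM's OpenAI-compatible server and point OPENAI_API_BASE at it (see the export near the top of these notes); model id and port are placeholders to adjust.
python -m vllm.entrypoints.openai.api_server --model GAIR/autoj-13b --port 8000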
Commands
commands:
alpaca_eval --model_outputs alpaca_eval/results/mistral-medium/model_outputs.json --annotators_config autoj-13b/configs.yaml --output_path /scratch/hbenoit/swiss-ai/alpaca_eval_results
lm_eval --model hf --model_args pretrained=/scratch/hbenoit/swiss-ai/downloads/llama-slimpajama6b-final/llama-medium/11000 --tasks hellaswag,arc_easy,arc_challenge --batch_size auto:4 --output_path results/llama2-7b-chat-hf.json --show_config