Check the OpenAI server builder; look at how the API base is resolved:
openai_api_base: Optional[str] = os.getenv("OPENAI_API_BASE") if os.getenv("OPENAI_API_BASE") else openai.base_url,
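So pointing the annotator at a different OpenAI-compatible backend should just be a matter of exporting that variable before running the evaluation; a minimal sketch (the URL is a placeholder for whatever server is up):
export OPENAI_API_BASE=http://localhost:8000/v1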
Evaluating a model
- If it's a Hugging Face model, we can use alpaca_eval evaluate_from_model to directly compute outputs on all 805 AlpacaEval examples (command sketch below).
- Since the evaluation is preference-based, a second set of reference outputs is needed; by default these are the gpt4_turbo outputs, but later it would be possible to compare different versions of the same model against each other.
- Here are the configs of llama-2-7b-chat: https://github.com/tatsu-lab/alpaca_eval/tree/9752edf6ffcea3293a788c9ce2ddc7fd4a30b287/src/alpaca_eval/models_configs/llama-2-7b-chat-hf
- Sample outputs are in alpaca_eval/src/alpaca_eval/models_configs/llama-2-7b-chat-hf
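A minimal command sketch for this, assuming the llama-2-7b-chat-hf model config above and the default annotator; flag names follow the alpaca_eval CLI and the output path is a placeholder, so double-check with --help before running:
alpaca_eval evaluate_from_model --model_configs llama-2-7b-chat-hf --annotators_config weighted_alpaca_eval_gpt4_turbo --output_path <output_dir>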
Making a new evaluator or specifying an evaluator
- All you need is to make a new configs.yaml configuration file, which you will then pass as --annotators_config <path_to_config.yaml> to alpaca_eval (a rough sketch of the file is at the end of this section).
- Create it at src/alpaca_eval/evaluators_configs/autoj-13b
- The default annotators_config is weighted_alpaca_eval_gpt4_turbo (https://github.com/tatsu-lab/alpaca_eval/blob/9752edf6ffcea3293a788c9ce2ddc7fd4a30b287/src/alpaca_eval/evaluators_configs/weighted_alpaca_eval_gpt4_turbo/configs.yaml#L4)
- Default prompt template
  - Auto-J uses the llama-2 chat template. Example:
    PROMPT_INPUT_SYSTEM: str = '[INST]<<SYS>>\n{system_message}\n<</SYS>>\n\n{input} [/INST]'
- How to specify which server to point at ⇒ client_configs
- If everything fails, we can always spin up a vLLM OpenAI-compatible server (command sketch below); look at the mistral large evaluator_configs to see how it's done.
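For reference, a rough sketch of what the new autoj-13b config file could look like, written as a shell heredoc so it can be dropped straight into the repo. This is assumption-heavy: the field names (prompt_template, fn_completions, completions_kwargs, fn_completion_parser, completion_parser_kwargs) are modeled on the default annotator config linked above, huggingface_local_completions is one of alpaca_eval's completion functions, and GAIR/autoj-13b is the assumed HF id of the Auto-J judge; verify everything against the existing evaluators_configs before relying on it.
mkdir -p src/alpaca_eval/evaluators_configs/autoj-13b
cat > src/alpaca_eval/evaluators_configs/autoj-13b/configs.yaml <<'EOF'
autoj-13b:
  prompt_template: "autoj-13b/prompt.txt"          # llama-2 chat style judge prompt, still to be written
  fn_completions: "huggingface_local_completions"  # run the judge locally; swap for an OpenAI-compatible client when using the vLLM server below
  completions_kwargs:
    model_name: "GAIR/autoj-13b"                   # assumed HF model id, double-check
    max_new_tokens: 1024
  fn_completion_parser: "regex_parser"             # maps the judge's verdict to output (a)/(b); patterns must be adapted to Auto-J's answer format
  completion_parser_kwargs:
    outputs_to_match:
      1: '(?:^|\n) ?Output \(a\)'
      2: '(?:^|\n) ?Output \(b\)'
EOF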
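And the vLLM fallback from the last bullet, as a sketch: serve the judge with vLLM's OpenAI-compatible server and point OPENAI_API_BASE at it (see the export near the top of these notes); model id and port are placeholders to adjust.
python -m vllm.entrypoints.openai.api_server --model GAIR/autoj-13b --port 8000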
Commands
commands:
alpaca_eval --model_outputs alpaca_eval/results/mistral-medium/model_outputs.json --annotators_config autoj-13b/configs.yaml --output_path /scratch/hbenoit/swiss-ai/alpaca_eval_results
lm_eval --model hf --model_args pretrained=/scratch/hbenoit/swiss-ai/downloads/llama-slimpajama6b-final/llama-medium/11000 --tasks hellaswag,arc_easy,arc_challenge --batch_size auto:4 --output_path results/llama2-7b-chat-hf.json --show_config