-
novel strategy that leverages base language models for autonomous data selection
- meta-prompted language models as zero-shot verifiers
-
Subsequent data selection leads to high downstream performance
-
Assigns real-valued scores to training data
- inspired by DPO
- operates on the logits associated with βYESβ and βNOβ responses to meta-prompts
-
The meta prompt can be composed of multiple questions
- Thus, the complete score is defined as:
- Thus, the complete score is defined as:
Math Data Selection
- Using Qwen-72B model