• https://arxiv.org/pdf/2402.07625

  • Novel strategy that leverages base language models for autonomous data selection

    • meta-prompted language models as zero-shot verifiers
  • Subsequent data selection leads to high downstream performance

  • Assigns real-valued scores to training data

    • inspired by DPO (Direct Preference Optimization)
    • operates on the logits associated with ‘YES’ and ‘NO’ responses to meta-prompts
  • The meta prompt can be composed of multiple questions

    • Thus, the complete score is defined as:
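
The equation itself did not survive in these notes. A reconstruction consistent with the bullets above, assuming the per-question score is a softmax over the two logits and that the scores of multiple questions are multiplied (both should be checked against the paper):

\[
\text{LM-Score}(\cdot) = \frac{\exp(\text{logit}(\text{‘YES’}))}{\exp(\text{logit}(\text{‘YES’})) + \exp(\text{logit}(\text{‘NO’}))},
\qquad
\text{LM-Score}(Q_1, Q_2) = \text{LM-Score}(Q_1) \cdot \text{LM-Score}(Q_2)
\]

A minimal Python sketch of the same computation follows. The function names `lm_score` and `combined_score` are hypothetical, and the logits are assumed to have already been read off the model's next-token distribution for each question in the meta-prompt.

```python
import math
from typing import Iterable, Tuple


def lm_score(logit_yes: float, logit_no: float) -> float:
    """Softmax over the 'YES' and 'NO' logits; returns the mass on 'YES'.

    Assumption: this mirrors the reconstructed formula above, not
    necessarily the paper's exact implementation.
    """
    # Subtract the max logit before exponentiating to avoid overflow.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def combined_score(logit_pairs: Iterable[Tuple[float, float]]) -> float:
    """Multiply the per-question scores (assumed combination rule)."""
    score = 1.0
    for logit_yes, logit_no in logit_pairs:
        score *= lm_score(logit_yes, logit_no)
    return score
```

Each score lies in (0, 1), so the product is also a real value in (0, 1), and a document that scores low on any single question is pulled down overall.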

Math Data Selection

  • Uses the Qwen-72B model