Top-k sampling

  • Restrict sampling to the k most probable tokens: mask out everything else, renormalize the surviving probabilities, and draw from that truncated distribution
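
A minimal pure-Python sketch of this idea (illustrative only, not any particular library's API): keep the k largest logits, renormalize with a softmax over just those, and sample.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from only the k highest-scoring logits."""
    # Indices of the k largest logits; all other tokens get probability 0
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (max-subtraction for numerical stability)
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one surviving index according to the renormalized distribution
    return rng.choices(top, weights=probs, k=1)[0]
```

With k=2, only the two most likely tokens can ever be emitted, no matter how long the tail of the vocabulary is.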

Temperature

  • Let T be the temperature and z_i the logits. Then p_i = exp(z_i / T) / Σ_j exp(z_j / T). If T < 1, this makes the resulting softmax distribution more peaked; if T > 1, it becomes more uniform. Increasing the temperature makes the model more “creative” but typically less accurate.
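
The formula above can be sketched directly (a pure-Python illustration of the scaled softmax, not a library call):

```python
import math

def softmax_with_temperature(logits, T):
    """Softmax over logits divided by temperature T: T < 1 sharpens, T > 1 flattens."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # max-subtraction for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with logits [2, 1, 0], the top token's probability is noticeably larger at T = 0.5 than at T = 2, matching the "peaked vs. uniform" behavior described above.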

In-context learning

  • Provide the model with few-shot demonstrations (input–output pairs) as part of the input; the model adapts to the task without any weight updates
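
As a sketch, a few-shot prompt is just demonstrations concatenated before the test query (the "Input:/Output:" template below is one common convention, not a fixed standard):

```python
def few_shot_prompt(demos, query):
    """Build an in-context learning prompt from (input, output) demonstration pairs."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    # The prompt ends with an open "Output:" for the model to complete
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

For instance, `few_shot_prompt([("2+2", "4"), ("3+3", "6")], "4+4")` yields a prompt whose completions should follow the demonstrated format.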

Chain of thought (CoT)

  • CoT, introduced by Wei et al. (2023), enables an LLM to condition its generation on its intermediate reasoning steps when answering multi-step problems, thereby augmenting the LLM’s reasoning ability on complex problems such as math word problems.
  • This technique is especially useful for complex reasoning tasks that a human could not solve at a glance. Since individual tokens can carry very different amounts of information, generating intermediate reasoning tokens gives the model "time to think."
  • Few-shot CoT: chain-of-thought prompting in which a few worked reasoning demonstrations are provided as exemplars in the prompt.
  • Zero-shot CoT: simply append “Let’s think step by step” to the prompt, with no demonstrations.
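
The zero-shot variant is literally a one-line prompt transformation (the Q:/A: framing is a common convention, not required):

```python
def zero_shot_cot_prompt(question):
    """Append the zero-shot CoT trigger phrase so the model reasons before answering."""
    return f"Q: {question}\nA: Let's think step by step."
```

The model then continues the "A:" line with intermediate reasoning, and the final answer is typically extracted from the end of the generation.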

Self-consistency CoT (SC-CoT)

Wang et al. (2023) found that sampling multiple reasoning traces and answers from the model and selecting the final answer through majority voting can significantly improve large language model performance on multiple-choice question-answering benchmarks. Meditron applies SC-CoT prompting with a decoding temperature of 0.8, sampling 5 generations, extracting the answer option from each generation, and using majority voting to select the final prediction.
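
The procedure above can be sketched as follows. Here `generate` is a hypothetical stand-in for a sampled LLM call, and the regex assumes answers are stated as "the answer is (B)"; real pipelines use more robust answer extraction.

```python
import re
from collections import Counter

def self_consistency(generate, prompt, n=5):
    """Sample n reasoning traces and majority-vote over the extracted answers.

    `generate(prompt)` is a hypothetical sampling call (e.g., temperature 0.8)
    returning one completion string per invocation.
    """
    answers = []
    for _ in range(n):
        text = generate(prompt)
        # Extract a multiple-choice option like "(B)" -- assumed answer format
        m = re.search(r"answer is \(?([A-E])\)?", text)
        if m:
            answers.append(m.group(1))
    if not answers:
        return None  # no parseable answer in any trace
    # Majority vote over the extracted options
    return Counter(answers).most_common(1)[0][0]
```

Traces that fail to produce a parseable answer are simply dropped before voting, so the final prediction comes from the traces that did commit to an option.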