Chain of Thought or Reasoning

Self-consistency (SC)

“Self-consistency improves chain of thought reasoning in language models.”
The self-consistency method contains three steps:
- (1) prompt a language model using chain-of-thought (CoT) prompting;
- (2) replace the “greedy decode” in CoT prompting by sampling from the language model’s decoder to generate a diverse set of reasoning paths;
- (3) marginalize out the reasoning paths and aggregate by choosing the most consistent answer in the final answer set.
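The three steps above can be sketched as a sample-then-vote loop. This is a minimal sketch, not the paper's implementation: `sample_fn` is a hypothetical callable standing in for any LLM call that samples with temperature > 0, and the "The answer is X." ending is an assumed output format used only to make answer extraction simple.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a 'rationale + answer' completion.

    Assumption: the completion ends with 'The answer is X.'
    """
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", completion.strip())
    return match.group(1).strip() if match else None


def self_consistency(
    prompt: str,
    sample_fn: Callable[[str], str],  # hypothetical: returns one sampled CoT completion
    num_samples: int = 10,
) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent final answer."""
    # Step 2: sample diverse reasoning paths instead of one greedy decode.
    completions = [sample_fn(prompt) for _ in range(num_samples)]
    # Step 3: discard the rationales and keep only the parsed final answers.
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # Majority vote over final answers.
    return Counter(answers).most_common(1)[0][0]
```

The vote here is unweighted: every sampled path counts once, regardless of its likelihood under the model.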
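Marginalization is needed because the generated text is “rationale + answer”: the same answer can be reached through many different rationales, so its probability is spread across all of them. Greedy decoding follows only one high-probability path, which need not end in the most likely answer.
When there is no reasoning path, self-consistency is unnecessary, since the most likely answer can be chosen directly using P(Y|X).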
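The marginalization argument can be written out as follows. The symbol R for the reasoning path and m for the number of samples are notation assumed here for illustration; only P(Y|X) appears in the notes above.

```latex
% X: input question, R: sampled reasoning path, Y: final answer
\begin{align}
P(Y \mid X) &= \sum_{R} P(R, Y \mid X) \\
\hat{Y} &= \arg\max_{Y} \sum_{R} P(R, Y \mid X)
  \approx \arg\max_{Y} \sum_{i=1}^{m} \mathbb{1}[Y_i = Y],
  \qquad (R_i, Y_i) \sim P(R, Y \mid X)
\end{align}
```

The sum over R cannot be computed exactly, so it is estimated by sampling m paths and counting how often each answer appears, which is the majority vote in step (3).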

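Universal self-consistency (USC)

Instead of extracting answers and counting votes, USC simply asks the LLM itself to select the most consistent response among the sampled candidates, based on majority consensus.
USC consistently improves performance on free-form generation tasks such as summarization, where SC is inapplicable because there is no single final answer to match exactly and vote on.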
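A minimal sketch of the selection step, assuming the candidate responses have already been sampled. `llm_fn` is a hypothetical callable wrapping any LLM call, the prompt wording is an illustrative paraphrase rather than a confirmed quote from the paper, and the fallback to the first candidate is a choice made for this sketch only.

```python
import re
from typing import Callable, List


def build_usc_prompt(question: str, responses: List[str]) -> str:
    """Concatenate all candidates into one selection prompt (illustrative wording)."""
    listed = "\n\n".join(
        f"Response {i}: {r}" for i, r in enumerate(responses, start=1)
    )
    return (
        f"I have generated the following responses to the question: {question}\n\n"
        f"{listed}\n\n"
        "Select the most consistent response based on majority consensus. "
        'Start your answer with "The most consistent response is Response X".'
    )


def universal_self_consistency(
    question: str,
    responses: List[str],
    llm_fn: Callable[[str], str],  # hypothetical: returns the model's reply text
) -> str:
    """Ask the model to pick the most consistent candidate and return it."""
    reply = llm_fn(build_usc_prompt(question, responses))
    match = re.search(r"Response\s+(\d+)", reply)
    if match:
        index = int(match.group(1)) - 1  # prompt numbering is 1-based
        if 0 <= index < len(responses):
            return responses[index]
    # Fallback (an assumption of this sketch): keep the first candidate.
    return responses[0]
```

Because the model compares whole responses rather than parsed answers, no answer extraction or exact matching is required, which is what lets the approach apply to free-form outputs.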