🤖 Harold's Notes

Lookahead decoding

Aug 03, 2025

https://lmsys.org/blog/2023-11-21-lookahead-decoding/

Lookahead decoding is a new approach to speculative decoding that doesn’t require a draft model. Instead, the model itself is used in two branches:
- a lookahead branch, which predicts and extends candidate N-grams (short sequences of N tokens)
  - the lookahead branch plays a role similar to the draft model in regular speculative decoding
- a verification branch, which verifies the candidates
  - the verification branch plays the same role as the oracle (target) model in regular speculative decoding
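
The verification step can be illustrated with a small sketch. This is a toy under stated assumptions, not the implementation from the linked post: `toy_next_token` and `verify_candidates` are hypothetical names, decoding is greedy, and the "model" is a plain Python function. A real system would check all candidates in one batched forward pass rather than in a loop.

```python
from typing import Callable, List, Sequence

Token = int
NextTokenFn = Callable[[Sequence[Token]], Token]


def toy_next_token(context: Sequence[Token]) -> Token:
    """Hypothetical stand-in for a model's greedy next-token choice."""
    return (context[-1] + 1) % 10


def verify_candidates(
    next_token: NextTokenFn,
    context: List[Token],
    candidates: List[List[Token]],
) -> List[Token]:
    """Accept the longest candidate prefix that matches greedy decoding.

    Each candidate N-gram is compared token by token against what the
    model itself would emit. The longest matching prefix is kept, and one
    extra token from the model is appended so that every step makes
    progress even when no candidate matches.
    """
    best: List[Token] = []
    for candidate in candidates:
        accepted: List[Token] = []
        for token in candidate:
            if next_token(context + accepted) != token:
                break  # first mismatch ends this candidate
            accepted.append(token)
        if len(accepted) > len(best):
            best = accepted
    return best + [next_token(context + best)]


# Candidates as the lookahead branch might propose them (made up here).
proposed = [[4, 5, 9], [4, 5, 6, 7], [8, 8]]
new_tokens = verify_candidates(toy_next_token, [1, 2, 3], proposed)
print(new_tokens)
```

Because every accepted token equals the model's own greedy choice, the output is the same as ordinary greedy decoding; only the number of steps needed to produce it changes.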

Limitations of speculative decoding

The maximum speedup that methods based on speculative decoding can achieve is limited by the token acceptance rate, or equivalently, by how accurately the draft model can predict the main model’s outputs.
Creating an accurate draft model is non-trivial, often requiring extra training and careful tuning as traffic changes over time.
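
To make the dependence on the acceptance rate concrete, here is a rough sketch using a common simplification: each drafted token is assumed to be accepted independently with probability `alpha`, and `gamma` tokens are drafted per step. Both names and the function `expected_tokens_per_step` are illustrative assumptions, and the cost of running the draft model is ignored.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per verification step.

    Assumes each of the `gamma` drafted tokens is accepted independently
    with probability `alpha`, and that the main model always contributes
    one token of its own. Under that assumption the expectation is the
    geometric sum 1 + alpha + ... + alpha**gamma.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)  # every drafted token is accepted
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


for alpha in (0.5, 0.8, 0.95):
    print(alpha, round(expected_tokens_per_step(alpha, gamma=4), 4))
```

With `gamma = 4`, going from an acceptance rate of 0.5 to 0.8 raises the expected tokens per step from about 1.94 to about 3.36, which illustrates why draft accuracy bounds the achievable speedup.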
