Speculative decoding
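Speculative decoding is an inference technique for accelerating text generation with large language models. A smaller, faster draft model proposes several tokens ahead, and the larger target model then checks all of them at once in a single forward pass. Drafted tokens that pass the target model's acceptance test are kept. At the first rejected token, a replacement is sampled from the target model and the rest of the draft is discarded. The acceptance rule is designed so that the output follows the target model's own distribution, so the speedup comes without a loss in output quality.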
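The draft-then-verify loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a production implementation: `target_model` and `draft_model` are hypothetical toy stand-ins over a 4-token vocabulary, and the acceptance rule is the standard speculative sampling rule (accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(0, p - q)). In a real system, the target model scores all drafted positions in one batched forward pass rather than in a Python loop.

```python
import random
from typing import Callable, List

Dist = List[float]                    # probability of each token id in a toy vocabulary
Model = Callable[[List[int]], Dist]   # maps a token prefix to a next-token distribution

VOCAB = 4

def target_model(prefix: List[int]) -> Dist:
    # Hypothetical "large" model: strongly prefers the token after the last one.
    last = prefix[-1] if prefix else 0
    probs = [0.1] * VOCAB
    probs[(last + 1) % VOCAB] = 0.7
    return probs

def draft_model(prefix: List[int]) -> Dist:
    # Hypothetical "small" model: same preference, but less confident.
    last = prefix[-1] if prefix else 0
    probs = [0.2] * VOCAB
    probs[(last + 1) % VOCAB] = 0.4
    return probs

def sample(dist: Dist, rng: random.Random) -> int:
    # random.choices normalizes the weights, so dist need not sum to 1.
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]

def speculative_step(prefix: List[int], target: Model, draft: Model,
                     k: int, rng: random.Random) -> List[int]:
    """Run one draft-then-verify round and return the newly emitted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    drafted: List[int] = []
    q_dists: List[Dist] = []
    ctx = list(prefix)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores every drafted position, plus one extra position.
    p_dists = [target(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept each drafted token with probability min(1, p(x) / q(x)).
    emitted: List[int] = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            continue
        # Rejected: sample a replacement from the residual max(0, p - q) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        emitted.append(sample(residual, rng))
        return emitted

    # 4. All k drafts were accepted: the target model adds one bonus token.
    emitted.append(sample(p_dists[k], rng))
    return emitted

# Usage: generate until at least 20 tokens are in the sequence.
rng = random.Random(0)
tokens = [0]
while len(tokens) < 20:
    tokens += speculative_step(tokens, target_model, draft_model, k=4, rng=rng)
print(tokens)
```

The acceptance test is what keeps the output faithful to the target model: in this toy setup, the first token emitted by each round follows `target_model`'s distribution no matter how well the draft model guesses. A better-matched draft model only changes how many tokens are accepted per round, and therefore how large the speedup is.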

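Speculative decoding is typically used to:
- Reduce the latency of autoregressive generation, since each target-model pass can confirm several tokens instead of one.
- Make better use of accelerator hardware, which is often limited by memory bandwidth rather than compute when generating one token at a time.
- Speed up an existing model without retraining or modifying it.
- Improve user experience in interactive applications, such as chat, where response time matters.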