• Given a 10-token prompt and 100 tokens to generate, because an LLM is auto-regressive, the total number of tokens to process (sequentially) is $10 + 11 + \cdots + 109 = 5950$ tokens.
  • More generally, to generate $n$ tokens from a $p$-token prompt, you need $n$ forward passes with increasing context, leading to $\sum_{i=0}^{n-1}(p + i) = np + \frac{n(n-1)}{2} = O(n^2)$ tokens processed.
  • If using vanilla attention, each forward pass over a context of length $c$ takes $O(c^2)$ time too, so naive generation is $O(n^3)$ overall; see the sketch after this list.
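
A minimal sketch of the counting argument above, assuming naive decoding with no KV cache; the function names and the cost model (attention work proportional to the square of the context length) are illustrative, not from the source:

```python
# Token and attention-cost accounting for naive auto-regressive decoding
# without a KV cache: pass t re-processes the entire context, which at
# that point holds the prompt plus the t tokens generated so far.

def tokens_processed(prompt_len: int, gen_len: int) -> int:
    """Total tokens run through the model across all forward passes."""
    return sum(prompt_len + t for t in range(gen_len))

def attention_ops(prompt_len: int, gen_len: int) -> int:
    """Pairwise attention scores computed, assuming O(c^2) per pass."""
    return sum((prompt_len + t) ** 2 for t in range(gen_len))

# 10-token prompt, 100 generated tokens: 10 + 11 + ... + 109 = 5950 tokens.
assert tokens_processed(10, 100) == 5950

# Closed form n*p + n*(n-1)/2, i.e. O(n^2) tokens for n generated tokens.
p, n = 10, 100
assert tokens_processed(p, n) == n * p + n * (n - 1) // 2

print(tokens_processed(10, 100))  # 5950
print(attention_ops(10, 100))     # 437350 -- grows cubically in n
```

A KV cache removes the re-processing: each pass then handles only the one new token, cutting the tokens processed from $O(n^2)$ back down to $p + n$.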