- Given a 10-token prompt and 100 tokens to generate, because an LLM is auto-regressive, the total number of tokens to process (sequentially, recomputing the full context each step) is: 10 + 11 + ... + 109 = 5950 tokens.
- More generally, to generate n tokens from a p-token prompt, you need n forward passes with increasing context lengths p, p+1, ..., p+n-1, leading to O(np + n^2) tokens processed.
- With vanilla attention, each forward pass over a length-L context itself costs O(L^2), so the total cost of generation grows even faster than the token count above.
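A minimal sketch of the arithmetic above (the function names are illustrative, not from any library): counting the tokens processed across all forward passes, and the corresponding quadratic per-pass attention cost.

```python
def total_tokens(prompt_len: int, gen_len: int) -> int:
    """Tokens processed across all forward passes when the full
    context is recomputed at every step (no KV cache).
    Step i processes prompt_len + i tokens."""
    return sum(prompt_len + i for i in range(gen_len))


def total_attention_ops(prompt_len: int, gen_len: int) -> int:
    """Pairwise attention score computations across all passes:
    a length-L context costs L^2 with vanilla attention."""
    return sum((prompt_len + i) ** 2 for i in range(gen_len))


# 10-token prompt, 100 generated tokens:
print(total_tokens(10, 100))         # 10 + 11 + ... + 109 = 5950
print(total_attention_ops(10, 100))  # grows roughly cubically in gen_len
```

This is why KV caching matters: caching past keys and values makes each decode step process only the single new token instead of the whole context.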