• Given a 10-token prompt and 100 tokens to generate, because an LLM is auto-regressive, the total number of tokens to process (sequentially) is $10 + 11 + \cdots + 109 = 5950$ tokens.
  • More generally, to generate $n$ tokens from a $p$-token prompt, you need $n$ forward passes with increasing context, leading to $\sum_{i=0}^{n-1}(p + i) = np + \frac{n(n-1)}{2} = O(n^2)$ tokens processed.
  • If using vanilla attention, each forward pass over a context of length $c$ takes $O(c^2)$ time too, so naive generation is $O(n^3)$ overall; see the sketch after this list.
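
A minimal sketch of the counting argument above, assuming naive decoding with no KV cache; the function names and the cost model (attention work proportional to the square of the context length) are illustrative, not from the source:

```python
# Token and attention-cost accounting for naive auto-regressive decoding
# without a KV cache: pass t re-processes the entire context, which at
# that point holds the prompt plus the t tokens generated so far.

def tokens_processed(prompt_len: int, gen_len: int) -> int:
    """Total tokens run through the model across all forward passes."""
    return sum(prompt_len + t for t in range(gen_len))

def attention_ops(prompt_len: int, gen_len: int) -> int:
    """Pairwise attention scores computed, assuming O(c^2) per pass."""
    return sum((prompt_len + t) ** 2 for t in range(gen_len))

# 10-token prompt, 100 generated tokens: 10 + 11 + ... + 109 = 5950 tokens.
assert tokens_processed(10, 100) == 5950

# Closed form n*p + n*(n-1)/2, i.e. O(n^2) tokens for n generated tokens.
p, n = 10, 100
assert tokens_processed(p, n) == n * p + n * (n - 1) // 2

print(tokens_processed(10, 100))  # 5950
print(attention_ops(10, 100))     # 437350 -- grows cubically in n
```

A KV cache removes the re-processing: each pass then handles only the one new token, cutting the tokens processed from $O(n^2)$ back down to $p + n$.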