-
If a model is very large, it may need to be sharded, because the Qualcomm DSP is a 32-bit system and therefore has a 4 GB size limit.
- For example, Llama 3 8B models need to be sharded into 4 parts, but ExecuTorch still packages all of the shards into a single PTE file.
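The following minimal Python sketch makes the size constraint concrete by grouping consecutive layers into shards that each fit under the limit. It makes several assumptions: the model splits cleanly into consecutive layers with known byte sizes, "4 GB" means 2^32 bytes (the 32-bit address space), and a simple greedy grouping is good enough. `shard_layers` and `SHARD_LIMIT_BYTES` are hypothetical names. This is not the actual sharding logic used by ExecuTorch or its Qualcomm backend.

```python
from typing import List

# Assumption: the 4 GB limit is interpreted as 2**32 bytes (32-bit address space).
SHARD_LIMIT_BYTES = 4 * 1024**3


def shard_layers(layer_sizes: List[int], limit: int = SHARD_LIMIT_BYTES) -> List[List[int]]:
    """Greedily group consecutive layer indices so each shard's total size is <= limit.

    Hypothetical helper for illustration only; real shards also need room
    for buffers and metadata, so `limit` is treated as a tunable assumption.
    """
    shards: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(layer_sizes):
        # A single layer larger than the budget can never fit in any shard.
        if size > limit:
            raise ValueError(f"layer {idx} ({size} bytes) exceeds the per-shard limit")
        # Start a new shard when adding this layer would overflow the current one.
        if current and current_size + size > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    # Flush the last partially filled shard.
    if current:
        shards.append(current)
    return shards


if __name__ == "__main__":
    # Hypothetical model: 32 equally sized layers of 450 MiB each (about 14 GiB total).
    layers = [450 * 1024**2] * 32
    plan = shard_layers(layers)
    print(len(plan), [len(s) for s in plan])
```

With these made-up sizes the script prints `4 [9, 9, 9, 5]`: at most 9 layers of 450 MiB fit in 4096 MiB, so 32 layers need four shards. Because layers stay in their original order, filling each shard as far as possible before starting the next one yields the fewest contiguous shards for a given limit.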
Passes or transformations
-
See Preprocessing for the definition.