🤖 Harold's Notes


QNN (Qualcomm)

Nov 26, 2024

https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
If the model is very large, it may need to be sharded. The Qualcomm DSP is a 32-bit system, so it can address at most 4 GB (2^32 bytes) of memory.
- For example, a Llama 3 8B model has to be split into 4 shards, but ExecuTorch still packages all of them into a single PTE file.
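Here is a minimal sketch of why sharding falls out of the 4 GB cap. It greedily groups consecutive layers into shards so that each shard stays at or under 2^32 bytes. This rests on several assumptions: shards are cut at layer boundaries, the full 4 GB is usable (in practice the usable budget is probably smaller), and the layer sizes in the usage example are made up. `shard_layers` and `DSP_LIMIT_BYTES` are hypothetical names. They are not ExecuTorch or QNN SDK APIs, and this is not how ExecuTorch actually partitions the model.

```python
from typing import List

# 4 GB = 2**32 bytes: the largest range a 32-bit address can index.
# Assumption: treat the whole range as usable for one shard.
DSP_LIMIT_BYTES = 2**32


def shard_layers(layer_bytes: List[int], limit: int = DSP_LIMIT_BYTES) -> List[List[int]]:
    """Hypothetical helper: greedily pack consecutive layers into shards.

    Each shard is a list of layer indices whose total size is <= limit.
    """
    shards: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(layer_bytes):
        if size > limit:
            # A single layer that cannot fit would need finer-grained splitting.
            raise ValueError(f"layer {i} ({size} bytes) exceeds the limit on its own")
        if current and current_size + size > limit:
            # Adding this layer would overflow the cap, so close the current shard.
            shards.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        shards.append(current)
    return shards


if __name__ == "__main__":
    # Illustrative only: 32 layers of 400 MiB each (not real Llama 3 8B sizes).
    sizes = [400 * 2**20] * 32
    for n, shard in enumerate(shard_layers(sizes)):
        print(n, len(shard), sum(sizes[i] for i in shard))
```

With these made-up sizes, the greedy fill yields 4 shards of 10, 10, 10 and 2 layers. A real exporter might split layers evenly instead (for example 8 per shard), because that keeps shard sizes balanced. Either way, the shard count is driven by total size divided by the per-shard cap.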

Passes or transformations

See Preprocessing for the definition.
Overview of all passes
