High-level (export, transformation, and compilation)
- Export the model
	- capture the PyTorch program as a graph
- Compile the exported model to an ExecuTorch program
	- Given the exported model from step 1, convert it to an executable format, called an ExecuTorch program, that the runtime can use for inference.
	- entry point for various optimizations:
		- quantization
		- further compiling subgraphs down to specialized on-device hardware accelerators to improve latency
		- memory planning, i.e. planning the location of intermediate tensors to reduce memory footprint
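The memory-planning idea can be pictured as a liveness-based allocator: each intermediate tensor gets an offset in one shared buffer, and space is reused once a tensor's last use has passed. This is an illustrative sketch, not ExecuTorch's actual planner; the tensor names, sizes, and first-fit strategy are made up:

```python
# Illustrative sketch of memory planning (NOT ExecuTorch's real algorithm):
# place each intermediate tensor at an offset in a single buffer, reusing
# the space of tensors whose lifetimes have ended.

def plan_memory(tensors):
    """tensors: list of (name, size_bytes, first_use, last_use) steps."""
    offsets = {}
    live = []   # (last_use, offset, size) of tensors currently placed
    peak = 0    # high-water mark = total buffer size needed
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        # retire tensors whose lifetime ended before this one starts
        live = [(l, o, s) for (l, o, s) in live if l >= first]
        # first-fit: slide past overlapping live allocations
        offset = 0
        for _, o, s in sorted(live, key=lambda x: x[1]):
            if offset + size <= o:
                break
            offset = max(offset, o + s)
        offsets[name] = offset
        live.append((last, offset, size))
        peak = max(peak, offset + size)
    return offsets, peak

# t0 is dead before t2 starts, so t2 reuses t0's slot at offset 0.
plan, peak = plan_memory([
    ("t0", 64, 0, 1),
    ("t1", 32, 1, 3),
    ("t2", 64, 2, 3),
])
print(plan, peak)  # {'t0': 0, 't1': 64, 't2': 0} 96
```

Without reuse the three tensors would need 160 bytes; with lifetime-aware planning the buffer peaks at 96.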
- Run the ExecuTorch program on a target device.
	- input → output (nothing eager; the execution plan was already calculated in steps 1 and 2)
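The "nothing eager" point can be illustrated with a toy interpreter: at run time there is no graph construction or tracing, just a walk over a precomputed instruction list. This sketch is conceptual only; the op names, slot scheme, and plan format are invented and do not reflect ExecuTorch's actual runtime:

```python
# Illustrative sketch (NOT the real ExecuTorch runtime): execution is a
# straight walk over a precomputed plan, input -> output, with no eager
# graph building at run time.

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def run_plan(plan, inputs):
    """plan: list of (op_name, input_slots, output_slot); values live in slots."""
    slots = dict(enumerate(inputs))
    for op_name, in_slots, out_slot in plan:
        slots[out_slot] = OPS[op_name](*(slots[i] for i in in_slots))
    return slots[plan[-1][2]]

# A plan computing (x + y) * y, fixed ahead of time by steps 1 and 2.
plan = [("add", (0, 1), 2), ("mul", (2, 1), 3)]
print(run_plan(plan, [3, 4]))  # (3 + 4) * 4 = 28
```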
Architectural Components
Program preparation
- leverage the PyTorch 2 compiler to do AOT (ahead-of-time) capture via torch.export()
- Edge Compilation: compile to the Edge dialect, then compile to the ExecuTorch program
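The two-stage lowering can be pictured as successive rewrites of the captured graph: first a dialect conversion (ops rewritten into the edge namespace), then flattening into the serializable instruction list the runtime steps through. This is a conceptual sketch, not the real `exir` passes; the op names and data shapes are made up:

```python
# Illustrative sketch (NOT the real exir pipeline): "compile to Edge dialect"
# as a 1:1 op rewrite into the edge namespace, then "compile to ExecuTorch
# program" as flattening the graph into an indexable instruction list.

def to_edge(graph):
    # dialect conversion: rewrite aten ops into edge ops
    return [op.replace("aten.", "edge.") for op in graph]

def to_executorch(edge_graph):
    # emit a flat instruction list the runtime can step through
    return {"instructions": list(enumerate(edge_graph))}

exported = ["aten.linear", "aten.relu"]
program = to_executorch(to_edge(exported))
print(program["instructions"])  # [(0, 'edge.linear'), (1, 'edge.relu')]
```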