- https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html
- The torch.export() process operates on a graph that is agnostic to the edge device where the code is ultimately executed.
- During the edge compilation step, we work on representations that are Edge specific.

Edge Dialect (to_edge)
- DType specialization
- Scalar to tensor conversion
- Converting all ops to the executorch.exir.dialects.edge namespace
- Note that this dialect is still backend (or target) agnostic.
- edge_program: EdgeProgramManager = to_edge(aten_dialect)
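The conversions in the list above can be illustrated with a toy op list. This is a plain-Python sketch of the idea, not the real ExecuTorch IR; all names and the fixed dtype are illustrative assumptions:

```python
# Toy illustration of the conversions to_edge() performs.
# NOT the real ExecuTorch IR -- just a sketch of the idea.

def to_edge_toy(ops):
    """Rewrite a list of (namespace, op, args) into an 'edge' dialect."""
    edge_ops = []
    for ns, op, args in ops:
        # Scalar-to-tensor conversion: wrap bare Python numbers.
        args = tuple(("tensor", a) if isinstance(a, (int, float)) else a
                     for a in args)
        # DType specialization: pin a concrete dtype (fixed here for brevity).
        dtype = "float32"
        # Namespace conversion: aten.* -> executorch.exir.dialects.edge.*
        edge_ops.append(("executorch.exir.dialects.edge", op, args, dtype))
    return edge_ops

aten = [("aten", "add", (("tensor", "x"), 2))]
edge = to_edge_toy(aten)
print(edge[0][0])    # executorch.exir.dialects.edge
print(edge[0][2][1])  # the scalar 2, now wrapped as ('tensor', 2)
```

Note that the toy rewrite is still backend agnostic, mirroring the real Edge dialect.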
Backend Dialect (to_backend)
- With the Edge dialect, there are two target-aware ways to further lower the graph to the Backend Dialect. At this point, delegates for specific hardware can perform many operations. For example, Core ML on iOS, QNN on Qualcomm, or TOSA on Arm can rewrite the graph. The options at this level are:
- Backend Delegate: the entry point to compile the graph (either full or partial) to a specific backend.
  - The compiled graph is swapped with a semantically equivalent graph during this transformation.
  - The compiled graph will be offloaded to the backend (aka delegated) later during runtime for improved performance.
- User-defined passes: target-specific transforms can also be performed by the user. Good examples are kernel fusion, async behavior, memory layout conversion, and others.
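The delegation idea above can be sketched in plain Python: contiguous runs of ops that a (hypothetical) backend supports are swapped for a single delegate call, while unsupported ops stay in the graph. This is a conceptual toy, not the real to_backend/partitioner API:

```python
# Toy sketch of backend delegation: contiguous runs of ops the backend
# supports are bundled into one 'delegate' node; the rest stay as-is.
# The supported-op set is a made-up example, not a real backend's.

SUPPORTED = {"add", "mul"}  # ops our toy backend can handle

def to_backend_toy(ops):
    lowered, run = [], []
    for op in ops:
        if op in SUPPORTED:
            run.append(op)          # keep collecting delegatable ops
        else:
            if run:                 # swap the collected run for a delegate call
                lowered.append(("delegate", tuple(run)))
                run = []
            lowered.append(op)      # unsupported op stays in the graph
    if run:
        lowered.append(("delegate", tuple(run)))
    return lowered

program = ["add", "mul", "softmax", "add"]
print(to_backend_toy(program))
# [('delegate', ('add', 'mul')), 'softmax', ('delegate', ('add',))]
```

At runtime, each delegate node would be executed by the backend (Core ML, QNN, TOSA, etc.) instead of the portable kernels.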
Compile to ExecuTorch program (to_executorch)
- The Edge program above is good for compilation, but not suitable for the runtime environment. to_executorch lowers it into a graph that can be efficiently loaded and executed by the runtime on device.
- On most Edge environments, dynamic memory allocation/freeing has significant performance and power overhead. It can be avoided with AOT memory planning and a static execution graph.
- The ExecuTorch runtime is static (in the sense of graph representation, but control flow and dynamic shapes are still supported). To avoid output creation and return, all functional operator representations are converted to out variants (outputs passed as arguments).
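The out-variant rewrite can be shown with a minimal pure-Python sketch (list-based "tensors" stand in for real ones): the functional form allocates its result, while the out variant writes into a caller-provided buffer, which is what lets the runtime plan all memory ahead of time.

```python
# Sketch of the functional -> out-variant conversion described above.

def add(a, b):              # functional: allocates and returns its output
    return [x + y for x, y in zip(a, b)]

def add_out(a, b, out):     # out variant: output is passed as an argument
    for i in range(len(a)):
        out[i] = a[i] + b[i]

buf = [0.0] * 3             # preallocated by the (AOT) memory planner
add_out([1, 2, 3], [4, 5, 6], buf)
print(buf)  # [5, 7, 9]
```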
- Optionally, users can apply their own memory planning algorithms. For example, an embedded system may have specific layers of memory hierarchy, and users can customize memory planning for that hierarchy.
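A toy AOT memory planner makes the idea concrete: given each tensor's size and live interval, assign byte offsets in one static arena, reusing space once a tensor's lifetime ends. This first-fit scheme is a simplified assumption, not ExecuTorch's actual planner; a user-supplied algorithm for a specific memory hierarchy would replace this policy.

```python
# Toy first-fit AOT memory planner: tensors with non-overlapping lifetimes
# can share the same arena offset. Purely illustrative.

def plan_memory(tensors):
    """tensors: {name: (size, first_use, last_use)} -> ({name: offset}, arena_size)."""
    placed, offsets = [], {}
    for name, (size, start, end) in sorted(tensors.items(), key=lambda t: t[1][1]):
        offset = 0
        # bump past tensors that overlap this one in both time and space
        for o, s, st, en in sorted(placed):
            if st <= end and start <= en and o < offset + size and offset < o + s:
                offset = o + s
        placed.append((offset, size, start, end))
        offsets[name] = offset
    arena = max(o + s for o, s, _, _ in placed)
    return offsets, arena

offsets, arena = plan_memory({
    "a": (16, 0, 1),   # dies at step 1
    "b": (16, 1, 2),   # live alongside a, needs its own slot
    "c": (16, 2, 3),   # can reuse a's slot
})
print(offsets, arena)  # {'a': 0, 'b': 16, 'c': 0} 32
```

Note that without planning, three 16-byte tensors would need 48 bytes; lifetime reuse brings the static arena down to 32.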
- The program is emitted to the format that the ExecuTorch runtime can recognize.
- The emitted program can be serialized to the flatbuffer format.
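The emit-then-serialize step can be sketched as a round trip. Here length-prefixed JSON bytes are a deliberate stand-in for the real flatbuffer schema, just to show the shape of the flow (emit, serialize, load); none of these helpers are ExecuTorch APIs.

```python
import json
import struct

# Toy stand-in for program emission + serialization. The real runtime
# consumes a flatbuffer-serialized program; JSON bytes merely illustrate
# the emit -> serialize -> load round trip.

def emit(ops):
    return {"version": 1, "ops": ops}

def serialize(program):
    body = json.dumps(program).encode("utf-8")
    return struct.pack("<I", len(body)) + body   # length-prefixed payload

def load(blob):
    (length,) = struct.unpack_from("<I", blob)
    return json.loads(blob[4:4 + length].decode("utf-8"))

prog = emit([["edge.add", ["x", "y"]]])
blob = serialize(prog)          # bytes the "runtime" would receive
restored = load(blob)
print(restored["ops"][0][0])    # edge.add
```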