- https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html
- The torch.export() process operates on a graph that is agnostic to the edge device where the code is ultimately executed.
- During the edge compilation step, we work on representations that are Edge specific.

Edge Dialect (to_edge)
- DType specialization
- Scalar to tensor conversion
- Converting all ops to the executorch.exir.dialects.edge namespace
- Note that this dialect is still backend (or target) agnostic.
- edge_program: EdgeProgramManager = to_edge(aten_dialect)
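The conversions in the list above can be illustrated with a toy op list. This is a plain-Python sketch of the idea, not the real ExecuTorch IR; all names and the fixed dtype are illustrative assumptions:

```python
# Toy illustration of the conversions to_edge() performs.
# NOT the real ExecuTorch IR -- just a sketch of the idea.

def to_edge_toy(ops):
    """Rewrite a list of (namespace, op, args) into an 'edge' dialect."""
    edge_ops = []
    for ns, op, args in ops:
        # Scalar-to-tensor conversion: wrap bare Python numbers.
        args = tuple(("tensor", a) if isinstance(a, (int, float)) else a
                     for a in args)
        # DType specialization: pin a concrete dtype (fixed here for brevity).
        dtype = "float32"
        # Namespace conversion: aten.* -> executorch.exir.dialects.edge.*
        edge_ops.append(("executorch.exir.dialects.edge", op, args, dtype))
    return edge_ops

aten = [("aten", "add", (("tensor", "x"), 2))]
edge = to_edge_toy(aten)
print(edge[0][0])    # executorch.exir.dialects.edge
print(edge[0][2][1])  # the scalar 2, now wrapped as ('tensor', 2)
```

Note that the toy rewrite is still backend agnostic, mirroring the real Edge dialect.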
Backend Dialect (to_backend)
- With the Edge dialect, there are two target-aware ways to further lower the graph to the Backend Dialect. At this point, delegates for specific hardware can perform many operations. For example, Core ML on iOS, QNN on Qualcomm, or TOSA on Arm can rewrite the graph. The options at this level are:
- Backend Delegate: the entry point to compile the graph (either full or partial) to a specific backend.
  - The compiled graph is swapped with a semantically equivalent graph during this transformation.
  - The compiled graph will be offloaded to the backend (aka delegated) later during runtime for improved performance.
- User-defined passes: target-specific transforms can also be performed by the user. Good examples are kernel fusion, async behavior, memory layout conversion, and others.
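The delegation idea above can be sketched in plain Python: contiguous runs of ops that a (hypothetical) backend supports are swapped for a single delegate call, while unsupported ops stay in the graph. This is a conceptual toy, not the real to_backend/partitioner API:

```python
# Toy sketch of backend delegation: contiguous runs of ops the backend
# supports are bundled into one 'delegate' node; the rest stay as-is.
# The supported-op set is a made-up example, not a real backend's.

SUPPORTED = {"add", "mul"}  # ops our toy backend can handle

def to_backend_toy(ops):
    lowered, run = [], []
    for op in ops:
        if op in SUPPORTED:
            run.append(op)          # keep collecting delegatable ops
        else:
            if run:                 # swap the collected run for a delegate call
                lowered.append(("delegate", tuple(run)))
                run = []
            lowered.append(op)      # unsupported op stays in the graph
    if run:
        lowered.append(("delegate", tuple(run)))
    return lowered

program = ["add", "mul", "softmax", "add"]
print(to_backend_toy(program))
# [('delegate', ('add', 'mul')), 'softmax', ('delegate', ('add',))]
```

At runtime, each delegate node would be executed by the backend (Core ML, QNN, TOSA, etc.) instead of the portable kernels.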
Compile to ExecuTorch program (to_executorch)
- The Edge program above is good for compilation, but not suitable for the runtime environment. to_executorch lowers it into a graph that can be efficiently loaded and executed by the runtime on device.
- On most Edge environments, dynamic memory allocation/freeing has significant performance and power overhead. It can be avoided with AOT memory planning and a static execution graph.
- The ExecuTorch runtime is static (in the sense of graph representation, but control flow and dynamic shapes are still supported). To avoid output creation and return, all functional operator representations are converted to out variants (outputs passed as arguments).
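The out-variant rewrite can be shown with a minimal pure-Python sketch (list-based "tensors" stand in for real ones): the functional form allocates its result, while the out variant writes into a caller-provided buffer, which is what lets the runtime plan all memory ahead of time.

```python
# Sketch of the functional -> out-variant conversion described above.

def add(a, b):              # functional: allocates and returns its output
    return [x + y for x, y in zip(a, b)]

def add_out(a, b, out):     # out variant: output is passed as an argument
    for i in range(len(a)):
        out[i] = a[i] + b[i]

buf = [0.0] * 3             # preallocated by the (AOT) memory planner
add_out([1, 2, 3], [4, 5, 6], buf)
print(buf)  # [5, 7, 9]
```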
- Optionally, users can apply their own memory planning algorithms. For example, an embedded system may have specific layers of memory hierarchy, and users can customize memory planning for that hierarchy.
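A toy AOT memory planner makes the idea concrete: given each tensor's size and live interval, assign byte offsets in one static arena, reusing space once a tensor's lifetime ends. This first-fit scheme is a simplified assumption, not ExecuTorch's actual planner; a user-supplied algorithm for a specific memory hierarchy would replace this policy.

```python
# Toy first-fit AOT memory planner: tensors with non-overlapping lifetimes
# can share the same arena offset. Purely illustrative.

def plan_memory(tensors):
    """tensors: {name: (size, first_use, last_use)} -> ({name: offset}, arena_size)."""
    placed, offsets = [], {}
    for name, (size, start, end) in sorted(tensors.items(), key=lambda t: t[1][1]):
        offset = 0
        # bump past tensors that overlap this one in both time and space
        for o, s, st, en in sorted(placed):
            if st <= end and start <= en and o < offset + size and offset < o + s:
                offset = o + s
        placed.append((offset, size, start, end))
        offsets[name] = offset
    arena = max(o + s for o, s, _, _ in placed)
    return offsets, arena

offsets, arena = plan_memory({
    "a": (16, 0, 1),   # dies at step 1
    "b": (16, 1, 2),   # live alongside a, needs its own slot
    "c": (16, 2, 3),   # can reuse a's slot
})
print(offsets, arena)  # {'a': 0, 'b': 16, 'c': 0} 32
```

Note that without planning, three 16-byte tensors would need 48 bytes; lifetime reuse brings the static arena down to 32.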
- The program is emitted to the format that the ExecuTorch runtime can recognize.
- The emitted program can be serialized to the flatbuffer format.
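The emit-then-serialize step can be sketched as a round trip. Here length-prefixed JSON bytes are a deliberate stand-in for the real flatbuffer schema, just to show the shape of the flow (emit, serialize, load); none of these helpers are ExecuTorch APIs.

```python
import json
import struct

# Toy stand-in for program emission + serialization. The real runtime
# consumes a flatbuffer-serialized program; JSON bytes merely illustrate
# the emit -> serialize -> load round trip.

def emit(ops):
    return {"version": 1, "ops": ops}

def serialize(program):
    body = json.dumps(program).encode("utf-8")
    return struct.pack("<I", len(body)) + body   # length-prefixed payload

def load(blob):
    (length,) = struct.unpack_from("<I", blob)
    return json.loads(blob[4:4 + length].decode("utf-8"))

prog = emit([["edge.add", ["x", "y"]]])
blob = serialize(prog)          # bytes the "runtime" would receive
restored = load(blob)
print(restored["ops"][0][0])    # edge.add
```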