- https://pytorch.org/executorch/main/kernel-library-overview.html
- https://github.com/pytorch/executorch/blob/main/kernels/README.md
How kernels are handled
- An ExecuTorch program encodes instructions that describe the computation that should be performed by the program.
- Many of these instructions will correspond to calling a specific ATen operator, for example aten.convolution.
- One of the core design principles of ExecuTorch is that the signature of an operator should be separate from the implementation of the operator.
- The runtime does not ship with any standard implementation for ATen operators.
- Users must make sure to link against kernel libraries that contain implementations of the operators required by their ExecuTorch program.
- A kernel library is simply a collection of ATen operator implementations that follow a common theme or design principle.
First-party kernel libraries
Portable Kernel Library
- https://github.com/pytorch/executorch/tree/main/kernels/portable
- Optimized for:
  - Correctness (straightforward implementations of ATen operators that are strictly consistent with the original implementation of the operator in PyTorch’s ATen library)
  - Readability
  - Portability (just as portable as the ExecuTorch runtime; no external dependencies or unsanctioned C++ features)
  - Operator coverage (an implementation for every operator listed as a Core ATen operator)
- Example: add.out
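- For flavor, the sketch below shows roughly what a portable-style out-variant add kernel looks like. It is simplified (float32 only; no broadcasting, dtype promotion, alpha scalar, or error handling, all of which the real op_add.cpp covers), and the namespace/type spellings are approximate.

#include <executorch/runtime/kernel/kernel_includes.h>

using torch::executor::KernelRuntimeContext;
using torch::executor::Tensor;

// Simplified out-variant add: a plain, readable C++ loop writing into the
// caller-provided `out` tensor.
Tensor& add_out(KernelRuntimeContext& ctx, const Tensor& a, const Tensor& b, Tensor& out) {
  (void)ctx; // real kernels use the context for error reporting / allocation
  const float* a_data = a.const_data_ptr<float>();
  const float* b_data = b.const_data_ptr<float>();
  float* out_data = out.mutable_data_ptr<float>();
  for (size_t i = 0; i < static_cast<size_t>(a.numel()); ++i) {
    out_data[i] = a_data[i] + b_data[i];
  }
  return out; // out-variant kernels return the same `out` they were given
}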
Optimized Kernel Library
- https://github.com/pytorch/executorch/tree/main/kernels/optimized
- Many operator implementations in the Optimized Kernel Library are inspired by or based on the corresponding implementations in PyTorch’s ATen library, so in many cases one can expect the same degree of performance.
- Generally speaking, operators in the Optimized Kernel Library are optimized in one of two ways:
  - Using CPU vector intrinsics
  - Using optimized math libraries, such as sleef and OpenBLAS
- Example: add.out
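- To illustrate the first approach, here is a generic vector-intrinsics sketch. It uses raw AVX purely to convey the idea; the Optimized Kernel Library itself typically goes through a portable Vectorized abstraction rather than hand-written intrinsics like this.

#include <immintrin.h>
#include <cstddef>

// Element-wise float add: 8 lanes per iteration with AVX, plus a scalar tail.
void add_f32(const float* a, const float* b, float* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m256 va = _mm256_loadu_ps(a + i);
    __m256 vb = _mm256_loadu_ps(b + i);
    _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
  }
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}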
Quantized Kernel Library
- https://github.com/pytorch/executorch/tree/main/kernels/quantized
- Handles quantization and dequantization operations.
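- For context, the quantize/dequantize ops implement standard affine quantization. A minimal sketch of the math (not the library's actual code, which is templated over dtypes and quantization schemes):

#include <algorithm>
#include <cmath>
#include <cstdint>

// Affine quantization to int8: q = clamp(round(x / scale) + zero_point, -128, 127)
int8_t quantize(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::nearbyint(x / scale)) + zero_point;
  return static_cast<int8_t>(std::min(127, std::max(-128, q)));
}

// Dequantization back to float: x = (q - zero_point) * scale
float dequantize(int8_t q, float scale, int32_t zero_point) {
  return (static_cast<int32_t>(q) - zero_point) * scale;
}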
Writing your custom kernel
- https://pytorch.org/executorch/main/kernel-library-custom-aten-kernel.html
- To author and use custom kernels that implement core ATen ops, the YAML-based approach is recommended, because it provides full-fledged support for:
  - combining kernel libraries and defining fallback kernels;
  - using selective build to minimize kernel size (see the sketch below).
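- For illustration, a kernel-library YAML entry binding a kernel to a core ATen op has roughly this shape (mirroring the format used in kernels/portable/functions.yaml; the kernel_name is whichever C++ function you want bound to the op):

- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out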
- A custom operator is any operator that an ExecuTorch user defines outside of PyTorch’s native_functions.yaml.
Constraints
Out-variants only
ExecuTorch only supports out-style operators, where:
- The caller provides the output Tensor or Tensor list in the final position, with the name out.
- The C++ function modifies and returns the same out argument. If the return type in the YAML file is () (which maps to void), the C++ function should still modify out but does not need to return anything.
- Conventionally, these out operators are named using the pattern <name>.out or <name>.<overload>_out.
- ExecuTorch only supports operators that return a single Tensor, or the unit type () (which maps to void). It does not support returning any other types, including lists, optionals, tuples, or scalars like bool.
- Since all output values are returned via an out parameter, ExecuTorch ignores the actual C++ function return value. To be consistent, though, functions should always return out when the return type is non-void.
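- Put concretely (names are illustrative), the two allowed return types map to C++ signatures like these:

// YAML return type Tensor(a!): the kernel fills `out` and returns it.
Tensor& my_op_out(const Tensor& in, Tensor& out);

// YAML return type (): the kernel still fills `out` but returns void.
void my_other_op_out(const Tensor& in, Tensor& out);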
Supported argument types
- ExecuTorch does not support all of the argument types that core PyTorch supports; the custom-kernel documentation page linked above lists the argument types that are currently supported.
1. Follow the ExecuTorch internal README
- https://github.com/pytorch/executorch/blob/main/kernels/README.md
- Clarifies how to register and test the custom kernel, but it is unclear how to use it in PyTorch code yet.
2. C++ API for Custom Ops (unclear if it works yet)
- Clarifies how to link a kernel to PyTorch, but it is not clear how to build it.
- The C++ API only uses the C++ macros EXECUTORCH_LIBRARY and WRAP_TO_ATEN for kernel registration, and it has no selective build support.
- This makes the API faster in terms of development speed, since users don’t have to do YAML authoring or build-system tweaking.
- Similar to TORCH_LIBRARY in PyTorch, EXECUTORCH_LIBRARY takes the operator name and the C++ function name and registers them into the ExecuTorch runtime.
- Example in the codebase, linear_scratch_example:
  - kernel impl: https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/op_linear_scratch_example.cpp
  - custom_ops.yaml: https://github.com/pytorch/executorch/blob/main/kernels/portable/custom_ops.yaml
  - linking the kernel: https://github.com/pytorch/executorch/blob/main/kernels/portable/targets.bzl
Prepare custom kernel implementation
- Define your custom operator schema, for both the functional variant (used in AOT compilation) and the out variant (used by the ExecuTorch runtime). The schema needs to follow the PyTorch ATen convention (see native_functions.yaml). For example:
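(Sketch, following the custom_linear example in the linked tutorial; the op name and argument list are illustrative.)

# Functional variant, used during AOT export:
custom_linear(Tensor weight, Tensor input, Tensor? bias) -> Tensor

# Out variant, used by the ExecuTorch runtime:
custom_linear.out(Tensor weight, Tensor input, Tensor? bias, *, Tensor(a!) out) -> Tensor(a!)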
- Then write your custom kernel according to the schema, using ExecuTorch types, along with the APIs needed to register it with the ExecuTorch runtime (sketched below):
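A hedged sketch of that step, loosely following the custom_linear tutorial example (headers and the actual matmul are elided, type spellings are approximate, and some kernel signatures also take a KernelRuntimeContext& as the first argument):

// custom_linear.h / custom_linear.cpp (sketch)
// Includes omitted; see the linked tutorial for the exact ExecuTorch headers.
using torch::executor::Tensor;
using torch::executor::optional;

// Out-variant kernel written against ExecuTorch tensor types.
Tensor& custom_linear_out(const Tensor& weight, const Tensor& input,
                          const optional<Tensor>& bias, Tensor& out) {
  // ... compute out = input @ weight^T (+ bias) into the preallocated `out` ...
  return out;
}

// Register with the ExecuTorch runtime: opset namespace "myop",
// operator name and overload "custom_linear.out".
EXECUTORCH_LIBRARY(myop, "custom_linear.out", custom_linear_out);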
- Write a wrapper for the op (in a separate .cpp file) so that it shows up in PyTorch; a sketch follows.
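A sketch of that wrapper (custom_linear_pytorch.cpp in the tutorial): WRAP_TO_ATEN adapts the ExecuTorch kernel to ATen tensors (3 is the position of the out argument), and TORCH_LIBRARY registers both variants with PyTorch. The output shape and headers here are illustrative.

// custom_linear_pytorch.cpp (sketch)
#include <torch/library.h>
// Plus custom_linear.h and the ExecuTorch header that provides WRAP_TO_ATEN
// (see the linked tutorial for the exact path).

// Functional variant for eager PyTorch / AOT export: allocate the output,
// then call the ExecuTorch kernel through the WRAP_TO_ATEN adapter.
at::Tensor custom_linear(const at::Tensor& weight, const at::Tensor& input,
                         const c10::optional<at::Tensor>& bias) {
  at::Tensor out = at::empty({input.size(0), weight.size(0)}); // illustrative shape
  WRAP_TO_ATEN(custom_linear_out, 3)(weight, input, bias, out);
  return out;
}

TORCH_LIBRARY(myop, m) {
  m.def("custom_linear(Tensor weight, Tensor input, Tensor? bias) -> Tensor", custom_linear);
  m.def("custom_linear.out(Tensor weight, Tensor input, Tensor? bias, *, Tensor(a!) out) -> Tensor(a!)",
        WRAP_TO_ATEN(custom_linear_out, 3));
}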
Compiling and linking the custom kernel
- In the CMakeLists.txt that builds the binary/application, add custom_linear.h/custom_linear.cpp to the binary target.
- Alternatively, build a dynamically loaded library (.so or .dylib) and link against it.
Using it in PyTorch
- Link it into the PyTorch runtime:
  - Package custom_linear.h, custom_linear.cpp, and custom_linear_pytorch.cpp into a dynamically loaded library (.so or .dylib).
  - Load it into the Python environment:
import torch
torch.ops.load_library("libcustom_linear.so")  # or libcustom_linear.dylib on macOS
# Now we have access to the custom op, backed by the kernel implemented in custom_linear.cpp.
op = torch.ops.myop.custom_linear.default