- https://pytorch.org/executorch/main/kernel-library-overview.html
- https://github.com/pytorch/executorch/blob/main/kernels/README.md
How kernels are handled
- An ExecuTorch program encodes instructions that describe the computation the program should perform.
- Many of these instructions correspond to calling a specific ATen operator, for example `aten.convolution`.
- One of the core design principles of ExecuTorch is that the signature of an operator is separate from the implementation of the operator.
- The ExecuTorch runtime does not ship with any standard implementations of ATen operators.
- Users must make sure to link against kernel libraries that contain implementations of the operators required by their ExecuTorch program.
- A kernel library is simply a collection of ATen operator implementations that follow a common theme or design principle.
First-party kernel libraries
Portable Kernel Library
- https://github.com/pytorch/executorch/tree/main/kernels/portable
- Optimized for
- Correctness (straightforward implementations of ATen operators that are strictly consistent with the original implementation of the operator in PyTorch’s ATen library)
- Readability
- Portability (just as portable as the ExecuTorch runtime, no external deps or unsanctioned features of C++)
- Operator coverage (implementation for every operator listed as a Core ATen operator)
- Example: `add.out`
Optimized Kernel Library
- https://github.com/pytorch/executorch/tree/main/kernels/optimized
- Many operator implementations in the Optimized Kernel Library are inspired by or based on the corresponding implementations in PyTorch’s ATen library, so in many cases one can expect the same degree of performance.
- Generally speaking, operators in the Optimized Kernel Library are optimized in one of two ways (a generic sketch of the first approach appears after this list):
- Using CPU vector intrinsics
- Using optimized math libraries, such as `sleef` and OpenBLAS
- Example: `add.out`
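To make "using CPU vector intrinsics" concrete, here is a generic sketch of how an element-wise float add can be vectorized. This is not code from the Optimized Kernel Library; the function name and the AVX instruction set are assumptions for illustration, and real optimized kernels also handle dtypes, broadcasting, and so on.

```cpp
// Generic sketch of vectorizing an element-wise add with AVX intrinsics.
// Requires an AVX-capable CPU and compiling with -mavx (or equivalent).
#include <immintrin.h>
#include <cstddef>

void add_f32(const float* a, const float* b, float* out, size_t n) {
  size_t i = 0;
  // Process 8 floats per iteration using 256-bit AVX registers.
  for (; i + 8 <= n; i += 8) {
    __m256 va = _mm256_loadu_ps(a + i);
    __m256 vb = _mm256_loadu_ps(b + i);
    _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
  }
  // Scalar tail for the remaining elements.
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```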
Quantized Kernel Library
- https://github.com/pytorch/executorch/tree/main/kernels/quantized
- Handles quantization and dequantization operations.
Writing your custom kernel
- https://pytorch.org/executorch/main/kernel-library-custom-aten-kernel.html
- To author and use custom kernels that implement core ATen ops, the YAML-based approach is recommended, because it provides full-fledged support for:
- combining kernel libraries and defining fallback kernels;
- using selective build to minimize the kernel size.
- A custom operator is any operator that an ExecuTorch user defines outside of PyTorch’s `native_functions.yaml`.
Constraints
Out-variants only
ExecuTorch only supports out-style operators, where:
- The caller provides the output Tensor or Tensor list in the final position, with the name `out`.
- The C++ function modifies and returns the same `out` argument.
- If the return type in the YAML file is `()` (which maps to `void`), the C++ function should still modify `out` but does not need to return anything.
- Conventionally, these out operators are named using the pattern `<name>.out` or `<name>.<overload>_out`.
- ExecuTorch only supports operators that return a single `Tensor`, or the unit type `()` (which maps to `void`). It does not support returning any other types, including lists, optionals, tuples, or scalars like `bool`.
- Since all output values are returned via an `out` parameter, ExecuTorch ignores the actual C++ function return value. But, to be consistent, functions should always return `out` when the return type is non-void (see the sketch after this list).
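To make these constraints concrete, here is a minimal sketch of an out-variant kernel for a made-up `my_relu.out` op. The op name, schema, and data-pointer accessors are assumptions for illustration; the unqualified type names follow the same style as the `custom_linear` example later in these notes, and real kernels typically also take a runtime context argument, handle multiple dtypes, and resize `out`.

```cpp
// Hypothetical out-variant kernel for a made-up op:
//   my_relu.out(Tensor in, *, Tensor(a!) out) -> Tensor(a!)
// kernel_includes.h is assumed to bring in the ExecuTorch Tensor type.
#include <executorch/runtime/kernel/kernel_includes.h>

Tensor& my_relu_out(const Tensor& in, Tensor& out) {
  // Assumes float inputs and that `out` already matches the shape of `in`.
  const float* in_data = in.const_data_ptr<float>();
  float* out_data = out.mutable_data_ptr<float>();
  for (size_t i = 0; i < static_cast<size_t>(in.numel()); ++i) {
    out_data[i] = in_data[i] > 0.0f ? in_data[i] : 0.0f;
  }
  // Per the convention above: modify and return the same `out` argument.
  return out;
}
```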
Supported argument types
- ExecuTorch does not support all of the argument types that core PyTorch supports; only a subset of argument types is currently supported (see the ExecuTorch docs for the full list). A signature mixing several commonly supported types is sketched below.
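For illustration only, here is a hypothetical out-variant signature that mixes argument types commonly seen in ExecuTorch kernels (Tensor, optional Tensor, int, float, bool). The op and parameter names are made up, and the unqualified type names follow the style of the `custom_linear` example later in these notes.

```cpp
// Hypothetical schema:
//   my_norm.out(Tensor in, Tensor? weight, int dim, float eps, bool keepdim, *, Tensor(a!) out) -> Tensor(a!)
// In the ATen convention, schema `int` maps to int64_t and `float` maps to double.
Tensor& my_norm_out(
    const Tensor& in,          // Tensor
    optional<Tensor> weight,   // Tensor? (optional)
    int64_t dim,               // int
    double eps,                // float
    bool keepdim,              // bool
    Tensor& out);              // Tensor(a!) out — declaration only, body omitted
```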
1. Follow the ExecuTorch internal README
- https://github.com/pytorch/executorch/blob/main/kernels/README.md
- Clarifies how to register and test the custom kernel, but it is not yet clear how to use it in PyTorch code.
2. C++ API for Custom Ops (unclear if it works yet)
- Clarifies how to link a kernel into PyTorch, but not how to build it.
- The C++ API only uses the C++ macros `EXECUTORCH_LIBRARY` and `WRAP_TO_ATEN` for kernel registration, and it comes without selective build support.
- This makes the API faster in terms of development speed, since users don’t have to do YAML authoring or build system tweaking.
- Similar to `TORCH_LIBRARY` in PyTorch, `EXECUTORCH_LIBRARY` takes the operator name and the C++ function name and registers them with the ExecuTorch runtime.
- Example in the codebase: `linear_scratch_example`
- kernel impl (https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/op_linear_scratch_example.cpp)
- custom_ops.yaml (https://github.com/pytorch/executorch/blob/main/kernels/portable/custom_ops.yaml)
- linking the kernel (https://github.com/pytorch/executorch/blob/main/kernels/portable/targets.bzl)
Prepare custom kernel implementation
- Define your custom operator schema for both the functional variant (used in AOT compilation) and the out variant (used in the ExecuTorch runtime). The schema needs to follow the PyTorch ATen convention (see `native_functions.yaml`). For example:

```
custom_linear(Tensor weight, Tensor input, Tensor(?) bias) -> Tensor
custom_linear.out(Tensor weight, Tensor input, Tensor(?) bias, *, Tensor(a!) out) -> Tensor(a!)
```

- Then write your custom kernel according to the schema using ExecuTorch types, along with the APIs to register it with the ExecuTorch runtime:
```cpp
// custom_linear.h / custom_linear.cpp
#include <executorch/runtime/kernel/kernel_includes.h>

Tensor& custom_linear_out(const Tensor& weight, const Tensor& input, optional<Tensor> bias, Tensor& out) {
  // calculation
  return out;
}

// opset namespace myop
EXECUTORCH_LIBRARY(myop, "custom_linear.out", custom_linear_out);
```
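The `// calculation` body above is left as a placeholder. As a purely illustrative sketch (not from the ExecuTorch docs), assuming contiguous float tensors with `weight` of shape (K, N), `input` of shape (K, M), and `out` of shape (N, M), consistent with the `at::empty({weight.size(1), input.size(1)})` allocation in the wrapper below, the body of `custom_linear_out` could be filled in naively like this:

```cpp
// Naive fill-in for the "// calculation" placeholder above.
// Assumes contiguous float tensors; no error handling or dtype dispatch.
const float* w = weight.const_data_ptr<float>();
const float* x = input.const_data_ptr<float>();
const float* b = bias.has_value() ? bias.value().const_data_ptr<float>() : nullptr;
float* y = out.mutable_data_ptr<float>();

const int64_t K = weight.size(0);
const int64_t N = weight.size(1);
const int64_t M = input.size(1);

for (int64_t n = 0; n < N; ++n) {
  for (int64_t m = 0; m < M; ++m) {
    float acc = (b != nullptr) ? b[n] : 0.0f;
    for (int64_t k = 0; k < K; ++k) {
      acc += w[k * N + n] * x[k * M + m];  // weight[k][n] * input[k][m]
    }
    y[n * M + m] = acc;  // out[n][m]
  }
}
```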
- Write a wrapper for the op so it shows up in PyTorch (in a separate .cpp file):

```cpp
// custom_linear_pytorch.cpp
#include "custom_linear.h"
#include <torch/library.h>

at::Tensor custom_linear(const at::Tensor& weight, const at::Tensor& input, std::optional<at::Tensor> bias) {
  // initialize out
  at::Tensor out = at::empty({weight.size(1), input.size(1)});
  // wrap the kernel in custom_linear.cpp into an ATen kernel
  WRAP_TO_ATEN(custom_linear_out, 3)(weight, input, bias, out);
  return out;
}

// standard API to register ops into PyTorch
TORCH_LIBRARY(myop, m) {
  m.def("custom_linear(Tensor weight, Tensor input, Tensor(?) bias) -> Tensor", custom_linear);
  m.def("custom_linear.out(Tensor weight, Tensor input, Tensor(?) bias, *, Tensor(a!) out) -> Tensor(a!)", WRAP_TO_ATEN(custom_linear_out, 3));
}
```

Compiling and linking the custom kernel
- In the CMakeLists.txt that builds the binary/application, we need to add custom_linear.h/cpp into the binary target.
- We can build a dynamically loaded library (.so or .dylib) and link it as well:
```cmake
# CMakeLists.txt
# For target_link_options_shared_lib
include(${EXECUTORCH_ROOT}/build/Utils.cmake)

# Add a custom op library
add_library(custom_op_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/custom_op.cpp)

# Include the header
target_include_directories(custom_op_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)

# Link the ExecuTorch library
target_link_libraries(custom_op_lib PUBLIC executorch)

# Define a binary target
add_executable(custom_op_runner main.cpp)

# Link this library with --whole-archive. !! IMPORTANT !! This avoids the operators being stripped by the linker.
target_link_options_shared_lib(custom_op_lib)

# Link the custom op lib
target_link_libraries(custom_op_runner PUBLIC custom_op_lib)
```

Using it in PyTorch
- Link it into the PyTorch runtime:
- We need to package custom_linear.h, custom_linear.cpp, and custom_linear_pytorch.cpp into a dynamically loaded library (.so or .dylib)
- and load it into our Python environment:
```python
import torch

# Load the shared library (.so on Linux, .dylib on macOS)
torch.ops.load_library("libcustom_linear.so")

# Now we have access to the custom op, backed by the kernel implemented in custom_linear.cpp
op = torch.ops.myop.custom_linear.default
```