- https://pytorch.org/tutorials/advanced/cpp_extension.html
- How to use CUDA and C++ files to write custom kernel ops:
How to (high level)
- To create a custom op:
  - Write CUDA kernels in .cu files
  - Create C++ wrapper functions in .cpp files
  - Define the Python interface using pybind11 or the PyTorch C++ extension system
  - Use setuptools to compile and link the code
  - Create a PyTorch autograd Function class to hook the op into autograd (sketch below)
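A minimal sketch of that last step, assuming a hypothetical compiled extension my_ext that exposes forward and backward functions:

```python
import torch
import my_ext  # hypothetical compiled extension exposing forward/backward

class MyOpFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        output = my_ext.forward(input)
        ctx.save_for_backward(input)  # stash tensors backward will need
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # must return one gradient per input of forward()
        return my_ext.backward(grad_output.contiguous(), input)

def my_op(x):
    return MyOpFunction.apply(x)
```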
Python interface (.py)
- Can use torch.utils.cpp_extension.load_inline to compile C++/CUDA source strings directly from Python (sketch below)
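A minimal sketch of the inline route; add_one and inline_ext are illustrative names, and load_inline generates the pybind11 bindings for whatever is listed in functions:

```python
import torch
from torch.utils.cpp_extension import load_inline

# C++ source passed as a plain Python string; load_inline adds the
# <torch/extension.h> include for C++ sources automatically
cpp_source = """
torch::Tensor add_one(torch::Tensor x) {
  return x + 1;
}
"""

ext = load_inline(name="inline_ext",
                  cpp_sources=cpp_source,
                  functions=["add_one"])

print(ext.add_one(torch.zeros(3)))  # tensor([1., 1., 1.])
```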
CUDA files (.cu):
- Contain CUDA kernel implementations
- Written in CUDA C++
- Compiled with NVIDIA's CUDA compiler (nvcc)
C++ files (.cpp):
- C++ interface
  - Contains CPU implementations and CUDA kernel launchers
  - e.g. functions that check their tensor arguments and then forward the call to a CUDA kernel
- Python binding
  - Can use pybind11 within C++
Header files (.h):
- Contain function declarations and interface definitions (sketch below)
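The tutorial itself keeps everything in lltm.cpp, but if you split out a header it would just declare the op's interface; a sketch (lltm.h is hypothetical, the signature follows the tutorial's lltm_forward):

```cpp
// lltm.h (hypothetical) -- declarations shared between the .cpp and callers
#pragma once
#include <torch/extension.h>
#include <vector>

std::vector<torch::Tensor> lltm_forward(
    torch::Tensor input,
    torch::Tensor weights,
    torch::Tensor bias,
    torch::Tensor old_h,
    torch::Tensor old_cell);
```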
How to (low level)
- C++ extensions come in two flavors: they can be built "ahead of time" with setuptools, or "just in time" via torch.utils.cpp_extension.load()
- Running example = the LLTM op from the tutorial
- We want to be able to write import lltm_cpp in our code
Folder setup and building
- The folder structure is:
  pytorch/
    lltm-extension/
      lltm.cpp
      setup.py  # if we're using setuptools
- With setuptools, this is setup.py, and you run python setup.py install
- With JIT compiling, you call torch.utils.cpp_extension.load() at runtime instead, and no setup.py is needed (both routes sketched below)
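The ahead-of-time setup.py from the tutorial, plus the JIT equivalent:

```python
# setup.py (ahead of time, via setuptools)
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(name="lltm_cpp",
      ext_modules=[CppExtension("lltm_cpp", ["lltm.cpp"])],
      # BuildExtension supplies the PyTorch-specific compiler/linker flags
      cmdclass={"build_ext": BuildExtension})
```

```python
# JIT alternative: compiles on first call, no setup.py needed
from torch.utils.cpp_extension import load

lltm_cpp = load(name="lltm_cpp", sources=["lltm.cpp"])
```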
Writing the C++ op
- Let's say we need the derivative of the sigmoid for the backward pass
- <torch/extension.h> is the one-stop header to include all the necessary PyTorch bits to write C++ extensions. It includes:
  - The ATen library, which is our primary API for tensor computation
  - pybind11, which is how we create Python bindings for our C++ code
  - Headers that manage the details of interaction between ATen and pybind11
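The tutorial's sigmoid-derivative example, written with ATen:

```cpp
#include <torch/extension.h>

// d/dz sigmoid(z) = (1 - sigmoid(z)) * sigmoid(z)
torch::Tensor d_sigmoid(torch::Tensor z) {
  auto s = torch::sigmoid(z);
  return (1 - s) * s;
}
```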
Exposing the functions
- Once you have your operation written in C++ and ATen, you can use pybind11 to bind your C++ functions or classes into Python, directly in the C++ files
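For the LLTM example, the tutorial's binding block at the bottom of lltm.cpp looks like this (TORCH_EXTENSION_NAME is a macro the build system sets to the extension's module name):

```cpp
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &lltm_forward, "LLTM forward");
  m.def("backward", &lltm_backward, "LLTM backward");
}
```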
Mixed C++/CUDA
- https://pytorch.org/tutorials/advanced/cpp_extension.html#writing-a-mixed-c-cuda-extension
- The general strategy for writing a CUDA extension is to:
  - First write a C++ file which defines the functions that will be called from Python, and bind those functions to Python with pybind11
  - This file will also declare functions that are defined in the CUDA (.cu) files
  - The C++ functions then do some checks and ultimately forward their calls to the CUDA functions (sketch after this list)
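A minimal sketch of such a C++ file, assuming a d_sigmoid_cuda launcher defined in the .cu file; the CHECK_* macros follow the tutorial's pattern:

```cpp
#include <torch/extension.h>

// declared here, defined in the .cu file
torch::Tensor d_sigmoid_cuda(torch::Tensor z);

#define CHECK_CUDA(x) TORCH_CHECK(x.device().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)

// check inputs, then forward the call to the CUDA implementation
torch::Tensor d_sigmoid(torch::Tensor z) {
  CHECK_INPUT(z);
  return d_sigmoid_cuda(z);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("d_sigmoid", &d_sigmoid, "sigmoid derivative (CUDA)");
}
```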
- In the CUDA files, we write the actual CUDA kernels and the functions that perform the kernel launches
- The cpp_extension package will then take care of compiling the C++ sources with a C++ compiler like gcc and the CUDA sources with NVIDIA's nvcc compiler. This ensures that each compiler takes care of the files it knows best how to compile.
- Defining the CUDA file
  - The CUDA file gets the .cu extension
  - NVCC can reasonably compile C++11, so we still have ATen and the C++ standard library available to us (but not torch.h)
  - Note that setuptools cannot handle files with the same name but different extensions, so if you use the setup.py method instead of the JIT method, you must give your CUDA file a different name than your C++ file (e.g. lltm_cuda.cpp and lltm_cuda_kernel.cu, as in the tutorial)
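A minimal sketch of the matching .cu file, again using the sigmoid derivative as the running example (d_sigmoid_cuda is the launcher name assumed in the C++ sketch above):

```cuda
#include <torch/extension.h>
#include <cuda.h>
#include <cuda_runtime.h>

// device-side sigmoid derivative
template <typename scalar_t>
__device__ __forceinline__ scalar_t d_sigmoid(scalar_t z) {
  const scalar_t s = scalar_t(1) / (scalar_t(1) + exp(-z));
  return (scalar_t(1) - s) * s;
}

// elementwise kernel: one thread per element
template <typename scalar_t>
__global__ void d_sigmoid_kernel(const scalar_t* __restrict__ in,
                                 scalar_t* __restrict__ out,
                                 int64_t n) {
  const int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) out[i] = d_sigmoid(in[i]);
}

// launcher called from the C++ file; AT_DISPATCH_FLOATING_TYPES picks the
// template instantiation matching the tensor's dtype
torch::Tensor d_sigmoid_cuda(torch::Tensor z) {
  auto out = torch::empty_like(z);
  const int64_t n = z.numel();
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);
  AT_DISPATCH_FLOATING_TYPES(z.scalar_type(), "d_sigmoid_cuda", ([&] {
    d_sigmoid_kernel<scalar_t><<<blocks, threads>>>(
        z.data_ptr<scalar_t>(), out.data_ptr<scalar_t>(), n);
  }));
  return out;
}
```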
Integrating into PyTorch
- https://pytorch.org/tutorials/advanced/cpp_extension.html#integrating-a-c-cuda-operation-with-pytorch
- Just like previously, we can use setuptools or JIT compiling; the arguments are slightly different (CUDAExtension instead of CppExtension, and both the .cpp and .cu files listed as sources)
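From the tutorial, the mixed-build versions of both routes:

```python
# setup.py: CUDAExtension routes .cu files through nvcc automatically
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(name="lltm_cuda",
      ext_modules=[
          CUDAExtension("lltm_cuda",
                        ["lltm_cuda.cpp", "lltm_cuda_kernel.cu"]),
      ],
      cmdclass={"build_ext": BuildExtension})
```

```python
# JIT alternative
from torch.utils.cpp_extension import load

lltm_cuda = load(name="lltm_cuda",
                 sources=["lltm_cuda.cpp", "lltm_cuda_kernel.cu"])
```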