How to (high level)

  • To create a custom op:
    • Write CUDA kernels in .cu files
    • Create C++ wrapper functions in .cpp files
    • Define the Python interface using pybind11 or the PyTorch C++ extension system
    • Use setuptools to compile and link the code
    • Create a PyTorch autograd Function class

Python interface (.py)

  • Can use load_inline (from torch.utils.cpp_extension import load_inline) to compile C++/CUDA source strings at runtime
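
A minimal sketch of the load_inline route, assuming a small d_sigmoid function (the sigmoid derivative) as the op. The module name "inline_d_sigmoid" and the helper build_inline_module are hypothetical names chosen for illustration. Building needs a working C++ compiler and, typically, Ninja, so the compile step is wrapped in a function and only runs when called.

```python
# C++ source for the op. load_inline prepends `#include <torch/extension.h>`
# to cpp_sources, so the include is not repeated here.
CPP_SOURCE = """
torch::Tensor d_sigmoid(torch::Tensor z) {
  auto s = torch::sigmoid(z);
  return (1 - s) * s;
}
"""


def build_inline_module():
    """Compile CPP_SOURCE at runtime and return the loaded extension module."""
    # Imported here so that nothing is compiled until the function is called.
    from torch.utils.cpp_extension import load_inline

    return load_inline(
        name="inline_d_sigmoid",   # hypothetical module name
        cpp_sources=CPP_SOURCE,
        functions=["d_sigmoid"],   # pybind11 bindings are generated for these names
    )
```

Calling build_inline_module() should return a module whose d_sigmoid attribute accepts a tensor, e.g. mod.d_sigmoid(torch.randn(3)).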

CUDA files (.cu):

  • Contains CUDA kernel implementations
  • Written in CUDA C++
  • Compiled with NVIDIA's CUDA compiler (nvcc)

C++ files (.cpp):

C++ interface

  • Contains CPU implementations and CUDA kernel launchers
  • e.g.

Python binding

  • Can use pybind11 within C++

Header files (.h):

  • Contain function declarations and interface definitions

How to (low level)

Folder setup and building

  • The folder structure is
pytorch/
  lltm-extension/
    lltm.cpp
    setup.py  ## if we're using setuptools
  • With setuptools, the build script is setup.py
    • Build and install the extension by running python setup.py install
from setuptools import setup, Extension
from torch.utils import cpp_extension
 
setup(name='lltm_cpp',
      ext_modules=[cpp_extension.CppExtension('lltm_cpp', ['lltm.cpp'])],
      cmdclass={'build_ext': cpp_extension.BuildExtension})
  • With JIT compiling, torch.utils.cpp_extension.load compiles the sources on the fly and returns the module
from torch.utils.cpp_extension import load
lltm_cpp = load(name="lltm_cpp", sources=["lltm.cpp"])

Writing the C++ op

  • Let's say we need the derivative of the sigmoid for the backward pass
#include <torch/extension.h>
#include <iostream>
 
torch::Tensor d_sigmoid(torch::Tensor z) {
  auto s = torch::sigmoid(z);
  return (1 - s) * s;
}
  • <torch/extension.h> is the one-stop header to include all the necessary PyTorch bits to write C++ extensions. It includes:
    • The ATen library, which is our primary API for tensor computation,
    • pybind11, which is how we create Python bindings for our C++ code,
    • Headers that manage the details of interaction between ATen and pybind11.
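
The d_sigmoid function above relies on the identity sigmoid'(z) = (1 - sigmoid(z)) * sigmoid(z). As a quick sanity check, here is a pure-Python sketch (no PyTorch needed) that compares the closed form against a central finite difference. The helper numeric_derivative and the sample points are illustrative choices, not part of the extension.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def d_sigmoid(z: float) -> float:
    # Same formula as the C++ version: (1 - s) * s with s = sigmoid(z)
    s = sigmoid(z)
    return (1.0 - s) * s


def numeric_derivative(f, z: float, eps: float = 1e-6) -> float:
    # Central difference approximation of f'(z)
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


# The closed form and the numerical estimate should agree closely.
for z in (-2.0, 0.0, 0.5, 3.0):
    assert abs(d_sigmoid(z) - numeric_derivative(sigmoid, z)) < 1e-8
```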

Exposing the functions

  • Once you have your operation written in C++ and ATen, you can use pybind11 to bind your C++ functions or classes into Python in the C++ files
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &lltm_forward, "LLTM forward");
  m.def("backward", &lltm_backward, "LLTM backward");
}
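
On the Python side, bound forward and backward functions like these are usually hooked into autograd through a torch.autograd.Function subclass. The following is a minimal sketch, assuming a simplified single-tensor sigmoid op rather than the LLTM signatures (which are not shown here). _StandInExt is a hypothetical pure-PyTorch stand-in for the compiled module, so the sketch runs without building anything.

```python
import torch


class _StandInExt:
    """Hypothetical stand-in for a compiled module exposing forward/backward."""

    @staticmethod
    def forward(z):
        return torch.sigmoid(z)

    @staticmethod
    def backward(grad_out, s):
        # Chain rule with d/dz sigmoid(z) = (1 - s) * s, where s = sigmoid(z)
        return grad_out * (1 - s) * s


ext = _StandInExt  # assumption: swap in the real compiled module here


class SigmoidFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z):
        s = ext.forward(z)
        ctx.save_for_backward(s)  # keep the output for the backward pass
        return s

    @staticmethod
    def backward(ctx, grad_out):
        (s,) = ctx.saved_tensors
        return ext.backward(grad_out, s)


z = torch.randn(4, dtype=torch.double, requires_grad=True)
out = SigmoidFunction.apply(z)
out.sum().backward()  # populates z.grad through the custom backward
```

Keeping the autograd wrapper in Python means the extension itself only has to expose plain tensor-in, tensor-out functions.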

Mixed C++/CUDA

  • https://pytorch.org/tutorials/advanced/cpp_extension.html#writing-a-mixed-c-cuda-extension

  • The general strategy for writing a CUDA extension is to

    • first write a C++ file which defines the functions that will be called from Python, and binds those functions to Python with pybind11.

      • Furthermore, this file will also declare functions that are defined in CUDA (.cu) files.
      • The C++ functions will then do some checks and ultimately forward their calls to the CUDA functions.
    • In the CUDA files, we write the actual CUDA kernels and the functions that launch them.

    • The cpp_extension package will then take care of compiling the C++ sources with a C++ compiler like gcc and the CUDA sources with NVIDIA's nvcc compiler. This ensures that each compiler handles the files it is best suited to compile.

Defining the CUDA file

  • .cu extension
    • NVCC can reasonably compile C++11, thus we still have ATen and the C++ standard library available to us (but not torch.h).
  • Note that setuptools cannot handle files with the same name but different extensions, so if you use the setup.py method instead of the JIT method, you must give your CUDA file a different name than your C++ file
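
The naming rule above can be checked before building. The following is a small sketch with a hypothetical helper, find_stem_collisions, that flags sources differing only by extension. The file names lltm_cuda.cpp and lltm_cuda_kernel.cu are assumed here as one possible non-colliding naming scheme.

```python
from collections import defaultdict
from pathlib import PurePath


def find_stem_collisions(sources):
    """Return {path_without_suffix: [files]} for sources that differ only by extension."""
    by_stem = defaultdict(list)
    for src in sources:
        # Strip only the final suffix, keeping any directory part of the path.
        by_stem[str(PurePath(src).with_suffix(""))].append(src)
    return {stem: files for stem, files in by_stem.items() if len(files) > 1}


bad_sources = ["lltm.cpp", "lltm.cu"]                       # same name, different extension
good_sources = ["lltm_cuda.cpp", "lltm_cuda_kernel.cu"]     # assumed naming scheme

collisions = find_stem_collisions(bad_sources)
```

A non-empty result means at least one pair of files should be renamed before using the setup.py method.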

Integrating into PyTorch