How to (high level)

  • To create a custom op:
    • Write CUDA kernels in .cu files
    • Create C++ wrapper functions in .cpp files
    • Define the Python interface using pybind11 or the PyTorch C++ extension system
    • Use setuptools (or JIT compilation) to compile and link the code
    • Create a PyTorch autograd Function class if the op needs a custom backward pass
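The autograd step can be sketched as below. This is a minimal sketch, not a definitive implementation: in a real extension, forward and backward would call into the compiled module (a hypothetical my_ext.forward / my_ext.backward). Here plain torch calls stand in for them so the example runs without building anything.

```python
import torch


class SigmoidFunction(torch.autograd.Function):
    """Custom op with a hand-written backward pass.

    In a real extension the two methods would call the compiled
    (hypothetical) my_ext.forward / my_ext.backward instead.
    """

    @staticmethod
    def forward(ctx, z):
        s = torch.sigmoid(z)        # stand-in for the compiled forward
        ctx.save_for_backward(s)    # keep what backward needs
        return s

    @staticmethod
    def backward(ctx, grad_out):
        (s,) = ctx.saved_tensors
        return grad_out * (1 - s) * s   # stand-in for the compiled backward


def sigmoid_op(z):
    # Function.apply wires forward/backward into the autograd graph
    return SigmoidFunction.apply(z)


z = torch.randn(5, dtype=torch.double, requires_grad=True)
# gradcheck compares the hand-written backward with finite differences
print(torch.autograd.gradcheck(sigmoid_op, (z,)))  # True
```

Saving the forward output s (rather than z) lets backward reuse it instead of recomputing the sigmoid.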

Python interface (.py)

  • Can use load_inline (from torch.utils.cpp_extension import load_inline) to compile C++/CUDA source strings directly from Python

CUDA files (.cu):

  • Contain CUDA kernel implementations
  • Written in CUDA C++
  • Compiled with NVIDIA's CUDA compiler (nvcc)

C++ files (.cpp):

C++ interface

  • Contain CPU implementations and the functions that call the CUDA kernel launchers
  • e.g.

Python binding

  • Can use pybind11 within C++

Header files (.h):

  • Contain function declarations and interface definitions

How to (low level)

Folder setup and building

  • The folder structure is
    lltm.cpp
    setup.py  ## only if we're using setuptools
  • With setuptools, setup.py contains the following, and you build and install the extension with python setup.py install
from setuptools import setup
from torch.utils import cpp_extension

setup(name='lltm_cpp',
      ext_modules=[cpp_extension.CppExtension('lltm_cpp', ['lltm.cpp'])],
      cmdclass={'build_ext': cpp_extension.BuildExtension})
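  • With JIT compilation (no setup.py needed; the extension is built the first time load() runs and cached for later runs)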
from torch.utils.cpp_extension import load
lltm_cpp = load(name="lltm_cpp", sources=["lltm.cpp"])

Writing the C++ op

  • Let's say we need the derivative of the sigmoid for the backward pass:
#include <torch/extension.h>
#include <iostream>

torch::Tensor d_sigmoid(torch::Tensor z) {
  auto s = torch::sigmoid(z);
  return (1 - s) * s;
}
  • <torch/extension.h> is the one-stop header to include all the necessary PyTorch bits to write C++ extensions. It includes:
    • The ATen library, which is our primary API for tensor computation,
    • pybind11, which is how we create Python bindings for our C++ code,
    • Headers that manage the details of interaction between ATen and pybind11.
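Before compiling anything, the formula in d_sigmoid can be sanity-checked against autograd in plain Python. A small sketch; d_sigmoid_ref is a hypothetical Python mirror of the C++ function above.

```python
import torch


def d_sigmoid_ref(z):
    # Same formula as the C++ op: sigmoid'(z) = (1 - s) * s, with s = sigmoid(z)
    s = torch.sigmoid(z)
    return (1 - s) * s


z = torch.linspace(-3, 3, 7, dtype=torch.double, requires_grad=True)
# sigmoid is elementwise, so the gradient of the sum w.r.t. z is sigmoid'(z) per element
(autograd_grad,) = torch.autograd.grad(torch.sigmoid(z).sum(), z)
print(torch.allclose(d_sigmoid_ref(z.detach()), autograd_grad))  # True
```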

Exposing the functions

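  • Once your operation is written in C++ and ATen, use pybind11 in the same C++ file to bind your C++ functions or classes to Python:
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &lltm_forward, "LLTM forward");
  m.def("backward", &lltm_backward, "LLTM backward");
}
    • TORCH_EXTENSION_NAME is defined by the extension build as the module's name (e.g. lltm_cpp)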

Mixed C++/CUDA


  • The general strategy for writing a CUDA extension is to

    • first write a C++ file which defines the functions that will be called from Python, and binds those functions to Python with pybind11.

      • Furthermore, this file will also declare functions that are defined in CUDA (.cu) files.
      • The C++ functions will then do some checks and ultimately forward their calls to the CUDA functions.
    • In the CUDA files, we write the actual CUDA kernels and the functions that launch them.

    • The cpp_extension package will then take care of compiling the C++ sources with a C++ compiler like gcc and the CUDA sources with NVIDIA's nvcc compiler. This ensures that each compiler handles the files it knows best how to compile.

Defining the CUDA file

  • .cu extension
    • NVCC can reasonably compile C++11, thus we still have ATen and the C++ standard library available to us (but not torch.h).
  • Note that setuptools cannot handle files with the same name but different extensions, so if you use the setup.py method instead of the JIT method, you must give your CUDA file a different name than your C++ file (with the JIT method, lltm.cpp and lltm.cu would work fine)
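For the setup.py route, a mixed build might look like the sketch below. It mirrors the earlier CppExtension example but uses CUDAExtension; the file and module names are hypothetical, chosen so the .cpp and .cu sources differ in base name. A likely reason for the restriction is that each source is compiled to an object file named after its base name, so lltm.cpp and lltm.cu would both produce lltm.o.

```python
# setup.py (hypothetical names); build with: python setup.py install
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="lltm_cuda",
    ext_modules=[
        CUDAExtension(
            "lltm_cuda",
            # Different base names: lltm_cuda.cpp vs lltm_cuda_kernel.cu
            ["lltm_cuda.cpp", "lltm_cuda_kernel.cu"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```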

Integrating into PyTorch