4.5. Bridging Python and C/C++ Functions¶

Developers frequently encounter the need to incorporate custom operators into a machine learning framework. These operators implement new models, optimizers, data processing functions, and more. Custom operators, in particular, often require implementation in C/C++ to achieve optimized performance. They also have Python interfaces, facilitating developers to integrate custom operators with existing machine learning workflows written in Python. This section will delve into the implementation details of this process.

The Python interpreter, being implemented in C, enables the invocation of C and C++ functions within Python. Contemporary machine learning frameworks such as TensorFlow, PyTorch, and MindSpore rely on pybind11 to automatically generate Python functions from underlying C and C++ functions. This mechanism is known as Python binding. Prior to the advent of pybind11, Python binding was accomplished using one of the following approaches:

C-APIs in Python: This approach necessitates the inclusion of Python.h in C++ programs and the utilization of Python’s C-APIs to execute Python operations. To effectively work with C-APIs, developers must possess a comprehensive understanding of Python’s internal implementation, such as managing reference counting.
Simplified Wrapper and Interface Generator (SWIG): SWIG serves as a bridge between C/C++ code and Python, and it played a significant role in the initial development of TensorFlow. Utilizing SWIG involves crafting intricate interface statements and relying on SWIG to automatically generate C code that interfaces with Python’s C-APIs. However, due to the lack of readability in the generated code, the maintenance costs associated with it tend to be high.
Python ``ctypes`` module: This module encompasses a comprehensive range of types found in the C language and allows direct invocation of dynamic link libraries (DLLs). However, a limitation of this module is its heavy reliance on native C types, which results in insufficient support for customized types.
CPython: In basic terms, CPython can be described as the fusion of Python syntax with static types from the C language. It facilitates the retention of Python’s syntax while automatically translating CPython functions into C/C++ code. This functionality empowers developers to seamlessly incorporate invocations of C/C++ functions within the CPython environment.
Boost::Python (a C++ library): Boost::Python allows for the exposure of C++ functions as Python functions. It operates on similar principles to Python’s C-APIs but provides a more user-friendly interface. However, the reliance on the Boost library introduces a significant dependency on third-party components, which can be a potential drawback for Boost::Python.

In comparison to the above Python binding approaches, pybind11 shares similarities with Boost::Python in terms of simplicity and usability. However, pybind11 stands out due to its focus on supporting C++ 11 and eliminating dependencies on Boost. As a lightweight Python library, pybind11 is particularly suitable for exposing numerous Python functions in complex C++ projects such as the machine learning system discussed in this book. The combination of Code ch02/code2.5.1 and Code ch02/code2.5.2 is an example of adding a custom operator to Pytorch with the integration of C++ and Python:

In C++:

ch02/code2.5.1

//custom_add.cpp
#include <torch/extension.h>
#include <pybind11/pybind11.h>

torch::Tensor custom_add(torch::Tensor a, torch::Tensor b) {
    return a + b;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("custom_add", &custom_add, "A custom add function");
}

In Python:

ch02/code2.5.2

import torch
from torch.utils.cpp_extension import load

# Load the C++ extension
custom_extension = load(
    name='custom_extension',
    sources=['custom_add.cpp'],
    verbose=True
)
# Use your custom add function
a = torch.randn(10)
b = torch.randn(10)
c = custom_extension.custom_add(a, b)