Machine Learning, AI and Programming

Interfacing C++ with Cython

In the last post we saw how we can use Cython programming language to boost speed of Python programs. For a simple program like finding primes upto a certain N, we obtained a gain of around 50-60x with Cython as compared to a naive Python implementation. This is significant when we are going to deploy our codes in production. A complex system will have multiple such programs calling each other in a complicated way and a delay of 10-100 ms in each program can cause an total delay of more than a few seconds in the overall system.

In this post we are going to look at another aspect of Cython, i.e. instead of 'Cythonizing' Python codes, we will wrap pure C++ codes with Cython and let Python modules import them. This is mostly how Python packages are developed and deployed.

In order to demonstrate that, I will be implementing a 'Keyed Priority Queue'. Python already has a priority queue implementation in the form of the 'heapq' package, but heapq do not allow us to delete or update an arbitrary element efficiently. Obviously to delete an element from the heap array, set the element to be equal to the last element in the heap array, delete the last element and then run 'heapify'. 

This is O(N) run-time complexity, but ideally we should be able to delete an element in O(logN) time complexity. Similarly for updating an element (update->heapify). Our implementation uses an intermediate hash map data structure.

Each element in our heap has an unique identifier e.g. index of the element in the original array (non-heap). The hash function maps this unique identifier to the position of the element in the heap array. This allows us to access any element in O(1) time. Deleting or updating after that is in O(logN). Here are the specifications of our data structure:

  • Each element is a 3-integer-tuple (x, y, z), where 'z' is the index of the element in the original array and is also the key to the hash function.
  • Each element is represented as a struct of 3 integers within the class.
  • The heap is a min-heap on 'x' but uses the values 'y' as a tie-breaker. Thus for any two elements A and B, A will be 'above' B in the heap (or position of A will be less than half the position of B) if either (A.x < B.x) or (A.x == B.x and A.y > B.y)
  • The data structure allows us to perform the following operations:
    • pop(): Deletes the root and returns it. 
    • pop(key): Deletes an element identified by key and returns it.
    • push(value): Adds an element to the heap.
    • set(key, value): Updates an element identified by key to the new 'value'.
    • get(key): Fetches an element identified by key.
    • size(): Returns the current size of the heap.

Updating an element in a heap is an important feature required in certain algorithms like the Dijkstra's Shortest Path Finding algorithm.

The idea is to write a C++ class from scratch that implements the KeyedPriorityQueue data structure and then write a Cython wrapper class over it that is callable as a Python package. So lets' begin the C++ implementation first.

Create a C++ header file 'KeyedPriorityQueue.h' and add the following code:


#include <iostream> 
#include <map>
#include <vector>

struct HeapNode {
    int a;
    int b;
    int c;

class KeyedPriorityQueue {
        std::vector<HeapNode> heap;
        std::map<int, int> heap_ref;
        bool comparator(int i, int j);
        int swap(int i, int j);
        void adjust_parent(int index);
        void adjust_child(int index);
        HeapNode remove(int index);
        KeyedPriorityQueue(std::vector<HeapNode> heap);
        int size();
        HeapNode pop();
        HeapNode pop(int key);
        void push(HeapNode value);
        HeapNode get(int key);
        void set(int key, HeapNode value);


Note that only the functions that we want to expose to the Cython wrapper are declared as 'public', remaining all are declared as 'private'. The header file is only used to declare the class methods and the struct 'HeapNode'. We will define the methods in the actual .cpp file.

Create a file named 'KeyedPriorityQueue.cpp' and add the following code:

#include "KeyedPriorityQueue.h"

KeyedPriorityQueue::KeyedPriorityQueue () {}

KeyedPriorityQueue::KeyedPriorityQueue (std::vector<HeapNode> heap) {
    for (int i=0; i < static_cast<int>(heap.size()); i++) {

KeyedPriorityQueue::~KeyedPriorityQueue () {}
int KeyedPriorityQueue::size () {
    return static_cast<int>(this->heap.size());

bool KeyedPriorityQueue::comparator(int i, int j) {
    int n = this->size();

    bool cond1 = i < n && j < n && this->heap[i].a > this->heap[j].a;
    bool cond2 = i < n && j < n && this->heap[i].a == this->heap[j].a && this->heap[i].b < this->heap[j].b;

    return cond1 || cond2;

int KeyedPriorityQueue::swap (int i, int j) {
    HeapNode x = this->heap[i];
    HeapNode y = this->heap[j];
    HeapNode temp1 = x;
    int temp2 = this->heap_ref[x.c];
    this->heap[i] = y;
    this->heap_ref[x.c] = this->heap_ref[y.c];
    this->heap[j] = temp1;
    this->heap_ref[y.c] = temp2;

    return j;

void KeyedPriorityQueue::adjust_parent (int index) {
    while (index > 0 && this->comparator(index/2, index)){
        index = this->swap(index, index/2);

void KeyedPriorityQueue::adjust_child (int index) {
    int n = this->size();

    while (true) {
        bool a = 2*index+1 < n && this->comparator(index, 2*index+1);
        bool b = 2*index+2 < n && this->comparator(index, 2*index+2);

        if (a && b) {
            if (this->comparator(2*index+1, 2*index+2)) {
                index = this->swap(index, 2*index+2);
            else {
                index = this->swap(index, 2*index+1);
        else if (a) {
            index = this->swap(index, 2*index+1);
        else if (b) {
            index = this->swap(index, 2*index+2);
        else {

HeapNode KeyedPriorityQueue::remove (int index) {
    HeapNode out = this->heap[index];

    if (index < this->size()-1) {
        this->heap[index] = this->heap.back();
        HeapNode x = this->heap[index];
        this->heap_ref[x.c] = index;


    if (index < this->size()) {
        if (index > 0 && this->comparator(index/2, index)) {
        else {

    return out;

HeapNode KeyedPriorityQueue::pop () {
    return this->remove(0);

HeapNode KeyedPriorityQueue::pop (int key) {
    HeapNode out = {0, 0, -1};
    if (this->heap_ref.find(key) != this->heap_ref.end()) {
        return this->remove(this->heap_ref[key]);
    return out;

HeapNode KeyedPriorityQueue::get (int key) {
    HeapNode out = {0, 0, -1};
    if (this->heap_ref.find(key) != this->heap_ref.end()) {
        return this->heap[this->heap_ref[key]];

    return out;

void KeyedPriorityQueue::push (HeapNode value) {
    int key = value.c;
    if (this->heap_ref.find(key) == this->heap_ref.end()) {
        this->heap_ref[value.c] = this->size()-1;

void KeyedPriorityQueue::set (int key, HeapNode value) {
    if (this->heap_ref.find(key) != this->heap_ref.end()) {

Import the 'KeyedPriorityQueue.h' at the top.

Some short descriptions of each of the private methods:

  • 'comparator' returns True if the elements at positions 'i' and 'j' should be swapped based on the condition else False.
  • 'swap' method swaps element at position 'i' with element at position 'j' and returns the new position of 'i' i.e. 'j'.
  • 'adjust_parent' method moves the element at 'index' up the tree to the correct position so that it is not further possible to swap an element with its parent.
  • 'adjust_child' method moves the element at 'index' down the tree. It checks an element with its two children (left and right) and then swaps it with one of them based on condition. Repeat until its not further possible to swap with any child.
  • 'remove' method deletes an element at 'index'. It replaces the element at 'index' with the last element of the heap array. Then depending on whether the new element at 'index' is in conflict with its parent or a child, the element is moved up or down respectively.

We are now done with developing the C++ codes. In order to ensure that there is no compile time error in the C++ codes, run

'g++ KeyedPriorityQueue.cpp'

from the command line and if it compiles successfully, it implies that there are no compile time errors.

Now we need to define our Cython wrapper class. Note that we need Cython wrapper because we cannot return C++ objects in a Python program. Create a file named 'PyKeyedPriorityQueue.pyx' and then add the following code:

from libcpp.vector cimport vector
import collections

cdef extern from "KeyedPriorityQueue.h":
    cdef struct HeapNode:
        int a, b, c
    cdef cppclass KeyedPriorityQueue:
        KeyedPriorityQueue() except +
        KeyedPriorityQueue(vector[HeapNode]) except +
        int size()
        HeapNode pop()
        HeapNode pop(int key)
        HeapNode get(int key)
        void set(int key, HeapNode value)
        void push(HeapNode value)

cdef class PyKeyedPriorityQueue(object):
    cdef KeyedPriorityQueue kpqueue
    def __cinit__(self, arr):
        cdef vector[HeapNode] heap_arr
        cdef HeapNode node
        for x, y, z in arr:
            node.a, node.b, node.c = x, y, z
        self.kpqueue = KeyedPriorityQueue(heap_arr)
    def size(self):
        return self.kpqueue.size()
    def pop(self, key=None):
        cdef HeapNode out
        if key is None:
            out = self.kpqueue.pop()
            out = self.kpqueue.pop(key)
        return (out.a, out.b, out.c)
    def get(self, key):
        cdef HeapNode out = self.kpqueue.get(key)
        return (out.a, out.b, out.c)
    def set(self, key, value):
        cdef HeapNode node
        node.a, node.b, node.c = value
        return self.kpqueue.set(key, node)
    def push(self, value):
        cdef HeapNode node
        node.a, node.b, node.c = value
        return self.kpqueue.push(node)

Note that in order to use our C++ class 'KeyedPriorityQueue' in our Cython program, we need to import it. This is done by the following code:

cdef extern from "KeyedPriorityQueue.h":
    cdef struct HeapNode:
        int a, b, c
    cdef cppclass KeyedPriorityQueue:
        KeyedPriorityQueue() except +
        KeyedPriorityQueue(vector[HeapNode]) except +
        int size()
        HeapNode pop()
        HeapNode pop(int key)
        HeapNode get(int key)
        void set(int key, HeapNode value)
        void push(HeapNode value)

This imports both the struct definition as well as the 'KeyedPriorityQueue' class definition along with the public class members and methods. This is Cython equivalent of C++ header file.

We define our Cython version of the 'KeyedPriorityQueue' as 'PyKeyedPriorityQueue' class. Instead of redefining our methods in the Cython class 'PyKeyedPriorityQueue', we initialize a 'KeyedPriorityQueue' object 'kpqueue' which will call the C++ equivalents of these methods instead and return the appropriate Python objects. The method '__cinit__' is Cython equivalent of Python '__init__' which initializes the C++ constructor 'KeyedPriorityQueue()' and creates a heap out of the input array.

Also note that unlike C++, the Cython wrapper class 'PyKeyedPriorityQueue' has only one pop() method with 'key' as an optional parameter. This is because in Python, function overloading does not work as C++.

Now once we are done with our data structure definition for 'KeyedPriorityQueue', we need to build the codes in our '' file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules=[ Extension("PyKeyedPriorityQueue",
              sources=["PyKeyedPriorityQueue.pyx", "KeyedPriorityQueue.cpp"], language='c++')]

  name = "PyKeyedPriorityQueue",
  cmdclass = {"build_ext": build_ext},
  ext_modules = ext_modules)

Note that we have to specify in the setup file that the compilation language is C++ and not in C (which is default).

Run the following command:

python build_ext --inplace

which creates the necessary build files that can be distributed as a Python package.

We can test the program by creating a random Numpy integer array of N=1 million elements (of 3 integers each). Then call the 'PyKeyedPriorityQueue' with this dummy array, which will create a heap data structure out of it. Each element of the heap would be a HeapNode struct of 3 integers.

from PyKeyedPriorityQueue import PyKeyedPriorityQueue
import numpy as np
n = 1000000
arr1 = np.random.randint(low=0, high=50, size=(n,))
arr2 = np.random.randint(low=100, high=500, size=(n,))
arr3 = np.arange(n)

arr = np.vstack((arr1, arr2, arr3)).T
queue = PyKeyedPriorityQueue(arr)

Adding all the elements in the array to the heap, takes O(N*logN) time complexity but this can be achieved more efficiently in O(N) using the 'heapify' algorithm. But it still quite fast, takes about 2 seconds to add 1 million elements to the heap.

We can test the individual methods of the heap such as get(), pop(), push() etc. by calling them with the 'queue' object.

print queue.get(1)
queue.push([13, 268, n])
print queue.pop()
print queue.pop(10)
print queue.size()

Thus now we have a fully working Keyed Priority Queue data structure in place. Looking forward to use more of Cython in the near future.

External Resources:


Tags: , , , , , ,