## Building a shared library in C and using it in a Python program

Figure 1

How do old-time languages such as C, Fortran and others survive in a world with Python, Ruby and so on?

There is plenty legacy code still around which need maintaining, of course. And there are (will always be?) a few specific applications where low level is needed. But one of the great things with software is building upon old stuff using new tools, which brings us to our topic today: building a shared library containing some of our C stuff and using it in nice and comfy Python. Figure 1 shows an example of what we can achieve by using graphical tools available in Python to improve our existing code’s text-based output. More on that later on.

For our purposes, we consider shared libraries as a collection of compiled objects condensed into a single file, which may then be called by other software. This is, of course, a simplification. A longer discussion about shared and static libraries can be found in [1].

## Trees, part IV – Benchmarking Red-black and AVL trees

In our previous installments we implemented two of the most well-known self-balancing binary search trees: AVL and Red-black trees.

We had a few classes on AVL trees in our basic data structures & algorithms class back in college, which made its implementation far less of a challenge than the Red-black tree. So besides the fundamental guidance of CLRS I had to do quite some googling to get it working. While googling I noticed there were quite a lot of questions about which (AVL or RB) tree was “better” in some sense, be it insertion, search time, deletion time, etc. Most textbooks and articles dismiss this question just by stating the factor differences in either trees’ worst case heights, as we briefly mentioned in the past installment. If you’re anything like me, however, you’ll want to see some comparisons where the trees are actually tested. So I decided to do some simple benchmarking to test those theoretical worst-cases. Here’s what I found out.

First off, we need at least two cases: worst and average case. As we know from the previous installments, the worst possible case for BST insertion is when you are inserting continuously increasing or decreasing values, e.g. 1, 2, 3, 4, … . In this case, a pure BST would behave exactly like a (doubly) linked list, while self-balancing trees should should spread out node distribution. The worst possible searches would be the top or bottom values, i.e. those close to the end of the “list”: a pure BST would have to traverse the entire list (n time), while self-balancing trees should enjoy a $k~log(n)$ time with some factor k.

What would an “average case” look like? Hard to say; depend on what is average for your application. It might just be the case that sequences are the average case. Since we can’t define a “universal” average case and for the sake of simplicity, we will define the average case as a sequence of random numbers drawn from C’s rand() function (one might argue that this is actually the “best” case since on the long run the BST will “naturally” become quite reasonably balanced, but let’s not get picky about terminology).

Figure 1

## Trees, part III – Red-black tree

In our last installment on trees, we studied and implemented the AVL tree. The AVL tree is one of many self-balancing binary search trees, a special kind of BST that enforces sub-linear operation costs by maintaining tree height close to the theoretical minimum of $log_{2}(n)$. This is usually done by what is called tree rotation, which is basically moving around tree nodes (and updating some special node properties).

As you can see in the Wikipedia page¹, AVL trees guarantee that the tree height is strictly less than $\approx 1.44~log_{2}(n)$, while Red-black trees have a slightly worse threshold of $\approx 2~log_{2}(n)$; thus, AVL trees will provide significantly better search times than Red-black trees. However, while AVL trees may need to do $O(log(n))$ rotations after each insertion, Red-black trees must do at most 2 rotations per insertion. So either one may be your tree of choice depending on the application: if search time is critical but data doesn’t get updated too often, an AVL tree will perform better; whereas a Red-black tree will perform better in scenarios where data is constantly being changed.

Self-balancing BSTs add some kind of property to tree nodes that make way for tree balancing: with AVL trees, it was the “balance factor”. With Red-black trees, a “color” property is added to each node. This leads us to the Red-black tree properties:

1. Every node is either red or black
2. Every leaf is black
3. If a node is red, then both its children are black
4. Every path from a node to any of its descendant leafs contains the same number of black nodes