Thursday, December 31, 2015

Kd-trees and the 29-bit pointer trick

Several popular kd-tree implementations use a mysterious 29-bit integer in place of child pointers. LuxRender does it (kdtree.h), POV-Ray does it (bsptree.h), and so do many other projects.

That might strike you as a bit surprising.

For one, the number 29 is a little random. It isn't mentioned in the popular literature [1][2][3], and the whole notion feels contrary to how binary trees are usually implemented.

I'd like to explore this trick a bit more and explain why it's useful. To do that, we should start at the beginning and take a closer look at how space-partitioning trees work.

3D Space Partitioning

Let's say that all of the objects we want to render or simulate live in a box.

We'd like to split the box into smaller pieces to facilitate fast searches and intersection tests. There are 3 different ways to partition this box with axis-aligned splitting planes: an up/down split (y = a), a left/right split (x = b), and a front/back split (z = c).

These splitting planes can be used to form a tree structure: the kd-tree. The root node splits the world into two halves. Child nodes split these halves into successively smaller pieces.

A 3d-tree visualization from Wikipedia [k-d tree]

An Efficient Tree Structure

All that's left is to find an efficient way to represent kd-trees.

Internal nodes split the box into two parts. That means that each internal node contains a splitting plane of the form "<axis> = <splitting-position>" and two children. Leaf nodes are collections of objects, which means they just contain a single pointer.

To save space, we can store the tree nodes in a resizable array. Now, an internal node only needs a pointer to one of its children because we can guarantee that its sibling is right next to it in the array.

A kd-tree represented as an array, with explicit child pointers shown

That was the crucial piece! The final tree node structure breaks down as follows:
  • We need 32 bits to represent the splitting position (just a regular float).
  • We need 2 bits to represent the splitting axis (since there are only 3 possible axes).
  • We use the least-significant bit to check for internal nodes. Alignment rules guarantee that this bit is set to 0 in leaf nodes.
  • Finally, we need a pointer to the first child (i.e, an index into the node array). To make everything fit nicely in a 64-bit qword, we'll use 64 - 32 - 2 - 1 = 29 bits!
You can use a union to represent both kinds of tree nodes conveniently.


On my machine, this representation lets me fit 8 tree nodes into a single cacheline. This is a significant space-savings  (~77% on x86_64) over the one-fat-pointer-per-child approach. It's probably faster too (caveat -- I haven't measured this effect).

The downside is that this representation makes it harder to modify the tree. If you're working with static scenes and have a good heuristic for finding splitting planes, this shouldn't matter much.