Several popular kd-tree implementations use a mysterious 29-bit integer in place of child pointers. LuxRender does it (kdtree.h), POV-Ray does it (bsptree.h), and so do many other projects.
That might strike you as a bit surprising.
For one, the number 29 is a little random. It isn't mentioned in the popular literature [1][2][3], and the notion of storing an integer instead of a pair of child pointers may seem unconventional.
I'd like to explore this technique a bit more and explain why it's useful. To do that, it's helpful to start at the beginning by taking a closer look at how space-partitioning trees work.
That might strike you as a bit surprising.
For one, the number 29 is a little random. It isn't mentioned in the popular literature [1][2][3], and the notion of storing an integer instead of a pair of child pointers may seem unconventional.
I'd like to explore this technique a bit more and explain why it's useful. To do that, it's helpful to start at the beginning by taking a closer look at how space-partitioning trees work.
3D Space Partitioning
Let's say that all of the objects we want to render or simulate live in a box.
We'd like to split the box into smaller pieces to facilitate fast searches and intersection tests. There are 3 different ways to partition this box with axis-aligned splitting planes: an up/down split (y = a), a left/right split (x = b), and a front/back split (z = c).
These splitting planes can be used to form a tree structure: the kd-tree. The root node splits the world into two halves. Child nodes split these halves into successively smaller pieces.
A 3d-tree visualization from Wikipedia [k-d tree] |
An Efficient Tree Structure
All that's left is to find an efficient way to represent kd-trees.
Internal nodes split the box into two parts. That means that each internal node contains a splitting plane of the form "<axis> = <splitting-position>" and two children. Leaf nodes are collections of objects, which means they just contain a single pointer.
To save space, we can store the tree nodes in a resizable array. Now, an internal node only needs a pointer to one of its children because we can guarantee that its sibling is right next to it in the array.
A kd-tree represented as an array, with explicit child pointers shown |
That was the crucial piece! The final tree node structure breaks down as follows:
- We need 32 bits to represent the splitting position (just a regular float).
- We need 2 bits to represent the splitting axis (since there are only 3 possible axes).
- We use the least-significant bit to check for internal nodes. Alignment rules guarantee that this bit is set to 0 in leaf nodes.
- Finally, we need a pointer to the first child (i.e, an index into the node array). To make everything fit nicely in a 64-bit qword, we'll use 64 - 32 - 2 - 1 = 29 bits!
You can use a union to represent both kinds of tree nodes conveniently.
Wrap-Up
On my machine, this representation lets me fit 8 tree nodes into a single cacheline. This is a significant space-savings (~77% on x86_64) over the one-fat-pointer-per-child approach. It's probably faster too (caveat -- I haven't measured this effect).
The downside is that this representation makes it harder to modify the tree. If you're working with static scenes and have a good heuristic for finding splitting planes, this shouldn't matter much.