It's a vedant

Thursday, December 31, 2015

Kd-trees and the 29-bit pointer trick

Several popular kd-tree implementations use a mysterious 29-bit integer in place of child pointers. LuxRender does it (kdtree.h), POV-Ray does it (bsptree.h), and so do many other projects.

That might strike you as a bit surprising.

For one, the number 29 is a little random. It isn't mentioned in the popular literature [1][2][3], and the notion of storing an integer instead of a pair of child pointers may seem unconventional.

I'd like to explore this technique a bit more and explain why it's useful. To do that, it's helpful to start at the beginning by taking a closer look at how space-partitioning trees work.

3D Space Partitioning

Let's say that all of the objects we want to render or simulate live in a box.

We'd like to split the box into smaller pieces to facilitate fast searches and intersection tests. There are 3 different ways to partition this box with axis-aligned splitting planes: an up/down split (y = a), a left/right split (x = b), and a front/back split (z = c).

These splitting planes can be used to form a tree structure: the kd-tree. The root node splits the world into two halves. Child nodes split these halves into successively smaller pieces.

A 3d-tree visualization from Wikipedia [k-d tree]

An Efficient Tree Structure

All that's left is to find an efficient way to represent kd-trees.

Internal nodes split the box into two parts. That means that each internal node contains a splitting plane of the form "<axis> = <splitting-position>" and two children. Leaf nodes are collections of objects, which means they just contain a single pointer.

To save space, we can store the tree nodes in a resizable array. Now, an internal node only needs a pointer to one of its children because we can guarantee that its sibling is right next to it in the array.

A kd-tree represented as an array, with explicit child pointers shown

That was the crucial piece! The final tree node structure breaks down as follows:

We need 32 bits to represent the splitting position (just a regular float).
We need 2 bits to represent the splitting axis (since there are only 3 possible axes).
We use the least-significant bit to check for internal nodes. Alignment rules guarantee that this bit is set to 0 in leaf nodes.
Finally, we need a pointer to the first child (i.e, an index into the node array). To make everything fit nicely in a 64-bit qword, we'll use 64 - 32 - 2 - 1 = 29 bits!

You can use a union to represent both kinds of tree nodes conveniently.

Wrap-Up

On my machine, this representation lets me fit 8 tree nodes into a single cacheline. This is a significant space-savings (~77% on x86_64) over the one-fat-pointer-per-child approach. It's probably faster too (caveat -- I haven't measured this effect).

The downside is that this representation makes it harder to modify the tree. If you're working with static scenes and have a good heuristic for finding splitting planes, this shouldn't matter much.

References

[1] Fast Ray Tracing Using K-D Trees. Fussell, Subramaniam.

[2] On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N). Wald, Havran.

[3] Cache Sensitive Representation for BSP Trees. Havran.

Friday, May 29, 2015

Transition to blogger

I used to post things on tumblr, but I didn't like the format of the site very much. Here are a few posts I think are worth preserving links to:

Higher-order programming in C

Tumblr no longer displays the inline gists from this post. Here's the implementation of apply, and here's a demo.

Questions for Interviewers (followup here)
Hacking an FPS for fun but not profit

Thursday, May 28, 2015

Tagging pointers in Rust

I've been learning some Rust after hearing about their 1.0 release. I'm very impressed by the compiler, the documentation, and the language itself.

The first thing I wrote in Rust was a routine to tag pointers. (If you haven't heard of pointer tagging, it's just a way of storing information within a pointer. Pointers tend to be at least 32-bit aligned, leaving the two least-significant bits of the pointer free for tags.) It works by using Rust's unsafe-cast mechanism (std::mem::transmute) to do a little bit flipping:

fn tag_pointer<T>(p: &T, bit: u8) -> &T {
    unsafe {
        let pod: usize = mem::transmute(p);
        let pod = pod | (1 << bit);
        mem::transmute(pod)
    }
}

fn untag_pointer<T>(p: &T, bit: u8) -> &T {
    unsafe {
        let pod: usize = mem::transmute(p);
        let pod = pod & (!(1 << bit) as usize);
        mem::transmute(pod)
    }
}

fn check_pointer_tag<T>(p: &T, bit: u8) -> u8 {
    unsafe {
        let pod: usize = mem::transmute(p);
        ((pod >> bit) & 1) as u8
    }
}

To check that this works, we can do:

let p = &0; // Take the address of an integer.
let pn = tag_pointer(p, 0);
assert_eq!(check_pointer_tag(p, 0), 0);
assert_eq!(check_pointer_tag(pn, 0), 1);
assert_eq!(p, untag_pointer(pn, 0));

N.b: it isn't safe to dereference a tagged pointer :-). Here is a safe, alternate version which implements Deref and DerefMut.

This code isn't particularly tricky or interesting, but maybe it'll be useful to someone.