If you are going down the voxel way, odds are at some point you will need to include polygonal assets in your workflow.
For me it happened when I integrated the output of the architecture L-System into the voxel world. Architecture is produced as a collection of polygonal volumes, I needed to translate those volumes into voxels.
As usual, it had to be fast. It also needed to be accurate, Sharp features in the original polygon mesh should appear in the voxelized version.
There are several approaches to voxelization. One is to use the GPU to render slices of the polygonal object and then construct the voxel information out of the slices. Another way to do it is to compute a 3D distance field, that is, the distance from each point in space to the closest polygon surface. In GPU Gems 3 there is a nice implementation of this method. And then there is the classical method, which is to shoot rays to the polygonal solid and construct the voxels from the intersections.
I chose to implement a flavor of the classical method. The clean intersection points it produced would help later when reconstructing the surface. I also suspected the ray-tracing nature of this approach would translate well into OpenCL or CUDA.
In particular my method is inspired on this one (PDF link). The article is very detailed, but if you want to skip reading it here is the basic idea:
It uses a regular grid. Each element in the grid is a voxel. One of the coordinate planes is used to build a Quad-tree. All the triangles in the polygonal mesh are projected into this plane and inserted in the Quad-tree. I stole an image from the article above that illustrates this phase:
In this image all the triangles are indexed in a Quad-tree aligned to the XZ plane (assuming Y goes up).
The next step is to shoot rays perpendicular to the Quad-tree plane. Each ray is intersected with all the triangles found in that cell of the Quad-tree. This is actually why a Quad-tree is used. It allows to test only the triangles that potentially intersect the ray.
Each ray may intersect several triangles along its path. For each voxel that is visited by the ray, the algorithm must find a way to tell whether it is inside or outside the solid. This is actually simple. If we assume the polygonal volume is closed, then it is possible to count how many intersections we passed before getting to the current voxel. If the number is odd, the voxel is inside the volume, if it is even, it is outside.
This other image, also from the same article, shows this principle:
And that's it. Once you have sent enough rays to cover the grid, the voxelized solid is there.
This method, however, could not produce the results I needed. It only tells you if a voxel is solid or not. I needed a 3D density field. Also I wanted it to preserve the sharp corners in the original mesh, which is something this method doesn't consider. And last but not least, this algorithm is not very fast. Ray-casting on CPU quickly becomes expensive as grid resolution goes up.
This is what I did:
First, instead of having just one Quad-tree, I built three of them. One for each coordinate plane. I realized I had to cast rays along the three main axis instead of just one as the original method does.
Why? The original method only cares about voxel occupancy. In my case I needed to record the normals at the intersection points. If rays are shot only in the Y direction, any polygons perpendicular to the plane XZ would never be intersected, their normals would be unknown. And it is not only that. For a sharp edge to be reconstructed, you need at least two intersections inside a voxel. To reconstruct a vertex, you need three rays.
The following screenshot shows how sharp features are reconstructed. This is a cube that is not aligned with any of the coordinate axis, still its sharp features are preserved.
The Quad-trees are built on the CPU as a pre-processing phase. It takes less than 1% of the entire cost of the voxelization, so it is something I won't move into the GPU soon.
Next, the OpenCL voxelization kernel runs. Each instance of the kernel processes a single ray. Before invoking the kernel I make sure all the rays are packaged in sequence, regardless of their direction.
The ray has access to the corresponding Quad-tree so it can iterate through the list of triangles in that cell of the Quad-tree. For each triangle it will test if there is an intersection. If that is the case, it writes the coordinates to an array.
Then comes a twist. The voxelization method I described before relies on counting how many intersections were found up to the current voxel. For this to work, the list of intersections needs to be sorted. Sorting the intersections in the CPU is no problem, but inside the OpenCL kernel it could get messy very quickly. How to get around this?
I realized that it was still possible to determine if a voxel was inside or outside without having to sort the intersections. The trick is to "flip" the occupancy state of all voxels preceding the current intersection point. If the count is odd, the voxel will flip an odd number of times and it will remain set at the end. If the counter is even, the voxel will finish empty.
You can see this in the following series of images. The intersection points are marked in red, and as you can see they arrive in any order:
I'm satisfied with the speed of this method. Anyway there are some optimizations I could do. If some restrictions are imposed on the polygonal solids, it will be possible to insert triangles already sorted in the Quad-tree. This means the OpenCL rasterization only will need to do one pass per ray. Right now each intersection requires its own pass.
But I'm still on the look for a faster voxelization method. If you know of something better please let me know.