The demo can be seen at: https://hectorarellanodev.github.io/WebGLVoxelizer/

Calculating the distance field of a mesh in real time can be done with the jump flood algorithm (more information in this post: https://blog.demofox.org/2016/02/29/fast-voronoi-diagrams-and-distance-field-textures-on-the-gpu-with-the-jump-flooding-algorithm/). The idea is to calculate the initial seeds for the jump flood passes by voxelizing the geometry, then use the jump flood to obtain a Voronoi diagram from those seeds, and finally run a distance pass that evaluates the closest distance from each texel to its corresponding seed.
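To make the pipeline concrete, here is a minimal 2D sketch of the jump flood idea in plain JavaScript. The real algorithm runs the same per-cell logic in fragment shaders over a texture; the grid size, seed positions, and names here are made up for illustration:

```javascript
// Minimal 2D jump flood sketch: each cell stores the coordinates of the
// closest seed found so far (null when no seed has reached it yet).
const N = 8; // grid resolution (power of two)
const seeds = [[1, 1], [6, 5]]; // initial seeds, e.g. produced by voxelization

let grid = Array.from({ length: N * N }, () => null);
for (const [x, y] of seeds) grid[y * N + x] = [x, y];

const dist2 = (ax, ay, [bx, by]) => (ax - bx) ** 2 + (ay - by) ** 2;

// Jump flood passes with step sizes N/2, N/4, ..., 1.
for (let step = N >> 1; step >= 1; step >>= 1) {
  const next = grid.slice();
  for (let y = 0; y < N; y++) {
    for (let x = 0; x < N; x++) {
      for (const dy of [-step, 0, step]) {
        for (const dx of [-step, 0, step]) {
          const nx = x + dx, ny = y + dy;
          if (nx < 0 || ny < 0 || nx >= N || ny >= N) continue;
          const candidate = grid[ny * N + nx];
          if (!candidate) continue;
          const current = next[y * N + x];
          if (!current || dist2(x, y, candidate) < dist2(x, y, current)) {
            next[y * N + x] = candidate; // closer seed found, keep it
          }
        }
      }
    }
  }
  grid = next;
}

// Final distance pass: convert the Voronoi diagram into a distance field.
const distanceField = grid.map((seed, i) =>
  Math.sqrt(dist2(i % N, Math.floor(i / N), seed)));
```

The interesting property is that log2(N) passes are enough to flood the whole grid, which is what makes the technique practical per-frame on the GPU.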

This is straightforward with compute shaders, but sadly WebGL only provides a rasterisation pipeline, which makes things a little harder. The good part is that the voxelization is the only problematic step of the previous algorithm.

Voxelization can be done in a conservative mode using the TressFX voxelization shader found at https://github.com/Unity-Technologies/com.unity.demoteam.mesh-to-sdf (thanks Douglas Lilliequist for guiding me to this repo). The process uses a compute shader that evaluates the bounding box of each triangle of the mesh, finds the voxels inside that bounding box, and calculates the distance between those voxels and the corresponding triangle. Finally the distances are saved inside a buffer using atomics to keep the minimum distance in each voxel.

The process works like a charm for small triangles, meaning that it provides very nice results for dense meshes, but its bottleneck shows up when it handles triangles that are too big relative to the voxelization space. The shader uses a three-dimensional loop that can become very slow when there are too many voxels inside the bounding box of a triangle.

*Image 1. In 2D the small triangle requires few voxels to be evaluated, while the big triangle requires more voxels to be checked.*

The previous image shows the bottleneck of the algorithm: to cover all the voxels of the small triangle in image 1 the voxelizer requires a two-dimensional loop of 6×4 steps, which means the kernel will evaluate up to 24 voxels; for the big triangle in image 1 the kernel will run a two-dimensional loop of 14×11 steps, meaning the same shader will need to check distances for 154 voxels.
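The cost comparison amounts to a simple product of the bounding-box extents measured in voxels. A tiny sketch, using the 2D numbers from image 1 (the 3D depth of 9 is an assumed value just to show how a third axis multiplies the work):

```javascript
// Cost of the per-triangle loop: the kernel visits every voxel inside the
// triangle's bounding box, so the work grows with the product of its extents.
const voxelsToCheck = (extents) => extents.reduce((a, b) => a * b, 1);

const smallTriangle2D = voxelsToCheck([6, 4]);    // 24 voxels
const bigTriangle2D = voxelsToCheck([14, 11]);    // 154 voxels
// In 3D a third factor makes large triangles dramatically more expensive:
const bigTriangle3D = voxelsToCheck([14, 11, 9]); // 1386 voxels
```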

In three dimensions the loops become much more expensive, since an additional direction multiplies the amount of voxels. Since the shader works really fast with small triangles it is better to have a mesh made of many small triangles than one made of a few big triangles; this keeps the multi-dimensional loop in check for each triangle evaluated.

The TressFX shader provided a very nice starting route to implement the voxelization, but there were some caveats that needed to be solved first:

– There are no compute shaders in WebGL.

– There are no storage buffers nor atomic operations in WebGL.

– There is no way to save information to more than one place in memory in the rasterisation pipeline; you can only update either a texel/fragment, or an attribute memory position with a transform feedback pass in WebGL. This issue can be seen in the following image.

*Image 2. A single triangle requires updating multiple fragments when it is voxelized.*

The first issue was not much of a problem; there are a lot of simulations using GPGPU in WebGL which show that it is possible to emulate compute shaders under certain conditions.

The atomics issue can be solved using the depth test: the idea is to use the distances as depth values, making the rasteriser update a fragment only when the distance found between a voxel and a triangle is smaller than the previously saved distance. This way the algorithm updates the texels with the smallest distances.
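A small CPU-side simulation of the trick, assuming the depth buffer is cleared to the far plane and the depth function is "less than" (in actual WebGL this is just `gl.enable(gl.DEPTH_TEST)` with `gl.depthFunc(gl.LESS)`; the function name here is made up for illustration):

```javascript
// Simulation of using the depth test as a poor man's atomicMin: a fragment
// only overwrites the stored value when its depth (here, the distance from
// a voxel to a triangle) is smaller than what is already stored.
function depthTestMin(depthBuffer, texelIndex, distance) {
  if (distance < depthBuffer[texelIndex]) {
    depthBuffer[texelIndex] = distance; // fragment passes the depth test
    return true;
  }
  return false; // fragment discarded, the previous smaller distance survives
}

// One texel receiving distances from three different triangles, in any order:
const depthBuffer = new Float32Array([1.0]); // cleared to the far plane
depthTestMin(depthBuffer, 0, 0.5);
depthTestMin(depthBuffer, 0, 0.8); // rejected, 0.8 > 0.5
depthTestMin(depthBuffer, 0, 0.2); // accepted, the minimum wins
```

Whatever order the fragments arrive in, the texel ends up holding the minimum distance, which is exactly what the compute-shader atomics were providing.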

The third issue is where the real problems arise, since the compute shader updates as many positions in the storage buffers as there are voxels inside the bounding box of the triangle. GPGPU techniques are not able to write different texels in the same draw call, and making multiple passes is out of the question since there could be big triangles that would require updating thousands of voxels. This issue put things on hold for a while.

**Some Months Later…**

…I had a conversation with Edan Kwan where he commented that voxelizing particles into a grid is actually straightforward: you only need to read the position of the particle and update the 3D grid, which turned out to be really useful to implement voxel cone tracing effects over particles.

The conversation got me thinking… what if we could treat triangles as particles? Meaning, what if we could provide the triangle information in a way that the triangles are so small that they fit inside the voxels? That would mean that the bounding box of each triangle would be no bigger than the voxel containing it.

The idea, absurd as it sounds, would solve the third caveat: since each triangle would be as small as a voxel, the algorithm would only need to update/draw the information of a single voxel, the one that aligns with the triangle's bounding box.

*Image 3. Splitting a triangle to make additional triangles as small as the voxels containing them.*

Then I remembered another conversation I had with Vicente Lucendo, where we discussed the idea of splitting triangles on the GPU. The concept is simple: take an input triangle, evaluate the middle point of each side to generate three new vertices, and then return four triangles built from the original vertices and the generated ones.

*Image 4. Four triangles are generated from the midpoints of the sides of ABC.*
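The midpoint split from image 4 can be sketched in a few lines of JavaScript (function names are illustrative; vertices are `[x, y, z]` arrays):

```javascript
// Split one triangle into four using the midpoints of its sides, as in
// image 4: vertices A, B, C produce midpoints AB, BC and CA, giving the
// triangles (A, AB, CA), (AB, B, BC), (CA, BC, C) and (AB, BC, CA).
const midpoint = (p, q) => p.map((v, i) => (v + q[i]) / 2);

function splitTriangle([a, b, c]) {
  const ab = midpoint(a, b);
  const bc = midpoint(b, c);
  const ca = midpoint(c, a);
  return [
    [a, ab, ca],
    [ab, b, bc],
    [ca, bc, c],
    [ab, bc, ca], // the inner triangle
  ];
}
```

Each split halves every edge length, so a few iterations are enough to shrink any triangle down to the size of a voxel.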

Suddenly the idea of making the triangles smaller, to the point where they would be about the size of the voxels, was making more sense to me. That being said, there were some new issues that needed to be addressed: memory and performance. The algorithm takes a mesh, splits the triangles until they are small enough to fit inside a voxel, and then runs the voxelization over those triangles. But how many triangles would be generated? Wouldn't that consume too much performance or memory on the GPU?

Well… it turns out that GPUs have a lot of processing power and memory these days; the main constraint of this algorithm turns out to be the graphics API. GPUs can currently render scenes with hundreds of thousands of triangles on a mobile device, so it would not be a hardware issue. It is counter-intuitive to generate more triangles in order to voxelize, but at least it allows using the rasteriser in WebGL.

The remaining issue arises from the fact that even if a triangle is as small as a voxel, it will not always be inside a single voxel. The triangle could have one of its vertices inside another voxel, and since the algorithm requires conservative rasterisation it has to update more than one voxel, bringing back the issue of multiple voxel updates.

At least each triangle only affects the neighbourhood of the voxel containing most of it, which means that if triangles are as small as voxels, each one affects only the adjacent voxels in the worst case. This means that we could either run the voxelization in multiple passes to cover all the adjacent voxels of each triangle or, better yet, use instancing to solve the neighbourhood problem.

*Image 5. One triangle as small as a voxel can cover adjacent voxels (the ones in yellow); this can be solved by traversing the neighbourhood using instances of the point used to voxelize.*

The voxelization also requires scattering the data over the 3D texture, which means that the voxelization needs to be done inside the vertex shader. Knowing that the algorithm has to run in the vertex shader and that instancing solves the problem of the adjacent voxels, it is a matter of voxelizing with the following steps:

– The draw call launches a geometry made of points, sending as many points as there are triangles obtained after splitting the original mesh.

– Each point invokes the vertex shader, which reads the three vertices of the triangle to evaluate. gl_VertexID can be used to fetch the position of each vertex of the triangle from a texture.

– The three vertices are used to calculate the average position of the triangle, which defines the center voxel where the distance is evaluated. That would be the green voxel from image 5.

– The adjacent voxels (the blue ones in image 5) are obtained by checking the neighbourhood around the center voxel. This is done using instances, identified with gl_InstanceID: in 3D one voxel is surrounded by 26 adjacent voxels, so the draw call is made with 27 instances, where each instance represents one voxel of the neighbourhood. Each instance evaluates its offset from the center voxel and calculates the corresponding distance to the evaluated triangle.

– WebGL does not allow updating a 3D texture directly, so the algorithm updates a 2D texture that simulates a 3D texture, saving the data using the rasteriser; this also lets the depth test provide the same result as the atomics in the compute shader. Finally, the 2D texture position is calculated from the 3D position of each voxel evaluated.
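The index math behind these steps can be mirrored in JavaScript (the actual GLSL lives in vs-voxelizer.js; the 64³ grid, the 8×8 slice atlas, and the function names here are assumptions for illustration):

```javascript
const GRID = 64;   // voxels per axis (assumed resolution)
const SLICES = 8;  // 64 depth slices arranged in an 8x8 atlas of 2D tiles

// Center voxel: average the triangle vertices and quantize to the grid,
// assuming positions are already normalized to [0, 1).
function centerVoxel([a, b, c]) {
  return [0, 1, 2].map((i) =>
    Math.min(GRID - 1, Math.floor(((a[i] + b[i] + c[i]) / 3) * GRID)));
}

// gl_InstanceID in [0, 27) -> offset in {-1, 0, 1}^3, one instance per
// voxel of the 3x3x3 neighbourhood around the center voxel.
function instanceOffset(instanceId) {
  return [
    (instanceId % 3) - 1,
    (Math.floor(instanceId / 3) % 3) - 1,
    Math.floor(instanceId / 9) - 1,
  ];
}

// 3D voxel -> texel inside the 2D texture that simulates the 3D grid:
// depth slice z is placed inside an 8x8 atlas of GRIDxGRID tiles.
function voxelToTexel([x, y, z]) {
  return [(z % SLICES) * GRID + x, Math.floor(z / SLICES) * GRID + y];
}
```

Instance 13 is the center of the 3×3×3 neighbourhood (offset `[0, 0, 0]`), while instances 0 and 26 map to the two opposite corners.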

**Splitting Triangles**

The remaining part of the voxelization algorithm is how to split the triangles. The first approach would be to use the GPU, since all the information is already allocated in a texture, but WebGL does not provide geometry shaders; instead, the splitting pass could preallocate memory by predicting the number of passes required to bring all the triangles under the desired threshold.

The issue is that not all the triangles inside a mesh have the same size; some triangles can be bigger, much bigger than others, which would make the splitting consume too much memory. It would also generate too many triangles in some scenarios (when a mesh mixes big and small triangles).

The best solution is to split the triangles on the CPU, creating new triangles from each previous one until a certain threshold is met. If the resulting triangles are below the threshold they don't need to be split again and can be left as such in the buffer. This provides a buffer of organised triangles with no unused memory in between them.
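A recursive sketch of that CPU-side splitting, hedged: the real splitter lives in triangleSplitter.js and runs inside workers, while here the threshold check (longest edge against roughly the voxel size) and the function names are assumptions:

```javascript
// Recursively subdivide until every triangle's longest edge is below the
// threshold, appending finished triangles to a flat list with no gaps.
const edge2 = (p, q) => p.reduce((s, v, i) => s + (v - q[i]) ** 2, 0);
const mid = (p, q) => p.map((v, i) => (v + q[i]) / 2);

function splitUntil(tri, threshold, out = []) {
  const [a, b, c] = tri;
  const longest = Math.sqrt(Math.max(edge2(a, b), edge2(b, c), edge2(c, a)));
  if (longest <= threshold) {
    out.push(tri); // small enough: keep as-is, no further split needed
    return out;
  }
  const ab = mid(a, b), bc = mid(b, c), ca = mid(c, a);
  for (const t of [[a, ab, ca], [ab, b, bc], [ca, bc, c], [ab, bc, ca]]) {
    splitUntil(t, threshold, out);
  }
  return out;
}
```

Since each level of recursion halves the edge lengths, the triangle count grows by 4× per level, which is exactly why oversized triangles need special handling.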

Workers are used to split the triangles, and the triangle splitter performs one further optimisation to avoid generating too many triangles: an initial pass separates out the triangles that exceed a second threshold, which defines what the algorithm considers “big triangles”.

**Big Triangles**

Big triangles are triangles that, if split, would generate too many small triangles to satisfy the voxel-size threshold condition. This can happen when trying to voxelize meshes like planes made of two triangles (floors, for example). The splitting would generate thousands of small triangles where the original information requires evaluating just the two original triangles.

Big triangles are treated differently in the voxelization algorithm: they are separated into a different buffer once they are found, and instead of being split they are evaluated in a different shader.

The new pass runs in a fragment shader, covering the 3D grid texture as a whole (remember it is simulated with a 2D texture so the rasteriser can be used) and evaluating, for every voxel rasterised as a fragment, the distance to each of the big triangles. When voxelizing a plane made of two triangles, the algorithm evaluates the triangle distance twice per voxel and saves the minimum distance of the triangles passed as uniforms. The shader can receive up to 64 big triangles.
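The structure of that gather pass can be sketched in JavaScript. Note the hedges: `distanceToTriangle` below is a crude placeholder (distance to the closest vertex) while the real shader in BigTriangles.js evaluates the exact point-triangle distance, and the function names and buffers are illustrative:

```javascript
// Sketch of the big-triangles pass: instead of scattering from the vertex
// shader, every voxel (fragment) gathers the minimum distance against each
// big triangle passed as a uniform, up to a fixed maximum of 64.
const MAX_BIG_TRIANGLES = 64;

// Placeholder metric: distance to the nearest vertex of the triangle.
const distanceToTriangle = (p, tri) =>
  Math.min(...tri.map((v) =>
    Math.hypot(p[0] - v[0], p[1] - v[1], p[2] - v[2])));

function bigTrianglesPass(voxelCenters, bigTriangles, depthBuffer) {
  const tris = bigTriangles.slice(0, MAX_BIG_TRIANGLES);
  voxelCenters.forEach((p, i) => {
    for (const tri of tris) {
      // The depth test keeps the minimum between this pass and the
      // result of the original voxelization already in the buffer.
      depthBuffer[i] = Math.min(depthBuffer[i], distanceToTriangle(p, tri));
    }
  });
  return depthBuffer;
}
```

The cost is voxels × big triangles, which stays cheap because the big-triangle count is capped, while the split-triangle pass handles everything else.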

With this final update the whole process separates the voxelization into different passes:

– Triangle splitting of the original mesh to satisfy voxel size condition.

– Separate the big triangles from the original triangles into a different buffer.

– Evaluate the voxelization of the split triangles using the voxelization shader in the vertex shader.

– Evaluate the distance field of all the big triangles using a fragment shader, updating the 2D texture resulting from the previous pass; the depth test is used to keep, for each voxel, the minimum between the original voxelization distance and the distance from the big-triangles pass.

The best part is that the whole algorithm only needs to split the triangles once, saving them into textures which can then be re-voxelized in world space by passing the model matrix of the mesh; this is useful for techniques like voxel cone tracing that require voxelization in real time.

The code for the demo can be found in the following URL:

The code for the triangle splitter can be found inside the triangleSplitter.js file, the voxelization can be read inside the vs-voxelizer.js file, and the big-triangles shader can be found inside the BigTriangles.js file.