### GPU Marching Cubes from Point Clouds in WebGL (part 2: marching cubes steps)

15, Jan 2017/Categories 3D, Sin categoría, webGL/No Comments

The author assumes that the reader understands the implementation of the marching cubes algorithm in the CPU, also has knowledge on how to perform a stream compaction over a texture using histopyramids. Please refer to http://paulbourke.net/geometry/polygonise/ and http://www.miaumiau.cat/2016/10/stream-compaction-in-webgl/ for more information about these subjects.

Marching Cubes Steps:

The marching cubes algorithm can be separated in two steps which are the active voxels allocation and the vertices/normals generation. The first step uses a iso level value that defines a surface from the potential, where the active voxels are the ones that are intersected by the corresponding generated surface. To check the intersections all the edges from each voxel are evaluated.

An intersection is obtained if the potential value from one of the corners of the edge is below than the iso level and the other corner’s potential value is above. The different intersections represent one of the 256 possible combinations defined in the marching cubes implementation. These combinations, expressed in tables, help to know the total amount of vertices to generate, and the edges where these vertices would be allocated.

The second step is responsible to generate the vertices positions and normals in the active voxels, this is done taking into account the quantity of vertices/triangles required for the case obtained in the previous step. This pass allocates the edge where the the vertex will be placed, and sets the vertex position with a linear interpolation between the two corners from the edge using the iso level as an interpolator.

To apply the previous two steps explained it’s important to translate the different tables in the implementation from Paul Bourke into two textures.

The first texture saves the 15 possible vertices to generate for each of the 256 combinations presented in the marching cubes, the data can be obtained from the “triTable” found here. This two dimensional data table is expanded in a one dimensional array containing 3840 values (notice that 16th value from each dimension is always -1, so this last one is always discarded), and then allocated inside a 64×64 2D floating point texture. Each fragment saves the corresponding edge where the vertex would be generated in the active voxel where values with -1 represent that no vertices need to be created for that edge. The resulting texture is shown below.

The previous texture is a color encoded visualization from the original floating point texture, each square represents a position where the color defines the edge where the vertex will be allocated. Notice that every 15 squares a new combination is defined, and for each combination the colored squares represent the vertices to generate, and the black ones the “-1” values meaning that there’s no more data needed for that combination.

The second texture to create is also based from the “triTable” used before, but with a different interpretation, the idea is to save the amount to triangles to generate on each one of the cases, these 256 values are saved inside a 16×16 2D floating point texture where all the channels contain the same value; the amount of triangles to generate for each case can be obtained counting the values that are different from -1 and dividing the result by three. The following texture is shows the visual representation of the data.

Active Voxels Evaluation:

With the two previous textures it’s time to evaluate the active voxels, to do so the 3D expanded texture is used to check a condition where each corner of the voxels is tested to evaluate if the corresponding value is below the iso level required. The implementation from Paul Bourke uses bit masking to define which case is used based on a 12 bit number; in the current implementation, a floating point sum is used to represent the case that defines the triangles to generate since GLSL doesn’t allow bit manipulation. The process is done drawing a quad where the fragment shader evaluates all the voxels defined in the 2D texture to evaluate, doing so the whole 3D voxel space is evaluated at once. The following code executes the active voxels search.

precision mediump float;
precision mediump sampler2D;
//Texture container the potential values in the corners of the voxels.
uniform sampler2D uCornersTexture;
//Texture holding the numbers of triangles to generate on each of the 256 cases from the MC
uniform sampler2D uTrianglesIndex;
//Iso level used to define the surface required from the potential.
uniform float uRange;
varying vec2 vText;
//Constants used to simulate the 3D texture in a 2D texture
const vec4 C_3D = vec4(1. / 4096., 256., 16., 1. / 16.);
//Function used to evaluate the 2D index from a 3D position.
vec2 index2D(vec3 pos) {

return C_3D.x * (pos.xz + C_3D.y * vec2(mod(pos.y, C_3D.z), floor(pos.y * C_3D.w)) + vec2(0.5));

}
void main(void) {

//Obtain the 3D voxel position of the corresponding fragment to evaluate.
vec2 position = floor(vText / C_3D.x);
vec3 pos3D = vec3(mod(position.y, C_3D.y), C_3D.z * floor(position.y / C_3D.y) + floor(position.x / C_3D.y), mod(position.x, C_3D.y));
//The MC case to use in the voxel evaluated is calculated as the sum of corners that are below the iso level required.
float c = step(texture2D(uCornersTexture, index2D(pos3D)).r, uRange)
+ 2. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(1., 0., 0.))).r, uRange)
+ 4. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(1., 1., 0.))).r, uRange)
+ 8. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(0., 1., 0.))).r, uRange)
+ 16. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(0., 0., 1.))).r, uRange)
+ 32. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(1., 0., 1.))).r, uRange)
+ 64. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(1., 1., 1.))).r, uRange)
+ 128. * step( texture2D(uCornersTexture, index2D(pos3D + vec3(0., 1., 1.))).r, uRange);
c *= step(c, 254.);
//The total triangles to generate are obtained knowing which one of the 256 cases is required and reading the
//amount triangles from the 16×16 texture provided.
float totalTrianglesToGenerate = texture2D(uTrianglesIndex, vec2(mod(c, 16.), floor(c / 16.)) / 16.).r;
//The resulting fragment saves the amount of triangles to generate and the MC case obtained.
gl_FragColor = vec4(vec3(totalTrianglesToGenerate * 3.), c);

}

The previous shader works over all the fragments generated on the quad, where its UV position is used as a key to obtain the 3D position of the voxel simulated in the texture 2D, this is done using the data from the C_3D constants previously explained.

The sampler uTrianglesIndex is the second texture commented at the beginning of the post. Notice that the indexing is based on the actual marching cubes case obtained from the corresponding voxel/fragment. The variable “uCornersTexture” represents the values of each voxel’s corners potential saved in a texture3D, this is generated from the point cloud animation and it’s evaluated against a iso level defined with the uniform named uRange.

The shader starts finding the 3D position for each fragment, then it evaluates the corresponding corners and checks if the value is below the required iso level (uRange), if the test passes the case variable is modified to finally define which case from the 256 combinations is the one to implement, this is saved in the “c” variable. Also notice that since the case 0 and 255 represent the case where all the vertices are below or over the iso level no triangles will be generated, hence the C value is defined as zero.

Once the marching cubes case is defined the uTrianglesIndex sampler is used to obtain the total amount of vertices to generate for the corresponding voxel. Finally the output of the shader is generated encoding the amount of triangles in the RGB channel and the marching cubes case in the alpha channel. The output of this program is a texture3D which has the active voxels scattered on a 2D texture, this means that a stream compaction pass should be applied to the output of this shader to calculate the corresponding vertices only in the active cells. The result of this step can be seen in the following image.

The same shader can me modified to write a second debugging texture that shows the amount of triangles allocated per voxel, this would give a visual representation of how things are working in the first step. The idea is to use multiple render targets and rewrite the output the shader as follows:

//The resulting fragment saves the amount of triangles to generate and the MC case obtained.
gl_FragData[0] = vec4(vec3(totalTrianglesToGenerate * 3.), c);

//Color implementation to check the amount of triangles required inside each voxel
gl_FragData[1] = vec4(0.);
if(totalTrianglesToGenerate == 1.) gl_FragData[1] = vec4(0., 0., 1., 1.);
if(totalTrianglesToGenerate == 2.) gl_FragData[1] = vec4(0., 1., 0., 1.);
if(totalTrianglesToGenerate == 3.) gl_FragData[1] = vec4(1., 0., 0., 1.);
if(totalTrianglesToGenerate == 4.) gl_FragData[1] = vec4(0., 1., 1., 1.);
if(totalTrianglesToGenerate == 5.) gl_FragData[1] = vec4(1., 0., 1., 1.);

The second output represent a color visualization of the amount of triangle to generate, color convention is defined below:

– blue: one triangle.
– green: two triangles.
– red: three triangles.
– blue + green (aquamarine): four triangles.
– blue + red (purple): five triangles.

The image below shows the visual representation obtained from the active cells step, notice that most of the cells require two triangles per voxel (green color), meaning that the algorithm would be calculating a median of six vertices per active voxel.

Stream Compaction and vertices generation:

Once the active voxels are allocated and each case is defined for them, a stream compaction method should be applied to the resulting texture avoiding the generation of triangles over non active voxels. To do so the current implementation uses the expansion ability of the histopyramids allowing to allocate offsets of one particular value in the output stream during the compaction process.

The compaction process is partially based on this post. In the implementation explained in the previous article one initial base texture is generated from the data texture to compress. This base texture is based on a binary image where each position represent where the data is allocated. Inserting the binary texture in the reduction process the algorithm returns the amount of data (values) in the texture.

Instead of using a binary texture, the base texture would be the result from the active voxels step, where each “active “ fragment represents the amount of vertices required for each active voxel. Using this texture in the reduction process allows to know the total vertices required to generate.

Once the reduction process obtains the total amount of vertices, the compaction reorganize the texture from the active voxels step, generating contiguous offsets of the same voxel information as many times as vertices are defined for each voxel. The following image shows the difference between a compaction of a binary texture and the texture obtained from the active voxels step.

Since the compaction process also returns a unique key for each offset, the result represents the memory required to allocate the amount of vertices to generate, arranged by voxels and indexed by local voxels offsets. This means that the same shader can be used the generate the vertices and normals using the marching cubes algorithm.

The new compaction/expansion and vertices/normals shader is shown below.

#extension GL_EXT_draw_buffers : require
precision highp float;
precision highp sampler2D;
//Pyramid texture containing all the reduction steps.
uniform sampler2D uPyramid;
//Active voxels texture.
uniform sampler2D uBase;
//Potential corners values.
uniform sampler2D uPot;
//triTable data from Paul Bourke 64×64 texture
uniform sampler2D uTrianglesIndexes;
//level 0 of the pyramid.
uniform sampler2D uTotal;
//Uniform used to defined normals quality
uniform bool uFastNormals;
//iso level used to define the surface from the potential
uniform float uRange;
varying vec2 vText;
const float EPSILON = 0.00001;
//Inverse of the size from the 3D texture 256^3
const float p = 1. / 256.;
//Corners for any given voxel.
const vec3 p0 = vec3(p, 0., 0.);
const vec3 p1 = vec3(p, p, 0.);
const vec3 p2 = vec3(0., p, 0.);
const vec3 p3 = vec3(0., 0., p);
const vec3 p4 = vec3(p, 0., p);
const vec3 p5 = vec3(p, p, p);
const vec3 p6 = vec3(0., p, p);
//Constants used to simulate the 3D texture
const vec4 C3D = vec4(4096., 256., 16., 1. / 16.);
//Calculate the 2D index to read the data in the 2D texture based in a 3D position
vec2 index2D(vec3 pos) {

return (pos.xz + C3D.y * vec2(mod(pos.y, C3D.z), floor(pos.y / C3D.z)) + vec2(0.5)) / C3D.x;

}
void main(void) {

//Evaluate the 1D index of the fragment evaluated.
float vI = dot(floor(C3D.x * vText), vec2(1., C3D.x));
//If the fragment’s key is higher than the total of vertices needed to create the execution is halted.
else {

/*
*This is the compaction process, it’s explained in http://www.miaumiau.cat/2016/10/stream-compaction-in-webgl/
*/
float offset = C3D.x – 2.;
float k = 1. / C3D.x;
vec2 relativePosition = k * vec2(offset, 0.);
vec4 partialSums = texture2D(uPyramid, relativePosition);
float start = 0.;
vec4 starts = vec4(0.);
vec4 ends = vec4(0.);
float div0 = 1. / C3D.y;
float diff = 2.;
vec4 m = vec4(0.);
vec2 position = vec2(0.);
vec4 vI4 = vec4(vI);
//12 steps are required to parse the different levels of the pyramid.
for(int i = 1; i < 12; i++) {
offset -= diff;
diff *= 2.;
relativePosition = position + k * vec2(offset, 0.);
ends = partialSums.wzyx + vec4(start);
starts = vec4(start, ends.xyz);
m = vec4(greaterThanEqual(vI4, starts)) * vec4(lessThan(vI4, ends));
relativePosition += m.y * vec2(k, 0.) + m.z * vec2(0., k) + m.w * vec2(k, k);
start = dot(m, starts);
position = 2. * (relativePosition – k * vec2(offset, 0.));
partialSums = texture2D(uPyramid, relativePosition);

}
ends = partialSums.wzyx + vec4(start);
starts = vec4(start, ends.xyz);
m = vec4(greaterThanEqual(vI4, starts)) * vec4(lessThan(vI4, ends));
position += m.y * vec2(k, 0.) + m.z * vec2(0., k) + m.w * vec2(k, k);

/*
* MARCHING CUBES TO GENERATE THE VERTICES
* POSITIONS AND NORMALS
*/
//This data contains the 2D position of the voxel reallocated, the index offset for the vertex to generate in the corresponding voxel
//and the MC case used for that voxel.
vec4 data = vec4(position * C3D.x, vI – dot(m, starts), texture2D(uBase, position).a);
//Up to 15 vertices can be generated per voxel, the current vertex to generate is saved in this variable
float currentVertex = data.b;
//Calculate the 3D position of the voxel based on the 2D position in the scattered data.
data.xyz = p * vec3(mod(data.y, C3D.y), C3D.z * floor(data.y * p) + floor(data.x * p), mod(data.x, C3D.y));
//Obtain the one dimensional index to read the corresponding edge to use.
float mcIndex = 15. * data.a + currentVertex;
//Obtain the edge to use from the voxel using the previous index and reading the data from the triTable texture.
vec4 mcData = texture2D(uTrianglesIndexes, vec2(mod(mcIndex, 64.), floor(mcIndex / 64.)) / 64.);
mcIndex = mcData.r;
//To obtain the two points that define the edge the original implementation uses a set of if conditionals.
//The shader makes a sum of all the corners using masks to discard the values that are not needed.
vec4 m0 = vec4(mcIndex);
//Check if values are in the edge
vec4 m1 = vec4(equal(m0, vec4(0., 1., 2., 3.)));
vec4 m2 = vec4(equal(m0, vec4(4., 5., 6., 7.)));
vec4 m3 = vec4(equal(m0, vec4(8., 9., 10., 11.)));
//The two points are the voxel position plus the point active using the mask calculated before.
vec3 b0 = data.rgb + m1.y * p0 + m1.z * p1 + m1.w * p2 + m2.x * p3 + m2.y * p4 + m2.z * p5 + m2.w * p6 + m3.y * p0 + m3.z * p1 + m3.w * p2;
vec3 b1 = data.rgb + m1.x * p0 + m1.y * p1 + m1.z * p2 + m2.x * p4 + m2.y * p5 + m2.z * p6 + m2.w * p3 + m3.x * p3 + m3.y * p4 + m3.z * p5 + m3.w * p6;
//Potential values in the corresponding corners
vec4 n0 = texture2D(uPot, index2D(C3D.y * b0));
vec4 n1 = texture2D(uPot, index2D(C3D.y * b1));
vec2 diff1 = vec2(uRange – n0.a, n1.a – n0.a);
//Value used to evaluate the linear interpolation between the two corners points to define the position of the vertex to generate.
vec3 mult = vec3(lessThan(abs(vec3(diff1.x, uRange – n1.a, -diff1.y)), vec3(0.)));
vec3 normalA = vec3(0.);
vec3 normalB = vec3(0.);
/*
*or applying a 3D Sobel kernel.
*/
if(uFastNormals) {

vec2 deltaX = index2D(C3D.y * (b0 + vec3(p, 0., 0.)));
vec2 deltaY = index2D(C3D.y * (b0 + vec3(0., p, 0.)));
vec2 deltaZ = index2D(C3D.y * (b0 + vec3(0., 0., p)));
normalA = normalize(-vec3(n0.a – texture2D(uPot, deltaX).a, n0.a – texture2D(uPot, deltaY).a, n0.a – texture2D(uPot, deltaZ).a));

deltaX = index2D(C3D.y * (b1 + vec3(p, 0., 0.)));
deltaY = index2D(C3D.y * (b1 + vec3(0., p, 0.)));
deltaZ = index2D(C3D.y * (b1 + vec3(0., 0., p)));
normalB = normalize(-vec3(n1.a – texture2D(uPot, deltaX).a, n1.a – texture2D(uPot, deltaY).a, n1.a – texture2D(uPot, deltaZ).a));

} else {

//If more smooth gradients are required, a higher order Sobel operator is used to calculate them.
//this gives a more smoothed surface at the expense of less performance.
float op = 1.;
float scaler = 1.;
vec3 S1A_X = vec3(1., op, 1.);
vec3 S2A_X = vec3(op, scaler * op, op);
vec3 S3A_X = vec3(1., op, 1.);
vec3 S1B_X = vec3(0.);
vec3 S2B_X = vec3(0.);
vec3 S3B_X = vec3(0.);
vec3 S1C_X = vec3(-1., -op, -1.);
vec3 S2C_X = vec3(-op, -scaler * op, -op);
vec3 S3C_X = vec3(-1., -op, -1.);
vec3 S1A_Y = vec3(1., op, 1.);
vec3 S2A_Y = vec3(0., 0., 0.);
vec3 S3A_Y = vec3(-1., -op, -1.);
vec3 S1B_Y = vec3(op, scaler * op, op);
vec3 S2B_Y = vec3(0., 0., 0.);
vec3 S3B_Y = vec3(-op, -scaler * op, -op);
vec3 S1A_Z = vec3(-1., 0., 1.);
vec3 S2A_Z = vec3(-op, 0., op);
vec3 S3A_Z = vec3(-1., 0., 1.);
vec3 S1B_Z = vec3(-op, 0., op);
vec3 S2B_Z = vec3(-scaler * op, 0., scaler * op);
vec3 S3B_Z = vec3(-op, 0., op);
vec3 f1A = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, p, p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., p, p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, p, p)))).a);
vec3 f2A = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, 0., p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., 0., p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, 0., p)))).a);
vec3 f3A = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, -p, p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., -p, p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, -p, p)))).a);
vec3 f1B = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, p, 0.)))).a);
vec3 f2B = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, 0., 0.)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., 0., 0.)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, 0., 0.)))).a);
vec3 f3B = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, -p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., -p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, -p, 0.)))).a);
vec3 f1C = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, p, -p)))).a);
vec3 f2C = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, 0., -p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., 0., -p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, 0., -p)))).a);
vec3 f3C = vec3(texture2D(uPot, index2D(C3D.y * (b0 + vec3(-p, -p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(0., -p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b0 + vec3(p, -p, -p)))).a);
float xGradient = dot(f1A, S1A_X) + dot(f2A, S2A_X) + dot(f3A, S3A_X) + dot(f1B, S1B_X) + dot(f2B, S2B_X) + dot(f3B, S3B_X) + dot(f1C, S1C_X) + dot(f2C, S2C_X) + dot(f3C, S3C_X);
float yGradient = dot(f1A, S1A_Y) + dot(f2A, S2A_Y) + dot(f3A, S3A_Y) + dot(f1B, S1B_Y) + dot(f2B, S2B_Y) + dot(f3B, S3B_Y) + dot(f1C, S1A_Y) + dot(f2C, S2A_Y) + dot(f3C, S3A_Y);
float zGradient = dot(f1A, S1A_Z) + dot(f2A, S2A_Z) + dot(f3A, S3A_Z) + dot(f1B, S1B_Z) + dot(f2B, S2B_Z) + dot(f3B, S3B_Z) + dot(f1C, S1A_Z) + dot(f2C, S2A_Z) + dot(f3C, S3A_Z);
f1A = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, p, p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., p, p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, p, p)))).a);
f2A = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, 0., p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., 0., p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, 0., p)))).a);
f3A = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, -p, p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., -p, p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, -p, p)))).a);
f1B = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, p, 0.)))).a);
f2B = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, 0., 0.)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., 0., 0.)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, 0., 0.)))).a);
f3B = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, -p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., -p, 0.)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, -p, 0.)))).a);
f1C = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, p, -p)))).a);
f2C = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, 0., -p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., 0., -p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, 0., -p)))).a);
f3C = vec3(texture2D(uPot, index2D(C3D.y * (b1 + vec3(-p, -p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(0., -p, -p)))).a, texture2D(uPot, index2D(C3D.y * (b1 + vec3(p, -p, -p)))).a);
xGradient = dot(f1A, S1A_X) + dot(f2A, S2A_X) + dot(f3A, S3A_X) + dot(f1B, S1B_X) + dot(f2B, S2B_X) + dot(f3B, S3B_X) + dot(f1C, S1C_X) + dot(f2C, S2C_X) + dot(f3C, S3C_X);
yGradient = dot(f1A, S1A_Y) + dot(f2A, S2A_Y) + dot(f3A, S3A_Y) + dot(f1B, S1B_Y) + dot(f2B, S2B_Y) + dot(f3B, S3B_Y) + dot(f1C, S1A_Y) + dot(f2C, S2A_Y) + dot(f3C, S3A_Y);
zGradient = dot(f1A, S1A_Z) + dot(f2A, S2A_Z) + dot(f3A, S3A_Z) + dot(f1B, S1B_Z) + dot(f2B, S2B_Z) + dot(f3B, S3B_Z) + dot(f1C, S1A_Z) + dot(f2C, S2A_Z) + dot(f3C, S3A_Z);

}

//Save the vertex position, uses a linear interpolation between the two corners points of the edge and the iso value required.
gl_FragData[0] = vec4(mult.x * b0 + mult.y * b1 + mult.z * b0 + (1. – dot(mult, mult)) * mix(b0, b1, (diff1.x) / (diff1.y)), mcData.b);
//Save the vertex normal, calculate as the ponderated median of the two normals from the corners.
gl_FragData[1] = vec4(normalize(- (n0.a * normalA + n1.a * normalB) / max(n0.a + n1.a, EPSILON)), mcIndex);

}

}

This shader is divided into two steps, the vertices compaction/expansion process (highlighted in blue) and the vertex position and normal generation for each one of the vertices reallocated (highlighted in green). This fragment shader is also run over a quad that represents the 2D representation of the 3D voxel space. The uniforms for the programs are explained below.

uPyramid: input texture that contains all the different levels of the pyramid generation, is created while doing the reduction process of the histopyramids.

uBase: input texture containing the active cells result from the previous step.

uPot: input texture containing the potential values of the corners from the voxels of the 3D world, calculated in the corners step.

uTrianglesIndexes: input texture containing the edges where the vertices will be allocated, it’s based on the triTable from the Marching Cubes implementation of Paul Bourke.

uTotal: input texture of 1×1 containing the total amount of vertices to generate, calculated during the reduction process, this data could also be read using the uPyramid texture.

uFastNormals: input boolean value used to define if the normals are calculated using forward differences or a Sobel kernel.

uRange: iso level used to define the surface required from the potential.

The shader starts evaluating if the one dimensional index key of the fragment to evaluate is lower than the total amount of vertices required, if it’s higher the vertex is discarded. This is done to avoid useless calculations over the quad. If the fragment’s key is below the shader keeps the execution.

The compaction process reallocate the vertices needed for each active voxels in the fragments that pass the first condition, it uses the masks explained in here using the optimizations for the reduction process and the compaction step.

The result for this first step is the corresponding 2D position of the voxel allocated in the active voxels texture, this 2D position will be repeated as many times as vertices are needed for the corresponding voxel in the subsequent fragments. Once this position is obtained the shader continues to the second part, the positions and normals generation.

The second part of the shader starts calculating the 3D position of the voxel based on the 2D position from the input texture, and it reads the marching cubes combination obtained in the active voxels step, finally it calculates the key offset (vertex number) of the fragment relative to the corresponding voxel.

With those values calculated the next step is to obtain the corresponding edge where the vertex will be allocated, this is done calculating an index used to read the data from the triTable texture containing the edges. The index is saved in the mcIndex variable, and the corresponding edge saved inside the variable called mcData.

Paul Bourke’s original implementation uses a set of if conditionals to check which edge to use depending on the vertex to calculate inside the voxel. In the current implementation the shader uses masks to discard the edges that are not required inside sums of all the possible corners and the position of the voxel. These sums returns the positions of the edge’s corners saved in the variables b0 and b1.

The corresponding potential values for these corners are read in the corners texture using b0 and b1 as input indexes, the results are saved in the n1 and n1 variables, then two more variables “diff1” and “mult” are saved to perform the linear interpolation that handles the position of the vertex over the edge.

With the gradients calculated and the potentials of the corners obtained, the vertex normal and position is evaluated as a linear interpolation of the corner’s positions using the corresponding potentials datas and the iso surface value to make the interpolation. The shader outputs two textures with the compacted data for positions and normals shown below.

Vertex positions generated from the last step.

Vertex normals generated from the last step, encoded in the [0-1] space for visualization.

Notice that the current implementation recalculates vertices for adjacent triangles, meaning that much of the information from the last two textures is repeated. In order to implement the third method commented in http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html a second histopyramid would be required.

The idea is to compact the information containing only the edges 0, 3 and 8 from the active cells step (with the second histopyramid) and calculate the corresponding data (vertex positions and normals for 0, 3 and 8 edges) with the last shader explained. The indexes can be calculated associating the original texture obtained in the active cells step, and the one with the 0, 3 and 8 edges.

The output of this implementation would be the positions, normals and indexes textures.

Tests done with this approach show that the current hystopyramid implementation becomes the bottleneck of the application in terms of time execution, hence requiring a second one for the indexes brings more penalties than benefits for performance. With a faster stream compaction method the idea of generating indexes would improve the performance of the algorithm in general.

Divergence and Branching:

The reader will notice that the last shader uses two conditional branches, one to discard the fragments where their 1D positional keys are higher than the total amount of vertices to generate, and a second one for the selection of the gradient type.

The gradient branching don’t present any divergence overt the threads since the instructions will be the same ones for each one of them, that’s because all the threads use the same boolean (uniform) passed to the shader.

The discard branching will present divergence on few warps/wavefronts since the data is compacted, this means that mostly each warp/wavefront containing 32 threads will execute the same code for all the fragments that generates positions and normals. On the other hand all the warps/wavefronts containing fragments with higher indexes would be grouped and discarded at the same time.

Divergence will be observed in the warps/wavefronts where there’s a group of fragment discarded, and the rest keep the execution (position/normal calculation), the different paths will force the warp to deal with both control paths and show a 50% performance loss on those warps. These warps would be allocated in the close to horizontal limit that separates the data generated and the empty data in the texture
This is one of the main reasons why data shouldn’t be calculated in a scattered fashion over a texture, in that case every warp could potential show divergence and performance would drop for each one of them, compaction limits the amount of warps where divergence happen.

Divergence can also be presented in the color implementation for the active cells evaluation, this is because the different colors are evaluated using a different set of if conditionals based on the result of each fragment’s different data, but since this is only meant for debugging no further optimizations where done in this section of the shader.

If the user would like to have a non divergence implementation for the color definition, it’s recommended to use float converted boolean masks on a weighted sum of the colors expected, it would represent more instructions, but all the threads will execute the same instructions at the same time.

More information about GPU pipeline, warps, branching and divergence can be found in the following articles https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline and http://cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/lecture5.pdf

The last part of these series will explain the javascript implementation of the shaders explained, and will provide different examples with different models at 128^3 and 256^3 resolutions.

### GPU Marching Cubes from Point Clouds in WebGL (part 1: Potential Generation)

10, Jan 2017/Categories 3D, Sin categoría, webGL/No Comments

In order to render fluids in webGL from particles simulations, an implicit surface method has to be used to create a mesh from a point cloud that represents the current state of the simulation on each frame. To do so there are many algorithms that can be applied, being the Marching Cubes a good approach for webGL since its implementation can take advantage of histopyramids to accelerate the process in the GPU.

This algorithm is an iso surface extraction method from a potential field that uses a divide and conquer process locating the surface inside a user defined 3D grid structure (voxels), generating triangles where the iso surface intersects the voxels of the 3D space. The intersection of the iso surface with each voxel can be defined by the cut of each of the edges from the cube that represents the voxel.

The algorithm checks the value of the two vertices that define the edge in order to check if one of the edges from the voxel has been crossed by the surface. If the two vertices are below or above the iso surface value the edge hasn’t been crossed, on the other hand if one of the vertices is below and the other vertex is above, the edge has been crossed by the iso surface.

Each crossed edge from the voxel generates a vertex position that will be part of one of the triangles to create in the corresponding cube, and since there are a limited amount of edges a finite amount of triangles positions can be defined, being a total of 256 possible combinations present in the algorithm with up to five triangles per voxel. The final vertex position between the two borders is evaluated based on a linear interpolation of the potential value from the two vertex of the edge, using the iso surface value as the input parameter for the interpolation.

The first image (taken from http://paulbourke.net/geometry/polygonise/) shows the vertices and edge numbering convention used for the marching cubes, and the second image shows how one voxel is crossed by the iso surface in the edges 11, 2 and 3, generating a set of three vertices in the edges commented, the triangle generated is allocated on those three vertices obtained.

Since the calculation of each vertex position can be done individually the process can be easy parallelized using the GPU. Also the total amount of vertices to generate can be obtained in advance, this allows to allocate the required memory to generate the geometry, since the process relays heavily on tables that define the amounts of vertices per voxel.

A common approach to implement the marching cubes algorithm in the GPU is to use geometry shaders since the method is expected to generate a set of vertices from a single voxel query, this represents a big limitation in webGL because the current pipeline only supports vertex and fragment shaders. To overcome this impediment histopyramids can provide an effective way to allocate the required vertices needed to create the triangles.

The complete process for the surface generation can be separated in three different blocks, which are the potential generation, active voxels evaluation and finally the generation of the corresponding vertices positions with its normals for the triangles to render. This three blocks are defined in the following diagram.

3D Voxels Compacted Texture Generation

The marching cubes algorithm requires a potential field that has to be generated from the particle cloud. This is done using a separable blur that spreads the particle data generating the required potentials, the resulting blur creates a simulated non signed distance field blurring in the 3 main axes where the coefficients and the the radius of the blur control the smoothness of the surface.

Since the blurring process is a quite demanding step it can become the bottleneck of the surface generation, it all depends on the size of the 3D texture used to allocate the voxels where the particles reside. In order to improve the performance of the blurring process a 3D compacted texture is generated, using the RGBA channels of the texture to compress the different depth buckets of a conventional 3D texture.

The idea is to take advantage of the vector capabilities of the GPU to place all the voxel data in the four channels of each fragment, that would allow to blur the 3D texture on a smaller texture size. As an example a 256^3 texture can be simulated on a 4096×4096 non compressed 2D texture, where the space will be segmented in 256 buckets of 256×256 pixels.

The previous texture can be compressed using a 2048×2048 2D texture where the “R” channel holds the 256×256 size buckets which depths are between [0-64), the “G” channel holds the buckets which depths are between [64-128), the “B” channel holds the buckets which depths are between [128-192) and the “A” channel holds the buckets between depths allocated in [192-256).

The image above shows two examples of a 256^3 compacted 3D textures. A simple inspection shows 64 (8X8) buckets of 256×256 pixels, where the depths can be read counting the bucket position and the corresponding RGBA color displayed. Notice that the same 2D position can represent more than one bucket (left 3D texture), in this case the white pixels represent that there’s a voxel with data for all the four depths encoded in those fragments.

A scattering program is written to populate the 3d texture where the vertex shader is responsible to allocate the particles inside the texture, and the fragment shader only assigns the corresponding color to each fragment. The following shaders are used to generate the required program

//2D index to allocate the position of the particle in the corresponding texture.
attribute vec2 aVertexIndex2D;
//Texture holding the positions of the particles from the simulation.
uniform highp sampler2D uPositionTexture;
//Size of the particle to draw in the 3d texture.
uniform float uSize;
//Displacement on the main axis to simulate the size of the particle in 3D
uniform float uDesface;
varying vec2 vV2I;
varying vec3 vPos;
//Constants used to simulate the 3D texture
const vec4 c3D = vec4(1. / 2048., 256., 8., 1. / 8.);
varying vec4 vColor;
void main(void) {

vV2I = aVertexIndex2D;
//obtain the position from the particle in [0-1] space.
vPos = texture2D(uPositionTexture, aVertexIndex2D).rgb;
//Define the 3D position in voxel space [0-256].
vec3 gPP = floor(vPos * c3D.y);
//Displace the main axis position to simulate a cube particle in 3D
gPP.y -= uDesface;
//Obtain the corresponding 2D position for the 2D texture.
vec2 gP = c3D.x * (gPP.xz + c3D.y * vec2(mod(gPP.y, c3D.z), floor(gPP.y * c3D.w)) + vec2(0.5));
//This defines the corresponding depth of the particle, it’s used to encode it in the
//corresponding channel.

float zLevel = floor(gP.y);
gP.y = fract(gP.y);
gP = 2. * gP – vec2(1.);
//The color of the channel depends on the corresponding depth of the particle.
vColor = vec4(equal(vec4(zLevel), vec4(0., 1., 2., 3.)));
//Set the size of the particle to render.
gl_PointSize = uSize;
gl_Position = vec4(gP, 0., 1.0);

}

precision highp float;
varying vec4 vColor;
void main(void) {

gl_FragColor = vColor;

}

The vertex shader uses the attribute aVertexIndex2D to define the UV required to read the 3d position of each particle represented as a vertex, these positions are saved in the uPositionTexture uniform. The uniform uSize is used to define the size of the particle in the voxel space, this is useful when there’re not too many particles in the simulation, or the geometry to generate is based on filaments generated from particles.

The constant c3D is a vector that is used to allocate the data in buckets that simulate the array of depths from a 3D texture. The corresponding channels are defined below:
c3D.x: inverse of the size of the texture 2D used to simulate the texture 3D (usually 1024 or 2048).
c3D.y: bucket side size for the texture3D simulated (usually 128 from 128^3 or 256 from 256^3)
c3D.z: number of buckets allocated in the horizontal row from the 2D texture. Usually 8 or 16 buckets.

Since the particles can have sizes bigger than one pixel, it’s required to represent them as cubes in the 3D texture. To do so the vertex shader uses the uOffset uniform, this value represents the displacement in the depth axis (in units) that the shader should offset to render the particle. This would represent the cube using the different slices to represent the size in 3D.

This means that the program has to be invoked as many times as the size of the particles, incrementing the value of uOffset on each pass to render the corresponding slices of the particle’s cube in the 3D texture.

One depth value is saved in the variable zLevel to define which channel will be used, since only 64 buckets can be represented per channel the zValue will define a value from [0-4). If the zValue is 0 the corresponding voxel will be encoding in the “R” channel, for a zValue of 1 the “G” channel is used, for zValue equal of 2 the “B” channel is used. Finally for a zValue of 3 the “A” channel holds the corresponding voxel.

3D Blur for Potential Field Generation

Once the 3D texture is filled with the particles a second program is required to generate the potentials using a 3D blur. This is done using a quad pass, and two different fragment shaders, one for the blur corresponding the non depth axis, and another fragment shader for the blur corresponding the depth axis. The fragment shader for the depth axis is defined below.

precision highp float;
precision highp sampler2D;
uniform sampler2D uDataTexture;
uniform float uSteps;
varying vec2 vText;
//strength of the pixel to blur agains the other ones used.
const float coeficient = 2.;
const vec4 c3D = vec4(1. / 2048., 256., 8., 1. / 8.);
const float border = 1.;
//Evaluates the 2d index value in the simulated 3D texture from the 3D pos.
vec2 index2D(vec3 pos) {

return c3D.x * (pos.xz + c3D.y * vec2(mod(pos.y, c3D.z), floor(pos.y * c3D.w)) + vec2(0.5));

}
void main(void) {

//Obtain the 3D pos of the corresponding fragment.
vec2 pos = floor(vText / c3D.x);
vec3 pos3D = vec3(mod(pos.y, c3D.y), c3D.z * floor(pos.y / c3D.y) + floor(pos.x / c3D.y), mod(pos.x, c3D.y));
vec3 newPos3D = vec3(0.);
vec2 uv = vec2(0.);
vec4 blend = vec4(0.);
float zLevel = 0.;
//Obtain the z level for the corresponding fragment.
float currentZLevel = floor(pos3D.y / 64.);
for (float i = 0.; i <= 40.; i ++) {
if(i > 2. * uSteps) break;
float j = i – uSteps;
//Obtain the new 3D pos of the fragment to use for blurring.
newPos3D = pos3D + j * vec3(0., 1., 0.);
//Obtain the z level for the new fragment to read.
zLevel = floor(newPos3D.y / 64.);
uv = index2D(newPos3D);
uv.y = fract(uv.y);
float k = j == 0. ? coeficient : 1.;
vec4 newBucket = texture2D(uDataTexture, uv);
//If the new fragment is in the same depth range than the original fragment to blur then the same channels are used.
//If the new zLevel is different than the current Z level the blurring have to be done taking into account the
//channel differences between the two fragments.

vec3 cases = vec3(bvec3(zLevel < currentZLevel, zLevel == currentZLevel, zLevel > currentZLevel));
blend += k * (vec4(0., newBucket.rgb) * cases.x + newBucket * cases.y + vec4(newBucket.gba, 0.) * cases.z);

}
blend /= (2. * uSteps + coeficient);
//This avoids to spread information between the different buckets.
blend *= float(mod(pos.x, 256.) > border && mod(pos.y, 256.) > border && mod(pos.x, 256.) < 255. - border && mod(pos.y, 256.) < 255. - border); gl_FragColor = blend;

}

Fragment shader for two non depth coordinates blur:

precision mediump float;
precision mediump sampler2D;
uniform sampler2D uDT;
uniform vec3 uAxis;
uniform float uSteps;
varying vec2 vText;
const float coeficient = 2.;
void main(void) {

vec4 blend = vec4(0.);
for (float i = 0.; i <= 40.; i ++) {
if(i > 2. * uSteps) break;
float j = i – uSteps;
float k = j == 0. ? coeficient : 1.;
blend += k * texture2D(uDT, vText + j * uAxis.xy);

}
blend /= (2. * uSteps + coeficient);
gl_FragColor = blend;

}

Since the 3D texture is simulated using contiguous buckets in the 2D texture, the blur for the depth axis has to take into account the depth range for every fragment used in the blurring process, hence a vector is defined to know if the depth of the fragment to use is in the range of the depth from the fragment to blur. There are three different scenarios depending on the two depth values (fragment to blur “currentZLevel” and fragment to use for blurring “zLevel”), these are defined below.

if zLevel is equal than the currentZlevel, the original channels are blended with the new channels with the following equation blend += newBucket.rgba.
If zLevel is below the currentZlevel, the original channels are blurred with the new data using the following equation… blend += vec4(0, newBucket.rgb).
if zLevel is over the currentZlevel, the original channels are blurred with the new data using the following equation… blend += vec4(newBucket.gba, 0).

The three previous scenarios are used to match the compressed RGBA data between the two values, these equations are not necessary for the two other axis (non depth ones) since the blurring is done inside the same depth bucket.

For the non depth axis a simple box blur is used defining the blurring direction based on a user provided uniform uAxis, this allows to make the 3D blur in three different separable passes. Notice that the filtering done is based on box coefficients, but these could be modified to make a gaussian filter or any type of filtering that could provide better results for the potential to simulate.

The following image shows the result of the potential generation for the two previous 3D textures, notice that for the left image the buckets are separated avoiding blurring between the buckets. It can be seen in the left 3D texture that the appearance of the green and dark blue colors show how the different depths are being blended, meaning that the potential is being softened among the different depths.

Corners Calculation in the Potential Field

Once the potential is obtained, the marching cubes algorithm requires to evaluate the values in the corners of the voxels. In order to avoid software trilinear interpolations another quad pass is done over the result from the 3D blur passes to calculate values in the corners. Since it’s a quad evaluation the same vertex shader from the previous passes can be reused, and the corresponding fragment shader for the program is defined below.

precision highp float;
precision highp sampler2D;
uniform sampler2D uDataTexture;
varying vec2 vText;
vec3 data[7];
const vec4 c3D = vec4(1. / 2048., 256., 8., 1. / 8.);
vec2 index2D(vec3 pos) {

return c3D.x * (pos.xz + c3D.y * vec2(mod(pos.y, c3D.z), floor(pos.y * c3D.w)) + vec2(0.5));

}
void main(void) {

vec2 pos = floor(vText / c3D.x);
vec3 pos3D = vec3(mod(pos.y, c3D.y), c3D.z * floor(pos.y / c3D.y) + floor(pos.x / c3D.y), mod(pos.x, c3D.y));
data[0] = vec3(-1., -1., -1.);
data[1] = vec3(0., -1., -1.);
data[2] = vec3(0., 0., -1.);
data[3] = vec3(-1., 0., -1);
data[4] = vec3(-1., -1., 0.);
data[5] = vec3(0., -1., 0.);
data[6] = vec3(-1., 0., 0.);
float currentZLevel = floor(pos3D.y / 64.);
vec2 uv = index2D(pos3D);
uv.y = fract(uv.y);
vec4 corner = texture2D(uDataTexture, uv);
vec3 newPos3D = vec3(0.);
float zLevel = 0.;
for(int i = 0; i < 7; i ++) {
newPos3D = pos3D + data[i];
zLevel = floor(newPos3D.y / 64.);
uv = index2D(newPos3D);
uv.y = fract(uv.y);
vec4 newBucket = texture2D(uDataTexture, uv);
vec3 cases = vec3(bvec3(zLevel < currentZLevel, zLevel == currentZLevel, zLevel > currentZLevel));
corner += vec4(0., newBucket.rgb) * cases.x + newBucket * cases.y + vec4(newBucket.gba, 0.) * cases.z;

}
gl_FragColor = vec4(corner * 0.125);

}

Notice that the corners evaluation is done using the compressed blurred texture from the previous pass, this means that the same blending equations have to be used to match the different channels based on the depths regions of the fragments to use. With this final pass the algorithm has two potentials textures, one for center values and other for corner values, this last one is the texture used for the marching cubes steps.

Voxel Potential Field Expansion

For the marching cubes steps it’s not necessary to work with a compressed texture, hence a expansion step is used to obtain a 3D texture where each voxel is represented using a single fragment from the new texture. This is also done using a fragment shader defined below.

precision highp float;
precision highp sampler2D;
uniform sampler2D uDataTexture;
const vec4 c3D_h = vec4(1. / 4096., 256., 16., 1. / 16.);
const vec4 c3D_l = vec4(1. / 2048., 256., 8., 1. / 8.);
varying vec2 vText;
void main(void) {

//Obtain the 3D position of the corresponding fragment in the expanded texture
vec2 pos = floor(vText / c3D_h.x);
vec3 pos3D = vec3(mod(pos.y, c3D_h.y), c3D_h.z * floor(pos.y / c3D_h.y) + floor(pos.x / c3D_h.y), mod(pos.x, c3D_h.y));
//Define the depth range for the corresponding fragment.
float zLevel = floor(pos3D.y / 64.);
vec2 uv = c3D_l.x * (pos3D.xz + c3D_l.y * vec2(mod(pos3D.y, c3D_l.z), floor(pos3D.y * c3D_l.w)) + vec2(0.5));
uv.y = fract(uv.y);
//Set the value of the fragment based on the depth and the corresponding channel.
gl_FragColor = vec4(dot(texture2D(uDataTexture, uv), vec4(equal(vec4(zLevel), vec4(0., 1., 2., 3.)))));

}

The previous shader expands a 3D compressed RGBA texture allocated in a 2048×2048 2D texture where the visual buckets are allocated in 8 x 8 array of 256×256 fragments buckets. The result is a 4096×4096 texture where the visual buckets are uncompressed into an array of 16 x 16 buckets of 256×256 fragments. The different constant values for the expansion are defined in c3D_h and c3D_l.

The shader evaluates the 3D position of the fragment in the expanded texture and search for that value in the compressed texture, using the depth range to define with RGBA channel will be read to obtain the corresponding information. The image below shows the result of the expansion from the texture.

The left side of the previous image shows the compressed data in a 2048^2 area, allowing to calculate the blur and corners evaluation much faster than using a non compressed data. A speed up to 4X is gained based on the fact that the compressed texture area is 4 times smaller than the non compressed one. The right side shows the expanded result where the 256 depth buckets can be inspected visually without color interpretations.

Once the potential is generated it’s time to obtain the active voxels to use for the marching cubes, and generate the corresponding vertices, this will be explained in a second post, finally a third post will explain the javascript implementation and provide different examples with it.