MaskedOcclusionCulling
This code accompanies the research paper "Masked Software Occlusion Culling", and implements an efficient alternative to the hierarchical depth buffer algorithm. Our algorithm decouples depth values and coverage, and operates directly on the hierarchical depth buffer. It lets us efficiently parallelize both coverage computations and hierarchical depth buffer updates.
Update May 2018
Added the ability to merge two depth buffers. This provides both an alternative method for parallelizing buffer creation and a way to reduce silhouette bleed when input data cannot be roughly sorted from front to back, for example when rendering large terrain patches with foreground occluders in an open world game engine.
Requirements
This code is mainly optimized for AVX capable CPUs. However, we also provide SSE 4.1 and SSE 2 implementations for backwards compatibility. The appropriate implementation will be chosen during run-time based on the CPU's capabilities.
Notes on build time
The code is optimized for runtime performance and may take a long time to compile due to heavy code inlining. One workaround is to compile the code as
a library file. An alternative solution is to disable whole program optimizations for the MaskedOcclusionCulling.cpp,
MaskedOcclusionCullingAVX2.cpp and MaskedOcclusionCullingAVX512.cpp files. Doing so does not impact runtime performance, but greatly reduces link time.
Notes on coordinate systems and winding
Most inputs are given as clip space (x,y,w) coordinates, assuming the same right-handed coordinate system as used by DirectX and OpenGL (x positive right, y positive up and w positive in the view direction). Note that we use the clip space w coordinate for depth and disregard the z coordinate. Internally our masked hierarchical depth buffer stores depth = 1 / w.
The TestRect() function is an exception and instead accepts normalized device coordinates (NDC), (x' = x/w, y' = y/w), where the visible screen region
maps to the range [-1,1] for x' and y' (x positive right and y positive up). Again, this is consistent with both DirectX and OpenGL behavior.
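The relationship between clip space, NDC and the stored depth can be sketched as follows. This is a minimal illustration, not library code: the `ClipPos` and `NdcPos` types and both helper functions are hypothetical names introduced here, and the sketch assumes w > 0.

```cpp
#include <cmath>

// Hypothetical helper types for this sketch; they are not part of the library.
struct ClipPos { float x, y, w; };
struct NdcPos  { float x, y, depth; };

// Perspective divide: x' = x/w, y' = y/w, with depth stored as 1/w.
// Assumes w > 0, i.e. the point lies in front of the camera.
inline NdcPos ClipToNdc(const ClipPos &c)
{
    const float invW = 1.0f / c.w;
    return { c.x * invW, c.y * invW, invW };
}

// True when the NDC point lies inside the visible [-1,1] x [-1,1] region.
inline bool IsOnScreen(const NdcPos &p)
{
    return std::fabs(p.x) <= 1.0f && std::fabs(p.y) <= 1.0f;
}
```

Because depth is stored as 1 / w, a larger depth value corresponds to a point closer to the camera.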
By default, the screen space coordinate system used internally to access our hierarchical depth buffer follows DirectX conventions (y positive down), which is
not consistent with OpenGL (y positive up). This can be configured by changing the USE_D3D define. The screen space coordinate system affects the layout
of the buffer returned by the ComputePixelDepthBuffer() function, scissor rectangles (which are specified in screen space coordinates), and rasterization
tie-breaker rules if PRECISE_COVERAGE is enabled.
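To illustrate the difference between the two screen space conventions, here is a sketch of a generic viewport mapping from NDC to pixel coordinates. It is an assumption for illustration only: the `PixelPos` type, the `NdcToPixel` function and its `yDown` flag are hypothetical, and the library's actual mapping and rounding rules may differ.

```cpp
// Hypothetical pixel position type for this sketch.
struct PixelPos { float x, y; };

// Map NDC in [-1,1] to continuous pixel coordinates.
// yDown = true  follows the DirectX-style convention (row 0 at the top).
// yDown = false follows the OpenGL-style convention (row 0 at the bottom).
inline PixelPos NdcToPixel(float ndcX, float ndcY, int width, int height, bool yDown)
{
    const float px = (ndcX * 0.5f + 0.5f) * static_cast<float>(width);
    const float py = yDown ? (0.5f - ndcY * 0.5f) * static_cast<float>(height)
                           : (ndcY * 0.5f + 0.5f) * static_cast<float>(height);
    return { px, py };
}
```

With y positive down, the top edge of the screen (y' = 1) maps to row 0, while with y positive up it maps to the last row.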
API / Tutorial
We have made an effort to keep the API as simple and minimal as possible. The rendering functions are quite similar to submitting DirectX or OpenGL drawcalls and we hope they will feel natural to anyone with graphics programming experience. In the following we will use the example project as a tutorial to showcase the API. Please refer to the documentation in the header file for further details.
Setup
We begin by creating a new instance of the occlusion culling object. The object is created using the static Create() function rather than a standard
constructor, and can be destroyed using the Destroy() function. The reason for using the factory Create()/Destroy() design pattern is that we want to
support custom (aligned) memory allocators, and that the library chooses either the AVX-512, AVX or SSE implementation based on the CPU's capabilities.
MaskedOcclusionCulling *moc = MaskedOcclusionCulling::Create();
...
MaskedOcclusionCulling::Destroy(moc);
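The sketch below shows, in general terms, why a Create()/Destroy() pair makes custom allocators possible: the factory obtains raw storage through a callback and constructs the object in place. Everything in it is hypothetical (the `Culler` class, the `AlignedAllocFn`/`AlignedFreeFn` callback types and the counting allocator); it illustrates the pattern and is not the library's implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical callback types for a user-supplied aligned allocator.
using AlignedAllocFn = void *(*)(std::size_t alignment, std::size_t size);
using AlignedFreeFn  = void (*)(void *ptr);

// Hypothetical stand-in class; it only illustrates the factory pattern.
class Culler
{
public:
    // Factory: allocate storage through the callback, then construct in place.
    static Culler *Create(AlignedAllocFn allocFn, AlignedFreeFn freeFn)
    {
        void *mem = allocFn(alignof(Culler), sizeof(Culler));
        return mem ? new (mem) Culler(freeFn) : nullptr;
    }

    // Destroy: run the destructor, then release storage through the callback.
    static void Destroy(Culler *obj)
    {
        if (!obj) return;
        AlignedFreeFn freeFn = obj->mFree;
        obj->~Culler();
        freeFn(obj);
    }

private:
    explicit Culler(AlignedFreeFn freeFn) : mFree(freeFn) {}
    ~Culler() = default;
    AlignedFreeFn mFree;
};

// Example allocator that counts live allocations. malloc is sufficient here
// only because this stand-in class has no over-aligned members.
static int gLiveAllocs = 0;
static void *CountingAlloc(std::size_t, std::size_t size) { ++gLiveAllocs; return std::malloc(size); }
static void CountingFree(void *ptr) { --gLiveAllocs; std::free(ptr); }
```

Since the constructor and destructor are private, every object is guaranteed to pass through the allocator callbacks, which a plain `new`/`delete` interface could not enforce.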
The created object is empty and has no hierarchical depth buffer attached, so we must first allocate a buffer using the SetResolution() function. This function can
also be used later to resize the hierarchical depth buffer, causing it to be re-allocated. Note that the resolution width must be a multiple of 8, and the height
a multiple of 4. This is a limitation of the occlusion culling algorithm.
int width = 1920;
int height = 1080;
moc->SetResolution(width, height); // Set full HD resolution
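If your render target size does not already satisfy the multiple-of-8 and multiple-of-4 constraint, one option is to pad it before passing it on. The helpers below are a hypothetical sketch (`RoundUpToMultiple`, `Resolution` and `PadResolution` are names made up here); whether padding or rounding down suits your application is a design choice this sketch does not settle.

```cpp
// Round value up to the next multiple of `multiple` (requires multiple > 0).
constexpr int RoundUpToMultiple(int value, int multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

// Hypothetical size type for this sketch.
struct Resolution { int width, height; };

// Pad a requested size so that width % 8 == 0 and height % 4 == 0.
constexpr Resolution PadResolution(int width, int height)
{
    return { RoundUpToMultiple(width, 8), RoundUpToMultiple(height, 4) };
}
```

For example, 1920x1080 already satisfies the constraint and is returned unchanged, while 1366x768 would be padded to 1368x768.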
After setting the resolution we can start rendering occluders and performing occlusion queries. We must first clear the hierarchical depth buffer:
// Clear hierarchical depth buffer to far depth
moc->ClearDepthBuffer();
Optional: The SetNearClipPlane() function can be used to configure the distance to the near clipping plane to make the occlusion culling renderer match your DX/GL
renderer. The default value for the near plane is 0, which should work as expected unless your application relies on having onscreen geometry clipped by
the near plane.
float nearClipDist = 1.0f;
moc->SetNearClipPlane(nearClipDist); // Set near clipping dist (optional)
Occluder rendering
The RenderTriangles() function renders triangle meshes to the hierarchical depth buffer. Similar to DirectX/OpenGL, meshes are constructed from a vertex array
and a triangle index array. By default, the vertices are given as (x,y,z,w) floating point clip space coordinates, but the z-coordinate is ignored and
instead we use depth = 1 / w. We expose a TransformVertices() utility function to transform vertices from (x,y,z,1) model/world space to (x,y,z,w) clip
space, but you can use your own transform code as well. For more information on the TransformVertices() function, please refer to the documentation in the
header file.
The triangle index array is identical to a DirectX or OpenGL triangle list and connects vertices to form triangles. Every three indices in the array form a new triangle, so the size of the array must be a multiple of 3. Note that we only support triangle lists, and we currently have no plans to support other primitives such as strips or fans.
struct ClipSpaceVertex { float x, y, z, w; };
// Create an example triangle. The z component of each vertex is not used by the
// occlusion culling system.
ClipSpaceVertex triVerts[] = { { 5, 0, 0, 10 }, { 30, 0, 0, 20 }, { 10, 50, 0, 40 } };
unsigned int triIndices[] = { 0, 1, 2 };
unsigned int nTris = 1;
// Render an example triangle
moc->RenderTriangles((float*)triVerts, triIndices, nTris);
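Since the index array must describe a triangle list, it can be worth validating it before submission. The function below is a hypothetical helper written for this tutorial (it is not part of the library); it checks that the index count is a multiple of 3 and that every index refers to an existing vertex.

```cpp
#include <cstddef>

// Returns true when `indices` describes a well-formed triangle list:
// the count is a multiple of 3 and each index refers to an existing vertex.
inline bool IsValidTriangleList(const unsigned int *indices,
                                std::size_t indexCount,
                                std::size_t vertexCount)
{
    if (indexCount % 3 != 0)
        return false;
    for (std::size_t i = 0; i < indexCount; ++i)
    {
        if (indices[i] >= vertexCount)
            return false;
    }
    return true;
}
```

A check like this is typically only enabled in debug builds, because it touches every index once.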
Transform: It is possible to include a transform when calling RenderTriangles() by passing the modelToClipSpace parameter. This is equivalent to calling TransformVertices() followed
by RenderTriangles(), but performing the transform as shown in the example below typically
leads to better performance.
// Example matrix swapping the x and y coordinates
float swapxyMatrix[4][4] = {
{0,1,0,0},
{1,0,0,0},
{0,0,1,0},
{0,0,0,1}};
// Render triangle with transform.
moc->RenderTriangles((float*)triVerts, triIndices, nTris, (float*)swapxyMatrix);
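To see what a matrix like this does to a single vertex, here is a small standalone sketch. The `Vec4` type and `Transform` function are hypothetical, and the sketch assumes a row-vector convention (out = v * M). For the swap matrix this assumption does not matter, because the matrix is symmetric and both conventions give the same result; consult the header file for the convention the library actually uses.

```cpp
// Hypothetical vertex type for this sketch.
struct Vec4 { float x, y, z, w; };

// Multiply a row vector by a 4x4 matrix: out = v * m.
inline Vec4 Transform(const Vec4 &v, const float m[4][4])
{
    return {
        v.x * m[0][0] + v.y * m[1][0] + v.z * m[2][0] + v.w * m[3][0],
        v.x * m[0][1] + v.y * m[1][1] + v.z * m[2][1] + v.w * m[3][1],
        v.x * m[0][2] + v.y * m[1][2] + v.z * m[2][2] + v.w * m[3][2],
        v.x * m[0][3] + v.y * m[1][3] + v.z * m[2][3] + v.w * m[3][3]
    };
}
```

Applied to the first example vertex (5, 0, 0, 10), the swap matrix yields (0, 5, 0, 10): x and y trade places while z and w are untouched.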
Backface Culling: By default, clockwise-wound triangles are considered backfacing and are culled when rasterizing occluders. However, you can
configure the RenderTriangles() function to backface cull either clockwise or counter-clockwise wound triangles, or to disable backface culling
for two-sided rendering.
// A clockwise-wound (normally backfacing) triangle
ClipSpaceVertex cwTriVerts[] = { { 7, -7, 0, 20 },{ 7.5, -7, 0, 20 },{ 7, -7.5, 0, 20 } };
unsigned int cwTriIndices[] = { 0, 1, 2 };
// Render with counter-clockwise backface culling, so the triangle is drawn
moc->RenderTriangles((float*)cwTriVerts, cwTriIndices, 1, nullptr, BACKFACE_CCW);
The rasterization code only handles counter-clockwise wound triangles, so configurable backface culling is implemented by re-winding clockwise-wound triangles
on the fly. Therefore, culling modes other than BACKFACE_CW may decrease performance slightly.
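Winding can be determined from the sign of the projected triangle's area, and re-winding amounts to swapping two vertices. The sketch below illustrates both ideas with hypothetical helpers (`ClipVert`, `SignedArea2`, `IsClockwise`, `FlipWinding`). It assumes y positive up and w > 0 for all vertices, and it is not the library's rasterizer code.

```cpp
#include <utility>

// Hypothetical clip space vertex type for this sketch.
struct ClipVert { float x, y, z, w; };

// Twice the signed area of the triangle after the perspective divide (y up).
// Positive means counter-clockwise, negative means clockwise. Assumes w > 0.
inline float SignedArea2(const ClipVert &a, const ClipVert &b, const ClipVert &c)
{
    const float ax = a.x / a.w, ay = a.y / a.w;
    const float bx = b.x / b.w, by = b.y / b.w;
    const float cx = c.x / c.w, cy = c.y / c.w;
    return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay);
}

inline bool IsClockwise(const ClipVert &a, const ClipVert &b, const ClipVert &c)
{
    return SignedArea2(a, b, c) < 0.0f;
}

// Re-wind a triangle by swapping its last two indices.
inline void FlipWinding(unsigned int tri[3])
{
    std::swap(tri[1], tri[2]);
}
```

Under these assumptions, the triangle { 7, -7, 0, 20 }, { 7.5, -7, 0, 20 }, { 7, -7.5, 0, 20 } has a negative signed area and is therefore clockwise, and flipping its indices makes it counter-clockwise.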
Clip Flags: RenderTriangles() accepts an additional parameter to optimize polygon clipping. The calling application may disable any clipping plane if it can
guarantee that the mesh does not intersect said clipping plane. In the example below we have a quad which is entirely on screen, so we can disable
all clipping planes. Warning: it is unsafe to incorrectly disable clipping planes, as this may cause the program to crash or perform out of bounds
memory accesses. Consider this a power user feature (use CLIP_PLANE_ALL to clip against the full frustum when in doubt).
// Create a quad completely within the view frustum
ClipSpaceVertex quadVerts[]
= { { -150, -150, 0, 200 },{ -10, -65, 0, 75 },{ 0, 0, 0, 20 },{ -40, 10, 0, 50 } };
unsigned int quadIndices[] = { 0, 1, 2, 0, 2, 3 };
unsigned int nQuadTris = 2;
// Render the quad. As an optimization, indicate that clipping is not required
moc->RenderTriangles((float*)quadVerts, quadIndices, nQuadTris, nullptr, BACKFACE_CW, CLIP_PLANE_NONE);
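One conservative way to decide which planes are safe to disable is to test every vertex against each frustum plane in clip space: if no vertex lies outside a plane, no triangle can cross it. The sketch below does this with its own hypothetical bit flags (`SketchPlane`) and helper (`PlanesNeeded`); these are not the library's clip plane constants, and the near plane test (w <= nearDist) is an assumption.

```cpp
#include <cstddef>

// Hypothetical clip space vertex type for this sketch.
struct ClipVert4 { float x, y, z, w; };

// Hypothetical bit flags used only by this sketch.
enum SketchPlane : unsigned
{
    PLANE_NEAR   = 1u,
    PLANE_LEFT   = 2u,
    PLANE_RIGHT  = 4u,
    PLANE_BOTTOM = 8u,
    PLANE_TOP    = 16u
};

// Returns the set of planes that at least one vertex lies outside of.
// Planes missing from the result cannot be crossed by the mesh.
inline unsigned PlanesNeeded(const ClipVert4 *verts, std::size_t count, float nearDist = 0.0f)
{
    unsigned mask = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        const ClipVert4 &v = verts[i];
        if (v.w <= nearDist) mask |= PLANE_NEAR;
        if (v.x < -v.w)      mask |= PLANE_LEFT;
        if (v.x >  v.w)      mask |= PLANE_RIGHT;
        if (v.y < -v.w)      mask |= PLANE_BOTTOM;
        if (v.y >  v.w)      mask |= PLANE_TOP;
    }
    return mask;
}
```

For the quad { -150, -150, 0, 200 }, { -10, -65, 0, 75 }, { 0, 0, 0, 20 }, { -40, 10, 0, 50 } the result is 0, meaning no plane is needed. A triangle with a vertex such as { 30, 0, 0, 20 } has x > w, so the right plane must stay enabled.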
Vertex Storage Layout: Finally, RenderTriangles() supports a configurable vertex storage layout. The code so far has used an array of structs (AoS) layout based
on the ClipSpaceVertex struct, and this is the default behaviour. You may use the VertexLayout struct to configure the memory layout of the vertex data. Note that
the vertex pointer passed to RenderTriangles() should point at the x coordinate of the first vertex, so there is no x coordinate offset specified in the struct.
struct VertexLayout
{
int mStride; // Stride between vertices
int mOffsetY; // Offset to vertex y coordinate
int mOffsetW; // Offset to vertex w coordinate
};
For example, you can configure a struct of arrays (SoA) layout as follows:
// A triangle specified in struct of arrays (SoA) form
float SoAVerts[] = {
10, 10, 7, // x-coordinates
-10, -7, -10, // y-coordinates
10, 10, 10 // w-coordinates
};
// Set vertex layout (stride, y offset, w offset)
VertexLayout SoAVertexLayout(sizeof(float), 3 * sizeof(float), 6 * sizeof(float));
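The following sketch shows how a stride and two byte offsets are enough to address both AoS and SoA data. The `Layout` and `XYW` types and the `FetchVertex` function are hypothetical and only mirror the fields described above; the sketch assumes all three values are expressed in bytes, as the use of sizeof(float) suggests.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical mirror of the layout fields described above (all in bytes).
struct Layout { int stride, offsetY, offsetW; };

// Hypothetical result type holding the coordinates the rasterizer needs.
struct XYW { float x, y, w; };

// Fetch vertex `index`. `base` points at the x coordinate of the first vertex.
inline XYW FetchVertex(const float *base, const Layout &layout, unsigned int index)
{
    const char *px = reinterpret_cast<const char *>(base)
                   + static_cast<std::size_t>(index) * static_cast<std::size_t>(layout.stride);
    XYW v;
    std::memcpy(&v.x, px, sizeof(float));
    std::memcpy(&v.y, px + layout.offsetY, sizeof(float));
    std::memcpy(&v.w, px + layout.offsetW, sizeof(float));
    return v;
}
```

With a stride of one float and offsets of three and six floats, vertex 2 of the SoA array above resolves to x = 7, y = -10, w = 10. An AoS array of four-float vertices would instead use a stride of 16 bytes with offsets of 4 and 12.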
