IDKEngine

Feature list:

Wavefront Path Tracer with Ray Sorting and OIDN
High Quality SweepSAH BVH with PreSplitting + GPU Refitting
Real-Time Voxel Global Illumination
AMD FSR2 and Temporal Anti Aliasing
Mesh Shaders + Multi Draw Indirect + Bindless Textures + lots of OpenGL...
glTF support including animations and various extensions
Custom Collision Detection against triangle meshes
CoD-Modern-Warfare Bloom
Ray Traced Shadows
Variable Rate Shading
Order Independent Transparency
Ray marched Volumetric Lighting
GPU Frustum + Hi-Z Culling
Screen Space Reflections
Screen Space Ambient Occlusion
Atmospheric Scattering
Asynchronous texture loading
Camera capture and playback with video output

Required OpenGL: 4.6 + ARB_bindless_texture + EXT_shader_image_load_formatted + KHR_shader_subgroup + any of (ARB_shader_viewport_layer_array, AMD_vertex_shader_layer, NV_viewport_array2)

Notes:

If OIDN is found in PATH or near the executable you are given the option to denoise path traced images
If gltfpack is found in PATH or near the executable you are given the option to compress glTF files on load
Doesn't work on the Mesa radeonsi or Intel drivers
I no longer have access to a NVIDIA GPU, so I can't guarantee NVIDIA exclusive features work at any given point

Controls

| Key                | Action                     |
|--------------------|----------------------------|
| W, A, S, D         | Move                       |
| Space              | Move Up                    |
| Shift              | Move faster                |
| E                  | Enter/Leave GUI Controls   |
| T                  | Resume/Stop Time           |
| R-Click in GUI     | Select Object              |
| R-Click in FPS Cam | Shoot Shadow-Casting Light |
| Control + R        | Toggle Recording           |
| Control + Space    | Toggle Replay              |
| G                  | Toggle GUI visibility      |
| V                  | Toggle VSync               |
| F11                | Toggle Fullscreen          |
| ESC                | Exit                       |
| 1                  | Recompile all shaders      |

Path Traced Render Samples

Path Traced Lumberyard Bistro

Path Traced Intel Sponza

Path Traced Temple

Voxel Global Illumination

1.0 Overview

VXGI (or Voxel Cone Tracing) is a global illumination technique developed by NVIDIA, originally published in 2011. Later, when the Maxwell architecture (GTX 900 series) was released, the implementation was improved using GPU features introduced with that generation. I'll show how to use those in a bit.

The basic idea of VXGI is:

Voxelize the scene
Cone Trace voxelized scene for second bounce lighting

The voxelized representation is an approximation of the actual scene and Cone Tracing is an approximation of actual Ray Tracing. Trading accuracy for speed! Still, VXGI has the potential to naturally account for various lighting effects. Here is a video showcasing some of them. I think it's a great technique to implement in a hobby renderer, since it's conceptually easy to understand, gives decent results and you get to play with advanced OpenGL features!

2.0 Voxelization

Voxelized Sponza

This is a visualization of a 384-sized rgba16f-format 3D texture. It's the output of the voxelization stage. Every pixel/voxel is shaded normally using some basic lighting and classic shadow maps. I only have this single texture, but others might store additional information such as normals for multiple bounces. The voxelization uses a single shader program: a basic vertex shader and a rather unusual but clever fragment shader.

Vertex Shader

There are two vec3 variables: GridMin and GridMax. They define the world space region that the voxel grid spans. When rendering, triangles get transformed to world space as usual and then mapped from the range [GridMin, GridMax] to [-1, 1] (normalized device coordinates). Triangles outside the grid will not be voxelized. As the grid grows, voxel resolution decreases, because the same number of voxels has to cover a larger region.

#version 460 core
layout(location = 0) in vec3 Position;

uniform mat4 ModelMatrix; // object-to-world transform, assumed here to be a plain uniform
uniform vec3 GridMin, GridMax;
out vec3 NormalizedDeviceCoords;

// GLSL requires a function to be declared before it is called, so this comes before main()
vec3 MapToNdc(vec3 value, vec3 rangeMin, vec3 rangeMax) {
    return ((value - rangeMin) / (rangeMax - rangeMin)) * 2.0 - 1.0;
}

void main() {
    vec3 fragPos = (ModelMatrix * vec4(Position, 1.0)).xyz;

    // transform fragPos from [GridMin, GridMax] to [-1, 1]
    NormalizedDeviceCoords = MapToNdc(fragPos, GridMin, GridMax);

    gl_Position = vec4(NormalizedDeviceCoords, 1.0);
}
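To sanity check the mapping on the CPU, here is a minimal Python sketch of the same formula as MapToNdc. The grid bounds are made-up values for illustration only, not the ones the engine uses.

```python
def map_to_ndc(value, range_min, range_max):
    """Map each component of value from [range_min, range_max] to [-1, 1]."""
    return tuple(
        (v - lo) / (hi - lo) * 2.0 - 1.0
        for v, lo, hi in zip(value, range_min, range_max)
    )

# Hypothetical grid bounds, chosen only for this example.
grid_min = (-10.0, -2.0, -10.0)
grid_max = (10.0, 18.0, 10.0)

corner_lo = map_to_ndc(grid_min, grid_min, grid_max)         # lower grid corner
corner_hi = map_to_ndc(grid_max, grid_min, grid_max)         # upper grid corner
center = map_to_ndc((0.0, 8.0, 0.0), grid_min, grid_max)     # middle of the grid
outside = map_to_ndc((15.0, 8.0, 0.0), grid_min, grid_max)   # x lies beyond GridMax

print(corner_lo, corner_hi, center, outside)
```

The grid corners land on -1 and 1, the center on 0, and the point outside the grid gets an x coordinate above 1, which is why such geometry is clipped and never voxelized.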

Fragment Shader

We won't have any color attachments. In fact, the framebuffer has no attachments at all. Instead, we write into the 3D texture manually using OpenGL image store. Framebuffer attachments are avoided because you can only ever attach a single texture layer for rendering. Image store works on absolute integer coordinates, so to find the corresponding voxel position we transform the normalized device coordinates.

#version 460 core
layout(binding = 0, rgba16f) restrict uniform image3D ImgVoxels;

in vec3 NormalizedDeviceCoords;

void main() {
    vec3 uvw = NormalizedDeviceCoords * 0.5 + 0.5; // transform from [-1, 1] to [0, 1]
    ivec3 voxelPos = ivec3(uvw * imageSize(ImgVoxels)); // transform from [0, 1] to [0, imageSize() - 1]

    vec3 voxelColor = ...; // compute some basic lighting
    imageStore(ImgVoxels, voxelPos, vec4(voxelColor, 1.0));
}
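The index computation is easy to check outside of a shader. This is a rough Python sketch of the same two steps, assuming a cubic 384 texture. The clamp option and the voxel_world_size helper are additions of this sketch and are not part of the shader above.

```python
def ndc_to_voxel(ndc, size, clamp=False):
    """Convert NDC in [-1, 1] to integer voxel coordinates."""
    voxel = []
    for c in ndc:
        u = c * 0.5 + 0.5            # [-1, 1] -> [0, 1]
        i = int(u * size)            # truncates toward zero, like ivec3() in GLSL
        if clamp:                    # hypothetical safety clamp, not in the shader
            i = min(max(i, 0), size - 1)
        voxel.append(i)
    return tuple(voxel)

def voxel_world_size(grid_min, grid_max, size):
    """World space extent of a single voxel along each axis."""
    return tuple((hi - lo) / size for lo, hi in zip(grid_min, grid_max))

SIZE = 384  # assumed cubic resolution

first = ndc_to_voxel((-1.0, -1.0, -1.0), SIZE)
middle = ndc_to_voxel((0.0, 0.0, 0.0), SIZE)
edge = ndc_to_voxel((1.0, 1.0, 1.0), SIZE)
edge_clamped = ndc_to_voxel((1.0, 1.0, 1.0), SIZE, clamp=True)

small_grid = voxel_world_size((-10.0,) * 3, (10.0,) * 3, SIZE)
large_grid = voxel_world_size((-20.0,) * 3, (20.0,) * 3, SIZE)
print(first, middle, edge, edge_clamped, small_grid, large_grid)
```

A coordinate of exactly 1.0 maps to index 384, one past the last voxel, so only values strictly below 1.0 end up inside the texture. Doubling the grid extent doubles the world space size of each voxel, which is the loss of resolution mentioned in the vertex shader section.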

Since we don't have any color or depth attachments, we use an empty framebuffer. It explicitly communicates the render width and height to OpenGL, which are normally derived from the color attachments. To avoid missing triangles, face culling is off. Color and other writes are turned off implicitly by using the empty framebuffer. Clearing is done by a simple compute shader.

Now, running the voxelization as described so far gives me this. There are two obvious issues that I'll address.

Voxelization Attempt

2.1 Fixing flickering

Flickering happens because the world space positions of different fragment shader invocations can get mapped to the same voxel, and which invocation writes to the image last is not deterministic. One decent solution is to store the max() of the already stored and the new voxel color. There are several ways to implement this in a thread-safe manner: Fragment Shader Interlock, CAS-Loop, Atomic Operations. Fragment Shader Interlock is not in OpenGL Core and generally slower (certainly on AMD). CAS-Loop is what I've seen the most, but it's unstable and slow. So I decided to go with Atomic Operations, imageAtomicMax in particular.

layout(binding = 0, r32ui) restrict uniform uimage3D ImgVoxelsR;
layout(binding = 1, r32ui) restrict uniform uimage3D ImgVoxelsG;
layout(binding = 2, r32ui) restrict uniform uimage3D ImgVoxelsB;

void main() {
    uvec3 uintVoxelColor = floatBitsToUint(voxelColor);
    imageAtomicMax(ImgVoxelsR, voxelPos, uintVoxelColor.r);
    imageAtomicMax(ImgVoxelsG, voxelPos, uintVoxelColor.g);
    imageAtomicMax(ImgVoxelsB, voxelPos, uintVoxelColor.b);
}

Image atomics can only be performed on single channel integer formats, but the voxel texture is required to be at least rgba16f. So I create three additional r32ui-format intermediate textures to perform the atomic operations on. If you have Alpha it's best to premultiply it here. After voxelization, in a simple compute shader, they get merged into the final rgba16f texture.
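Taking an integer max of float bit patterns works because, for non-negative IEEE 754 floats, a larger value also has a larger bit pattern when read as an unsigned integer. The Python sketch below emulates floatBitsToUint with struct and replays the same three fragment colors in every possible write order. The fragment colors and the clear value of 0 are assumptions made for the example.

```python
import itertools
import struct

def float_bits_to_uint(x):
    """Reinterpret a float32 as a uint32, like GLSL floatBitsToUint."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def uint_bits_to_float(u):
    """Reinterpret a uint32 as a float32, like GLSL uintBitsToFloat."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def voxelize(fragments):
    """One max per channel and fragment, then merge the three channels."""
    channels = [0, 0, 0]  # the three r32ui images at one voxel, assumed cleared to 0
    for color in fragments:
        for i, value in enumerate(color):
            channels[i] = max(channels[i], float_bits_to_uint(value))
    return tuple(uint_bits_to_float(c) for c in channels)

# Hypothetical colors of three fragments that land in the same voxel.
fragments = [(0.25, 0.5, 0.0), (1.5, 0.125, 0.75), (0.75, 1.0, 0.25)]
results = {voxelize(order) for order in itertools.permutations(fragments)}
print(results)  # a single entry, so the write order no longer matters

# Caveat: the sign bit makes negative floats compare as larger than positive ones.
negative_wins = float_bits_to_uint(-1.0) > float_bits_to_uint(2.0)
print(negative_wins)
```

Two things are worth noting. The stored color is a per-channel maximum, so it can be a color that none of the fragments had. And the trick only holds for non-negative values, so colors should not go below zero before they reach the atomic.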

2.2 Fixing missing voxels

Why are there so many missing voxels? Consider the floor. What do we see if we view it from the side?

Plane From Side

Well, there is a thin line, but technically even that shouldn't be visible. When this gets rasterized, the voxelization fragment shader won't even run. The camera should have been looking along the Y axis, not Z, because Y is the dominant axis that maximizes the projected area (more fragment shader invocations). The voxelization vertex shader doesn't have a view matrix, and adding one would be overkill. To make the "camera" look along a certain axis we can simply swizzle the vertex positions.

This is typically implemented in a geometry shader by finding the dominant axis of the triangle's geometric normal and then swizzling the vertex positions accordingly. Geometry shaders are known to be very slow, so I went with a different approach.

uniform int RenderAxis; // Set to 0, 1, 2 for each draw

void main() {
    gl_Position = vec4(NormalizedDeviceCoords, 1.0);

    if (RenderAxis == 0) gl_Position = gl_Position.zyxw;
    if (RenderAxis == 1) gl_Position = gl_Position.xzyw;
}

The entire scene simply gets rendered 3 times, once from each axis. No geometry shader is used. This works great together with imageAtomicMax from 2.1 Fixing flickering, since the fragment shader doesn't just overwrite a voxel's color each draw.
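To see why the floor disappears when viewed along Z, it helps to compute the projected area of a triangle for each view axis. The Python sketch below does that for a single floor triangle, using the same swizzles as the RenderAxis snippet. The triangle and the helper names are made up for the example.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dominant_axis(a, b, c):
    """Axis (0=X, 1=Y, 2=Z) with the largest geometric normal component."""
    n = cross(sub(b, a), sub(c, a))
    return max(range(3), key=lambda i: abs(n[i]))

def swizzle(p, render_axis):
    """Same swizzles as the RenderAxis snippet: zyx, xzy, or unchanged."""
    x, y, z = p
    if render_axis == 0:
        return (z, y, x)
    if render_axis == 1:
        return (x, z, y)
    return (x, y, z)

def screen_area(tri, render_axis):
    """Area of the triangle after swizzling, projected onto the xy plane."""
    a, b, c = (swizzle(p, render_axis) for p in tri)
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

# Hypothetical floor triangle lying in the XZ plane.
floor = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

areas = [screen_area(floor, axis) for axis in range(3)]
best = dominant_axis(*floor)
print(areas, best)
```

Only the Y pass produces a non-zero area for this triangle, and that is also the axis a geometry shader would have picked from the normal. Rendering all three axes covers every triangle without having to make that choice per triangle.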

Performance comparison on 11 million triangles Intel Sponza scene with AMD
