🐇 meshoptimizer

Purpose
When a GPU renders triangle meshes, various stages of the GPU pipeline have to process vertex and index data. The efficiency of these stages depends on the data you feed to them; this library provides algorithms to help optimize meshes for these stages, as well as algorithms to reduce the mesh complexity and storage overhead.
The library provides a C and C++ interface for all algorithms; you can use it from C/C++ or from other languages via FFI (such as P/Invoke). If you want to use this library from Rust, you should use the meshopt crate; a JavaScript interface for some algorithms is available through meshoptimizer.js.
Two companion projects are developed and distributed alongside the library: gltfpack, a command-line tool that automatically optimizes glTF files, and clusterlod.h, a single-header C/C++ library for continuous level of detail using clustered simplification.
Installing
meshoptimizer is hosted on GitHub; you can download the latest release using git:
git clone -b v1.0 https://github.com/zeux/meshoptimizer.git
Alternatively you can download the .zip archive from GitHub.
The library is also available as a Linux package in several distributions (ArchLinux, Debian, FreeBSD, Nix, Ubuntu), as well as a Vcpkg port (see installation instructions) and a Conan package.
gltfpack is available as a pre-built binary on the Releases page or via an npm package. Native binaries are recommended since they are more efficient and support texture compression.
Building
meshoptimizer is distributed as a C/C++ header (src/meshoptimizer.h) and a set of C++ source files (src/*.cpp). To include it in your project, you can use one of two options:
- Use CMake to build the library (either as a standalone project or as part of your project)
- Add source files to your project's build system
The source files are organized in such a way that you don't need to change your build-system settings, and you only need to add the source files for the algorithms you use. They should build without warnings or special compilation options on all major compilers. If you prefer amalgamated builds, you can also concatenate the source files into a single .cpp file and build that instead.
To use meshoptimizer functions, simply #include the header meshoptimizer.h; the library source is C++, but the header is C-compatible.
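If you take the CMake route and install the library, a minimal consumer project might look like the sketch below. This is an assumption-laden example: it presumes the package config was installed where find_package can locate it and that the exported target is named meshoptimizer::meshoptimizer.

```cmake
# Hypothetical consumer CMakeLists.txt; assumes meshoptimizer was installed
# (e.g. via `cmake --install`) so that find_package can locate its config.
cmake_minimum_required(VERSION 3.14)
project(myapp CXX)

find_package(meshoptimizer REQUIRED)

add_executable(myapp main.cpp)
# Assumed exported target name; vendoring via add_subdirectory also works.
target_link_libraries(myapp PRIVATE meshoptimizer::meshoptimizer)
```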
Core pipeline
To maximize rendering efficiency when optimizing a mesh, you should typically feed it through a set of optimizations (the order is important!):
- Indexing
- Vertex cache optimization
- (optional) Overdraw optimization
- Vertex fetch optimization
- Vertex quantization
- (optional) Shadow indexing
Indexing
Most algorithms in this library assume that a mesh has a vertex buffer and an index buffer. For algorithms to work well and also for GPU to render your mesh efficiently, the vertex buffer has to have no redundant vertices; you can generate an index buffer from an unindexed vertex buffer or reindex an existing (potentially redundant) index buffer as follows:
Note: meshoptimizer generally works with 32-bit (unsigned int) indices; however, when using the C++ APIs you can use any integer type for index data via the provided template overloads. By convention, remap tables always use unsigned int.
First, generate a remap table from your existing vertex (and, optionally, index) data:
size_t index_count = face_count * 3;
size_t unindexed_vertex_count = face_count * 3;
std::vector<unsigned int> remap(unindexed_vertex_count); // temporary remap table
size_t vertex_count = meshopt_generateVertexRemap(&remap[0], NULL, index_count,
&unindexed_vertices[0], unindexed_vertex_count, sizeof(Vertex));
Note that in this case we only have an unindexed vertex buffer; when the input mesh has an index buffer, it needs to be passed to meshopt_generateVertexRemap instead of NULL, along with the correct source vertex count. In either case, the remap table is generated based on binary equivalence of the input vertices, so the resulting mesh will render the same way. Binary equivalence considers all input bytes, including padding, which should be zero-initialized if the vertex structure has gaps.
After generating the remap table, you can allocate space for the target vertex buffer (vertex_count elements) and index buffer (index_count elements) and generate them:
meshopt_remapIndexBuffer(indices, NULL, index_count, &remap[0]);
meshopt_remapVertexBuffer(vertices, &unindexed_vertices[0], unindexed_vertex_count, sizeof(Vertex), &remap[0]);
You can then further optimize the resulting buffers by calling the other functions on them in-place.
meshopt_generateVertexRemap uses binary equivalence of vertex data, which is generally a reasonable default; however, in some cases some attributes may have floating point drift causing extra vertices to be generated. For such cases, it may be necessary to quantize some attributes (most importantly, normals and tangents) before generating the remap, or use meshopt_generateVertexRemapCustom algorithm that allows comparing individual attributes with tolerance by providing a custom comparison function:
size_t vertex_count = meshopt_generateVertexRemapCustom(&remap[0], NULL, index_count,
&unindexed_vertices[0].px, unindexed_vertex_count, sizeof(Vertex),
[&](unsigned int lhs, unsigned int rhs) -> bool {
const Vertex& lv = unindexed_vertices[lhs];
const Vertex& rv = unindexed_vertices[rhs];
return fabsf(lv.tx - rv.tx) < 1e-3f && fabsf(lv.ty - rv.ty) < 1e-3f;
});
Vertex cache optimization
When the GPU renders the mesh, it runs the vertex shader for each vertex. Historically, GPUs used a small fixed-size post-transform cache (16-32 vertices) with different replacement policies to store the shader output and avoid redundant shader invocations. Modern GPUs still perform vertex reuse, but with substantially different mechanics: vertex invocations are batched into thread groups based on the input indices, and effective reuse depends on factors like vertex shader outputs and rasterizer throughput. To maximize the locality of reused vertex references, you have to reorder your triangles like so:
meshopt_optimizeVertexCache(indices, indices, index_count, vertex_count);
The details of vertex reuse vary between different GPU architectures, so vertex cache optimization uses an adaptive algorithm that produces a triangle sequence with good locality that works well across different GPUs. Alternatively, you can use an algorithm that optimizes specifically for fixed-size FIFO caches: meshopt_optimizeVertexCacheFifo (with a recommended cache size of 16). While it generally produces less performant results on most GPUs, it runs ~2x faster, which may benefit rapid content iteration.
Overdraw optimization
After transforming the vertices, the GPU sends the triangles for rasterization; the resulting pixels are usually first run through the depth test, and pixels that pass it have the pixel shader executed to generate the final color. As pixel shaders get more expensive, it becomes more and more important to reduce overdraw. While improving overdraw generally requires view-dependent operations, this library provides an algorithm to reorder triangles to minimize overdraw from all directions, which you can run after vertex cache optimization like this:
meshopt_optimizeOverdraw(indices, indices, index_count, &vertices[0].x, vertex_count, sizeof(Vertex), 1.05f);
The overdraw optimizer needs to read vertex positions as a float3 from the vertex; the code snippet above assumes that the vertex stores position as float x, y, z.
When performing the overdraw optimization you have to specify a floating-point threshold parameter. The algorithm tries to maintain a balance between vertex cache efficiency and overdraw; the threshold determines how much the algorithm can compromise the vertex cache hit ratio, with 1.05 meaning that the resulting ratio should be at most 5% worse than before the optimization.
Note that depending on the renderer structure and target hardware, the optimization may or may not be beneficial; for example, mobile GPUs with tiled deferred rendering (PowerVR, Apple) would not benefit from this optimization. For vertex-heavy scenes it's recommended to measure the performance impact to ensure that the reduced vertex cache efficiency is outweighed by the reduced overdraw.
Vertex fetch optimization
After the final triangle order has been established, we can still optimize the vertex buffer for memory efficiency. Before running the vertex shader, the GPU has to fetch the vertex attributes from the vertex buffer; the fetch is usually backed by a memory cache hierarchy, so reordering vertices to improve the locality of memory access helps. You can do this by running this code:
meshopt_optimizeVertexFetch(vertices, indices, index_count, vertices, vertex_count, sizeof(Vertex));