mmdbconvert
A command-line tool to merge multiple MaxMind MMDB databases and export to CSV, Parquet, or MMDB format.
Features
- ✅ Merge multiple MMDB databases - Combine GeoIP2 databases (e.g., Enterprise + Anonymous IP)
- ✅ Non-overlapping networks - Automatically resolves overlapping networks to smallest blocks
- ✅ Adjacent network merging - Combines adjacent networks with identical data for compact output
- ✅ Multiple output formats - Export to CSV, Parquet, or MMDB format
- ✅ Query-optimized Parquet - Integer columns enable 10-100x faster IP lookups
- ✅ Type-preserving MMDB output - Perfect type preservation for merged databases
- ✅ Flexible column mapping - Extract any fields from MMDB databases using JSON paths
- ✅ IPv4 and IPv6 support - Handle both IP versions seamlessly
- ✅ Type hints for Parquet - Native int64, float64, bool types for efficient storage
Installation
Binary Releases (Recommended)
Download pre-built binaries from the GitHub Releases page.
Architecture Guide:
- amd64 = x86-64 / x64 (most common for Intel/AMD processors)
- arm64 = ARM 64-bit (Apple Silicon, AWS Graviton, Raspberry Pi 4+)
- darwin = macOS
- Replace <VERSION> with the release version (e.g., 0.1.0)
- Replace <ARCH> with your architecture (e.g., amd64 or arm64)
Linux
Using .deb package (Debian/Ubuntu):
- Download the .deb file for your architecture from the releases page
- Install using dpkg:
sudo dpkg -i mmdbconvert_<VERSION>_<ARCH>.deb
Using .rpm package (RedHat/CentOS/Fedora):
- Download the .rpm file for your architecture from the releases page
- Install using rpm:
sudo rpm -i mmdbconvert_<VERSION>_<ARCH>.rpm
Using tar.gz archive:
- Download the Linux tar.gz file for your architecture from the releases page
- Extract and install:
tar -xzf mmdbconvert_<VERSION>_linux_<ARCH>.tar.gz
sudo mv mmdbconvert/mmdbconvert /usr/local/bin/
macOS
- Download the macOS tar.gz file for your architecture from the releases page:
  - darwin_arm64 for Apple Silicon (M1/M2/M3/M4)
  - darwin_amd64 for Intel Macs
- Extract and install:
tar -xzf mmdbconvert_<VERSION>_darwin_<ARCH>.tar.gz
sudo mv mmdbconvert/mmdbconvert /usr/local/bin/
Windows
- Download the Windows zip file for your architecture from the releases page
- Extract the zip file
- Add the mmdbconvert.exe binary to your PATH or run it directly from the extracted location
Using PowerShell:
# Extract (adjust filename to match your download)
Expand-Archive -Path mmdbconvert_<VERSION>_windows_<ARCH>.zip -DestinationPath .
# Run
.\mmdbconvert\mmdbconvert.exe --version
Note: ARM64 binaries are available for all platforms. Choose the appropriate architecture for your system.
From Source
go install github.com/maxmind/mmdbconvert/cmd/mmdbconvert@latest
Build Locally
git clone https://github.com/maxmind/mmdbconvert.git
cd mmdbconvert
go build -o mmdbconvert ./cmd/mmdbconvert
Quick Start
1. Create a Configuration File
Create config.toml:
[output]
format = "csv"
file = "output.csv"
[[databases]]
name = "city"
path = "/path/to/GeoIP2-City.mmdb"
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]
[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]
2. Run the Tool
mmdbconvert config.toml
3. View the Output
head output.csv
network,country_code,city_name
1.0.0.0/24,AU,Sydney
1.0.1.0/24,CN,Beijing
1.0.4.0/22,AU,Melbourne
Note: The network column appears automatically because no [[network.columns]] sections were defined. By default, CSV output includes a CIDR column named network, while Parquet output includes start_int and end_int integer columns for faster IP lookups. You can customize network columns in the configuration.
Usage
# Basic usage
mmdbconvert config.toml
# Explicit config flag
mmdbconvert --config config.toml
# Suppress progress output
mmdbconvert --config config.toml --quiet
# Disable unmarshaler caching to reduce memory usage (several times slower)
mmdbconvert --config config.toml --disable-cache
# Show version
mmdbconvert --version
# Show help
mmdbconvert --help
Configuration
See docs/config.md for complete configuration reference.
CSV Output Example
[output]
format = "csv"
file = "geo.csv"
[output.csv]
delimiter = "," # or "\t" for tab-delimited
[[network.columns]]
name = "network"
type = "cidr"
[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"
[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]
Parquet Output Example
[output]
format = "parquet"
file = "geo.parquet"
[output.parquet]
compression = "snappy"
row_group_size = 500000
# Integer columns for fast queries
[[network.columns]]
name = "start_int"
type = "start_int"
[[network.columns]]
name = "end_int"
type = "end_int"
[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"
[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]
type = "string"
[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
type = "float64"
MMDB Output Example
[output]
format = "mmdb"
file = "merged.mmdb"
[output.mmdb]
database_type = "GeoIP2-City"
description = { en = "Merged GeoIP Database" }
record_size = 28
[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"
# Use output_path to create nested structure
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]
output_path = ["country", "iso_code"]
[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]
output_path = ["city", "names", "en"]
[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
output_path = ["location", "latitude"]
[[columns]]
name = "longitude"
database = "city"
path = ["location", "longitude"]
output_path = ["location", "longitude"]
MMDB output features:
- Perfect type preservation from source databases
- Support for nested structures via output_path
- Compatible with all MMDB readers (libmaxminddb, etc.)
- Configurable record size (24, 28, or 32 bits)
Querying Parquet Files
Parquet files generated with integer columns (start_int, end_int) support
extremely fast IP lookups (10-100x faster than string comparisons).
DuckDB Example
-- Lookup IP address 203.0.113.100 (integer: 3405803876)
SELECT * FROM read_parquet('geo.parquet')
WHERE start_int <= 3405803876 AND end_int >= 3405803876;
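If you need to derive the integer form of an address for a query like the one above, Python's standard ipaddress module is one convenient option (any language with similar support works):

```python
import ipaddress

# Convert a dotted-quad IPv4 address to its 32-bit integer form,
# suitable for comparison against start_int/end_int columns.
ip_int = int(ipaddress.ip_address("203.0.113.100"))
print(ip_int)  # 3405803876
```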
See docs/parquet-queries.md for comprehensive query examples and performance optimization guide.
Examples
Merging Multiple Databases
Combine GeoIP2 Enterprise with Anonymous IP data:
[output]
format = "parquet"
file = "merged.parquet"
[[network.columns]]
name = "start_int"
type = "start_int"
[[network.columns]]
name = "end_int"
type = "end_int"
[[databases]]
name = "enterprise"
path = "GeoIP2-Enterprise.mmdb"
[[databases]]
name = "anonymous"
path = "GeoIP2-Anonymous-IP.mmdb"
# Columns from Enterprise database
[[columns]]
name = "country_code"
database = "enterprise"
path = ["country", "iso_code"]
[[columns]]
name = "city_name"
database = "enterprise"
path = ["city", "names", "en"]
[[columns]]
name = "latitude"
database = "enterprise"
path = ["location", "latitude"]
type = "float64"
[[columns]]
name = "longitude"
database = "enterprise"
path = ["location", "longitude"]
type = "float64"
# Columns from Anonymous IP database
[[columns]]
name = "is_anonymous"
database = "anonymous"
path = ["is_anonymous"]
type = "bool"
[[columns]]
name = "is_anonymous_vpn"
database = "anonymous"
path = ["is_anonymous_vpn"]
type = "bool"
All Network Column Types
[[network.columns]]
name = "network"
type = "cidr" # e.g., "203.0.113.0/24"
[[network.columns]]
name = "start_ip"
type = "start_ip" # e.g., "203.0.113.0"
[[network.columns]]
name = "end_ip"
type = "end_ip" # e.g., "203.0.113.255"
[[network.columns]]
name = "start_int"
type = "start_int" # e.g., 3405803776 (IPv4 only)
[[network.columns]]
name = "end_int"
type = "end_int" # e.g., 3405804031 (IPv4 only)
[[network.columns]]
name = "network_bucket"
type = "network_bucket" # Bucket for efficient lookups. Requires split files.
Default network columns: If you don't define any [[network.columns]], mmdbconvert automatically provides sensible defaults based on the output format:
- CSV: a single network column (CIDR format) for human readability
- Parquet: start_int and end_int columns for 10-100x faster IP queries
Note: start_int and end_int only work with IPv4 addresses unless you
split your output into separate IPv4/IPv6 files via output.ipv4_file and
output.ipv6_file. For single-file outputs that include IPv6 data, use string
columns (start_ip, end_ip, cidr).
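The IPv4-only restriction exists because an IPv6 address needs up to 128 bits, which cannot fit in a 64-bit integer column. A quick illustration in Python:

```python
import ipaddress

# IPv4 addresses always fit in 32 bits; IPv6 addresses can need up to 128.
ipv4 = int(ipaddress.ip_address("203.0.113.0"))
ipv6 = int(ipaddress.ip_address("2001:db8::1"))

print(ipv4 < 2**32)   # True: within int64 range
print(ipv6 >= 2**64)  # True: overflows a 64-bit integer column
```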
Network Bucketing for Analytics (BigQuery, etc.)
When loading network data into analytics platforms like BigQuery, range queries
can be slow due to full table scans. The network_bucket column provides a join
key that enables efficient queries by first filtering to a specific bucket.
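As a rough sketch of the idea (the exact bucketing scheme is defined by mmdbconvert; the helper below assumes a bucket is simply the top N bits of an IPv4 address, which may not match the tool's implementation):

```python
import ipaddress

def bucket(ip: str, prefix_len: int = 16) -> int:
    """Hypothetical bucket key: the top `prefix_len` bits of an IPv4 address."""
    return int(ipaddress.ip_address(ip)) >> (32 - prefix_len)

# A lookup IP and every network row it could match share a bucket value,
# so an analytics query can equi-join on the bucket before the range check.
print(bucket("203.0.113.100"))  # 51968 (i.e., 203*256 + 0)
```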
Configuration:
[output]
format = "parquet"
ipv4_file = "geoip-v4.parquet"
ipv6_file = "geoip-v6.parquet"
[output.parquet]
ipv4_bucket_size = 16 # Optional, def
