mmdbconvert
A command-line tool to merge multiple MaxMind MMDB databases and export to CSV, Parquet, or MMDB format.
Features
- ✅ Merge multiple MMDB databases - Combine GeoIP2 databases (e.g., Enterprise + Anonymous IP)
- ✅ Non-overlapping networks - Automatically resolves overlapping networks to smallest blocks
- ✅ Adjacent network merging - Combines adjacent networks with identical data for compact output
- ✅ Multiple output formats - Export to CSV, Parquet, or MMDB format
- ✅ Query-optimized Parquet - Integer columns enable 10-100x faster IP lookups
- ✅ Type-preserving MMDB output - Perfect type preservation for merged databases
- ✅ Flexible column mapping - Extract any fields from MMDB databases using JSON paths
- ✅ IPv4 and IPv6 support - Handle both IP versions seamlessly
- ✅ Type hints for Parquet - Native int64, float64, bool types for efficient storage
Installation
Binary Releases (Recommended)
Download pre-built binaries from the GitHub Releases page.
Architecture Guide:
- amd64 = x86-64 / x64 (most common for Intel/AMD processors)
- arm64 = ARM 64-bit (Apple Silicon, AWS Graviton, Raspberry Pi 4+)
- darwin = macOS
- Replace <VERSION> with the release version (e.g., 0.1.0)
- Replace <ARCH> with your architecture (e.g., amd64 or arm64)
Linux
Using .deb package (Debian/Ubuntu):
- Download the .deb file for your architecture from the releases page
- Install using dpkg:
sudo dpkg -i mmdbconvert_<VERSION>_<ARCH>.deb
Using .rpm package (RedHat/CentOS/Fedora):
- Download the .rpm file for your architecture from the releases page
- Install using rpm:
sudo rpm -i mmdbconvert_<VERSION>_<ARCH>.rpm
Using tar.gz archive:
- Download the Linux tar.gz file for your architecture from the releases page
- Extract and install:
tar -xzf mmdbconvert_<VERSION>_linux_<ARCH>.tar.gz
sudo mv mmdbconvert/mmdbconvert /usr/local/bin/
macOS
- Download the macOS tar.gz file for your architecture from the releases page:
  - darwin_arm64 for Apple Silicon (M1/M2/M3/M4)
  - darwin_amd64 for Intel Macs
- Extract and install:
tar -xzf mmdbconvert_<VERSION>_darwin_<ARCH>.tar.gz
sudo mv mmdbconvert/mmdbconvert /usr/local/bin/
Windows
- Download the Windows zip file for your architecture from the releases page
- Extract the zip file
- Add the mmdbconvert.exe binary to your PATH or run it directly from the extracted location
Using PowerShell:
# Extract (adjust filename to match your download)
Expand-Archive -Path mmdbconvert_<VERSION>_windows_<ARCH>.zip -DestinationPath .
# Run
.\mmdbconvert\mmdbconvert.exe --version
Note: ARM64 binaries are available for all platforms. Choose the appropriate architecture for your system.
From Source
go install github.com/maxmind/mmdbconvert/cmd/mmdbconvert@latest
Build Locally
git clone https://github.com/maxmind/mmdbconvert.git
cd mmdbconvert
go build -o mmdbconvert ./cmd/mmdbconvert
Quick Start
1. Create a Configuration File
Create config.toml:
[output]
format = "csv"
file = "output.csv"
[[databases]]
name = "city"
path = "/path/to/GeoIP2-City.mmdb"
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]
[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]
2. Run the Tool
mmdbconvert config.toml
3. View the Output
head output.csv
network,country_code,city_name
1.0.0.0/24,AU,Sydney
1.0.1.0/24,CN,Beijing
1.0.4.0/22,AU,Melbourne
Note: The network column appears automatically because no [[network.columns]] sections were defined. By default, CSV output includes a CIDR column named network, while Parquet output includes start_int and end_int integer columns for faster IP lookups. You can customize network columns in the configuration.
Usage
# Basic usage
mmdbconvert config.toml
# Explicit config flag
mmdbconvert --config config.toml
# Suppress progress output
mmdbconvert --config config.toml --quiet
# Disable unmarshaler caching to reduce memory usage (several times slower)
mmdbconvert --config config.toml --disable-cache
# Show version
mmdbconvert --version
# Show help
mmdbconvert --help
Configuration
See docs/config.md for complete configuration reference.
CSV Output Example
[output]
format = "csv"
file = "geo.csv"
[output.csv]
delimiter = "," # or "\t" for tab-delimited
[[network.columns]]
name = "network"
type = "cidr"
[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"
[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]
Parquet Output Example
[output]
format = "parquet"
file = "geo.parquet"
[output.parquet]
compression = "snappy"
row_group_size = 500000
# Integer columns for fast queries
[[network.columns]]
name = "start_int"
type = "start_int"
[[network.columns]]
name = "end_int"
type = "end_int"
[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"
[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]
type = "string"
[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
type = "float64"
MMDB Output Example
[output]
format = "mmdb"
file = "merged.mmdb"
[output.mmdb]
database_type = "GeoIP2-City"
description = { en = "Merged GeoIP Database" }
record_size = 28
[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"
# Use output_path to create nested structure
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]
output_path = ["country", "iso_code"]
[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]
output_path = ["city", "names", "en"]
[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
output_path = ["location", "latitude"]
[[columns]]
name = "longitude"
database = "city"
path = ["location", "longitude"]
output_path = ["location", "longitude"]
MMDB output features:
- Perfect type preservation from source databases
- Support for nested structures via output_path
- Compatible with all MMDB readers (libmaxminddb, etc.)
- Configurable record size (24, 28, or 32 bits)
Querying Parquet Files
Parquet files generated with integer columns (start_int, end_int) support
extremely fast IP lookups (10-100x faster than string comparisons).
DuckDB Example
-- Lookup IP address 203.0.113.100 (integer: 3405803876)
SELECT * FROM read_parquet('geo.parquet')
WHERE start_int <= 3405803876 AND end_int >= 3405803876;
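If you need to derive the integer form of an address for a query like the one above, Python's standard ipaddress module is one convenient option (any language with similar support works):

```python
import ipaddress

# Convert a dotted-quad IPv4 address to its 32-bit integer form,
# suitable for comparison against start_int/end_int columns.
ip_int = int(ipaddress.ip_address("203.0.113.100"))
print(ip_int)  # 3405803876
```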
See docs/parquet-queries.md for comprehensive query examples and performance optimization guide.
Examples
Merging Multiple Databases
Combine GeoIP2 Enterprise with Anonymous IP data:
[output]
format = "parquet"
file = "merged.parquet"
[[network.columns]]
name = "start_int"
type = "start_int"
[[network.columns]]
name = "end_int"
type = "end_int"
[[databases]]
name = "enterprise"
path = "GeoIP2-Enterprise.mmdb"
[[databases]]
name = "anonymous"
path = "GeoIP2-Anonymous-IP.mmdb"
# Columns from Enterprise database
[[columns]]
name = "country_code"
database = "enterprise"
path = ["country", "iso_code"]
[[columns]]
name = "city_name"
database = "enterprise"
path = ["city", "names", "en"]
[[columns]]
name = "latitude"
database = "enterprise"
path = ["location", "latitude"]
type = "float64"
[[columns]]
name = "longitude"
database = "enterprise"
path = ["location", "longitude"]
type = "float64"
# Columns from Anonymous IP database
[[columns]]
name = "is_anonymous"
database = "anonymous"
path = ["is_anonymous"]
type = "bool"
[[columns]]
name = "is_anonymous_vpn"
database = "anonymous"
path = ["is_anonymous_vpn"]
type = "bool"
All Network Column Types
[[network.columns]]
name = "network"
type = "cidr" # e.g., "203.0.113.0/24"
[[network.columns]]
name = "start_ip"
type = "start_ip" # e.g., "203.0.113.0"
[[network.columns]]
name = "end_ip"
type = "end_ip" # e.g., "203.0.113.255"
[[network.columns]]
name = "start_int"
type = "start_int" # e.g., 3405803776 (IPv4 only)
[[network.columns]]
name = "end_int"
type = "end_int" # e.g., 3405804031 (IPv4 only)
[[network.columns]]
name = "network_bucket"
type = "network_bucket" # Bucket for efficient lookups. Requires split files.
Default network columns: If you don't define any [[network.columns]], mmdbconvert automatically provides sensible defaults based on the output format:
- CSV: a single network column (CIDR format) for human readability
- Parquet: start_int and end_int columns for 10-100x faster IP queries
Note: start_int and end_int only work with IPv4 addresses unless you
split your output into separate IPv4/IPv6 files via output.ipv4_file and
output.ipv6_file. For single-file outputs that include IPv6 data, use string
columns (start_ip, end_ip, cidr).
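The IPv4-only restriction exists because an IPv6 address needs up to 128 bits, which cannot fit in a 64-bit integer column. A quick illustration in Python:

```python
import ipaddress

# IPv4 addresses always fit in 32 bits; IPv6 addresses can need up to 128.
ipv4 = int(ipaddress.ip_address("203.0.113.0"))
ipv6 = int(ipaddress.ip_address("2001:db8::1"))

print(ipv4 < 2**32)   # True: within int64 range
print(ipv6 >= 2**64)  # True: overflows a 64-bit integer column
```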
Network Bucketing for Analytics (BigQuery, etc.)
When loading network data into analytics platforms like BigQuery, range queries
can be slow due to full table scans. The network_bucket column provides a join
key that enables efficient queries by first filtering to a specific bucket.
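As a rough sketch of the idea (the exact bucketing scheme is defined by mmdbconvert; the helper below assumes a bucket is simply the top N bits of an IPv4 address, which may not match the tool's implementation):

```python
import ipaddress

def bucket(ip: str, prefix_len: int = 16) -> int:
    """Hypothetical bucket key: the top `prefix_len` bits of an IPv4 address."""
    return int(ipaddress.ip_address(ip)) >> (32 - prefix_len)

# A lookup IP and every network row it could match share a bucket value,
# so an analytics query can equi-join on the bucket before the range check.
print(bucket("203.0.113.100"))  # 51968 (i.e., 203*256 + 0)
```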
Configuration:
[output]
format = "parquet"
ipv4_file = "geoip-v4.parquet"
ipv6_file = "geoip-v6.parquet"
[output.parquet]
ipv4_bucket_size = 16 # Optional, def
