
vector2dggs


Python-based CLI tool to index vector files to DGGS in parallel, writing out to Parquet.

This is the vector equivalent of raster2dggs.

Currently this tool supports the following DGGSs:

... and the following geocode systems:

Contributions (especially for other DGGSs), suggestions, bug reports and strongly worded letters are all welcome.

Example use case for vector2dggs, showing parcels indexed to a high H3 resolution

Installation

This tool makes use of optional extras to allow you to install a limited subset of DGGSs.

If you want all possible backends:

pip install vector2dggs[all]

If you want only a subset, use the pattern pip install vector2dggs[rhp] (for one) or pip install vector2dggs[h3,s2] (for multiple).

A bare pip install vector2dggs will not install any DGGS backends.
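After installing, you can check which optional DGGS backends are importable in your environment. This is a minimal sketch; the module names below are assumptions inferred from the h3/rhp/s2 extras, so adjust them to match your installation:

```python
import importlib.util

# Map each pip extra to the Python module it is assumed to provide.
# These module names are illustrative; verify against your environment.
backends = {"h3": "h3", "rhp": "rhealpixdggs", "s2": "s2geometry"}

for extra, module in backends.items():
    installed = importlib.util.find_spec(module) is not None
    print(f"{extra}: {'installed' if installed else 'missing'}")
```

A backend reported as "missing" means the corresponding extra was not installed, and CLI subcommands for that DGGS will be unavailable.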

Usage

Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY

  Ingest a vector dataset and index it to the H3 DGGS.

  VECTOR_INPUT is the path to input vector geospatial data. OUTPUT_DIRECTORY
  should be a directory, not a file or database table, as it will instead be
  the write location for an Apache Parquet data store.

Options:
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG  [default: INFO]
  -r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 resolution to index  [required]
  -pr, --parent_res [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 Parent resolution for the output
                                  partition. Defaults to resolution - 6
  -id, --id_field TEXT            Field to use as an ID; defaults to a
                                  constructed single 0...n index on the
                                  original feature order.
  -k, --keep_attributes           Retain attributes in output. The default is
                                  to create an output that only includes H3
                                  cell ID and the ID given by the -id field
                                  (or the default index ID).
  -ch, --chunksize INTEGER        The number of rows per index partition to
                                  use when spatially partitioning. Adjusting
                                  this number will trade off memory use and
                                  time.  [default: 50; required]
  -s, --spatial_sorting [hilbert|morton|geohash|none]
                                  Spatial sorting method when performing
                                  spatial partitioning.  [default: none]
  -crs, --cut_crs INTEGER         Set the coordinate reference system (CRS)
                                  used for cutting large geometries (see
                                  `--cut_threshold`). Defaults to the same CRS
                                  as the input. Should be a valid EPSG code.
  -c, --cut_threshold FLOAT       Cut up large geometries into smaller
                                  geometries based on a target area. Units are
                                  assumed to match the input CRS units unless
                                  the `--cut_crs` is also given, in which case
                                  units match the units of the supplied CRS.
                                  If left unspecified, the threshold will be
                                  the maximum area of a cell at the parent
                                  resolution, in square metres or feet
                                  according to the CRS. A threshold of 0 will
                                  skip bisection entirely (effectively
                                  ignoring --cut_crs).
  -t, --threads INTEGER           Number of threads used for operation
                                  [default: NUM_CPUS - 1]
  -cp, --compression TEXT         Compression method to use for the output
                                  Parquet files. Options include 'snappy',
                                  'gzip', 'brotli', 'lz4', 'zstd', etc. Use
                                  'none' for no compression.  [default:
                                  snappy]
  -lyr, --layer TEXT              Name of the layer or table to read when
                                  using an input that supports layers or
                                  tables
  -g, --geom_col TEXT             Column name to use when using a spatial
                                  database connection as input  [default:
                                  geom]
  --geo [none|point|polygon]      Select geometry encoding for the output:
                                  'none' for regular Parquet (no GeoParquet
                                  metadata), or 'point'/'polygon' to write
                                  GeoParquet (v1.1.0) with the corresponding
                                  geometry type.  [default: none]
  --tempdir PATH                  Temporary data is created during the
                                  execution of this program. This parameter
                                  allows you to control where this data will
                                  be written.
  -co, --compact                  Compact the H3 cells up to the parent
                                  resolution. Compaction requires an id_field.
  -o, --overwrite
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Visualising output

Output is in the Apache Parquet format: a directory with one file per partition. With --geo point or --geo polygon, output will be written as GeoParquet (v1.1.0) with the respective geometry type. GeoParquet can be visualised using desktop GIS tools.

The Apache Parquet output is indexed by an ID column (which you can specify), so it should be ready for two intended use-cases:

  • Joining attribute data from the original feature-level data onto computed DGGS cells.
  • Joining other data to this output on the DGGS cell ID. (The output has a column like {dggs}_\d, e.g. h3_09 or h3_12 according to the target resolution, zero-padded to account for the maximum resolution of the DGGS).
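The two join patterns above can be sketched in plain Python (in practice you would read the Parquet partitions with pandas or pyarrow; the cell IDs, column names, and attribute values below are hypothetical):

```python
# Hypothetical rows from vector2dggs output at H3 resolution 9:
# each row pairs the zero-padded cell-ID column ("h3_09") with a feature ID.
indexed = [
    {"h3_09": "89bb0d111a3ffff", "title_no": "A1"},
    {"h3_09": "89bb0d111a7ffff", "title_no": "A1"},
    {"h3_09": "89bb0d111abffff", "title_no": "B2"},
]

# Use case 1: join feature-level attributes back on via the ID field.
attributes = {"A1": {"area_ha": 1.2}, "B2": {"area_ha": 0.7}}
with_attrs = [{**row, **attributes[row["title_no"]]} for row in indexed]

# Use case 2: join other cell-keyed data on the DGGS cell ID.
rainfall = {"89bb0d111a3ffff": 820, "89bb0d111a7ffff": 835}
with_rain = [{**row, "rain_mm": rainfall.get(row["h3_09"])} for row in indexed]
```

Note that one feature ID maps to many cells, so the first join fans attribute values out across every cell of that feature.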

Compaction

Compaction is supported with the -co/--compact argument. The result respects overlapping polygons by considering each feature independently. (In the example output below for rHEALPix, cells are shown with opacity; overlap is visible where there is a darker shade.) This means that the index of the result is not necessarily unique (unless your input is a vector coverage, i.e. it has no overlaps).

Example of compaction of overlapping vector features with the rHEALPix DGGS
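Compaction replaces any complete set of sibling cells with their parent cell, recursively, up to the parent resolution. A toy sketch of the idea using quadkey-style strings, where the parent is a string prefix and every cell has exactly four children (a real H3 or rHEALPix compaction would use that DGGS's own parent/child relations; all identifiers here are illustrative):

```python
def compact(cells: set[str], min_len: int = 1) -> set[str]:
    """Toy compaction over quadkeys: whenever all four children of a
    parent are present, replace them with the parent, and repeat."""
    cells = set(cells)
    changed = True
    while changed:
        changed = False
        for cell in list(cells):
            if len(cell) <= min_len:
                continue
            parent = cell[:-1]
            siblings = {parent + d for d in "0123"}
            if siblings <= cells:  # all four children present
                cells -= siblings
                cells.add(parent)
                changed = True
    return cells

# One fully covered parent ("01") plus a stray cell compacts to two cells.
print(sorted(compact({"010", "011", "012", "013", "020"})))  # ['01', '020']
```

Because vector2dggs compacts each feature independently, two overlapping features that each fully cover a parent cell both emit that parent's ID, which is why duplicate cell IDs can appear in compacted output.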

For development

In brief, to get started:

  • Install Poetry
  • Install GDAL
    • If you're on Windows, pip install gdal may be necessary before running the subsequent commands.
    • On Linux, install GDAL 3.8+ according to your platform-specific instructions, including development headers, i.e. libgdal-dev.
  • Create the virtual environment with poetry install. This will install necessary dependencies.
    • If the installation of s2geometry fails, you may require SWIG to build it. (A command like conda install swig or sudo dnf install swig depending on your platform).
  • Subsequently, the virtual environment can be re-activated with poetry shell.

If you run poetry install -E all --with dev, the CLI tool will be aliased so you can simply use vector2dggs; otherwise, invoke it with poetry run vector2dggs.

For partial backend support you can consider poetry install --with dev -E h3 -E s2 etc. To check what is installed: poetry show --tree.

Alternatively, bypass Poetry and install with pip: pip install -e .

Code formatting

Code style: black

Please run black . before committing.

Tests

Tests are included. To run them, set up a poetry environment, then run:

python tests/test_vector2dggs.py

Test data are included at tests/data/.

Example commands

With a local GPKG:

vector2dggs h3 -v DEBUG -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet

With a PostgreSQL/PostGIS connection:

vector2dggs h3 -v DEBUG -id ogc_fid -r 9 -pr 5 -t 4 --overwrite -lyr topo50_lake postgresql://user:password@host:port/db ./topo50_lake.parquet

Citation

@software{vector2dggs,
  title={{vector2dggs}},
  author={Ardo, James and Law, Richard},
  url={https://github.com/manaakiwhenua/vector2dggs},
  version={0.13.1},
  date={2026-03-19}
}

APA/Harvard

Ardo, J., & Law, R. (2023). vector2dggs (0.13.1) [Computer software]. https://github.com/manaakiwhenua/vector2dggs
