# KartAI

AI models for aerial image analysis: generating training data, training models, and testing and validating them.
KartAI is a research project that aims to use AI to improve the map data for buildings in Norway.

This repository is intended for people contributing to and working on the KartAI project; in order to use the scripts you need access to both the image data sources and the Azure resources.

The repository lets you easily create training data from any kind of vector data and a range of different image data sources, everything defined in config files passed to the scripts. It also allows for training a set of different implemented TensorFlow models.
## Colab notebook workshop

The `workshop_material` folder contains a notebook for testing the main features of this repository. To try out the notebook you need two secrets. Contact the KartAI developers if you want to try it out!

Link to the notebook: https://colab.research.google.com/github/kartAI/kartAI/blob/master/workshop_material/introduction_to_geospatial_ai_colab.ipynb
## Prerequisites

To create training data for building segmentation you need access to a WMS serving aerial images, and a database containing existing vector data for buildings.
## Setup

### Conda environment

To make sure you have the correct versions of all packages, we recommend using Anaconda and its virtual environments. We use Python 3.9.

This repo has an `env.yml` file to create the environment from (NB! This does not include pandasgui).

Run `conda env create -f env.yml` to install from the env file (remember to use Python 3.9), and `conda activate kartai` to activate the environment.

Alternatively, if you want to install all dependencies manually, you can run `conda create -n env-name python=3.9` and then install dependencies as you like.
### Running scripts

How to run the scripts depends on your operating system. From the project root, run:

Unix:

```shell
./kai <args>
```

Windows:

```shell
kai.bat <args>
```
### Environment variables

To run the program you need an `env_secrets.py` file that contains secret environment variables, and your IP address must be whitelisted. Contact a developer for access.
## Models

### Implemented models

Our implemented models are:
- unet
- resnet
- bottleneck
- bottleneck_cross (custom architecture)
- CSP
- CSP_cross (custom architecture)
- unet-twin (custom architecture)
### Download existing trained models

To download the checkpoint files of the trained models, along with the metadata files describing their hyperparameters and performance, run:

Unix:

```shell
./kai download_models
```

Windows:

```shell
kai.bat download_models
```

This downloads all available models that are not already present into the `checkpoints` directory.
### Upload model

If a trained model was created but not uploaded to Azure automatically (controlled by a flag when running the training), you can upload the model by running:

Unix:

```shell
./kai upload_model -m {model_name}
```

Windows:

```shell
kai.bat upload_model -m {model_name}
```
## Training data

Datasets are created automatically from the data sources defined in the dataset config file.
### Dataset config file

The dataset config file (passed via the `-c` argument to `create_training_data`) is a JSON file describing the datasets used for training, validation and testing. It has three main sections: "TileGrid", "ImageSources" and "ImageSets".
Main structure:

```json
{
  "TileGrid": {
    "srid": 25832,
    "x0": 563000.0,
    "y0": 6623000.0,
    "dx": 100.0,
    "dy": 100.0
  },
  "ImageSources": [
    {
      ...
    },
    {
      ...
    }
  ],
  "ImageSets": [
    {
      ...
    },
    {
      ...
    }
  ]
}
```
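As a quick sanity check, a config like the one above can be parsed with Python's `json` module and checked for the three main sections. The helper below is a hypothetical sketch for illustration, not part of the repository's scripts:

```python
import json

REQUIRED_SECTIONS = ("TileGrid", "ImageSources", "ImageSets")

def check_dataset_config(text):
    """Parse a dataset config string and verify the three main sections exist.

    Hypothetical helper; the real scripts do their own validation.
    """
    config = json.loads(text)
    missing = [section for section in REQUIRED_SECTIONS if section not in config]
    if missing:
        raise ValueError(f"config is missing sections: {missing}")
    return config
```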
#### Tile grid

The "TileGrid" section defines the grid structure for the image tiles.

```json
"TileGrid": {
  "srid": 25832,
  "x0": 410000.0,
  "y0": 6420000.0,
  "dx": 100.0,
  "dy": 100.0
}
```

All image tiles will be in the spatial reference system given by "srid". The tiles will be of size dx * dy, with tile (0, 0) having its lower left corner at (x0, y0), tile (1, 0) at (x0 + dx, y0), and so on.
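To make the indexing concrete, here is a small sketch of how a tile index maps to a bounding box under the grid above. The function name `tile_bbox` is hypothetical, not from the repository:

```python
def tile_bbox(col, row, x0=410000.0, y0=6420000.0, dx=100.0, dy=100.0):
    """Return (x_min, y_min, x_max, y_max) for tile (col, row).

    Hypothetical helper illustrating the TileGrid semantics: tile (0, 0)
    has its lower left corner at (x0, y0), and each tile is dx * dy units.
    """
    x_min = x0 + col * dx
    y_min = y0 + row * dy
    return (x_min, y_min, x_min + dx, y_min + dy)
```

With the example grid, tile (1, 0) spans x from 410100.0 to 410200.0 and y from 6420000.0 to 6420100.0.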
#### Image Sources

The "ImageSources" section is a list of image sources: database layers, WMS/WCS services, and file layers (shape, GeoJSON, ...).

Example of a Postgres image source:
```json
{
  "name": "BuildingDb",
  "type": "PostgresImageSource",
  "host": "pg.buildingserver.org",
  "port": "5432",
  "database": "Citydatabase",
  "user": "databaseuser",
  "passwd": "MyVerySecretPW",
  "image_format": "image/tiff",
  "table": "citydb.building_polygon"
}
```
Example of a WMS image source:

```json
{
  "name": "OrtofotoWMS",
  "type": "WMSImageSource",
  "image_format": "image/tiff",
  "url": "https://waapi.webatlas.no/wms-orto/",
  "layers": ["ortofoto"],
  "styles": ["new_up"]
}
```
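For intuition, a `WMSImageSource` like the one above boils down to GetMap requests against the configured URL. The sketch below builds such a request URL using the standard WMS 1.3.0 parameters; it is an illustration of the protocol, not the repository's actual implementation:

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layers, styles, bbox, width, height, srid,
                   image_format="image/tiff"):
    """Build a WMS 1.3.0 GetMap request URL for one tile (illustrative sketch)."""
    params = {
        "service": "WMS",
        "version": "1.3.0",
        "request": "GetMap",
        "layers": ",".join(layers),
        "styles": ",".join(styles),
        "crs": f"EPSG:{srid}",                      # coordinate reference system
        "bbox": ",".join(str(v) for v in bbox),     # tile extent in that CRS
        "width": width,
        "height": height,
        "format": image_format,
    }
    return base_url.rstrip("/") + "?" + urlencode(params)
```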
Example of an image file source. Note that "image_format" is the format of the output cache mosaic. The system uses GDAL for file handling, and all valid GDAL import formats (including .vrt, the GDAL Virtual Format) can be read.

```json
{
  "name": "Ortofoto_manual",
  "type": "ImageFileImageSource",
  "image_format": "image/tiff",
  "file_path": "training_data/cityarea/ortofoto/aerialimage.tif"
}
```
Example of a vector file source. GDAL/OGR is used for reading the vector data, and all valid OGR import formats can be read.

```json
{
  "name": "Building_smallset",
  "type": "VectorFileImageSource",
  "image_format": "image/tiff",
  "file_path": "training_data/cityarea/shape/building.shp",
  "srid": 25832
}
```
Example of project arguments that affect how the dataset is produced.

```json
{
  "ProjectArguments": {
    "training_fraction": 1,
    "validation_fraction": 0,
    "shuffle_data": "True",
    "max_size": 100
  }
}
```
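The fractions above control how tiles are divided between the training, validation and test sets. A hypothetical sketch of such splitting logic (the helper name and exact semantics are assumptions, not the repository's code):

```python
import random

def split_dataset(tiles, training_fraction, validation_fraction,
                  shuffle_data=True, max_size=None, seed=0):
    """Split tiles into (training, validation, test) sets by fraction.

    Illustrative sketch of what the ProjectArguments might control;
    not the repository's actual implementation.
    """
    tiles = list(tiles)
    if shuffle_data:
        random.Random(seed).shuffle(tiles)
    if max_size is not None:
        tiles = tiles[:max_size]  # cap the total dataset size
    n_train = int(len(tiles) * training_fraction)
    n_val = int(len(tiles) * validation_fraction)
    return (tiles[:n_train],
            tiles[n_train:n_train + n_val],
            tiles[n_train + n_val:])  # remainder becomes the test set
```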
### Created dataset format

Once a dataset is created, several files are generated. Labels are created and saved to `training_data/AzureByggDb/{tilegrid}/{tilesize}`.

The default behaviour when creating a dataset is to "lazy load" the data. This means that instead of downloading the actual images, we save the URL used for fetching the data in the output data files. The actual data is downloaded once you start training a model with the given dataset.

If you instead want to skip this lazy loading and download the data immediately, you can pass `-eager True` to the script.
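The lazy-loading idea can be sketched as follows; `LazyTile` and the injectable `fetch` function are hypothetical names used only for illustration, not the repository's actual classes:

```python
import urllib.request

class LazyTile:
    """Illustrative sketch of lazy loading: the dataset file stores only the
    URL, and the image bytes are fetched on first access (e.g. at training
    time). Not the repository's actual implementation."""

    def __init__(self, url, fetch=None):
        self.url = url  # the reference written to the output data files
        self._fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
        self._data = None  # nothing downloaded yet

    def load(self):
        if self._data is None:  # first access triggers the download
            self._data = self._fetch(self.url)
        return self._data  # subsequent calls reuse the cached bytes
```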
## Create Training Data Script

### create_training_data

Arguments:

| Argument | Description |
| -------- | ----------- |
| -n | What to name the dataset |
| -c | Path to config file |
| -eager | Option to download the created data immediately, and not just the reference to the data (which is the default behaviour, where data is downloaded the first time the data is used, i.e. during training) |
| --region | Polygon or multipolygon describing the data area, with coordinates in the same system as defined in the config (i.e. EPSG:25832), in WKT or GeoJSON (geometry) format, either directly as a text string or as a filename |
| --x_min | x_min for bbox, alternative to --region |
| --y_min | y_min for bbox, alternative to --region |
| --x_max | x_max for bbox, alternative to --region |
| --y_max | y_max for bbox, alternative to --region |
Example:

Unix:

With bbox:

```shell
./kai create_training_data -n medium_area -c config/dataset/bygg.json --x_min 618296.0 --y_min 6668145.0 --x_max 623495.0 --y_max 6672133.0
```

With region:

```shell
./kai create_training_data -n small_test_area -c config/dataset/bygg.json --region training_data/regio
```
