UKCensusAPI

UK Census Data queries and downloads from python or R

UK Census Data API

Update

The name of this package has been something of a misnomer, as it only used Nomisweb as its data source, which provides full census data only for England & Wales (although it does provide some UK-wide key statistics and quick statistics tables).

Version 1.1.x of this package extends the 2011 census data coverage to Scotland and Northern Ireland. The aim is to make the data (and the metadata) consistent across all nations, but as neither country provides a web API for its data we have to resort to web scraping. This means the slicing-and-dicing and geographical query functionality may be more limited than it is for England & Wales. Note also that category values in equivalent tables may differ slightly.

Scotland

For Scotland, data can be downloaded at country or Council Area (~LAD) level, at geographical resolutions of Council Area, Data Zone (~LSOA) and Output Area. Intermediate Area (~MSOA) data can be aggregated (only) where the data is available at a higher geographical resolution.
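To illustrate what aggregating to a coarser geography involves, here is a minimal sketch that sums counts from a finer level up to a coarser one using a lookup table. The function name, the area codes and the lookup are all made up for illustration; this is not the package's own implementation.

```python
from collections import defaultdict


def aggregate_counts(counts, lookup):
    """Hypothetical helper: sum counts for fine areas into their parent areas.

    counts: {fine area code: count}
    lookup: {fine area code: parent area code}
    """
    totals = defaultdict(int)
    for area, count in counts.items():
        totals[lookup[area]] += count
    return dict(totals)


# Made-up Data Zone counts and a made-up Data Zone -> Intermediate Area lookup
data_zone_counts = {"DZ_A": 10, "DZ_B": 5, "DZ_C": 7}
dz_to_ia = {"DZ_A": "IA_1", "DZ_B": "IA_1", "DZ_C": "IA_2"}

print(aggregate_counts(data_zone_counts, dz_to_ia))
```

This only works in one direction: coarse totals can be built from finer data, but finer data cannot be recovered from coarse totals.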

The principal functions are NRScotland.get_metadata() for metadata, NRScotland.get_data() for the actual data, and NRScotland.contextify() to annotate the data using the metadata.

NB The OA-level Scotland data is provided in a zip compression format (deflate64) that python cannot extract. If this data is requested, you'll get an error message containing instructions on how to fix the issue by manually extracting the file(s) using unzip or 7zip.
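If you want to check a downloaded archive yourself before trying to extract it, something like the following sketch may help. It lists the members whose compression method is not one of those the standard zipfile module handles (deflate64 is method 9 in the zip format). The function name is hypothetical and is not part of this package.

```python
import zipfile

# Compression methods known to the standard zipfile module.
# Deflate64 (method 9) is not among them.
KNOWN_METHODS = {
    zipfile.ZIP_STORED,
    zipfile.ZIP_DEFLATED,
    zipfile.ZIP_BZIP2,
    zipfile.ZIP_LZMA,
}


def unsupported_members(zip_path):
    """Hypothetical helper: names of members zipfile is not expected to extract."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.compress_type not in KNOWN_METHODS
        ]
```

An empty list means the archive should extract normally from python; any names returned need extracting manually with unzip or 7zip.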

Northern Ireland

For Northern Ireland, data can be downloaded at country or Local Government District (~LAD) level, at geographical resolutions of Super Output Area (~LSOA) and Small Area (OA). Ward (~MSOA) data can be aggregated (only) where the data is available at a higher geographical resolution.

The principal functions are NISRA.get_metadata() for metadata, NISRA.get_data() for the actual data, and NISRA.contextify() to annotate the data using the metadata.

Nomisweb, run by Durham University, provides online access to the most detailed and up-to-date statistics from official sources for local areas throughout the UK, including census data.

This package provides both a python and an R wrapper around the nomisweb census data API, the NRScotland and NISRA websites, enabling:

  • querying table metadata
  • autogenerating customised python and R query code for future use
  • automated cached data downloads
  • modifying the geography of queries
  • adding descriptive information to tables (from metadata)

Queries can be customised on geographical coverage, geographical resolution, and table fields; the latter can be filtered to include only the category values you require.

The package can generate reusable code snippets that can be inserted into applications. Such applications will work seamlessly for any user as long as they have installed this package, and possess their own nomisweb API key.

Since census data is essentially static, it makes little sense to download the data every time it is requested: all data downloads are cached.

Example code is also provided which:

  • shows how an existing query can easily be modified in terms of geographical coverage.
  • shows how raw data can be annotated with meaningful metadata
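As a rough idea of what annotating raw data means, the sketch below attaches a readable label to each row by looking up its numeric category code in a metadata mapping. The field names, labels and the "_NAME" suffix are assumptions made for illustration; the package's own contextify functions may behave differently.

```python
def annotate(rows, field, labels):
    """Hypothetical helper: add a '<field>_NAME' entry to a copy of each row.

    rows: list of dicts holding numeric category codes
    labels: {category code: descriptive label}, as found in table metadata
    """
    return [{**row, field + "_NAME": labels[row[field]]} for row in rows]


# Made-up data and metadata
raw = [
    {"AREA": "AREA_1", "CELL": 1, "OBS_VALUE": 12},
    {"AREA": "AREA_1", "CELL": 2, "OBS_VALUE": 30},
]
cell_labels = {1: "First category", 2: "Second category"}

for row in annotate(raw, "CELL", cell_labels):
    print(row)
```

The original rows are left untouched, so the compact numeric form remains available for computation.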

Prerequisites

Software

  • python3.4 or higher, with pip, numpy and pandas. The dependencies should install automatically. Python 2 is not supported.
  • R version 3.3.3 or higher (optional, if using the R interface)

API key

It is recommended that you register with nomisweb before using this package and use the API key they supply in all queries. Without a key, queries will be truncated (max 25000 rows). With a key, the row limit is 1000000 and this package will warn if a query generates data with this number of rows.

Once registered, you will find your API key on this page. You should not divulge this key to others.

This package will look for the key in the following places (in order):

  • locally: a file NOMIS_API_KEY in the cache directory defined at initialisation, e.g.
    $ cat cache/NOMIS_API_KEY
    0x0000000000000000000000000000000000000000
    
  • globally: the environment variable NOMIS_API_KEY. R users can store the key in their .Renviron file: R will set the environment on startup, which will be visible to a python session instantiated from R.

Initialisation will fail if the key is not defined in one of these locations. Note: if for some reason you cannot register with nomisweb, you must still define an API key - just set it to an obviously invalid value.
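The lookup order described above can be sketched as follows. This is only an illustration of the precedence (local file first, then environment variable); the function name and its signature are hypothetical and this is not the package's actual code.

```python
import os
from pathlib import Path

KEY_NAME = "NOMIS_API_KEY"  # used for both the file name and the variable name


def find_api_key(cache_dir, environ=None):
    """Hypothetical helper: return the key, or None if it is not defined."""
    environ = os.environ if environ is None else environ
    key_file = Path(cache_dir) / KEY_NAME
    if key_file.is_file():
        # a key file in the cache directory takes precedence
        return key_file.read_text().strip()
    # otherwise fall back to the environment variable
    return environ.get(KEY_NAME)
```

A caller would treat a None result as the failure case described above.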

Installation

python release (from PyPI)

pip install UKCensusAPI

(NB This will install only the core package without the examples.)

python release (from conda-forge)

conda install -c conda-forge ukcensusapi

(NB This will install only the core package without the examples.)

python main branch (from github)

pip install git+https://github.com/virgesmith/UKCensusAPI.git

or for local development, clone and separately install:

git clone https://github.com/<your-fork>/UKCensusAPI.git
cd UKCensusAPI
pip install -e .

To test:

pytest

R

> devtools::install_github("virgesmith/UKCensusAPI")

Set the RETICULATE_PYTHON environment variable in your .Renviron file to the python3 interpreter, e.g. (for linux)

RETICULATE_PYTHON=$(which python3)

Usage

In your Python code, import the package, e.g.:

import ukcensusapi.Nomisweb as census_api

And in R:

library(UKCensusAPI)

Queries

Queries have three distinct subtypes:

  • metadata: query a table for the fields and categories it contains
  • geography: retrieve a list of area codes of a particular type within a given region of another (larger) type.
  • data: retrieve data from a table using a query built from the metadata and geography.
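To show how the three parts fit together, here is a minimal sketch in which category selections are checked against table metadata and then combined with a list of area codes to form a query. The data structures, field names and area codes are invented for illustration and do not reflect the package's internal formats.

```python
def build_query(metadata, selections, geography_codes):
    """Hypothetical helper: combine category selections with area codes.

    metadata: {field name: {category code: label}}
    selections: {field name: [category codes wanted]}
    geography_codes: list of area codes
    """
    query = {}
    for field, wanted in selections.items():
        unknown = set(wanted) - set(metadata[field])
        if unknown:
            raise ValueError(f"{field}: unknown category values {sorted(unknown)}")
        query[field] = ",".join(str(code) for code in wanted)
    query["geography"] = ",".join(geography_codes)
    return query


# Made-up metadata and area codes
metadata = {"TENURE": {0: "All", 1: "Owned", 2: "Rented"}}
query = build_query(metadata, {"TENURE": [1, 2]}, ["AREA_1", "AREA_2"])
print(query)
```

Validating against the metadata first means a mistyped category value fails locally rather than producing an unexpected download.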

Data and metadata are cached locally to minimise requests to the data providers.

Using the interactive query builder and a known table, you can construct a programmatically reusable query by selecting categories, specific category values, and (optionally) geography. See the example below.

Queries can subsequently be programmatically modified to switch to a different geographical region and/or resolution.

Interactive Query

The first thing users may want to do is an interactive query. All you need to do is specify the name of a census table. The script will then iterate over the categories within the table, prompting you to select the categories and values you're interested in.

Once done you'll be prompted to (optionally) specify a geography for the data - a geographical region and a resolution.

Finally, if you've specified the geography, the script will ask if you want to download (and cache) the data immediately.

This can be run using this script:

$ ukcensus-query <cache-dir> [--no-api-key]

An API key must be specified (see above) unless the --no-api-key flag has been set.

The script will produce the following files (in the supplied cache directory):

  • a json file containing the table metadata
  • python and R code snippets that build the query and call this package to download the data
  • (optionally, depending on above selections) the data itself

The code snippets are designed to be copy/pasted into user code. The (cached) data and metadata can simply be loaded by user code as required.

Note for R users - there is no direct R script for the interactive query, largely because it will not work from within RStudio (due to the way RStudio handles stdin).

Data reuse

Existing cached data is always used in preference to downloading. The data is stored locally using a filename based on the table name and md5 hash of the query used to download the data. This way, different queries on the same table can be stored.
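A naming scheme of this kind could look like the sketch below. The exact way the query is serialised before hashing, the separator and the file extension are all assumptions here; only the idea (table name plus md5 of the query) comes from the description above.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir, table, query_params):
    """Hypothetical helper: build a cache file path from table name and query."""
    # serialise with sorted keys so the same query always hashes the same way
    canonical = json.dumps(query_params, sort_keys=True).encode("utf-8")
    digest = hashlib.md5(canonical).hexdigest()
    return Path(cache_dir) / f"{table}_{digest}.tsv"


def is_cached(cache_dir, table, query_params):
    """True if data for this exact query has already been stored."""
    return cache_path(cache_dir, table, query_params).is_file()
```

Because the hash depends on every query parameter, changing the geography or the selected categories yields a different file, while repeating a query finds the existing one.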

To force the data to be downloaded, just delete the cached data.

Query Reuse

The code snippets can simply be inserted into user code, and the metadata (json) can be used as a guide for modifying the query, either manually or automatically.

Switching Geography

Existing queries can easily be modified to switch to a different geographical area and/or a different geographical resolution.

This allows, for example, users to write models where the geographical coverage and resolution can be user inputs.

Examples of how to do this are in geoquery.py and geoquery.R.
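The general pattern is that the geography is just one entry in the query, so it can be replaced without touching the category selections. A minimal sketch, assuming the query is held as a dictionary with a "geography" entry (the helper name and area codes are made up):

```python
def with_geography(query_params, area_codes):
    """Hypothetical helper: copy of the query with only its geography replaced."""
    updated = dict(query_params)
    updated["geography"] = ",".join(area_codes)
    return updated


# An existing query (made-up values), reused for a different set of areas
base_query = {"CELL": "1,2", "geography": "AREA_1,AREA_2"}
new_query = with_geography(base_query, ["AREA_7", "AREA_8", "AREA_9"])
print(new_query)
```

Returning a copy keeps the original query intact, which is convenient when a model runs the same query over several regions chosen by the user.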

Annotating Data

Queries will download data with a minimal memory footprint, but also metadata that provides meaning. Whilst this makes manipulating and querying the data efficient, it means that the data itself lacks human-readability. For this reason the
