Geocode
Geocode various geographical entities including postcodes and LLSOAs. Reverse-geocode to LLSOA or GSP/GNode.
Latest Version: 1.4
What is this repository for?
- Use Code Point Open and/or Google Maps API to geocode postcode and/or address into lat/lon co-ordinates.
- Use ONS & NRS LLSOA Population Weighted Centroids to geocode Lower Layer Super Output Areas.
- Use GIS data from data.gov.uk to geocode GB constituencies based on geospatial centroid.
- Use GIS boundaries data from ONS and NRS to reverse-geocode lat/lon to LLSOA.
- Use GIS data from the National Energy System Operator's (NESO) data portal to reverse-geocode to a GSP or GNode.
- Use GIS boundaries from the Europa/Eurostat API to reverse-geocode to NUTS regions.
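Reverse-geocoding against boundary data boils down to a point-in-polygon test. The library itself relies on Shapely and real GIS boundaries for this; the snippet below is only a dependency-free sketch of the idea using ray casting. The region names and rectangular boundaries are made up for illustration and are not real LLSOA, GSP or NUTS geometries.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by a
        # horizontal ray cast from the point towards +longitude.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular regions (NOT real boundaries).
REGIONS = {
    "REGION_A": [(-1.50, 53.37), (-1.46, 53.37), (-1.46, 53.39), (-1.50, 53.39)],
    "REGION_B": [(-1.46, 53.37), (-1.42, 53.37), (-1.42, 53.39), (-1.46, 53.39)],
}


def reverse_geocode(lat, lon):
    """Return the ID of the first region containing the point, or None."""
    for region_id, ring in REGIONS.items():
        if point_in_polygon(lon, lat, ring):
            return region_id
    return None


print(reverse_geocode(53.384, -1.467))  # REGION_A
print(reverse_geocode(53.384, -1.440))  # REGION_B
print(reverse_geocode(50.000, 0.000))   # None
```

A production implementation would also need a spatial index to avoid testing every polygon for every point, which is one reason to lean on a GIS library rather than hand-rolled code.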
Benefits
- Prioritises Code Point Open for postcode lookup to save expensive GMaps API bills.
- Caches GMaps API queries locally so that repeated queries can be fulfilled without a new API request.
- Fetches data automatically at runtime from public APIs where possible.
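To illustrate why the first two points save money, here is a minimal sketch of a "free local table first, then cache, then paid API" lookup. It is not the library's implementation: `LOCAL_POSTCODES`, `paid_api_lookup` and `lookup` are hypothetical names, and the paid API is a stub that simply counts how often it is called.

```python
import math

# Hypothetical stand-in for a local Code Point Open extract.
LOCAL_POSTCODES = {"S37RH": (53.381, -1.486)}


def paid_api_lookup(query, counter):
    """Stub for a billable remote geocoder; counts calls so the saving is visible."""
    counter["calls"] += 1
    if "hicks building" in query.lower():
        return (53.381, -1.486)
    return (math.nan, math.nan)


def lookup(query, cache, counter):
    key = query.replace(" ", "").upper()
    if key in LOCAL_POSTCODES:      # 1. free local table first
        return LOCAL_POSTCODES[key]
    if key not in cache:            # 2. then the local cache
        cache[key] = paid_api_lookup(query, counter)  # 3. paid API as a last resort
    return cache[key]


cache, counter = {}, {"calls": 0}
first = lookup("S3 7RH", cache, counter)                      # served locally
second = lookup("Hicks Building, Sheffield", cache, counter)  # one paid call
third = lookup("Hicks Building, Sheffield", cache, counter)   # served from cache
print(counter["calls"])  # 1
```

Three lookups result in a single billable request; persisting `cache` to disk between runs would extend the saving across sessions.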
How do I get set up?
Developed and tested with Python 3.12; it should also work with 3.11+.
Make sure you have Git installed - Download Git
Run pip install geocode-ss
or pip install git+https://github.com/SheffieldSolar/Geocode/
Check that the installation was successful by running the following command from terminal / command-line:
>> geocode -h
This will print the helper for the limited command line interface which provides tools to help get set up and to clear the cache when needed:
usage: geocode.py [-h] [--clear-cache] [--debug] [--setup SETUP [SETUP ...]]
                  [--load-cpo-zip </path/to/zip-file>] [--load-gmaps-key <gmaps-api-key>]

This is a command line interface (CLI) for the Geocode module version 0.12.1.

optional arguments:
  -h, --help            show this help message and exit
  --clear-cache         Specify to delete the cache files.
  --debug               Geocode some sample postcodes/addresses/LLSOAs.
  --setup SETUP [SETUP ...]
                        Force download all datasets to local cache (useful if running
                        inside a Docker container i.e. run this as part of image build).
                        Possible values are 'neso', 'cpo', 'ons', 'eurostat' or 'all'.
  --load-cpo-zip </path/to/zip-file>
                        Load the Code Point Open data from a local zip file.
  --load-gmaps-key <gmaps-api-key>
                        Load a Google Maps API key.

Jamie Taylor & Ethan Jones, 2019-10-08
No additional setup is needed at this stage - the required datasets will be downloaded (or extracted from the packaged data) the first time you use the associated method. If you want to force the Geocode library to download/extract all data, you can run the following command (the --setup option expects at least one value; 'all' fetches every dataset):
>> geocode --setup all
This is especially useful if you are installing / running the library inside a container - using the above command you can download the data once during the image build rather than have to re-download every time the container is destroyed.
Important
Note that this library uses the Shapely library from PyPI, which often fails to install correctly on Windows machines because of missing dependencies. If you are using Windows and see an error like OSError: [WinError 126] The specified module could not be found, you should install Shapely from one of the unofficial binaries here.
All data required by this library is either packaged with the code or downloaded at runtime from public APIs. Some data is subject to licenses, and you may wish to manually update certain datasets (e.g. OS Code Point Open) - see appendix.
Usage
Within a Python script
Within your Python code, I recommend using the context manager so that the GMaps cache is automatically flushed on exit. See example.py:
import os
import logging

from geocode import Geocoder

def main():
    with Geocoder() as geocoder:
        # Geocode some postcodes / addresses...
        print("GEOCODE POSTCODES / ADDRESSES:")
        postcodes = ["S3 7RH", "S3 7", "S3", None, None, "S3 7RH"]
        addresses = [None, None, None, "Hicks Building, Sheffield", "Hicks", "Hicks Building"]
        results = geocoder.geocode(postcodes, "postcode", address=addresses)
        for postcode, address, (lat, lon, status) in zip(postcodes, addresses, results):
            print(f" Postcode + Address: `{postcode}` + `{address}` -> {lat:.3f}, {lon:.3f} "
                  f"({geocoder.status_codes[status]})")
        # Geocode some LLSOAs...
        print("GEOCODE LLSOAs:")
        llsoas = ["E01033264", "E01033262"]
        results = geocoder.geocode_llsoa(llsoas)
        for llsoa, (lat, lon) in zip(llsoas, results):
            print(f" LLSOA: `{llsoa}` -> {lat:.3f}, {lon:.3f}")
        # Geocode some Constituencies...
        print("GEOCODE CONSTITUENCIES:")
        constituencies = ["Sheffield Central", "Sheffield Hallam"]
        results = geocoder.geocode_constituency(constituencies)
        for constituency, (lat, lon) in zip(constituencies, results):
            print(f" Constituency: `{constituency}` -> {lat:.3f}, {lon:.3f}")
        # Reverse-geocode some lat/lons to LLSOAs...
        print("REVERSE-GEOCODE TO LLSOA:")
        latlons = [(53.384, -1.467), (53.388, -1.470)]
        results = geocoder.reverse_geocode_llsoa(latlons)
        for llsoa, (lat, lon) in zip(results, latlons):
            print(f" LATLON: {lat:.3f}, {lon:.3f} -> `{llsoa}`")
        # Reverse-geocode some lat/lons to GSP...
        print("REVERSE-GEOCODE TO GSP:")
        latlons = [(53.384, -1.467), (53.388, -1.470)]
        results = geocoder.reverse_geocode_gsp(latlons)
        for (lat, lon), region_id in zip(latlons, results):
            print(f" LATLON: {lat:.3f}, {lon:.3f} -> {region_id}")
        # Reverse-geocode some lat/lons to 2021 NUTS2...
        print("REVERSE-GEOCODE TO NUTS2:")
        latlons = [(51.3259, -1.9613), (47.9995, 0.2335), (50.8356, 8.7343)]
        results = geocoder.reverse_geocode_nuts(latlons, year=2021, level=2)
        for (lat, lon), nuts2 in zip(latlons, results):
            print(f" LATLON: {lat:.3f}, {lon:.3f} -> {nuts2}")

if __name__ == "__main__":
    log_fmt = "%(asctime)s [%(levelname)s] [%(filename)s:%(funcName)s] - %(message)s"
    fmt = os.environ.get("GEOCODE_LOGGING_FMT", log_fmt)
    datefmt = os.environ.get("GEOCODE_LOGGING_DATEFMT", "%Y-%m-%dT%H:%M:%SZ")
    logging.basicConfig(format=fmt, datefmt=datefmt, level=os.environ.get("LOGLEVEL", "WARNING"))
    main()
>> python example.py
GEOCODE POSTCODES / ADDRESSES:
Postcode + Address: `S3 7RH` + `None` -> 53.381, -1.486 (Full match with GMaps)
Postcode + Address: `S3 7` + `None` -> nan, nan (Failed)
Postcode + Address: `S3` + `None` -> nan, nan (Failed)
Postcode + Address: `None` + `Hicks Building, Sheffield` -> 53.381, -1.486 (Full match with GMaps)
Postcode + Address: `None` + `Hicks` -> nan, nan (Failed)
Postcode + Address: `S3 7RH` + `Hicks Building` -> 53.381, -1.486 (Full match with GMaps)
GEOCODE LLSOAs:
LLSOA: `E01033264` -> 53.384, -1.467
LLSOA: `E01033262` -> 53.388, -1.470
GEOCODE CONSTITUENCIES:
Constituency: `Sheffield Central` -> 53.376, -1.464
Constituency: `Sheffield Hallam` -> 53.396, -1.604
REVERSE-GEOCODE TO LLSOA:
LATLON: 53.384, -1.467 -> `E01033264`
LATLON: 53.388, -1.470 -> `E01033262`
REVERSE-GEOCODE TO GSP:
LATLON: 53.384, -1.467 -> ('PITS_3', '_M')
LATLON: 53.388, -1.470 -> ('NEEP_3', '_M')
REVERSE-GEOCODE TO NUTS2:
LATLON: 51.326, -1.961 -> UKK1
LATLON: 47.999, 0.234 -> FRG0
LATLON: 50.836, 8.734 -> DE72
In the above example, postcodes and addresses are lists of strings, but any iterable should work, such as NumPy arrays or Pandas DataFrame columns. Whatever the input type, the geocode() method still returns a list of tuples.
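Because the return value is a plain list of tuples, it often needs unpacking before further analysis. The sketch below shows one way to split results into parallel columns and pick out the failed lookups, which the example output reports as nan, nan. It uses hard-coded sample data rather than calling the library: the coordinates are taken from the example output, while the integer status values are placeholders and not the library's real status codes.

```python
import math

# Sample data in the (lat, lon, status) shape shown in the example above.
# Status values 1 and 0 are placeholders for illustration only.
postcodes = ["S3 7RH", "S3 7", "S3"]
results = [(53.381, -1.486, 1), (math.nan, math.nan, 0), (math.nan, math.nan, 0)]

# Transpose the list of tuples into three parallel "columns".
lats, lons, statuses = (list(col) for col in zip(*results))

# Collect the inputs whose lookup failed (NaN latitude).
failed = [pc for pc, lat in zip(postcodes, lats) if math.isnan(lat)]
print(failed)  # ['S3 7', 'S3']
```

The three lists can then be assigned straight to DataFrame columns or array fields if you are working with Pandas or NumPy.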
When reverse-geocoding to GSP, the reverse_geocode_gsp() method returns both a list of Region IDs and a corresponding list of GSPs / GNodes. Since the relationship between Region:GSP:GNode is theoretically MANY:MANY:MANY, the second object returned is a list of lists of dicts. This is rather clunky and will likely be refined in a future release. Alternatively, you can disregard this second return object and use the Geocoder.gsp_lookup instance attribute instead - this is a Pandas DataFrame giving the full lookup between Regions / GSPs / GNodes / DNO License Areas (i.e. this dataset on the ESO Data Portal). In testing, the reverse_geocode_gsp() method allocated ~1 million random lat/lons to the correct GSP in an average wall-clock time of around 300 seconds.
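As a rough guide to what the quoted benchmark means for other workloads, the figures can be scaled linearly. This is only back-of-the-envelope arithmetic: it assumes runtime grows in proportion to the number of points, which will not hold exactly on different hardware or for very small batches. The helper name `estimate_seconds` is made up for this illustration.

```python
# Figures quoted above: ~1 million points in ~300 seconds of wall-clock time.
BENCHMARK_POINTS = 1_000_000
BENCHMARK_SECONDS = 300.0


def estimate_seconds(n_points):
    """Linearly scale the quoted benchmark to n_points (a rough assumption)."""
    return n_points * BENCHMARK_SECONDS / BENCHMARK_POINTS


points_per_second = BENCHMARK_POINTS / BENCHMARK_SECONDS
print(f"{points_per_second:.0f} points/s")                     # 3333 points/s
print(f"{estimate_seconds(50_000):.0f} s for 50,000 points")   # 15 s for 50,000 points
```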
Use with a proxy
If your network configuration requires the use of a proxy server when downloading data from external URLs/APIs, you can specify the proxies parameter when instantiating the Geocoder class.
e.g.
from geocode import Geocoder

def main():
    geocoder = Geocoder(
        proxies=dict(
            http="http://example.com",
            https="http://example.com",
        ),
        ssl_verify=False
    )
In some network configurations, it may also be necessary to disable SSL certificate checks, which you can do by setting ssl_verify=False. This is not recommended!
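If your proxy settings already live in the conventional HTTP_PROXY / HTTPS_PROXY environment variables, a small helper can build the proxies dict instead of hard-coding URLs. This is a sketch only: `proxies_from_env` is a hypothetical helper and not part of the library, and it assumes Geocoder accepts a dict keyed by "http" and "https" as in the example above.

```python
import os


def proxies_from_env(environ=os.environ):
    """Build a proxies dict from HTTP_PROXY / HTTPS_PROXY (upper or lower case)."""
    proxies = {}
    for scheme in ("http", "https"):
        value = environ.get(f"{scheme.upper()}_PROXY") or environ.get(f"{scheme}_proxy")
        if value:
            proxies[scheme] = value
    return proxies


# Demonstrate with an explicit mapping rather than the real environment.
example = proxies_from_env({"HTTPS_PROXY": "http://example.com"})
print(example)  # {'https': 'http://example.com'}

# Intended usage (requires the geocode package and network access):
# from geocode import Geocoder
# with Geocoder(proxies=proxies_from_env()) as geocoder:
#     ...
```

Keeping the proxy URLs in the environment also avoids committing network details to source control.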
Command Line Utilities
latlons2llsoa
This utility can be used to load a CSV file containing latitudes and longitudes and to reverse-geocode them to LLSOAs (optionally switching to Data Zones in Scotland):
>> latlons2llsoa -h
usage: latlons2llsoa [-h] -f </path/to/file> -o </path/to/file> [--datazones]
This is a command lin
