NAPSDataAnalysis

Canadian National Air Pollution Surveillance Program (NAPS) data downloader, importer, extractor, analysis, and visualization toolbox.

NAPS Data Analysis Toolbox

<p align="center"> <img src="https://github.com/dbeaudoinfortin/NAPSDataAnalysis/assets/15943629/bc1f2673-05fd-4713-8be7-57119d038358"/> </p> <p align="center"><b>Download NAPS air quality data <a href="https://dbeaudoinfortin.github.io/NAPSDataAnalysis/">here</a></b></p>

Overview

Welcome to the Canadian National Air Pollution Surveillance Program (NAPS) data downloader, importer, extractor, analysis, and visualization toolbox.

This project will eventually contain a collection of tools to assist in the analysis of Canadian air quality data. The data is provided by the National Air Pollution Surveillance (NAPS) program, which is part of Environment and Climate Change Canada. You can view the original data here.

I started this project because, despite the wealth of data that NAPS provides, analysing it is challenging, time-consuming and error-prone. The data from the NAPS portal is spread across hundreds of XLS/XLSX/CSV files, with dozens of formats, different units of measure, different naming conventions, different time zones, etc. With this toolbox, you can use the downloader tools to download all of the data you need in one command. The toolbox then provides the tools needed to parse all this data, clean it up and import it into a single, simple database schema. After that, you can analyse the data using whatever tool works best for you. I provide a powerful dynamic query tool, a CSV exporter tool, a heat map visualization tool to generate pretty graphs, and a couple of example BI dashboards to get you started with BI tools. And if all of that is too complicated, you might still be interested in either the data download web page or the clean data exports, which republish the NAPS data in a consistent format.

All usage is for non-commercial research purposes. I am not affiliated with the Government of Canada.

Data Download Web Page

If you are simply looking to download NAPS air quality data, I have created a simple web page that makes it quick and easy to download CSV files. This web page is hosted on GitHub Pages and delivers static content from the /docs directory of this project.

Clean Data Exports

Last Updated March 2025

The NAPS data is complicated to handle; the data files contain many inconsistencies in structure, formatting, labelling, time zones, etc. In order to load all this data into a clean database, I needed to implement many clean-up rules and handle many exceptional cases. I believe this work could be of benefit to others.

If you are curious about the data issues I have encountered, I have started keeping track of some of the non-trivial issues here.

In the /exports directory you will find many CSV files that republish the same NAPS data, cleaned up and grouped. These files were generated using the NAPSContinuousDataExporter and NAPSIntegratedDataExporter.

Integrated Data

All of the integrated data exports have been zipped to compress them. I have exported the data three different ways:

  • PerPollutant - contains data that is grouped into a single file for each pollutant.
  • PerSite - contains data that is grouped into a single file for each site (station).
  • PerYear - contains data that is grouped into a single file for each year.

Continuous Data

All of the continuous data exports have been zipped to compress them. There is significantly more continuous data than integrated data: the zip files will expand to about 15 GB in total. I have exported the data three different ways:

  • PerPollutant - contains data that is grouped into a single file for each pollutant.
  • PerSite - contains data that is grouped into a single file for each site (station).
  • PerYear - contains data that is grouped into a single file for each year.

To work around GitHub's file size limit of 100MB, some of the zip files have been created as multi-part archives. You will need to download all of the parts of the archive before you can extract the main zip file.
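Before extracting a large export, it can be handy to check which CSV files an archive contains. The following is a minimal sketch using only the Java standard library; the class and method names are hypothetical and are not part of the toolbox. It assumes a regular single-file zip, so a multi-part archive would first need to be recombined with your archive tool.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the NAPS toolbox.
final class ExportZipLister {

    // Returns the names of all CSV entries in a single-file zip archive.
    static List<String> listCsvEntries(Path zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String lower = entry.getName().toLowerCase(Locale.ROOT);
                if (!entry.isDirectory() && lower.endsWith(".csv")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a downloaded export archive as the first argument.
        for (String name : listCsvEntries(Path.of(args[0]))) {
            System.out.println(name);
        }
    }
}
```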

Even More Granular Data

If you are looking for even more granular data, have a look at the /docs/data directory, which contains nearly 300,000 CSV files. I have created a simple web page with dropdowns that makes it quick and easy to download these CSV files.

Data Analysis

Dynamic Queries

The NAPS Data Analysis Toolbox provides powerful tools for the analysis of Canadian air quality data. The dynamic query tools for both the continuous and integrated data allow you to run highly customized queries to aggregate or simply retrieve the data in the way that you need it, in a single command.

The dynamic query tools have support for several types of aggregation functions, multiple levels of grouping, filtering on many dimensions (site IDs, site name, pollutants, hour, days of the week, days of the month, months, years, provinces/territories, city name, site type, site urbanization), standard deviation functions, sample counts, minimum sample counts (to optionally ensure a statistically significant number of data points), lower and upper bounds for data points (to optionally exclude outliers) and post-aggregation lower and upper bounds (to eliminate results outside the scope of interest). I'm planning to add even more functionality in the future.
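To make the difference between the three kinds of limits concrete, here is a rough sketch of how they could interact for an average grouped by year. This is an illustration only and not the toolbox's implementation: the `Reading` record, the method name, and the order in which the limits are applied (data point bounds first, then the minimum sample count, then the post-aggregation bounds) are all assumptions. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and ordering of the steps are assumptions.
final class GroupedAverageSketch {

    // Hypothetical row shape; the real database schema may differ.
    record Reading(int year, double value) {}

    static Map<Integer, Double> averageByYear(List<Reading> readings,
            double valueLower, double valueUpper, int minSamples,
            double resultLower, double resultUpper) {
        // Step 1: drop data points outside [valueLower, valueUpper], then group by year.
        Map<Integer, List<Double>> byYear = new TreeMap<>();
        for (Reading r : readings) {
            if (r.value() >= valueLower && r.value() <= valueUpper) {
                byYear.computeIfAbsent(r.year(), y -> new ArrayList<>()).add(r.value());
            }
        }
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Double>> e : byYear.entrySet()) {
            List<Double> values = e.getValue();
            // Step 2: skip groups that have too few samples to be meaningful.
            if (values.size() < minSamples) {
                continue;
            }
            double avg = values.stream().mapToDouble(Double::doubleValue).average().orElseThrow();
            // Step 3: keep only aggregated results inside [resultLower, resultUpper].
            if (avg >= resultLower && avg <= resultUpper) {
                result.put(e.getKey(), avg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        List<Reading> sample = List.of(new Reading(2020, 1.0), new Reading(2020, 3.0));
        System.out.println(averageByYear(sample, 0.0, 50.0, 2, 0.0, 10.0));
    }
}
```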

Say, for example, you want to know how many times the hourly reading for carbon monoxide exceeded the national standard of 13 ppm across all of Canada for the years between 1974 and 2022. You can produce the table below by running the query tool with the following options:

-pollutants CO
-group1 year
-yearStart 1974
-yearEnd 2022
-aggregateFunction count
-valueLowerBound 13

The table output will look something like this:

<p align="center"> <img src="https://github.com/user-attachments/assets/b5d1d43a-8c7c-4424-aee9-596111532065" height="600" /> </p>

For more details on how to run these query tools, see the continuous and integrated data query sections below.
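Conceptually, the carbon monoxide example above is a filtered count grouped by year. The sketch below shows that logic on an in-memory list, purely as an illustration. The `HourlyReading` record and the method name are hypothetical, and treating the lower bound and the year range as inclusive is an assumption; the query tool itself may treat them differently. It needs Java 16 or later for records.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not the toolbox's query implementation.
final class ExceedanceCountSketch {

    // Hypothetical row shape for one hourly CO reading.
    record HourlyReading(int year, double ppm) {}

    // Counts readings per year that are at or above lowerBound (inclusive is an assumption).
    static Map<Integer, Long> countAtOrAbove(List<HourlyReading> readings,
            int yearStart, int yearEnd, double lowerBound) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (HourlyReading r : readings) {
            boolean inRange = r.year() >= yearStart && r.year() <= yearEnd;
            if (inRange && r.ppm() >= lowerBound) {
                counts.merge(r.year(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        List<HourlyReading> sample = List.of(
            new HourlyReading(1974, 20.0),
            new HourlyReading(1990, 15.0));
        System.out.println(countAtOrAbove(sample, 1974, 2022, 13.0));
    }
}
```

Years with no qualifying readings are simply absent from the returned map in this sketch.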

Heat Map Diagrams

The NAPS Data Analysis Toolbox provides tools for generating heat map diagrams for both the continuous and integrated data. These heat maps are highly customizable and can be generated in a single command. They make it much easier to spot trends in the data. For example, here is the entire history of carbon monoxide readings for all NAPS sites, for all of Canada, aggregated into a single heat map diagram.

[Figure: heat map of average CO by day of the year and year]

From this diagram alone there are some trends that immediately stand out, such as:

  • the significant impr
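For readers curious about what sits behind a heat map like the one above, the underlying data is essentially a grid of averages with one axis for the day of the year and one for the year. Here is a minimal sketch of building such a grid. It is not the toolbox's heat map code: the `DailyReading` record, the method name, and the use of `NaN` for empty cells are assumptions made for illustration. It needs Java 16 or later for records.

```java
import java.time.LocalDate;
import java.util.List;

// Illustrative sketch only; not the toolbox's heat map implementation.
final class HeatMapGridSketch {

    // Hypothetical row shape: one reading on a given date.
    record DailyReading(LocalDate date, double value) {}

    // Returns grid[year - yearStart][dayOfYear - 1]; cells with no data hold NaN.
    static double[][] averageGrid(List<DailyReading> readings, int yearStart, int yearEnd) {
        int years = yearEnd - yearStart + 1;
        double[][] sums = new double[years][366]; // 366 columns to cover leap years
        int[][] counts = new int[years][366];
        for (DailyReading r : readings) {
            int y = r.date().getYear() - yearStart;
            if (y < 0 || y >= years) {
                continue; // outside the requested year range
            }
            int d = r.date().getDayOfYear() - 1;
            sums[y][d] += r.value();
            counts[y][d]++;
        }
        double[][] grid = new double[years][366];
        for (int y = 0; y < years; y++) {
            for (int d = 0; d < 366; d++) {
                grid[y][d] = counts[y][d] == 0 ? Double.NaN : sums[y][d] / counts[y][d];
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        List<DailyReading> sample = List.of(
            new DailyReading(LocalDate.of(2020, 1, 1), 1.0),
            new DailyReading(LocalDate.of(2020, 1, 1), 3.0));
        System.out.println(averageGrid(sample, 2020, 2020)[0][0]);
    }
}
```

Each cell of the grid would then be mapped to a colour to render the diagram.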