
Transdim

Machine learning for transportation data imputation and prediction.

Install / Use

/learn @xinychen/Transdim

transdim


<h6 align="center">Made by Xinyu Chen • :globe_with_meridians: <a href="https://xinychen.github.io">https://xinychen.github.io</a></h6>


Transportation data imputation (a.k.a., transdim).

Machine learning models have driven important developments in the field of spatiotemporal data modeling, such as forecasting near-future traffic states of road networks. But what happens when these models are built on incomplete data, as is common for data collected from real-world systems (e.g., transportation systems)?

<br>

Table of Contents

<br>

About this Project

In the transdim project, we develop machine learning models to help address some of the toughest challenges of spatiotemporal data modeling, from missing data imputation to time series prediction. The strategic aim of this project is to create accurate and efficient solutions for spatiotemporal traffic data imputation and prediction.

In a hurry? Check out the contents below.

<br>

Tasks and Challenges

Missing data are there, whether we like them or not. The really interesting question is how to deal with incomplete data.

<p align="center"> <img align="middle" src="https://github.com/xinychen/transdim/blob/master/images/missing.png" width="800" /> </p> <p align = "center"> <b>Figure 1</b>: Two classical missing patterns in a spatiotemporal setting. </p>

We simulate three missing data mechanisms on real-world data.

  • Missing data imputation 🔥

    • Random missing (RM): Each sensor loses observations completely at random. (★★★)
    • Non-random missing (NM): Each sensor loses observations over several consecutive days. (★★★★)
    • Blackout missing (BM): All sensors lose their observations at the same several consecutive time points. (★★★★)
<p align="center"> <img src="https://github.com/xinychen/transdim/blob/master/images/framework.png" alt="drawing" width="800"/> </p> <p align = "center"> <b>Figure 2</b>: Tensor completion framework for spatiotemporal missing traffic data imputation. </p>
  • Spatiotemporal prediction 🔥
    • Forecasting without missing values. (★★★)
    • Forecasting with incomplete observations. (★★★★★)
<p align="center"> <img align="middle" src="https://github.com/xinychen/transdim/blob/master/images/predictor-explained.png" width="700" /> </p> <p align = "center"> <b>Figure 3</b>: Illustration of our proposed Low-Rank Autoregressive Tensor Completion (LATC) imputer/predictor with a prediction window τ (green nodes: observed values; white nodes: missing values; red nodes/panel: prediction; blue panel: training data to construct the tensor). </p> <br>
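The three missing mechanisms above can be sketched as binary masks over a (sensor, day, time-of-day) tensor. The snippet below is a minimal illustration on synthetic data, not the project's own masking code; all shapes and the missing rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic (sensor, day, time-of-day) tensor standing in for real traffic data.
n_sensor, n_day, n_time = 30, 7, 144
dense_tensor = rng.uniform(20, 80, size=(n_sensor, n_day, n_time))
rate = 0.2  # target missing rate

# RM: each entry is dropped independently, completely at random.
rm_mask = rng.random(dense_tensor.shape) > rate

# NM: each sensor loses whole days at random.
nm_mask = np.broadcast_to(
    (rng.random((n_sensor, n_day)) > rate)[:, :, None], dense_tensor.shape
)

# BM: every sensor loses the same consecutive time points (a shared blackout window).
bm_mask = np.ones(dense_tensor.shape, dtype=bool)
bm_mask[:, 2, 60:90] = False

sparse_tensor = dense_tensor * rm_mask  # zeros mark the missing entries
print(f"RM missing rate: {1 - rm_mask.mean():.2f}")
```

Multiplying the dense tensor by a mask follows the repo's convention of encoding missing entries as zeros.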

Implementation

Open data

In this project, we adapt several publicly available data sets for our experiments. The original links to these data sets are summarized as follows,

For example, if you want to view or use these data sets, download them into the ../datasets/ folder first, and then run the following code in your Python console:

import scipy.io

# Load the .mat file and pull out the NumPy array stored under the 'tensor' key.
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
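Note that loadmat returns a dict of NumPy arrays keyed by variable name. As a self-contained sanity check, the sketch below round-trips a small stand-in .mat file (the file name and demo shape are hypothetical, not the real Guangzhou dimensions):

```python
import numpy as np
import scipy.io

# Hypothetical stand-in tensor mimicking the (segment, day, time-of-day) layout;
# swap in the real '../datasets/...' path once the data are downloaded.
demo = np.random.rand(10, 5, 144)
scipy.io.savemat('tensor_demo.mat', {'tensor': demo})

tensor = scipy.io.loadmat('tensor_demo.mat')['tensor']
print(tensor.shape)
```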

In particular, if you are interested in large-scale traffic data, we recommend PeMS-4W/8W/12W and UTD19. For PeMS data, you can download the data from Zenodo and place them in the datasets folder (example path: ../datasets/California-data-set/pems-4w.csv). Then you can open the data with pandas:

import pandas as pd

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
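The CSV loads as a (sensor, time) matrix; for tensor models it is typically folded into a (sensor, day, time-of-day) tensor. The sketch below uses a hypothetical stand-in file and shapes (PeMS records 288 five-minute intervals per day), not the real pems-4w.csv dimensions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pems-4w.csv: rows = sensors, columns = 5-minute
# intervals over consecutive days (288 intervals per day for PeMS).
n_sensor, n_day, n_time = 8, 3, 288
frame = pd.DataFrame(np.random.rand(n_sensor, n_day * n_time))
frame.to_csv('pems_demo.csv', header=False, index=False)

data = pd.read_csv('pems_demo.csv', header=None)
mat = data.values                                # (sensor, day * time) matrix
tensor = mat.reshape(n_sensor, n_day, n_time)    # (sensor, day, time) tensor
print(mat.shape, tensor.shape)
```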

For model evaluation, we mask certain entries of the "observed" data as missing values and then perform imputation for these "missing" values.
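This mask-then-score loop can be sketched as follows. The imputer here is a deliberately trivial per-sensor-mean baseline standing in for the project's models; the shapes, missing rate, and metric choices (MAPE/RMSE on the masked entries only) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
dense_mat = rng.uniform(20, 80, size=(50, 288))  # fully observed "ground truth"
mask = rng.random(dense_mat.shape) > 0.3         # True = kept, False = masked out
sparse_mat = dense_mat * mask                    # zeros play the missing values

# Placeholder imputer: fill each sensor's missing slots with its observed mean.
row_mean = sparse_mat.sum(axis=1) / mask.sum(axis=1)
mat_hat = np.where(mask, sparse_mat, row_mean[:, None])

pos = ~mask                                      # score only the masked-out entries
mape = np.mean(np.abs(dense_mat[pos] - mat_hat[pos]) / dense_mat[pos])
rmse = np.sqrt(np.mean((dense_mat[pos] - mat_hat[pos]) ** 2))
print(f"MAPE: {mape:.3f}, RMSE: {rmse:.3f}")
```

Scoring only the masked entries matters: the observed entries are copied through unchanged, so including them would understate the error.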

Model implementation

In our experiments, we implemented the machine learning models mainly with NumPy and wrote the Python code in Jupyter notebooks. If you want to evaluate these models, download and run the notebooks directly (prerequisite: download the data sets in advance). In the following implementation, we have improved the Python code (in Jupyter notebooks) in terms of both readability and efficiency.

Our proposed models are highlighted in bold fonts.

  • imputer (imputation models)

| Notebook | Guangzhou | Birmingham | Hangzhou | Seattle | London | NYC | Pacific |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BPMF | ✅ | ✅ | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| TRMF | ✅ | 🔶 | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| BTRMF | ✅ | 🔶 | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| BTMF | ✅ | ✅ | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| BGCP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BATF | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BTTF | 🔶 | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | ✅ |
| HaLRTC | ✅ | 🔶 | ✅ | ✅ | ✅ | ✅ | ✅ |
| LRTC-TNN | ✅ | 🔶 | ✅ | ✅ | 🔶 | 🔶 | 🔶 |

  • predictor (prediction models)

| Notebook | Guangzhou | Birmingham | Hangzhou | Seattle | London | NYC | Pacific |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| TRMF | ✅ | 🔶 | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| BTRMF | ✅ | 🔶 | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| BTRTF | 🔶 | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | ✅ |
| BTMF | ✅ | 🔶 | ✅ | ✅ | ✅ | ✅ | ✅ |
| BTTF | 🔶 | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | ✅ |

  • ✅ — Covered
  • 🔶 — Not covered
  • 🚧 — Under development

For the implementation of these models, we use both dense_mat and sparse_mat (or dense_tensor and sparse_tensor) as inputs. However, this is not strictly necessary: if you do not need to see the imputation/prediction performance during the iterative process, you can remove dense_mat (or dense_tensor) from the inputs of these algorithms.
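The convention can be illustrated with a toy iterative imputer. This is not one of the repo's algorithms; the rank-1 SVD update, shapes, and iteration count are stand-in assumptions. The point is that sparse_mat drives the algorithm, while dense_mat is read only to report per-iteration RMSE and can be dropped.

```python
import numpy as np

def impute(dense_mat, sparse_mat, maxiter=5):
    """Toy imputer: sparse_mat (zeros = missing) drives the updates;
    dense_mat is used only for monitoring and may be None."""
    mask = sparse_mat != 0
    mat_hat = np.where(mask, sparse_mat, sparse_mat[mask].mean())
    for it in range(maxiter):
        # Rank-1 truncated SVD as a stand-in update step.
        u, s, vt = np.linalg.svd(mat_hat, full_matrices=False)
        low_rank = s[0] * np.outer(u[:, 0], vt[0])
        mat_hat = np.where(mask, sparse_mat, low_rank)
        if dense_mat is not None:  # monitoring only, safe to remove
            pos = ~mask
            rmse = np.sqrt(np.mean((dense_mat[pos] - mat_hat[pos]) ** 2))
            print(f"iter {it}: RMSE = {rmse:.3f}")
    return mat_hat

rng = np.random.default_rng(1)
dense_mat = rng.uniform(20, 80, size=(40, 144))
sparse_mat = dense_mat * (rng.random(dense_mat.shape) > 0.3)
mat_hat = impute(dense_mat, sparse_mat)   # with per-iteration RMSE printed
mat_hat2 = impute(None, sparse_mat)       # same result, monitoring removed
```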

Imputation/Prediction performance

  • Imputation example (on Guangzhou data)

