SatIQ

This repository contains all the data collection, model training, and analysis code for the SatIQ system, described in the paper "Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting". The system authenticates Iridium satellite transmitters using high-sample-rate message headers.

[!NOTE] This version of the repository contains the code used for the paper "SatIQ: Extensible and Stable Satellite Authentication using Hardware Fingerprinting". The code used for "Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting" can be found here.

Additional materials (SatIQ):

  • "SatIQ" paper: https://www.cs.ox.ac.uk/files/14805/main.pdf
  • Full dataset (UK): https://doi.org/10.7910/DVN/P5FUAW
  • Full dataset (Germany): https://doi.org/10.7910/DVN/RXWV1M
  • Full dataset (Switzerland): https://doi.org/10.7910/DVN/OSSJ68
  • Trained model weights: https://doi.org/10.7910/DVN/GANMDZ

Additional materials (Watch This Space):

  • "Watch This Space" paper: https://arxiv.org/abs/2305.06947
  • Full dataset: https://zenodo.org/record/8220494
  • Trained model weights: https://zenodo.org/record/8298532

When using this code, please cite the following paper: "Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting". The BibTeX entry is given below:

@inproceedings{smailesWatch2023,
  author = {Smailes, Joshua and K{\"o}hler, Sebastian and Birnbach, Simon and Strohmeier, Martin and Martinovic, Ivan},
  title = {{Watch This Space}: {Securing Satellite Communication through Resilient Transmitter Fingerprinting}},
  year = {2023},
  publisher = {Association for Computing Machinery},
  booktitle = {Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security},
  location = {Copenhagen, Denmark},
  series = {CCS '23}
}

Setup

To clone the repository:

git clone --recurse-submodules https://github.com/ssloxford/SatIQ.git
cd SatIQ

A Docker container is provided for ease of use, with all dependencies installed. A recent version of Docker must be installed on your system to use this.

To run the scripts locally, Python 3 (python3) is required, along with the following Python packages:

numpy
matplotlib
pandas
keras
h5py
zmq
tqdm
tensorflow
tensorflow-datasets
tensorflow-addons==0.13.0
scipy
seaborn
scikit-learn
notebook

A GPU is recommended (with all necessary drivers installed), and a moderate amount of RAM will be required to run the data preprocessing and model training.

Downloading Data (SatIQ)

The full dataset for "SatIQ" is stored on the Harvard Dataverse at the following URL: https://dataverse.harvard.edu/dataverse/satiq.

This includes three datasets, one for each of the three locations (UK, Germany, Switzerland), as well as trained model weights.

These can be downloaded from the site directly, but a scripted bulk download may be preferable given the file sizes; the Dataverse API supports this (see https://eamonnbell.webspace.durham.ac.uk/2023/03/07/bulk-downloading-from-dataverse/ for a walkthrough).
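
As a rough sketch (not part of the repository), the following uses the Dataverse native API to list and download every file in the datasets above. The DOIs are those listed earlier; the endpoint paths are the standard Dataverse API, and each file is streamed to disk:

#!/usr/bin/env python3
# Sketch of a bulk download via the Dataverse native API (not from the repo).
import os
import requests

BASE = "https://dataverse.harvard.edu"
DOIS = [
    "doi:10.7910/DVN/P5FUAW",  # full dataset (UK)
    "doi:10.7910/DVN/RXWV1M",  # full dataset (Germany)
    "doi:10.7910/DVN/OSSJ68",  # full dataset (Switzerland)
    "doi:10.7910/DVN/GANMDZ",  # trained model weights
]

for doi in DOIS:
    # List all files in the latest published version of the dataset.
    listing = requests.get(
        f"{BASE}/api/datasets/:persistentId/versions/:latest/files",
        params={"persistentId": doi},
    )
    listing.raise_for_status()
    for entry in listing.json()["data"]:
        file_id = entry["dataFile"]["id"]
        name = entry["dataFile"]["filename"]
        if os.path.exists(name):
            continue  # skip files already downloaded
        # Stream to disk; these files are far too large to hold in memory.
        with requests.get(f"{BASE}/api/access/datafile/{file_id}", stream=True) as r:
            r.raise_for_status()
            with open(name, "wb") as f:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    f.write(chunk)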

[!WARNING] The files are very large (approximately 1TB total). Ensure you have enough disk space before downloading.

Downloading Data (Watch This Space)

The full dataset for "Watch This Space" is stored on Zenodo at the following URL: https://zenodo.org/record/8220494.

These can be downloaded from the site directly, but the following script may be preferable due to the large file size:

#!/bin/bash

# Download the 34 archives data_000_004.tar.gz ... data_165_169.tar.gz,
# each covering five database files. 10#$i forces base-10 arithmetic on
# the zero-padded index.
for i in $(seq -w 0 5 165); do
  printf -v j "%03d" $((10#$i + 4))
  wget https://zenodo.org/records/8220494/files/data_${i}_${j}.tar.gz
done

[!WARNING] These files are very large (4.0GB each, 135.4GB total). Ensure you have enough disk space before downloading.

To extract the files:

#!/bin/bash

# Extract each downloaded archive in turn.
for i in $(seq -w 0 5 165); do
  printf -v j "%03d" $((10#$i + 4))
  tar xzf data_${i}_${j}.tar.gz
done

See the instructions below on processing the resulting files for use.

Directory Structure

The training and analysis scripts expect the repository to be laid out as follows:

SatIQ
├── ...
└── data
    ├── models
    │   ├── downsample
    │   │   └── ...
    │   └── ...
    ├── tfrecord
    │   ├── ...
    │   ├── germany
    │   │   └── ...
    │   ├── switzerland
    │   │   └── ...
    │   └── uk-switzerland
    └── test
        ├── embeddings
        │   └── ...
        └── labels
            └── ...

Any downloaded model/loss files with downsample in the name should be placed in data/models/downsample, and any other model files should be placed in data/models.

The uk-switzerland directory can be populated using preprocessing/dataset-combine.sh, and the embeddings and labels directories using preprocessing/generate-embeddings.py. These are described in greater detail below.

Usage

TensorFlow Container

The script tf-container.sh provides a Docker container with the required dependencies for data processing, model training, and the analysis code. Run the script from inside the repository's root directory to ensure volumes are correctly mounted.

If your machine has no GPUs:

  • Modify Dockerfile to use the tensorflow/tensorflow:latest image.
  • Modify tf-container.sh, removing --gpus all.
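
To confirm that TensorFlow can see the GPU from inside the container, a quick check is:

# Prints a non-empty list if TensorFlow has access to at least one GPU.
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))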

SatIQ

The util directory contains the main data processing and model code:

  • data.py contains utilities for data loading and preprocessing.
  • processing.py contains utilities for processing and analysis of results.
  • models.py contains the main model code.
  • model_utils.py contains various helper classes and functions used during model construction and training.

See the data collection, training, and analysis scripts for examples on how to use these files.
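
For orientation, a minimal sketch of reading the TFRecord shards is given below. The feature names and serialization here are assumptions for illustration only; util/data.py contains the actual loading code.

# Illustrative only: the real feature spec lives in util/data.py.
import glob
import tensorflow as tf

def parse(example):
    # Assumed layout: serialized IQ samples plus satellite and cell labels.
    spec = {
        "samples": tf.io.FixedLenFeature([], tf.string),
        "ra_sat": tf.io.FixedLenFeature([], tf.int64),
        "ra_cell": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(example, spec)
    samples = tf.io.parse_tensor(parsed["samples"], out_type=tf.float32)
    return samples, parsed["ra_sat"]

files = glob.glob("data/tfrecord/germany/*.tfrecord")
dataset = tf.data.TFRecordDataset(files).map(parse).batch(32)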

Data Collection

The data-collection directory contains a docker-compose pipeline to receive signals from an SDR, extract Iridium messages, and save the data to a database file. To run under its default configuration, connect a USRP N210 via Ethernet to the host machine, and run the following from inside the data-collection directory:

docker-compose up

Data will be stored in data/db.sqlite3.
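
For a quick sanity check on a collected database, the tables and row counts can be listed without assuming a particular schema (a standalone sketch, not part of the pipeline):

# List every table in the collected database along with its row count.
import sqlite3

con = sqlite3.connect("data/db.sqlite3")
tables = [name for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    (count,) = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    print(f"{table}: {count} rows")
con.close()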

If a different SDR is used, the iridium_extractor configuration may need to be altered. Change the docker-compose.yml to ensure the device is mounted in the container, and modify iridium_extractor/iridium_extractor.py to use the new device as a source.

The autorun.sh and restart.sh scripts are provided for convenience, in order to automate the process of stopping the container and moving the resulting database files to a permanent storage location.

Data Preprocessing

The scripts in the preprocessing directory process the database file(s) into NumPy files, and then TFRecord datasets. It is recommended to run these scripts from within the TensorFlow container described above.

[!NOTE] Converting databases to NumPy files and filtering is only necessary if you are doing your own data collection. If the "SatIQ" dataset is used, no preprocessing is required. If the "Watch This Space" dataset is used, only the np-to-tfrecord.py script is required.

[!IMPORTANT] Please note that these scripts load the full datasets into memory, and will consume large amounts of RAM. It is recommended that you run them on a machine with at least 128GB of RAM.

db-to-tfrecord.py

This script extracts database files and processes them directly into TFRecord files, optionally adding weather data if provided. It should be used in preference to the legacy scripts described below. To run this script, use the command-line arguments as directed by the script itself:

python3 db-to-tfrecord.py --help

db-to-np-multiple.py

This script extracts the database files into NumPy files. To run, adjust path_base if appropriate (this should point to your data directory), and db_indices to point to the databases that need extracting.
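
For example, the variables at the top of the script might look like this (the values are illustrative):

# In db-to-np-multiple.py (illustrative values):
path_base = "../data"      # directory containing the collected databases
db_indices = [0, 1, 2]     # which numbered database files to extract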

The script itself runs with no arguments:

python3 db-to-np-multiple.py

The resulting files will be placed in code/processed (ensure this directory already exists).

np-filter.py

This script normalizes the IQ samples and filters out unusable data. To run, once again adjust path_base if appropriate, and set suffixes to the NumPy file suffixes that need filtering; these will likely match db_indices from the previous step.

The script runs with no arguments:

python3 np-filter.py

The resulting files will be placed in code/filtered (ensure this directory already exists).

np-to-tfrecord.py

This script converts NumPy files into the TFRecord format, for use in model training. To run this script, ensure your data has been processed into NumPy files with the following format:

  • samples_<suffix>.npy
  • ra_sat_<suffix>.npy
  • ra_cell_<suffix>.npy

[!NOTE] The db-to-np-multiple.py script will produce files in this format. The dataset available from Zenodo is also in this format.
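
To sanity-check that a set of input files matches this layout, something like the following can be used (the suffix 000 is illustrative):

# Verify that the three NumPy files line up, one entry per message.
import numpy as np

samples = np.load("samples_000.npy")   # raw IQ message headers
ra_sat = np.load("ra_sat_000.npy")     # transmitting satellite IDs
ra_cell = np.load("ra_cell_000.npy")   # beam/cell IDs

assert len(samples) == len(ra_sat) == len(ra_cell)
print(samples.shape, samples.dtype)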

The script can be used as follows:

python3 np-to-tfrecord.py --path-in <INPUT PATH> --path-out <OUTPUT PATH>

There are also the following optional parameters:

  • --chunk-size <CHUNK SIZE>: number of records in each chunk. The default is 50000; set a smaller value to produce smaller files.
  • -v, --verbose: display progress.
  • --max-files <MAX FILES>: stop after processing the specified number of input files.
  • --skip-files <SKIP FILES>: skip a specified number of input files.
  • --no-shuffle: do not shuffle the data.
  • --by-id: see below.

The --by-id option creates 9 datasets: the first contains only the most common 10% of transmitter IDs, the second contains 20%, and so on. Use this option with care, as it creates a much larger number of files and takes significantly longer to run.

[!WARNING] This script in particular will use a large amount of memory. Ensure your machine has sufficient RAM, particularly when using the --by-id option.
