CrimeLinkageSiamese
Official implementation of the AAAI 2026 paper “Enhancing Binary-Encoded Crime Linkage Analysis Using Siamese Networks”.
Imperial College London | University of Birmingham | University of Leicester | UK National Crime Agency
Yicheng Zhan<sup>1</sup>, Fahim Ahmed<sup>1</sup>, Amy Burrell<sup>2</sup>, Matthew Tonkin<sup>3</sup>, Sarah Galambos<sup>4</sup>, Jessica Woodhams<sup>2</sup>, Dalal Alrajeh<sup>1</sup>
<sup>1</sup>Imperial College London, <sup>2</sup>University of Birmingham, <sup>3</sup>University of Leicester, <sup>4</sup>UK National Crime Agency
@article{zhan2026enhancing,
  author  = {Zhan, Yicheng and Ahmed, Fahim and Burrell, Amy and Tonkin, Matthew and Galambos, Sarah and Woodhams, Jessica and Alrajeh, Dalal},
  title   = {Enhancing Binary Encoded Crime Linkage Analysis Using Siamese Network},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume  = {40},
  number  = {46},
  pages   = {39576--39584},
  year    = {2026},
  month   = mar,
  doi     = {10.1609/aaai.v40i46.41309},
  url     = {https://ojs.aaai.org/index.php/AAAI/article/view/41309}
}
Overview
We propose a Siamese Autoencoder framework for crime linkage analysis that learns meaningful latent representations from high-dimensional, sparse, binary-encoded crime data. Using the Violent Crime Linkage Analysis System (ViCLAS) dataset from the UK National Crime Agency, our approach integrates geographic-temporal features at the decoder stage to amplify behavioral representations, achieving up to 9% AUC improvement over traditional methods.
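To make the evaluation metric concrete, here is a minimal sketch, in plain Python, of how an AUC can be computed from pairwise distances in a latent space: a linked pair should sit closer together than an unlinked pair. The latent vectors, pair labels, and function names are made up for illustration and are not taken from the repository, which uses PyTorch and its own `EuclideanDistance` utility.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length latent vectors."""
    return math.dist(a, b)  # available from Python 3.8


def pairwise_auc(distances, linked):
    """AUC under the convention 'smaller distance means linked'.

    Equals the probability that a randomly chosen linked pair has a
    smaller distance than a randomly chosen unlinked pair (ties count 0.5).
    """
    pos = [d for d, y in zip(distances, linked) if y]
    neg = [d for d, y in zip(distances, linked) if not y]
    if not pos or not neg:
        raise ValueError("need at least one linked and one unlinked pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 2-D latent codes for four crimes (illustration only).
latent = {
    "A": [0.0, 0.0],
    "B": [0.1, 0.0],
    "C": [1.0, 1.0],
    "D": [0.9, 1.2],
}
# (crime_i, crime_j, is_linked): toy labels, not real data.
pairs = [("A", "B", True), ("C", "D", True), ("A", "C", False), ("B", "D", False)]

distances = [euclidean_distance(latent[i], latent[j]) for i, j, _ in pairs]
labels = [y for _, _, y in pairs]
auc = pairwise_auc(distances, labels)
print(f"AUC on toy pairs: {auc:.2f}")
```

The quadratic loop is fine for a toy example; for real pair counts a rank-based implementation such as `sklearn.metrics.roc_auc_score` (applied to negated distances) is the usual choice.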
Installation
Clone the repository and install dependencies:
git clone https://github.com/AlberTgarY/CrimeLinkageSiamese.git
cd CrimeLinkageSiamese
pip install torch numpy pandas scikit-learn tqdm openpyxl matplotlib --break-system-packages
Requirements:
- Python 3.8+
- PyTorch 1.9+
- 16GB RAM minimum (32GB recommended)
Quick Start
Training
Train the Siamese Autoencoder on your crime linkage dataset:
import pandas as pd
from Utils import EuclideanDistance
from train import evaluate_siamese
# Load data
df = pd.read_csv("your_crime_data.csv", index_col="VA_ID")
# Train model
evaluate_siamese(
    df,
    behaviours_list=None,  # Use all behavioral features
    contexts_list=None,  # Use all contextual features
    euclidean_distance=EuclideanDistance(),
    split_mode="random",
    num_epochs=2,
    batch_size=128,
    learning_rate=1e-3,
    weight_recon=1e4,
    repeat=1,
    data="all",
    name="siamese_model",
    dir_geo=None,  # Optional: path to geographic-temporal data
    device="cuda",  # or "cpu"
)
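The training call above reads a table indexed by `VA_ID`, and the method targets binary-encoded data. As a rough sanity check before training, the sketch below scans a CSV and reports columns that contain values other than 0 or 1. The helper name `find_non_binary_columns` and the toy column names are hypothetical and not part of this repository; a real dataset may include contextual columns that are legitimately non-binary, so treat the result as a prompt for inspection rather than an error.

```python
import csv
import io


def find_non_binary_columns(csv_file, index_col="VA_ID"):
    """Return sorted names of columns (excluding index_col) holding values outside {"0", "1"}.

    Values such as "1.0" are flagged too; loosen the check if your export
    writes floats.
    """
    reader = csv.DictReader(csv_file)
    bad = set()
    for row in reader:
        for name, value in row.items():
            if name is None:
                raise ValueError("a row has more fields than the header")
            if name == index_col:
                continue
            # A missing field is read as None and is reported as non-binary.
            if value is None or value.strip() not in {"0", "1"}:
                bad.add(name)
    return sorted(bad)


# Toy file with made-up column names (illustration only).
toy_csv = io.StringIO(
    "VA_ID,behaviour_1,behaviour_2,context_age\n"
    "1001,1,0,34\n"
    "1002,0,1,27\n"
)
non_binary = find_non_binary_columns(toy_csv)
print("Columns with non-binary values:", non_binary)
```

For a file on disk, pass an open handle instead, for example `open("your_crime_data.csv", newline="")`.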
Testing
Evaluate a trained model:
from test import test_siamese
# Test model (df is the DataFrame loaded in the Training example)
test_siamese(
    df,
    behaviours_list=None,
    contexts_list=None,
    model_path="map5.pt",  # Path to trained model
    data="all",
    device="cuda",  # or "cpu"
    name="map5_evaluation",
)
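Both calls above hard-code `device="cuda"`, which presumably fails on a machine without a usable GPU. A minimal sketch for choosing the device at runtime follows; the helper name `pick_device` is hypothetical and not part of this repository.

```python
def pick_device():
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; training cannot run, but report a safe default.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Selected device:", device)
```

The result can then be passed as `device=device` to `evaluate_siamese` or `test_siamese`.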
Geographic-Temporal Integration
To incorporate geographic-temporal features:
# Process geographic-temporal data
from geo_loader import GeoDataSaver
saver = GeoDataSaver("path/to/geo_data.csv")
saver.save_pt("output/directory/")
# Train with geo-temporal integration
evaluate_siamese(
    df,
    ...,
    dir_geo="path/to/geo_directory",  # Enable decoder-stage integration
    device="cuda",
)
Data Access
Important: Due to strict confidentiality and data-sharing agreements with the UK National Crime Agency, the ViCLAS dataset cannot be publicly shared.
Requesting Access:
Researchers interested in accessing ViCLAS data should:
- Submit a formal data access request to the Serious Crime Analysis Section (SCAS) of the UK National Crime Agency
- Provide detailed research proposals outlining intended use
- Comply with all ethical requirements and data protection regulations
- Obtain necessary institutional approvals
Contact: UK National Crime Agency, Serious Crime Analysis Section
Website: https://www.nationalcrimeagency.gov.uk/
Access to the data used in this research was granted through requests R123, R128, R182a, and R182b.
Ethics Statement
This research implements safeguards for responsible deployment:
- Human-in-the-Loop: The system supports, rather than replaces, investigative decision-making
- Routine Bias Audits: Continuous monitoring for demographic/geographic disparities
- Transparent Evaluation: Clear communication of performance, assumptions, and limitations
- Continuous Adaptation: Periodic retraining to address temporal distribution shifts
This approach aligns with the National Police Chiefs' Council Covenant for Using AI in Policing.
Acknowledgments
This research was partially supported by funding from the National Crime Agency, UK. We thank the analysts from the Serious Crime Analysis Section at the National Crime Agency for their valuable assistance with data preparation and insightful feedback.
License
This code is provided for academic research purposes only. Commercial use requires explicit permission from the authors and the UK National Crime Agency.
Contact
For questions about the code or methodology:
- Dalal Alrajeh: dalal.alrajeh04@imperial.ac.uk
For data access inquiries, please contact the UK National Crime Agency directly.
