TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance
Accepted at the 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) [arXiv Paper Link]
For questions or feedback, please contact waris@vt.edu. The code is written using the Flower FL framework, the most widely used FL framework.
1. TraceFL
TraceFL is a tool designed to provide interpretability in Federated Learning (FL) by identifying clients responsible for specific predictions made by a global model.

1.1 Overview
Federated Learning (FL) enables multiple clients (e.g., hospitals) to collaboratively train a global model without sharing their raw data. However, this distributed and privacy-preserving setup makes it challenging to attribute a model's predictions to specific clients. Understanding which clients are most responsible for a model's output is crucial for debugging, accountability, and incentivizing high-quality contributions.
TraceFL addresses this challenge by dynamically tracking the significance of neurons in a global model's prediction and mapping them back to the corresponding neurons in each participating client's model. This process allows FL developers to localize the clients most responsible for a prediction without accessing their raw training data.
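The sketch below illustrates this idea conceptually; it is not TraceFL's actual implementation. It captures neuron activations for a single global-model prediction with forward hooks, then scores each client by how strongly its update touches the neurons that were active for that prediction. All names here (`global_model`, `client_updates`, and the scoring rule itself) are assumptions made for illustration.

```python
import torch

# Conceptual sketch of neuron provenance (illustrative, not TraceFL's code).
# Idea: find which neurons drive a global-model prediction, then credit the
# clients whose model updates contributed most to those neurons.
def provenance_scores(global_model, client_updates, x):
    """client_updates: dict of client_id -> {param_name: delta tensor}."""
    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in global_model.named_modules()
               if isinstance(m, torch.nn.Linear)]
    global_model(x)  # forward pass populates `activations`
    for h in handles:
        h.remove()

    scores = {}
    for cid, update in client_updates.items():
        score = 0.0
        for name, act in activations.items():
            w_delta = update.get(f"{name}.weight")
            if w_delta is None:
                continue
            # Weight each neuron's update magnitude by how active that
            # neuron was for this input (a crude stand-in for TraceFL's
            # dynamic significance tracking).
            neuron_importance = act.abs().mean(dim=0)     # per output neuron
            update_magnitude = w_delta.abs().mean(dim=1)  # per output neuron
            score += float((neuron_importance * update_magnitude).sum())
        scores[cid] = score
    return scores  # highest score = most responsible client
```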
1.2 Key Features
- Neuron Provenance: A novel technique that tracks the flow of information from individual clients to the global model, identifying the most influential clients for each prediction.
- High Accuracy: TraceFL achieves 99% accuracy in localizing responsible clients in both image and text classification tasks.
- Wide Applicability: Supports multiple neural network architectures, including CNNs (e.g., ResNet, DenseNet) and any transformer model from the Hugging Face library (e.g., BERT, GPT).
- Scalability and Robustness: Efficiently scales to thousands of clients and maintains high accuracy under varying data distributions and differential privacy settings.
- No Client-Side Instrumentation Required: Runs entirely on the central server, without needing access to clients' training data or modifications to the underlying fusion algorithm (see the server-side sketch after this list).
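Because everything TraceFL needs is already visible at aggregation time, a server-side hook suffices. The following is a minimal Flower sketch of capturing per-client updates inside a custom strategy; the class and its bookkeeping are assumptions for illustration, not TraceFL's actual strategy.

```python
import flwr as fl
from flwr.common import parameters_to_ndarrays

# Illustrative sketch: record each client's update on the server at
# aggregation time (no client-side changes), as a provenance hook might.
# This is an assumption for illustration, not TraceFL's actual strategy.
class ProvenanceFedAvg(fl.server.strategy.FedAvg):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.round_updates = {}  # round -> {client_id: list of ndarrays}

    def aggregate_fit(self, server_round, results, failures):
        self.round_updates[server_round] = {
            client.cid: parameters_to_ndarrays(fit_res.parameters)
            for client, fit_res in results
        }
        # Defer the actual aggregation to standard FedAvg.
        return super().aggregate_fit(server_round, results, failures)
```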
2. Running TraceFL
The `.sh` scripts (e.g., `job_training_all_exps.sh`) and `TraceFL/tracefl/conf/base.yaml` provided in this artifact can be used to regenerate any experiment results presented in the paper.
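As one way to adjust a run before launching a script, the snippet below is a minimal sketch that assumes the config is plain YAML readable with OmegaConf (which the `conf/base.yaml` layout suggests); the key names `num_clients`, `num_rounds`, and `dirichlet_alpha` are hypothetical placeholders, not confirmed keys.

```python
# Minimal sketch: load and tweak the experiment config before a run.
# Assumes OmegaConf (pip install omegaconf); the key names below are
# hypothetical examples, not confirmed keys from base.yaml.
from omegaconf import OmegaConf

cfg = OmegaConf.load("TraceFL/tracefl/conf/base.yaml")

# Override a few illustrative fields for a quick smoke test.
cfg.num_clients = 10        # hypothetical key
cfg.num_rounds = 2          # hypothetical key
cfg.dirichlet_alpha = 0.3   # hypothetical key

OmegaConf.save(cfg, "TraceFL/tracefl/conf/smoke_test.yaml")
print(OmegaConf.to_yaml(cfg))
```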
The experiments cover various aspects of federated learning, including:
- Image and Text Classification: Evaluating the performance of different models and datasets in federated settings.
- Differential Privacy: Analyzing the impact of differential privacy on model training and TraceFL's localizability.
- Scalability: Testing the scalability of TraceFL with varying numbers of clients and rounds.
- Dirichlet Alpha Tuning: Exploring the effects of different Dirichlet alpha values on data distribution, TraceFL's localizability, and model performance.
2.1 Experiments Configuration Overview
- Image Classification:
- Models: ResNet18, DenseNet121
- Datasets: MNIST, CIFAR-10, PathMNIST, OrganAMNIST
- Number of Rounds: 25-50
- Text Classification:
- Models: OpenAI GPT, Google BERT
- Datasets: DBPedia, Yahoo Answers
- Number of Rounds: 25
2.2 Differential Privacy Analysis
These experiments evaluate the impact of differential privacy on TraceFL by applying different noise levels and clipping norms; a sketch of the clip-and-noise aggregation step follows the list below.
- Models: DenseNet121, OpenAI GPT
- Datasets: MNIST, PathMNIST, DBPedia
- Noise Levels: 0.0001, 0.0003, 0.0007, 0.0009, 0.001, 0.003
- Clipping Norms: 15, 50
- Number of Rounds: 15
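For orientation, here is a minimal sketch of the usual DP-FL recipe these settings feed into: clip each client update to a norm bound, average, and add Gaussian noise scaled by the clipping norm. It is an illustration under that assumption, not TraceFL's or Flower's exact mechanism.

```python
import torch

# Minimal sketch of a DP-style aggregation step (illustrative only):
# clip each client's update to a norm bound, average across clients,
# then add Gaussian noise scaled by the clipping norm.
def dp_aggregate(client_updates, clip_norm=15.0, noise_level=0.0003):
    clipped = []
    for update in client_updates:            # each update: list of tensors
        flat = torch.cat([u.flatten() for u in update])
        scale = torch.clamp(clip_norm / (flat.norm() + 1e-12), max=1.0)
        clipped.append([u * scale for u in update])

    aggregated = []
    for layer_updates in zip(*clipped):      # layer-wise mean over clients
        mean = torch.stack(layer_updates).mean(dim=0)
        noise = torch.randn_like(mean) * noise_level * clip_norm
        aggregated.append(mean + noise)
    return aggregated
```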
2.3 Scalability Experiments
Scalability tests involve running experiments with varying numbers of clients and rounds to assess how well TraceFL scales.
- Models: OpenAI GPT
- Dataset: DBPedia
- Number of Clients: 200, 400, 600, 800, 1000
- Clients per Round: 10, 20, 30, 40, 50
- Number of Rounds: 15, 100
2.4 Dirichlet Alpha Experiments
These experiments explore the effect of different Dirichlet alpha values on data partitioning, model training, and TraceFL's localizability; a minimal partitioning sketch follows the list below.
- Models: OpenAI GPT, DenseNet121
- Datasets: Yahoo Answers, DBPedia, PathMNIST, OrganAMNIST, MNIST, CIFAR-10
- Dirichlet Alpha Values: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
- Number of Clients: 100
- Clients per Round: 10
- Number of Rounds: 15
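For reference, Dirichlet partitioning works as follows: for each class, a draw from Dir(alpha) decides what fraction of that class each client receives, so smaller alpha values yield more skewed (non-IID) clients. Below is a minimal NumPy sketch of this standard scheme; it is illustrative and not necessarily the partitioner this repo uses (Flower ships its own).

```python
import numpy as np

# Minimal sketch of Dirichlet label partitioning (illustrative).
# Smaller alpha -> more skewed, non-IID client data.
def dirichlet_partition(labels, num_clients=100, alpha=0.3, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        # Fraction of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for cid, part in enumerate(np.split(idx, cuts)):
            client_indices[cid].extend(part.tolist())
    return client_indices

# Example: partition 10,000 ten-class labels across 100 clients.
parts = dirichlet_partition(np.random.randint(0, 10, 10_000))
```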
2.5 Results and Log Files
Each experiment's output is logged in the `logs` directory, providing detailed information about the training process and results.
3. Potential Use Cases of TraceFL
- Debugging and Fault Localization: Identify and isolate faulty or malicious clients responsible for incorrect or suspicious predictions in federated learning models.
- Enhancing Model Quality, Fairness, and Incentivization: Improve model performance by rewarding high-quality clients, ensuring fair client contributions, and incentivizing continued participation from beneficial clients.
- Client Accountability and Security: Increase accountability by tracing model decisions back to specific clients, deterring malicious behavior, and ensuring secure contributions.
- Optimized Client Selection and Efficiency: Dynamically select the most beneficial clients for training to enhance model performance and reduce communication overhead.
- Interpretable Federated Learning in Sensitive Domains: Provide transparency and interpretability in federated learning models, crucial for compliance, trust, and ethical considerations in domains like healthcare and finance.
4. Citation
BibTeX:
@inproceedings{gill2025tracefl,
  title        = {{TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance}},
  author       = {Gill, Waris and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle    = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  year         = {2025},
  organization = {IEEE},
}