FedGIE
Analytical Layer-wise Decomposition with Moore-Penrose Pseudoinverse for Stable Gradient-free Federated Learning
FedGIE is a gradient-free federated learning framework. Each layer update is solved as a least-squares problem with a Moore–Penrose pseudoinverse, avoiding backpropagation and black-box gradient estimation. A top-down feedback projection plus a ReLU diagonal Jacobian correction stabilizes update directions under strong Non-IID data. The repository includes both MLP and CNN reference models and supports MNIST, Fashion-MNIST, and CIFAR-10.
All `.py` sources are intentionally comment-free.
Manuscript status: This work has been accepted for publication in Science China Information Sciences.
✨ Features
- Closed-form per-layer updates (weights & bias via least squares with pseudoinverse).
- Top-down feedback projection to supervise lower layers without gradients.
- Activation-aware correction (diagonal Jacobian for ReLU).
- CNN support using `unfold`/`fold` to linearize convolutions for closed-form solutions.
- Federated training loop with broadcast + parameter averaging.
- Configurable Non-IID partitions via Dirichlet sampling.
🧱 Repository Layout
```text
fedgie-multi/
├── README.md
├── requirements.txt
├── train.py
└── fedgie/
    ├── __init__.py
    ├── utils.py
    ├── server.py
    ├── client.py
    ├── data/
    │   ├── __init__.py
    │   └── partition.py
    └── models/
        ├── __init__.py
        ├── mlp.py
        └── cnn.py
```
- `train.py`: entrypoint (CLI, initialization, training, evaluation)
- `fedgie/server.py`: global model, broadcast, aggregation, evaluation
- `fedgie/client.py`: client-side closed-form local updates (`Linear` + `Conv2d`)
- `fedgie/models/`: MLP and CNN reference models
- `fedgie/data/partition.py`: datasets and Dirichlet Non-IID partitioning
🔧 Installation
Requirements
- Python ≥ 3.9
- PyTorch and TorchVision (CPU or CUDA builds)
```shell
python -m venv .venv
. .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
If you need GPU acceleration, install a CUDA-matching PyTorch wheel per the official PyTorch instructions, then install torchvision.
🚀 Quick Start
MLP + MNIST
```shell
python train.py --dataset mnist --model mlp --clients 20 --rounds 100 --batch 32 --alpha 0.6
```
CNN + MNIST
```shell
python train.py --dataset mnist --model cnn --clients 20 --rounds 100 --batch 32 --alpha 0.6
```
CNN + CIFAR-10
```shell
python train.py --dataset cifar10 --model cnn --clients 20 --rounds 100 --batch 32 --alpha 0.6
```
TorchVision will auto-download datasets into ./data/.
⚙️ Command-Line Arguments
| Argument | Type | Default | Description |
|--------------|--------|---------|---------------------------------------------------------------------------------|
| --dataset | str | mnist | One of: mnist, fashion_mnist, cifar10. |
| --model | str | mlp | One of: mlp, cnn. |
| --clients | int | 20 | Number of clients. |
| --rounds | int | 100 | Number of federated rounds. |
| --batch | int | 32 | Local batch size per client update. |
| --alpha | float | 0.6 | Dirichlet Non-IID strength (smaller = more skewed). |
| --seed | int | 42 | Random seed. |
| --device | str | auto | cpu, cuda, or auto (use GPU if available). |
🧠 Method Overview
Goal. Avoid unstable black-box gradient estimation in federated settings by replacing backprop with structured, per-layer least-squares updates.
Per-round, per-client outline:
- Run a single forward pass and cache each layer's input `h` and pre-activation `z`.
- At the top layer, define a target matrix `F` (e.g., one-hot labels, spatially broadcast for CNN).
- Solve a bias-augmented linear regression in closed form: build `Ĥ = [1; Hᵀ]`, compute `Ŵ = F · pinv(Ĥ)`, then extract `W = Ŵ[:, 1:]` and `b = Ŵ[:, 0]`.
- Compute a top-down feedback signal for the previous layer by pseudo-inverting the updated mapping and applying the ReLU diagonal Jacobian (element-wise mask on positive pre-activations).
- Repeat for all layers down to the input.
- Return local weights to the server; the server averages parameters to form the new global model.
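The bias-augmented least-squares step above can be sketched as follows. This is a minimal illustration, not the repository's exact code; the function name and the assumption that `H` stores samples as rows are ours:

```python
import torch

def closed_form_layer_update(H, F):
    # H: cached layer input, shape (n_samples, d_in)   -- assumed layout
    # F: target matrix,      shape (d_out, n_samples)
    n = H.shape[0]
    ones = torch.ones(1, n, dtype=H.dtype)
    H_aug = torch.cat([ones, H.T], dim=0)    # Ĥ = [1; Hᵀ], shape (d_in + 1, n)
    W_aug = F @ torch.linalg.pinv(H_aug)     # Ŵ = F · pinv(Ĥ), shape (d_out, d_in + 1)
    W, b = W_aug[:, 1:], W_aug[:, 0]         # W = Ŵ[:, 1:], b = Ŵ[:, 0]
    return W, b
```

When `n` exceeds `d_in + 1` and `Ĥ` has full row rank, this recovers the exact least-squares minimizer of `‖WH + b − F‖`.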
CNN specifics. For Conv2d, use torch.nn.functional.unfold to produce local receptive-field matrices, solve the linear system in closed form, then use fold to project the feedback back to feature maps.
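The `unfold`-based linearization can be sketched as below. This is an illustrative reconstruction for stride-1, same-padding convolutions, not the repo's `client.py`; the function name and tensor layouts are assumptions:

```python
import torch
import torch.nn.functional as F_nn

def conv_closed_form_update(x, target, kernel_size=3, padding=1):
    # x:      input feature maps,   shape (n, c_in, h, w)
    # target: desired output maps,  shape (n, c_out, h, w)  (stride 1, same padding)
    n, c_in, h, w = x.shape
    c_out = target.shape[1]
    # unfold extracts local receptive fields: (n, c_in*k*k, L), L = h*w here
    cols = F_nn.unfold(x, kernel_size, padding=padding)
    A = cols.permute(1, 0, 2).reshape(cols.shape[1], -1)   # (c_in*k*k, n*L)
    B = target.permute(1, 0, 2, 3).reshape(c_out, -1)      # (c_out,    n*L)
    ones = torch.ones(1, A.shape[1], dtype=A.dtype)
    A_aug = torch.cat([ones, A], dim=0)                    # bias-augmented system
    W_aug = B @ torch.linalg.pinv(A_aug)                   # closed-form solve
    weight = W_aug[:, 1:].reshape(c_out, c_in, kernel_size, kernel_size)
    bias = W_aug[:, 0]
    return weight, bias
```

With enough spatial locations the system is overdetermined, and the recovered `weight`/`bias` reproduce the convolution that generated `target`.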
📊 Datasets & Partitioning
- Datasets: MNIST, Fashion-MNIST, CIFAR-10 (auto-downloaded to `./data/`).
- Non-IID split: Dirichlet(α) over label distributions into `--clients` partitions. Lower `--alpha` → stronger heterogeneity.
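A common way to implement Dirichlet label partitioning is sketched below; the repo's `fedgie/data/partition.py` may differ in details, and the function name here is ours:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=42):
    # For each class, draw client proportions from Dirichlet(alpha) and
    # split that class's (shuffled) indices accordingly.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(n_clients))  # class-c share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

Smaller `alpha` concentrates each class on fewer clients, producing more skewed local label distributions.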
🔎 Reproducibility
- Use `--seed` to fix randomness.
- The script prints test accuracy each round:
```text
round=1 acc=0.8123
round=2 acc=0.8410
...
```
- Tip: redirect logs for analysis: `python train.py ... | tee run.log`
🧩 Extending the Project
Add a new model
- Create a file under `fedgie/models/` (e.g., `resnet.py`) exposing:
  - `layers`: list of modules to be updated in order (e.g., `Linear`/`Conv2d`).
  - `activations`: list of activation names aligned with `layers` (e.g., `["relu", "relu", "none"]`).
  - `forward(x)` and `forward_cache(x)` returning `(h_list, z_list)`.
Add a new dataset
- Extend `get_dataset` in `fedgie/data/partition.py` to return `(train, test, num_classes, in_dim_or_none)`.
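A hypothetical model skeleton following the interface described above (the class name and dimensions are illustrative, not from the repo):

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, num_classes=10):
        super().__init__()
        self.layers = [nn.Linear(in_dim, hidden), nn.Linear(hidden, num_classes)]
        self.activations = ["relu", "none"]
        self._mods = nn.ModuleList(self.layers)  # register parameters

    def forward(self, x):
        h = x.flatten(1)
        for layer, act in zip(self.layers, self.activations):
            z = layer(h)
            h = torch.relu(z) if act == "relu" else z
        return h

    def forward_cache(self, x):
        # Cache each layer's input h and pre-activation z for closed-form updates.
        h_list, z_list = [], []
        h = x.flatten(1)
        for layer, act in zip(self.layers, self.activations):
            h_list.append(h)
            z = layer(h)
            z_list.append(z)
            h = torch.relu(z) if act == "relu" else z
        return h_list, z_list
```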
⚠️ Known Limitations
- Memory/compute: `torch.linalg.pinv` may be heavy for large layers; reduce `--batch` or model width if needed.
- Pooling/strides: The CNN example focuses on a minimal consistent setup. When adding pooling or different strides/dilations, ensure `unfold`/`fold` parameters exactly match the convolution configuration.
- Aggregation: Default is uniform parameter averaging; you may replace it with data-size weighted averaging.
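Data-size weighted averaging can be sketched as a drop-in replacement for the uniform average; the function name and state-dict layout are assumptions, not the repo's `server.py`:

```python
import torch

def weighted_average(state_dicts, num_samples):
    # Average client state dicts with weights proportional to local data size.
    total = sum(num_samples)
    weights = [n / total for n in num_samples]
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return avg
```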
📦 Requirements
requirements.txt contains:
```text
torch
torchvision
```
For GPU builds, install CUDA-compatible wheels as per PyTorch’s official guide.
❓ FAQ
Q: Why no backprop or optimizer?
A: Each layer update is a closed-form least-squares solution, so no gradient steps are needed.
Q: How is the classification target formed?
A: We use one-hot labels (or their spatially broadcast version for CNN), then propagate top-down with pseudoinverse and activation-aware correction.
Q: Does it support GPU?
A: Yes. Set --device cuda or leave --device auto to use GPU if available.
📜 License & Citation
- License: Add a `LICENSE` file of your choice (e.g., MIT) at the repository root.
- Citation: If this repository is useful in your research or product, please cite it. Example:
```bibtex
@misc{fedgie2025,
  title  = {Analytical Layer-wise Decomposition with Moore--Penrose Pseudoinverse for Stable Gradient-Free Federated Learning},
  author = {Ruoyan XIONG and Yuepeng LI and Zhexiong LI and Lin GU and Deze ZENG and Quan CHEN and Minyi GUO},
  year   = {2025},
  note   = {Code available at: https://github.com/AINetworkLab/FedGIE}
}
```
📬 Contact
Feel free to contact us at ryxiong@cug.edu.cn.
