# Comgra: Computation Graph Analysis

<p align="center"> <img src="src/assets/brandcrowd_logos/FullLogo.png" title="ComgraLogo" height="300" width="300"/> </p>

- Overview
- Installation
- Usage
- Tutorial
- Other Features
- Custom Visualization
- Dynamic Recordings
- Known Issues
- Future Development: Anomaly Detection and Correlation Analysis
## Overview
Comgra helps you analyze and debug neural networks in pytorch.
It records your network internals, visualizes the computation graph, and provides a GUI to investigate any part of your network from a variety of viewpoints.
Move along the computation graph, check for outliers, investigate both individual data points and summary statistics, compare gradients, automatically record special cases, and more.
Comgra records everything that could be relevant to you, and allows you to inspect your network's behavior from many different angles.
Suitable both for novices and for professional neural architecture designers: Create a simple visualization of your network to understand what is happening under the hood, or perform advanced analyses and trace anomalies through the computation graph.
<img src="src/assets/screenshots_for_tutorial/main_overview.png" width="100%"/>
Comgra's GUI has three parts:
- A dependency graph that visualizes how the tensors in your network depend on each other
- Selectors that let you choose under what lens you want to inspect the tensors
- An output that lists both summary statistics and the values of individual neurons for the selected tensors
Each rectangle in the dependency graph is a node that represents a named tensor. The colors indicate the roles of the tensor in the network, such as input, intermediate result, parameter, etc.
When you select a node it becomes highlighted, along with all nodes that it depends on (to the left) and that depend on it (to the right). Only the links for the selected node are shown by default to avoid visual clutter, but by clicking on one node after the other you can explore the entire dependency graph.
If a node has a dotted border on one side, this indicates that it has no dependencies (left) or no dependents (right) in that iteration. If a connection is drawn with a thinner line, some of the tensors in the node have this connection, but the currently selected one does not. In the example network of the tutorial, this is the case for the node 'subnet_pre', which summarizes all four parameters of the module with that name. You can use the "Role of Tensor" selector to switch to another parameter in that module, which will change the connections.
The dependency graph is generated automatically based on the computation graph used by pytorch and the names you assign to tensors through comgra. It is a subgraph of the computation graph, but it is much easier to understand because it is smaller and skips all the distracting details.
This cutting away of details also makes it easier to compare different variants of architectures: Their computation graphs may look different, but the simplified dependency graphs are the same.
## Installation

Install with pip:

```
pip install comgra
```
## Usage
To use comgra, modify your python code with the following commands in the appropriate places. Most of it just tells comgra what you are currently doing so that it knows how to associate the tensors you register. The file src/scripts/run.py (found here) contains a documented example that you can copy and that will be explained in detail below.
```python
import comgra
from comgra.recorder import ComgraRecorder
# Define a recorder
comgra.my_recorder = ComgraRecorder(...)
# Track your network parameters
comgra.my_recorder.track_module(...)
# Optionally, add some notes for debugging
comgra.my_recorder.add_note(...)
# Optionally, record KPIs (like Tensorboard)
comgra.my_recorder.record_kpi_in_graph()
# Call this whenever you start a new training step you want to record.
# Each training step may be composed of multiple iterations.
comgra.my_recorder.start_batch(...)
# Call this whenever you start the forward pass of an iteration:
comgra.my_recorder.start_iteration(...)
# Register any tensors you may want to investigate:
comgra.my_recorder.register_tensor(...)
# Create some additional, optional connections for cases where the
# computation graph does not fully reflect the connections you want to see,
# e.g. because of detach() commands or non-differentiable dependencies.
comgra.my_recorder.add_tensor_connection(...)
# Call these whenever you apply losses and propagate gradients:
comgra.my_recorder.record_current_gradients(...)
# Call this whenever you end an iteration:
comgra.my_recorder.finish_iteration()
# Call this whenever you end a training step:
comgra.my_recorder.finish_batch()
# Call this when you are done
comgra.my_recorder.finalize()
```
Not all of these commands are necessary. The following is a minimal example, which you can also run directly on Colab.
```python
import torch
import torch.nn as nn
import torch.optim as optim

import comgra
from comgra.objects import DecisionMakerForRecordingsFrequencyPerType
from comgra.recorder import ComgraRecorder

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.layer0 = nn.Linear(5, 5)
        self.layer1 = nn.Linear(5, 5)

    def forward(self, x):
        x = self.layer0(x)
        return self.layer1(x)

# Initialize comgra
comgra.my_recorder = ComgraRecorder(
    comgra_root_path="/my/path/for/storing/data",
    group="name_of_experiment_group",
    trial_id="example_trial",
    decision_maker_for_recordings=DecisionMakerForRecordingsFrequencyPerType(min_training_steps_difference=10),
)
comgra.my_recorder.add_note("This is an optional log message that will show up in the 'Notes' tab.")

# Create model, loss function, and optimizer
model = SimpleModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
comgra.my_recorder.track_module("main_model", model)

# Generate some dummy data
inputs = torch.randn(100, 5)
targets = 2 * inputs + torch.randn(100, 5) * 0.1

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    comgra.my_recorder.start_batch(epoch, inputs.shape[0])
    comgra.my_recorder.start_iteration()
    # Forward
    comgra.my_recorder.register_tensor("inputs", inputs, is_input=True)
    outputs = model(inputs)
    comgra.my_recorder.register_tensor("outputs", outputs)
    comgra.my_recorder.register_tensor("targets", targets, is_target=True)
    loss = criterion(outputs, targets)
    comgra.my_recorder.register_tensor("loss", loss, is_loss=True)
    comgra.my_recorder.record_kpi_in_graph("loss", "", loss)
    # Backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    comgra.my_recorder.record_current_gradients("gradients")
    comgra.my_recorder.finish_iteration()
    comgra.my_recorder.finish_batch()

comgra.my_recorder.finalize()
```
When your code runs, comgra will store data in the folder you specified with ComgraRecorder(comgra_root_path="/my/path/for/storing/data", group="name_of_experiment_group").
In the process, it will automatically organize everything, extract statistics, and build the dependency graph.
To start the GUI and visualize your results, run

```
comgra --path "/my/path/for/storing/data/name_of_experiment_group"
```

Note that `--path` should include both the `comgra_root_path` and the `group` parameter you gave to `ComgraRecorder`. You can start the GUI while the script is still running, and it will automatically load new data as it becomes available.
## Tutorial - Debugging an Example Network
The file src/scripts/run.py (found here) trains a neural network on an example task. This network contains a subtle bug, and in this tutorial we will show you how you can use comgra to find that bug.
For convenience, you can run the file from the command line using

```
comgra-test-run
```
The results of that run will be stored in a local folder of the library. You can start the GUI on this data by running

```
comgra --use-path-for-test-run
```
You can also check out the GUI on Colab, with pre-generated data.
### The Task and the Architecture
We use a synthetic task that is designed to test a neural network's ability to generalize to longer sequences, while being very simple and human-interpretable. The input is a sequence of N tuples of 5 numbers between -1.0 and 1.0. The network should treat these as 5 separate, independent sequences. Its objective is to sum up each of the sequences and decide if their sum is positive. The target consists of 5 numbers, one for each sequence, which is a 1 if the sum is positive and a 0 otherwise.
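The task description above can be sketched as a small data generator. This is an illustrative helper only, not the actual code from run.py; the function name and batch layout are assumptions:

```python
import torch

def generate_task_batch(batch_size, sequence_length):
    # A sequence of N tuples of 5 independent numbers,
    # each drawn uniformly from [-1.0, 1.0]
    inputs = torch.rand(batch_size, sequence_length, 5) * 2 - 1
    # For each of the 5 independent sequences:
    # the target is 1 if its sum is positive, 0 otherwise
    targets = (inputs.sum(dim=1) > 0).float()
    return inputs, targets

inputs, targets = generate_task_batch(batch_size=32, sequence_length=10)
```

Because the 5 sequences are independent and the rule is trivial for a human, any failure to generalize to longer sequences is easy to attribute to the network rather than to the task.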
Our architecture is a simple recurrent neural network that is composed of three submodules. It's nothing fancy, but illustrates how comgra can be integrated into an architecture.
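As a rough mental model of such a network, a recurrent architecture with three submodules could look like the following sketch. The submodule names, the hidden size, and the update rule are illustrative assumptions, not the actual code from run.py:

```python
import torch
import torch.nn as nn

class RecurrentNet(nn.Module):
    """Illustrative three-submodule recurrent network (names and sizes are assumptions)."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.hidden_size = hidden_size
        self.subnet_in = nn.Linear(5 + hidden_size, hidden_size)  # combines input tuple and state
        self.subnet_pre = nn.Linear(hidden_size, hidden_size)     # updates the recurrent state
        self.subnet_out = nn.Linear(hidden_size, 5)               # one output per sequence

    def forward(self, x):
        # x has shape (batch, sequence_length, 5)
        h = x.new_zeros(x.shape[0], self.hidden_size)
        for t in range(x.shape[1]):
            h = torch.tanh(self.subnet_pre(self.subnet_in(torch.cat([x[:, t], h], dim=-1))))
        # One sigmoid output per independent sequence
        return torch.sigmoid(self.subnet_out(h))
```

With `track_module` and a `register_tensor` call per named intermediate, each submodule shows up as a group of nodes in comgra's dependency graph.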
We run two variants of the architecture. The original variant contains a bug, which we will discover later in this section of the Readme. For convenience, we run both trials in one script, but in a real use case the second variant would only have been implemented after the bug in the first was found.
