
Thoad

Lightweight, performant Python 3.12+ automatic differentiation system that leverages PyTorch’s computational graph to compute arbitrary-order partial derivatives.

Install / Use

/learn @mntsx/Thoad

README

<!-- # PyTorch High Order AutoDifferentiator (thoad) --> <!-- # $\text{{Py}{\color{#EE4C2C}T}{orch }{\color{#EE4C2C}H}{igh }{\color{#EE4C2C}O}{rder }{\color{#EE4C2C}A}{utomatic }{\color{#EE4C2C}D}{ifferentiation}}$ --> <img src="title.svg" alt="Logo" width="100%"/> <div align="center">

License: MIT PyPi version Python 3.12+ PyTorch 2.4+

</div> <br>

> [!NOTE]
> This package is still in an experimental stage. It may exhibit unstable behavior or produce unexpected results, and is subject to possible future structural modifications.

<br>

About

thoad is a lightweight reverse-mode automatic differentiation engine written entirely in Python that works over PyTorch’s computational graph to compute high-order partial derivatives. Unlike PyTorch’s native autograd, which is built around first-order partial derivatives, thoad can performantly propagate arbitrary-order derivatives throughout the graph, enabling more advanced derivative-based workflows.

Core Features

  • Python 3.12+: thoad is implemented in Python 3.12 and is compatible with any higher Python version.
  • Built on PyTorch: thoad uses PyTorch as its only dependency. It is compatible with 70+ PyTorch operator backward functions.
  • Arbitrary-Order Differentiation: thoad can compute arbitrary-order partial derivatives, including cross-node derivatives.
  • Adoption of the PyTorch Computational Graph: thoad integrates with PyTorch tensors by adopting their internally traced subgraphs.
  • High Performance: thoad's Hessian computation time scales asymptotically better than torch.autograd's, remaining closer to jax.jet performance.
  • Non-Sequential Graph Support: Unlike jax.jet, thoad supports differentiation on arbitrary graph topologies, not just sequential chains.
  • Non-Scalar Differentiation: Unlike torch.Tensor.backward, thoad allows launching differentiation from non-scalar tensors.
  • Support for Backward Hooks: thoad allows registering backward hooks for dynamic tuning of propagated high-order derivatives.
  • Diagonal Optimization: thoad detects and avoids duplication of cross-diagonal dimensions during back-propagation.
  • Symmetry Optimization: Leveraging Schwarz’s theorem, thoad removes redundant derivative block computations.
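
The symmetry optimization rests on Schwarz’s theorem: for sufficiently smooth functions, mixed partials are equal regardless of differentiation order, so each unordered combination of derivative indices only needs to be computed once:

```math
\frac{\partial^2 f}{\partial x_i \,\partial x_j} \;=\; \frac{\partial^2 f}{\partial x_j \,\partial x_i}
```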

Installation

thoad can be installed either from PyPI or directly from the GitHub repository.

  • From PyPI

    pip install thoad
    
  • From GitHub Install directly with pip (fetches the latest from the main branch):

    pip install git+https://github.com/mntsx/thoad.git
    

    Or, if you prefer to clone and install in editable mode:

    git clone https://github.com/mntsx/thoad.git
    cd thoad
    pip install -e .
    
<br>

Using the Package

thoad exposes two primary interfaces for computing high-order derivatives:

  1. thoad.backward: a function-based interface that closely resembles torch.Tensor.backward. It provides a quick way to compute high-order partial derivatives without needing to manage an explicit controller object, but it offers only the core functionality (derivative computation and storage).
  2. thoad.Controller: a class-based interface that wraps the output tensor’s subgraph in a controller object. In addition to performing the same high-order backward pass, it gives access to advanced features such as fetching specific cross partials, inspecting batch-dimension optimizations, overriding backward-function implementations, retaining intermediate partials, and registering custom hooks.
<br>

thoad.backward

The thoad.backward function computes high-order partial derivatives of a given output tensor and stores them in each leaf tensor’s .hgrad attribute.

Arguments:

  • tensor: A PyTorch tensor from which to start the backward pass. This tensor must have requires_grad=True and be part of a differentiable graph.

  • order: A positive integer specifying the maximum order of derivatives to compute.

  • gradient: A tensor with the same shape as tensor to seed the vector-Jacobian product (i.e., custom upstream gradient). If omitted, the primal vector space is not reduced.

  • crossings: A boolean flag (default=False). If set to True, cross partial derivatives (i.e., derivatives that involve more than one distinct leaf tensor) will be computed.

  • groups: An iterable of disjoint groups of leaf tensors. When crossings=False, only those cross partials whose participating leaf tensors all lie within a single group will be calculated. If crossings=True and groups is provided, a ValueError will be raised (they are mutually exclusive).

  • keep_batch: A boolean flag (default=False) that controls how output dimensions are organized in the computed derivatives.

    • When keep_batch=False:
      The derivative keeps a single flattened "primal" axis first, followed by the full shape of each differentiated tensor, repeated in differentiation order. Concretely:

      • A single "primal" axis that contains every element of the graph output tensor (flattened into one dimension).
      • A group of axes per derivative order, each matching the shape of the respective differentially targeted tensor.

      For an N-th order derivative of a leaf tensor with input_numel elements and an output with output_numel elements, the derivative shape is:

      • Axis 1: indexes all output_numel outputs
      • Axes 2…(sum(Nj)+1): each indexes all input_numel inputs
    • When keep_batch=True:
      The derivative shape follows the same ordering as in the previous case, but includes a series of "independent dimensions" immediately after the "primal" axis.

      • Axis 1 flattens all elements of the output tensor (size = output_numel).
      • Axes 2...(k+i) correspond to dimensions shared by multiple input tensors and treated independently throughout the graph. These are dimensions that are only operated on element-wise (e.g. batch dimensions).
      • Axes (k+i+1)...(k+i+sum(Nj)+1) each flatten all input_numel elements of the leaf tensor, one axis per derivative order.
  • keep_schwarz: A boolean flag (default=False). If True, symmetric (Schwarz) permutations are retained explicitly instead of being canonicalized/reduced, useful for debugging or inspecting non-reduced layouts.
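
To make the keep_batch=False layout concrete, here is a small pure-Python sketch (independent of thoad itself; `expected_hgrad_shape` is a hypothetical helper name, not part of the API) that computes the shape an .hgrad entry should have:

```python
from math import prod

def expected_hgrad_shape(output_shape, input_shape, order):
    """Shape of an order-`order` derivative under keep_batch=False:
    one flattened "primal" axis over all output elements, followed by
    the full input shape repeated once per differentiation order."""
    return (prod(output_shape), *(order * tuple(input_shape)))

# A 2nd-order derivative w.r.t. a (10, 15) leaf, for a (10, 15) output:
assert expected_hgrad_shape((10, 15), (10, 15), order=2) == (150, 10, 15, 10, 15)
```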

Returns:

  • An instance of thoad.Controller wrapping the same tensor and graph.

Executing Autodifferentiation via thoad.backward

    import torch
    import thoad
    from torch.nn import functional as F

    ### Normal PyTorch workflow
    X = torch.rand(size=(10, 15), requires_grad=True)
    Y = torch.rand(size=(15, 20), requires_grad=True)
    Z = F.scaled_dot_product_attention(query=X, key=Y.T, value=Y.T)

    ### Call thoad backward
    order = 2
    thoad.backward(tensor=Z, order=order)

    ### Checks
    # check derivative shapes through the attribute attached to torch.Tensor: hgrad
    for o in range(1, 1 + order):
        assert X.hgrad[o - 1].shape == (Z.numel(), *(o * tuple(X.shape)))
        assert Y.hgrad[o - 1].shape == (Z.numel(), *(o * tuple(Y.shape)))
<br>

thoad.Controller

The Controller class wraps a tensor’s backward subgraph in a controller object, performing the same core high-order backward pass as thoad.backward while exposing advanced customization, inspection, and override capabilities.

Instantiation

Use the constructor to create a controller for any tensor requiring gradients:

    controller = thoad.Controller(tensor=GO)  # takes graph output tensor

  • tensor: A PyTorch Tensor with requires_grad=True and a non-None grad_fn.

Properties

  • .tensor → Tensor The output tensor underlying this controller. Setter: Replaces the tensor (after validation), rebuilds the internal computation graph, and invalidates any previously computed derivatives.

  • .compatible → bool Indicates whether every backward function in the tensor’s subgraph has a supported high-order implementation. If False, some derivatives may fall back or be unavailable.

  • .index → Dict[Type[torch.autograd.Function], Type[ExtendedAutogradFunction]] A mapping from base PyTorch autograd.Function classes to thoad’s ExtendedAutogradFunction implementations. Setter: Validates and injects your custom high-order extensions.

Core Methods

.backward(order, gradient=None, crossings=False, groups=None, keep_batch=False, keep_schwarz=False) → None

Performs the high-order backward pass up to the specified derivative order, storing all computed partials in each leaf tensor’s .hgrad attribute.

  • order (int > 0): maximum derivative order.
  • gradient (Optional[Tensor]): custom upstream gradient with the same shape as controller.tensor.
  • crossings (bool, default False): If True, cross partial derivatives across different leaf tensors will be computed.
  • groups (Optional[Iterable[Iterable[Tensor]]], default None): When crossings=False, restricts cross partials to those whose leaf tensors all lie within a single group. If crossings=True and groups is provided, a ValueError is raised.
  • keep_batch (bool, default False): controls whether independent output axes are kept separate (batched) or merged (flattened) in stored/retrieved derivatives.
  • keep_schwarz (bool, default False): if True, symmetric (Schwarz) permutations are retained explicitly instead of being canonicalized.


View on GitHub
GitHub Stars: 6
Category: Development
Updated: 26d ago
Forks: 1

Languages

Python

Security Score

90/100

Audited on Mar 2, 2026

No findings