H-Net

<table width="100%"> <tr> <td><img src="assets/english.gif" alt="English" width="100%"></td> <td><img src="assets/code.gif" alt="Code" width="100%"></td> </tr> <tr> <td><img src="assets/chinese.gif" alt="Chinese" width="100%"></td> <td><img src="assets/korean.gif" alt="Korean" width="100%"></td> </tr> </table>

Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
Paper: https://arxiv.org/abs/2507.07955

About

H-Net

This repository contains the code for the H-Net architecture. Most of the code lives in hnet/, which has the following structure:

```
configs/
hnet/
├── models/            # Directory for H-Net
│   ├── config_hnet.py     (defines the config for the H-Net)
│   ├── hnet.py            (H-Net as a (B, L, D) -> (B, L, D) sequence model)
│   └── mixer_seq.py       (wrapper to turn H-Net into a language model)
└── modules/           # Directory of model components
    ├── dc.py              (modeling code for the dynamic chunking mechanism)
    └── isotropic.py       (code for isotropic, i.e. non-hierarchical, components)
generate.py            # Script for inference/generation
```
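Dynamic chunking learns where to place chunk boundaries and compresses the positions between them. As a rough illustration of the chunk-and-pool idea only (not the actual routing module in dc.py, which *learns* the boundary indicators), here is a stdlib-only sketch that mean-pools vectors between given boundary flags:

```python
# Illustrative sketch of the chunk-and-pool step behind dynamic chunking.
# NOTE: this is NOT the code in hnet/modules/dc.py; the real mechanism
# predicts the boundaries from learned representations. Here we take the
# boundaries as given and mean-pool each chunk, shrinking sequence length.

def chunk_and_pool(seq, boundaries):
    """Mean-pool `seq` (a list of equal-length vectors) into chunks.

    `boundaries[t]` is True when position t starts a new chunk;
    boundaries[0] must be True so every position belongs to a chunk.
    Returns one pooled vector per chunk (len(output) == sum(boundaries)).
    """
    assert boundaries[0], "position 0 must open the first chunk"
    chunks, current = [], []
    for vec, is_start in zip(seq, boundaries):
        if is_start and current:   # close the previous chunk
            chunks.append(current)
            current = []
        current.append(vec)
    chunks.append(current)
    # Mean-pool each chunk dimension-wise.
    return [[sum(dim) / len(chunk) for dim in zip(*chunk)] for chunk in chunks]

# A length-4 sequence of 2-d vectors compressed into 2 chunks.
seq = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
print(chunk_and_pool(seq, [True, False, True, False]))  # [[2.0, 1.0], [1.0, 3.0]]
```

In the full model this compression is applied recursively (one or two stages in the released checkpoints), with the isotropic components operating on the shortened sequences.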

Installation

Requirements:

  • PyTorch >= 2.5.1

Clone the repository and install the package:

```bash
git clone https://github.com/goombalab/hnet
cd hnet
pip install -e .
```

We strongly recommend building the mamba_ssm package from the latest source:

```bash
git clone https://github.com/state-spaces/mamba
cd mamba
pip install .
```

Pretrained Models

Pretrained models are available on Hugging Face: hnet_1stage_L, hnet_2stage_L, hnet_1stage_XL, hnet_2stage_XL. We trained our models on the 100B-token subset of FineWeb-Edu. <em>Large</em> and <em>XL</em> are compute-matched to GPT-3 <em>Large</em> and <em>XL</em>, respectively.

We also provide model weights for Chinese and code, trained on 46B-token subsets of FineWeb-Edu Chinese V2.1 and the GitHub subset of the Pile, respectively: hnet_2stage_XL_chinese, hnet_2stage_XL_code.

You can find the specifics of these models in configs, and more details in the paper.

Text Generation

We provide generate.py for text generation with the pretrained checkpoints.

Examples

```bash
python generate.py --model-path [MODEL_CKPT] --config-path [CONFIG]
python generate.py --model-path hnet_2stage_XL.pt --config-path configs/hnet_2stage_XL.json --max-tokens 1024 --temperature 1.0 --top-p 1.0
```
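The --temperature and --top-p flags control sampling in the usual way: temperature rescales the logits, and nucleus (top-p) filtering keeps only the smallest set of highest-probability tokens whose cumulative mass reaches p. A generic, stdlib-only sketch of this standard procedure (not necessarily the exact logic inside generate.py):

```python
import math
import random

# Generic temperature + nucleus (top-p) sampling sketch, illustrating the
# knobs exposed by generate.py's --temperature / --top-p flags. This is
# the standard algorithm, not code taken from the script itself.

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample an index from `logits` after temperature scaling and
    top-p filtering."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest prefix of tokens (sorted by probability)
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:   # nucleus is complete
            break

    # Renormalize over the nucleus and draw.
    kept_mass = sum(probs[i] for i in kept)
    r = rng.random() * kept_mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

# With a small top_p, only the single most likely token survives filtering.
print(sample_token([2.0, 0.5, 0.1], temperature=1.0, top_p=0.1))  # 0
```

Note that --temperature 1.0 --top-p 1.0, as in the example command above, disables both effects and samples from the model's unmodified distribution.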

Citation

If you use this codebase, or otherwise find our work valuable, please cite H-Net:

```bibtex
@article{hnet,
  title={Dynamic Chunking for End-to-End Hierarchical Sequence Modeling},
  author={Hwang, Sukjun and Wang, Brandon and Gu, Albert},
  journal={arXiv preprint arXiv:2507.07955},
  year={2025}
}
```