HypotheSAEs: Sparse Autoencoders for Hypothesis Generation


HypotheSAEs is a method that produces interpretable relationships ("hypotheses") in text datasets, explaining how input texts relate to a target variable. For example, we can use HypotheSAEs to hypothesize concepts that explain which news headlines receive engagement, or whether a congressional speech was given by a Republican or Democrat. The method works by training Sparse Autoencoders (SAEs) on rich embeddings of the input texts, and then interpreting the predictive features learned by the SAE.

Preprint 📄: Sparse Autoencoders for Hypothesis Generation (https://arxiv.org/abs/2502.04382). Rajiv Movva*, Kenny Peng*, Nikhil Garg, Jon Kleinberg, and Emma Pierson.
Website 🌐: https://hypothesaes.org
Data 🤗: https://huggingface.co/datasets/rmovva/HypotheSAEs (to reproduce the experiments in the paper)

Questions? Please read the FAQ and README; if not addressed, open an issue or contact us at rmovva@berkeley.edu and kennypeng@cs.cornell.edu.

Table of Contents

  • FAQ
  • Method
  • Usage

FAQ

  1. What are the inputs and outputs of HypotheSAEs?
  • Inputs: A dataset of texts (e.g., news headlines) with a target variable (e.g., clicks). The texts are embedded using SentenceTransformers or OpenAI.
  • Outputs: A list of hypotheses. Each hypothesis is a natural language concept, which, when present in the text, is positively or negatively associated with the target variable.
  2. How should I handle very long documents?
    Mechanically, text embeddings support up to 8192 tokens (OpenAI, ModernBERT, etc.). However, feature interpretation on long documents is difficult. For documents longer than roughly 500 words, we recommend either:
  • Chunking: Split the document into chunks of ~250-500 words. Each chunk inherits the label of its parent document.
  • Summarization: Use an LLM to summarize the document into a shorter text.
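    The chunking strategy can be sketched in a few lines of plain Python (the `chunk_words` parameter is illustrative, not part of the HypotheSAEs API):

```python
def chunk_document(text: str, chunk_words: int = 300) -> list[str]:
    """Split a document into consecutive chunks of at most `chunk_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]

def chunk_dataset(texts, labels, chunk_words: int = 300):
    """Chunk every document; each chunk inherits its parent document's label."""
    chunk_texts, chunk_labels = [], []
    for text, label in zip(texts, labels):
        for chunk in chunk_document(text, chunk_words):
            chunk_texts.append(chunk)
            chunk_labels.append(label)  # label inherited from the parent document
    return chunk_texts, chunk_labels
```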
  3. Why am I not getting any statistically significant hypotheses?
    HypotheSAEs identifies features in text embeddings that predict your target variable. If your text embeddings don't predict your target variable at all, it's unlikely that HypotheSAEs will find anything. To check this before running the method, fit a simple ridge regression predicting your target from the text embeddings. If you see any signal on a heldout set, even if it's weak, it's worth running HypotheSAEs. If you see no signal at all, the method will probably not work well.
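    A sketch of this sanity check, with a random matrix standing in for your real text embeddings and a synthetic target:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))                 # stand-in for real text embeddings
target = embeddings[:, 0] + 0.5 * rng.normal(size=1000)  # synthetic target with some signal

# Fit ridge on a train split and measure heldout R^2.
X_tr, X_te, y_tr, y_te = train_test_split(embeddings, target, random_state=0)
r2 = Ridge(alpha=1.0).fit(X_tr, y_tr).score(X_te, y_te)
print(f"heldout R^2 = {r2:.3f}")  # any positive heldout signal is encouraging
```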

  4. Which LLMs can I use?
    You can use OpenAI models or any OpenAI-compatible API endpoint (including vLLM server mode). The default models are GPT-5.2 for interpretation and GPT-5-mini for annotation. If you run your own endpoint, set OPENAI_BASE_URL and pass the served model name.

  5. Do I need a GPU?

  • If using OpenAI LLMs: no, since all LLM use is via API calls. Training the SAE is faster on a GPU, but it shouldn't be prohibitively slow even on a laptop.
  • If using your own OpenAI-compatible endpoint (e.g., a vLLM server): yes, you will need a reasonable GPU for that server.
  6. What other resources will I need?
    You'll need enough disk space to store your text embeddings, and enough RAM to load them for SAE training. On an 8GB laptop, we started running out of RAM at around 500K embeddings. It should also be possible to adapt the code to a more memory-efficient data loading strategy, so that everything doesn't need to fit in RAM.
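    A back-of-envelope estimate (the 1536-dimension example below is an assumption, matching OpenAI's text-embedding-3-small): N float32 embeddings of dimension d occupy N × d × 4 bytes.

```python
def embedding_ram_gb(n_texts: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate RAM needed to hold an (n_texts, dim) float32 embedding matrix."""
    return n_texts * dim * bytes_per_value / 1e9

# e.g., 500K embeddings at 1536 dimensions:
print(f"{embedding_ram_gb(500_000, 1536):.1f} GB")  # → 3.1 GB
```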

  7. What types of prediction tasks does HypotheSAEs support?
    The repo supports binary classification and regression tasks. For multiclass labels, we recommend using a one-vs-rest approach to convert the problem to binary classification.
    You can also use HypotheSAEs to study pairwise tasks (regression or classification), e.g., whether a news headline is more likely to be clicked on than another. See the experiment reproduction notebook for an example of this on the Headlines dataset.
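    The one-vs-rest conversion is a one-liner; a sketch with illustrative labels:

```python
import numpy as np

# Illustrative multiclass labels (e.g., the main topic of each review).
labels = np.array(["food", "service", "price", "food", "service"])

# One binary target per class; run HypotheSAEs separately on each target.
binary_targets = {cls: (labels == cls).astype(int) for cls in np.unique(labels)}
```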

  8. If I use OpenAI models, how much does HypotheSAEs cost?
    It's cheap (on the order of $1-10). See the Cost section for an example breakdown.

  9. I heard that SAEs actually aren't useful?
    It depends what you're using them for; for hypothesis generation, our paper shows that SAEs outperform several strong baselines. See this thread or our position paper for more discussion.

  10. I'm getting errors about OpenAI rate limits.
    You can reduce the number of parallel workers for interpretation and annotation so that you stay within rate limits. See the detailed usage notes for more details.

  11. Can I use private data with HypotheSAEs?
    If you're using your own OpenAI-compatible endpoint on a local machine, everything happens on your machine, so only people with access to your machine can see your data.
    If using OpenAI: as of this writing (08/2025), OpenAI doesn't train on data sent through the API. However, they retain data for 30 days for abuse monitoring, which may or may not comply with your data use agreement (DUA).
    Note that text embeddings and annotations are cached to disk by default (wherever the package is installed). If you are using a shared machine, set the file permissions on your HypotheSAEs directory appropriately.

Method

HypotheSAEs has five steps:

  1. Embeddings: Generate text embeddings with the OpenAI API or your favorite sentence-transformers model.
  2. Feature Generation: Train a Sparse Autoencoder (SAE) on the text embeddings. This maps the embeddings from a black-box space into an interpretable feature space.
  3. Feature Selection: Select the learned SAE features which are most predictive of your target variable (e.g., with Lasso).
  4. Feature Interpretation: Generate a natural language interpretation of each feature using an LLM. Each interpretation serves as a hypothesis about what predicts the target variable.
  5. Hypothesis Validation: Use an LLM annotator to test whether the hypotheses are predictive on a heldout set. Note that this step uses only the natural language descriptions of the hypotheses.

The figure below summarizes steps 2-4 (the core hypothesis generation procedure).

<p align="center"> <img src="HypotheSAEs_Figure1.png" width="90%" alt="HypotheSAEs Schematic"> </p>
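Steps 2-3 can be sketched in a few lines of NumPy and scikit-learn. This is a minimal illustration, not the repo's implementation: the top-k encoder below is randomly initialized rather than trained, the target is synthetic, and all shapes and hyperparameters are made up.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, m, k = 512, 32, 64, 4  # examples, embedding dim, SAE width, active features per example

X = rng.normal(size=(n, d))                   # stand-in for text embeddings
W_enc = rng.normal(size=(d, m)) / np.sqrt(d)  # untrained encoder weights (illustrative)

def topk_activations(X, W_enc, k):
    """Step 2 (sketch): ReLU encoder keeping only the top-k activations per example."""
    acts = np.maximum(X @ W_enc, 0.0)                # ReLU pre-activations, shape (n, m)
    thresh = np.sort(acts, axis=1)[:, -k][:, None]   # per-row k-th largest activation
    return np.where(acts >= thresh, acts, 0.0)       # zero out everything below it

Z = topk_activations(X, W_enc, k)  # sparse feature matrix, shape (n, m)

# Step 3 (sketch): Lasso zeroes out all but the features predictive of the target.
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)       # synthetic target
selected = np.flatnonzero(Lasso(alpha=0.05).fit(Z, y).coef_)  # indices of kept features
```

In the real pipeline, each index in `selected` would then be passed to an LLM for interpretation (step 4).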

Usage

Setup

Option 1: Clone repo (recommended)

Clone the repo and install in editable mode. This will give you access to all of the example notebooks, which are helpful for getting started. You'll also be able to edit the code directly.

git clone https://github.com/rmovva/HypotheSAEs.git
cd HypotheSAEs
pip install -e .

This install is sufficient for the main workflows, including using a local OpenAI-compatible endpoint such as vllm serve. HypotheSAEs no longer includes an in-process vllm inference path; local LLM usage is supported through OpenAI-compatible servers only.

Option 2: Install from PyPI

Alternatively, you can install the package directly from PyPI:

pip install hypothesaes

Note: If using this option, you'll need to separately download any example notebooks you want to use from the GitHub repository.

Set your OpenAI API key

Set your OpenAI API key as an environment variable:

export OPENAI_KEY_SAE="your-api-key-here"

Alternatively, you can set the key in Python (before importing any HypotheSAEs functions) with os.environ["OPENAI_KEY_SAE"] = "your-api-key".

To use a local OpenAI-compatible endpoint (e.g., vLLM server), also set:

export OPENAI_BASE_URL="http://0.0.0.0:8000/v1"

When OPENAI_BASE_URL points to a non-OpenAI endpoint (for example http://127.0.0.1:8000/v1 for vLLM), OPENAI_KEY_SAE is optional.

Quickstart

First, clone and install the repo (Setup) or install via pip. Then, use one of the notebooks to get started:

  • See notebooks/quickstart.ipynb for a complete working example on using OpenAI models. This notebook uses a 20K example subset of the Yelp restaurant review dataset. The inputs are review texts and the target variable is 1-5 star rating.
  • See notebooks/quickstart_local.ipynb for a local quickstart reference. For the unified API path, run a local OpenAI-compatible endpoint (e.g., vLLM server) and set OPENAI_BASE_URL.
  • See notebooks/experiment_reproduction.ipynb to reproduce the results in the paper.
