PEAP
Position-aware Automatic Circuit Discovery
This repository implements Position-Aware Edge Attribution Patching (PEAP), a method for position-aware automatic circuit discovery in transformer language models.
arXiv: https://arxiv.org/abs/2502.04577
Overview
PEAP extends Edge Attribution Patching (EAP) by incorporating positional edges, enabling researchers to understand how components at different token positions interact with each other. The method computes attribution scores for both:
- Non-crossing edges: Connections within the same token position (such as attention head -> MLP, MLP -> attention head, and embedding -> MLP)
- Crossing edges: Connections between attention heads at different token positions
Key Features
- Position-aware analysis: Segment input sequences into meaningful spans and analyze interactions between them
- Position-aware edge attribution: Compute attribution scores for many types of connections in transformer models
- Circuit discovery: Automatically discover circuits of varying sizes
- Faithfulness evaluation: Evaluate how faithfully the circuits preserve model behavior using ablation studies.
- Multiple tasks supported: Includes implementations for Indirect Object Identification (IOI), WinoBias, and Greater-Than comparison tasks
Repository Structure
src/
├── pos_aware_edge_attribution_patching.py # Core PEAP implementation
├── eval_utils.py # Circuit discovery and evaluation utilities
├── eval.py # Full evaluation pipeline
├── exp.py # Experiment classes for different tasks
├── data_generation.py # Dataset generation for supported tasks
├── input_attribution.py # Input attribution analysis methods (for finding the Schema)
├── schema_generation.py # Automatic span schema generation using LLMs
└── environment.yml # Conda environment specification
Core Components
PEAP Algorithm (pos_aware_edge_attribution_patching.py)
- Computes position-aware attribution scores using gradient-based methods
- Handles both counterfactual and mean ablation strategies
- Supports multiple aggregation methods (sum, average, max absolute value) to handle spans of varying lengths
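Since spans contain varying numbers of tokens, per-position edge scores must be collapsed into one score per span. A minimal sketch of the three aggregation modes mentioned above; `aggregate_span_scores` is an illustrative helper, not the repository's API:

```python
import numpy as np

def aggregate_span_scores(position_scores, method="sum"):
    """Collapse per-position edge scores within a span into a single score.

    position_scores: 1-D array of attribution scores, one per token position
    inside the span. (Illustrative helper, not the repo's actual function.)
    """
    scores = np.asarray(position_scores, dtype=float)
    if method == "sum":
        return scores.sum()
    if method == "average":
        return scores.mean()
    if method == "max_abs":
        # Keep the sign of the largest-magnitude score.
        return scores[np.argmax(np.abs(scores))]
    raise ValueError(f"unknown aggregation method: {method}")
```

Note that `max_abs` here preserves the sign of the selected score, so positive and negative attributions remain distinguishable after aggregation.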
Circuit Discovery (eval_utils.py)
- Implements algorithms to find circuits of specified sizes
- Supports threshold-based and top-k circuit discovery
- Provides both forward (logits→embeddings) and reverse (embeddings→logits) search
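The two selection strategies above can be sketched as a single ranking step over a dictionary of edge scores. This is an illustrative simplification (edge names and the `select_circuit` helper are hypothetical), assuming edges are ranked by absolute attribution score:

```python
def select_circuit(edge_scores, top_k=None, threshold=None):
    """Pick circuit edges from {edge_name: score} by top-k or by threshold.

    Illustrative sketch, not the repo's API: ranks edges by absolute
    attribution score, then keeps the k best or all above a threshold.
    """
    ranked = sorted(edge_scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if top_k is not None:
        return [name for name, _ in ranked[:top_k]]
    if threshold is not None:
        return [name for name, score in ranked if abs(score) >= threshold]
    raise ValueError("provide either top_k or threshold")
```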
Faithfulness Evaluation (eval.py)
- Evaluates discovered circuits through mean ablation
- Computes faithfulness metrics and accuracy preservation
- Generates comprehensive evaluation reports
Dataset Generation (data_generation.py)
- Creates datasets for IOI (ABBA/BABA patterns), WinoBias, and Greater-Than tasks
- Includes automatic model evaluation on generated datasets
Schema Generation (schema_generation.py)
- Automatically generates span schemas using large language models
- Supports multiple LLM backends (GPT-4, Claude, Llama)
- Includes input attribution methods to identify important tokens
Supported Tasks
How to Add New Tasks
1. Create a customized Experiment object, as defined in Experiment.py.
2. Prepare a DataFrame that follows these guidelines:
   - It must include a "prompt" column.
   - It should have one column per span, where each column contains the index of the first token in that span.
     - The end of span t is one index before the starting index of span t+1. This means every token is included in exactly one span.
     - Empty spans are allowed: just set the start index equal to the start index of the next span.
   - The DataFrame must also include a "length" column indicating the total number of tokens in the prompt. This helps handle prompts of varying lengths and also serves as the boundary for the final span, ensuring it includes all remaining tokens.
   - A Beginning-of-Sequence (BOS) token is automatically added when running the pipeline:
     - Do not include it in the "prompt" text.
     - However, make sure to account for it when setting span indices. For example, in the prompt "I love you", the token "I" should have index 1 in the DataFrame, since the BOS token will be added at position 0.
Example
For the prompt "I love you" (which is tokenized as ["I", "love", "you"] and becomes ["<BOS>", "I", "love", "you"]), the DataFrame might look like this:
| prompt     | span_0 | span_1 | span_2 | length |
|------------|--------|--------|--------|--------|
| I love you | 1      | 2      | 3      | 4      |

- span_0 starts at index 1 (token "I").
- span_1 starts at index 2 (token "love"), so span_0 includes only "I".
- span_2 starts at index 3 (token "you"), so span_1 includes only "love".
- Since length = 4 (because the BOS token will be added), span_2 includes all tokens from index 3 up to (but not including) index 4: just "you".
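The example table can be built and checked with a few lines of pandas. The column names `span_0` to `span_2` are the illustrative ones from the table above; use whatever span names fit your task. The range computation shows how each span's start column plus the "length" column determine half-open token ranges:

```python
import pandas as pd

# Span DataFrame for the "I love you" example: each span column holds the
# index of its first token, with index 0 reserved for the BOS token.
df = pd.DataFrame({
    "prompt": ["I love you"],
    "span_0": [1],   # "I"
    "span_1": [2],   # "love"
    "span_2": [3],   # "you"
    "length": [4],   # total tokens including BOS; bounds the final span
})

# Derive each span's half-open token range [start, end) from the columns:
span_cols = ["span_0", "span_1", "span_2"]
row = df.iloc[0]
starts = [row[c] for c in span_cols] + [row["length"]]
ranges = {c: (starts[i], starts[i + 1]) for i, c in enumerate(span_cols)}
```

Here `ranges` comes out as `{"span_0": (1, 2), "span_1": (2, 3), "span_2": (3, 4)}`, matching the bullet-point walkthrough above.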
Installation
conda env create -f src/environment.yml
conda activate peap
Usage
1. Generate Datasets
python src/data_generation.py --model_name gpt2 --save_dir ./data --task ioi_baba --seed 42
2. Compute PEAP Scores
Make sure to add "length" as the last span.
python src/pos_aware_edge_attribution_patching.py \
-e ioi -m gpt2 -cl data/gpt2/ioi_ABBA/human_baseline/IOI_data_clean.csv -co data/gpt2/ioi_ABBA/human_baseline/IOI_data_counter_abc.csv \
-sp prefix IO and S1 S1+1 action1 S2 action2 to length -ds 10 -p ioi_results.pkl
3. Discover and Evaluate Circuits
Make sure to add "length" as the last span.
python src/eval.py \
-e ioi -m gpt2 -cl data/gpt2/ioi_ABBA/human_baseline/IOI_data_clean.csv -co data/gpt2/ioi_ABBA/human_baseline/IOI_data_counter_abc.csv \
-sp prefix IO and S1 S1+1 action1 S2 action2 to length -n 10 -tk 100 200 300 \
-p ioi_results.pkl -sp results.pkl
How PEAP Works
1. Span Definition
PEAP segments input sequences into meaningful spans based on:
- Syntactic structure (subjects, objects, verbs)
- Semantic roles (professions, names, actions)
- Task-specific elements (important tokens identified through attribution)
2. Attribution Computation
PEAP computes attribution scores for both:
- Non-crossing edges: Direct connections within spans
- Crossing edges: Attention-mediated connections between different spans through query-key-value interactions
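At the core of both edge types is the standard EAP first-order approximation: the effect of patching an edge is estimated as the difference between corrupted and clean upstream activations, dotted with the gradient of the task metric. A toy sketch with plain arrays (the repository works on cached model tensors, and PEAP's positional bookkeeping is not reproduced here):

```python
import numpy as np

def eap_edge_score(clean_act, corrupt_act, grad_wrt_act):
    """First-order (EAP-style) attribution score for one edge.

    Approximates the metric change from patching the edge's upstream
    activation: (corrupt - clean) . d(metric)/d(activation).
    Toy sketch; not the repository's implementation.
    """
    clean_act = np.asarray(clean_act, dtype=float)
    corrupt_act = np.asarray(corrupt_act, dtype=float)
    grad = np.asarray(grad_wrt_act, dtype=float)
    return float(np.dot(corrupt_act - clean_act, grad))
```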
3. Circuit Discovery
Using the computed attribution scores, circuits are discovered through:
- Top-k selection: Select the k highest-scoring edges
- Threshold-based selection: Keep every edge whose score exceeds a chosen threshold
4. Faithfulness Evaluation
Discovered circuits are evaluated by:
- Mean ablation: Replace non-circuit components with mean activations
- Performance preservation: Measure how well the circuit maintains original model behavior
- Size-performance tradeoffs: Analyze circuit efficiency across different sizes
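Mean ablation, as described above, replaces the activations of components outside the circuit with their average over the dataset. A minimal sketch on a plain activation matrix; in the repository the ablation happens inside the model's forward pass, and `mean_ablate` is an illustrative helper:

```python
import numpy as np

def mean_ablate(activations, circuit_mask):
    """Replace non-circuit activations with their dataset-mean value.

    activations: (batch, n_components) array of cached activations.
    circuit_mask: boolean (n_components,), True for components kept in
    the circuit. Illustrative helper; the repo ablates inside the model.
    """
    acts = np.asarray(activations, dtype=float)
    mask = np.asarray(circuit_mask, dtype=bool)
    means = acts.mean(axis=0)          # per-component mean over the batch
    ablated = acts.copy()
    ablated[:, ~mask] = means[~mask]   # overwrite non-circuit components
    return ablated
```

Faithfulness is then measured by rerunning the task metric on the ablated model and comparing it to the full model's score.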
