
EfficientSAM3

EfficientSAM3 compresses SAM3 into lightweight, edge-friendly models via progressive knowledge distillation for fast promptable concept segmentation and tracking.

Install / Use

/learn @SimonZeng7108/Efficientsam3

README

EfficientSAM3: Progressive Hierarchical Knowledge Distillation (PhD) from SAM1, 2 and 3

Chengxi Simon Zeng<sup>1,†</sup>, Yuxuan Jiang<sup>1</sup>, Gao Ge<sup>1</sup>, Shuai Wang<sup>2</sup>, Duolikun Danier<sup>3</sup>, Bin Zhu<sup>4</sup>, Stevan Rudinac<sup>2</sup>, David Bull<sup>1</sup>, Fan Aaron Zhang<sup>1</sup> <sup>1</sup>Visual Information Lab, University of Bristol; <sup>2</sup>MultiX lab, University of Amsterdam; <sup>3</sup>University of Edinburgh; <sup>4</sup>Singapore Management University

<sup>†</sup>Tech Lead & Corresponding Author

arXiv · Project Page · Hugging Face · Discord

Updates

  • [2026/02/18] SAM3-LiteText released! SAM3-LiteText reduces text encoder parameters by up to 88% with similar performance to the original text encoder. Paper available on arXiv. Code available in sam3_litetext branch and weights on Hugging Face.
  • [2026/01/11] Stage 1 geometry-prompt fine-tuned (ft) weights released/updated (image encoders on 1% SA-1B; text encoders fine-tuned on SA-Co Gold+Silver).
  • [2025/12/08] Stage 1 text encoder weights released for all 3 variants (MobileCLIP S0, S1, and MobileCLIP2 L) - distilled on 1% Recap-DataComp-1B dataset.
  • [2025/12/02] Stage 1 image encoder weights released for all 9 variants (RepViT, TinyViT, EfficientViT) - unsupervised distilled on 1% of SA-1B dataset.
  • [2025/11/25] Teaser model released. See Above. More models are baking in the oven🔥.
  • [2025/10/18] Project announced. Code and weights are not released yet; they will be published once SAM3 code is publicly available.


SAM3 (Segment Anything Model 3) has introduced powerful Promptable Concept Segmentation (PCS) capabilities, enabling semantic understanding and temporal object tracking beyond traditional mask generation. However, SAM3's massive vision backbone and dense memory bank make it impractical for real-time, on-device applications where computational resources and latency constraints are critical.

EfficientSAM3 addresses this challenge by distilling SAM3's capabilities into lightweight architectures suitable for edge devices, enabling high-quality concept segmentation on mobile phones, embedded systems, and resource-constrained platforms.

<p align="center"> <img src="images/efficientsam3_full.svg" alt="EfficientSAM3 Architecture" width="100%"> </p>
<details> <summary>Supported Models and Architecture</summary>

| Component | Model/Backbone | Purpose |
|-----------|----------------|---------|
| Teacher Models | SAM (Segment Anything Model) | Foundation for image-level encoder distillation |
| | SAM2 | Temporal memory and video tracking distillation |
| | SAM3 | Promptable Concept Segmentation (PCS) capabilities |
| Datasets | SA-1B | Image segmentation dataset |
| | SA-V | Video object segmentation dataset |
| | SA-Co/Gold | Promptable concept segmentation benchmark |
| | Recap-DataComp-1B | Large-scale image-text dataset for text encoder distillation |
| Student Backbones (Image) | RepViT (M0.9, M1.1, M2.3) | Mobile-optimized Vision Transformer for highest throughput |
| | TinyViT (5M, 11M, 21M) | Balanced efficiency and performance |
| | EfficientViT (B0, B1, B2) | Ultra-lightweight architectures for minimal latency |
| Student Backbones (Text) | MobileCLIP S0 | Lightweight text encoder (42.57M params) |
| | MobileCLIP S1 | Balanced text encoder (63.56M params) |
| | MobileCLIP2 L | Larger text encoder (123.6M params) |

</details>
<details> <summary>Three-Stage Progressive Training Curriculum</summary>

EfficientSAM3 is trained through a three-stage progressive distillation:

Stage 1: Encoder Distillation (Image-Level Segmentation)

  • Distill the SAM3 image encoder into nine student backbones (three sizes each of RepViT, TinyViT, and EfficientViT)
  • Distill the SAM3 text encoder into three student text encoders (MobileCLIP S0, MobileCLIP S1, and MobileCLIP2 L)
  • Use the SA-1B dataset with Prompt-in-the-Loop Distillation for image encoder distillation
  • Use the Recap-DataComp-1B dataset for text encoder distillation
  • Align student backbone features with teacher encoder outputs
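The feature-alignment objective above can be sketched as a simple feature-matching loss. This is an illustrative simplification, not the repo's actual Prompt-in-the-Loop Distillation objective: the student's feature map is resized to the teacher's resolution and matched with MSE.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical Stage-1 sketch: align student backbone features
    with teacher encoder outputs via an MSE objective."""
    # Resize student features to the teacher's spatial resolution if they differ
    if student_feats.shape[-2:] != teacher_feats.shape[-2:]:
        student_feats = F.interpolate(
            student_feats,
            size=teacher_feats.shape[-2:],
            mode="bilinear",
            align_corners=False,
        )
    return F.mse_loss(student_feats, teacher_feats)
```

In practice a projection layer would also map the student's channel dimension to the teacher's; it is omitted here for brevity.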

Stage 2: Temporal Memory Distillation (Video Tracking)

  • Replace SAM3's dense memory bank with a compact Perceiver-based memory module (adapted from EdgeTAM)
  • Distill memory-conditioned mask predictions using SA-V dataset
  • Train the Perceiver module to compress and retrieve spatiotemporal features efficiently
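The Perceiver-style memory idea can be sketched as follows (a hypothetical stand-in, not the EdgeTAM module): a small, fixed set of learned latent tokens cross-attends to per-frame features, so memory cost stays constant regardless of how many frames have been seen.

```python
import torch
import torch.nn as nn

class PerceiverMemory(nn.Module):
    """Illustrative sketch of a Perceiver-based memory compressor:
    learned latents query the frame features via cross-attention,
    compressing an arbitrary number of spatial tokens into a fixed
    number of memory tokens."""

    def __init__(self, dim: int = 64, num_latents: int = 16, num_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, N, dim) flattened per-frame features
        b = frame_feats.shape[0]
        queries = self.latents.unsqueeze(0).expand(b, -1, -1)
        compressed, _ = self.cross_attn(queries, frame_feats, frame_feats)
        return compressed  # (B, num_latents, dim), independent of N
```

Because the output size depends only on `num_latents`, the memory bank no longer grows with video length, which is the key saving over SAM3's dense memory.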

Stage 3: End-to-End Fine-Tuning (Concept Segmentation)

  • Refine the complete EfficientSAM3 pipeline using SAM3 official dataset
  • Joint optimization of distilled encoder + compressed memory + mask decoder
  • Preserve Promptable Concept Segmentation capabilities while maintaining efficiency
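The joint optimization above can be sketched with stand-in modules (plain `nn.Linear` placeholders, not the real EfficientSAM3 components): all three parts share one optimizer and are updated by a single task loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins for the distilled encoder, compressed memory,
# and mask decoder (the real modules are far larger).
encoder = nn.Linear(16, 32)
memory = nn.Linear(32, 32)
decoder = nn.Linear(32, 1)

# One optimizer over all three components enables end-to-end fine-tuning
opt = torch.optim.AdamW(
    [p for m in (encoder, memory, decoder) for p in m.parameters()], lr=1e-4
)

x = torch.randn(4, 16)        # stand-in input features
target = torch.rand(4, 1)     # stand-in mask supervision
pred = torch.sigmoid(decoder(memory(encoder(x))))
loss = F.binary_cross_entropy(pred, target)
opt.zero_grad()
loss.backward()
opt.step()
```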

tl;dr

Stage 1: We distill the SAM3 encoder using SAM1 data. <br> Stage 2: We align the distilled encoder to a perceiver and an efficient memory bank using SAM2 data. <br> Stage 3: We fine-tune the complete pipeline using SAM3 data. <br>

</details>

Installation

EfficientSAM3 purposely shares the same software contract as upstream SAM3:

  • Python ≥ 3.12
  • PyTorch 2.7.0
  • Device: NVIDIA GPU (CUDA), Apple Silicon (MPS), or CPU
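The supported devices listed above can be selected programmatically; a minimal sketch using standard PyTorch device checks:

```python
import torch

# Prefer CUDA, then Apple Silicon (MPS), then fall back to CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
```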

For non-CUDA platforms (MPS/CPU), install scipy for distance transform operations:

pip install scipy

Follow the exact environment setup from the official SAM3 README or use the condensed steps below:

git clone https://github.com/SimonZeng7108/efficientsam3.git
cd efficientsam3

conda create -n efficientsam3 python=3.12 -y
conda activate efficientsam3

pip install --upgrade pip

# Install PyTorch (choose one based on your device):
# CUDA (default):
pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

# MPS/CPU (Apple Silicon or CPU-only):
pip install torch==2.7.0 torchvision torchaudio

# Install repo dependencies via the root pyproject (brings in SAM3 + Stage-1 extras)
pip install -e ".[stage1]"

# Note: the Stage-1 extra includes the SAM1 package dependency
# (PyPI name: segment-anything, import name: segment_anything).
# If your environment cannot resolve it from PyPI, install the vendored repo instead:
# pip install -e ./segment-anything

Inference

Download checkpoints from the Model Zoo section. All Stage 1 image encoder weights are available via Google Drive and Hugging Face links in the table below.

Quick Start (Image Segmentation):

🔥 Teaser Image Model

<p align="center"> <img src="https://github.com/SimonZeng7108/efficientsam3/blob/main/images/es-ev-s-teaser.jpg" width="30%"> </p>

EfficientViT-S (0.68M params) distilled from the SAM3 encoder (461.84M params), 99.85% smaller, trained on 1% of SA-1B.

from PIL import Image

from sam3.model_builder import build_efficientsam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor

# Load model
model = build_efficientsam3_image_model(
  checkpoint_path="efficient_sam3_efficientvit_s.pt",
  backbone_type="efficientvit",
  model_name="b0",
  enable_inst_interactivity=True,
)

# Load an image (path is illustrative)
image = Image.open("example.jpg")

# Process image and predict
processor = Sam3Processor(model)
inference_state = processor.set_image(image)

# Single positive point prompt (x, y) in pixels
points = [[image.size[0] / 2, image.size[1] / 2]]
labels = [1]
# The predict call was truncated in the source; confirm the exact
# method name and signature against the repo's examples.
masks, scores, _ = model.predict(inference_state, points, labels)

Repository Info

  • GitHub: 467 stars, 35 forks, last updated 2h ago
  • Category: Content
  • Languages: Jupyter Notebook
  • Security score: 85/100 (audited Mar 23, 2026, no findings)