Iris - a C inference pipeline for image synthesis models

Iris is an inference pipeline that generates images from text prompts using open-weights diffusion transformer models. It is implemented entirely in C, with zero external dependencies beyond the C standard library. MPS and BLAS acceleration are optional but recommended. On macOS, a BLAS API (Accelerate) is part of the system, so nothing extra is required.

The name comes from the Greek goddess Iris, messenger of the gods and personification of the rainbow.

Supported model families:

  • FLUX.2 Klein (by Black Forest Labs):
    • 4B distilled (4 steps, auto guidance set to 1, very fast).
    • 4B base (50 steps for max quality, or fewer. Uses Classifier-Free Diffusion Guidance: much slower, but more generation variety).
    • 9B distilled (4 steps, larger model, higher quality. Non-commercial license).
    • 9B base (50 steps, CFG, highest quality. Non-commercial license).
  • Z-Image-Turbo (by Tongyi-MAI):
    • 6B (8 NFE / 9 scheduler steps, no CFG, fast).

Quick Start

# Build (choose your backend)
make mps       # Apple Silicon (fastest)
# or: make blas    # Intel Mac / Linux with OpenBLAS
# or: make generic # Pure C, no dependencies

# Download a model (~16GB) - pick one:
./download_model.sh 4b                   # using curl
# or: pip install huggingface_hub && python download_model.py 4b

# Generate an image
./iris -d flux-klein-4b -p "A woman wearing sunglasses" -o output.png

To try the base model instead of the distilled one (much slower, but higher quality), use the following instructions. If your computer is quite slow, use 10 steps instead of the default of 50; it will still work well enough for testing (about 10 seconds to generate a 256x256 image on a MacBook M3 Max).

./download_model.sh 4b-base
# or: pip install huggingface_hub && python download_model.py 4b-base
./iris -d flux-klein-4b-base -p "A woman wearing sunglasses" -o output.png

If you want to try the 9B model (higher quality, non-commercial license, ~30GB download):

# 9B is a gated model - you need a HuggingFace token
# 1. Accept the license at https://huggingface.co/black-forest-labs/FLUX.2-klein-9B
# 2. Get your token from https://huggingface.co/settings/tokens
./download_model.sh 9b --token YOUR_TOKEN
# or: python download_model.py 9b --token YOUR_TOKEN
# or: set HF_TOKEN env var
./iris -d flux-klein-9b -p "A woman wearing sunglasses" -o output.png

For Z-Image-Turbo:

# Download Z-Image-Turbo (~12GB)
pip install huggingface_hub && python download_model.py zimage-turbo
./iris -d zimage-turbo -p "a fish" -o fish.png

That's it. No Python runtime or CUDA toolkit required at inference time.

Example Output

Woman with sunglasses

Generated with: ./iris -d flux-klein-4b -p "A picture of a woman in 1960 America. Sunglasses. ASA 400 film. Black and White." -W 512 -H 512 -o woman.png

Image-to-Image Example

antirez to drawing

Generated with: ./iris -i antirez.png -o antirez_to_drawing.png -p "make it a drawing" -d flux-klein-4b

Features

  • Zero dependencies: Pure C implementation, works standalone. BLAS optional for ~30x speedup (Apple Accelerate on macOS, OpenBLAS on Linux)
  • Metal GPU acceleration: Automatic on Apple Silicon Macs. Performance matches PyTorch's optimized MPS pipeline
  • Runs where Python can't: Memory-mapped weights (default) enable inference on 8GB RAM systems where the Python ML stack cannot run at all
  • Text-to-image: Generate images from text prompts
  • Image-to-image: Transform existing images guided by prompts (Flux models)
  • Multi-reference: Combine multiple reference images (e.g., -i car.png -i beach.png for "car on beach")
  • Integrated text encoder: Qwen3 encoder built-in (4B or 8B depending on model), no external embedding computation needed
  • Memory efficient: Automatic encoder release after encoding (up to ~16GB freed)
  • Memory-mapped weights: Enabled by default. Reduces peak memory from ~16GB to ~4-5GB. Fastest mode on MPS; BLAS users with plenty of RAM may prefer --no-mmap for faster inference
  • Size-independent seeds: Same seed produces similar compositions at different resolutions. Explore at 256x256, then render at 512x512 with the same seed
  • Terminal image display: Watch the resulting image without leaving your terminal (Ghostty, Kitty, iTerm2, WezTerm, or Konsole)

Terminal Image Display

Kitty protocol example

Display generated images directly in your terminal with --show, or watch the denoising process step-by-step with --show-steps:

# Display final image in terminal (auto-detects Kitty/Ghostty/iTerm2/WezTerm/Konsole)
./iris -d flux-klein-4b -p "a cute robot" -o robot.png --show

# Display each denoising step (slower, but interesting to watch)
./iris -d flux-klein-4b -p "a cute robot" -o robot.png --show-steps

Requires a terminal supporting the Kitty graphics protocol (such as Kitty or Ghostty), the iTerm2 inline image protocol (iTerm2, WezTerm), or Konsole. Terminal type is auto-detected from environment variables.

Use --zoom N to adjust the display size (default: 2 for Retina displays, use 1 for non-HiDPI screens).

Usage

Text-to-Image

./iris -d flux-klein-4b -p "A fluffy orange cat sitting on a windowsill" -o cat.png

Image-to-Image

Transform an existing image based on a prompt:

./iris -d flux-klein-4b -p "oil painting style" -i photo.png -o painting.png

FLUX.2 uses in-context conditioning for image-to-image generation. Unlike traditional approaches that add noise to the input image, FLUX.2 passes the reference image as additional tokens that the model can attend to during generation. This means:

  • The model "sees" your input image and uses it as a reference
  • The prompt describes what you want the output to look like
  • Results tend to preserve the composition while applying the described transformation

Tips for good results:

  • Use descriptive prompts that describe the desired output, not instructions
  • Good: "oil painting of a woman with sunglasses, impressionist style"
  • Less good: "make it an oil painting" (instructional prompts may work less well)

Super Resolution: Since the reference image can be a different size than the output, you can use img2img for upscaling:

./iris -d flux-klein-4b -i small.png -W 1024 -H 1024 -o big.png -p "Create an exact copy of the input image."

The model will generate a higher-resolution version while preserving the composition and details of the input.

Multi-Reference Generation

Combine elements from multiple reference images:

./iris -d flux-klein-4b -i car.png -i beach.png -p "a sports car on the beach" -o result.png

Each reference image is encoded separately and passed to the transformer with different positional embeddings (T=10, T=20, T=30, ...). The model attends to all references during generation, allowing it to combine elements from each.

Example:

  • Reference 1: A red sports car
  • Reference 2: A tropical beach with palm trees
  • Prompt: "combine the two images"
  • Result: A red sports car on a tropical beach

You can specify up to 16 reference images with multiple -i flags. The prompt guides how the references are combined.

Interactive CLI Mode

Start without -p to enter interactive mode:

./iris -d flux-klein-4b

Generate images by typing prompts. Each image gets a $N reference ID:

iris> a red sports car
Done -> /tmp/iris-.../image-0001.png (ref $0)

iris> a tropical beach
Done -> /tmp/iris-.../image-0002.png (ref $1)

iris> $0 $1 combine them
Generating 256x256 (multi-ref, 2 images)...
Done -> /tmp/iris-.../image-0003.png (ref $2)

Prompt syntax:

  • prompt - text-to-image
  • 512x512 prompt - set size inline
  • $ prompt - img2img with last image
  • $N prompt - img2img with reference $N
  • $0 $3 prompt - multi-reference (combine images)

Commands: !help, !save, !load, !seed, !size, !steps, !guidance, !linear, !power, !explore, !show, !quit

Command Line Options

Required:

-d, --dir PATH        Path to model directory
-p, --prompt TEXT     Text prompt for generation
-o, --output PATH     Output image path (.png or .ppm)

Generation options:

-W, --width N         Output width in pixels (default: 256)
-H, --height N        Output height in pixels (default: 256)
-s, --steps N         Sampling steps (default: auto, 4 distilled / 50 base / 9 zimage)
-S, --seed N          Random seed for reproducibility
-g, --guidance N      CFG guidance scale (default: auto, 1.0 distilled / 4.0 base / 0.0 zimage)
    --linear          Use linear timestep schedule (see below)
    --power           Use power curve timestep schedule (see below)
    --power-alpha N   Set power schedule exponent (default: 2.0)
    --base            Force base model mode (undistilled, CFG enabled)

Image-to-image options:

-i, --input PATH      Reference image (can be specified multiple times)

Output options:

-q, --quiet           Silent mode, no output
-v, --verbose         Show detailed config and timing info
    --show            Display image in terminal (auto-detects Kitty/Ghostty/iTerm2/WezTerm/Konsole)
    --show-steps      Display each denoising step (slower)
    --zoom N          Terminal image zoom factor (default: 2 for Retina)

Other options:

-m, --mmap            Memory-mapped weights (default, fastest on MPS)
    --no-mmap         Disable mmap, load all weights upfront
    --n
