Thyme

✨✨ [ICLR 2026] Think Beyond Images

<p align="center" width="40%"> <img src="docs/resources/keye_logo.png" width="50%" height="50%"> </p>

<font size=7><div align='center' >
[📖 Home Page] [📖 Technique Report]

[📊 Thyme SFT Model] [📊 Thyme RL Model] [📝 SFT Data] [📝 RL Data]

</div></font>

🔥 News

  • 2025.09.04 🌟 Thyme is supported by VLMEvalKit and LMMs-Eval. Feel free to use it without hesitation!
  • 2025.08.18 🌟 We are excited to introduce Thyme: Think Beyond Images. Thyme transcends traditional "thinking with images" paradigms by autonomously generating and executing diverse image processing and computational operations through executable code, significantly enhancing performance on high-resolution perception and complex reasoning tasks. Leveraging a novel two-stage training strategy that combines supervised fine-tuning with reinforcement learning and empowered by the innovative GRPO-ATS algorithm, Thyme achieves a sophisticated balance between reasoning exploration and code execution precision.
<p align="center" width="100%"> <img src="docs/resources/pipeline.png" width="100%" height="100%"> </p>

Table of Contents

  1. Quick Start
  2. Data Preparation
  3. Supervised Fine-Tuning (Thyme-SFT)
  4. Reinforcement Learning (Thyme-RL)
  5. Evaluation
  6. Usage Example: How to use Thyme
  7. Citation
  8. Related Projects

1. Quick Start

1.1 Clone the Repository

git clone https://github.com/yfzhang114/Thyme.git
cd Thyme

1.2 Environment Setup & Dependency Installation

We recommend creating a Conda environment for isolation and installing dependencies as follows:

conda create -n Thyme python=3.10 -y
conda activate Thyme

pip install -e .
pip install "sglang[all]" -U
pip install "vllm>=0.5.1" "transformers<4.55" "trl<0.21" -U
pip install "lmdeploy>=0.5,<0.9" -U --no-deps
pip install autoawq -U --no-deps
pip install auto_gptq optimum bitsandbytes "gradio<5.33" -U
pip install git+https://github.com/modelscope/ms-swift.git
pip install timm -U
pip install "deepspeed<0.17" -U
pip install qwen_vl_utils qwen_omni_utils decord librosa icecream soundfile -U
pip install liger_kernel nvitop pre-commit math_verify py-spy -U
pip install wandb

pip install flash-attn --no-build-isolation --use-pep517
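After installation, it can help to confirm that the pinned packages ended up inside the version ranges given in the pip commands above. The following is a minimal sketch, not part of the Thyme repository: the helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical, and the version comparison is deliberately simplified (it looks only at the leading numeric components and ignores pre-release tags).

```python
from importlib import metadata

# (lower bound inclusive, upper bound exclusive), copied from the pip commands above
CONSTRAINTS = {
    "vllm": ((0, 5, 1), None),
    "transformers": (None, (4, 55)),
    "trl": (None, (0, 21)),
    "lmdeploy": ((0, 5), (0, 9)),
    "deepspeed": (None, (0, 17)),
    "gradio": (None, (5, 33)),
}


def parse_version(text):
    """Turn '4.54.1' or '0.5.1+cu118' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(version_text, lower, upper):
    """Check lower <= version < upper; a bound of None is not enforced."""
    version = parse_version(version_text)
    if lower is not None and version < lower:
        return False
    if upper is not None and version >= upper:
        return False
    return True


def check_environment():
    """Return a {package: status} report for the constrained packages."""
    report = {}
    for name, (lower, upper) in CONSTRAINTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = satisfies(installed, lower, upper)
        report[name] = "ok" if ok else f"out of range ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:14s} {status}")
```

Running the file prints one status line per package; anything other than "ok" suggests re-running the corresponding pip command.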


2. Data Preparation

2.1 Download Dataset

Download the training data from the Hugging Face dataset page. The SFT dataset consists of three splits:

  • wo_thinking_thyme_single_round: Single-turn image operation data
  • 2round: Multi-turn dialogue data
  • computation: Annealing data used for computational tasks

Each sample’s image field is a list containing the original and processed images.

2.2 Process Images and Update Paths

Before training, ensure all referenced images are downloaded and saved locally. Update the dataset files (e.g., .jsonl) by replacing image URLs or remote paths with local absolute paths, for example:

"image": [
  "/path/to/original_images/0904.0709_0.jpg",
  "/path/to/processed_images/0904.0709_0_6349.jpg"
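]

The following script decodes the Base64-encoded images in each split, saves them as JPEG files, and writes one JSONL file per split that references the saved images:

#!/usr/bin/env python3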
import os, json, base64
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
from datasets import load_dataset
from tqdm import tqdm
import io
from PIL import Image

HF_DATA_DIR = "./data/Thyme-SFT"
ROOT_OUT    = Path("Thyme_sft_data")
IMG_ROOT    = './data/Thyme_sft_data/img'
JSONL_ROOT  = ROOT_OUT / "jsonl"
SPLITS      = ["wo_thinking_thyme_single_round", "2round", "computation"]
MAX_WORKERS = os.cpu_count()      # Can be adjusted based on machine specs

# Create the output directories up front
os.makedirs(IMG_ROOT, exist_ok=True)
JSONL_ROOT.mkdir(parents=True, exist_ok=True)

# ----------- Thread pool task -----------
def save_one_image(args):
    """
    Decode Base64 string, handle transparency and save image as JPEG.

    Args:
        args (tuple): Tuple containing (b64_str, save_path).
    """
    b64_str, save_path = args
    if os.path.exists(save_path):
        return save_path

    try:
        # 1. Decode Base64 to get raw binary data
        image_bytes = base64.b64decode(b64_str)

        # 2. Use Pillow to open image from binary data
        with Image.open(io.BytesIO(image_bytes)) as img:
            # 3. Handle transparency (key step)
            # Check if image mode needs transparency handling.
            # 'P' mode may contain transparency, 'LA' is grayscale+transparency.
            # 'RGBA' is the most common mode with transparency.
            if img.mode in ("RGBA", "LA", "P"):
                # To uniformly handle all transparency cases, first convert image to RGBA mode.
                # If image is in 'P' mode with transparency, conversion will result in correct RGBA image.
                img = img.convert("RGBA")

                # Create a white background base image
                background = Image.new("RGB", img.size, (255, 255, 255))

                # Paste original image onto the background.
                # At this point img is already in RGBA mode, so it can serve as its own mask.
                # Pillow will automatically use its Alpha channel.
                background.paste(img, (0, 0), img)
                img = background # Now img is the merged RGB image

            # If image mode is not RGB (e.g., 'L', 'CMYK', etc.), convert to RGB
            elif img.mode != "RGB":
                img = img.convert("RGB")

            # 4. Save image in JPEG format
            # JPEG doesn't support transparency, so background filling is necessary.
            img.save(save_path, "jpeg", quality=95)  # quality=95 limits compression artifacts

        return str(save_path)

    except Exception as e:
        # Add exception handling for debugging which image caused the problem
        print(f"Error processing image for {save_path}: {e}")
        return None

# ----------- Main processing -----------
for split in SPLITS:
    print(f"\n>>> Processing split : {split}  (max_workers={MAX_WORKERS})")
    # Report where the JSONL will be written (an existing file is overwritten below)
    jsonl_path = JSONL_ROOT / f"{split}.jsonl"
    if not jsonl_path.exists():
        print(f"  JSONL  -> {jsonl_path}")
    else:
        print(f"  JSONL already exists and will be overwritten: {jsonl_path}")

    ds = load_dataset(HF_DATA_DIR, split=split)

    img_dir = IMG_ROOT + '/' + split
    os.makedirs(img_dir, exist_ok=True)  # img.save() fails if the directory is missing

    # 1. First collect all tasks to be saved
    tasks = []               # (b64_str, save_path)
    records = []             # For writing jsonl
    for sample_idx, sample in enumerate(ds):
        img_paths = []
        for img_idx, b64_img in enumerate(sample["image"], start=1):
            img_name = f"{sample_idx+1:08d}_{img_idx:02d}.jpg"
            img_path = img_dir + '/' + img_name
            tasks.append((b64_img, img_path))
            img_paths.append(str(img_path))
        records.append({
            "image": img_paths,
            "question": sample["question"],
            "response": sample["response"]
        })

    # 2. Execute with multi-threading
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # Only save images that don't already exist
        saved_images = list(tqdm(pool.map(save_one_image, tasks),
                                 total=len(tasks), desc="Saving images"))
    
    # Filter out items that returned None (i.e., images that failed to decode or save)
    saved_images = [img for img in saved_images if img is not None]

    with open(jsonl_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    
    print(f"  Images -> {img_dir}  ({len(saved_images)} files)")

print("\nAll done (multi-threaded)!")
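Since training expects every referenced image to exist locally under an absolute path, a quick check of the generated JSONL files can catch problems early. The sketch below is an illustration only and is not part of the repository; the function name `find_missing_images` is hypothetical, and it assumes each line is a JSON object with an `image` list as written by the script above.

```python
import json
import os


def find_missing_images(jsonl_path):
    """Return (line_number, path) pairs for image paths that are
    not absolute or do not point to an existing file."""
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            for path in record.get("image", []):
                if not os.path.isabs(path) or not os.path.isfile(path):
                    problems.append((line_no, path))
    return problems


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever your JSONL files were written.
    for line_no, path in find_missing_images("Thyme_sft_data/jsonl/2round.jsonl"):
        print(f"line {line_no}: {path}")
```

Note that the script above builds image paths from a relative `IMG_ROOT`, so this check would flag them unless `IMG_ROOT` is set to an absolute path.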

2.3 File Path Conversion for System Integration

Every question embeds a file path from the original data-processing environment. Before training, replace it with the matching local path on your system. The steps below outline the conversion.

Conversion Process:

  1. Original Path Format:

    • Example: "User Image Path: \"/mllm_hdd/yfzhang/data/temp_processed_images/cauldron_dvqa_images_dvqa_00110792.png_rotated_image_318.png\""
  2. Transformation:

    • Locate the original path in the question; its filename identifies the image.
    • Replace it with the first element of the sample's image array.
    • That element is the correct path of the image on your system.
  3. Response Path Conversion:

    • Apply the same replacement to any absolute paths that appear in the response, so that they also match the system format described above.
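The conversion steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function name `rewrite_paths` is hypothetical, the record is assumed to be already parsed from JSON (so the quotes around the path are plain `"` characters), and the path is assumed to appear in the form `User Image Path: "<path>"` as in the example.

```python
import re

# Assumption: the question embeds the path as  User Image Path: "<path>"
PATH_PATTERN = re.compile(r'(User Image Path:\s*")([^"]+)(")')


def rewrite_paths(record):
    """Return a copy of the record in which the embedded original path is
    replaced by the first element of the image list, in both the question
    and the response."""
    local_path = record["image"][0]
    match = PATH_PATTERN.search(record["question"])
    if match is None:
        return dict(record)  # nothing to convert
    original_path = match.group(2)
    # A function replacement avoids backslash-escaping issues in local_path.
    question = PATH_PATTERN.sub(
        lambda m: m.group(1) + local_path + m.group(3),
        record["question"],
        count=1,
    )
    response = record["response"].replace(original_path, local_path)
    return {**record, "question": question, "response": response}


if __name__ == "__main__":
    sample = {
        "image": ["/data/img/0001_01.jpg", "/data/img/0001_02.jpg"],
        "question": 'Describe the chart.\nUser Image Path: "/remote/dir/chart.png"',
        "response": 'img = Image.open("/remote/dir/chart.png")',
    }
    print(rewrite_paths(sample))
```

Responses that reference other remote paths (for example, an output path for a processed image) are not handled here and would need their own mapping.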

3. Supervised Fine-Tuning (Thyme-SFT)

3.1 Training Data Format

Each training sample follows the JSON format below; the rest of the dataset uses similar structures:

{
  "image": ["/path/to/original.jpg", "/path/to/processed.jpg"],
  "question": "<image>\nBased on the top-right graph, describe the behavior of P(z) as z approaches zero. Options:\n...",
  "response": "<think>Detailed reasoning and executable code...</think><answer>B</answer>"
}
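A small structural check can catch malformed samples before training starts. The sketch below is an assumption-laden illustration based only on the example above: the helper names `check_sample` and `extract_answer` are hypothetical, and the rules (an `<image>` placeholder in the question, an `<answer>...</answer>` block in the response) mirror the example and may be stricter than some samples in the dataset require.

```python
import re

REQUIRED_KEYS = ("image", "question", "response")
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def check_sample(sample):
    """Return a list of problems found in one training sample (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in sample]
    if problems:
        return problems
    if not isinstance(sample["image"], list) or not sample["image"]:
        problems.append("image must be a non-empty list of paths")
    # Assumption taken from the example: the question carries an <image> placeholder.
    if "<image>" not in sample["question"]:
        problems.append("question has no <image> placeholder")
    if ANSWER_PATTERN.search(sample["response"]) is None:
        problems.append("response has no <answer>...</answer> block")
    return problems


def extract_answer(response):
    """Return the text inside the first <answer> block, or None if absent."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    sample = {
        "image": ["/path/to/original.jpg", "/path/to/processed.jpg"],
        "question": "<image>\nBased on the top-right graph, ...",
        "response": "<think>Detailed reasoning and executable code...</think><answer>B</answer>",
    }
    print(check_sample(sample), extract_answer(sample["response"]))
```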

3.2 Configure Training Paths

Set these variables in your training script or environment:

  • DATASET: Path to your training dataset
  • SAVE_PATH: Directory to save the trained model
  • Model: Path to your model

3.3 Run Training

Execute the training scripts:

sh scripts/sft_stage1.sh   # Stage 1: Supervised fine-tuning
sh scripts/sft_stag