InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

Autonomous Discovery Across All Sciences

Papers: InternAgent 1.0 | InternAgent 1.5
Links: Website | HuggingFace

🔥 News

📌 Pinned: Leveraging the general capabilities of InternAgent 1.5, anyone can now submit their algorithm tasks for optimization by opening an issue in this repository. We will regularly update the algorithm design results on this website. For other scientific discovery tasks, please visit Intern-Discovery.

2026.3.17: 🚀🚀 We provide public access to InternAgent's Deep Research capabilities, enabling developers and researchers to seamlessly integrate its advanced deep research functionality into their own workflows.
2026.2.14: ❤️‍🔥❤️‍🔥 We open-source MLEvolve, the core implementation of InternAgent's solution optimization subsystem for algorithm design tasks. MLEvolve is an open-source method that ranks #1 on MLEBench, demonstrating strong solution optimization within bounded hypothesis spaces.
2026.2.10: 🔥 Official release of the InternAgent 1.5 Technical Report. InternAgent 1.5 achieves leading performance on scientific reasoning benchmarks including GAIA, HLE, GPQA, and FrontierScience, and supports end-to-end autonomous scientific discovery tasks across Physical, Biological, Earth, and Life Science domains, enabling both algorithm discovery and empirical discovery (dry/wet-lab experiments).
2025.10.13: InternAgent-1.0 code has been fully open-sourced, supporting end-to-end automation and autonomous evolution across 12 scientific research tasks.

2025.07.17: The source code of InternAgent has been partially open-sourced. The complete version of InternAgent (covering 12 types of tasks for autonomous scientific research) will be open-sourced soon. This code repository can be used for full-cycle autonomous scientific research, ranging from hypothesis generation to automated experimental execution.
2025.07.10: NovelSeek has been renamed to InternAgent. This change reflects our vision for an autonomous scientific research framework, and we hope it will empower all researchers to achieve great scientific discoveries.

</details>

📖 Overview


InternAgent 1.5 is a unified autonomous system for end-to-end scientific discovery across both Algorithm Discovery and Empirical Discovery. Building on InternAgent 1.0, it organizes scientific inquiry into three coordinated subsystems: Generation (hypothesis construction via deep research), Verification (methodological evaluation via solution refinement), and Evolution (evidence-driven refinement via long-horizon memory). InternAgent 1.5 achieves leading performance on scientific reasoning benchmarks (GAIA, HLE, GPQA, FrontierScience, SGI-bench) and demonstrates sustained autonomous optimization across extended discovery cycles. The system supports algorithm discovery (agent memory, reinforcement learning, test-time scaling, ...) and empirical discovery workflows (dry-lab simulations and wet-lab experimentation) across Physical, Biological, Earth, and Life Sciences.

🌟 Core Features


InternAgent 1.5 is built on three foundational subsystems that enable autonomous scientific discovery:

🔍 Generation: Deep Research for Hypothesis Construction

Autonomous literature analysis and knowledge synthesis across scientific domains
Multi-source information integration from papers, code repositories, and domain-specific databases
Structured hypothesis formulation grounded in existing scientific evidence

✅ Verification: Solution Refinement for Methodological Evaluation

Systematic transformation of hypotheses into executable experimental protocols
Automated code generation, debugging, and execution across computational and experimental environments
Exception-guided intelligent error correction and iterative solution optimization
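
The exception-guided correction described above can be pictured as a retry loop that feeds each traceback back into a repair step. The following is only a minimal sketch under stated assumptions: `run_with_repair` and `toy_repair` are hypothetical names introduced for illustration and are not part of the InternAgent codebase. In a real system the repair step would presumably be an LLM call rather than a string replacement.

```python
import traceback
from typing import Callable, Dict, List, Tuple

def run_with_repair(
    code: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[Dict, List[str]]:
    """Execute `code`; on failure, hand the code and traceback to `repair` and retry.

    Hypothetical helper for illustration only. Returns the final namespace
    and the list of tracebacks collected along the way.
    """
    errors: List[str] = []
    for _ in range(max_attempts):
        namespace: Dict = {}
        try:
            exec(code, namespace)  # run the candidate solution
            return namespace, errors
        except Exception:
            err = traceback.format_exc()
            errors.append(err)
            code = repair(code, err)  # exception-guided revision
    raise RuntimeError(f"no working solution after {max_attempts} attempts")

def toy_repair(code: str, err: str) -> str:
    """Stand-in for a model-driven fix: corrects one known misspelling."""
    if "NameError" in err:
        return code.replace("totl", "sum")
    return code

# A candidate with a typo that fails on the first attempt.
namespace, errors = run_with_repair("result = totl([1, 2, 3])", toy_repair)
```

Here the first attempt raises a `NameError`, the repair step rewrites the call, and the second attempt succeeds, so `errors` holds exactly one traceback.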

🔄 Evolution: Long-Horizon Memory for Evidence-Driven Refinement

Persistent memory architecture that accumulates knowledge across extended research cycles
Cross-iteration learning from experimental outcomes and methodological feedback
Adaptive optimization that continuously refines hypotheses and experimental designs

🧩 Three-Subsystem Coordination

Generation → Verification → Evolution forms a complete discovery cycle
Seamless integration of dry-lab (computational modeling) and wet-lab (physical experimentation) workflows
Extensible architecture supporting diverse tasks across Algorithm Discovery and Empirical Discovery

InternAgent 1.5 delivers end-to-end autonomous scientific discovery, enabling researchers to complete the full cycle—from hypothesis generation to experimental validation—across Physical, Biological, Earth, and Life Sciences.
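
As a rough illustration of how the Generation → Verification → Evolution cycle fits together, the sketch below runs a toy loop in which each round proposes a hypothesis, scores it, and stores the outcome in a persistent memory that informs the next proposal. All names (`Memory`, `discovery_cycle`, `toy_generate`, `toy_verify`) are assumptions made for this example, and a single number stands in for a real hypothesis. It does not reflect the actual InternAgent implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Record = Tuple[float, float]  # (hypothesis, score); toy stand-in for real artifacts

@dataclass
class Memory:
    """Hypothetical long-horizon memory: keeps every (hypothesis, score) pair."""
    records: List[Record] = field(default_factory=list)

    def best(self) -> Optional[Record]:
        # Highest-scoring record so far, or None before the first round.
        return max(self.records, key=lambda r: r[1]) if self.records else None

def discovery_cycle(
    generate: Callable[[Memory], float],
    verify: Callable[[float], float],
    rounds: int,
) -> Memory:
    memory = Memory()
    for _ in range(rounds):
        hypothesis = generate(memory)                # Generation
        score = verify(hypothesis)                   # Verification
        memory.records.append((hypothesis, score))   # Evolution: accumulate evidence
    return memory

def toy_generate(memory: Memory) -> float:
    # Propose one step past the best hypothesis seen so far.
    best = memory.best()
    return 0.0 if best is None else best[0] + 1.0

def toy_verify(x: float) -> float:
    # Toy objective with its optimum at x = 3.
    return -((x - 3.0) ** 2)

memory = discovery_cycle(toy_generate, toy_verify, rounds=5)
```

The point of the design is that `generate` only sees the accumulated memory, so later rounds build on earlier evidence instead of starting over.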

🔬 Supported Research Tasks

Scientific Algorithm Discovery

Suzuki–Miyaura Reaction Yield Prediction
Transcription Prediction for Perturbation Response
Power Flow Estimation
Time Series Forecasting
Molecular Dynamics Simulation
Enhancer Activity Prediction

AI Algorithm Discovery

Test-Time Scaling for LLM Reasoning
Long-Term Memory Management for Agents
Self-Distillation for Mathematical Reasoning
Test-Time Reinforcement Learning

Empirical Discovery

Automated Climate Diagnostics
Climate Downscaling Optimization
Biological Evidence Synthesis for Target Discovery
Hypothesis Generation and Target Prioritization
Fluorescent Protein Engineering
Automated Reaction Outcome Prediction
Generative Scaffold Hopping
And more...

🎉 Benchmark Results

Results on AI Research Tasks

InternAgent consistently improves upon the baseline and outperforms Dolphin across all tasks, spanning AI and scientific domains.

Max Performance

| Task | Metric | Baseline | Dolphin | InternAgent |
|------|--------|----------|---------|-------------|
| AutoRYP | R² ↑ | 27.6 | 31.8 (+4.2) | 35.4 (+7.8) |
| AutoMD | Forces-MAE ↓ | 0.158 | 0.152 | 0.148 |
| AutoPower | RMSE ↓ | 0.00473 | 0.00455 | 0.00426 |
| AutoTSF | MAE ↓ | 0.4382 | 0.4627 | 0.4331 |
| AutoTPPR | MSE ↓ | 0.197 | 0.173 | 0.146 |
| AutoEAP | HK-PCC ↑ | 0.65 | 0.76 | 0.79 |
| AutoSenCls | Acc ↑ | 91.0 | 92.5 (+1.5) | 93.5 (+2.5) |
| Auto2DCls | Top-1 Acc ↑ | 81.2 | 82.0 (+0.8) | 83.3 (+2.1) |
| Auto3DCls | OA ↑ | 91.0 | 93.9 (+2.9) | 95.5 (+4.5) |
| Auto2DSeg | mIoU ↑ | 78.8 | - | 81.0 (+2.2) |
| AutoPCDet | mAP ↑ | 65.0 | - | 65.9 (+0.9) |
| AutoVLM | QA ↑ | 67.1 | - | 67.6 (+0.5) |

Average Performance

| Task | Metric | Baseline | Dolphin | InternAgent |
|------|--------|----------|---------|-------------|
| AutoRYP | R² ↑ | 27.6 | 31.3 (+3.7) | 33.5 (+5.9) |
| AutoMD | Forces-MAE ↓ | 0.158 | 0.155 | 0.152 |
| AutoPower | RMSE ↓ | 0.00473 | 0.00459 | 0.00447 |
| AutoTSF | MAE ↓ | 0.4382 | - | 0.4346 |
| AutoTPPR | MSE ↓ | 0.197 | 0.179 | 0.170 |
| AutoEAP | HK-PCC ↑ | 0.65 | 0.73 | 0.77 |
| AutoSenCls | Acc ↑ | 91.0 | 91.8 (+0.8) | 92.5 (+1.5) |
| Auto2DCls | Top-1 Acc ↑ | 81.2 | 81.8 (+0.6) | 82.2 (+1.0) |
| Auto3DCls | OA ↑ | 91.0 | 92.0 (+1.0) | 93.4 (+2.4) |
| Auto2DSeg | mIoU ↑ | 78.8 | - | 80.1 (+1.3) |
| AutoPCDet | mAP ↑ | 65.0 | - | 65.7 (+0.7) |
| AutoVLM | QA ↑ | 67.1 | - | 67.6 (+0.5) |
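
The parenthesized deltas in the tables above are plain differences from the baseline. Because some metrics are higher-is-better (↑) and others lower-is-better (↓), a small helper makes the comparison direction explicit. This is an illustrative sketch: the function names are hypothetical, and the relative percentage is derived here from the table values rather than reported by the authors.

```python
def improvement(baseline: float, result: float, higher_is_better: bool) -> float:
    """Signed gain over the baseline; a positive value always means 'better'."""
    return result - baseline if higher_is_better else baseline - result

def relative_improvement(baseline: float, result: float, higher_is_better: bool) -> float:
    """Gain expressed as a percentage of the baseline value."""
    return 100.0 * improvement(baseline, result, higher_is_better) / baseline

# Values taken from the Max Performance table.
ryp_gain = round(improvement(27.6, 35.4, higher_is_better=True), 1)      # AutoRYP, R² ↑
tppr_gain = round(improvement(0.197, 0.146, higher_is_better=False), 3)  # AutoTPPR, MSE ↓
tppr_rel = round(relative_improvement(0.197, 0.146, higher_is_better=False), 1)
tsf_dolphin = improvement(0.4382, 0.4627, higher_is_better=False)        # AutoTSF, MAE ↓
```

With this convention `ryp_gain` matches the (+7.8) shown for AutoRYP, while `tsf_dolphin` is negative, reflecting that Dolphin's best AutoTSF MAE (0.4627) is worse than the baseline (0.4382).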

🧪 GAIA, GPQA-Diamond, FrontierScience and HLE Benchmarks

InternAgent-1.5 achieved state-of-the-art results across multiple benchmarks.

Humanity's Last Exam (HLE)

| Setting | Model | Math | Bio/Med | CS/AI | Physics | Human. | Chem. | Engineer. | Other | Avg. |
|---------|-------|------|---------|-------|---------|--------|-------|-----------|-------|------|
| Text-Only | Deepseek-R1 | 9.30 | 8.60 | 7.40 | 5.80 | 11.00 | 5.60 | 10.30 | 7.50 | 8.60 |
| | Gemini-3-pro-preview | 45.08 | 26.13 | 26.79 | 32.67 | 44.04 | 34.65 | 29.69 | 32.39 | 38.00 |
| | InternAgent-1.5 | 48.96 | 30.63 | 29.46 | 34.16 | 44.56 | 30.69 | 28.13 | 37.50 | 40.87 |
| All-Set | o4-mini | 19.00 | 11.40 | 12.90 | 12.60 | 9.10 | 12.70 | 12.60 | 6.90 | 14.30 |
| | GPT-5 | 31.00 | 22.10 | 24.90 | 21.70 | 20.60 | 16.40 | 14.40 | 18.00 | 24.80 |
| | Gemini-3-pro-preview | 44.76 | 27.14 | 29.05 | 31.30 | 42.92 | 40.00 | 32.43 | 34.33 | 38.04 |
| | InternAgent-1.5 | 48.09 | 30.36 | 30.71 | 33.04 | 42.47 | 34.55 | 30.63 | 38.63 | 40.00 |

FrontierScience Benchmark

| Method | Olympiad (avg N=20) | | | | Research (avg N=30) | | | |
|--------|---------|---------|---------|---------|---------|---------|---------|---------|
| | Bio | Chem | Phy | All | Bio | Chem | Phy | All |
| o4-mini | 47.00±14.90 | 65.00±6.40 | 53.40±4.50 | 57.40±3.30 | 9.67±5.47 | 8.17±4.37 | 0.83±2.27 | 6.20±2.54 |
| InternS1-235B | 17.00±12.69 | 52.88±4.05 | 50.40±3.88 | 48.05±2.84 | 4.50±4.35 | 11.00±3.74 | 2.67±3.35 | 6.06±2.30 |
| Mirothinker-v1.5-30B-A3B | 22.86±4.52 | 69.64±7.49 | 54.86±3.18 | 57.57±3.66 | 8.17±6.39 | 8.50±6.21 | 5.83±4.10 | 7.50±3.77 |
| DeepSeek-V3.2-Thinking | 26.50±7.26 | 72.