InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

Autonomous Discovery Across All Sciences

🔥 News

📌 Pinned: Leveraging the general capabilities of InternAgent 1.5, anyone can now submit their algorithm tasks for optimization by opening an issue in this repository. We will regularly update the algorithm design results on this website. For other scientific discovery tasks, please visit Intern-Discovery.


  • 2026.3.17: 🚀🚀 We provide public access to InternAgent's Deep Research capabilities, enabling developers and researchers to seamlessly integrate its advanced deep research functionality into their own workflows.

  • 2026.2.14: ❤️‍🔥❤️‍🔥 We open-source MLEvolve, the core implementation of InternAgent's solution optimization subsystem for algorithm design tasks. As the open-source method to achieve #1 on MLEBench, MLEvolve demonstrates powerful capabilities in solution optimization within bounded hypothesis spaces.

  • 2026.2.10: 🔥 Official release of the InternAgent 1.5 Technical Report. InternAgent 1.5 achieves leading performance on scientific reasoning benchmarks including GAIA, HLE, GPQA, and FrontierScience, and supports end-to-end autonomous scientific discovery tasks across Physical, Biological, Earth, and Life Science domains, enabling both algorithm discovery and empirical discovery (dry/wet-lab experiments).

  • 2025.10.13: InternAgent-1.0 code has been fully open-sourced, supporting end-to-end automation and autonomous evolution across 12 scientific research tasks.

<details> <summary>more...</summary>
  • 2025.07.17: The source code of InternAgent has been partially open-sourced. The complete version of InternAgent (covering 12 types of tasks for autonomous scientific research) will be open-sourced soon. This code repository can be used for full-cycle autonomous scientific research, ranging from hypothesis generation to automated experimental execution.

  • 2025.07.10: NovelSeek has been renamed to InternAgent. This change embodies our vision for an autonomous scientific research framework, and we hope it will empower all researchers to achieve great scientific discoveries.

</details>

📖 Overview

InternAgent 1.5 is a unified autonomous system for end-to-end scientific discovery across both Algorithm Discovery and Empirical Discovery. Building on InternAgent 1.0, it organizes scientific inquiry into three coordinated subsystems: Generation (hypothesis construction via deep research), Verification (methodological evaluation via solution refinement), and Evolution (evidence-driven refinement via long-horizon memory).

InternAgent 1.5 achieves leading performance on scientific reasoning benchmarks (GAIA, HLE, GPQA, FrontierScience, SGI-bench) and demonstrates sustained autonomous optimization across extended discovery cycles. The system supports algorithm discovery (agent memory, reinforcement learning, test-time scaling, ...) and empirical discovery workflows (dry-lab simulations and wet-lab experimentation) across Physical, Biological, Earth, and Life Sciences.


🌟 Core Features

InternAgent 1.5 is built on three foundational subsystems that enable autonomous scientific discovery:

🔍 Generation: Deep Research for Hypothesis Construction

  • Autonomous literature analysis and knowledge synthesis across scientific domains
  • Multi-source information integration from papers, code repositories, and domain-specific databases
  • Structured hypothesis formulation grounded in existing scientific evidence

✅ Verification: Solution Refinement for Methodological Evaluation

  • Systematic transformation of hypotheses into executable experimental protocols
  • Automated code generation, debugging, and execution across computational and experimental environments
  • Exception-guided intelligent error correction and iterative solution optimization
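As a rough illustration of exception-guided correction, the sketch below runs a candidate program, captures the traceback on failure, and passes it to a repair function that proposes a new candidate. This is not InternAgent's implementation: `run_candidate`, `refine_until_passing`, and `toy_repair` are hypothetical names, and `toy_repair` is a string-replacement stand-in for what would presumably be a model call.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(source: str) -> Tuple[bool, Optional[str]]:
    """Execute candidate code; return (success, traceback text or None)."""
    try:
        exec(compile(source, "<candidate>", "exec"), {})
        return True, None
    except Exception:
        return False, traceback.format_exc()


def refine_until_passing(
    source: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Retry loop: each failure's traceback is handed to `repair`."""
    for attempt in range(1, max_attempts + 1):
        ok, error = run_candidate(source)
        if ok:
            return source, attempt
        source = repair(source, error)
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts")


def toy_repair(source: str, error: str) -> str:
    # Hypothetical stand-in for a model-driven fix: patch one known failure.
    if "ZeroDivisionError" in error:
        return source.replace("/ 0", "/ 1")
    return source


fixed, attempts = refine_until_passing("result = 10 / 0", toy_repair)
print(fixed, attempts)
```

The point of the design is that the error text, not just a pass/fail flag, drives the next candidate. A real system would also sandbox execution rather than call `exec` directly.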

🔄 Evolution: Long-Horizon Memory for Evidence-Driven Refinement

  • Persistent memory architecture that accumulates knowledge across extended research cycles
  • Cross-iteration learning from experimental outcomes and methodological feedback
  • Adaptive optimization that continuously refines hypotheses and experimental designs

🧩 Three-Subsystem Coordination

  • Generation → Verification → Evolution forms a complete discovery cycle
  • Seamless integration of dry-lab (computational modeling) and wet-lab (physical experimentation) workflows
  • Extensible architecture supporting diverse tasks across Algorithm Discovery and Empirical Discovery
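The Generation → Verification → Evolution cycle can be pictured as a loop around a persistent memory. The following is a minimal sketch under stated assumptions, not the project's actual API: `Memory`, `discovery_cycle`, `propose`, and `evaluate` are hypothetical names, and a hypothesis is reduced to a single number so the loop stays runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Memory:
    """Record of (hypothesis, score) pairs kept across iterations."""
    history: List[Tuple[float, float]] = field(default_factory=list)

    def best(self) -> Optional[Tuple[float, float]]:
        if not self.history:
            return None
        return max(self.history, key=lambda item: item[1])


def discovery_cycle(
    generate: Callable[[Memory], float],
    verify: Callable[[float], float],
    rounds: int,
) -> Memory:
    memory = Memory()
    for _ in range(rounds):
        hypothesis = generate(memory)               # Generation
        score = verify(hypothesis)                  # Verification
        memory.history.append((hypothesis, score))  # Evolution: keep evidence
    return memory


def propose(memory: Memory) -> float:
    # Toy generator: step one unit past the best hypothesis seen so far.
    best = memory.best()
    return 0.0 if best is None else best[0] + 1.0


def evaluate(x: float) -> float:
    # Toy objective that peaks at x = 3.
    return 1.0 / (1.0 + (x - 3.0) ** 2)


memory = discovery_cycle(propose, evaluate, rounds=5)
print(memory.best())
```

Because `propose` reads from `memory`, each round builds on the evidence from earlier rounds, which is the role the Evolution subsystem is described as playing.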

InternAgent 1.5 delivers end-to-end autonomous scientific discovery, enabling researchers to complete the full cycle—from hypothesis generation to experimental validation—across Physical, Biological, Earth, and Life Sciences.


🔬 Supported Research Tasks

Scientific Algorithm Discovery

  • Suzuki–Miyaura Reaction Yield Prediction
  • Transcription Prediction for Perturbation Response
  • Power Flow Estimation
  • Time Series Forecasting
  • Molecular Dynamics Simulation
  • Enhancer Activity Prediction

AI Algorithm Discovery

  • Test-Time Scaling for LLM Reasoning
  • Long-Term Memory Management for Agents
  • Self-Distillation for Mathematical Reasoning
  • Test-Time Reinforcement Learning

Empirical Discovery

  • Automated Climate Diagnostics
  • Climate Downscaling Optimization
  • Biological Evidence Synthesis for Target Discovery
  • Hypothesis Generation and Target Prioritization
  • Fluorescent Protein Engineering
  • Automated Reaction Outcome Prediction
  • Generative Scaffold Hopping

And more...

🎉 Benchmark Results

Results on AI Research Tasks

InternAgent improves upon the baseline on all 12 tasks, spanning AI and scientific domains, and outperforms Dolphin on every task for which Dolphin results are reported.

Max Performance

| Task | Metric | Baseline | Dolphin | InternAgent |
|------|--------|----------|---------|-------------|
| AutoRYP | R² ↑ | 27.6 | 31.8 (+4.2) | 35.4 (+7.8) |
| AutoMD | Forces-MAE ↓ | 0.158 | 0.152 | 0.148 |
| AutoPower | RMSE ↓ | 0.00473 | 0.00455 | 0.00426 |
| AutoTSF | MAE ↓ | 0.4382 | 0.4627 | 0.4331 |
| AutoTPPR | MSE ↓ | 0.197 | 0.173 | 0.146 |
| AutoEAP | HK-PCC ↑ | 0.65 | 0.76 | 0.79 |
| AutoSenCls | Acc ↑ | 91.0 | 92.5 (+1.5) | 93.5 (+2.5) |
| Auto2DCls | Top-1 Acc ↑ | 81.2 | 82.0 (+0.8) | 83.3 (+2.1) |
| Auto3DCls | OA ↑ | 91.0 | 93.9 (+2.9) | 95.5 (+4.5) |
| Auto2DSeg | mIoU ↑ | 78.8 | - | 81.0 (+2.2) |
| AutoPCDet | mAP ↑ | 65.0 | - | 65.9 (+0.9) |
| AutoVLM | QA ↑ | 67.1 | - | 67.6 (+0.5) |

Average Performance

| Task | Metric | Baseline | Dolphin | InternAgent |
|------|--------|----------|---------|-------------|
| AutoRYP | R² ↑ | 27.6 | 31.3 (+3.7) | 33.5 (+5.9) |
| AutoMD | Forces-MAE ↓ | 0.158 | 0.155 | 0.152 |
| AutoPower | RMSE ↓ | 0.00473 | 0.00459 | 0.00447 |
| AutoTSF | MAE ↓ | 0.4382 | - | 0.4346 |
| AutoTPPR | MSE ↓ | 0.197 | 0.179 | 0.170 |
| AutoEAP | HK-PCC ↑ | 0.65 | 0.73 | 0.77 |
| AutoSenCls | Acc ↑ | 91.0 | 91.8 (+0.8) | 92.5 (+1.5) |
| Auto2DCls | Top-1 Acc ↑ | 81.2 | 81.8 (+0.6) | 82.2 (+1.0) |
| Auto3DCls | OA ↑ | 91.0 | 92.0 (+1.0) | 93.4 (+2.4) |
| Auto2DSeg | mIoU ↑ | 78.8 | - | 80.1 (+1.3) |
| AutoPCDet | mAP ↑ | 65.0 | - | 65.7 (+0.7) |
| AutoVLM | QA ↑ | 67.1 | - | 67.6 (+0.5) |
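The parenthesized gains in the two tables above are consistent with absolute differences from the Baseline column (for example, 35.4 − 27.6 = 7.8 for AutoRYP). The helper below is an illustrative sketch for reproducing them; `delta_vs_baseline` is a hypothetical name, and the sign convention for lower-is-better metrics (a positive value meaning an improvement) is an assumption, since the tables show no deltas for those rows.

```python
def delta_vs_baseline(
    baseline: float,
    value: float,
    higher_is_better: bool = True,
    digits: int = 1,
) -> float:
    """Absolute improvement over the baseline; positive means better."""
    diff = value - baseline if higher_is_better else baseline - value
    # Round to suppress floating-point noise such as 7.799999999999997.
    return round(diff, digits)


# AutoRYP, max performance (R², higher is better)
print(delta_vs_baseline(27.6, 35.4))
# AutoTSF, max performance (MAE, lower is better; sign convention assumed)
print(delta_vs_baseline(0.4382, 0.4331, higher_is_better=False, digits=4))
```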


🧪 GAIA, GPQA-Diamond, FrontierScience and HLE Benchmarks

InternAgent-1.5 achieved state-of-the-art results across multiple benchmarks.

Humanity's Last Exam (HLE)

| Setting | Model | Math | Bio/Med | CS/AI | Physics | Human. | Chem. | Engineer. | Other | Avg. |
|---------|-------|------|---------|-------|---------|--------|-------|-----------|-------|------|
| Text-Only | Deepseek-R1 | 9.30 | 8.60 | 7.40 | 5.80 | 11.00 | 5.60 | 10.30 | 7.50 | 8.60 |
| | Gemini-3-pro-preview | 45.08 | 26.13 | 26.79 | 32.67 | 44.04 | 34.65 | 29.69 | 32.39 | 38.00 |
| | InternAgent-1.5 | 48.96 | 30.63 | 29.46 | 34.16 | 44.56 | 30.69 | 28.13 | 37.50 | 40.87 |
| All-Set | o4-mini | 19.00 | 11.40 | 12.90 | 12.60 | 9.10 | 12.70 | 12.60 | 6.90 | 14.30 |
| | GPT-5 | 31.00 | 22.10 | 24.90 | 21.70 | 20.60 | 16.40 | 14.40 | 18.00 | 24.80 |
| | Gemini-3-pro-preview | 44.76 | 27.14 | 29.05 | 31.30 | 42.92 | 40.00 | 32.43 | 34.33 | 38.04 |
| | InternAgent-1.5 | 48.09 | 30.36 | 30.71 | 33.04 | 42.47 | 34.55 | 30.63 | 38.63 | 40.00 |


FrontierScience Benchmark

| Method | Olympiad (avg N=20) | | | | Research (avg N=30) | | | |
|--------|---------|---------|---------|---------|---------|---------|---------|---------|
| | Bio | Chem | Phy | All | Bio | Chem | Phy | All |
| o4-mini | 47.00±14.90 | 65.00±6.40 | 53.40±4.50 | 57.40±3.30 | 9.67±5.47 | 8.17±4.37 | 0.83±2.27 | 6.20±2.54 |
| InternS1-235B | 17.00±12.69 | 52.88±4.05 | 50.40±3.88 | 48.05±2.84 | 4.50±4.35 | 11.00±3.74 | 2.67±3.35 | 6.06±2.30 |
| Mirothinker-v1.5-30B-A3B | 22.86±4.52 | 69.64±7.49 | 54.86±3.18 | 57.57±3.66 | 8.17±6.39 | 8.50±6.21 | 5.83±4.10 | 7.50±3.77 |
| DeepSeek-V3.2-Thinking | 26.50±7.26 | 72.
