
PaperOrchestra

An automated AI research-paper writer implementing the PaperOrchestra paper as a skill pack: five composable skills plus a benchmark and autoraters, runnable by any coding agent (Claude Code, Cursor, Antigravity, Cline, Aider). No API keys, no LLM SDKs.

Install / Use

/learn @Ar9av/PaperOrchestra

README

PaperOrchestra

A pluggable skill pack that lets any coding agent (Claude Code, Cursor, Antigravity, Cline, Aider, OpenCode, etc.) run the PaperOrchestra multi-agent pipeline for turning unstructured research materials into a submission-ready LaTeX paper.

Song, Y., Song, Y., Pfister, T., Yoon, J. PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing. arXiv:2604.05018, 2026. https://arxiv.org/pdf/2604.05018

<p align="center"> <a href="https://arxiv.org/pdf/2604.05018"> <img src="docs/assets/paper-preview.png" alt="PaperOrchestra paper — first page preview" width="420"/> </a> <br/> <em>Click to read the paper on arXiv</em> </p>

Why this exists

The paper defines a five-agent pipeline:

  • Outline
  • Plotting
  • Literature Review
  • Section Writing
  • Content Refinement

Together, these five agents substantially outperform single-agent and tree-search baselines on the PaperWritingBench benchmark (50–68% absolute win margin on literature-review quality; 14–38% on overall quality). The paper ships the exact prompts for every agent in Appendix F.

This repo turns those prompts, schemas, halt rules, and verification pipelines into a set of host-agent-executable skills. There are no API keys, no SDK dependencies, no embedded LLM calls. The skills are instruction documents plus deterministic helpers; your coding agent does all LLM reasoning and web search using its own tools.

<img width="640" height="413" alt="image" src="https://github.com/user-attachments/assets/073630c8-9790-4b38-b8c4-184cec6eee06" />

How skills work here

Each skill is:

  • SKILL.md — a dense instruction document the host agent reads and follows.
  • references/ — reference material: verbatim paper prompts (Appendix F), JSON schemas, rubrics, halt rules, example outputs.
  • scripts/ — purely deterministic local helpers: JSON schema validation, Levenshtein fuzzy matching, BibTeX formatting, dedup, LaTeX sanity checks, coverage gates. No network, no LLM, no API keys.

Everything else (LLM reasoning, web search, Semantic Scholar lookups, LaTeX compilation) is delegated to the host agent by instruction. See skills/paper-orchestra/references/host-integration.md for per-host invocation (Claude Code, Cursor, Antigravity, Cline, Aider).
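The Levenshtein fuzzy matching used for citation verification (the "similarity > 70" gate in the literature-review step) can be sketched as a small pure-Python helper. This is an illustrative sketch; the function names and title normalization are assumptions, not the repo's actual API:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def title_similarity(query: str, candidate: str) -> float:
    """0-100 similarity score between two normalized paper titles."""
    q, c = query.lower().strip(), candidate.lower().strip()
    if not q and not c:
        return 100.0
    dist = levenshtein(q, c)
    return 100.0 * (1 - dist / max(len(q), len(c)))

# A Semantic Scholar candidate passes verification only if similarity > 70
assert title_similarity("Attention Is All You Need",
                        "attention is all you need") > 70
```

Because the helper is deterministic and offline, the host agent only supplies candidate titles; no network call happens inside the script.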

The eight skills

| Skill | Paper step | # LLM calls | Role |
|---|---|---|---|
| paper-orchestra | orchestrator | — | Top-level driver. Coordinates the other seven. |
| outline-agent | Step 1 | 1 | Idea + log + template + guidelines → structured outline JSON (plotting plan, lit review plan, section plan). |
| plotting-agent | Step 2 | ~20–30 | Executes the plotting plan; renders plots and conceptual diagrams; optional VLM-critique refinement loop; captions everything. |
| literature-review-agent | Step 3 | ~20–30 | Web-searches candidates; verifies via Semantic Scholar (Levenshtein > 70, cutoff, dedup); drafts Intro + Related Work with ≥90% citation integration. |
| section-writing-agent | Step 4 | 1 | One single multimodal call: drafts remaining sections, builds tables from the experimental log, splices in figures. |
| content-refinement-agent | Step 5 | ~5–7 | Simulated peer review; accepts/reverts per strict halt rules; safety constraints prevent gaming the evaluator. |
| paper-writing-bench | §3 | — | Reverse-engineers raw materials (Sparse/Dense idea, experimental log) from an existing paper to build benchmark cases. |
| paper-autoraters | App. F.3 | — | Runs the paper's own autoraters: Citation F1 (P0/P1), LitReview quality (6-axis), SxS paper quality, SxS litreview quality. |

Steps 2 and 3 run in parallel (see skills/paper-orchestra/references/pipeline.md).
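The "≥90% citation integration" coverage gate in the table above could be implemented as a deterministic check along these lines (the function name and `\cite` pattern are assumptions for illustration; the repo's actual gate script may differ):

```python
import re

def citation_integration(draft_tex: str, verified_keys: list[str]) -> float:
    """Fraction of verified bibliography keys actually cited in the draft."""
    cited = set()
    # Matches \cite{a}, \citep{a,b}, \citet{c}, etc.
    for group in re.findall(r"\\cite[a-z]*\{([^}]*)\}", draft_tex):
        cited.update(k.strip() for k in group.split(","))
    if not verified_keys:
        return 1.0
    return sum(k in cited for k in verified_keys) / len(verified_keys)

draft = r"Prior work \citep{smith2024, lee2023} extends \citet{doe2022}."
keys = ["smith2024", "lee2023", "doe2022", "unused2021"]
rate = citation_integration(draft, keys)
assert rate == 0.75  # 3 of 4 verified keys integrated; below the 0.9 gate
```

A gate like this can fail fast and send the draft back to the literature-review agent before any expensive refinement pass runs.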

agent-research-aggregator

A pre-pipeline skill that bridges the gap between scattered AI coding-agent history and the structured (idea.md, experimental_log.md) inputs that PaperOrchestra expects. If you have been running experiments through Claude Code, Cursor, Antigravity, or OpenClaw but never wrote up a clean experiment log, this skill does that extraction for you.

Run it before paper-orchestra.

What it does

[.claude/]  [.cursor/]  [.antigravity/]  [.openclaw/]
      │            │              │               │
      └────────────┴──────────────┴───────────────┘
                        │
                Phase 1: Discovery  (deterministic)
                        │
                Phase 2: Extraction (LLM — per batch)
                        │
                Phase 3: Synthesis  (LLM — one call)
                        │
                Phase 4: Formatting (deterministic)
                        │
             ┌──────────┴──────────┐
      workspace/inputs/      workspace/ara/
        idea.md                aggregation_report.md
        experimental_log.md    discovered_logs.json
                               raw_experiments.json
                               synthesis.json

The four phases are:

| Phase | Tool | What happens |
|---|---|---|
| 1 Discovery | discover_logs.py | Walks --search-roots to catalog every relevant log file across all agent caches. Prints a summary for user review before anything is read. |
| 2 Extraction | LLM (per ~50 KB batch) | Applies references/extraction-prompt.md to each batch; produces raw_experiments.json. PII is stripped; unverified numbers are flagged [UNVERIFIED]. |
| 3 Synthesis | LLM (one call) | Merges possibly-redundant experiment records into a single research narrative (synthesis.json). Detects multiple disconnected projects and pauses to ask the user. |
| 4 Formatting | format_po_inputs.py | Converts synthesis.json into idea.md (Sparse Idea format, §3.1) and experimental_log.md (App. D.3), ready for paper-orchestra. |
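Phase 2's per-~50 KB batching can be sketched as a greedy packer over the discovered files. This is a simplified illustration under the assumption that oversized files are kept whole; the skill's real batching logic may split them differently:

```python
def batch_logs(files: list[tuple[str, str]],
               limit: int = 50_000) -> list[list[tuple[str, str]]]:
    """Greedily pack (path, text) pairs into batches of at most `limit` bytes.

    A single file larger than `limit` gets its own batch rather than being
    split, keeping each extraction prompt self-contained.
    """
    batches, current, size = [], [], 0
    for path, text in files:
        n = len(text.encode("utf-8"))
        if current and size + n > limit:
            batches.append(current)
            current, size = [], 0
        current.append((path, text))
        size += n
    if current:
        batches.append(current)
    return batches

files = [("a.md", "x" * 30_000), ("b.md", "y" * 30_000), ("c.md", "z" * 10_000)]
assert [len(b) for b in batch_logs(files)] == [1, 2]  # a alone; b and c fit together
```

Each resulting batch becomes one LLM extraction call with references/extraction-prompt.md prepended.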

Integration

Install — no extra dependencies beyond the base requirements.txt.

Symlink the skill into your host's skill directory alongside the others:

ln -sf ~/paper-orchestra/skills/agent-research-aggregator \
       ~/.claude/skills/agent-research-aggregator

For Cursor / Antigravity / Cline / Aider, follow the same per-host instructions in skills/paper-orchestra/references/host-integration.md.

Invoke by telling your coding agent:

"Aggregate my agent logs for paper writing" — or — "Prepare PaperOrchestra inputs from my cache" — or — "Turn my agent logs into a paper"

The trigger phrases are listed in the description field of skills/agent-research-aggregator/SKILL.md.

Parameters

| Flag | Default | Description |
|---|---|---|
| --search-roots | cwd, ~ | Directories to scan for agent caches |
| --agents | all | Subset: claude,cursor,antigravity,openclaw |
| --workspace | ./workspace | PaperOrchestra workspace root |
| --depth | 4 | Max scan depth (prevents runaway traversal) |
| --since | — | Only logs modified after this date (ISO 8601) |

Example workflows

From Claude Code memory + CLAUDE.md only:

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots . \
    --agents claude \
    --out workspace/ara/discovered_logs.json
# → finds .claude/projects/<hash>/memory/*.md and CLAUDE.md

From a Cursor project (chat history + rules):

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots ~/my-project \
    --agents cursor \
    --out workspace/ara/discovered_logs.json
# → finds .cursor/chat/chatHistory.json and .cursorrules

From Antigravity worker logs, restricted to the last 60 days:

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots ~/my-project \
    --agents antigravity \
    --since 2026-02-09 \
    --out workspace/ara/discovered_logs.json
# → finds .antigravity/workers/<id>/log.jsonl and output.md

From OpenClaw sessions + run metrics:

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots ~/my-project \
    --agents openclaw \
    --out workspace/ara/discovered_logs.json
# → finds .openclaw/sessions/*/conversation.md and runs/*/metrics.json

Full run across all caches:

# Phase 1 — discovery
python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots . ~ --out workspace/ara/discovered_logs.json

# Phase 2 — LLM extraction (your agent handles this; validate afterward)
python skills/agent-research-aggregator/scripts/extract_experiments.py \
    --discovered workspace/ara/discovered_logs.json \
    --out workspace/ara/raw_experiments.json --validate-only

# Phase 3 — LLM synthesis (your agent handles this)

# Phase 4 — format + audit report
python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/ \
    --report workspace/ara/aggregation_report.md

After Phase 4, the workspace is ready for paper-orchestra. You still need to supply two files yourself: workspace/inputs/template.tex (your conference LaTeX template) and workspace/inputs/conference_guidelines.md (page limit, deadline, formatting rules).
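A quick readiness check before launching paper-orchestra might look like this; the file names follow the layout above, but the helper itself is illustrative rather than a script shipped in the repo:

```python
from pathlib import Path

REQUIRED = [
    "idea.md",                   # produced by Phase 4
    "experimental_log.md",       # produced by Phase 4
    "template.tex",              # supplied by you
    "conference_guidelines.md",  # supplied by you
]

def missing_inputs(workspace: str) -> list[str]:
    """Return the required input files not yet present in workspace/inputs/."""
    inputs = Path(workspace) / "inputs"
    return [name for name in REQUIRED if not (inputs / name).is_file()]

# Example: report anything still missing before kicking off the pipeline
if gaps := missing_inputs("./workspace"):
    print("Still missing:", ", ".join(gaps))
```

Running a check like this up front avoids a pipeline failure deep in Step 1 when the outline agent first reads the template.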

Reference docs

Install

git clone <this repo> ~/paper-orchestra
cd ~/paper-orchestra
pip install -r requirements.txt   # deterministic helpers only

Then symlink the skills into your host's skill directory; see skills/paper-orchestra/references/host-integration.md for per-host paths.
