
OpenNovelty (English)


English | 中文

OpenNovelty is an LLM-powered agentic pipeline for transparent, evidence-grounded, and verifiable scholarly novelty assessment.

  • Website (public reports): https://www.opennovelty.org
  • Technical report (arXiv): https://arxiv.org/abs/2601.01576

🌟 If you find OpenNovelty useful, please give us a star on GitHub and an upvote on Hugging Face — your support helps us a lot!

Repository status: The codebase will be incrementally refactored over the next 2 months to improve quality; the current version is a demo.


Why OpenNovelty

Novelty is a key criterion in peer review, but manual evaluation is often constrained by time, subjectivity, and retrieval coverage. By grounding analysis in retrieval and evidence alignment, OpenNovelty provides traceable novelty assessments and helps reduce subjective bias.


System Architecture

Pipeline Overview

Figure: The full OpenNovelty workflow — a four-phase pipeline from paper input to report output.

Four-Phase Overview

| Phase | Functionality | Key Input | Key Output | Time | Dependencies |
|:-----:|---------------|-----------|------------|:----:|--------------|
| I | Information Extraction | Paper PDF URL | phase1_extracted.json | ~1 min | LLM API |
| II | Literature Retrieval | Phase 1 outputs | citation_index.json | ~10 min | Wispaper API |
| III | Deep Analysis | Phase 2 outputs | phase3_complete_report.json | ~10 min | LLM API |
| IV | Report Generation | Phase 3 outputs | novelty_report.md/pdf | ~30 sec | weasyprint |

Details

Phase I — Information Extraction

  • Download the PDF and extract full text + metadata
  • Use an LLM to extract one core task and 1–3 contribution claims
  • Generate 3 retrieval query variants for each task/contribution
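The steps above might yield an extraction shaped roughly like the following; every field name and value here is an illustrative assumption, not the pipeline's actual phase1_extracted.json schema:

```python
import json

# Hypothetical shape of a Phase 1 result: one core task, 1-3 contribution
# claims, and 3 query variants per item (field names are illustrative).
extracted = {
    "core_task": "few-shot molecular property prediction",
    "contributions": [
        "a contrastive pretraining objective for molecular graphs",
    ],
    "query_variants": {
        "core_task": [
            "few-shot molecular property prediction",
            "low-resource molecule classification",
            "meta-learning for molecular properties",
        ],
    },
}

print(json.dumps(extracted, indent=2))
```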

Phase II — Literature Retrieval

  • Semantic retrieval of related papers (via WisPaper API [paper])
  • Quality filtering (perfect-match, time filtering) and deduplication (canonical_id + title normalization)
  • Build a citation index (Core Task Top-50, Contribution Top-10)
  • ⚠️ Note: Phase 2 depends on Wispaper API which is not yet publicly available. The API will be opened soon — please stay tuned for updates!
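One way the canonical_id + title-normalization deduplication could work, sketched in Python (the function names and paper dicts are illustrative, not the pipeline's actual code):

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first paper seen per canonical_id, falling back to the
    normalized title when no canonical_id is available."""
    seen, unique = set(), []
    for paper in papers:
        key = paper.get("canonical_id") or normalize_title(paper["title"])
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique

papers = [
    {"canonical_id": "arxiv:1706.03762", "title": "Attention Is All You Need"},
    {"canonical_id": None, "title": "Attention Is All You Need"},
    {"canonical_id": None, "title": "Attention is all you need!"},
]
print(len(deduplicate(papers)))
```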

Phase III — Deep Analysis

  • Build a related-work taxonomy and synthesize a survey
  • Textual similarity detection (token-level fuzzy matching; ⚠️ experimental, recommended to skip)
  • Full-text comparative verification and novelty judgments (can_refute / cannot_refute / unclear)
  • 💡 Recommended setting: set SKIP_TEXTUAL_SIMILARITY=true in .env to skip similarity detection (this experimental module is being updated)
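Token-level fuzzy matching of the kind the experimental similarity module describes can be sketched with the standard-library difflib; this is a stand-in illustration, not the module's actual implementation:

```python
from difflib import SequenceMatcher

def token_similarity(a: str, b: str) -> float:
    """Similarity ratio over token sequences rather than raw characters."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

claim = "we propose a retrieval-grounded novelty assessment pipeline"
prior = "a retrieval-grounded pipeline for assessing novelty"
score = token_similarity(claim, prior)
print(f"token similarity: {score:.2f}")
```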

Phase IV — Report Generation

  • Template-based rendering for Markdown/PDF reports (no LLM calls)
  • Unified citation formatting, evidence snippets, and hierarchical structure
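Because Phase IV makes no LLM calls, the rendering step amounts to filling a fixed template; a minimal sketch with Python's built-in string.Template (the template text and variable names are illustrative assumptions):

```python
from string import Template

# Minimal Markdown report template (illustrative fields, not the
# pipeline's actual template).
TEMPLATE = Template(
    "# Novelty Report: $title\n\n"
    "## Core Task\n$core_task\n\n"
    "## Verdict\n$verdict\n"
)

report = TEMPLATE.substitute(
    title="Example Paper",
    core_task="few-shot molecular property prediction",
    verdict="cannot_refute",
)
print(report)
```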

Quick Start 🚀

Requirements

| Type | Requirement |
|------|-------------|
| OS | Linux (Ubuntu 20.04+) / macOS |
| Python | 3.8+ (recommended: 3.10+) |
| Memory | 8GB+ |
| Network | Access to OpenReview, Wispaper API, and an LLM API |

1️⃣ Install Dependencies

```bash
# Ubuntu/Debian system dependencies
sudo apt-get update && sudo apt-get install -y \
  git curl wget \
  libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 \
  libffi-dev libcairo2 libcairo2-dev libgirepository1.0-dev

# Python dependencies
cd /path/to/pnp_oss
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

Main dependencies: requests, openai, openreview-py, pypdf, weasyprint, python-dotenv, tqdm

2️⃣ Configuration

Create a .env file in the project root:

```bash
# ============ LLM API Configuration (Required) ============
export LLM_API_ENDPOINT="https://openrouter.ai/api/v1"           # Example
export LLM_API_KEY="sk-xxxxxxxx"
export LLM_MODEL_NAME="anthropic/claude-sonnet-4.5"              # Example

# ============ Wispaper API (Required for Phase 2) ============
# Token is saved to ~/.wispaper_tokens.json by default (see first-time setup below)

# ============ Phase 3 Configuration (Recommended) ============
export SKIP_TEXTUAL_SIMILARITY="true"           # Skip similarity detection (under development)

# ============ Optional Configuration ============
export HTTP_PROXY="http://127.0.0.1:7893"
export HTTPS_PROXY="http://127.0.0.1:7893"
```
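The pipeline's dependency list includes python-dotenv for loading this file; as a rough illustration of what such loading does, here is a stdlib-only sketch (the helper name load_env is hypothetical, and a real parser is needed for values containing `#`):

```python
import tempfile

def load_env(path: str) -> dict[str, str]:
    """Parse KEY=value / export KEY="value" lines: a minimal stand-in
    for python-dotenv, for illustration only."""
    settings = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line or "=" not in line:
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    return settings

# Demonstrate on a throwaway file mirroring the settings above.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write('export LLM_API_ENDPOINT="https://openrouter.ai/api/v1"  # Example\n')
    fh.write('export SKIP_TEXTUAL_SIMILARITY="true"\n')

settings = load_env(fh.name)
print(settings["SKIP_TEXTUAL_SIMILARITY"])
```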

🔐 Wispaper Authentication (Before Running Phase 2)

⚠️ Coming Soon: Wispaper API is not yet publicly available. The following configuration will be enabled once the API is opened.

<!--
**One-time setup, long-term effective**:

```bash
python scripts/refresh_wispaper_token.py
# → 🌐 Automatically opens a browser for login (register first: https://wispaper.ai)
# → 💾 Token saved to ~/.wispaper_tokens.json
# → 🔄 Auto-refresh enabled, no need to repeat
```

**Custom token path** (optional):

```bash
export WISPAPER_TOKEN_FILE="/your/custom/path/wispaper_tokens.json"
```

**Verify configuration**:

```bash
python -c "from paper_novelty_pipeline.services.wispaper_client import WispaperClient; WispaperClient()"
# Seeing "Loaded token bundle" indicates success
```
-->

3️⃣ Example Run (Single Paper)

Using https://openreview.net/pdf?id=ZgCCDwcGwn as an example:

```bash
# Phase 1 - Extraction (~1 min)
python scripts/run_phase1_batch.py \
  --papers "https://openreview.net/pdf?id=ZgCCDwcGwn" \
  --out-root output/demo \
  --force-year 2026 \
  2>&1 | tee logs/phase1.log

# Phase 2 - Retrieval (~10 min) ⚠️ Requires Wispaper API (coming soon)
bash scripts/run_phase2_concurrent.sh \
  openreview_ZgCCDwcGwn_20260118 \
  --base-dir output/demo \
  2>&1 | tee logs/phase2.log

# Phase 3 - Deep analysis (~10 min)
bash scripts/run_phase3_all.sh \
  output/demo/openreview_ZgCCDwcGwn_20260118 \
  2>&1 | tee logs/phase3.log

# Phase 4 - Report generation (~30 sec)
bash scripts/run_phase4.sh \
  output/demo/openreview_ZgCCDwcGwn_20260118 \
  2>&1 | tee logs/phase4.log

# View the result
cat output/demo/openreview_ZgCCDwcGwn_20260118/phase4/novelty_report.md
```

💡 Argument notes:

  • --papers: paper URL
  • --out-root: output directory
  • --force-year: force the publication year
  • --base-dir: search directory
  • 2>&1 | tee: save logs

4️⃣ Batch Processing

```bash
# Create a paper list
cat > papers.txt << EOF
https://openreview.net/pdf?id=PAPER_ID_1
https://openreview.net/pdf?id=PAPER_ID_2
https://openreview.net/pdf?id=PAPER_ID_3
EOF

# Phase 1: Batch extraction
python scripts/run_phase1_batch.py \
  --paper-file papers.txt \
  --out-root output/batch \
  --force-year 2026

# Phase 2: Batch retrieval (auto-discover all papers) ⚠️ Requires Wispaper API (coming soon)
# --auto-discover scans base-dir for all papers with Phase 1 completed
# --max-workers sets the concurrency (default 10; adjust based on machine capacity)
bash scripts/run_phase2_concurrent.sh \
  --base-dir output/batch \
  --auto-discover \
  --max-workers 10

# Phase 3+4: Batch analysis and report generation
bash scripts/run_phase3_phase4_serial_pending.sh output/batch
```

💡 Argument notes:

  • --paper-file: a list file (one URL per line)
  • --auto-discover: auto-scan all papers that need processing
  • --max-workers: number of parallel workers (Phase 2 makes concurrent API calls)

5️⃣ Common Commands

| Command | Purpose |
|---------|---------|
| python scripts/refresh_wispaper_token.py | Refresh Wispaper token ⚠️ (coming soon) |
| python scripts/run_phase1_batch.py --help | Show Phase 1 help |
| bash scripts/run_phase2_concurrent.sh --help | Show Phase 2 help ⚠️ (coming soon) |
| cat logs/phase2.log \| grep ERROR | Locate error logs |


Technical Reference

Script Entrypoints (scripts/)

| Script | Function | Use Case | Time |
|--------|----------|----------|:----:|
| run_phase1_batch.py | Information extraction | Single / batch | ~1 min / paper |
| run_phase2_concurrent.sh | Literature retrieval ⚠️ | Single / batch (coming soon) | ~10 min / paper |
| run_phase2_only.py | Retrieval (single paper) ⚠️ | Coming soon | ~10 min / paper |
| run_phase3_all.sh | Deep analysis (7 sub-steps) | Single / batch | ~10 min / paper |
| run_phase4.sh | Report generation | Single paper | ~30 sec / paper |
| run_phase3_phase4_serial_pending.sh | Batch completion | Auto-discover papers with Phase 2 done | - |
| refresh_wispaper_token.py | Token refresh ⚠️ | Coming soon | ~10 sec |

Directory Layout

```
output/<run>/<paper_id>/
├── phase1/                         # Phase 1 outputs
│   ├── phase1_extracted.json       # ⭐ Core task and contributions (with query variants)
│   ├── paper.json                  # Paper metadata
│   ├── …
```