
OpenNovelty (English)


English | 中文

OpenNovelty is an LLM-powered agentic pipeline for transparent, evidence-grounded, and verifiable scholarly novelty assessment.

  • Website (public reports): https://www.opennovelty.org
  • Technical report (arXiv): https://arxiv.org/abs/2601.01576

🌟 If you find OpenNovelty useful, please give us a star on GitHub and an upvote on Hugging Face — your support helps us a lot!

Repository status: The codebase will be incrementally refactored over the next 2 months to improve quality; the current version is a demo.


Why OpenNovelty

Novelty is a key criterion in peer review, but manual evaluation is often constrained by time, subjectivity, and retrieval coverage. By grounding analysis in retrieval and evidence alignment, OpenNovelty provides traceable novelty assessments and helps reduce subjective bias.


System Architecture

Pipeline Overview

Figure: The full OpenNovelty workflow — a four-phase pipeline from paper input to report output.

Four-Phase Overview

| Phase | Functionality | Key Input | Key Output | Time | Dependencies |
|:-----:|---------------|-----------|------------|:----:|--------------|
| I | Information Extraction | Paper PDF URL | phase1_extracted.json | ~1 min | LLM API |
| II | Literature Retrieval | Phase 1 outputs | citation_index.json | ~10 min | Wispaper API |
| III | Deep Analysis | Phase 2 outputs | phase3_complete_report.json | ~10 min | LLM API |
| IV | Report Generation | Phase 3 outputs | novelty_report.md/pdf | ~30 sec | weasyprint |

Details

Phase I — Information Extraction

  • Download the PDF and extract full text + metadata
  • Use an LLM to extract one core task and 1–3 contribution claims
  • Generate 3 retrieval query variants for each task/contribution
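The steps above might yield an extraction shaped roughly like the following; every field name and value here is an illustrative assumption, not the pipeline's actual phase1_extracted.json schema:

```python
import json

# Hypothetical shape of a Phase 1 result: one core task, 1-3 contribution
# claims, and 3 query variants per item (field names are illustrative).
extracted = {
    "core_task": "few-shot molecular property prediction",
    "contributions": [
        "a contrastive pretraining objective for molecular graphs",
    ],
    "query_variants": {
        "core_task": [
            "few-shot molecular property prediction",
            "low-resource molecule classification",
            "meta-learning for molecular properties",
        ],
    },
}

print(json.dumps(extracted, indent=2))
```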

Phase II — Literature Retrieval

  • Semantic retrieval of related papers (via WisPaper API [paper])
  • Quality filtering (perfect-match, time filtering) and deduplication (canonical_id + title normalization)
  • Build a citation index (Core Task Top-50, Contribution Top-10)
  • ⚠️ Note: Phase 2 depends on Wispaper API which is not yet publicly available. The API will be opened soon — please stay tuned for updates!
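One way the canonical_id + title-normalization deduplication could work, sketched in Python (the function names and paper dicts are illustrative, not the pipeline's actual code):

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first paper seen per canonical_id, falling back to the
    normalized title when no canonical_id is available."""
    seen, unique = set(), []
    for paper in papers:
        key = paper.get("canonical_id") or normalize_title(paper["title"])
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique

papers = [
    {"canonical_id": "arxiv:1706.03762", "title": "Attention Is All You Need"},
    {"canonical_id": None, "title": "Attention Is All You Need"},
    {"canonical_id": None, "title": "Attention is all you need!"},
]
print(len(deduplicate(papers)))
```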

Phase III — Deep Analysis

  • Build a related-work taxonomy and synthesize a survey
  • Textual similarity detection (token-level fuzzy matching; ⚠️ experimental, recommended to skip)
  • Full-text comparative verification and novelty judgments (can_refute / cannot_refute / unclear)
  • 💡 Recommended setting: set SKIP_TEXTUAL_SIMILARITY=true in .env to skip similarity detection (this experimental module is being updated)
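Token-level fuzzy matching of the kind the experimental similarity module describes can be sketched with the standard-library difflib; this is a stand-in illustration, not the module's actual implementation:

```python
from difflib import SequenceMatcher

def token_similarity(a: str, b: str) -> float:
    """Similarity ratio over token sequences rather than raw characters."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

claim = "we propose a retrieval-grounded novelty assessment pipeline"
prior = "a retrieval-grounded pipeline for assessing novelty"
score = token_similarity(claim, prior)
print(f"token similarity: {score:.2f}")
```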

Phase IV — Report Generation

  • Template-based rendering for Markdown/PDF reports (no LLM calls)
  • Unified citation formatting, evidence snippets, and hierarchical structure
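Because Phase IV makes no LLM calls, the rendering step amounts to filling a fixed template; a minimal sketch with Python's built-in string.Template (the template text and variable names are illustrative assumptions):

```python
from string import Template

# Minimal Markdown report template (illustrative fields, not the
# pipeline's actual template).
TEMPLATE = Template(
    "# Novelty Report: $title\n\n"
    "## Core Task\n$core_task\n\n"
    "## Verdict\n$verdict\n"
)

report = TEMPLATE.substitute(
    title="Example Paper",
    core_task="few-shot molecular property prediction",
    verdict="cannot_refute",
)
print(report)
```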

Quick Start 🚀

Requirements

| Type | Requirement |
|------|-------------|
| OS | Linux (Ubuntu 20.04+) / macOS |
| Python | 3.8+ (recommended: 3.10+) |
| Memory | 8GB+ |
| Network | Access to OpenReview, Wispaper API, and an LLM API |

1️⃣ Install Dependencies

```bash
# Ubuntu/Debian system dependencies
sudo apt-get update && sudo apt-get install -y \
  git curl wget \
  libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 \
  libffi-dev libcairo2 libcairo2-dev libgirepository1.0-dev

# Python dependencies
cd /path/to/pnp_oss
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

Main dependencies: requests, openai, openreview-py, pypdf, weasyprint, python-dotenv, tqdm

2️⃣ Configuration

Create a .env file in the project root:

```bash
# ============ LLM API Configuration (Required) ============
export LLM_API_ENDPOINT="https://openrouter.ai/api/v1"           # Example
export LLM_API_KEY="sk-xxxxxxxx"
export LLM_MODEL_NAME="anthropic/claude-sonnet-4.5"              # Example

# ============ Wispaper API (Required for Phase 2) ============
# Token is saved to ~/.wispaper_tokens.json by default (see first-time setup below)

# ============ Phase 3 Configuration (Recommended) ============
export SKIP_TEXTUAL_SIMILARITY="true"           # Skip similarity detection (under development)

# ============ Optional Configuration ============
export HTTP_PROXY="http://127.0.0.1:7893"
export HTTPS_PROXY="http://127.0.0.1:7893"
```
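The pipeline's dependency list includes python-dotenv for loading this file; as a rough illustration of what such loading does, here is a stdlib-only sketch (the helper name load_env is hypothetical, and a real parser is needed for values containing `#`):

```python
import tempfile

def load_env(path: str) -> dict[str, str]:
    """Parse KEY=value / export KEY="value" lines: a minimal stand-in
    for python-dotenv, for illustration only."""
    settings = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line or "=" not in line:
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    return settings

# Demonstrate on a throwaway file mirroring the settings above.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write('export LLM_API_ENDPOINT="https://openrouter.ai/api/v1"  # Example\n')
    fh.write('export SKIP_TEXTUAL_SIMILARITY="true"\n')

settings = load_env(fh.name)
print(settings["SKIP_TEXTUAL_SIMILARITY"])
```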

🔐 Wispaper Authentication (Before Running Phase 2)

⚠️ Coming Soon: Wispaper API is not yet publicly available. The following configuration will be enabled once the API is opened.

<!--
**One-time setup, long-term effective**:

```bash
python scripts/refresh_wispaper_token.py
# → 🌐 Automatically opens a browser for login (register first: https://wispaper.ai)
# → 💾 Token saved to ~/.wispaper_tokens.json
# → 🔄 Auto-refresh enabled, no need to repeat
```

**Custom token path** (optional):

```bash
export WISPAPER_TOKEN_FILE="/your/custom/path/wispaper_tokens.json"
```

**Verify configuration**:

```bash
python -c "from paper_novelty_pipeline.services.wispaper_client import WispaperClient; WispaperClient()"
# Seeing "Loaded token bundle" indicates success
```
-->

3️⃣ Example Run (Single Paper)

Using https://openreview.net/pdf?id=ZgCCDwcGwn as an example:

```bash
# Phase 1 - Extraction (~1 min)
python scripts/run_phase1_batch.py \
  --papers "https://openreview.net/pdf?id=ZgCCDwcGwn" \
  --out-root output/demo \
  --force-year 2026 \
  2>&1 | tee logs/phase1.log

# Phase 2 - Retrieval (~10 min) ⚠️ Requires Wispaper API (coming soon)
bash scripts/run_phase2_concurrent.sh \
  openreview_ZgCCDwcGwn_20260118 \
  --base-dir output/demo \
  2>&1 | tee logs/phase2.log

# Phase 3 - Deep analysis (~10 min)
bash scripts/run_phase3_all.sh \
  output/demo/openreview_ZgCCDwcGwn_20260118 \
  2>&1 | tee logs/phase3.log

# Phase 4 - Report generation (~30 sec)
bash scripts/run_phase4.sh \
  output/demo/openreview_ZgCCDwcGwn_20260118 \
  2>&1 | tee logs/phase4.log

# View the result
cat output/demo/openreview_ZgCCDwcGwn_20260118/phase4/novelty_report.md
```

💡 Argument notes:

  • --papers: paper URL
  • --out-root: output directory
  • --force-year: force the publication year
  • --base-dir: search directory
  • 2>&1 | tee: save logs

4️⃣ Batch Processing

```bash
# Create a paper list
cat > papers.txt << EOF
https://openreview.net/pdf?id=PAPER_ID_1
https://openreview.net/pdf?id=PAPER_ID_2
https://openreview.net/pdf?id=PAPER_ID_3
EOF

# Phase 1: Batch extraction
python scripts/run_phase1_batch.py \
  --paper-file papers.txt \
  --out-root output/batch \
  --force-year 2026

# Phase 2: Batch retrieval (auto-discover all papers) ⚠️ Requires Wispaper API (coming soon)
# --auto-discover scans base-dir for all papers with Phase 1 completed
# --max-workers sets the concurrency (default 10; adjust based on machine capacity)
bash scripts/run_phase2_concurrent.sh \
  --base-dir output/batch \
  --auto-discover \
  --max-workers 10

# Phase 3+4: Batch analysis and report generation
bash scripts/run_phase3_phase4_serial_pending.sh output/batch
```

💡 Argument notes:

  • --paper-file: a list file (one URL per line)
  • --auto-discover: auto-scan all papers that need processing
  • --max-workers: number of parallel workers (Phase 2 makes concurrent API calls)

5️⃣ Common Commands

| Command | Purpose |
|---------|---------|
| python scripts/refresh_wispaper_token.py | Refresh Wispaper token ⚠️ (coming soon) |
| python scripts/run_phase1_batch.py --help | Show Phase 1 help |
| bash scripts/run_phase2_concurrent.sh --help | Show Phase 2 help ⚠️ (coming soon) |
| cat logs/phase2.log \| grep ERROR | Locate error logs |


Technical Reference

Script Entrypoints (scripts/)

| Script | Function | Use Case | Time |
|--------|----------|----------|:----:|
| run_phase1_batch.py | Information extraction | Single / batch | ~1 min / paper |
| run_phase2_concurrent.sh | Literature retrieval ⚠️ | Single / batch (coming soon) | ~10 min / paper |
| run_phase2_only.py | Retrieval (single paper) ⚠️ | Coming soon | ~10 min / paper |
| run_phase3_all.sh | Deep analysis (7 sub-steps) | Single / batch | ~10 min / paper |
| run_phase4.sh | Report generation | Single paper | ~30 sec / paper |
| run_phase3_phase4_serial_pending.sh | Batch completion | Auto-discover papers with Phase 2 done | - |
| refresh_wispaper_token.py | Token refresh ⚠️ | Coming soon | ~10 sec |

Directory Layout

```
output/<run>/<paper_id>/
├── phase1/                         # Phase 1 outputs
│   ├── phase1_extracted.json       # ⭐ Core task and contributions (with query variants)
│   ├── paper.json                  # Paper metadata
│   ├── …
```