SkillAgentSearch skills...

DramaBench

A six-dimensional evaluation framework for drama script continuation with interactive leaderboard and case studies

Install / Use

/learn @IIIIQIIII/DramaBench
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

DramaBench

<div align="center">

DramaBench Cover

A Six-Dimensional Evaluation Framework for Drama Script Continuation

Status License Paper

🌐 Website✨ Interactive Demo📊 Leaderboard🤗 Dataset

</div>

📋 Table of Contents


<a id="overview"></a>

🎯 Overview

DramaBench is a comprehensive benchmark for evaluating drama script continuation capabilities of large language models. It provides:

Core Components

  • 🌐 Project Website - Interactive showcase with evaluation results and case studies
  • Interactive Demo - Try script continuation with multiple LLM models (user-provided API key)
  • 💾 Large-Scale Dataset - 1,103 drama scripts with human annotations
  • 📊 Evaluation Framework - 6 independent dimensions with rigorous metrics
  • 🏆 Model Leaderboard - Compare 8 SOTA language models
  • 📝 Case Studies - 24 curated examples with detailed analysis
  • 🔧 Evaluation Prompts - LLM-based labeling templates for all 6 dimensions

Six Evaluation Dimensions

  1. Format Standards (Rule-based) - Screenplay format compliance
  2. Narrative Efficiency (LLM-labeled) - Story progression effectiveness
  3. Character Consistency (LLM-labeled) - Character voice and behavior
  4. Emotional Depth (LLM-labeled) - Emotional arc development
  5. Logic Consistency (LLM-labeled) - Factual coherence and continuity
  6. Conflict Handling (LLM-labeled) - Conflict development quality

Key Statistics

  • 1,103 unique drama scripts
  • 8,824 total evaluations (1,103 scripts × 8 models)
  • 8 state-of-the-art language models
  • 6 independent evaluation dimensions
  • 252 statistical significance tests (65.9% significant)
  • 24 curated case studies

<a id="quick-start"></a>

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Web browser (Chrome, Safari, Firefox, or Edge)

Launch Web Demo

Method 1: One-Click Start (Easiest)

cd DramaBench
./start_demo.sh

This will automatically:

  • ✅ Start a local HTTP server on port 8000
  • ✅ Open the demo in your default browser
  • ✅ Navigate to http://localhost:8000

Method 2: Manual Server Start

cd DramaBench

# Using uv (if available)
uv run python -m http.server 8000

# Or using Python 3 directly
python3 -m http.server 8000

# Then open http://localhost:8000 in your browser

⚠️ Important Note

Due to browser CORS restrictions, you must use a local HTTP server to view the demo. Opening HTML files directly (file:// protocol) will cause data loading errors.


<a id="project-components"></a>

🧩 Project Components

1. Project Website & Interactive Demo

An interactive, Apple-inspired web interface for exploring evaluation results and trying script continuation.

Website Features:

  • 📊 Interactive leaderboard with dimension filters
  • 📝 Case studies explorer with 24 examples
  • 🎨 Premium dark gradient design
  • 📱 Fully responsive (mobile/tablet/desktop)
  • ⚡ Pure HTML/CSS/JavaScript (no frameworks)

Interactive Demo Features:

  • ✨ Try script continuation with 4 SOTA models (GPT-5.2, Gemini 3, GLM-4.7, MiniMax M2.1)
  • 🔑 User-provided OpenRouter API key (stored locally)
  • 📜 500 drama scripts from DramaBench dataset
  • 🎭 Official prompt template for generation
  • 📊 Compare AI-generated vs ground truth continuations
  • 🎨 Matching Apple-style design

Pages:

  • index.html - Main landing page
  • web/leaderboard.html - Model rankings
  • web/cases.html - Case studies browser
  • web/demo.html - Interactive script continuation demo

→ View Live Website | → Try Interactive Demo

2. Dataset

🎉 Now Available on Hugging Face!

The DramaBench dataset is being released progressively to ensure quality and gather community feedback.

Current Release (v2.0):

  • 500 Drama Scripts - Available now on Hugging Face
  • 📥 Download: FutureMa/DramaBench
  • 📄 Format: JSONL with structured metadata
  • 🔓 License: MIT License
  • 📊 Usage: Load with datasets library

Quick Start:

from datasets import load_dataset

# Load dataset
dataset = load_dataset("FutureMa/DramaBench", split="train")

# Access samples
sample = dataset[0]
print(sample['title'])
print(sample['context'])
print(sample['continuation'])

Release Roadmap: | Version | Samples | Status | Expected Release | |---------|---------|--------|------------------| | v1.0 | 100 | ✅ Released | 2025-12-23 | | v2.0 | 500 | ✅ Available | 2026-01-01 | | v3.0 (Full) | 1,103 | 📋 Planned | Q2 2026 |

Full Dataset Contents (v3.0):

  • 1,103 drama script contexts and continuations
  • Model-generated continuations (8 SOTA models)
  • Human annotations and quality assessments
  • Multi-dimensional evaluation metrics
  • Error taxonomy and classification

3. Evaluation Prompts

✅ Now Available: LLM-based evaluation prompt templates for all 6 dimensions.

Location: prompts/ directory

Contents:

  • narrative_efficiency_prompt.txt - Story progression effectiveness
  • character_consistency_prompt.txt - Character voice and behavior consistency
  • emotional_depth_prompt.txt - Emotional arc development
  • logic_consistency_prompt.txt - Factual coherence and continuity
  • conflict_handling_prompt.txt - Conflict development and resolution
  • dialogue_quality_prompt.txt - Dialogue naturalness and purpose

Quick Start:

# Load a prompt template
with open('prompts/narrative_efficiency_prompt.txt', 'r') as f:
    prompt = f.read()

# Fill placeholders
prompt = prompt.replace('{CONTEXT}', script_context)
prompt = prompt.replace('{CONTINUATION}', generated_continuation)
prompt = prompt.replace('{MODEL}', 'GPT-4')
prompt = prompt.replace('{SCRIPT_ID}', 'script_001')

# Send to LLM and get structured JSON output
response = llm_api_call(prompt)
evaluation = json.loads(response)

See prompts/README.md for detailed usage instructions.

Coming Soon: Full evaluation pipeline including:

  • Statistical analysis scripts
  • Visualization generation tools
  • Reproducibility automation scripts

<a id="web-demo"></a>

🌐 Website & Interactive Demo

Live Website

Visit dramabench.pages.dev to explore:

  • Homepage - Project overview and statistics
  • Leaderboard - Compare 8 SOTA models across 6 dimensions
  • Case Studies - Browse 24 curated examples with detailed analysis
  • Interactive Demo - Try script continuation yourself

Interactive Demo

Try it now: dramabench.pages.dev/web/demo.html

Experience drama script continuation with state-of-the-art language models:

Features:

  • 🎭 500 Drama Scripts - Select from DramaBench v2.0 dataset
  • 🤖 4 SOTA Models - GPT-5.2, Gemini 3 Flash, GLM-4.7, MiniMax M2.1
  • 🔑 Your API Key - Uses OpenRouter API (bring your own key)
  • 📊 Compare Results - View AI-generated vs ground truth side-by-side
  • 🎨 Apple Design - Beautiful, responsive interface

How to Use:

  1. Get your free API key from OpenRouter
  2. Visit the demo page
  3. Enter your API key (stored locally in your browser)
  4. Select a script from 500 options
  5. Choose your preferred model
  6. Generate and compare continuations

Cost: Pay-as-you-go through OpenRouter (typically $0.01-0.10 per generation)

Website Features

Interactive Leaderboard

  • Filter by dimension (overall + 6 dimensions)
  • Expandable model details with per-dimension scores
  • Rank badges (gold/silver/bronze)
  • Real-time filtering and sorting

Case Studies Explorer

  • 24 curated success/failure examples
  • Filter by dimension and type
  • Script excerpts with metrics
  • Analysis insights and takeaways

Design

  • Apple-inspired UI with premium dark gradients
  • SF Pro font family (system fonts)
  • Glassmorphism effects
  • Smooth animations and transitions
  • Fully responsive layout

Technologies

  • Pure HTML/CSS/JavaScript (no frameworks)
  • Apple Design Language principles
  • CSS Grid & Flexbox layouts
  • Backdrop filters for glassmorphism
  • CSS animations for smooth transitions

Local Development

Regenerate web demo data from source:

cd DramaBench
uv run python web/scripts/process_data.py

This processes:

  • 6 dimension metrics CSV files (8,824 evaluations)
  • 24 case studies with detailed analysis
  • Generates web-friendly JSON in web/data/

<a id="dataset"></a>

💾 Dataset

Dataset Access

🤗 Hugging Face Dataset: FutureMa/DramaBench

Current Release: v2.0 (500 samples) - Available Now!

Quick Start

Load with Datasets Library:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("FutureMa/DramaBench", split="train")

# Access a sample
sample = dataset[0]
print(f"Title: {sample['title']}")
print(f"Context: {sample['context'][:200]}...")
print(f"Continuation: {sample['continuation'][:200]}...")
print(f"Stats: {sample['stats']}")

Analyze Dataset:

View on GitHub
GitHub Stars84
CategoryDevelopment
Updated2d ago
Forks5

Languages

HTML

Security Score

100/100

Audited on Apr 4, 2026

No findings