
LiteCoST

🔥 [ICLR'26] Official repository for the paper "Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs"

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

A two-stage RL-enhanced framework that equips SLMs for high-accuracy long-document QA.

<div align="center">

OpenReview arXiv Model Python

</div>

🎉 News

  • [2026-01-26] Our LiteCoST paper is accepted at ICLR'26.

📋 Overview

<div align="center"> <img src="assets/framework.png" alt="Overview Figure" width="800"/> </div>

Pillar 1: Chain-of-Structured-Thought (CoST) uses a high-capability LLM purely as a trace generator: it proposes a minimal structure, executes a step-wise, structure-guided trace over the documents, serializes the result, and verifies/refines it (optionally with an LLM-as-judge).

<div align="center"> <img src="assets/grpo.png" alt="GRPO Figure" width="800"/> </div>

Pillar 2: SLM fine-tuning (SFT → GRPO) trains an SLM with CoST supervision in two phases: Supervised Fine-Tuning to learn structural patterns, formatting rules, and reasoning steps, followed by Group Relative Policy Optimization with dual signals that reward both answer/format quality and step/process consistency, transferring structure-first behavior to an efficient SLM for low-latency deployment.
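The dual-signal reward described above can be sketched as follows. This is a minimal illustration, not the paper's exact reward: the individual scorers (exact match, an `<answer>` tag format check, a step-overlap proxy) and the weights `w_outcome`/`w_process` are assumptions for exposition.

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the response follows the expected serialized format
    (here, hypothetically, an <answer>...</answer> tag); else 0.0."""
    return 1.0 if re.search(r"<answer>.*</answer>", response, re.S) else 0.0

def answer_reward(response: str, gold: str) -> float:
    """Exact-match answer reward (a simple stand-in for the paper's metric)."""
    m = re.search(r"<answer>(.*?)</answer>", response, re.S)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def step_consistency_reward(response: str, reference_steps: list) -> float:
    """Fraction of reference reasoning steps that appear in the response,
    a crude proxy for step/process consistency."""
    if not reference_steps:
        return 0.0
    hits = sum(1 for step in reference_steps if step in response)
    return hits / len(reference_steps)

def dual_reward(response, gold, reference_steps,
                w_outcome=0.7, w_process=0.3):
    """Combine outcome (answer + format) and process signals; the 0.7/0.3
    split is an illustrative choice, not the paper's setting."""
    outcome = 0.5 * answer_reward(response, gold) + 0.5 * format_reward(response)
    process = step_consistency_reward(response, reference_steps)
    return w_outcome * outcome + w_process * process
```

A response that answers correctly, respects the format, and covers the reference steps scores 1.0; one that fails all three scores 0.0, giving GRPO a graded signal in between.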

πŸ—οΈ Method & Architecture

CoST: Structure-First Reasoning and Trace Generation

  1. 🔍 Structure Analysis
  2. 🧠 Trace Generation
  3. ✅ Data Verification
  4. 🔄 Data Refinement
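The four steps above can be sketched as a single trace-generation loop. The `llm` callable and the prompt strings below are hypothetical stand-ins for the repository's actual modules (`structure_analysis/`, `reasoner.py`, `data_verification.py`, `data_refinement.py`), intended only to show how the stages chain together.

```python
def generate_cost_trace(llm, question: str, documents: list,
                        max_refinements: int = 2) -> dict:
    """Hypothetical CoST loop: decide a minimal structure, execute a
    structure-guided trace over the documents, then verify/refine it."""
    # 1. Structure analysis: pick a minimal structure (table/graph/description).
    structure = llm(f"Choose a structure (table/graph/desc) for: {question}")
    # 2. Trace generation: step-wise, structure-guided reasoning over the docs.
    trace = llm(f"Using a {structure}, reason step by step over "
                f"{len(documents)} documents to answer: {question}")
    # 3-4. Verification and refinement: repair the trace until it passes
    # the (LLM-as-judge) check or the refinement budget is exhausted.
    for _ in range(max_refinements):
        verdict = llm(f"Verify this trace: {trace}")
        if verdict.startswith("PASS"):
            break
        trace = llm(f"Refine this trace given the critique '{verdict}': {trace}")
    return {"structure": structure, "trace": trace}
```

In the real pipeline the serialized structured output of this loop becomes the supervision data for the SFT and GRPO stages below.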

SLM Fine-Tuning: SFT → GRPO

  1. 🎯 Supervised Fine-Tuning (SFT)
  2. ⚡ Group Relative Policy Optimization (GRPO)

The core execution of LiteCoST is implemented in the src directory (see the GRPO implementation under verl/):

src
├── convert_func.py              # Conversion function module
├── data_refinement.py           # Data refinement module
├── data_verification.py         # Data verification module
├── extract/                     # Extraction module
│   ├── graph.py                 # Graph class
│   ├── main.py                  # Main program
│   ├── table.py                 # Table class
│   ├── to_desc.py               # Convert to description
│   ├── to_graph.py              # Convert to graph
│   └── to_table.py              # Convert to table
├── sft.py                       # SFT module
├── prompt.py                    # Prompt template module
├── reasoner.py                  # Reasoning module
├── reward.py                    # Reward module
├── structure_analysis/          # Structure analysis module
│   ├── query2schema.py          # Schema construction
│   └── structure_decision.py    # Structure decision
├── cal_latenct.py               # Calculate latency
└── utils.py                     # Utility functions module

πŸ› οΈ Usage

  1. Generate the Serialized Structured Output

python main.py --model gpt-4o --dataset Loong --structured --document

cd src
python data_verification.py
python data_refinement.py

  2. Conduct SFT Training

python -m src.convert_func # data format conversion
python -m src.sft

  3. Conduct GRPO Optimization

cd verl
bash scripts/run_grpo_cost.sh

# merge the trained model
python scripts/model_merger.py merge --backend fsdp --local_dir checkpoints/cost-sft/cost-sft-llama3.2-3b-ins/global_step_1566/actor --target_dir merged/cost-grpo/llama3.2-3b-ins

Usage Examples

1. Quick Deployment
cd Loong/src
bash vllm_example.sh

2. Run the pipeline
python main.py --model deployed_model --dataset Loong --structured --document

🎯 Performance

<div align="center">
  <p><b>Efficacy of Chain-of-Structured-Thought (CoST).</b></p>
  <div style="display: flex; justify-content: center; align-items: flex-start; gap: 16px;">
    <figure style="margin: 0; flex: 1; text-align: center;">
      <img src="assets/CoST_finance.png" alt="Finance" style="width: 100%; height: auto; max-height: 320px; object-fit: contain;">
      <figcaption><b>Finance</b></figcaption>
    </figure>
    <!-- <figure style="margin: 0; flex: 1; text-align: center;">
      <img src="assets/CoST_general.png" alt="General Knowledge" style="width: 100%; height: auto; max-height: 320px; object-fit: contain;">
      <figcaption><b>General Knowledge</b></figcaption>
    </figure> -->
  </div>
  <p><b>Effectiveness: How good is LiteCoST for SSO Generation?</b></p>
  <div style="display: flex; justify-content: center; gap: 16px; margin-top: 16px;">
    <figure style="margin: 0; flex: 1;">
      <img src="assets/litecost_finance.png" alt="Finance" style="width: 100%; height: auto;">
      <figcaption><b>Finance</b></figcaption>
    </figure>
    <figure style="margin: 0; flex: 1;">
      <img src="assets/litecost_legal.png" alt="Legal" style="width: 100%; height: auto;">
      <figcaption><b>Legal</b></figcaption>
    </figure>
    <!-- <figure style="margin: 0; flex: 1;">
      <img src="assets/litecost_general.png" alt="General Knowledge" style="width: 100%; height: auto;">
      <figcaption><b>General Knowledge</b></figcaption>
    </figure> -->
  </div>
</div>

Acknowledgement

We implement our reinforcement learning algorithm by extending the veRL framework. For efficient inference, we leverage vLLM, and we develop evaluation scripts based on the Loong datasets. We sincerely thank these communities for their valuable contributions!
