# OneKE
[WWW 2025] A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.
## Table of Contents
- Table of Contents
- 🔔News
- 🌟Overview
- 🚀Quick Start
- 📟Web UI Navigation
- 🔍Further Usage
- 🛠️Network Issue Solutions
- 🎉Contributors
- 🌻Acknowledgement
- 🚩Citation
## 🔔News
- [2025/07] We update the OneKE frontend demo to support visualization of constructed knowledge graphs.
- [2025/03] We add the Triple Extraction Task for automated knowledge graph construction.
- [2025/02] We support the local deployment of the DeepSeek-R1 series in addition to the existing API service, as well as vllm acceleration for other LLMs.
- [2025/01] OneKE is accepted by WWW 2025 Demonstration Track 🎉🎉🎉.
- [2024/12] We open source the OneKE framework, supporting multi-agent knowledge extraction across various scenarios.
- [2024/04] We release a new bilingual (Chinese and English) schema-based information extraction model called OneKE based on Chinese-Alpaca-2-13B.
## 🌟Overview
OneKE is a flexible dockerized system for schema-guided knowledge extraction, capable of extracting information from the web and raw PDF books across multiple domains like science and news. It employs a collaborative multi-agent approach and includes a user-customizable knowledge base to enable tailored extraction. Embark on your information extraction journey with OneKE!
<img src="./figs/main.png" alt="method" style="zoom: 50%;" />

OneKE currently offers the following features:
- [x] Various IE Tasks Support
- [x] Various Data Sources Support
- [x] Various LLMs Support
- [x] Various Extraction Method Support
- [x] User-Configurable Knowledge Base
## 🚀Quick Start
We have developed a web demo for OneKE with Gradio; click here to try information extraction in an intuitive way.
Note: The demo only displays OneKE's basic capabilities for efficiency. Consider the local deployment steps below for further features.
### Step1: Environment Setup
OneKE supports both manual and Docker-based environment configuration; choose your preferred method to build the environment.
#### 🔩Manual Environment Configuration
Conda virtual environments offer a light and flexible setup.
**Prerequisites**
- Anaconda Installation
- GPU support (recommended CUDA version: 12.4)
**Configure Steps**
- Clone the repository:

  ```bash
  git clone https://github.com/zjunlp/OneKE.git
  ```

- Enter the working directory; all subsequent commands should be executed in this directory:

  ```bash
  cd OneKE
  ```

- Create a virtual environment using Anaconda:

  ```bash
  conda create -n oneke python=3.9
  conda activate oneke
  ```

- Install all required Python packages:

  ```bash
  pip install -r requirements.txt
  # If you encounter network issues, consider setting up a domestic mirror for pip.
  ```
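The domestic-mirror hint above can be made concrete. As one common choice (the Tsinghua mirror is shown purely as an example), pip can be pointed at a nearby index before installing:

```shell
# Configure a domestic PyPI mirror for this environment (example: Tsinghua's mirror)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install -r requirements.txt
```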
#### 🐳Building With Docker Image
A Docker image provides greater reliability and stability.
**Prerequisites**
- Docker Installation
- NVIDIA Container Toolkit
- GPU support (recommended CUDA version: 12.4)
**Configure Steps**
- Clone the repository:

  ```bash
  git clone https://github.com/zjunlp/OneKE.git
  ```

- Pull the docker image from the mirror repository:

  ```bash
  docker pull zjunlp/oneke:v4
  # If you encounter network issues, consider setting up domestic registry mirrors for docker.
  ```

- Launch a container from the image:

  ```bash
  docker run --gpus all \
    -v ./OneKE:/app/OneKE \
    -it zjunlp/oneke:v4 /bin/bash
  ```

  If using locally deployed models, ensure the local model path is mapped into the container:

  ```bash
  docker run --gpus all \
    -v ./OneKE:/app/OneKE \
    -v your_local_model_path:/app/model/your_model_name \
    -it zjunlp/oneke:v4 /bin/bash
  ```

Map any necessary local files to container paths as shown above, and use the container paths in your code and execution.
Upon starting, the container enters the /app/OneKE directory as its working directory. Simply modify the code locally as needed, and the changes will sync to the container through the volume mapping.
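Before launching a full session, it may help to confirm that Docker can actually see the GPU; a quick sanity check (assuming the NVIDIA Container Toolkit is configured and the image provides `nvidia-smi`) is:

```shell
# Run a throwaway container and print the GPUs it can see
docker run --rm --gpus all zjunlp/oneke:v4 nvidia-smi
```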
### Step2: Start with Examples
We offer two quick-start options. Choose your preferred method to swiftly explore OneKE with predefined examples.
Note:
- Ensure that your working directory is set to the `OneKE` folder, whether in a virtual environment or a docker container.
- Refer to here to resolve network issues. If you have more questions, feel free to open an issue with us.
#### 🖊️Start with CLI
**Step1: Prepare the configuration file**
Several YAML configuration files are available in the `examples/config` directory. These extraction scenarios cover different extraction data, methods, and models, allowing you to easily explore all the features of OneKE.
**Web News Extraction:**
Here is an example of the web news knowledge extraction scenario, with the source text in HTML format:
```yaml
# model configuration
model:
  category: DeepSeek # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
  model_name_or_path: deepseek-chat # model name, chosen from deepseek-chat and deepseek-reasoner. Choose deepseek-chat to use DeepSeek-V3 or deepseek-reasoner to use DeepSeek-R1.
  api_key: your_api_key # your API key for the model with API service. No need for open-source models.
  base_url: https://api.deepseek.com # base URL for the API service. No need for open-source models.
# extraction configuration
extraction:
  task: Base # task type, chosen from Base, NER, RE, EE.
  instruction: Extract key information from the given text. # description of the task. No need for NER, RE, EE tasks.
  use_file: true # whether to use a file for the input text. Default set to false.
  file_path: ./data/input_files/Tulsi_Gabbard_News.html # path to the input file. No need if use_file is set to false.
  output_schema: NewsReport # output schema for the extraction task. Selected from the schema repository.
  mode: customized # extraction mode, chosen from quick, detailed, customized. Default set to quick. See src/config.yaml for more details.
  update_case: false # whether to update the case repository. Default set to false.
  show_trajectory: false # whether to display the intermediate extraction steps.
```
The `model` section contains information about the extraction model, while the `extraction` section configures the settings for the extraction process.
You can choose an existing configuration file or customize the extraction settings as you wish. Note that when using an API service such as ChatGPT or DeepSeek, you must set your API key.
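Before launching a long extraction run, it can be handy to sanity-check a configuration. The helper below is a hypothetical sketch (not part of OneKE) that checks the `model` and `extraction` sections described above, assuming the YAML has already been loaded into a dict (e.g. with PyYAML):

```python
# Hypothetical helper (not part of OneKE): sanity-check a parsed config dict.
REQUIRED_MODEL_KEYS = {"category", "model_name_or_path"}
REQUIRED_EXTRACTION_KEYS = {"task"}

def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []
    for section, required in (("model", REQUIRED_MODEL_KEYS),
                              ("extraction", REQUIRED_EXTRACTION_KEYS)):
        body = cfg.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing '{section}' section")
            continue
        for key in sorted(required - body.keys()):
            problems.append(f"'{section}' is missing '{key}'")
    # API-backed categories (e.g. ChatGPT, DeepSeek) need an api_key.
    model = cfg.get("model", {})
    if model.get("category") in {"ChatGPT", "DeepSeek"} and not model.get("api_key"):
        problems.append("API models require 'api_key'")
    return problems
```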
**Step2: Run the shell script**
Specify the configuration file path and run the code to start the extraction process.
```bash
config_file=your_yaml_file_path # configuration file path, use the container path if inside a container
python src/run.py --config $config_file # start extraction, executed in the OneKE directory
```
If you want to deploy local models using vLLM, run the following commands:

```bash
config_file=your_yaml_file_path # REMEMBER to set vllm_serve to TRUE!
python src/models/vllm_serve.py --config $config_file # deploy the local model via vllm, executed in the OneKE directory
python src/run.py --config $config_file # start extraction, executed in the OneKE directory
```
Refer to here to get an overview of the knowledge extraction results.
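The exact shape of the results depends on the task and the chosen schema. As a purely illustrative post-processing sketch (the JSON field names below are hypothetical, not OneKE's actual output format), an NER-style result could be grouped by entity type like this:

```python
import json

# Hypothetical example output; the field names are illustrative only.
raw = '''{"entities": [
    {"text": "Tulsi Gabbard", "type": "Person"},
    {"text": "Hawaii", "type": "Location"}
]}'''

def entities_by_type(result_json: str) -> dict:
    """Group extracted entity mentions by their predicted type."""
    grouped = {}
    for ent in json.loads(result_json).get("entities", []):
        grouped.setdefault(ent["type"], []).append(ent["text"])
    return grouped
```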
Note: You can also try OneKE by directly running the `example.py` file located in the `example` directory. In this way, you can explore more advanced uses flexibly.
#### 🖊️Start with Web UI
Note: Before starting with the web UI, make sure the package `gradio` (version 4.44.0) is already installed in your environment.
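To check that requirement programmatically, a small helper (not part of OneKE) can compare the installed gradio version against the pinned one; this assumes newer versions would also work, so pin exactly if they do not:

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '4.44.0' into (4, 44, 0)."""
    return tuple(int(part) for part in v.split("."))

def gradio_is_ready(required: str = "4.44.0") -> bool:
    """True if gradio is installed at or above the required version."""
    try:
        installed = version("gradio")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(required)
```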
**Step1: Execute Command**
Execute the following commands in the OneKE directory:
```bash
cd frontend/
chmod u+x start.sh
bash ./start.sh
```
**Step2: Open your Web Browser**
The front-end is built with Streamlit, and the default port is 8501. Therefore, please enter http://localhost:8501 in your browser's address bar to access the Web UI.