# OneKE
[WWW 2025] A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.
## Table of Contents
- Table of Contents
- 🔔News
- 🌟Overview
- 🚀Quick Start
- 📟Web UI Navigation
- 🔍Further Usage
- 🛠️Network Issue Solutions
- 🎉Contributors
- 🌻Acknowledgement
- 🚩Citation
## 🔔News
- [2025/07] We update the OneKE frontend demo to support visualization of constructed knowledge graphs.
- [2025/03] We add the Triple Extraction Task for automated knowledge graph construction.
- [2025/02] We support the local deployment of the DeepSeek-R1 series in addition to the existing API service, as well as vllm acceleration for other LLMs.
- [2025/01] OneKE is accepted by WWW 2025 Demonstration Track 🎉🎉🎉.
- [2024/12] We open source the OneKE framework, supporting multi-agent knowledge extraction across various scenarios.
- [2024/04] We release a new bilingual (Chinese and English) schema-based information extraction model called OneKE based on Chinese-Alpaca-2-13B.
## 🌟Overview
OneKE is a flexible dockerized system for schema-guided knowledge extraction, capable of extracting information from the web and raw PDF books across multiple domains like science and news. It employs a collaborative multi-agent approach and includes a user-customizable knowledge base to enable tailored extraction. Embark on your information extraction journey with OneKE!
<img src="./figs/main.png" alt="method" style="zoom: 50%;" />

OneKE currently offers the following features:
- [x] Various IE Tasks Support
- [x] Various Data Sources Support
- [x] Various LLMs Support
- [x] Various Extraction Method Support
- [x] User-Configurable Knowledge Base
## 🚀Quick Start
We have developed a web demo for OneKE with Gradio; click here to try information extraction in an intuitive way.
Note: The demo only displays OneKE's basic capabilities for efficiency. Consider the local deployment steps below for further features.
### Step1: Environment Setup
OneKE supports both manual and Docker-based environment configuration; choose your preferred method to build the environment.
#### 🔩Manual Environment Configuration
Conda virtual environments offer a light and flexible setup.
**Prerequisites**
- Anaconda Installation
- GPU support (recommended CUDA version: 12.4)
**Configure Steps**
- Clone the repository:

  ```bash
  git clone https://github.com/zjunlp/OneKE.git
  ```

- Enter the working directory; all subsequent commands should be executed in this directory:

  ```bash
  cd OneKE
  ```

- Create a virtual environment using Anaconda:

  ```bash
  conda create -n oneke python=3.9
  conda activate oneke
  ```

- Install all required Python packages:

  ```bash
  pip install -r requirements.txt
  # If you encounter network issues, consider setting up a domestic mirror for pip.
  ```
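The domestic-mirror hint above can be made concrete. As one common choice (the Tsinghua mirror is shown purely as an example), pip can be pointed at a nearby index before installing:

```shell
# Configure a domestic PyPI mirror for this environment (example: Tsinghua's mirror)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install -r requirements.txt
```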
#### 🐳Building With Docker Image
A Docker image provides greater reliability and stability.
**Prerequisites**
- Docker Installation
- NVIDIA Container Toolkit
- GPU support (recommended CUDA version: 12.4)
**Configure Steps**
- Clone the repository:

  ```bash
  git clone https://github.com/zjunlp/OneKE.git
  ```

- Pull the docker image from the mirror repository:

  ```bash
  docker pull zjunlp/oneke:v4
  # If you encounter network issues, consider setting up domestic registry mirrors for docker.
  ```

- Launch a container from the image:

  ```bash
  docker run --gpus all \
    -v ./OneKE:/app/OneKE \
    -it zjunlp/oneke:v4 /bin/bash
  ```

  If using locally deployed models, ensure the local model path is mapped into the container:

  ```bash
  docker run --gpus all \
    -v ./OneKE:/app/OneKE \
    -v your_local_model_path:/app/model/your_model_name \
    -it zjunlp/oneke:v4 /bin/bash
  ```

Map any necessary local files to container paths as shown above, and use the container paths in your code and execution.
Upon starting, the container enters the /app/OneKE directory as its working directory. Simply modify the code locally as needed, and the changes will sync to the container through the volume mapping.
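Before launching a full session, it may help to confirm that Docker can actually see the GPU; a quick sanity check (assuming the NVIDIA Container Toolkit is configured and the image provides `nvidia-smi`) is:

```shell
# Run a throwaway container and print the GPUs it can see
docker run --rm --gpus all zjunlp/oneke:v4 nvidia-smi
```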
### Step2: Start with Examples
We offer two quick-start options. Choose your preferred method to swiftly explore OneKE with predefined examples.
Note:
- Ensure that your working directory is set to the `OneKE` folder, whether in a virtual environment or a docker container.
- Refer to here to resolve network issues. If you have more questions, feel free to open an issue with us.
#### 🖊️Start with CLI
**Step1: Prepare the configuration file**
Several YAML configuration files are available in the `examples/config` directory. These extraction scenarios cover different extraction data, methods, and models, allowing you to easily explore all the features of OneKE.
**Web News Extraction:**
Here is an example of the web news knowledge extraction scenario, with the source text in HTML format:
```yaml
# model configuration
model:
  category: DeepSeek # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
  model_name_or_path: deepseek-chat # model name, chosen from deepseek-chat and deepseek-reasoner. Choose deepseek-chat to use DeepSeek-V3 or deepseek-reasoner to use DeepSeek-R1.
  api_key: your_api_key # your API key for the model with API service. No need for open-source models.
  base_url: https://api.deepseek.com # base URL for the API service. No need for open-source models.
# extraction configuration
extraction:
  task: Base # task type, chosen from Base, NER, RE, EE.
  instruction: Extract key information from the given text. # description of the task. No need for NER, RE, EE tasks.
  use_file: true # whether to use a file for the input text. Default set to false.
  file_path: ./data/input_files/Tulsi_Gabbard_News.html # path to the input file. No need if use_file is set to false.
  output_schema: NewsReport # output schema for the extraction task. Selected from the schema repository.
  mode: customized # extraction mode, chosen from quick, detailed, customized. Default set to quick. See src/config.yaml for more details.
  update_case: false # whether to update the case repository. Default set to false.
  show_trajectory: false # whether to display the intermediate extraction steps.
```
The `model` section contains information about the extraction model, while the `extraction` section configures the settings for the extraction process.
You can choose an existing configuration file or customize the extraction settings as you wish. Note that when using an API service such as ChatGPT or DeepSeek, you must set your API key.
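Before launching a long extraction run, it can be handy to sanity-check a configuration. The helper below is a hypothetical sketch (not part of OneKE) that checks the `model` and `extraction` sections described above, assuming the YAML has already been loaded into a dict (e.g. with PyYAML):

```python
# Hypothetical helper (not part of OneKE): sanity-check a parsed config dict.
REQUIRED_MODEL_KEYS = {"category", "model_name_or_path"}
REQUIRED_EXTRACTION_KEYS = {"task"}

def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []
    for section, required in (("model", REQUIRED_MODEL_KEYS),
                              ("extraction", REQUIRED_EXTRACTION_KEYS)):
        body = cfg.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing '{section}' section")
            continue
        for key in sorted(required - body.keys()):
            problems.append(f"'{section}' is missing '{key}'")
    # API-backed categories (e.g. ChatGPT, DeepSeek) need an api_key.
    model = cfg.get("model", {})
    if model.get("category") in {"ChatGPT", "DeepSeek"} and not model.get("api_key"):
        problems.append("API models require 'api_key'")
    return problems
```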
**Step2: Run the shell script**
Specify the configuration file path and run the code to start the extraction process.
```bash
config_file=your_yaml_file_path # configuration file path, use the container path if inside a container
python src/run.py --config $config_file # start extraction, executed in the OneKE directory
```
If you want to deploy local models using vLLM, run the following commands:

```bash
config_file=your_yaml_file_path # REMEMBER to set vllm_serve to TRUE!
python src/models/vllm_serve.py --config $config_file # deploy the local model via vllm, executed in the OneKE directory
python src/run.py --config $config_file # start extraction, executed in the OneKE directory
```
Refer to here to get an overview of the knowledge extraction results.
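The exact shape of the results depends on the task and the chosen schema. As a purely illustrative post-processing sketch (the JSON field names below are hypothetical, not OneKE's actual output format), an NER-style result could be grouped by entity type like this:

```python
import json

# Hypothetical example output; the field names are illustrative only.
raw = '''{"entities": [
    {"text": "Tulsi Gabbard", "type": "Person"},
    {"text": "Hawaii", "type": "Location"}
]}'''

def entities_by_type(result_json: str) -> dict:
    """Group extracted entity mentions by their predicted type."""
    grouped = {}
    for ent in json.loads(result_json).get("entities", []):
        grouped.setdefault(ent["type"], []).append(ent["text"])
    return grouped
```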
Note: You can also try OneKE by directly running the `example.py` file located in the `example` directory. In this way, you can explore more advanced uses flexibly.
#### 🖊️Start with Web UI
Note: Before starting with the web UI, make sure the package `gradio` (version 4.44.0) is already installed in your environment.
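To check that requirement programmatically, a small helper (not part of OneKE) can compare the installed gradio version against the pinned one; this assumes newer versions would also work, so pin exactly if they do not:

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '4.44.0' into (4, 44, 0)."""
    return tuple(int(part) for part in v.split("."))

def gradio_is_ready(required: str = "4.44.0") -> bool:
    """True if gradio is installed at or above the required version."""
    try:
        installed = version("gradio")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(required)
```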
**Step1: Execute Command**
Execute the following commands in the OneKE directory:
```bash
cd frontend/
chmod u+x start.sh
bash ./start.sh
```
**Step2: Open your Web Browser**
The front-end is built with Streamlit, and the default port is 8501. Therefore, please enter http://localhost:8501 in your browser's address bar to access the Web UI.