SkillAgentSearch skills...

AgenticData

agentic data generation(under refactor!!!)

Install / Use

/learn @Max-VibeCoding/AgenticData

README

SeekingData

<div align="center"> <img src="seekingdata.png" alt="SeekingData" width="600" />

Professional SFT Data Generation & Harbor Task Management Platform

License: MIT Python 3.12+ React 18 FastAPI Electron

中文文档

</div>

⚠️ Work in Progress: This project is undergoing a complete refactor with many new features being added. It is not recommended for production use at this time. Please wait for a stable release.


Overview

SeekingData is a cross-platform desktop application that integrates SFT (Supervised Fine-Tuning) data generation and Harbor task management, featuring Material Design 3 for a modern user experience.

Features

SFT Data Generation

| Feature | Description | |---------|-------------| | Single Processing | File upload (PDF/DOCX/TXT), URL extraction, AI-powered generation | | Batch Processing | Bulk URL processing with real-time progress tracking | | Format Converter | Alpaca ↔ OpenAI bidirectional conversion | | CoT Generator | Chain of Thought reasoning data generation | | Image Dataset | Automatic image description generation | | Video Dataset | Video understanding data processing | | Dataset Sharing | One-click upload to HuggingFace |

Harbor Task Management

| Feature | Description | |---------|-------------| | GitHub Task Generator | Auto-generate tasks from GitHub repositories | | Visual Task Builder | Drag-and-drop editing with Monaco Editor | | Task Manager | List, search, view details, export tasks | | Task Validation | Integrated Harbor validation tools |

Tech Stack

Frontend

  • Framework: React 18 + Vite 5
  • UI Design: Material Design 3
  • Styling: TailwindCSS 3.4
  • State Management: Zustand
  • Routing: React Router DOM 7
  • Code Editor: Monaco Editor
  • Flow Editor: React Flow

Backend

  • Framework: FastAPI 0.115+
  • Language: Python 3.12
  • Validation: Pydantic 2.10+
  • LLM Integration: LiteLLM 1.40+
  • Document Processing: Docling 2.0+
  • Agent Framework: Camel AI 0.2.89
  • Task Framework: Harbor 0.1.45

Desktop Application

  • Framework: Electron 33
  • Packaging: Electron Builder
  • Platforms: macOS, Windows, Linux

Quick Start

Prerequisites

  • Node.js 18+
  • Python 3.12+
  • uv (Python package manager)
  • yarn (Node package manager)

Installation

# Clone repository
git clone https://github.com/yourusername/SeekingData.git
cd SeekingData

# Install frontend dependencies
yarn install

# Install backend dependencies
cd backend
uv venv .venv --python 3.12
source .venv/bin/activate  # macOS/Linux
# or .venv\Scripts\activate  # Windows
uv pip install -r requirements.txt

Development

# Terminal 1: Start backend
cd backend
source .venv/bin/activate
uvicorn main:app --reload --port 5001

# Terminal 2: Start frontend
yarn dev

Access the application at: http://localhost:3002

Production Build

# macOS
yarn build:mac

# Windows
yarn build:win

# Linux
yarn build:linux

Configuration

Backend Environment (backend/.env)

# LLM API Configuration
OPENAI_API_KEY=sk-xxx

# GitHub Token (optional, for GitHub task generation)
GITHUB_TOKEN=ghp_xxx

# Application
APP_NAME=SeekingData
APP_VERSION=0.1.0
DEBUG=true

Frontend Settings

Configure via the Settings page in the application:

  • API Base URL: LLM provider endpoint
  • API Key: Your secret API key
  • Model: Model identifier (e.g., qwen/qwen3.5-plus)
  • Suggestions Count: Number of suggestions per request (1-10)

Project Structure

SeekingData/
├── src/                    # React frontend
│   ├── components/
│   │   ├── sft/           # SFT data generation
│   │   ├── harbor/        # Harbor task management
│   │   ├── ui/            # Material Design 3 components
│   │   └── layout/        # Layout components
│   ├── lib/               # Utilities and stores
│   └── pages/             # Page components
├── backend/               # FastAPI backend
│   ├── agents/           # AI agents (GitHub, etc.)
│   ├── api/routes/       # API endpoints
│   ├── models/           # Pydantic models
│   ├── services/         # Business logic
│   └── tasks/            # Harbor task storage
├── electron/             # Electron main process
├── scripts/              # Build scripts
└── docs/                 # Documentation

API Endpoints

| Method | Endpoint | Description | |--------|----------|-------------| | GET | /api/sft/config | Get current configuration | | POST | /api/sft/config | Save configuration | | POST | /api/sft/generate | Generate SFT data | | POST | /api/sft/batch | Batch URL processing | | POST | /api/sft/convert | Format conversion | | GET | /api/harbor/tasks | List all tasks | | POST | /api/harbor/tasks | Create new task | | GET | /api/harbor/tasks/{id} | Get task details | | POST | /api/harbor/github/generate | Generate from GitHub |

Supported Models

The application supports any LiteLLM-compatible model:

| Provider | Model Examples | |----------|---------------| | OpenAI | gpt-4, gpt-4o, gpt-3.5-turbo | | Qwen | qwen/qwen3.5-plus, qwen/qwen-max | | Moonshot | moonshot/kimi-k2.5 | | Zhipu | zhipu/glm-5, zhipu/glm-4 | | MiniMax | minimax/MiniMax-M2.5 | | DeepSeek | openai/deepseek-v3.2 |

Documentation

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


<div align="center">

Made with ❤️ by SeekingX-AILab

</div>

Related Skills

View on GitHub
GitHub Stars196
CategoryDevelopment
Updated13d ago
Forks17

Languages

TypeScript

Security Score

85/100

Audited on Mar 21, 2026

No findings