Exstruct
Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.
ExStruct — Excel Structured Extraction Engine
ExStruct reads Excel workbooks into structured data and applies patch-based editing workflows through a shared core. It provides extraction APIs, a JSON-first editing CLI, and an MCP server for host-managed integrations, with options tuned for LLM/RAG preprocessing, reviewable edit flows, and local automation.
- In COM/Excel environments (Windows), it performs rich extraction.
- In non-COM environments (Linux/macOS):
  - if the LibreOffice runtime is available, it performs best-effort extraction for cells, table candidates, shapes, connectors, and charts
  - otherwise, it safely falls back to cells + table candidates + print areas
Detection heuristics, editing workflows, and output modes are adjustable for LLM/RAG pipelines and local automation.
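The fallback order described above can be sketched as a small decision function. This is only an illustration of the documented behavior, not ExStruct's internal logic: the function name `choose_extraction_mode` and its two boolean parameters are hypothetical.

```python
def choose_extraction_mode(has_excel_com: bool, has_libreoffice: bool) -> str:
    """Pick the richest extraction mode the environment can support.

    Hypothetical helper: it mirrors the fallback order described in this
    README and is not part of the ExStruct API.
    """
    if has_excel_com:
        # Windows + Excel: rich COM-based extraction.
        return "standard"
    if has_libreoffice:
        # Linux/macOS with the LibreOffice runtime: best-effort rich mode.
        return "libreoffice"
    # Neither backend: cells + table candidates + print areas.
    return "light"


# A Linux server with LibreOffice installed but no Excel.
print(choose_extraction_mode(has_excel_com=False, has_libreoffice=True))
```

The returned string matches the values accepted by `--mode`, so a wrapper script could pass it straight to the CLI.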
Choose an Interface
| Use case | Recommended interface | Why |
| --- | --- | --- |
| Write direct Python Excel-editing code | openpyxl / xlwings | Usually the better fit for imperative Python editing. Reach for exstruct.edit only when you specifically want ExStruct's patch contract in Python. |
| Run local operator or AI-agent edit workflows | exstruct patch, make, ops, validate | Canonical operational interface; JSON-first and dry-run friendly. |
| Run sandboxed or host-managed integrations | exstruct-mcp / MCP tools | Integration / compatibility layer that owns PathPolicy, transport, and artifact behavior. |
Extraction keeps the existing top-level Python API (extract, process_excel,
ExStructEngine) and the legacy exstruct INPUT.xlsx ... CLI entrypoint.
Main Features
- Excel -> structured JSON: outputs cells, shapes, charts, SmartArt, table candidates, merged-cell ranges, print areas, and auto page-break areas by sheet or by area.
- Output modes: light (cells + table candidates + print areas only), libreoffice (best-effort non-COM mode for .xlsx/.xlsm; adds merged cells, shapes, connectors, and charts when the LibreOffice runtime is available), standard (Excel COM mode with texted shapes + arrows, charts, SmartArt, and merged-cell ranges), verbose (all shapes with width/height plus cell hyperlinks).
- Formula extraction: emits formulas_map (formula string -> cell coordinates) via openpyxl/COM. It is enabled by default in verbose and can be controlled with include_formulas_map.
- Formats: JSON (compact by default, --pretty for formatting), YAML, and TOON (optional dependencies).
- Backend metadata is opt-in: shape/chart provenance, approximation_level, and confidence are omitted from serialized output by default. Enable them with --include-backend-metadata or include_backend_metadata=True.
- Workbook editing interfaces: use the editing CLI for primary ExStruct edit flows, keep MCP for host-owned safety controls, and use exstruct.edit only when you need the same patch contract from Python.
- Table detection tuning: heuristics can be adjusted dynamically through the API.
- Hyperlink extraction: in verbose mode, or with include_cell_links=True, cell links are emitted in links.
- CLI rendering: in standard/verbose, PDF and sheet images can be generated when Excel COM is available.
- Safe fallback: if Excel COM or the LibreOffice runtime is unavailable, the process does not crash and falls back to cells + table candidates + print areas.
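To make the formulas_map idea (formula string -> cell coordinates) concrete, here is a minimal sketch of that kind of inverted map. It is an assumption-laden illustration: the helper name `build_formulas_map` is hypothetical, and A1-style strings are used for coordinates only for readability; the coordinate format ExStruct actually emits may differ.

```python
from collections import defaultdict


def build_formulas_map(cell_formulas: dict[str, str]) -> dict[str, list[str]]:
    """Group cell coordinates by the formula string they contain.

    Hypothetical helper, not ExStruct's implementation. Input maps a cell
    coordinate to its formula; output maps each formula to all its cells.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for coordinate, formula in cell_formulas.items():
        grouped[formula].append(coordinate)
    return dict(grouped)


# Illustrative data: two cells share the same formula string.
cells = {"C2": "=A2+B2", "C3": "=A3+B3", "D2": "=A2+B2"}
formulas_map = build_formulas_map(cells)
print(formulas_map)
```

Keying by formula string rather than by cell keeps repeated formulas from being serialized once per cell, which matters when the output is fed to a token-limited model.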
Installation
pip install exstruct
Optional extras:
- YAML: pip install pyyaml
- TOON: pip install python-toon
- Rendering (PDF/PNG): Excel + pip install pypdfium2 pillow (mode=libreoffice is not supported)
- Install everything at once: pip install exstruct[yaml,toon,render]
Platform note:
- Full COM extraction for shapes/charts targets Windows + Excel (xlwings/COM). On Linux/macOS/server environments, use mode=libreoffice as the best-effort rich mode or mode=light for minimal extraction. .xls is not supported in mode=libreoffice.
- On Debian/Ubuntu/WSL, install LibreOffice together with python3-uno. ExStruct probes a compatible system Python automatically for mode=libreoffice; if your environment needs an explicit interpreter, set EXSTRUCT_LIBREOFFICE_PYTHON_PATH=/usr/bin/python3.
- LibreOffice Python detection now runs the bundled bridge in --probe mode before selection. An incompatible EXSTRUCT_LIBREOFFICE_PYTHON_PATH fails fast instead of surfacing a delayed bridge SyntaxError during extraction.
- If the isolated temporary LibreOffice profile fails before the UNO socket becomes ready, ExStruct retries once with the shared/default LibreOffice profile as a compatibility fallback and reports per-attempt startup detail if both launches fail.
- GitHub Actions includes dedicated LibreOffice smoke jobs on ubuntu-24.04 and windows-2025. Linux installs libreoffice + python3-uno; Windows installs libreoffice-fresh and sets EXSTRUCT_LIBREOFFICE_PATH; both jobs run tests/core/test_libreoffice_smoke.py with RUN_LIBREOFFICE_SMOKE=1.
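The retry-once behavior for LibreOffice startup (isolated profile first, shared profile as fallback, per-attempt detail on total failure) follows a common pattern that can be sketched generically. Everything named here (`StartupError`, `start_with_fallback`, the two fake launchers) is hypothetical and stands in for the real startup code, which is not shown in this README.

```python
from typing import Callable, Sequence


class StartupError(RuntimeError):
    """Raised when every launch attempt fails; carries per-attempt detail."""


def start_with_fallback(attempts: Sequence[tuple[str, Callable[[], str]]]) -> str:
    """Try each labeled launcher in order and return the first success."""
    details = []
    for label, launch in attempts:
        try:
            return launch()
        except Exception as exc:
            # Keep the failure so the final error can report every attempt.
            details.append(f"{label}: {exc}")
    raise StartupError("; ".join(details))


def isolated_profile() -> str:
    # Stand-in for a launch whose UNO socket never becomes ready.
    raise TimeoutError("UNO socket not ready")


def shared_profile() -> str:
    # Stand-in for a successful launch with the shared/default profile.
    return "connected via shared profile"


result = start_with_fallback(
    [("isolated", isolated_profile), ("shared", shared_profile)]
)
print(result)
```

Collecting one message per attempt, instead of re-raising only the last exception, is what makes a double failure diagnosable.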
Quick Start CLI
exstruct input.xlsx > output.json # compact JSON to stdout by default
exstruct input.xlsx -o out.json --pretty # write pretty JSON to a file
exstruct input.xlsx --format yaml # YAML (requires pyyaml)
exstruct input.xlsx --format toon # TOON (requires python-toon)
exstruct input.xlsx --sheets-dir sheets/ # write one file per sheet
exstruct input.xlsx --auto-page-breaks-dir auto_areas/ # always shown; execution requires standard/verbose + Excel COM
exstruct input.xlsx --alpha-col # output column keys as A, B, ..., AA
exstruct input.xlsx --include-backend-metadata # include shape/chart backend metadata
exstruct input.xlsx --mode light # cells + table candidates + print areas only
exstruct input.xlsx --mode libreoffice # best-effort extraction of shapes/connectors/charts without COM
exstruct input.xlsx --pdf --image # PDF and PNGs (Excel COM required)
Auto page-break export is available from both the API and the CLI when Excel/COM is available. The CLI always exposes --auto-page-breaks-dir, but validates it at execution time.
mode=libreoffice rejects --pdf, --image, and --auto-page-breaks-dir early, and mode=light also rejects --auto-page-breaks-dir. Use standard or verbose with Excel COM for those features.
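The early-rejection rules above amount to a small lookup table. The sketch below encodes only the combinations this README states; the names `UNSUPPORTED_FLAGS` and `rejected_flags` are hypothetical, and the real CLI may enforce additional rules.

```python
# Flags that each mode rejects before extraction starts, as stated above.
# Hypothetical table: it covers only the combinations this README names.
UNSUPPORTED_FLAGS = {
    "libreoffice": {"--pdf", "--image", "--auto-page-breaks-dir"},
    "light": {"--auto-page-breaks-dir"},
}


def rejected_flags(mode: str, flags: list[str]) -> list[str]:
    """Return the requested flags that the given mode would reject."""
    return sorted(UNSUPPORTED_FLAGS.get(mode, set()) & set(flags))


print(rejected_flags("libreoffice", ["--pdf", "--pretty"]))
print(rejected_flags("standard", ["--pdf", "--image"]))
```

An empty result for standard or verbose only means the flags pass this early check; those features still need Excel COM at execution time.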
By default, the CLI keeps legacy 0-based numeric string column keys ("0", "1", ...). Use --alpha-col when you need Excel-style keys ("A", "B", ...).
By default, serialized shape/chart output omits backend metadata (provenance, approximation_level, confidence) to reduce token usage. Use --include-backend-metadata or the corresponding Python/MCP option when you need it.
Note: MCP exstruct_extract defaults to options.alpha_col=true, which differs from the CLI default (false).
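If you already have output with the legacy 0-based numeric string keys and want Excel-style letters without re-running extraction, the mapping is a bijective base-26 conversion. The helper below is a standalone sketch; `column_key_to_alpha` is a hypothetical name and not an ExStruct function.

```python
def column_key_to_alpha(key: str) -> str:
    """Convert a 0-based numeric string column key to an Excel-style letter key."""
    index = int(key)
    if index < 0:
        raise ValueError(f"column key must be non-negative, got {key!r}")
    letters = ""
    remaining = index + 1  # switch to 1-based for bijective base-26
    while remaining > 0:
        remaining, offset = divmod(remaining - 1, 26)
        letters = chr(ord("A") + offset) + letters
    return letters


print([column_key_to_alpha(k) for k in ["0", "1", "25", "26"]])
# → ['A', 'B', 'Z', 'AA']
```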
Quick Start Editing CLI
exstruct patch --input book.xlsx --ops ops.json --backend openpyxl
exstruct patch --input book.xlsx --ops - --dry-run --pretty < ops.json
exstruct make --output new.xlsx --ops ops.json --backend openpyxl
exstruct ops list
exstruct ops describe create_chart --pretty
exstruct validate --input book.xlsx --pretty
- patch and make print JSON PatchResult to stdout.
- This is the canonical operational / agent interface for workbook editing.
- ops list / ops describe expose the public patch-op schema.
- validate reports workbook readability (is_readable, warnings, errors).
- Phase 2 keeps the legacy extraction CLI unchanged; it does not add exstruct extract or interactive safety flags yet.
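When an agent or script drives the editing CLI, it can help to assemble the argument list in one place. The sketch below uses only the flags shown in the commands above (--input, --ops, --backend, --dry-run, --pretty); the helper name `build_patch_command` is hypothetical, and the command is built but not executed here.

```python
def build_patch_command(
    input_path: str,
    ops_path: str,
    *,
    backend: str = "openpyxl",
    dry_run: bool = True,
    pretty: bool = True,
) -> list[str]:
    """Assemble an argv list for `exstruct patch` (hypothetical helper)."""
    command = [
        "exstruct", "patch",
        "--input", input_path,
        "--ops", ops_path,
        "--backend", backend,
    ]
    if dry_run:
        command.append("--dry-run")
    if pretty:
        command.append("--pretty")
    return command


command = build_patch_command("book.xlsx", "ops.json")
print(" ".join(command))
```

The list form can be passed directly to `subprocess.run(command, capture_output=True, text=True)`, which avoids shell quoting issues with workbook paths that contain spaces. Defaulting `dry_run` to True makes the safe path the easy one.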
Recommended edit flow:
- Build patch ops.
- Run exstruct patch --dry-run and inspect PatchResult, warnings, and diff.
- Pin --backend openpyxl when you want the dry run and the real apply to use the same engine.
- If you keep --backend auto, inspect PatchResult.engine; on Windows/Excel hosts the real apply may switch to COM.
- Re-run without --dry-run only after the result is acceptable.
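The review step in this flow can be automated as a simple gate on the dry-run output. This sketch assumes the PatchResult JSON exposes an `engine` field (named above as PatchResult.engine) and a `warnings` list; the exact schema is an assumption, and `ready_to_apply` is a hypothetical helper.

```python
import json


def ready_to_apply(dry_run_json: str, expected_engine: str = "openpyxl") -> tuple[bool, list[str]]:
    """Decide whether a dry-run result is safe to apply for real.

    Assumes (not confirmed by this README) that the JSON has an "engine"
    string and a "warnings" list.
    """
    result = json.loads(dry_run_json)
    problems = []
    engine = result.get("engine")
    if engine != expected_engine:
        # The real apply could run on a different engine than the dry run.
        problems.append(f"engine is {engine!r}, expected {expected_engine!r}")
    for warning in result.get("warnings", []):
        problems.append(f"warning: {warning}")
    return (not problems, problems)


# Illustrative payload only; real output comes from `exstruct patch --dry-run`.
sample = json.dumps({"engine": "com", "warnings": ["sheet renamed"]})
ok, problems = ready_to_apply(sample)
print(ok, problems)
```

Treating any warning as blocking is deliberately strict; a real workflow might allow specific warnings after human review.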
ExStruct CLI Skill
ExStruct also ships one repo-owned Skill for agents that should follow the editing CLI safely instead of rediscovering the workflow each time.
Canonical repo source:
.agents/skills/exstruct-cli/
You can install it with the following single command:
npx skills add harumiWeb/exstruct/.agents/skills --skill exstruct-cli
That command should install exstruct-cli directly from this repository's
