SciTeX (<code>scitex</code>)

<a href="https://scitex.ai"> <img src="docs/assets/images/scitex-logo-blue-cropped.png" alt="SciTeX" width="400"> </a> Modular Python Toolkit for Scientific Research Automation <a href="https://badge.fury.io/py/scitex"><img src="https://badge.fury.io/py/scitex.svg" alt="PyPI version"></a> <a href="https://pypi.org/project/scitex/"><img src="https://img.shields.io/pypi/pyversions/scitex.svg" alt="Python Versions"></a> <a href="https://scitex-python.readthedocs.io"><img src="https://readthedocs.org/projects/scitex-python/badge/?version=latest" alt="Documentation"></a> <a href="https://github.com/ywatanabe1989/scitex-python/blob/main/LICENSE"><img src="https://img.shields.io/github/license/ywatanabe1989/scitex-python" alt="License"></a> <a href="https://scitex-python.readthedocs.io">Docs</a> · <a href="https://scitex-python.readthedocs.io/en/latest/quickstart.html">Quick Start</a> · <a href="https://scitex-python.readthedocs.io/en/latest/api/index.html">API</a> · <code>pip install scitex[all]</code>

This repository provides scitex, the orchestration layer of the SciTeX ecosystem. It addresses four key problems in scientific research:

Problem and Solution

| # | Problem | Solution |
|---|---------|----------|
| 1 | Fragmented tools -- literature search, statistics, figures, and writing each require separate tools with incompatible formats | Unified toolkit -- import scitex as stx provides 50+ modules under one namespace, accessible via Python API, CLI, and MCP |
| 2 | No verification -- existing tools address whether work could be reproduced, not whether it has been verified | Cryptographic verification -- Clew builds SHA-256 hash-chain DAGs linking every manuscript claim back to source data |
| 3 | AI agents lack context -- general-purpose LLMs cannot operate across the full research lifecycle without domain-specific tools | 293 MCP tools -- AI agents run statistics, create figures, search literature, and compile manuscripts through structured tool calls |
| 4 | No custom tooling -- every lab needs domain-specific tools, but building and sharing them requires deep infrastructure knowledge | App Maker and Store -- researchers create custom apps with scitex-app SDK and share via SciTeX Cloud |

Research Workflow

<img src="scripts/assets/workflow_out/workflow.png" alt="SciTeX Research Workflow" width="600"> Figure 1. SciTeX research pipeline -- from literature search to manuscript compilation, with every step cryptographically linked.

Demo

In 40 minutes and with minimal human intervention, an AI agent using SciTeX completed a full research cycle: literature search, statistical analysis, publication-ready figures, a 21-page manuscript, and peer review simulation.

Installation

pip install scitex[all]                # Recommended: everything

<details> <summary>Per-module extras</summary>

pip install scitex                     # Core only (minimal)
pip install scitex[plt,stats,scholar]  # Typical research setup
pip install scitex[plt]                # Publication-ready figures (figrecipe)
pip install scitex[stats]              # Statistical testing (23+ tests)
pip install scitex[scholar]            # Literature search, PDF download, BibTeX enrichment
pip install scitex[writer]             # LaTeX manuscript compilation
pip install scitex[audio]              # Text-to-speech
pip install scitex[ai]                 # LLM APIs (OpenAI, Anthropic, Google) + ML tools
pip install scitex[dataset]            # Scientific datasets (DANDI, OpenNeuro, PhysioNet)
pip install scitex[browser]            # Web automation (Playwright)
pip install scitex[capture]            # Screenshot capture and monitoring
pip install scitex[cloud]              # Cloud platform integration

Requires Python 3.10+. We recommend uv for fast installs.

</details> <details> <summary>Module Overview</summary>

| Category | Modules | Description |
|----------|---------|-------------|
| Core | session, io, config, clew | Experiment tracking, file I/O, config, cryptographic verification |
| Analysis | stats, plt, dsp, linalg | Statistics, plotting, signal processing, linear algebra |
| Research | scholar, writer, diagram, canvas | Literature, manuscripts, diagrams, figure composition |
| ML/AI | ai, nn, torch, cv, benchmark | LLM APIs, neural networks, PyTorch, computer vision |
| Data | pd, db, dataset, schema | Pandas utilities, databases, scientific datasets |
| Infra | app, cloud, tunnel, container | App SDK, cloud, SSH tunnels, containers |
| Automation | browser, capture, audio, notification | Web automation, screenshots, TTS, notifications |
| Dev | dev, template, linter, introspect | Ecosystem tools, scaffolding, code analysis |

</details>

Quick Start

<details> <summary><code>@scitex.session</code> -- Reproducible Experiment Tracking</summary>

One decorator provides an auto-generated CLI, YAML config injection, random seed fixation, structured output, and logging.

import scitex as stx
import numpy as np

@stx.session
def main(
    data_path: str = "./data.csv",   # --data-path data.csv
    n_samples: int = 100,            # --n-samples 200
    CONFIG=stx.session.INJECTED,     # Aggregated ./config/*.yaml
    plt=stx.session.INJECTED,        # Pre-configured matplotlib
    logger=stx.session.INJECTED,     # Session logger
):
    """Analyze data. Docstring becomes --help text."""
    
    # Load
    data = stx.io.load(data_path)
    
    # Demo data
    x = np.linspace(0, 2 * np.pi, n_samples)
    y = np.sin(x) + np.random.randn(n_samples) * 0.1
    
    # FigRecipe Plot
    fig, ax = stx.plt.subplots()
    ax.plot(x, y)
    ax.set_xyt("Time", "Amplitude", "Noisy Sine Wave")
    
    # Save sine.png + sine.csv with logging message
    stx.io.save(fig, "sine.png")
    
    return 0

if __name__ == "__main__":
    main()

$ python script.py --data-path experiment.csv --n-samples 200
$ python script.py --help
# usage: script.py [-h] [--data-path DATA_PATH] [--n-samples N_SAMPLES]
# Analyze data. Docstring becomes --help text.

script_out/FINISHED_SUCCESS/2026-03-18_14-30-00_Z5MR/
├── sine.png, sine.csv         # Figure + auto-exported plot data
├── CONFIGS/CONFIG.yaml        # Frozen parameters
└── logs/{stdout,stderr}.log   # Execution logs
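
To make the auto-CLI idea concrete, here is a minimal sketch of how a decorator can turn keyword defaults into command-line flags using only the standard library. It is not the implementation behind `@stx.session`; the `auto_cli` name is hypothetical, and the sketch assumes every parameter has a `str`, `int`, or `float` default (booleans, `None` defaults, and injected arguments such as `CONFIG` are out of scope).

```python
import argparse
import functools
import inspect


def auto_cli(func):
    """Hypothetical decorator: build an argparse CLI from keyword defaults."""

    @functools.wraps(func)
    def wrapper(argv=None):
        # The function docstring doubles as the --help description.
        parser = argparse.ArgumentParser(description=inspect.getdoc(func))
        for name, param in inspect.signature(func).parameters.items():
            if param.default is inspect.Parameter.empty:
                continue  # this sketch only handles parameters with defaults
            flag = "--" + name.replace("_", "-")  # data_path -> --data-path
            parser.add_argument(
                flag,
                dest=name,
                type=type(param.default),  # infer the type from the default
                default=param.default,
            )
        args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
        return func(**vars(args))

    return wrapper


@auto_cli
def main(data_path: str = "./data.csv", n_samples: int = 100):
    """Analyze data. Docstring becomes --help text."""
    return data_path, n_samples
```

Calling `main(["--n-samples", "200"])` parses the given list, while `main()` with no argument reads the real command line, so the same function works in tests and as a script entry point.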

</details> <details> <summary><code>scitex.clew</code> -- Cryptographic Verification for AI-Driven Science</summary>

As AI agents produce research at scale, the question shifts from "could this be reproduced?" to "has this been verified?". Clew builds a SHA-256 hash-chain DAG linking every manuscript claim back to source data.

import scitex as stx

# Every stx.io.load/save automatically records file hashes -- zero config
stx.clew.status()                          # {'verified': 12, 'mismatched': 0, 'missing': 0}
stx.clew.chain("results/figure1.png")      # Trace one file back to source data
stx.clew.dag(claims=True)                  # Verify all manuscript claims

# Register traceable assertions
stx.clew.add_claim(
    file_path="paper/main.tex", claim_type="statistic", line_number=142,
    claim_value="t(58) = 2.34, p = .021",
    source_session="2026-03-18_14-30-00_Z5MR", source_file="results/stats.csv",
)

stx.clew.mermaid(claims=True)              # Visualize provenance DAG

| Mode | Function | Answers |
|------|----------|---------|
| Project | clew.dag() | Is the whole project intact? |
| File | clew.chain("output.csv") | Can I trust this specific file? |
| Claim | clew.verify_claim("Fig 1") | Is this manuscript assertion valid? |

Verification levels: L1, hash comparison (milliseconds); L2, sandbox re-execution (minutes); L3, registered timestamp proof (optional).
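
The hash-comparison level can be illustrated with a small standard-library sketch. This is not Clew's code or data model: the `record` and `verify` helpers and the record layout are assumptions made for illustration. The sketch stores the SHA-256 digest of one output file together with the digests of its inputs, then re-hashes everything to detect changes.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record(output: Path, inputs: list[Path]) -> dict:
    """Hypothetical provenance record: output hash plus hashes of its inputs."""
    return {
        "file": str(output),
        "hash": sha256_of(output),
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }


def verify(rec: dict) -> bool:
    """Re-hash the output and every input; any changed byte breaks the match."""
    if sha256_of(Path(rec["file"])) != rec["hash"]:
        return False
    return all(sha256_of(Path(p)) == h for p, h in rec["inputs"].items())


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    result = Path(tmp) / "stats.csv"
    source.write_text("x,y\n1,2\n")
    result.write_text("mean_y\n2\n")

    rec = record(result, [source])
    intact = verify(rec)              # True: nothing has changed yet
    source.write_text("x,y\n1,3\n")   # modify the source data after the fact
    tampered = verify(rec)            # False: the input hash no longer matches
```

Chaining such records, where each input is itself the output of an earlier record, gives a directed acyclic graph that can be walked from a result back to its raw data.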

<img src="docs/clew-dag.png" alt="Clew DAG" width="300"> Figure 2. Clew verification DAG -- green nodes are verified (hash match), red nodes have mismatches. Each node shows its SHA-256 hash prefix. </details> <details> <summary><code>scitex.io</code> -- Unified File I/O (50+ Formats)</summary>

import scitex as stx

# Save and load -- format detected from extension
stx.io.save(df, "results.csv")
df = stx.io.load("results.csv")

stx.io.save(arr, "data.npy")
arr = stx.io.load("data.npy")

stx.io.save(fig, "figure.png")       # Also exports figure data as CSV
stx.io.save(config, "config.yaml")
stx.io.save(model, "model.pkl")

# Aggregate ./config/*.yaml into a single DotDict
CONFIG = stx.io.load_configs(config_dir="./config")
print(CONFIG.MODEL.hidden_size)      # Dot-notation access

# Register custom formats
@stx.io.register_saver(".custom")
def save_custom(obj, path, **kw):
    with open(path, "w") as f:
        f.write(str(obj))

@stx.io.register_loader(".custom")
def load_custom(path, **kw):
    with open(path) as f:
        return f.read()

Supports: CSV, JSON, YAML, TOML, HDF5, NPY, NPZ, PKL, PNG, JPG, SVG, PDF, Excel, Parquet, Zarr, INI, TXT, MAT, WAV, MP3, BibTeX, and more.

Built-in features: Auto directory creation, path resolution to <script_name>_out/, symlinks (symlink_from_cwd=True), save logging with
