GTFlow

End-to-end tool for Grounded Theory research, covering segmentation, open coding, the Gioia method, axial (CAR) coding, selective coding, negative case analysis, theoretical saturation, and reporting. It provides an intuitive UI and is compatible with multiple APIs and local models.


GTFlow: End-to-end tool for Grounded Theory research

GTFlow turns raw qualitative text into theory-building artifacts. It ships with a Streamlit-based UI and a CLI, supports OpenAI‑protocol‑compatible endpoints, Azure OpenAI, and Anthropic, and tracks token usage and estimated costs.

Features
Quick Start
Installation
Configuration
CLI Usage
UI Usage
Inputs and Outputs
Reproducibility, Usage, and Cost
Roadmap
Contributing
License

Features

End-to-end pipeline: Segmentation, Open coding, Codebook building with Gioia view, Axial coding with CAR triples, Selective coding and storyline, Negative case scan, Saturation check, and Report generation.
Two interfaces: Streamlit UI for interactive work and a CLI for scripted, reproducible runs.
Provider compatibility: OpenAI‑protocol compatible, Azure OpenAI, and Anthropic. Gateways are supported if they expose an OpenAI‑compatible API.
Structured output first: Prompts prefer JSON with robust parsing and graceful fallbacks.
Reproducibility utilities: Run directory with intermediate artifacts, token usage and cost estimation, and a one‑command report.

Quick Start

1) Install

Python 3.9+ is required.

From PyPI (if published):

pip install -U gtflow
# or in an isolated tool environment
pipx install gtflow

From source:

git clone https://github.com/your-org/your-repo.git
cd your-repo
pip install -e .

2) Configure provider credentials

Choose one path below.

OpenAI‑compatible (OpenAI, compatible gateways, or Ollama’s /v1):

# minimally set your key (and optionally a custom base URL)
export OPENAI_API_KEY=sk-...
# for self-hosted or gateways
export OPENAI_BASE_URL=https://your-endpoint/v1
# optional organization header
export OPENAI_ORG_ID=org_...

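Azure OpenAI (the CLI does not read these variables automatically; copy the values into the YAML config, see the Notes under Configuration):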

export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_ENDPOINT=https://YOUR-RESOURCE.openai.azure.com
export AZURE_OPENAI_DEPLOYMENT=YOUR-DEPLOYMENT
export AZURE_OPENAI_API_VERSION=2024-02-15-preview

Anthropic (not picked up by the CLI automatically; see the Notes under Configuration):

export ANTHROPIC_API_KEY=...

Tip: in the YAML config below, either put a real value in api_key or omit the field. Avoid literal placeholders such as ${OPENAI_API_KEY}: the YAML is not environment-expanded, so they are read as plain strings.
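Before a run, a small preflight check can catch missing credentials early. The sketch below is a hypothetical helper, not part of GTFlow; it only uses the variable names from the exports above. Per the Configuration notes, the CLI does not read the Azure variables or ANTHROPIC_API_KEY on its own, so for those providers the check only confirms the values exist in your shell.

```python
import os
from typing import List, Mapping, Optional

# Variable names come from the exports above; the per-provider grouping and
# this helper are hypothetical illustrations, not GTFlow code.
REQUIRED_ENV = {
    "openai_compatible": ["OPENAI_API_KEY"],
    "azure_openai": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT",
        "AZURE_OPENAI_API_VERSION",
    ],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_env(provider: str, environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the expected variables for `provider` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]
```

For example, call missing_env("azure_openai") in a launch script and abort if the returned list is non-empty.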

3) Run the UI or the pipeline

UI (recommended for first run)

gtflow-ui

Follow the left sidebar to set provider and run parameters, upload or paste your text, and export artifacts.

One‑command pipeline (CLI)

# Example data and config may live under examples/
gtflow run-all -i examples/data/sample_dialog.txt -c examples/config.example.yaml -o output
gtflow report -o output
# Open output/report.html in your browser
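For scripted runs, the same two commands can be driven from Python. This is a minimal sketch built on the standard subprocess module; the helper names pipeline_commands and run_pipeline are hypothetical, and only the flags shown above (-i, -c, -o) are used.

```python
import subprocess
import sys
from typing import List


def pipeline_commands(input_path: str, config_path: str, out_dir: str) -> List[List[str]]:
    """Build the two CLI invocations shown above: run-all, then report."""
    return [
        ["gtflow", "run-all", "-i", input_path, "-c", config_path, "-o", out_dir],
        ["gtflow", "report", "-o", out_dir],
    ]


def run_pipeline(input_path: str, config_path: str, out_dir: str) -> None:
    """Run each command in order; check=True raises CalledProcessError on failure."""
    for cmd in pipeline_commands(input_path, config_path, out_dir):
        print("$", " ".join(cmd), file=sys.stderr)  # echo the command for the log
        subprocess.run(cmd, check=True)
```

Usage: run_pipeline("examples/data/sample_dialog.txt", "examples/config.example.yaml", "output"). Because of check=True, the report step is skipped if run-all exits with an error.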

Installation

Python 3.9+
Linux, macOS, or Windows
Optional: Graphviz installed system‑wide if you plan to generate graph files

Dependencies are automatically installed via pip. See pyproject.toml or requirements.txt for versions.

Configuration

You can control providers and runtime behavior via a YAML file. A minimal example:

# config.yaml
provider:
  name: openai_compatible          # openai_compatible | openai | azure_openai | anthropic | ollama
  model: gpt-4o-mini               # change as needed
  output_language: English         # choose output language for model responses
  base_url: https://api.openai.com/v1
  use_responses_api: false         # true to try /v1/responses first
  structured: true                 # request JSON when supported
  max_tokens: 1024
  temperature: 0.2
  price_input_per_1k: 0.002
  price_output_per_1k: 0.006

run:
  segmentation_strategy: dialog    # dialog | paragraph | line
  max_segment_chars: 800
  batch_size: 10
  concurrent_workers: 6
  rate_limit_rps: 2.0
  retry_max: 3
  timeout_sec: 60
  max_prompt_chars: 200000         # soft ceiling per LLM call; sized for 128k context by default
  stream_open_coding: false        # set true to stream open-coding batches to JSONL instead of holding all in memory
  stream_open_coding_threshold: 2000  # runs with at least this many segments stream even if the flag is false

output:
  out_dir: output
  save_graphviz: true
  log_file: analysis.log
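Both run.batch_size and run.max_prompt_chars limit how much text goes into a single call. One plausible way the two caps could interact is sketched below; it is an illustration under that assumption, not GTFlow's batching code, and it ignores the characters used by the prompt instructions themselves, which a real implementation would subtract from the budget.

```python
from typing import Iterable, Iterator, List


def make_batches(segments: Iterable[str], batch_size: int = 10,
                 max_prompt_chars: int = 200_000) -> Iterator[List[str]]:
    """Group segments into batches capped by count and by total characters.

    A single segment longer than max_prompt_chars still gets a batch of one;
    splitting it further is left to segmentation (max_segment_chars).
    """
    batch: List[str] = []
    chars = 0
    for seg in segments:
        too_many = len(batch) >= batch_size
        too_long = bool(batch) and chars + len(seg) > max_prompt_chars
        if too_many or too_long:
            yield batch  # close the current batch before it overflows
            batch, chars = [], 0
        batch.append(seg)
        chars += len(seg)
    if batch:
        yield batch
```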

Notes:

OpenAI‑compatible: if api_key is omitted in YAML, OPENAI_API_KEY is used automatically. OPENAI_BASE_URL overrides base_url at runtime.
Azure OpenAI: set endpoint, deployment, api_version, and api_key in YAML. The CLI does not read Azure env vars automatically.
Anthropic: set api_key in YAML. ANTHROPIC_API_KEY is not picked up automatically; if you prefer the environment variable, read it in your own wrapper and inject it before the config is created.
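One way to follow the "own wrapper" advice is to expand environment references yourself before the config reaches GTFlow. The helper below is hypothetical: load the YAML with whatever parser you use, pass the resulting dict through expand_env, and write the expanded file to a private location before calling gtflow run-all -c on it. It relies only on os.path.expandvars from the standard library.

```python
import os
from typing import Any


def expand_env(value: Any) -> Any:
    """Recursively expand $VAR / ${VAR} in every string of a loaded config.

    os.path.expandvars leaves unknown variables untouched, so a typo shows up
    as a literal "${...}" string instead of silently becoming empty.
    """
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```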

Output language control

Set provider.output_language (default: English) to control the language used in LLM-generated text across the pipeline (codes, definitions, memos, theory/storyline, negatives, and the HTML report). This does not translate your source text or change the CLI/UI language; verbatim excerpts (e.g. in-vivo phrases) stay in the original language. JSON field names remain in English for schema stability.

Example:

provider:
  output_language: Chinese  # e.g., English | Chinese | Japanese | French

CLI Usage

gtflow exposes focused commands:

# 1) Segment an input file into analysis units
#    Supported inputs: .txt/.md (raw text), .jsonl, .csv
gtflow segment \
  -i data/interview_1.txt \
  -o output \
  --strategy dialog \
  --max-segment-chars 800

# 2) Run the entire pipeline using a YAML config
gtflow run-all \
  -i data/interview_1.txt \
  -c config.yaml \
  -o output \
  --force   # optional: overwrite existing artifacts

# 3) Build a report from saved artifacts
gtflow report -o output

# General help
gtflow --help
gtflow run-all --help
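To get a feel for what --max-segment-chars does, here is a sketch of a paragraph-style splitter: blank lines separate paragraphs, and any paragraph longer than the cap is cut at the last space before the limit. It is an assumption about how a paragraph strategy might behave, not the segmenter GTFlow ships; the dialog and line strategies are not modeled, and the seg_NNNN ids are hypothetical.

```python
import re
from typing import Dict, List


def segment_paragraphs(text: str, max_chars: int = 800) -> List[Dict[str, str]]:
    """Split text on blank lines, then cap each piece at max_chars characters."""
    pieces: List[str] = []
    for para in re.split(r"\n\s*\n", text):
        para = " ".join(para.split())  # collapse internal whitespace and newlines
        while len(para) > max_chars:
            cut = para.rfind(" ", 0, max_chars + 1)  # last space within the cap
            if cut <= 0:
                cut = max_chars  # no usable space: hard cut mid-word
            pieces.append(para[:cut].rstrip())
            para = para[cut:].lstrip()
        if para:
            pieces.append(para)
    return [{"id": f"seg_{i:04d}", "text": piece} for i, piece in enumerate(pieces, 1)]
```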

What run-all produces under output/:

segments.json
open_codes.json
codebook.json
axial_triples.json
theory.json and theory.md
gioia.json
negatives.json
saturation.json
report.html
run_meta.json (token usage by stage and estimated cost)
For large runs, JSONL streams are also written: segments.jsonl and open_codes.jsonl (the full results). In that case open_codes.json keeps only a sample for quick inspection.

Handling Large Inputs

Keep segment size small (run.max_segment_chars 400–800) so each model call stays within context.
Enable streaming for long corpora (run.stream_open_coding: true or lower the run.stream_open_coding_threshold) to write open codes directly to open_codes.jsonl instead of keeping all in memory.
Cap prompt size with run.max_prompt_chars (default 200k chars, tuned for 128k-context models). Lower it if you target smaller-context models; raise only if your provider guarantees a larger window.
JSONL artifacts (segments.jsonl, open_codes.jsonl) let you resume or post-process without re-running the whole pipeline.
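Because the JSONL artifacts hold one record per line, they can be read incrementally, for example to find which segments already have results before resuming. The sketch assumes each record carries an id field; that key name is hypothetical, so check your own segments.jsonl or open_codes.jsonl and pass the right key.

```python
import json
import sys
from pathlib import Path
from typing import Any, Iterator, Set


def iter_jsonl(path: Path) -> Iterator[Any]:
    """Yield one parsed record per non-blank line without loading the whole file."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # An interrupted run can leave a truncated final line.
                print(f"skipping malformed line {line_no} in {path}", file=sys.stderr)


def done_ids(path: Path, key: str = "id") -> Set[str]:
    """Collect the values of `key` (hypothetical field name) already in `path`."""
    if not path.exists():
        return set()
    return {str(rec[key]) for rec in iter_jsonl(path) if isinstance(rec, dict) and key in rec}
```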

UI Usage

Launch:

gtflow-ui

The dashboard lets you:

Configure the provider (OpenAI‑compatible, Azure OpenAI, Anthropic), model, temperature, and token limits.
Choose segmentation strategy and batching.
Run open coding, build a codebook, generate axial CAR triples, derive a core category and storyline, scan for negative cases, and approximate saturation.
Download a ZIP of artifacts and an HTML report.

Inputs and Outputs

Typical inputs

Plain text files (.txt, .md).
JSONL with one record per line. Suggested keys: id, text, optional speaker, optional meta.
CSV with columns: id, text, optional speaker and meta.
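If your transcripts live in a spreadsheet, converting the CSV to the suggested JSONL shape takes a few lines of standard-library Python. This is an illustrative converter, not a GTFlow utility; the row_N fallback id is a hypothetical convention, and meta is carried through as the raw CSV string.

```python
import csv
import json
from typing import TextIO


def csv_to_jsonl(src: TextIO, dst: TextIO) -> int:
    """Convert CSV rows (id, text, optional speaker/meta) to JSONL records.

    Rows with empty text are skipped; returns the number of records written.
    """
    written = 0
    for row in csv.DictReader(src):
        text = (row.get("text") or "").strip()
        if not text:
            continue
        record = {"id": row.get("id") or f"row_{written + 1}", "text": text}
        for optional in ("speaker", "meta"):
            if row.get(optional):  # include optional columns only when present
                record[optional] = row[optional]
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written
```

When working with real files, open the CSV with newline="" as the csv module documentation recommends, and open both files with encoding="utf-8" so non-English transcripts survive intact.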

Key outputs

segments.json: segmented units used for analysis.
open_codes.json: initial codes per segment.
codebook.json: first‑order codes, definitions, and Gioia groupings.
axial_triples.json: CAR triples with short evidence spans.
theory.json and theory.md: core category and storyline.
gioia.json: compact Gioia view used by the report.
negatives.json: candidate negative cases from the corpus.
saturation.json: sliding‑window estimate of new‑code discovery.
report.html: consolidated visual report with a Gioia panel and a Mermaid diagram for CAR relations.
run_meta.json: per‑stage token counts and estimated cost.
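saturation.json is described as a sliding-window estimate of new-code discovery. The sketch below shows one common way to compute such a curve, the share of distinct codes in each window that never appeared in any earlier segment; GTFlow's exact metric, window size, and step may differ. A curve trending toward zero suggests few new codes are still emerging.

```python
from typing import List, Optional, Sequence, Set


def new_code_rates(codes_per_segment: Sequence[Set[str]], window: int = 20,
                   step: Optional[int] = None) -> List[float]:
    """Share of each window's distinct codes unseen before the window starts."""
    step = step or window  # step == window gives non-overlapping windows
    seen: Set[str] = set()
    seen_upto = 0
    rates: List[float] = []
    for start in range(0, len(codes_per_segment), step):
        for codes in codes_per_segment[seen_upto:start]:
            seen |= codes  # absorb every segment that precedes this window
        seen_upto = start
        in_window = set().union(*codes_per_segment[start:start + window])
        fresh = in_window - seen
        rates.append(len(fresh) / len(in_window) if in_window else 0.0)
    return rates
```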

Reproducibility, Usage, and Cost

Each run writes a structured set of artifacts to the output directory so you can rerun, diff, and audit results.
The CLI prints a Token Usage by Stage table and writes run_meta.json with input tokens, output tokens, totals, and an estimated cost using your configured price_input_per_1k and price_output_per_1k.
For ethics and privacy, ensure consent for any interview or sensitive text and follow your IRB or organizational guidelines.
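The estimated cost is presumably tokens divided by 1,000 times the configured per-1k prices, summed over input and output. A minimal sketch of that arithmetic follows; the stage names and the (input, output) tuple layout are hypothetical and do not describe the actual schema of run_meta.json.

```python
from typing import Dict, Tuple


def estimate_cost(usage: Dict[str, Tuple[int, int]],
                  price_input_per_1k: float,
                  price_output_per_1k: float) -> Dict[str, float]:
    """Sum (input_tokens, output_tokens) across stages and price them per 1k tokens."""
    input_tokens = sum(inp for inp, _ in usage.values())
    output_tokens = sum(out for _, out in usage.values())
    cost = (input_tokens / 1000) * price_input_per_1k + (output_tokens / 1000) * price_output_per_1k
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost": round(cost, 6),
    }
```

With the example prices from the config (0.002 input, 0.006 output), 12,000 input and 3,000 output tokens come to 0.024 + 0.018 = 0.042 in your price currency.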

Roadmap

Batch editing and alignment in the Gioia panel.
Visualizations for negative cases and participant‑level contrasts.
Multiple saturation metrics in parallel.
Configurable report templates for methods, results, and appendices.
Project‑level comparison and merging utilities.

Contributing

Contributions are welcome. A suggested process:

Open an issue to discuss the proposal.
Fork the repo and create a feature branch.
Add tests or local checks where reasonable.
Submit a PR describing motivation and changes.

License

This project is licensed under the MIT License.

Repository: zw-zhtlab/GTFlow (Python)