Txt2stix
Extracts IoCs, TTPs and the relationships between them. Outputs a STIX 2.1 bundle.
txt2stix
Before you begin...
We have built two products on top of txt2stix that provide a more user-friendly experience:
- Stixify: Extract machine readable cyber threat intelligence from unstructured data
- Obstracts: Turn any blog into structured threat intelligence
Overview

txt2stix is a Python script that is designed to identify and extract IoCs and TTPs from text files, identify the relationships between them, convert them to STIX 2.1 objects, and output as a STIX 2.1 bundle.
The general design goal of txt2stix was to keep it flexible, but simple, so that new extractions could be added or modified over time.
In short, txt2stix:
- takes a txt file input
- extracts observables for enabled extractions (ai, pattern, or lookup)
- converts extracted observables to STIX 2.1 objects
- generates the relationships between extracted observables (ai, standard)
- converts extracted relationships to STIX 2.1 SRO objects
- outputs a STIX 2.1 bundle
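To make the end result of these steps more concrete, the sketch below builds a small STIX 2.1 bundle by hand using only the Python standard library. It is an illustration of the general shape of a bundle (SCOs, an SRO, and a report that references them), not the exact output of txt2stix. The helper names `stix_id` and `build_example_bundle`, the example values, and the choice of the `resolves-to` relationship are all assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_id(object_type: str) -> str:
    """Return a STIX 2.1 identifier of the form '<type>--<uuid>'."""
    return f"{object_type}--{uuid.uuid4()}"


def build_example_bundle() -> dict:
    """Build an illustrative bundle: two observables, one relationship, one report."""
    # STIX timestamps are UTC, here with millisecond precision.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

    # Observables (SCOs). The values are placeholders.
    domain = {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": stix_id("domain-name"),
        "value": "example.com",
    }
    ip = {
        "type": "ipv4-addr",
        "spec_version": "2.1",
        "id": stix_id("ipv4-addr"),
        "value": "198.51.100.7",
    }

    # Relationship (SRO) linking the two observables.
    relationship = {
        "type": "relationship",
        "spec_version": "2.1",
        "id": stix_id("relationship"),
        "created": now,
        "modified": now,
        "relationship_type": "resolves-to",
        "source_ref": domain["id"],
        "target_ref": ip["id"],
    }

    # Report object that references everything extracted from the input.
    report = {
        "type": "report",
        "spec_version": "2.1",
        "id": stix_id("report"),
        "created": now,
        "modified": now,
        "name": "Example report",
        "published": now,
        "object_refs": [domain["id"], ip["id"], relationship["id"]],
    }

    return {
        "type": "bundle",
        "id": stix_id("bundle"),
        "objects": [report, domain, ip, relationship],
    }


if __name__ == "__main__":
    print(json.dumps(build_example_bundle(), indent=2))
```

The identifiers here are random UUIDv4 values for simplicity. Real tooling may generate deterministic identifiers for observables instead.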
Usage
Setup
Install the required dependencies using:
# clone the latest code
git clone https://github.com/muchdogesec/txt2stix
cd txt2stix
# create a venv
python3 -m venv txt2stix-venv
source txt2stix-venv/bin/activate
# install requirements
pip3 install txt2stix
Note: by default, txt2stix installs OpenAI as the AI provider. You can also use Anthropic, Gemini, OpenRouter or DeepSeek. If you plan to use any of these, install them manually as follows (remove those that don't apply):
# quote the extras so shells such as zsh do not treat the brackets as a glob
pip3 install 'txt2stix[deepseek,gemini,anthropic,openrouter]'
Set variables
txt2stix has various settings that are defined in an .env file.
To create a template for the file:
cp .env.example .env
To see more information about how to set the variables, and what they do, read the .env.markdown file.
Then test your configuration:
python3 txt2stix.py \
--check-credentials
It will return a response showing which API keys are working:
============= Service Statuses ===============
ctibutler : authorized ✔
vulmatch : authorized ✔
binlist : authorized ✔
LLMS:
openai : authorized ✔
deepseek : unsupported –
gemini : unsupported –
openrouter : unsupported –
anthropic : unsupported –
Services you do not intend to use do not need to be configured.
Usage
python3 txt2stix.py \
--relationship_mode MODE \
--input_file FILE.txt \
...
The following arguments are available:
Input settings
--input_file (path/to/file.txt, required): the file to be converted. Must be a .txt file.
STIX Report generation settings
--name (text, required): name of the file, max 72 chars. Used in the STIX Report object created.
--report_id (UUIDv4, default is a random UUIDv4): sometimes it is necessary to control the id of the report object generated. You can pass a valid UUIDv4 in this field to be assigned to the report, e.g. passing 2611965-930e-43db-8b95-30a1e119d7e2 would create a STIX object id report--2611965-930e-43db-8b95-30a1e119d7e2. If this argument is not passed, the UUID is randomly generated.
--tlp_level (dictionary, default clear): options are clear, green, amber, amber_strict, red.
--confidence (value between 0-100): if not passed, the report is assigned no confidence score.
--labels (optional): comma-separated list of labels. Case-insensitive (all labels are converted to lower-case). Allowed characters: a-z, 0-9. e.g. label1,label2 would create 2 labels.
--created (datetime, optional): by default, all object created times take the time the script was run. To set these times explicitly, pass the value in the format YYYY-MM-DDTHH:MM:SS.sssZ, e.g. 2020-01-01T00:00:00.000Z
--use_identity (STIX identity, optional, default txt2stix identity): you can pass a full STIX 2.1 identity object (make sure to escape it properly). It will be validated by the STIX2 library.
--external_refs (optional): txt2stix automatically populates the external_references of the report object it creates for the input. You can use this value to add additional objects to external_references. Note that you can currently only add source_name and external_id values. Pass as source_name=external_id, e.g. --external_refs txt2stix=demo1 source=id would create the following objects under the external_references property: {"source_name":"txt2stix","external_id":"demo1"},{"source_name":"source","external_id":"id"}
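The value formats described for the report settings can be sketched in a few lines of Python. This is a minimal illustration of the documented formats (source_name=external_id pairs, lower-cased labels, the created timestamp format, and the report--UUID id), not the parsing code txt2stix itself uses. All function names below are hypothetical.

```python
import re
import uuid
from datetime import datetime, timezone


def parse_external_refs(pairs):
    """Turn ['txt2stix=demo1', 'source=id'] into external_references entries."""
    refs = []
    for pair in pairs:
        source_name, sep, external_id = pair.partition("=")
        if not sep or not source_name or not external_id:
            raise ValueError(f"expected source_name=external_id, got {pair!r}")
        refs.append({"source_name": source_name, "external_id": external_id})
    return refs


def normalise_labels(raw):
    """Split a comma-separated string, lower-case each label, allow only a-z and 0-9."""
    labels = [label.strip().lower() for label in raw.split(",") if label.strip()]
    for label in labels:
        if not re.fullmatch(r"[a-z0-9]+", label):
            raise ValueError(f"label contains unsupported characters: {label!r}")
    return labels


def format_created(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS.sssZ in UTC."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def report_id(value=None):
    """Build 'report--<uuid>', generating a random UUIDv4 when none is given."""
    parsed = uuid.UUID(value) if value else uuid.uuid4()
    if parsed.version != 4:
        raise ValueError("expected a UUIDv4")
    return f"report--{parsed}"
```

For example, `parse_external_refs(["txt2stix=demo1", "source=id"])` yields the two external_references objects shown above.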
Output settings
How the extractions are performed
--use_extractions (dictionary, required): if you only want to use certain extraction types, you can pass their slug found in either includes/ai/config.yaml, includes/lookup/config.yaml or includes/pattern/config.yaml (e.g. pattern_ipv4_address_only). If not passed, no extractions are applied. You can also pass a catch-all wildcard * which will match all extraction paths (e.g. 'pattern_*' would run all extractions starting with pattern_; make sure to use quotes when using a wildcard).
- Important: if using any AI extractions (ai_*), you must set an AI API key in your .env file
- Important: if you are using any MITRE ATT&CK, CAPEC, CWE, ATLAS or Location extractions you must set the CTIBUTLER settings, and if you are using any NVD CPE or CVE extractions you must set the VULMATCH settings, in your .env file
--relationship_mode (dictionary, required): either
- ai: AI provider must be enabled. Extractions are performed by either regex or AI, for the extractions the user selected. Rich relationships between extractions are created by the AI provider.
- standard: extractions are performed by either regex or AI (AI provider must be enabled), for the extractions the user selected. Basic relationships are created from extractions back to the master Report object generated.
--ignore_extraction_boundary (boolean, default false, not compatible with AI extractions): in some cases the same string will create multiple extractions depending on the extractions set (e.g. https://www.google.com/file.txt could create a url, url with file, domain, subdomain, and file). The default behaviour is for txt2stix to take the longest extraction and ignore everything else (e.g. only extract url with file, and ignore url, domain, subdomain, and file). If you want to override this behaviour and get all extractions in the output, set this flag to true.
--ignore_image_refs (boolean, default true): image references in documents don't usually need extracting, e.g. for <img src="https://example.com/image.png" alt="something"> you would not want domain or file extractions extracting example.com and image.png. Hence these are ignored by default (they are removed from the text sent to extraction). Note, only the img src is ignored; all other values, e.g. alt, are considered. If you want extractions to consider this data, set it to false.
--ignore_link_refs (boolean, default true): link references in documents don't usually need extracting, e.g. for <a href="https://example.com/link.html" title="something">Bad Actor</a> you would only want Bad Actor to be considered for extraction. Hence this part of the link is ignored by default (it is removed from the text sent to extraction). Note, only the a href is ignored; all other values, e.g. title, are considered. Setting this to false will also include everything inside the link tag (e.g. example.com would extract as a domain).
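Two of the behaviours described above, wildcard selection of extraction slugs and the default "longest extraction wins" rule, can be sketched as follows. This is a minimal illustration under stated assumptions, not txt2stix's implementation: it assumes the wildcard behaves like a shell-style glob, it represents each extraction as a hypothetical (start, end, kind, value) tuple, and every slug other than pattern_ipv4_address_only is invented for the example.

```python
from fnmatch import fnmatchcase


def select_extractions(requested, available):
    """Return the slugs in `available` matched by any requested slug or wildcard."""
    return [
        slug
        for slug in available
        if any(fnmatchcase(slug, pattern) for pattern in requested)
    ]


def keep_longest(extractions):
    """Drop any extraction whose span overlaps a longer one that was already kept.

    Each extraction is a (start, end, kind, value) tuple; `end` is exclusive.
    """
    kept = []
    # Visit the longest spans first so they take priority over shorter ones.
    for item in sorted(extractions, key=lambda e: e[1] - e[0], reverse=True):
        start, end = item[0], item[1]
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append(item)
    # Return the survivors in document order.
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical slugs, apart from pattern_ipv4_address_only.
    available = [
        "pattern_ipv4_address_only",
        "pattern_domain_name_only",
        "ai_mitre_attack_enterprise",
    ]
    print(select_extractions(["pattern_*"], available))

    # Offsets into the text "see https://www.google.com/file.txt now".
    found = [
        (4, 35, "url-file", "https://www.google.com/file.txt"),
        (12, 26, "subdomain", "www.google.com"),
        (16, 26, "domain", "google.com"),
        (27, 35, "file", "file.txt"),
    ]
    print(keep_longest(found))
```

With the default behaviour only the url with file survives. Setting --ignore_extraction_boundary to true corresponds to skipping the `keep_longest` step and returning every extraction.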
AI settings
If any AI extractions or the AI relationship mode are set, you must set the following accordingly:
--ai_settings_extractions (provider:model, required if one or more AI extractions set):
- defines the provider:model to be used for extractions. You can supply more than one provider, separated with a space (e.g. openrouter:openai/gpt-4o openrouter:deepseek/deepseek-chat). If more than one provider is passed, txt2stix will take extractions from all models, de-duplicate them, and then package them in the output. Currently supports:
  - Provider (env var required OPENROUTER_API_KEY): openrouter:, providers/models e.g.: openai/gpt-4o, deepseek/deepseek-chat
  - Provider (env var required OPENAI_API_KEY): openai:, models e.g.: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4
  - Provider (env var required ANTHROPIC_API_KEY): anthropic:, models e.g.: claude-3-5-sonnet-latest, claude-3-5-haiku-latest, claude-3-opus-latest
  - Provider (env var required GOOGLE_API_KEY): gemini:models/, models e.g.: gemini-1.5-pro-latest, gemini-1.5-flash-latest
  - Provider (env var required DEEPSEEK_API_KEY): deepseek:, models e.g.: deepseek-chat
- See tests/manual-tests/cases-ai-extraction-type.md for some examples
--ai_settings_relationships (provider:model, required if AI relationship mode set):
- similar to ai_settings_extractions but defines the model used to generate relationships. Only one model can be provided. Passed in the same format as ai_settings_extractions
- See tests/manual-tests/cases-ai-relationships.md for some examples
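As a rough illustration of the provider:model format and the environment variable each provider needs, the sketch below splits a setting on its first colon and reports which API keys are missing. The provider-to-variable mapping is taken from the list above. The function names and the validation behaviour are assumptions for the example, not txt2stix's own code.

```python
import os

# Provider prefix -> environment variable, as listed in the AI settings above.
REQUIRED_ENV_VAR = {
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def parse_ai_setting(value):
    """Split 'provider:model' on the first colon and check the provider is known."""
    provider, sep, model = value.partition(":")
    if not sep or not model:
        raise ValueError(f"expected provider:model, got {value!r}")
    if provider not in REQUIRED_ENV_VAR:
        raise ValueError(f"unsupported provider: {provider!r}")
    return provider, model


def missing_env_vars(settings, env=None):
    """Return the API key variables that the given settings need but `env` lacks."""
    env = os.environ if env is None else env
    missing = []
    for value in settings:
        provider, _model = parse_ai_setting(value)
        variable = REQUIRED_ENV_VAR[provider]
        if not env.get(variable) and variable not in missing:
            missing.append(variable)
    return missing


if __name__ == "__main__":
    settings = ["openrouter:openai/gpt-4o", "openrouter:deepseek/deepseek-chat"]
    print([parse_ai_setting(s) for s in settings])
    print(missing_env_vars(settings))
```

Splitting on the first colon only keeps model names that contain slashes, such as openai/gpt-4o or models/gemini-1.5-pro-latest, intact.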
Other AI related settings
--ai_content_check_provider (provider:model, required if passed): passing this flag will get the AI to try to classify the input text to determine 1) whether it is about threat intelligence, and 2) what type of threat intelligence it covers.
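To pull the documented arguments together, here is a small Python sketch that checks a few of the documented constraints and assembles a command line for a pattern-only run. It only uses flags and value constraints described above. The helper name `build_command` and the example file name are hypothetical, and the checks are a convenience for the example rather than a copy of txt2stix's own validation.

```python
import shlex

TLP_LEVELS = {"clear", "green", "amber", "amber_strict", "red"}
RELATIONSHIP_MODES = {"ai", "standard"}


def build_command(input_file, name, use_extractions,
                  relationship_mode="standard", tlp_level="clear",
                  confidence=None):
    """Return an argv list for txt2stix.py after checking documented constraints."""
    if not input_file.endswith(".txt"):
        raise ValueError("input file must be a .txt file")
    if len(name) > 72:
        raise ValueError("name must be at most 72 characters")
    if relationship_mode not in RELATIONSHIP_MODES:
        raise ValueError(f"unknown relationship mode: {relationship_mode!r}")
    if tlp_level not in TLP_LEVELS:
        raise ValueError(f"unknown TLP level: {tlp_level!r}")

    argv = [
        "python3", "txt2stix.py",
        "--relationship_mode", relationship_mode,
        "--input_file", input_file,
        "--name", name,
        "--tlp_level", tlp_level,
    ]
    if confidence is not None:
        if not 0 <= confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
        argv += ["--confidence", str(confidence)]
    argv += ["--use_extractions", use_extractions]
    return argv


if __name__ == "__main__":
    argv = build_command("report.txt", "Example report", "pattern_*", confidence=80)
    # shlex.join quotes the wildcard and the name so the shell passes them through intact.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run` avoids shell quoting entirely. `shlex.join` is used here only to show the quoted form you would type by hand.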
