docxBox

CLI tool for Word DOCX templating and analysis.

Commands

Unzip DOCX: All files, or only media files, format XML

Unzip all files: docxbox uz foo.docx

Unzip only media files:
docxbox uzm foo.docx
or docxbox uz foo.docx -m
or docxbox uz foo.docx --media

Unzip all files and indent XML files:
docxbox uzi foo.docx
or docxbox uz foo.docx -i
or docxbox uz foo.docx --indent
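
Since a DOCX file is a ZIP archive, the effect of the unzip variants can be approximated in a few lines of Python. The following is only a rough sketch using the standard library, not docxBox's actual implementation; the function name unzip_docx and its parameters are hypothetical, and it assumes that media files live under word/media/.

```python
import zipfile
from pathlib import Path
from xml.dom import minidom


def unzip_docx(docx_path, out_dir, media_only=False, indent=False):
    """Extract a DOCX (a ZIP archive) into out_dir and return the extracted names."""
    out = Path(out_dir).resolve()
    extracted = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Assumption: images and other embedded media live under word/media/.
            if media_only and not info.filename.startswith("word/media/"):
                continue
            target = (out / info.filename).resolve()
            # Skip entries that would be written outside of out_dir.
            if out not in target.parents:
                continue
            data = archive.read(info)
            if indent and info.filename.endswith((".xml", ".rels")):
                # Re-serialize the XML with two-space indentation.
                data = minidom.parseString(data).toprettyxml(indent="  ", encoding="UTF-8")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            extracted.append(info.filename)
    return extracted
```

Calling unzip_docx("foo.docx", "foo", indent=True) roughly corresponds to docxbox uzi foo.docx, and media_only=True to docxbox uzm foo.docx.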

Zip files into DOCX

docxbox zp path/to/directory out.docx

Compress XML, then zip files into DOCX:

When XML has been indented for manual editing (e.g. via the uzi command), the zpc command compresses (= unindents) all XML files before zipping them into a new DOCX:

docxbox zpc path/to/directory out.docx
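
The zip and compress steps can be pictured with the following Python sketch. It is an illustration under assumptions, not docxBox's code: the function name zip_docx is hypothetical, and the unindent step is a simple heuristic that removes only whitespace runs containing a line break between two tags, so that a text run consisting of a single space is left alone.

```python
import re
import zipfile
from pathlib import Path


def zip_docx(src_dir, docx_path, compress_xml=False):
    """Zip all files below src_dir into docx_path; optionally unindent XML first."""
    src = Path(src_dir)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(p for p in src.rglob("*") if p.is_file()):
            data = path.read_bytes()
            if compress_xml and path.suffix in (".xml", ".rels"):
                # Heuristic: drop whitespace between tags only if it spans a line break.
                data = re.sub(rb">\s*\n\s*<", b"><", data)
            # Store the file under its path relative to src_dir, with forward slashes.
            archive.writestr(path.relative_to(src).as_posix(), data)
```

The output path should lie outside of src_dir, otherwise the archive being written would be picked up as one of its own members.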

Output DOCX contents

List files: All, filtered, images only

Lists files contained within a given DOCX, and their attributes:

docxbox ls foo.docx

To output as JSON:

docxbox lsj foo.docx
or docxbox ls foo.docx -j
or docxbox ls foo.docx --json
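
To give an idea of what such a listing contains, here is a minimal Python sketch that reads the file attributes from the ZIP directory and serializes them as JSON. The function name list_docx and the JSON key names are assumptions chosen for illustration; docxBox's actual JSON layout may differ.

```python
import json
import zipfile
from datetime import datetime


def list_docx(docx_path):
    """Return one dict of attributes per file contained in the DOCX archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return [
            {
                "file": info.filename,
                "length": info.file_size,          # uncompressed size in bytes
                "compressed": info.compress_size,  # size within the archive
                "modified": datetime(*info.date_time).isoformat(),
            }
            for info in archive.infolist()
            if not info.is_dir()
        ]


def list_docx_json(docx_path):
    """Serialize the listing as indented JSON."""
    return json.dumps(list_docx(docx_path), indent=2)
```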

Filter by wildcard

Lists only files ending with .xml:
docxbox ls foo.docx *.xml
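
Wildcard filtering of the file list can be sketched in Python with the standard fnmatch module. This is an assumption about the matching semantics (the pattern is applied to the full path inside the archive), and list_matching is a hypothetical name.

```python
import fnmatch
import zipfile


def list_matching(docx_path, pattern):
    """Return the archive members whose path matches the shell-style pattern."""
    with zipfile.ZipFile(docx_path) as archive:
        # fnmatchcase lets "*" match any characters, including "/".
        return [name for name in archive.namelist() if fnmatch.fnmatchcase(name, pattern)]
```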

List all files containing string or regular expression

Lists all files containing the string foo:
docxbox lsl foo.docx foo
or docxbox ls foo.docx -l foo
or docxbox ls foo.docx --locate foo

This command is a shorthand for the grep tool, which must be installed on your system to use it.
The search string can therefore also be given as a regular expression:

docxbox lsl foo.docx '/[0-9A-Z]\{8\}/'
Lists all files containing 8-character IDs, e.g. Word's revision identifiers (rsid attributes, ISO/IEC 29500-1).
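
The same kind of search can be approximated without grep. The sketch below is a hypothetical Python equivalent (the name locate is an assumption) that reports every archive member whose content matches a regular expression. Note that Python writes the repetition as {8}, whereas the basic grep syntax above escapes the braces.

```python
import re
import zipfile


def locate(docx_path, pattern):
    """Return the names of archive members whose content matches the regex."""
    regex = re.compile(pattern.encode("utf-8"))
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist() if regex.search(archive.read(name))]
```

For example, locate("foo.docx", "[0-9A-Z]{8}") searches for the 8-character IDs mentioned above.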

List all files containing string or regular expression as JSON

docxbox lslj foo.docx foo
or docxbox lsl foo.docx -j foo
or docxbox ls foo.docx -lj foo
or docxbox lsl foo.docx --json foo
or docxbox ls foo.docx --locate -j foo
or docxbox ls foo.docx --locate --json foo

List image files

Outputs a list of contained images and their media attributes (such as width, height, encoding, and compression).

docxbox lsi foo.docx
or docxbox ls foo.docx -i
or docxbox ls foo.docx --images

To output as JSON:

docxbox lsij foo.docx
or docxbox lsi foo.docx -j
or docxbox ls foo.docx -ij
or docxbox lsi foo.docx --json
or docxbox ls foo.docx --images --json

Note: Media attributes are read using the file command, which must be installed on your system (it usually is) for docxBox's lsi command to work.
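
As a stand-in for the file command, the following Python sketch reads just one media attribute, the pixel dimensions of PNG images, directly from the PNG header. It is deliberately narrower than what lsi reports, the name list_png_sizes is hypothetical, and it assumes that images are stored under word/media/.

```python
import struct
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_sizes(docx_path):
    """Map each PNG under word/media/ to its (width, height) in pixels."""
    sizes = {}
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if not name.startswith("word/media/"):
                continue
            head = archive.read(name)[:24]
            # A PNG starts with an 8-byte signature, followed by the IHDR chunk:
            # 4-byte length, b"IHDR", then big-endian 32-bit width and height.
            if len(head) == 24 and head[:8] == PNG_SIGNATURE and head[12:16] == b"IHDR":
                sizes[name] = struct.unpack(">II", head[16:24])
    return sizes
```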

List meta data

docxBox displays only the attributes contained within the given DOCX file, including attributes that are present but empty. Which attributes exist can vary by DOCX version and by the word processor used to create the file.

Output meta data of given DOCX:

docxbox lsm foo.docx
or docxbox ls foo.docx -m
or docxbox ls foo.docx --meta

To output as JSON:

docxbox lsmj foo.docx
or docxbox lsm foo.docx -j
or docxbox ls foo.docx -mj
or docxbox lsm foo.docx --json
or docxbox ls foo.docx --meta --json

Reference: Recognized meta attributes

  • Authors: Creator, lastModifiedBy (<dc:creator> and <cp:lastModifiedBy> of docProps/core.xml)
  • Dates (ISO 8601): Creation, modification, and print date
    (<dcterms:created>, <dcterms:modified>, and <cp:lastPrinted> of docProps/core.xml)
  • Descriptions: Description, Keywords, Subject, Title
    (<dc:description>, <cp:keywords>, <dc:subject>, <dc:title> of docProps/core.xml)
  • Language (<dc:language> of docProps/core.xml)
  • Revision (<cp:revision> of docProps/core.xml)
  • Application used for creation and its version, name of the template used, company, XML schema of the document (<Application>, <AppVersion>, <Template>, <Properties xmlns ... and <Company> of docProps/app.xml)
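
For orientation, here is a minimal Python sketch of how some of these attributes can be read from docProps/core.xml. It is an illustration only: the function name read_core_meta and the result keys are hypothetical, and only a subset of the attributes is covered. Attributes that are present but empty are returned as empty strings, and absent ones are omitted.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical result keys mapped to the elements they are read from.
FIELDS = {
    "creator": "dc:creator",
    "lastModifiedBy": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "title": "dc:title",
    "language": "dc:language",
    "revision": "cp:revision",
}


def read_core_meta(docx_path):
    """Return the core meta attributes that are present in the DOCX file."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    meta = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None:
            meta[key] = element.text or ""
    return meta
```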

List referenced fonts

docxbox lsf foo.docx
or docxbox ls foo.docx -f
or docxbox ls foo.docx --fonts

To output as JSON:

docxbox lsfj foo.docx
or docxbox lsf foo.docx -j
or docxbox ls foo.docx -fj
or docxbox lsf foo.docx --json
or docxbox ls foo.docx --fonts --json
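
Font declarations of a DOCX are stored in word/fontTable.xml. As a rough sketch, assuming that this file is the source of the listing (which is not confirmed here), the font names can be read in Python as follows; list_fonts is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (prefix "w" in the XML files).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_fonts(docx_path):
    """Return the names of all fonts declared in word/fontTable.xml."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/fontTable.xml"))
    return [font.get(f"{{{W}}}name") for font in root.iter(f"{{{W}}}font")]
```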

List fields

docxbox lsd foo.docx
or docxbox ls foo.docx -d
or docxbox ls foo.docx --fields

To output as JSON:

docxbox lsdj foo.docx
or docxbox ls foo.docx -dj
or docxbox lsd foo.docx --json
or docxbox ls foo.docx --fields --json
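
In WordprocessingML, field instructions appear either as the instr attribute of a <w:fldSimple> element or as text of <w:instrText> elements. The following Python sketch collects both from one part of the document. It is an assumption about what a field listing is based on, not docxBox's implementation, and list_fields is a hypothetical name. An instruction that Word has split across several <w:instrText> elements is reported in several pieces.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (prefix "w" in the XML files).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_fields(docx_path, part="word/document.xml"):
    """Return the field instructions found in the given part, in document order."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read(part))
    fields = []
    for element in root.iter():
        if element.tag == f"{{{W}}}instrText" and element.text:
            fields.append(element.text.strip())
        elif element.tag == f"{{{W}}}fldSimple":
            fields.append(element.get(f"{{{W}}}instr", "").strip())
    return fields
```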

Output XML

docxbox cat foo.docx word/_rels/document.xml.rels
outputs the given file's XML, indented for better readability.

Hint: For viewing or editing complex XML, e.g. with syntax highlighting, you can use your favorite text editor via the cmd command.

Output document as plaintext

docxbox txt foo.docx outputs the given document's plaintext (currently without header and footer)

Output plaintext segments:
docxbox txt foo.docx -s
or docxbox txt foo.docx --segments

Outputs the plaintext of the document, with markup sections separated by newlines. This helps to identify "segmented" sentences: text that visually appears as a unit, but is declared within multiple separate XML elements (due to formatting or change tracking).
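
The difference between the two modes can be illustrated with a simplified Python sketch. It assumes that a segment corresponds to one <w:t> element of word/document.xml, ignores tabs, line breaks, and nested text boxes, and uses the hypothetical name docx_text; docxBox's own extraction may differ.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (prefix "w" in the XML files).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(docx_path, segments=False):
    """Return the body text: one line per paragraph, or per <w:t> if segments=True."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{{{W}}}p"):
        runs = [t.text or "" for t in paragraph.iter(f"{{{W}}}t")]
        if segments:
            lines.extend(runs)           # every text element on its own line
        else:
            lines.append("".join(runs))  # text elements joined per paragraph
    return "\n".join(lines)
```

A word that appears as one unit in the normal output but is spread over several lines with segments=True is stored in multiple XML elements.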

Compare DOCX documents

docxBox helps trace changes to the files contained within DOCX archives that are made when manipulating documents in word processor applications.

When given two DOCX files, the ls command lists all files of both DOCX documents side-by-side. docxBox compares all files and highlights files with different attributes, or with identical attributes but different content.

docxbox ls foo_v1.docx foo_v2.docx

Note: Comparisons are always output as plaintext; JSON output is not supported.
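
The idea behind such a comparison can be sketched in Python by comparing the size and CRC-32 checksum that the ZIP directory stores for every file. This is an illustration under assumptions, not docxBox's algorithm; compare_docx and the status labels are hypothetical.

```python
import zipfile


def compare_docx(left_path, right_path):
    """Classify every file of two DOCX archives as same, changed, or one-sided."""

    def index(path):
        with zipfile.ZipFile(path) as archive:
            # Size and CRC-32 come from the ZIP directory; no extraction needed.
            return {i.filename: (i.file_size, i.CRC) for i in archive.infolist()}

    left, right = index(left_path), index(right_path)
    report = {}
    for name in sorted(left.keys() | right.keys()):
        if name not in right:
            report[name] = "only in left"
        elif name not in left:
            report[name] = "only in right"
        elif left[name] != right[name]:
            report[name] = "changed"
        else:
            report[name] = "same"
    return report
```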

Compare specific file from two DOCX archives

Files that have changed between versions of a document can be inspected using the diff tool (which must be installed on your system).

Display side-by-side comparison of the formatted XML of given file (word/settings.xml), with differences indicated:
docxbox diff foo_v1.docx foo_v2.docx word/settings.xml

Display unified diff:
docxbox diff foo_v1.docx foo_v2.docx word/settings.xml -u
or docxbox diff foo_v1.docx foo_v2.docx word/settings.xml --unified
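
A unified diff of one file from two archives can also be produced without the diff tool. The Python sketch below indents both XML versions and compares them line by line with difflib. It is a stand-in for illustration, and the name diff_member and the left/ and right/ labels are hypothetical.

```python
import difflib
import zipfile
from xml.dom import minidom


def diff_member(left_path, right_path, member):
    """Return a unified diff of the indented XML of member in both archives."""

    def pretty_lines(path):
        with zipfile.ZipFile(path) as archive:
            xml = minidom.parseString(archive.read(member)).toprettyxml(indent="  ")
        return xml.splitlines()

    diff = difflib.unified_diff(
        pretty_lines(left_path),
        pretty_lines(right_path),
        fromfile="left/" + member,
        tofile="right/" + member,
        lineterm="",
    )
    return "\n".join(diff)
```

The result is an empty string if the indented XML of both versions is identical.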

Modify document

Modify meta data

docxBox can modify existing attributes, and adds attributes that are not yet present.

  • Set *