TranslatePPTX
Extract text strings from PowerPoint (.pptx) files, edit or translate them in your favorite text editor, then replace them in the .pptx file.
TranslatePPTX is a very simple tool, based on the
Apache POI package,
that facilitates editing or translating PowerPoint
presentations in .pptx format.
Using TranslatePPTX to edit or translate a PowerPoint file is a three-step process:
- You do a first run of TranslatePPTX on your .pptx file in text extraction mode. TranslatePPTX extracts all text strings on all slides in the presentation (including all text boxes, all text in graphs, etc.) and writes these out, with some identifying labeling info, to a plain text file.
- You edit this file in your favorite text editor, modifying any text strings that you want to edit or translate and deleting all the rest.
- Finally, you pass your edited list of text strings as a command-line argument to a second run of TranslatePPTX, now running in text replacement mode; this time the code replaces all the strings you edited with your edited versions and writes out a new .pptx file reflecting your edits.
A tutorial example
Here's an example involving a presentation called
MyDeck.pptx,
which contains two slides and has a couple of text boxes
and a table:
Step 1: Extract text strings
% TranslatePPTX.sh MyDeck.pptx
Wrote 8 text strings to MyDeck.text
(9 text runs, 9 table entries)
Thank you for your support.
This produces a file named MyDeck.text,
which looks like this:
--------------------------------------------------
## Slide 1: Homer’s Deck
--------------------------------------------------
TEXT_STRING 2 0
==================================================
This is a text box.
==================================================
TEXT_STRING 3 0
==================================================
Huge red Courier font and comic italics
==================================================
TEXT_STRING 3 1
==================================================
Huge red Courier
==================================================
TEXT_STRING 3 2
==================================================
font
==================================================
TEXT_STRING 3 3
==================================================
and comic italics
==================================================
TEXT_STRING 5 0
==================================================
これは日本語である。
==================================================
TEXT_STRING 4 0
==================================================
我是中文!
==================================================
--------------------------------------------------
## Slide 2: Slide with graph
--------------------------------------------------
TEXT_STRING 7 0 (table)
==================================================
Name
==================================================
TEXT_STRING 8 0 (table)
==================================================
Lifespan
==================================================
TEXT_STRING 9 0 (table)
==================================================
Masterpiece
==================================================
TEXT_STRING 10 0 (table)
==================================================
Bach
==================================================
TEXT_STRING 11 0 (table)
==================================================
1685-1750
==================================================
TEXT_STRING 12 0 (table)
==================================================
BWV 140, Wachet auf, ruft uns die Stimme
==================================================
TEXT_STRING 13 0 (table)
==================================================
Mozart
==================================================
TEXT_STRING 14 0 (table)
==================================================
1756-1791
==================================================
TEXT_STRING 15 0 (table)
==================================================
K. 452, Quintet in E flat Major for Piano, Winds and Brass
==================================================
TEXT_STRING 16 0
==================================================
Slide with graph
==================================================
Note the following points here:
- Each text string appears between two separator strings composed of equals signs (======.....=====), preceded by an identifier string of the form TEXT_STRING M N where M and N are integers.
- Some text strings appear twice; for example, the text "Huge red Courier font and comic italics" appears
  - once in full with indices (M,N)=(3,0), and
  - again split into three pieces with indices (M,N)=(3,1) (3,2) (3,3).
This is discussed in more detail below.
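If you would rather post-process the .text file with a script than by hand, the layout described above is simple to parse. The following is a minimal Python sketch and is not part of TranslatePPTX; the names Entry and parse_text_strings are made up for illustration. It assumes that a separator is any line consisting only of equals signs (six or more), that a text string may span several lines, and that no text string itself contains such a line.

```python
import re
from typing import List, NamedTuple

# Identifier line: "TEXT_STRING M N", optionally followed by a tag such as "(table)".
HEADER = re.compile(r"^TEXT_STRING\s+(\d+)\s+(\d+)(?:\s+\((\w+)\))?\s*$")
# Assumption: a separator is a line made up only of equals signs.
SEPARATOR = re.compile(r"^={6,}\s*$")


class Entry(NamedTuple):
    shape: int  # first index (M)
    run: int    # second index (N)
    tag: str    # "table" for table entries, "" otherwise
    text: str


def parse_text_strings(content: str) -> List[Entry]:
    """Collect every TEXT_STRING block; slide header lines are skipped."""
    entries = []
    lines = content.splitlines()
    i = 0
    while i < len(lines):
        match = HEADER.match(lines[i])
        # An identifier line must be followed directly by the opening separator.
        if match and i + 1 < len(lines) and SEPARATOR.match(lines[i + 1]):
            j = i + 2
            body = []
            while j < len(lines) and not SEPARATOR.match(lines[j]):
                body.append(lines[j])
                j += 1
            entries.append(Entry(int(match.group(1)), int(match.group(2)),
                                 match.group(3) or "", "\n".join(body)))
            i = j + 1  # continue after the closing separator
        else:
            i += 1
    return entries


sample = """\
--------------------------------------------------
## Slide 1: Homer's Deck
--------------------------------------------------
TEXT_STRING 2 0
==================================================
This is a text box.
==================================================
TEXT_STRING 7 0 (table)
==================================================
Name
==================================================
"""

for entry in parse_text_strings(sample):
    print(entry.shape, entry.run, entry.tag, repr(entry.text))
```

An empty string (two separators in a row) parses to an entry with empty text, so nothing is silently dropped.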
Step 2: Edit text strings
Now you use your favorite text editor to edit any of the text strings you want to modify, deleting the ones you don't. (Or you can just leave them there to be re-written as-is to the output file, although this will slow things down for huge files.)
In this case, I will copy MyDeck.text to a new file called
MyDeck.translations and make edits to just a few of the
text strings; after I've finished making my edits, the MyDeck.translations
file looks like this:
TEXT_STRING 2 0
==================================================
This is a wonderful text box.
==================================================
TEXT_STRING 3 3
==================================================
and awesome comic italics
==================================================
TEXT_STRING 5 0
==================================================
This was previously Japanese!
==================================================
TEXT_STRING 4 0
==================================================
I was previously Mandarin!
==================================================
TEXT_STRING 6 1
==================================================
Homer's Translated Deck
==================================================
TEXT_STRING 6 2
==================================================
==================================================
--------------------------------------------------
## Slide 2:Slide with graph
--------------------------------------------------
TEXT_STRING 12 0 (table)
==================================================
Cello suite in C Minor, BWV 1011, Courante
==================================================
TEXT_STRING 15 0 (table)
==================================================
K. 448 Sonata for four hands
==================================================
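If your edits come out of a script (for example, a batch translation step) rather than a text editor, you can generate a file like the one above automatically. The following Python sketch is not part of TranslatePPTX; format_entry and changed_only are hypothetical helpers. It assumes that the slide header lines are optional (the example file above omits the one for Slide 1), that the separator width of 50 equals signs seen in the listing is acceptable, and that an empty replacement is written as two adjacent separators, as in the TEXT_STRING 6 2 entry.

```python
from typing import List, Tuple

# (shape index M, run index N, tag such as "table" or "", text)
Entry = Tuple[int, int, str, str]

# 50 equals signs, as in the listing; whether the exact width matters is an assumption.
SEPARATOR = "=" * 50


def format_entry(entry: Entry) -> str:
    """Render one entry in the TEXT_STRING layout shown in the listing."""
    shape, run, tag, text = entry
    header = f"TEXT_STRING {shape} {run}" + (f" ({tag})" if tag else "")
    # An empty replacement becomes two adjacent separator lines.
    body = f"{text}\n" if text else ""
    return f"{header}\n{SEPARATOR}\n{body}{SEPARATOR}\n"


def changed_only(original: List[Entry], edited: List[Entry]) -> List[Entry]:
    """Keep the edited entries whose text differs from the original."""
    before = {(shape, run): text for shape, run, _tag, text in original}
    return [e for e in edited if before.get((e[0], e[1])) != e[3]]


original = [
    (2, 0, "", "This is a text box."),
    (3, 0, "", "Huge red Courier font and comic italics"),
    (12, 0, "table", "BWV 140, Wachet auf, ruft uns die Stimme"),
]
edited = [
    (2, 0, "", "This is a wonderful text box."),
    (3, 0, "", "Huge red Courier font and comic italics"),
    (12, 0, "table", "Cello suite in C Minor, BWV 1011, Courante"),
]

translations = "".join(format_entry(e) for e in changed_only(original, edited))
print(translations, end="")
```

Dropping unchanged entries mirrors the advice in Step 2: leaving them in is harmless but slows things down for huge files.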
Step 3: Replace text strings
Finally, you do a second run of TranslatePPTX
on the same .pptx file, now adding
the command-line argument --Translations
to specify the file of revised text strings:
% TranslatePPTX.sh MyDeck.pptx --Translations MyDeck.translations
Wrote translated document to MyDeck_Translated.pptx.
Thank you for your support.
This produces a new .pptx file called MyDeck_Translated.pptx,
whose slides now look like this:
<a name="CommandLineOptions"></a>
Other command-line options
There are several other command-line options, which you can
see by running TranslatePPTX with no arguments:
% TranslatePPTX
usage: TranslatePPTX Original.pptx [options]
options:
--Translations Translations.txt
--WideOnly
--OmitRuns
--Autosize
--WriteFormats
--Verbose
--WriteLog
The additional options here are:
- --WideOnly: Requests that only text strings containing double-byte characters (e.g. characters in Japanese, Chinese, or other languages) be considered.
- --OmitRuns: Requests that the .text output file omit separate lines for text runs, retaining only lines for text shapes. (See below for more on this distinction.)
- --Autosize: Requests that TranslatePPTX attempt to rescale text font sizes to preserve the size of text boxes.
- --WriteFormats: Requests that TranslatePPTX write formatting information (font, font size, font color, URL addresses, bold, italic, etc.) for text runs to the .text output file.
- --Verbose: Requests more verbose console output.
- --WriteLog: Requests that TranslatePPTX write a .log file explaining what it is doing. This is useful for debugging.
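If you drive TranslatePPTX from a script, it can help to assemble the command line in one place. The following Python sketch only builds the argument list from the flags in the usage message above; build_command and its keyword names are hypothetical, and the default script name TranslatePPTX.sh is taken from the tutorial commands (adjust it to however the tool is installed on your system).

```python
from typing import List, Optional


def build_command(pptx_path: str,
                  translations: Optional[str] = None,
                  *,
                  wide_only: bool = False,
                  omit_runs: bool = False,
                  autosize: bool = False,
                  write_formats: bool = False,
                  verbose: bool = False,
                  write_log: bool = False,
                  script: str = "TranslatePPTX.sh") -> List[str]:
    """Return the argv list for one TranslatePPTX run."""
    command = [script, pptx_path]
    # With --Translations the tool runs in text replacement mode;
    # without it, in text extraction mode.
    if translations is not None:
        command += ["--Translations", translations]
    flags = [
        ("--WideOnly", wide_only),
        ("--OmitRuns", omit_runs),
        ("--Autosize", autosize),
        ("--WriteFormats", write_formats),
        ("--Verbose", verbose),
        ("--WriteLog", write_log),
    ]
    command += [flag for flag, enabled in flags if enabled]
    return command


print(build_command("MyDeck.pptx", wide_only=True))
print(build_command("MyDeck.pptx", "MyDeck.translations", write_log=True))
```

The resulting list can be handed to subprocess.run(command, check=True) from the Python standard library.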
<a name="ShapesRuns"></a>
Text shapes vs. text runs
As noted above, some text strings in .pptx files
appear twice within the .text file produced by TranslatePPTX.
What's going on here is that PowerPoint internally distinguishes
between text shapes and text runs. Text shapes are larger-scale
entities that consist of one or more text runs. Each text run
has a fixed font size and color.
In most cases, it is convenient to edit entire text shapes all at
once; however, for cases in which a single text shape
includes multiple distinct fonts/colors, you will want to
edit the individual text runs to preserve that fine-grained detail.
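One practical consequence is that, for any given shape, you would normally edit either the whole shape or its individual runs, not both. In the tutorial listing, the entry with indices (3,0) carried the full text and the entries (3,1), (3,2), (3,3) carried the runs. The following Python sketch flags shapes that an edited file touches at both levels; find_mixed_edits is a hypothetical helper, not part of TranslatePPTX, and it makes no assumption about which of the two edits TranslatePPTX would apply.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_mixed_edits(keys: Iterable[Tuple[int, int]]) -> List[int]:
    """Return shape indices edited both as a whole (N == 0) and by run (N >= 1)."""
    runs_by_shape: Dict[int, Set[int]] = defaultdict(set)
    for shape, run in keys:
        runs_by_shape[shape].add(run)
    return sorted(shape for shape, runs in runs_by_shape.items()
                  if 0 in runs and len(runs) > 1)


# (M, N) pairs from the tutorial's edited file: no shape is edited at both levels.
tutorial_keys = [(2, 0), (3, 3), (5, 0), (4, 0), (6, 1), (6, 2), (12, 0), (15, 0)]
print(find_mixed_edits(tutorial_keys))

# Editing shape 3 both as a whole and through run 1 gets flagged.
print(find_mixed_edits([(3, 0), (3, 1), (2, 0)]))
```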
TranslatePPTX allows for both of these possibilities. For the Nth
text shape in your .pptx file, the .text file includes
- one TEXT_STRING with indices N 0 containing the full text of the shape (including the contributions of all text runs), and
- separate TEXT_STRINGs with indices `N
