
.. image:: http://sflogo.sourceforge.net/sflogo.php?group_id=134329&type=7
   :height: 62
   :width: 210
   :alt: SourceForge.net Logo
   :target: http://sourceforge.net/projects/ooopy/

OOoPy: Modify OpenOffice.org documents in Python

:Author: Ralf Schlatterbeck rsc@runtux.com

OpenOffice.org (OOo) documents are ZIP archives containing several XML files, which makes it easy to inspect, create, or modify OOo documents. OOoPy is a Python library for these tasks. To avoid reinventing the wheel, OOoPy uses an existing XML library, ElementTree_ by Fredrik Lundh. OOoPy is a thin wrapper around ElementTree_ that uses Python's ZipFile to read and write OOo documents.

.. _ElementTree: http://effbot.org/zone/element-index.htm
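
Because an OOo document is just a ZIP archive of XML files, the standard library alone is enough to look inside one. The sketch below uses plain zipfile and ElementTree, not OOoPy's API: it builds a tiny stand-in archive in memory and lists the text of each paragraph in content.xml. The helper name paragraphs is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespace for text elements such as text:p.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def paragraphs(odf):
    """Return the text of every text:p element in the archive's content.xml."""
    with zipfile.ZipFile(odf) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]

# A tiny hand-made archive standing in for a real .odt file.
CONTENT = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="%s">'
    "<office:body><office:text>"
    "<text:p>Hello</text:p><text:p>World</text:p>"
    "</office:text></office:body></office:document-content>" % TEXT_NS
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    zf.writestr("content.xml", CONTENT)

print(paragraphs(buf))  # ['Hello', 'World']
```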

In addition to being a wrapper for ElementTree_, OOoPy contains a framework for applying XML transforms to OOo documents. Several transforms are included, e.g., for changing OOo fields (OOo Insert-Fields menu) or for using OOo fields in a mail merge application. Some other transforms for modifying OOo settings and meta information are provided as examples.
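
The idea of a transform framework can be illustrated without OOoPy: each transform is a function that modifies a parsed XML tree, and a driver applies a list of them in order. The sketch below is hypothetical and does not reproduce OOoPy's Transformer or Field_Replace API. It only updates the displayed text of text:user-field-get elements; a complete field replacement would presumably also have to update the field declarations.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def replace_user_fields(values):
    """Return a hypothetical transform that sets the text of text:user-field-get."""
    def transform(root):
        for field in root.iter("{%s}user-field-get" % TEXT_NS):
            name = field.get("{%s}name" % TEXT_NS)
            if name in values:
                field.text = values[name]
    return transform

def apply_transforms(root, transforms):
    """Run each transform, in order, on the same parsed XML tree."""
    for transform in transforms:
        transform(root)
    return root

doc = ET.fromstring(
    '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="%s"><text:p>Dear '
    '<text:user-field-get text:name="customer">NAME</text:user-field-get>'
    "</text:p></office:text>" % TEXT_NS
)
apply_transforms(doc, [replace_user_fields({"customer": "ACME Ltd."})])
print("".join(doc.itertext()))  # Dear ACME Ltd.
```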

OOoPy comes in handy where calling native OOo is not an option, e.g., in server-side web applications.

If the mailmerge transform doesn't work for your document, note that although the OOo format is well documented, there are ordering constraints in the body of an OOo document, and I have not yet figured out all the tags and their order in the OOo body. In addition, individual elements in an OOo document (e.g., frames, sections, tables) need unique names, and after a mailmerge some items have duplicate names. So far I only renumber frames, sections, and tables; see the renumber objects at the end of ooopy/Transforms.py. So if parts of the mailmerged document are missing, check whether some renumbering is missing, or send me a `bug report`_.

.. _bug report: http://ooopy.sourceforge.net/#reporting-bugs
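
Renumbering after a mailmerge amounts to giving each copied element a fresh, unique name. The sketch below handles tables only and renames every table sequentially, which is simpler than renaming only duplicates; it is not OOoPy's renumber code, and renumber_tables is a hypothetical name. Anything that refers to a table by name (e.g., formulas) would need updating too, which this sketch ignores; frames and sections would follow the same pattern with their own name attributes.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

def renumber_tables(root, prefix="Table"):
    """Give every table:table a unique table:name: Table1, Table2, ..."""
    name_attr = "{%s}name" % TABLE_NS
    for number, table in enumerate(root.iter("{%s}table" % TABLE_NS), start=1):
        table.set(name_attr, "%s%d" % (prefix, number))
    return root

# After a naive merge, both copies of the template table share one name.
body = ET.fromstring(
    '<body xmlns:table="%s">'
    '<table:table table:name="Invoice"/><table:table table:name="Invoice"/>'
    "</body>" % TABLE_NS
)
renumber_tables(body)
print([t.get("{%s}name" % TABLE_NS) for t in body.iter("{%s}table" % TABLE_NS)])
# ['Table1', 'Table2']
```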

There is currently not much documentation except for the Python doctests in OOoPy.py and Transformer.py and the command-line utilities_. To run these tests after installing ooopy (assuming here that you installed it with Python into /usr/local)::

    cd /usr/local/share/ooopy
    python run_doctest.py /usr/local/lib/python2.X/site-packages/ooopy/Transformer.py
    python run_doctest.py /usr/local/lib/python2.X/site-packages/ooopy/OOoPy.py

Both should report no failed tests. For running the doctests on Python 2.3, where a doctest bug interferes with the metaclass trickery of autosuper, see the file run_doctest.py; later versions of Python have already fixed this bug.

Usage

The API changed slightly when support for the OpenDocument format introduced with OOo 2.0 was added. See below if you get a traceback after upgrading from an old version.

See the online documentation, e.g.::

    % python
    >>> from ooopy.OOoPy import OOoPy
    >>> help (OOoPy)
    >>> from ooopy.Transformer import Transformer
    >>> help (Transformer)

Help, I'm getting an AssertionError traceback from Transformer, e.g.::

    Traceback (most recent call last):
      File "./replace.py", line 17, in ?
        t = Transformer(Field_Replace(replace = replace_dictionary))
      File "/usr/local/lib/python2.4/site-packages/ooopy/Transformer.py", line 1226, in __init__
        assert (mimetype in mimetypes)
    AssertionError

The API changed slightly when implementing handling of different versions of OOo files. Now the first parameter you pass to the Transformer constructor is the mimetype of the OpenOffice.org document you intend to transform. The mimetype can be fetched from another opened OOo document, e.g.::

    from ooopy.OOoPy import OOoPy
    from ooopy.Transformer import Transformer

    ooo = OOoPy (infile = 'test.odt', outfile = 'test_out.odt')
    t = Transformer(ooo.mimetype, ...
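
OOo documents store their mimetype as a plain-text archive member named mimetype. The following sketch reads it with the standard library only. It is an assumption that OOoPy's ooo.mimetype is obtained in a similar way, and the KNOWN_MIMETYPES set below is illustrative, not the list that Transformer actually checks.

```python
import io
import zipfile

# Illustrative values only; not taken from OOoPy's Transformer.
KNOWN_MIMETYPES = {
    "application/vnd.sun.xml.writer",           # OOo 1.x text document (.sxw)
    "application/vnd.oasis.opendocument.text",  # OpenDocument text (.odt)
}

def document_mimetype(odf):
    """Return the mimetype stored in an OOo/ODF archive (path or file object)."""
    with zipfile.ZipFile(odf) as zf:
        return zf.read("mimetype").decode("ascii").strip()

def check_mimetype(odf):
    """Raise ValueError if the archive's mimetype is not in KNOWN_MIMETYPES."""
    mimetype = document_mimetype(odf)
    if mimetype not in KNOWN_MIMETYPES:
        raise ValueError("unsupported mimetype: %r" % mimetype)
    return mimetype

# An in-memory archive standing in for test.odt.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
print(check_mimetype(buf))  # application/vnd.oasis.opendocument.text
```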

Usage of Command-Line Utilities

There are now several command-line _`utilities`:

  • ooo_cat for concatenating several OOo files into one
  • ooo_grep to do the equivalent of grep -l on OOo files; it is a shell script using ooo_as_text, so it only runs on Unix-like operating systems and probably only with the GNU version of grep
  • ooo_fieldreplace for replacing fields in an OOo document
  • ooo_mailmerge for doing a mailmerge from a template OOo document and a CSV (comma separated values) input
  • ooo_as_text for getting the text from an OOo file (e.g., for doing a "grep" on the output).
  • ooo_prettyxml for pretty-printing the XML nodes of one of the XML files inside an OOo document. Mainly useful for debugging.

All utilities take a --help option.
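
ooo_grep itself is a shell script built on ooo_as_text. As a rough, platform-independent illustration of the same idea, the sketch below extracts the paragraph and heading text of content.xml and returns the files whose text matches a regular expression. It is not the implementation of ooo_grep or ooo_as_text, and the function names are hypothetical.

```python
import re
import sys
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCKS = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}  # paragraphs and headings

def document_text(odf):
    """Return the text of content.xml, one paragraph or heading per line."""
    with zipfile.ZipFile(odf) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return "\n".join("".join(el.itertext()) for el in root.iter() if el.tag in BLOCKS)

def grep_l(pattern, files):
    """Return the files whose text contains a match for the regular expression."""
    regex = re.compile(pattern)
    return [f for f in files if regex.search(document_text(f))]

if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python odf_grep.py PATTERN FILE...
    for name in grep_l(sys.argv[1], sys.argv[2:]):
        print(name)
```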

Resources

Project information and downloads are available from the `Sourceforge main page`_.

.. _Sourceforge main page: http://sourceforge.net/projects/ooopy/

You need at least version 2.3 of python.

For using OOoPy with Python versions below 2.5, you need to download and install the `ElementTree Library`_ by Fredrik Lundh. For documentation about the OOo XML file format, see the book by J. David Eisenberg called `OASIS OpenDocument Essentials`_, which is under the GNU Free Documentation License and is also available `in print`_. For a reference document, you may want to check out the `XML File Format Specification`_ (PDF) by OpenOffice.org.
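
Python 2.5 and later ship ElementTree as xml.etree, while older versions need the separately installed package. A common idiom for code that must run on both is shown below as a sketch; it is not necessarily how OOoPy itself performs the import.

```python
try:
    # Python 2.5 and later: ElementTree is part of the standard library.
    from xml.etree import ElementTree
except ImportError:
    # Python 2.3/2.4: the separately installed package by Fredrik Lundh.
    from elementtree import ElementTree

root = ElementTree.fromstring("<office>OOo</office>")
print("%s %s" % (root.tag, root.text))  # office OOo
```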

A German page for OOoPy exists at runtux.com_.

.. _ElementTree Library: http://effbot.org/downloads/#elementtree
.. _OASIS OpenDocument Essentials: http://books.evc-cit.info/
.. _in print: http://www.lulu.com/product/paperback/oasis-opendocument-essentials/392512
.. _XML File Format Specification: http://xml.openoffice.org/xml_specification.pdf
.. _runtux.com: http://www.runtux.com/ooopy.html

Reporting Bugs

Please use the `Sourceforge Bug Tracker`_ and

  • attach the OOo document that reproduces your problem
  • give a short description of what you think is the correct behaviour
  • give a description of the observed behaviour
  • tell me exactly what you did.

.. _Sourceforge Bug Tracker: http://sourceforge.net/tracker/?group_id=134329&atid=729727

Changes

Version 1.11: Small bug fix in ooo_mailmerge

ooo_mailmerge now honours the delimiter option, which was ignored before. Thanks to Bob Danek for the report and testing.

  • Fix setting csv delimiter in ooo_mailmerge
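
The fix amounts to passing the user's delimiter through to the CSV parser instead of silently using the default comma. With Python's csv module that is the delimiter argument. The following is an illustrative sketch, not ooo_mailmerge's actual code, and read_records is a hypothetical name.

```python
import csv
import io

def read_records(f, delimiter=","):
    """Read CSV rows into dicts keyed by the header row, honouring delimiter."""
    return list(csv.DictReader(f, delimiter=delimiter))

# Semicolon-separated sample input.
data = io.StringIO("name;city\nAlice;Vienna\nBob;Graz\n")
records = read_records(data, delimiter=";")
print(records[0]["city"])  # Vienna
```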

Version 1.10: Fix table styles when concatenating

Now ooo_cat fixes table styles when concatenating (renaming): we optimize style usage by re-using existing styles, but for some table styles the original names were not replaced by the names of the re-used styles. Fixes SF Bug 10, thanks to Claudio Girlanda for reporting.

  • Fix style renaming for table styles when concatenating documents
  • Add some missing namespaces (ooo 2009)
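
When styles are re-used during concatenation, every reference to a renamed style must be rewritten. A minimal sketch of that step, assuming the mapping from old to new style names has already been computed, walks all elements and rewrites attributes whose local name ends in style-name. The function name and mapping are hypothetical; this is not OOoPy's Concatenate code.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

def rename_style_refs(root, mapping):
    """Rewrite every attribute ending in 'style-name' whose value is in mapping."""
    for el in root.iter():
        for attr, value in list(el.attrib.items()):
            # Keys look like '{namespace-uri}style-name'; endswith also catches
            # variants such as default-cell-style-name.
            if attr.endswith("style-name") and value in mapping:
                el.set(attr, mapping[value])
    return root

body = ET.fromstring(
    '<body xmlns:table="%s">'
    '<table:table table:style-name="Table2">'
    '<table:table-cell table:style-name="Table2.A1"/>'
    "</table:table></body>" % TABLE_NS
)
rename_style_refs(body, {"Table2": "Table1", "Table2.A1": "Table1.A1"})
cell = body.find(".//{%s}table-cell" % TABLE_NS)
print(cell.get("{%s}style-name" % TABLE_NS))  # Table1.A1
```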

Version 1.9: Add Picture Handling for Concatenation

Now ooo_cat supports pictures, thanks to Antonio Sánchez for reporting that this wasn't working.

  • Add a list of filenames + contents to Transformer
  • Update this file-list in Concatenate
  • Add Manifest_Append transform to update META-INF/manifest.xml with added filenames
  • Add hook in OOoPy for adding files
  • Update tests
  • Update ooo_cat to use new transform
  • This is the first release after migration of the version control from Subversion to Git
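
Appending a file to META-INF/manifest.xml means adding one manifest:file-entry element. The sketch below does this with ElementTree and registers the "manifest" prefix so that the output keeps OOo's prefix instead of an auto-generated one (the 1.5 entry notes that OOo is picky about this). append_file_entry is a hypothetical name and this is not OOoPy's Manifest_Append transform; ET.register_namespace is the public standard-library call for choosing a serialization prefix.

```python
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
# Serialize with the prefix "manifest" rather than an auto-generated "ns0".
ET.register_namespace("manifest", MANIFEST_NS)

def append_file_entry(manifest_xml, full_path, media_type):
    """Return manifest_xml with one more manifest:file-entry element."""
    root = ET.fromstring(manifest_xml)
    entry = ET.SubElement(root, "{%s}file-entry" % MANIFEST_NS)
    entry.set("{%s}full-path" % MANIFEST_NS, full_path)
    entry.set("{%s}media-type" % MANIFEST_NS, media_type)
    return ET.tostring(root, encoding="unicode")

manifest = (
    '<manifest:manifest xmlns:manifest="%s">'
    '<manifest:file-entry manifest:full-path="content.xml"'
    ' manifest:media-type="text/xml"/>'
    "</manifest:manifest>" % MANIFEST_NS
)
print(append_file_entry(manifest, "Pictures/logo.png", "image/png"))
```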

Version 1.8: Minor bugfixes

Distribute a missing file that is used in the doctest. Fix directory structure. Thanks to Michael Nagel for suggesting the change and reporting the bug.

  • The file testenum.odt was missing from MANIFEST.in
  • All OOo files and other files needed for testing are now in the subdirectory testfiles.
  • All command line utilities are now in subdirectory bin.

Version 1.7: Minor feature additions

Add --newlines option to ooo_as_text: With this option the paragraphs in the office document are preserved in the text output. Fix assertion error with python2.7, thanks to Hans-Peter Jansen for the report. Several other small fixes for python2.7 vs. 2.6.

  • add --newlines option to ooo_as_text
  • fix assertion error with python2.7 reported by Hans-Peter Jansen
  • fix several deprecation warnings with python2.7
  • remove zip compression sizes from regression test: the compressor in python2.7 is better than the one in python2.6

Version 1.6: Minor bugfixes

Fix compression: when writing new XML files, these were stored instead of compressed in the OOo zip file, resulting in big documents. Thanks to Hans-Peter Jansen for the patch. Add copyright notice to command-line utils (SF Bug 2650042). Fix mailmerge for OOo 3.X lists (SF Bug 2949643).

  • fix compression flag, patch by Hans-Peter Jansen
  • add regression test to check for compression
  • now release ooo_prettyxml -- I've used this for testing for quite some time, may be useful to others
  • Add copyright (LGPL) notice to command-line utilities, fixes SF Bug 2650042
  • OOo 3.X adds xml:id tags to lists, we now renumber these in the mailmerge app., fixes SF Bug 2949643
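
Two archive details from this changelog, compressing newly written XML members (1.6) and putting the mimetype first (1.5), can be reproduced with Python's zipfile module. This sketch assumes a document assembled from scratch; it is not OOoPy's writer, and write_odf is a hypothetical name. The OpenDocument specification also requires the mimetype member to be stored without compression.

```python
import io
import zipfile

def write_odf(members, mimetype="application/vnd.oasis.opendocument.text"):
    """Build an ODF-style ZIP archive in memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype member goes first and is stored uncompressed.
        zf.writestr("mimetype", mimetype, compress_type=zipfile.ZIP_STORED)
        for name, data in members.items():
            # XML parts are deflated; storing them makes documents much larger.
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

data = write_odf({"content.xml": "<text>" + "lorem ipsum " * 500 + "</text>"})
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    for info in zf.infolist():
        print(info.filename, info.compress_type, info.file_size, info.compress_size)
```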

Version 1.5: Minor feature enhancements

Add ooo_grep to search for OOo files containing a pattern. Thanks to Mathieu Chauvinc for reporting the problems with modified manifest.xml. Support python2.6, thanks to Erik Myllymaki for reporting and anonymous contributor(s) for confirming the bug.

  • New shell-script ooo_grep (does the equivalent of grep -l on OOo files)
  • On deletion of an OOoPy object close it explicitly (uses __del__)
  • Ensure mimetype is the first element in the resulting archive; OOo seems to be picky about this.
  • When modifying the manifest, the resulting .odt file could not be opened by OOo. So when modifying the manifest, make sure the manifest namespace is named "manifest", not something auto-generated by ElementTree. I consider it a bug in OOo to require this. This now uses the _namespace_map of ElementTree and uses the same names as OOo for all namespaces. The META-INF/manifest.xml is now in the list of files to which Transforms can be applied.
  • When modifying (or creating) archive members, we create the OOo archive as if it was a DOS system (type fat) and
