Caracal

Caracal is a Ruby library for dynamically creating professional-quality Microsoft Word documents (.docx).

Overview

Caracal is a Ruby library for dynamically creating professional-quality Microsoft Word documents using an HTML-style syntax.

Caracal is not a magical HTML to Word translator. Instead, it is a markup language for generating Office Open XML (OOXML). Programmers create Word documents by issuing a series of simple commands against a document object. When the document is rendered, Caracal takes care of translating those Ruby commands into the requisite OOXML. At its core, the library is essentially a templating engine for the :docx format.

Or, said differently, if you use Prawn for PDF generation, you'll probably like Caracal. Only you'll probably like it better. :)

Please see the caracal-example repository for a working demonstration of the library's capabilities.

Teaser

How would you like to make a Word document like this?

Caracal::Document.save 'example.docx' do |docx|
  # page 1
  docx.h1 'Page 1 Header'
  docx.hr
  docx.p
  docx.h2 'Section 1'
  docx.p  'Lorem ipsum dolor....'
  docx.p
  docx.table @my_data, border_size: 4 do
    cell_style rows[0], background: 'cccccc', bold: true
  end

  # page 2
  docx.page
  docx.h1 'Page 2 Header'
  docx.hr
  docx.p
  docx.h2 'Section 2'
  docx.p  'Lorem ipsum dolor....'
  docx.ul do
    li 'Item 1'
    li 'Item 2'
  end
  docx.p
  docx.img 'https://www.example.com/logo.png', width: 500, height: 300
end

You can! Read on.

Why is Caracal Needed?

We created Caracal to satisfy a genuine business requirement. We were working on a system that produced a periodic PDF report and our clients asked if the report could instead be generated as a Word document, which would allow them to perform edits before passing the report along to their clients.

Now, as you may have noticed, the Ruby community has never exactly been known for its enthusiastic support of Microsoft standards. So it might not surprise you to learn that the existing options on RubyGems for Word document generation were limited. Those libraries, by and large, fell into a couple of categories:

  • HTML to Word Converters: We understand the motivating idea here (two output streams from one set of instructions), but in reality the number of possible permutations of nested HTML tags is simply too great for this strategy to ever work for anything other than the simplest kinds of documents. Most of these libraries rely on a number of undocumented assumptions about the structure of your HTML (which undermines the whole value proposition of a converter) and fail to support basic features of a professional-quality Word document (e.g., images, lists, and tables). The remaining libraries simply did not work at all.

  • Weekend Projects: We also found a number of inactive projects that appeared to be experiments in the space. Obviously, these libraries were out of the question for a commercial product.

What we wanted was a Prawn-style library for the :docx format. In the absence of an active project organized along those lines, we decided to write one.

Design

Caracal is designed to separate the process of parsing and collecting rendering instructions from the process of rendering itself.

First, the library consumes all programmer instructions and organizes several collections of data models that capture those instructions. These collections are ordered and nested exactly as the instructions were given. Each model contains all the data required to render it and is responsible for declaring itself valid or invalid.

Note: Some instructions create more than one model. For example, the img method both appends an ImageModel to the main contents collection and determines whether or not a new RelationshipModel should be added to the relationships collection.

Only after all the programmer instructions have been parsed does the document attempt to render the data to XML. This strategy gives the rendering process a tremendous amount of flexibility in the rare cases where renderers combine data from more than one collection.
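The collect-then-render approach described above can be illustrated with a deliberately tiny sketch. The class names ParagraphSketch and DocumentSketch, and the use of ArgumentError for invalid input, are hypothetical stand-ins rather than Caracal's actual models or error classes, and the XML is simplified (no namespaces, no escaping). It only shows the shape of the idea: commands build ordered, self-validating models, and XML is produced in a separate pass at the end.

```ruby
# Hypothetical sketch of the two-phase design; these classes are NOT
# Caracal's real models. XML is simplified and text is not escaped.
class ParagraphSketch
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Each model is responsible for declaring itself valid or invalid.
  def valid?
    !text.nil? && !text.empty?
  end

  # Each model holds everything it needs to render itself.
  def to_xml
    "<w:p><w:r><w:t>#{text}</w:t></w:r></w:p>"
  end
end

class DocumentSketch
  def initialize
    @contents = [] # kept in the same order the instructions were given
  end

  # Phase 1: a command only builds and stores a model; nothing is rendered.
  def p(text)
    model = ParagraphSketch.new(text)
    raise ArgumentError, 'invalid paragraph' unless model.valid?
    @contents << model
    self
  end

  # Phase 2: XML is produced only after all instructions are collected.
  def render
    "<w:body>#{@contents.map(&:to_xml).join}</w:body>"
  end
end

doc = DocumentSketch.new
doc.p 'Hello'
doc.p 'World'
puts doc.render
```

Because rendering waits until every instruction has been collected, a renderer in this style is free to consult more than one collection at once (for example, a relationships list alongside the main contents).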

File Structure

You may not know that .docx files are simply a zipped collection of XML documents that follow the OOXML standard. (We didn't, in any event.) This means constructing a .docx file from scratch actually requires the creation of several files. Caracal shields users from this process entirely.

For each Caracal request, the following document structure will be created and zipped into the final output file:

    example.docx
      |- _rels
        |- .rels
      |- docProps
        |- app.xml
        |- core.xml
        |- custom.xml
      |- word
        |- _rels
          |- document.xml.rels
        |- media
          |- image001.png
          |- image002.png
          ...
        |- document.xml
        |- fontTable.xml
        |- footer.xml
        |- numbering.xml
        |- settings.xml
        |- styles.xml
      |- [Content_Types].xml

File Descriptions

The following provides a brief description for each component of the final document:

_rels/.rels: Defines an internal identifier and type for global content items. This file is generated automatically by the library based on other user directives.

docProps/app.xml: Specifies the name of the application that generated the document. This file is generated automatically by the library based on other user directives.

docProps/core.xml: Specifies the title of the document. This file is generated automatically by the library based on other user directives.

docProps/custom.xml: Specifies the custom document properties. This file is generated automatically by the library based on other user directives.

word/_rels/document.xml.rels: Defines an internal identifier and type for all external content items (images, links, etc.). This file is generated automatically by the library based on other user directives.

word/media/: A collection of media assets (each of which should have an entry in document.xml.rels).

word/document.xml: The main content file for the document.

word/fontTable.xml: Specifies the fonts used in the document.

word/footer.xml: Defines the formatting of the document footer.

word/numbering.xml: Defines ordered and unordered list styles.

word/settings.xml: Defines global directives for the document (e.g., whether to show background images, tab widths, etc.). Also establishes compatibility with older versions of Word.

word/styles.xml: Defines all paragraph and table styles used throughout the document. Caracal adds a default set of styles to match its HTML-like content syntax. These defaults can be overridden.

[Content_Types].xml: Pairs extensions and XML files with schema content types so Word can parse them correctly. This file is generated automatically by the library based on other user directives.

Units

OpenXML properties are specified in several different units, depending on which attribute is being set.

Points: Most spacing declarations are measured in full points.

Half Points: All font sizes are measured in half points. A font size of 24 is equivalent to 12pt.

Eighth Points: Borders are measured in 1/8 points. A border size of 4 is equivalent to 0.5pt.

Twips: A twip is 1/20 of a point. Word documents are printed at 72dpi. 1in == 72pt == 1440 twips.

Pixels: In Word documents, pixels are equivalent to points.

EMUs (English Metric Units): EMUs are a virtual unit designed to facilitate the smooth conversion between inches, millimeters, and pixels for images and vector graphics. 1in == 914400 EMUs == 72dpi x 12700, so 1pt == 12700 EMUs.

At present, Caracal expects values to be specified in whichever unit OOXML requires. This is admittedly difficult for new Caracal users. Eventually, we'll probably implement a utility object under the hood to convert user-specified units into the format expected by OOXML.
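Until such a utility exists, the ratios listed above are easy to encode by hand. The module below is a hypothetical helper (OoxmlUnits is not part of Caracal); it is a minimal sketch that simply applies this section's conversions so you can think in familiar units and pass the resulting integers to options such as size or border_size.

```ruby
# Hypothetical helper, NOT part of Caracal: converts familiar units into
# the units OOXML (and therefore Caracal) expects, per the ratios above.
module OoxmlUnits
  POINTS_PER_INCH = 72
  TWIPS_PER_POINT = 20       # 1 twip == 1/20 pt
  EMUS_PER_POINT  = 12_700   # 914_400 EMUs per inch / 72 pt per inch

  module_function

  # Font sizes are half points: 12pt => 24
  def font_size(points)
    (points * 2).round
  end

  # Border sizes are eighth points: 0.5pt => 4
  def border_size(points)
    (points * 8).round
  end

  # 1in => 1440 twips
  def inches_to_twips(inches)
    (inches * POINTS_PER_INCH * TWIPS_PER_POINT).round
  end

  # 1in => 914_400 EMUs
  def inches_to_emus(inches)
    (inches * POINTS_PER_INCH * EMUS_PER_POINT).round
  end
end

OoxmlUnits.font_size(12)        # => 24
OoxmlUnits.border_size(0.5)     # => 4
OoxmlUnits.inches_to_twips(1)   # => 1440
OoxmlUnits.inches_to_emus(1)    # => 914400
```

Under these assumptions, docx.style id: 'special', size: OoxmlUnits.font_size(12) would request a 12pt font. Rounding to integers is a design choice here, since OOXML size attributes are whole numbers in their respective units.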

Syntax Flexibility

Generally speaking, Caracal commands will accept instructions via any combination of a parameters hash and/or a block. For example, all of the following commands are equivalent.

docx.style id: 'special', name: 'Special', size: 24, bold: true

docx.style id: 'special', size: 24 do
  name 'Special'
  bold true
end

docx.style do
  id   'special'
  name 'Special'
  size 24
  bold true
end

Parameter options are always evaluated before block options. This means if the same option is provided in the parameter hash and in the block, the value in the block will overwrite the value from the parameter hash. Tread carefully.
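The precedence rule above falls out naturally if a model applies its parameter hash first and then evaluates the block against itself. The class below is a hypothetical, simplified stand-in (StyleSketch is not Caracal's style model); it is a minimal sketch showing why a block value wins when both forms set the same option.

```ruby
# Hypothetical sketch of "hash first, then block"; NOT Caracal's source.
class StyleSketch
  ATTRS = %i[id name size bold].freeze

  # Combined getter/setter for each attribute, so `size 24` inside a
  # block sets the value and `size` with no argument reads it.
  ATTRS.each do |attr|
    define_method(attr) do |*args|
      instance_variable_set("@#{attr}", args.first) unless args.empty?
      instance_variable_get("@#{attr}")
    end
  end

  def initialize(options = {}, &block)
    # 1. Apply parameter-hash options first...
    options.each { |key, value| send(key, value) if ATTRS.include?(key) }
    # 2. ...then evaluate the block, so block values win on conflicts.
    instance_eval(&block) if block
  end
end

style = StyleSketch.new(id: 'special', size: 24) do
  name 'Special'
  size 28
end

style.size   # => 28 (the block overrode the hash)
style.id     # => "special"
style.name   # => "Special"
```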

Validations

All Caracal models perform basic validations on their attributes, but this is, without question, the least sophisticated part of the library at present.

In forthcoming versions of Caracal, we'll be looking to expand the InvalidModelError class to provide broader error reporting abilities across the entire library.

Installation

Add this line to your application's Gemfile:

gem 'caracal'

Then execute:

bundle install

Commands

In the following examples, the variable docx is assumed to be an instance of Caracal::Document.

docx = Caracal::Document.new('example_document.docx')

Most code examples show optional values being passed in a block. As noted above, you may also pass these options as a parameter hash or as a combination of a parameter hash and a block.

File Name

The output document's file name can be set at initialization or via the file_name method.

docx = Caracal::Document.new('example_document.docx')

docx.file_name 'different_name.docx'

After the call above, the current document name can be returned by invoking the name method:

docx.name    # => 'different_name.docx'

The default file name is caracal.docx.

Page S
