Bookmaker
Macmillan's Bookmaker tool
Macmillan's Bookmaker Toolchain
Welcome to the Bookmaker toolchain! Bookmaker comprises a series of scripts that turn a Word document into an HTML document, and then into a PDF and/or EPUB file.
Each script in the Bookmaker sequence performs a distinct set of actions that builds on the scripts that came before, and depends on any number of other scripts or tools. While most of these scripts were originally written for internal use at Macmillan, we've done our best to hone them down to a cross-platform, generic core that can be used out of the box (though there are still a number of dependencies, discussed further down). The scripts all live here, in the core directory.
It's important to note that correct transformation depends on correct application of the Macmillan Word template, a set of styles and rules for Microsoft Word manuscripts that create the initial structure each manuscript needs in order to cleanly transform into valid HTMLBook HTML. You can learn more about styling and the Word template here.
Bookmaker Components
The scripts are as follows:
config.rb: This is where you configure your system set-up, for example, the location of your cloned core scripts, location of the external dependencies, etc.
header: This is the core Bookmaker library, which contains paths and references common to all the Bookmaker scripts.
tmparchive: Creates the temporary working directory for the file to be converted, and opens an alert to the user telling them the tool is in use.
Dependencies: Pre-determined folder structure
htmlmaker: Converts the .xml file to HTML using wordtohtml.xsl.
Dependencies: tmparchive, Python 2.7.x, correct application of the Macmillan Word template, Java JDK, Saxon, wordtohtml.xsl
filearchive: Creates the directory structure for the converted files.
Dependencies: tmparchive, htmlmaker
imagechecker: Checks to see if any images are referenced in the HTML file, and if those image files exist in the submission folder. If images are present, copies them to the final archive; if missing, creates an error file noting which image files are missing.
Dependencies: tmparchive, htmlmaker, filearchive
coverchecker: Checks to see if a front cover image file exists in the submission folder. If the cover image is present, copies it to the final archive; if missing, creates an error file noting that the cover is missing.
Dependencies: tmparchive, htmlmaker, filearchive
stylesheets: Copies the EPUB and PDF CSS into the final archive, counts the chapters in the book, and adjusts the CSS to suppress chapter numbers if only one chapter is found.
Dependencies: tmparchive, htmlmaker, filearchive
pdfmaker: Preps the HTML file and sends it to the DocRaptor service for conversion to PDF.
Dependencies: tmparchive, htmlmaker, filearchive, imagechecker, coverchecker, chapterheads, SSL cert file, DocRaptor cloud service, doc_raptor Ruby gem
epubmaker: Preps the HTML file and converts it to EPUB using the HTMLBook scripts.
Dependencies: tmparchive, htmlmaker, filearchive, imagechecker, coverchecker, chapterheads, Saxon, HTMLBook, Python
cleanup: Removes all temporary working files and working dirs.
Dependencies: tmparchive, htmlmaker, filearchive, imagechecker, coverchecker, stylesheets
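Because each script builds on the ones before it, the run order matters. The following is a minimal sketch, not part of Bookmaker itself, that records the script-to-script dependencies listed above and checks a proposed run order against them. The constant and method names are hypothetical. External tools (Saxon, Python, DocRaptor, and so on) are left out, as is chapterheads, which appears in the dependency lists but is not described as a script.

```ruby
# Script-to-script dependencies, copied from the component list above.
# External tools and "chapterheads" are intentionally omitted (assumption).
SCRIPT_DEPENDENCIES = {
  "tmparchive"   => [],
  "htmlmaker"    => %w[tmparchive],
  "filearchive"  => %w[tmparchive htmlmaker],
  "imagechecker" => %w[tmparchive htmlmaker filearchive],
  "coverchecker" => %w[tmparchive htmlmaker filearchive],
  "stylesheets"  => %w[tmparchive htmlmaker filearchive],
  "pdfmaker"     => %w[tmparchive htmlmaker filearchive imagechecker coverchecker],
  "epubmaker"    => %w[tmparchive htmlmaker filearchive imagechecker coverchecker],
  "cleanup"      => %w[tmparchive htmlmaker filearchive imagechecker coverchecker stylesheets]
}.freeze

# One run order that satisfies every dependency above.
RUN_ORDER = %w[
  tmparchive htmlmaker filearchive imagechecker coverchecker
  stylesheets pdfmaker epubmaker cleanup
].freeze

# Hypothetical helper: returns the scripts in `order` that would run
# before at least one of their own dependencies has run.
def misordered_scripts(order, deps = SCRIPT_DEPENDENCIES)
  seen = []
  bad = []
  order.each do |script|
    missing = deps.fetch(script, []) - seen
    bad << script unless missing.empty?
    seen << script
  end
  bad
end
```

Calling misordered_scripts(RUN_ORDER) returns an empty array, while an order that puts htmlmaker ahead of tmparchive reports htmlmaker as misordered.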
Project Metadata
Bookmaker requires a few pieces of metadata to accompany each project, which you can provide in a JSON file. Here's a sample:
config.json
{
  "title": "Alice in Wonderland",
  "author": "Lewis Carroll",
  "productid": "99237561",
  "printid": "9781234567890",
  "ebookid": "9781234567899",
  "imprint": "Project Gutenberg",
  "publisher": "Project Gutenberg",
  "printcss": "/Users/nellie/Documents/css/pdf.css",
  "printjs": "/Users/nellie/Documents/js/pdf.js",
  "ebookcss": "/Users/nellie/Documents/css/epub.css",
  "frontcover": "cover.jpg"
}
Each of the following fields is used for various purposes throughout the Bookmaker toolchain:
- title. Required for ebook metadata. If not found, falls back to the input file name.
- author. Required for ebook metadata. If not found, falls back to "Unknown".
- productid. Required for file naming. If not found, falls back to the input file name.
- printid. Required for file naming. If not found, falls back to the input file name.
- ebookid. Required for ebook metadata and file naming. If not found, falls back to the input file name.
- imprint. Required for ebook metadata. If not found, falls back to "Unknown".
- publisher. Required for ebook metadata. If not found, falls back to "Unknown".
- printcss. Required for PDF formatting. Can be either a full path to a file on your computer or just a filename. If only a filename is provided, Bookmaker assumes the CSS file is in the assets directory, along with the cover and config.json files. If not found, the default Prince stylesheet is used.
- ebookcss. Required for ebook formatting. Can be either a full path to a file on your computer or just a filename. If only a filename is provided, Bookmaker assumes the CSS file is in the assets directory, along with the cover and config.json files. If not found, no extra formatting is applied.
- frontcover. Front cover image to include in the ebook. If not found, no cover image will appear.
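The fallback rules above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not Bookmaker's actual code: the names FALLBACKS, load_metadata, and resolve_css are hypothetical, an empty string is treated the same as a missing field, and "input file name" is taken to mean the file name without its extension.

```ruby
require "json"

# Fallback per field, as described in the list above.
# :input_name stands for "fall back to the input file name".
FALLBACKS = {
  "title"     => :input_name,
  "author"    => "Unknown",
  "productid" => :input_name,
  "printid"   => :input_name,
  "ebookid"   => :input_name,
  "imprint"   => "Unknown",
  "publisher" => "Unknown"
}.freeze

# Hypothetical helper: parse the text of config.json and fill in any
# missing (or empty, by assumption) fields with their fallbacks.
def load_metadata(json_text, input_file)
  data = JSON.parse(json_text)
  # Assumption: the extension is stripped from the input file name.
  input_name = File.basename(input_file, ".*")
  FALLBACKS.each do |field, fallback|
    value = data[field]
    next unless value.nil? || value.to_s.strip.empty?
    data[field] = fallback == :input_name ? input_name : fallback
  end
  data
end

# Hypothetical helper: a bare filename is looked up in the assets
# directory; anything containing a directory part is used as given.
def resolve_css(value, assets_dir)
  return nil if value.nil? || value.empty?
  File.basename(value) == value ? File.join(assets_dir, value) : value
end
```

For example, load_metadata('{"title":"Alice in Wonderland"}', "alice.docx") keeps the title, sets author to "Unknown", and sets productid to "alice".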
Folder Structure
By default, Bookmaker will look for all files (images, config.json) in the same folder as the input file, and create the output folders there as well. However, you can specify a custom submission folder and done folder in config.rb.
Additionally, the following directory structures are required:
- All supplemental resources (Saxon, zip) should live in the same parent folder, at the same level (i.e., they should be siblings to each other).
- All Bookmaker scripts (including WordXML-to-HTML, HTMLBook, and covermaker) should live within the same parent folder, at the same level.
- A folder must exist for storing log files. This can live anywhere.
- A temporary working directory should be created, where Bookmaker can perform the conversions before archiving the final files. This can live anywhere.
Paths for all four of the folders above must be configured in config.rb. See the installation instructions below for details.
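Before running a conversion, it can help to confirm that every configured folder actually exists. The sketch below assumes the four paths are available as a Ruby hash. The key names and the missing_folders helper are hypothetical, since this README does not show the variable names that config.rb uses.

```ruby
# Hypothetical helper: given a hash of label => path, return the labels
# whose path is not an existing directory.
def missing_folders(paths)
  paths.reject { |_label, path| File.directory?(path) }.keys
end

# Example input. These labels and locations are placeholders, not the
# names config.rb really uses.
example_paths = {
  resources: "/path/to/resources",
  scripts:   "/path/to/bookmaker",
  logs:      "/path/to/logs",
  tmp:       "/path/to/tmp"
}

missing = missing_folders(example_paths)
warn "Missing folders: #{missing.join(', ')}" unless missing.empty?
```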
Dependencies
The Bookmaker scripts depend on various other utilities, as follows:
- Java: Saxon requires the Java JDK.
- Node.js: Platform for server-side JavaScript execution, used for content transformations.
- Python (version 2.7.x): Converts Word .docx files to XML.
- Saxon: An XSLT processor that runs our Word-to-HTML scripts.
- Ruby: The primary scripting language used in the Bookmaker scripts.
- Prince or DocRaptor: The tool that performs the HTML-to-PDF conversion. Prince is downloadable software. DocRaptor is an external service that requires a Ruby gem; you'll also need to create an account and get your unique API key.
- An FTP server (if you'll be creating PDFs and your book contains images, custom fonts, custom CSS, or other resources besides the HTML).
- SSL Cert (Windows only): The SSL Cert file needs to be updated to allow the scripts to post to and receive from DocRaptor.
- ImageMagick: Enables command-line image edits. Download it, then add it to your path from the command line: set PATH=C:\Program Files\ImageMagick-6.9.1-Q16n;%PATH% (the version suffix may change; use your own path)
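A quick way to see which command-line dependencies are reachable is to search the PATH for them. The sketch below is only an illustration: the helper names are hypothetical, and the executable names in REQUIRED_TOOLS (java, node, python, ruby) are assumptions about how these tools are installed on your system. On Windows the files usually carry an .exe suffix, which this sketch does not handle.

```ruby
# Hypothetical helper: return the full path of `name` if an executable
# file with that name exists in one of the PATH directories, else nil.
def find_executable(name, path_var = ENV["PATH"].to_s)
  path_var.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumed executable names for some of the dependencies listed above.
REQUIRED_TOOLS = %w[java node python ruby].freeze

# Hypothetical helper: the subset of `tools` that cannot be found.
def missing_tools(tools = REQUIRED_TOOLS, path_var = ENV["PATH"].to_s)
  tools.reject { |tool| find_executable(tool, path_var) }
end
```

Calling missing_tools with no arguments checks the current PATH and returns the names that could not be found.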
Installation
Install Bookmaker by following these steps, in order.
Create the Folder Structure
On your server, create the following folders and subfolders.
- A folder to drop the project to be converted (see above).
- Temp folder: A folder where the system can store temporary files created during conversion. This can live anywhere and have the name of your choosing (you'll tell Bookmaker where it is in config.rb).
- Bookmaker folder: A main parent folder to contain all of the separate Bookmaker script folders. This can live anywhere and have the name of your choosing (you'll tell Bookmaker where it is in config.rb).
- Resources folder: A folder for all the supplemental utilities (Saxon, zip, etc.). This can live anywhere and have the name of your choosing (you'll tell Bookmaker where it is in config.rb).
- Log folder: A folder for storing log files. This can live anywhere and have the name of your choosing (you'll tell Bookmaker where it is in config.rb).
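If you prefer to script this step, the folders can be created with Ruby's FileUtils. The sketch below places all five under a single root for convenience, which is an assumption rather than a requirement, since each folder can live anywhere. The folder names and the create_bookmaker_folders helper are hypothetical.

```ruby
require "fileutils"

# Placeholder folder names; choose your own and record them in config.rb.
FOLDER_NAMES = %w[submissions tmp bookmaker resources logs].freeze

# Hypothetical helper: create each folder under `root` (if it does not
# already exist) and return the list of full paths.
def create_bookmaker_folders(root, names = FOLDER_NAMES)
  names.map do |name|
    path = File.join(root, name)
    FileUtils.mkdir_p(path) # no error if the folder is already there
    path
  end
end
```

The returned paths are the values you would then enter in config.rb.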
Install Git and Set Up Your GitHub Account
If you haven't yet set up a GitHub account, do that now (you can just set up a basic, free account).
Now install git on your server, following the standard instructions.
Clone the Repositories
The source code for the Bookmaker scripts is hosted in the Macmillan GitHub account, broken down into several repositories.
