UnicodeMathML
JavaScript-based translation of UnicodeMath to MathML that can be integrated into arbitrary HTML or Markdeep documents. An interactive "playground" allows for experimentation with the syntax and insight into the translation pipeline.
👉 Note: Murray Sargent III, the brain behind UnicodeMath and its implementation in Microsoft's products, has now retired from his role at Microsoft and spends some of his newly-free time working on fixing, extending and generally improving UnicodeMathML – please check out his fork of this repository and the list of changes!
This repository provides a JavaScript-based translation of UnicodeMath to MathML (hence "UnicodeMathML"). An interactive "playground" allows for experimentation with UnicodeMath's syntax and insight into the translation pipeline. UnicodeMathML can be easily integrated into arbitrary HTML or Markdeep documents.
🎮 Get familiar with the syntax via the playground!
📑 Learn how to integrate UnicodeMathML into your website or Markdeep document.
UnicodeMath is an easy-to-read linear format for mathematics initially developed as an input method and interchange representation for Microsoft Office. Its author, Murray Sargent III, has published a Unicode Technical Note detailing the format, based on which this UnicodeMath to MathML translator was built. More in the FAQ section below.

The initial development of UnicodeMathML was part of my Master's thesis.
Status
UnicodeMathML is generally consistent with version 3.1 of Sargent's tech note. Some edge cases that aren't unambiguously specified (or that, since UnicodeMath is not wholly context-free, are impossible to parse with a PEG-based approach) might be handled differently than in the canonical implementation in Microsoft Office. Abstract boxes are largely unimplemented due to insufficient specification.
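To give a rough idea of what a PEG-based translation involves, here's a toy recursive-descent translator for a three-rule grammar covering only single-letter identifiers, numbers, superscripts and fractions. It is a hypothetical sketch (the name toyTranslate is made up), not UnicodeMathML's actual grammar, which covers far more of the tech note. Note how layering the rules makes a^2/b parse as a fraction whose numerator is a superscript.

```javascript
// Toy translator for a tiny UnicodeMath subset; NOT UnicodeMathML's grammar.
// PEG-style grammar (ordered choice, no backtracking needed here):
//   fraction ← script ("/" script)?
//   script   ← atom ("^" atom)?
//   atom     ← [0-9]+ / [a-zA-Z]
function toyTranslate(input) {
  let pos = 0; // current position in the input string

  // atom: a run of digits becomes <mn>, a single letter becomes <mi>
  function atom() {
    const digits = /^[0-9]+/.exec(input.slice(pos));
    if (digits) {
      pos += digits[0].length;
      return `<mn>${digits[0]}</mn>`;
    }
    const letter = /^[a-zA-Z]/.exec(input.slice(pos));
    if (letter) {
      pos += 1;
      return `<mi>${letter[0]}</mi>`;
    }
    throw new SyntaxError(`unexpected input at position ${pos}`);
  }

  // script: an atom optionally followed by "^" and a superscript atom
  function script() {
    const base = atom();
    if (input[pos] === "^") {
      pos += 1;
      return `<msup>${base}${atom()}</msup>`;
    }
    return base;
  }

  // fraction: binds more loosely than "^", so a^2/b is (a^2)/b
  function fraction() {
    const numerator = script();
    if (input[pos] === "/") {
      pos += 1;
      return `<mfrac>${numerator}${script()}</mfrac>`;
    }
    return numerator;
  }

  const result = fraction();
  // Reject trailing input the grammar didn't consume
  if (pos !== input.length) {
    throw new SyntaxError(`unexpected input at position ${pos}`);
  }
  return `<math>${result}</math>`;
}
```

For example, toyTranslate("a^2/b") returns <math><mfrac><msup><mi>a</mi><mn>2</mn></msup><mi>b</mi></mfrac></math>, while "a/" throws a SyntaxError because the denominator is missing.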
Getting Started
For a first look, check out...
- ...the UnicodeMathML playground, an interactive environment that allows you to play around with UnicodeMath's syntax and its translation into MathML.
- ...an example Markdeep document whose source can be found here.
- ...or an example HTML document whose source is located here.
Depending on whether you'd like to write UnicodeMath in a Markdeep document or use UnicodeMathML on your website, there are two paths. But first:
- Clone this repository or download a ZIP.
  git clone https://github.com/doersino/UnicodeMathML.git
- Before moving on, note that UnicodeMathML by default only transforms math surrounded by the UnicodeMath delimiters ⁅ and ⁆. For example, a typical sentence might read like this:
  Given a function ⁅f⁆ of a real variable ⁅x⁆ and an interval ⁅[a, b]⁆ of the real line, the **definite integral** ⁅∫_a^b f(x) ⅆx⁆ can be interpreted informally as the signed area of the region in the ⁅xy⁆-plane that is bounded by the graph of ⁅f⁆, the ⁅x⁆-axis and the vertical lines ⁅x = a⁆ and ⁅x = b⁆.
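The delimiter rule can be illustrated with a short sketch that locates ⁅…⁆ spans in running text. It's a hypothetical helper (findUnicodeMath isn't part of UnicodeMathML's API) that assumes spans aren't nested; UnicodeMathML's own integration code may work differently.

```javascript
// Hypothetical helper, not UnicodeMathML's implementation: find every
// UnicodeMath expression delimited by ⁅ (U+2045) and ⁆ (U+2046).
function findUnicodeMath(text) {
  const spans = [];
  // Everything up to the first closing ⁆ after an opening ⁅ (no nesting)
  const re = /⁅([^⁆]*)⁆/g;
  for (const match of text.matchAll(re)) {
    // source: the math between the delimiters
    // index: position of the opening ⁅ in the text
    spans.push({ source: match[1], index: match.index });
  }
  return spans;
}
```

Applied to the example sentence above, the returned sources start with "f", "x" and "[a, b]", and text outside the delimiters is left alone.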
HTML
Open dist/example.html in a text editor of your choice and scroll to the bottom. There, you'll see the following lines:
<script>
var unicodemathmlOptions = {
resolveControlWords: true,
};
</script>
<script src="unicodemathml.js"></script>
<script src="unicodemathml-parser.js"></script>
<script src="unicodemathml-integration.js"></script>
<script>
document.body.onload = renderUnicodemath();
</script>
You'll need to include the same lines (modulo path changes) at the bottom of your own HTML document or website (but before the closing </body> tag).
- Of course, you can use webpack or similar tools to combine and minify the JavaScript files, which I definitely recommend if you're planning on using UnicodeMathML in even moderate-traffic production contexts: This will shrink them from ~500 kB down to ~150 kB, and gzipping can reduce this further to ~50 kB.
- If you need to support browsers that don't support MathML natively, you will also need to load a polyfill like MathJax – UnicodeMathML will notify MathJax when the generated MathML is ready to render.
- The unicodemathmlOptions variable allows you to tweak things a bit – see the "Configuration" section below for more details.
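If you'd like a quick look at the size savings before setting up webpack, here's a dependency-free Node sketch that concatenates the three scripts in the order they're loaded above and reports raw and gzipped sizes. It performs no minification, so expect numbers well above the minified figures quoted above. bundleAndMeasure is a hypothetical name, and the approach assumes the scripts communicate only through globals, which is why plain concatenation should behave much like separate <script> tags.

```javascript
// Hypothetical helper, not a webpack replacement: concatenates scripts
// (no minification) and measures the result with Node's built-in zlib.
const zlib = require("zlib");

function bundleAndMeasure(sources) {
  // Join in load order; ";\n" keeps a file that lacks a trailing
  // semicolon from running into the next one.
  const bundle = sources.join(";\n");
  const rawBytes = Buffer.byteLength(bundle, "utf8");
  const gzippedBytes = zlib.gzipSync(bundle).length;
  return { bundle, rawBytes, gzippedBytes };
}
```

For example, pass it the contents of unicodemathml.js, unicodemathml-parser.js and unicodemathml-integration.js (read with fs.readFileSync(path, "utf8")), then write result.bundle to a single file and load that instead of the three separate scripts.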
Markdeep
UnicodeMathML comes with a lightly modified variant of Morgan McGuire's Markdeep that kicks off the translation at the correct point in the document rendering process. Open dist/example.md.html in a text editor of your choice and scroll to the bottom. There, you'll see the following lines:
<script>
var unicodemathmlOptions = {
resolveControlWords: true,
};
</script>
<script src="unicodemathml.js"></script>
<script src="unicodemathml-parser.js"></script>
<script src="unicodemathml-integration.js"></script>
<script src="markdeep-1.11.js" charset="utf-8"></script>
Replace the Markdeep loading code at the bottom of your document with this code (modulo path changes).
- Markdeep will automatically load MathJax, a polyfill that will allow browsers that don't support MathML natively to render the generated MathML.
- The unicodemathmlOptions variable allows you to tweak things a bit – see the "Configuration" section below for more details.
Node
While I haven't tested server-side translation of UnicodeMath into MathML, there shouldn't be any problems integrating the core of UnicodeMathML into a Node project – it's all vanilla JavaScript. If you run into any trouble, or if you would prefer an officially supported NPM package or something, don't hesitate to file an issue!
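Since the browser integration loads plain scripts that define globals, one way to try this in Node is to evaluate them in a shared context with the built-in vm module, mimicking consecutive <script> tags. This is an untested sketch: loadBrowserScripts is a hypothetical helper, and the core files might reference browser globals such as window or document, which you'd then need to stub in the context object.

```javascript
// Untested sketch: run browser-style scripts in one shared vm context so
// that globals defined by one script are visible to the next.
const vm = require("vm");

function loadBrowserScripts(sources) {
  // Add stubs for any browser globals the scripts expect here.
  const context = vm.createContext({ console });
  for (const code of sources) {
    // Top-level var and function declarations become properties of context.
    vm.runInContext(code, context);
  }
  return context;
}

// Example with two stand-in scripts (not UnicodeMathML's files):
const ctx = loadBrowserScripts([
  "function double(x) { return 2 * x; }",
  "var answer = double(21);",
]);
console.log(ctx.answer); // 42
```

For UnicodeMathML, you'd pass the contents of unicodemathml.js and unicodemathml-parser.js instead and then call whatever translation function they define via the returned context.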
Configuration
The unicodemathmlOptions variable must be a dictionary containing one or more of the key-value pairs described below. If you're happy with the defaults, you can leave unicodemathmlOptions undefined.
var unicodemathmlOptions = {
// whether a progress meter should be shown in the bottom right of the
// viewport during translation (you can probably disable this in most cases,
// but it should remain enabled for large documents containing more than
// 1000 UnicodeMath expressions where translation might take more than a
// second or two)
showProgress: true,
// whether to resolve control words like "\alpha" to "α"; this also includes
// Unicode escapes like "\u1234"
resolveControlWords: false,
// a dictionary defining a number of custom control words, e.g.:
// customControlWords: {'playground': '𝐏𝓁𝔞𝚢𝗴𝑟𝖔𝓊𝙣𝕕'},
// which would make the control word "\playground" available – this is handy
// in documents where certain expressions or subexpressions are repeated
// frequently
customControlWords: undefined,
// how to display double-struck symbols (which signify differentials,
// imaginary numbers, etc.; see section 3.11 of the tech note):
// "us-tech" (ⅆ ↦ 𝑑), "us-patent" (ⅆ ↦ ⅆ), or "euro-tech" (ⅆ ↦ d)
doubleStruckMode: "us-tech",
// a function that will run before the translation is kicked off
before: Function.prototype,
// a function that will run after the translation has finished (and after
// MathJax, if loaded, has been told to render the generated MathML)
after: Function.prototype
};
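To make the defaults and the two control word options more concrete, here's a sketch of how omitted keys could fall back to their defaults and how \name and \uXXXX sequences could be resolved. Everything here is hypothetical: the function names, the three-entry stand-in table, the rule that custom words override built-in ones, and the assumption that escapes take exactly four hex digits are illustrative choices, not UnicodeMathML's actual behavior.

```javascript
// Hypothetical sketch, not UnicodeMathML's implementation. The real
// control word table is far larger.

// Defaults mirroring the values documented above
const DEFAULTS = {
  showProgress: true,
  resolveControlWords: false,
  customControlWords: undefined,
  doubleStruckMode: "us-tech",
  before: Function.prototype,
  after: Function.prototype,
};

// Keys the user leaves out fall back to their defaults; an undefined
// options object yields the defaults unchanged.
function mergeOptions(userOptions) {
  return { ...DEFAULTS, ...(userOptions || {}) };
}

// Tiny stand-in for the built-in control word table
const BUILTIN_CONTROL_WORDS = { alpha: "α", beta: "β", infty: "∞" };

function resolveControlWordsSketch(text, customControlWords = {}) {
  // Custom words are merged last, so in this sketch they override built-ins.
  const table = { ...BUILTIN_CONTROL_WORDS, ...customControlWords };
  return text
    // Unicode escapes: backslash, "u", exactly four hex digits (assumed)
    .replace(/\\u([0-9A-Fa-f]{4})/g, (_, hex) =>
      String.fromCodePoint(parseInt(hex, 16)))
    // Control words: backslash plus letters; unknown words are kept as-is.
    // hasOwnProperty avoids matching inherited names like "toString".
    .replace(/\\([A-Za-z]+)/g, (match, name) =>
      Object.prototype.hasOwnProperty.call(table, name) ? table[name] : match);
}
```

With these definitions, resolving the text \alpha + \u03B2 yields α + β, and passing { playground: "𝐏𝓁𝔞𝚢𝗴𝑟𝖔𝓊𝙣𝕕" } as the second argument makes \playground resolve as in the configuration comment above.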
FAQ
Got further questions that aren't answered below, or ideas for potential improvements, or found a bug? Feel free to file an issue!
What's this UnicodeMath you're talking about?
UnicodeMath is a linear format for mathematics initially developed as an input method and interchange representation for Microsoft Office. Its author, Murray Sargent III, has published a Unicode Technical Note (a copy of which is included at docs/sargent-unicodemathml-tech-note.pdf) describing its syntax and semantics.
By using Unicode symbols in lieu of keywords wherever possible, it's significantly more readable than, say, LaTeX in plain text:

UnicodeMath, much like MathML, was designed with accessibility in mind, taking cues from Nemeth braille and other preceding math encodings.
How does its syntax compare to AsciiMath, (La)TeX, and MathML?
Here's a table showing a few expressions as you'd formulate them in UnicodeMath, AsciiMath, and LaTeX:

There are many subtleties as you get into the nitty-gritty, of course, but you'll see that UnicodeMath consistently makes for the most readable and concise plaintext. LaTeX, in contrast, is significantly more verbose – but since it's been around forever, you might find it to be more versatile in practice.
To summarize, here's a **totally-not-biased-and-super-scientific evaluation of these notatio
