Jessie, simple universal safe mobile code

This document is an early draft. Comments appreciated! Thanks.

Today, JavaScript is the pervasive representation for (somewhat) safe mobile code. For another representation to achieve universality quickly, it must be a subset of JavaScript, so that it runs at least everywhere JavaScript runs.

Whereas JSON is a simple universal representation for safe mobile data, Jessie is a simple universal representation for safe mobile data and behavior.

Jessie is a small safe ocap subset of JavaScript that

is pleasant and expressive to program in,
can easily run within a JavaScript system,
can be safely linked with adversarial SES code,
can be easily implemented for standalone use,
can be transmitted as lightweight safe mobile code,
is amenable to a range of static analysis,
omits most of JavaScript's bad parts,
non-experts can use to write non-trivial non-exploitable smart contracts.

Subsetting EcmaScript

Unless stated otherwise, all references to EcmaScript refer to EcmaScript 2017, the eighth edition of the standard.

EcmaScript subsets Venn diagram

JSON <SA Jessie <DAT TinySES <SA SES <DA ES-strict <SDA EcmaScript

One language is a static subset (<S) of another if every program statically accepted by the smaller language is also statically accepted by the larger language with the same meaning.

One language is a dynamic subset (<D) of another if non-erroneous execution of code in the smaller language operates the same way in the larger language. The smaller language may treat some dynamic cases as errors that the larger language would not consider errors. Programs in the smaller language whose correctness relies on these errors, even if they do not provoke the errors themselves, would generally become incorrect as programs in the larger language.

One language is absorbed (<A) by another if code in the smaller language can be run as code in the larger language without modification. A smaller language which is not absorbed may often be transpiled (<T) into the larger language by source-to-source transformation.

The diagram above illustrates the subsetting relationships between various subsets of EcmaScript. The vertical dimension represents syntactic subsetting by static means. The horizontal dimension represents semantic subsetting by either static or dynamic means. The word cloud in the contour between each language and its subset represents the features of the containing language omitted by that next smaller subset. The relative sizes of feature names reflect only their explanatory significance.

Each step needs to be explained, proceeding from larger to smaller.

EcmaScript code may be in either strict mode or sloppy mode, so the ES-strict sublanguage is a static, dynamic, absorbed subset of full EcmaScript by definition. (Historically, the strict sublanguage started by approximating a static and dynamic subset of the sloppy language, excluding with and throwing errors where the sloppy language would instead silently act insane. But this approximation has too many exceptions to remain useful.) EcmaScript classes and modules are implicitly strict, so the vestigial sloppy language is best seen as an EcmaScript 3 compatibility mode.

Unlike full EcmaScript, ES-strict is statically scoped, ES-strict functions are strongly encapsulated, and implicit access to the global object is severely restricted. These are necessary steps towards ocap safety, but are not sufficient by themselves.

SES, or Secure EcmaScript, is a dynamic, absorbed subset of ES-strict. To achieve this subsetting, SES builds on Frozen Realms, which in turn builds on [Shadow]Realms. (Shims are available as the Realms shim and the Frozen Realms shim.) SES statically accepts all programs accepted by ES-strict and can run on ES-strict without internal modification.

Via Realms, SES removes ambient authority from the global scope, so attempts to dereference a variable named, for example, document that might succeed in ES-strict on a given host might instead throw a ReferenceError within a SES environment run on that host. Via Frozen Realms, SES freezes the primordials, so mutations that would succeed in ES-strict might instead throw a TypeError in SES.
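The effect of freezing can be sketched in plain strict-mode JavaScript without any SES machinery. The example below is only an illustration: `sharedObject` and `attemptMutation` are hypothetical names, and a single `Object.freeze` call on an ordinary object stands in for what Frozen Realms does to all the primordials.

```javascript
// Hypothetical stand-in for a primordial such as Array.prototype.
const sharedObject = { greet() { return 'hello'; } };
Object.freeze(sharedObject);

// Tries to overwrite a method and reports what happened.
function attemptMutation(target) {
  'use strict';
  try {
    target.greet = () => 'hijacked';
    return 'mutated';
  } catch (err) {
    return err instanceof TypeError ? 'TypeError' : 'other error';
  }
}

console.log(attemptMutation(sharedObject)); // prints "TypeError"
console.log(attemptMutation({ greet() { return 'hello'; } })); // prints "mutated"
```

The same assignment succeeds on the unfrozen object and throws on the frozen one, which is the sense in which SES is a dynamic subset: some operations that work in ES-strict become errors.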

SES is the largest subset of ES-strict which is still an ocap language. Its purpose is to run as many conventional EcmaScript programs as possible while staying within ocap rules.

TinySES is a static, absorbed subset of SES. TinySES approximates the smallest useful subset of SES that is still pleasant to program in using the objects-as-closures pattern. TinySES omits this and classes. Once initialized, the API surface of a TinySES object must be tamper-proofed before exposure to clients. TinySES is not intended to run legacy code or code that uses inheritance.
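A minimal sketch of the objects-as-closures pattern follows. It assumes `Object.freeze` as a stand-in for whatever tamper-proofing primitive the environment provides; `Object.freeze` is shallow, so a real primitive would need to reach further. The name `makeCounter` is hypothetical.

```javascript
// State lives in the closure, not on the object, so no `this` is needed.
function makeCounter(start = 0) {
  let count = start;
  // Tamper-proof the API surface before it escapes to clients.
  return Object.freeze({
    increment: () => {
      count += 1;
      return count;
    },
    read: () => count,
  });
}

const counter = makeCounter(5);
counter.increment();
console.log(counter.read()); // prints 6
console.log(Object.isFrozen(counter)); // prints true
```

Because the methods close over `count` rather than reading it from the receiver, a client can pass `counter.read` around on its own without breaking it, and cannot replace the methods on the frozen record.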

Jessie is a dynamic subset of TinySES. Jessie and TinySES have the same grammar and static restrictions. The Jessie grammar is simple enough to be parsed easily. Jessie imposes static validation rules that are easy to check locally, to ensure that objects are tamper-proofed before they escape. Statically valid Jessie programs enable sound static analysis of useful safety properties. A SES IDE can thereby flag which code is within the Jessie static restrictions and provide sound static analysis info for that code.

The only difference between TinySES and Jessie is that correct TinySES programs may rely on the presence of the entire SES runtime. Correct Jessie programs may only rely on a minimal subset of the SES runtime that standalone Jessie implementations can implement for reasonable effort. However, correct Jessie programs also cannot rely on the absence of the rest of the SES runtime. Jessie and TinySES programs may be linked with programs written in SES, and so may rely on SES's ocap rules to constrain these other programs.

Thus, every correct Jessie program is also a correct TinySES and SES program, and works unmodified within a SES environment run on a normal JavaScript implementation. Correct Jessie programs will also run on a standalone implementation of Jessie (which still needs to obey SES's ocap rules) in which they are linked only with other Jessie code.

JSON is a static, absorbed subset of all the languages above. JSON achieved universal adoption because

it was a subset of JavaScript, which was already pervasive
it was easy to implement in any language and on any platform

Likewise, Jessie is small enough to be easily implemented as a compiler or interpreter in a wide range of other languages and platforms. Its character resembles a simple Scheme with records.

Jessie as a subset of SES

The Jessie grammar is based on the ECMAScript 2017 Grammar Summary. Unlike the Ecma page, lexical productions in the Jessie grammar are named in all upper case.

Unlike EcmaScript and SES, Jessie has no semicolon insertion, and so does not need a parser able to handle that. However, Jessie must impose the NO_NEWLINE constraints from EcmaScript, so that every non-rejected Jessie program is accepted as the same SES program. NO_NEWLINE is a lexical-level placeholder that must never consume anything. It should fail if the whitespace to skip over contains a newline. TODO: Currently this placeholder always succeeds.
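The classic case behind the NO_NEWLINE constraints is a line break after `return`. The sketch below shows the EcmaScript behavior only; the function names are hypothetical.

```javascript
// In EcmaScript, a newline right after `return` triggers automatic
// semicolon insertion, so this function returns undefined, not 42.
function returnsUndefined() {
  return
  42;
}

// With the operand on the same line, the value is returned as expected.
function returnsFortyTwo() {
  return 42;
}

console.log(returnsUndefined()); // prints undefined
console.log(returnsFortyTwo()); // prints 42
```

A parser without semicolon insertion that simply skipped the newline would read the first function as `return 42`, giving it a different meaning than it has in SES. Rejecting the newline at that position keeps every accepted Jessie program equivalent to the same SES program.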

Jessie omits the RegularExpressionLiteral. Some Jessie environments may instead include the RegExp.make template string tag. By omitting RegularExpressionLiteral and automatic semicolon insertion, our lexical grammar avoids the context dependencies that are most difficult for JavaScript lexers.

In Jessie, all reserved words are unconditionally reserved. By contrast, in EcmaScript and SES, yield, await, implements, etc. are only conditionally reserved. Thus we avoid the need for parameterized lexical-level productions.
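Conditional reservation can be observed directly in EcmaScript. The sketch below relies on the fact that bodies passed to the `Function` constructor are parsed as sloppy-mode code unless they begin with a `"use strict"` directive; the helper name `parsesInStrictMode` is hypothetical.

```javascript
// Sloppy mode: yield and await are ordinary identifiers here.
const sloppy = new Function('var yield = 1; var await = 2; return yield + await;');
console.log(sloppy()); // prints 3

// Reports whether a function body parses as strict-mode code.
function parsesInStrictMode(body) {
  try {
    new Function('"use strict"; ' + body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parsesInStrictMode('var yield = 1;')); // prints false
console.log(parsesInStrictMode('var implements = 3;')); // prints false
console.log(parsesInStrictMode('var await = 2;')); // prints true
```

The last line shows that `await` stays usable as an identifier even in a strict function, since it is reserved only in modules and async functions. A lexer for full EcmaScript has to track such context, whereas a Jessie lexer can treat every one of these words as reserved everywhere.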

Jessie omits both the in expression and the for/in loop, and thus avoids the need for parameterized parser-level productions.

QUASI_* are lexical-level placeholders. QUASI_ALL should match a self-contained template literal string that has no holes " `...` ". QUASI_HEAD should match the initial literal part of a template literal with holes " `...${ ". QUASI_MID should match the middle " }...${ ", and QUASI_TAIL the end " }...` ". The reason these are difficult is that a close curly "}" during a hole only terminates the hole if it is balanced. TODO: All these placeholders currently fail. There is not yet the logic needed to tell whether a close curly terminates a hole.
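The balancing problem can be sketched with a simple depth counter. This is not the project's implementation: `findHoleEnd` is a hypothetical helper, and it deliberately ignores curlies inside string literals, comments, and nested template literals, all of which a real lexer would have to handle.

```javascript
// Given the index just past a hole's opening "${", returns the index of
// the close curly that terminates the hole, or -1 if there is none.
function findHoleEnd(src, start) {
  let depth = 0; // number of unmatched open curlies inside the hole
  for (let i = start; i < src.length; i += 1) {
    const ch = src[i];
    if (ch === '{') {
      depth += 1;
    } else if (ch === '}') {
      if (depth === 0) {
        return i; // balanced, so this curly ends the hole
      }
      depth -= 1; // closes a curly opened inside the hole
    }
  }
  return -1;
}

// Source text of a template literal whose hole contains an object literal.
const src = '`a${ {x: 1}.x }b`';
const holeStart = src.indexOf('${') + 2;
console.log(findHoleEnd(src, holeStart)); // prints 14
```

The first close curly (at index 10) belongs to the object literal and is skipped. Only the second one, seen at depth zero, ends the hole, and QUASI_TAIL would begin there.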

Outside the lexical grammar, other differences from the ECMAScript 2017 Grammar Summary are noted as comments within the grammar. The Ecma page uses a cover grammar to avoid unbounded lookahead. Because the Jessie grammar is defined using a PEG (parsing expression grammar), which supports unbounded lookahead, we avoid the need for a cover grammar. TODO: Determine where difficulties arise parsing according to this Jessie grammar with bounded lookahead. If difficult, we may reintroduce a cover grammar.

Jessie array literals omit elision (i.e., nothing between commas).
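For readers unfamiliar with elision, the sketch below shows what it does in EcmaScript: an elided element leaves a hole in the array, not an `undefined` value. The variable names are hypothetical.

```javascript
// Elision: nothing between the commas, so index 1 is a hole.
const withElision = [1, , 3];
// No elision: index 1 exists and holds undefined.
const withoutElision = [1, undefined, 3];

console.log(withElision.length); // prints 3
console.log(Object.keys(withElision)); // prints [ '0', '2' ]
console.log(Object.keys(withoutElision)); // prints [ '0', '1', '2' ]
```

The first literal would be rejected by the Jessie grammar, while the second is still accepted.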

Jessie treats async, arguments, and eval as reserved keywords. Strict mode already limits arguments and eval to the point that they are effec
