libCODY: COmpiler DYnamism<a href="#1">1</a>

libCODY is an implementation of a communication protocol between compilers and build systems.

WARNING: This is preliminary software.

In addition to supporting C++ modules, this may also support LTO requirements, and could deal with generated #include files, feeding the compiler prepruned include paths and whatnot. (The system calls involved in include searches can be quite expensive on some build infrastructures.)

Client and Server objects
Direct connection for in-process use
Testing with Joust (that means nothing to you, does it!)

Problem Being Solved

The origin is in C++20 modules:

import foo;

At that import, the compiler needs<a href="#2">2</a> to load up the compiled serialization of module foo. Where is that file? Does it even exist? Unless the build system already knows the dependency graph, this might be a completely unknown module. Now, the build system knows how to build things, but it might not have complete information about the dependencies. The ultimate source of dependencies is the source code being compiled, and specifying the same thing in multiple places is a recipe for build skew.

Hence the need for a protocol by which a compiler can query a build system. This was originally described in <a href="https://wg21.link/p1184r1">p1184r1: A Module Mapper</a>, along with a proof-of-concept hack in GNU Make, described in <a href="https://wg21.link/p1602">p1602: Make Me A Module</a>. The current implementation has evolved, and an update to p1184 will be forthcoming.

Packet Encoding

The protocol is turn-based. The compiler sends a block of one or more requests to the builder, then waits for a block of responses to all of those requests. If the builder needs to compile something to satisfy a request, there may be some time before the response. A builder may service multiple compilers concurrently, each as a separate connection.

When multiple requests are in a block, the responses are also in a block, and in corresponding order. The responses must not be commenced eagerly -- they must wait until the incoming block has ended (as mentioned above, it is turn-based). To do otherwise risks deadlock, as there is no requirement for a sending end of the communication to listen for incoming responses (or new requests) until it has completed sending its current block.

Every request has a response.

Requests and responses are user-readable text. They are not intended as a transmission medium for large binary objects (such as compiled modules); for that kind of thing, the builder and the compiler are presumed to share a file system.<a href="#3">3</a>

Message characters are encoded in UTF-8.

Messages are a sequence of octets ending with a NEWLINE (0xa). The lines consist of a sequence of words, separated by WHITESPACE (0x20 or 0x9). Words themselves do not contain WHITESPACE. Lines consisting solely of WHITESPACE (or empty) are ignored.

To encode a block of multiple messages, each non-final message ends with a word consisting solely of SEMICOLON (0x3b), immediately before the NEWLINE. Thus a serial connection can determine whether a block is complete without decoding the messages.
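Because the continuation marker is always the last word on its line, block framing only needs to inspect the tail of each line. The following C++ sketch shows one way a reader might do this, assuming the whole turn so far has been buffered as text; IsContinued and BlockComplete are hypothetical helper names, not part of libcody.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: does this message line (NEWLINE removed) end with the
// block-continuation word ";"? Only the trailing characters are examined,
// so the words themselves need not be decoded.
bool IsContinued(std::string_view line) {
  // Skip trailing WHITESPACE (SPACE or TAB).
  std::size_t end = line.find_last_not_of(" \t");
  if (end == std::string_view::npos || line[end] != ';')
    return false;
  // The ';' must be a word on its own: at line start or after WHITESPACE.
  return end == 0 || line[end - 1] == ' ' || line[end - 1] == '\t';
}

// Hypothetical helper: true once `buffer` holds a complete block, i.e. it
// ends with NEWLINE and its last non-blank line lacks the ";" marker.
bool BlockComplete(std::string_view buffer) {
  if (buffer.empty() || buffer.back() != '\n')
    return false;  // the final message has not been fully received
  std::string_view last;  // last line that is not empty or whitespace-only
  std::size_t pos = 0;
  while (pos < buffer.size()) {
    // Always found, because the buffer ends with NEWLINE.
    std::size_t nl = buffer.find('\n', pos);
    std::string_view line = buffer.substr(pos, nl - pos);
    if (line.find_first_not_of(" \t") != std::string_view::npos)
      last = line;  // blank lines are ignored, as the protocol requires
    pos = nl + 1;
  }
  return !last.empty() && !IsContinued(last);
}
```

A reader following the turn-based rules would keep receiving until BlockComplete returns true, and only then start decoding and responding.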

Words containing only characters in the set [-+_/%.A-Za-z0-9] need not be quoted. Words containing characters outside that set should be quoted. A zero-length word is encoded as ''.

Quoted words begin and end with APOSTROPHE (0x27). Within the quoted word, BACKSLASH (0x5c) is used as an escape mechanism, with the following meanings:

\n - NEWLINE (0xa)
\t - TAB (0x9)
\' - APOSTROPHE (')
\\ - BACKSLASH (\)

Other characters in the range [0x00, 0x20), and 0x7f, are encoded as BACKSLASH followed by one or two lowercase hex characters. Octets in the range [0x80, 0xff) are UTF-8 encodings of Unicode characters outside the traditional ASCII set and are passed as such.
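To illustrate these encoding rules, here is a minimal C++ sketch of a single-word encoder. EncodeWord is a hypothetical name, not libcody's API. It always writes control characters as two hex digits (one permitted choice under the rules above), so that a following hex digit cannot be misread as part of the escape.

```cpp
#include <string>
#include <string_view>

// Hypothetical encoder for one word, following the quoting rules above.
std::string EncodeWord(std::string_view word) {
  // Characters that may appear in an unquoted word: [-+_/%.A-Za-z0-9].
  auto plain = [](unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '+' || c == '_' ||
           c == '/' || c == '%' || c == '.';
  };
  bool needs_quote = word.empty();  // '' is the zero-length word
  for (unsigned char c : word)
    if (!plain(c))
      needs_quote = true;
  if (!needs_quote)
    return std::string(word);

  static const char hex[] = "0123456789abcdef";
  std::string out = "'";
  for (unsigned char c : word) {
    switch (c) {
    case '\n': out += "\\n"; break;
    case '\t': out += "\\t"; break;
    case '\'': out += "\\'"; break;
    case '\\': out += "\\\\"; break;
    default:
      if (c < 0x20 || c == 0x7f) {
        // Remaining control characters: BACKSLASH plus two lowercase hex digits.
        out += '\\';
        out += hex[c >> 4];
        out += hex[c & 0xf];
      } else {
        // Printable ASCII (including SPACE) and UTF-8 octets pass through.
        out += static_cast<char>(c);
      }
    }
  }
  out += '\'';
  return out;
}
```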

Decoding should be more relaxed. Unquoted words containing characters in the range [0x20,0xff] other than BACKSLASH or APOSTROPHE should be accepted. In a quoted sequence, \ followed by one or two lowercase hex characters decodes to that octet. Further, words can be constructed from a mixture of abutted quoted and unquoted sequences. For instance, FOO' 'bar decodes to the word FOO bar.
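A relaxed decoder might look like the C++ sketch below. It splits one line (NEWLINE already removed) into decoded words, honouring abutted quoted and unquoted sequences. DecodeLine is a hypothetical name, and rejecting an unterminated quote, a BACKSLASH outside quotes, or an unknown escape is this sketch's own assumption about what counts as malformed.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical relaxed decoder: returns the decoded words of one line, or
// std::nullopt if the line is malformed.
std::optional<std::vector<std::string>> DecodeLine(std::string_view line) {
  auto hex_value = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;  // only lowercase hex digits are recognized
  };
  std::vector<std::string> words;
  std::string word;
  bool in_word = false;  // a word has started (possibly empty, as with '')
  bool quoted = false;   // inside an APOSTROPHE-delimited sequence
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (quoted) {
      if (c == '\'') {
        quoted = false;  // end of quoted sequence; the word may continue
      } else if (c != '\\') {
        word += c;       // includes literal SPACE inside quotes
      } else {
        if (++i == line.size()) return std::nullopt;  // dangling BACKSLASH
        char e = line[i];
        if (e == 'n') word += '\n';
        else if (e == 't') word += '\t';
        else if (e == '\'' || e == '\\') word += e;
        else {
          int v = hex_value(e);
          if (v < 0) return std::nullopt;  // unknown escape
          // Accept a second hex digit if present.
          if (i + 1 < line.size() && hex_value(line[i + 1]) >= 0)
            v = v * 16 + hex_value(line[++i]);
          word += static_cast<char>(v);
        }
      }
    } else if (c == ' ' || c == '\t') {
      if (in_word) {  // WHITESPACE ends the current word
        words.push_back(word);
        word.clear();
        in_word = false;
      }
    } else if (c == '\'') {
      quoted = true;
      in_word = true;
    } else if (c == '\\') {
      return std::nullopt;  // BACKSLASH is not accepted outside quotes
    } else {
      word += c;  // relaxed: any other octet is taken literally
      in_word = true;
    }
  }
  if (quoted) return std::nullopt;  // unterminated quote
  if (in_word) words.push_back(word);
  return words;
}
```

Note that this decodes both ; and ';' to the same word, which is why block framing has to look at the raw line rather than at decoded words.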

Notice that the block continuation marker ; is not a valid encoding of the word ;, which would be encoded as ';'.

It is recommended that words are separated by single SPACE characters.

Messages

The message descriptions use $metavariable examples.

The request messages are specific to a particular action. The response messages are more generic, describing their value types but not their meaning. Message consumers need to know which request a response answers in order to decode it. Notice the Packet::GetRequest() method records in response packets what the request being responded to was. Do not confuse this with the Packet::GetCode() method.

Responses

The simplest response is the single word:

OK

This indicates the request was successful.

An error response is:

ERROR $message

The message is a human-readable string. It indicates failure of the request.

Pathnames are encoded with:

PATHNAME $pathname

Boolean responses use:

BOOL (TRUE|FALSE)
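Since these responses are generic, a client can classify one before knowing what it means. The C++ sketch below assumes the line has already been split into decoded words, with any trailing continuation word removed. The types and ParseResponse are hypothetical; whether a given form is acceptable still depends on the request being answered, and the HELLO response is not covered.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical types for the generic response forms.
struct Ok {};
struct Error { std::string message; };
struct Pathname { std::string path; };
struct Bool { bool value; };
using Response = std::variant<Ok, Error, Pathname, Bool>;

// Classify one decoded response message; std::nullopt if it matches none
// of the generic forms.
std::optional<Response> ParseResponse(const std::vector<std::string> &words) {
  if (words.empty())
    return std::nullopt;
  const std::string &code = words[0];
  if (code == "OK" && words.size() == 1)
    return Ok{};
  if (code == "ERROR" && words.size() == 2)
    return Error{words[1]};  // human-readable message, a single word
  if (code == "PATHNAME" && words.size() == 2)
    return Pathname{words[1]};  // may be the empty word
  if (code == "BOOL" && words.size() == 2) {
    if (words[1] == "TRUE") return Bool{true};
    if (words[1] == "FALSE") return Bool{false};
  }
  return std::nullopt;
}
```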

Handshake Request

The first message is a handshake:

HELLO $version $compiler $ident

The $version is a numeric value, currently 1. $compiler identifies the compiler; builders may need to keep compiled modules from different compilers separate. $ident is an identifier the builder might use to identify the compilation it is communicating with.

Responses are:

HELLO $version $builder [$flags]

A successful handshake. The communication is now connected and other messages may be exchanged. An ERROR response indicates an unsuccessful handshake. The communication remains unconnected.

There is nothing restricting a handshake to its own message block. Of course, if the handshake fails, subsequent non-handshake messages in the block will fail (producing error responses).

The $flags word, if present, allows a server to control what requests might be given. See below.
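To illustrate a handshake sharing its block with another request, the C++ sketch below serializes a block from words that are assumed to be encoded already. FormatBlock is a hypothetical helper, and the compiler name GCC and ident foo.cc in the usage note are illustrative values only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: serialize one block. Each message is a list of
// already-encoded words.
std::string FormatBlock(const std::vector<std::vector<std::string>> &messages) {
  std::string out;
  for (std::size_t m = 0; m < messages.size(); ++m) {
    const auto &words = messages[m];
    for (std::size_t w = 0; w < words.size(); ++w) {
      if (w)
        out += ' ';  // single SPACE between words, as recommended
      out += words[w];
    }
    if (m + 1 < messages.size())
      out += " ;";  // continuation marker on every non-final message
    out += '\n';
  }
  return out;
}
```

For instance, FormatBlock({{"HELLO", "1", "GCC", "foo.cc"}, {"MODULE-REPO"}}) produces the line HELLO 1 GCC foo.cc ; followed by the line MODULE-REPO. If the handshake fails, the MODULE-REPO request would receive an ERROR response.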

C++ Module Requests

A set of requests are specific to C++ modules:

Flags

Several requests and one response have an optional $flags word. Its value is the Cody::Flags value pertaining to that request. If omitted, the value 0 is implied. The following flags are available:

0, None: No flags.
1<<0, NameOnly: The request is for the name only, and not the CMI contents.

The NameOnly flag may be provided in a handshake response, and indicates that the server is interested in requests only for their implied dependency information. It may be provided on a request to indicate that only the CMI name is required, not its contents (for instance, when preprocessing). Note that a compiler may still make NameOnly requests even if the server did not ask for them.
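A C++ sketch of these flag values follows. The enum merely mirrors the list above and is not the Cody::Flags declaration itself; WithFlags and IsNameOnly are hypothetical helpers, and writing the $flags word as a decimal number is an assumption.

```cpp
#include <string>

// Illustrative mirror of the listed flags (not libcody's own declaration).
enum class Flags : unsigned {
  None = 0,
  NameOnly = 1u << 0,  // only the CMI name is wanted, not its contents
};

// Hypothetical helper: append the optional $flags word to a request,
// omitting it when no flags are set (an absent word implies 0).
std::string WithFlags(std::string request, Flags flags) {
  auto value = static_cast<unsigned>(flags);
  if (value != 0)
    request += " " + std::to_string(value);  // assumed decimal encoding
  return request;
}

// Hypothetical helper: test for the NameOnly bit.
bool IsNameOnly(Flags flags) {
  return (static_cast<unsigned>(flags) &
          static_cast<unsigned>(Flags::NameOnly)) != 0;
}
```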

Repository

All relative CMI file names are relative to a repository. (There are usually no absolute CMI file names.) The repository may be determined with:

MODULE-REPO

A PATHNAME response is expected. The $pathname may be an empty word, which is equivalent to . (the current directory). When the response is a relative pathname, it must be relative to the client's current working directory (the client might be a process on a different host from the server). You may set the repository to / if you wish to use paths relative to the root directory.
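The C++ sketch below shows how a client might locate a CMI once it has the MODULE-REPO response, using std::filesystem. ResolveCmi is a hypothetical name; it treats an empty repository as . and leaves relative results relative to the current working directory, as described above. The examples assume POSIX-style paths.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: combine the repository with a CMI name supplied
// by the builder.
fs::path ResolveCmi(const std::string &repo, const std::string &cmi) {
  fs::path cmi_path(cmi);
  if (cmi_path.is_absolute())
    return cmi_path;  // unusual, but needs no repository
  // An empty $pathname is equivalent to ".".
  fs::path base = repo.empty() ? fs::path(".") : fs::path(repo);
  return base / cmi_path;  // still relative to the cwd if repo is relative
}
```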

Exporting

A compilation of a module interface, partition or header unit can inform the builder with:

MODULE-EXPORT $module [$flags]

This will result in a PATHNAME response naming the Compiled Module Interface pathname to write.

The MODULE-EXPORT request does not indicate the module has been successfully compiled. At most one MODULE-EXPORT is to be made, and as the connection is for a single compilation, the builder may infer dependency relationships between the module being generated and import requests made.

Named module names and header unit names are distinguished by making the latter unambiguously look like file names. Firstly, they must be fully resolved according to the compiler's usual include path. If that results in an absolute file name (beginning with /, or certain other OS-specific sequences), all is well. Otherwise a relative file name must be prefixed by ./ to be distinguished from a similarly named named module. This prefixing must occur even if the header-unit's name contains characters that cannot appear in a named module's name.
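A minimal C++ sketch of the ./ prefixing rule, assuming POSIX-style paths; HeaderUnitName is a hypothetical helper, and other OS-specific absolute forms are not handled.

```cpp
#include <string>

// Hypothetical helper: spell a resolved header-unit name so it cannot be
// mistaken for a named module. `resolved` is the name found via the
// include path.
std::string HeaderUnitName(const std::string &resolved) {
  if (!resolved.empty() && resolved[0] == '/')
    return resolved;  // absolute: already unambiguous
  if (resolved.rfind("./", 0) == 0)
    return resolved;  // already carries the ./ prefix
  return "./" + resolved;  // relative: prefix, even if not a valid module name
}
```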

It is expected that absolute header-unit names convert to relative CMI names, to keep all CMIs within the CMI repository. This means that steps must be taken to distinguish the CMIs for /here from ./here, and this can be achieved by replacing the leading ./ directory with ,/, which is visually similar but does not have the self-reference semantics of dot. Likewise, .. directories in header-unit names can be remapped to ,,. (When symlinks are involved, bob/dob/.. might not be bob, of course.) C++ header-unit semantics are such that there is no need to resolve multiple ways of spelling a particular header-unit to a unique CMI file.
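The remapping just described might be sketched in C++ as follows. HeaderUnitCmiName is hypothetical, and dropping the leading / of an absolute name is an assumption about how such names become repository-relative; a real builder may use a different scheme.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map a header-unit name to a repository-relative
// CMI name, using ,/ for a leading ./ and ,, for each .. component.
std::string HeaderUnitCmiName(const std::string &unit) {
  std::string out;
  std::size_t pos = 0;
  if (unit.rfind("./", 0) == 0) {
    out = ",/";  // ./here -> ,/here, distinct from the CMI for /here
    pos = 2;
  } else {
    while (pos < unit.size() && unit[pos] == '/')
      ++pos;  // /here -> here (assumed way to make it relative)
  }
  // Copy the remaining components, renaming each ".." to ",,".
  while (pos <= unit.size()) {
    std::size_t slash = unit.find('/', pos);
    if (slash == std::string::npos)
      slash = unit.size();
    std::string comp = unit.substr(pos, slash - pos);
    out += (comp == "..") ? std::string(",,") : comp;
    if (slash < unit.size())
      out += '/';
    pos = slash + 1;
  }
  return out;
}
```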

Successful compilation of an interface is indicated with a subsequent:

MODULE-COMPILED $module [$flags]

request. This indicates the CMI file has been written to disk, so that any other compilations waiting on it may proceed. Depending on compiler implementation, the CMI may be written before the compilation completes. A single OK response is expected.

Compilation failure can be inferred from the lack of a MODULE-COMPILED request. It is presumed the builder can determine this, as it is also responsible for launching and reaping the compiler invocations themselves.
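On the builder side, the MODULE-EXPORT / MODULE-COMPILED pairing can be tracked per connection. The C++ class below is a hypothetical sketch, not part of libcody: it records the single export, marks it compiled, and lets the builder infer failure once it has reaped the compiler.

```cpp
#include <optional>
#include <string>

// Hypothetical per-connection record; one connection serves one
// compilation, so at most one MODULE-EXPORT is expected.
class CompilationRecord {
public:
  // MODULE-EXPORT: remember the module; its CMI is not yet usable.
  bool Export(const std::string &module) {
    if (exported_)
      return false;  // a second MODULE-EXPORT is a protocol error
    exported_ = module;
    return true;
  }

  // MODULE-COMPILED: the CMI is on disk, so waiting importers may proceed.
  // Returns false if it does not match the earlier MODULE-EXPORT.
  bool Compiled(const std::string &module) {
    if (!exported_ || *exported_ != module)
      return false;
    compiled_ = true;
    return true;
  }

  // Query after reaping the compiler: an export never followed by
  // MODULE-COMPILED means the interface was not produced.
  bool ExportFailed() const { return exported_.has_value() && !compiled_; }

private:
  std::optional<std::string> exported_;
  bool compiled_ = false;
};
```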

Importing

Importation, including that of header-units, uses:

MODULE-IMPORT $module [$flags]

A PA