Vallang

Maven Release

What is Vallang?

Vallang is a highly integrated and mostly-closed collection of mutually recursive fundamental data-types on the Java Virtual Machine:

locations represented by URIs: |java+class://java/lang/String| and |file:///tmp/HelloWorld.java|
integers of arbitrary size: 1,2,3, 134812345123841234
reals of arbitrary size, precision and scale: 1., 1.0, 1e10
rational numbers: 1r1, 1r7
unicode strings: "hello 🌐"
lists: [1,2, 1.0, "hello 🌐"], []
sets: {1,2, 1.0, "hello 🌐"}, {}
maps: (1:0, "a":"b"), ()
n-ary tuples with named fields: <1,2,"a",1.0>, <>
n-ary relations (represented as sets of n-ary tuples): {<1,2,"a",1.0>, <>}
tree nodes: "myNode"(1,2,3)
many-sorted algebraic terms, acting as typed tree nodes: myNode(1,2,3).
keyword fields or properties to tree nodes and algebraic data-types: "myNode"(name="Winston"), myNode(age=12)
functions: currently only function types are supported, we are working on adding function values.

Operations on these data-types are too many to list here. A selection is listed below, but you should expect the features to be pretty low level; i.e. directly accessing and manipulating the data rather than providing analysis algorithms. Algorithms in the library are added only if programming them below the abstraction layer of vallang provides a major efficiency benefit or it can factors out highly common client code into a reusable feature. More on this design decision later.

relational calculus operators such as transitive (reflexive) closure, query and projections, compositions and joins
generic tree traversal and primitives for implementing pattern matching

Vallang has a type system based on type systems in functional programming, but note that each value has a most specific dynamic type associated always at run-time. More on the design of the type system below, but here is a list of vallang types:

void - the bottom type with no values
value - the top type for all values
loc - the type for URI locations
int, real, rat are all sub-types of the aggregate type num
tuple[t1,...,tn] and tuple[t1 l1, ..., tn ln] to represent tuples of fixed but arbitrary arity
list[t], set[t], map[t1,t2] as incomparable alternative collection types.
node for trees
user-defined many-sorted mutually recursive algebraic data-types, acting as grammars for instances of tree nodes: data MyADT = myNode(int a, int b, int c, int age=...)
alias types for short-handed type equivalences
rel[t1, ..., tn] is an alias for set[tuple[t1,...tn]]
open type parameters with upper bounds, &T <: node, can be used to type parameterize composite types (used in type aliases) and to construct higher-order abstract algebraic datatypes.

Sub-typing is co-variant for the container types list, map, set and tuple. Otherwise these rules define the entire type system:

value is a strict supertype of all other types other than itself
node is the common supertype of all algebraic data-types
each algebraic constructor is a strict sub-type of its abstract sort
void is the sub-type of all types
num is the supertype of rat, int and real
an alias alias X = Y is a type equivalence
constructor types are sub-types if they have the same name and arity, and they are comparable in their argument types.
- Within a single abstract sort no two alternative constructors may be sub-types of each other.

There exists an extension mechanism for adding type kinds and their associated value kinds to the vallang system. Rascal, for example, uses this to represent functions and co-routines at run-time. The extension mechanism works by declaring a bi-directional transformation between the extensions and a symbolic representation of choice (chosen freely from the core representation mechanisms of vallang). This bidirectional mapping is mostly used when serializing and deserializing values (see below).

The types of vallang in Java are represented by a Composite design pattern with maximally shared instances of (the otherwise opaque) abstract class Type. These types expose fast implementations of sub-type and type equivalence for implementing fast pattern matching.

The values of vallang are all instances of IValue and sub-interfaces thereof. For every kind of value there is an interface, e.g. ISet, IList, ITuple and IConstructor but they are not type-parametrized because Java's type system can not represent the aforementioned co-variant sub-typing rules we require.

Why does Vallang exist?

vallang is a UseTheSource project recently renamed from rascal-values, which was known earlier as pdb.values.

The project started as a part of the IDE metatooling platform in 2007 as a generic library for representing symbolic facts about source code, for use in the construction of IDE services in Eclipse, then it continued to form the basis of the run-time system for Rascal starting 2009, and finally was renamed to vallang to serve a wider scope.

We designed of vallang based on experience with and studying the ATerm library and ASF+SDF, but also by learning from RSF (Rigi Standard Format), Rscript and GXL and S-expressions. Perhaps JSON and YAML have also had a minor influence.

The main purpose of vallang is to provide a flexible and fully typed collection of symbolic representations of data, specifically "ready" to represent facts about software systems but amenable to basically any form of symbolic data analysis purposes.

This purpose aligns with the mission of the Rascal metaprogramming language which is made to analyze and manipulate exactly such symbolic representations. Therefore vallang is the run-time environment for both interpreted and compiled Rascal programs.

Note that while vallang is a great fit for symbolic data analysis, it is currently not the best fit for numerical data analysis as it features only a uniform symbolic represetation of numbers of arbitrary precision and size (ints, reals, rationals). In other words, the numbers and collections of numbers in vallang are optimized for storage size, clarity and equational reasoning rather than optimal computational efficiency. This also means that indirect numerical encodings of data (i.e. using numerical vectors and matrices), which are often used in symbolic analyses to optimize computational efficiency are not the right strategy when using vallang: it's better to stick with a more direct symbolic representation and let vallang maintainers optimize them.

Next to the maintainers of Rascal, the main users of vallang are currently programmers who write data acquisition and (de)serialisation adapters for the Rascal ecosystem:

connecting open-compiler front-ends to Rascal
providing external data-sources such as SQL and MongoDB databases
connecting reusable interaction and visualization front-ends to Rascal

Nevertheless vallang is a generic and Rascal-independent library which may serve as the run-time system for other programming languages or analysis systems, such as term rewriting systems, relational calculus systems, constraint solvers, model checkers, model transformations, etc.

The latter perspective is the reason for the re-branding of rascal-values to vallang. You might consider vallang as a functional replacement for ECore, an alternative to the ATerm library on the JVM, or an alternative to JSON-based noSQL in-memory database systems, or a way of implementing graph databases.

Finally, vallang is a JVM library because that is where we needed it for Rascal and the Eclipse IDE Metatooling Platform. We hope other JVM programmers will also benefit from it and we have no plans of porting it at the moment to any other technological space.

What are the main design considerations of Vallang?

Vallang values are symbolic and immutable.

We think software analysis is complex enough to be confusing to even the most experienced programmers. Manipulating huge stores of hierarchical and relational data about software easily goes wrong; trivial bugs due to aliasing and sharing data between different stages of an analysis or transformation can take weeks to resolve, or worse: will never even be diagnosed.

Since our goal is to provide many more variants of all kind of software analyses, we wish to focus on the interesting algorithmic details rather than the trivial mistakes we make. Therefore, vallang values are immutable. Sharing of values or parts of values is allowed under-the-hood but is not observable. The library is implemented using persistent and/or maximally shared data structures for reasons of efficiency.

Users of vallang freely share references to their data to other parts of an analysis because they know the data can not change due to an unforeseen interaction. We also believe that the immutable values can be shared freely between threads on the JVM, but there are not enough tests yet to make such a bold claim with full confidence.

Vallang values are generated via the AbstractFactory design pattern and do not leak implementation representations

The reason is that client code must abstract from the implementation details to arrive

Vallang

Install / Use

README

Vallang

What is Vallang?

Why does Vallang exist?

What are the main design considerations of Vallang?

Vallang values are symbolic and immutable.

Vallang values are generated via the AbstractFactory design pattern and do not leak implementation representations