Example random network

Network Generation

Graphical networks are generic: natural language parse trees are graphs. So are the chains of axioms and logical deductions in a mathematical proof. So are plans for movement and motion through 3D space, as well as plans for accomplishing some more abstract action: planning a party or planning a coordinated disaster relief effort. Biological reaction pathways, involving genes, proteins and other molecules, are graphical networks as well. So are social networks. What these all have in common is an underlying set of allowed (or preferred) attachments, moves, changes or state transitions, as well as hard and soft constraints on what attachments, moves are possible.

The goal of the code in this repo is to generate such networks, given a (weighted) collection of constraints and/or transition rules and/or a syntactical (grammatical) description of the system. More precisely, it is to effect a transformation from a complex of initial states into a complex of final states, with the initial and final complexes possibly being of fundamentally different types. In between the start and end is a networked web of interconnected moves, transitions, constraints. In essence, this is as a constraint satisfaction solver which allows probabilistic weighting and guiding, and allows the general problem to be described as a collection of syntactical rules or moves (AKA rules of inference, axioms, judgments).

Currently focused on natural language generation; however, the theory is generic, and is intended for use on any deduction or induction problem. The image above was generated with this example code; it shows a random network, with all links and nodes being of the same type, and the node arity (number of connections) being Zipfian-distributed (that's why most nodes have only one connection, and only one has six connections).

Sheaves, germs and syntax

To accomplish the above, snapshots or instances of the interacting system are assumed to be describable as a graph (a generic directed, labeled, typed, weighted graph, possibly containing loops). Such generic graphs can be decomposed into pieces, which can be roughly envisioned as jigsaw-puzzle pieces, having mating connectors. In the typical case, each vertex in the graph corresponds to a jigsaw puzzle-piece, with "half-edges" or connectors connecting it to it's nearest neighbors. The pieces may be all alike, or all different, limited or unlimited in number, having connectors of only a few, or of many different types. In general, the pieces are NOT 2D, but of arbitrary dimension (that is, there may or may not be constraints on the crossing of links).

This vertex plus half-edge description of a graph is so general that it encodes a very large collection of concepts from mathematics and the sciences. The graphical interpretation allows concepts from cellular homology to be applied, most importantly, those of sheaf theory. In effect, the rules for assembling jigsaw puzzle pieces are precisely the sheaf gluing axioms. The pairing together of connectors resembles the contraction of raised/lowered indexes in a tensor algebra. That is, individual jigsaw-puzzle pieces are instances of tensors. Since a partially-assembled jigsaw looks like a particularly complicated single piece, this description applies to any category-theoretic category with a tensor product.

One particularly notable analogy to mating together a pair of connectors is that of beta-reduction (in lambda calculus and functional programming languages). The pair is of typed-variable to value! That is, one of the connectors can be taken as a variable (in a function or lambda expression) and the other connector is any "value" that can be substituted in for that variable. The "shapes" of the connectors enforce the requirement that the type of the value must correspond to the type of the variable. Unlike the variable/value distinction, thinking in terms of connectors makes beta-reduction more obviously symmetric. One can either apply a function to some values, or one can insert some values into a function: these are effectively the same, and there is no longer any particular necessity to distinguish between these two operations.

The puzzle-piece analogy is sufficient to capture and encode the concept of syntax, in all of it's fullness. For example, the Backus-Nauer Form (BNF) of (context-free) syntax rules are readily encoded as jigsaw-shapes: the production arrow is the piece itself; the expression on the left is just a connector; the term or expression on the right is just a conjunction of one or more connectors. A disjunction of terms on the right correspond to a disjunction of jigsaw-pieces: for a given connector tab on the left, one chooses one-and-only-one on the right. (This is consistent with the sheaf-gluing axioms, and resembles the rules of linear logic: viz. the no-cloning and the no-deleting theorems of dagger-compact categories.)

The grammar of natural language can also be encoded as jigsaw pieces; the foundational papers of the Link Grammar theory of natural language explicitly present parsing as the joining-together of jigsaw pieces.

In light of all of this, one can roughly say that the jigsaw piece analogy is the natural setting for the extension of Curry-Howard correspondence to a broad range of settings.

Parsing and generation

Graph generation via syntax rules can be understood as a search of a problem space. However, unlike traditional path search, where the start and end-points are of the same general kind (e.g. traveling salesman) here, path is transformative, and the start and endpoints might be of entirely different classes. For example, in (natural language) parsing, one starts with a linear sequence of words, and concludes with a parse tree. In natural language generation, one begins with a bag of semantic concepts, and generates a word-sequence capturing that semantics. The act of parsing or generation bridges between these two representations. The bridge itself is not a single ordered path or chain, but is also a network of connections and constraints that must be satisfied.

Parsing and generation can be thought of as a special case of the more general act of theorem-proving. In theorem-proving, one has a collection of axioms and inference rules (represented as jigsaw pieces!) together with a hypothesis (a collection of unconnected connectors), with the goal of a proof being to arrive at an appropriate assembly culminating in a "boundary" of "true" (all remaining unconnected connectors can be joined to a "true" jigsaw-piece, or rather, being all-true connectors). Unlike parsing, in theorem-proving, the jigsaw pieces can be freely chosen (as long as they fit together). In parsing, the jigsaw pieces must correspond to the words in a sentences, and the word-order must be preserved.

Markov logic and Bayesian blankets

Theorem proving (and generally, constraint satisfaction) are known hard problems, and so, defacto this is what this project is aiming at. All the pitfalls apply, including combinatoric explosion, algorithmic heuristics, etc. On top of all of this, a very explicit aim is that each of the jigsaw pieces and connectors are weighted, so that certain kinds of assemblies are preferred over others. Were it not for this weighting, one might consider using existing constraint satisfaction solvers. For example, Answer-Set Programming (ASP) is a particularly convenient way of specifying discrete, crisp-logic constraints. More generally, one might even consider SAT solvers, or maybe even Satisfiability Modules Theories (SMT) solvers. Once on throws in weightings, none of the existing SMT solvers appear to be adequate for the job. Thus, this project.

The goal of saying "weighting" is to avoid saying "probability". In reality (based on long experience) the weights do resemble log-probabilities, in that it is very convenient to assign a single preference score that is a sum of weights. There are some strong theoretical arguments that the correct weighting is the Boltzmann distribution, and thus, the assembled jigsaw pieces form a Markov blanket. Insofar as the jigsaw pieces can encode logic, the resulting blankets can be taken to be Markov Logic blankets. Again, the formation, discovery and data-mining of such blankets is a known-hard problem, filled to the brim with combinatoric explosions, and eminently lacking in general tractable algorithms. C'est la vie.

Unlike Markov logic, there is no presumption here that the jigsaw pieces are logical expressions. In particular, there is no constraint that the network be directed and acyclic. Thus, again, MLN solvers and Bayesian network solvers (belief networks, decision networks) are insufficient to tackle the problem at hand. Although the jigsaw connectors can have a polarity (and thus connections can be directed), there is no presumed constraint that the resulting graph is acyclic or directed (or planar, or has any other long-range properties that aren't encoded in the connector scheme.)

Strategic Development Plan

The above provides an very abstract description of the project, and emphasizes just how daunting it is. The abstract goals are not achievable in a handful of programmer-years; it is potentially some dozens if not hundreds of scientist-years of research effort. Thus, a narrower, tighter focus is needed.

Characterizing learning systems

This project is motivated by a need to characterize and evaluate unsupervised learning systems, specifically, learning systems for natural language. The normal course for such projects is to obtain a natural language corpus, usually in English, that has been an

Generate

Install / Use

README