Atomspace
The OpenCog (hyper-)graph database and graph rewriting system
OpenCog AtomSpace
The OpenCog AtomSpace is an in-RAM knowledge representation (KR) database with an associated query engine and graph-rewriting system. It is a kind of in-RAM generalized hypergraph (metagraph) database. Metagraphs offer more efficient, more flexible and more powerful ways of representing graphs: a metagraph store is literally just-plain better than a graph store. On top of this, the AtomSpace provides a large variety of advanced features not available anywhere else.
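This toy model makes the graph/metagraph distinction concrete. It was written for this explanation and is not the AtomSpace API.
- Atoms are nested Python tuples.
- A link may point at a node, or at another link.

An ordinary graph only allows the first kind of edge below. The second kind, an edge whose endpoint is itself an edge, is what a metagraph adds. The type names `Inheritance` and `Believes` are illustrative.

```python
# Toy metagraph model (hypothetical; this is NOT the AtomSpace API).
# A Node is ("Node", name). A Link is (link_type, *outgoing), where the
# outgoing atoms may be Nodes or other Links.

def node(name):
    return ("Node", name)

def link(link_type, *outgoing):
    return (link_type, *outgoing)

def is_node(atom):
    return atom[0] == "Node"

def outgoing(atom):
    # Nodes have an empty outgoing set; a Link's outgoing set follows its type.
    return () if is_node(atom) else atom[1:]

def incoming(atom, store):
    # Every stored Link that directly contains `atom`.
    return {a for a in store if not is_node(a) and atom in outgoing(a)}

cat, animal, alice = node("cat"), node("animal"), node("alice")
edge = link("Inheritance", cat, animal)   # an ordinary graph edge
belief = link("Believes", alice, edge)    # an edge whose endpoint is an edge
store = {cat, animal, alice, edge, belief}
```

Here `incoming(edge, store)` returns `{belief}`: the edge itself has an incoming set, something a plain graph edge cannot have.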
The AtomSpace is a platform for building Artificial General Intelligence (AGI) systems. It provides the central knowledge representation component for OpenCog. As such, it is a fairly mature component: many other systems are built on it, and they depend on it for stable, correct operation in a day-to-day production environment.
There are several dozen modules built on top of the AtomSpace. Notable ones include:
- Store AtomSpaces to disk
- Network-distributed AtomSpace storage
- Network shell to AtomSpaces, including a WebSocket API
- Sparse Vector/Matrix embeddings/access to graphs
- Sensori-motor research
- Language learning
Data as MetaGraphs
It is now commonplace to represent data as graphs; there are more graph databases than you can shake a stick at. What makes the AtomSpace different? A dozen features that no other graph DB does, or has even dreamed of doing.
But, first: five things everyone else does:
- Perform graph database queries, returning results that satisfy a provided search pattern.
- Arbitrarily complex patterns with an arbitrary number of variable regions can be specified by unifying multiple clauses.
- Modify searches with conditionals, such as "greater than", and with user callbacks into Scheme, Python or Haskell.
- Perform graph rewriting: use search results to create new graphs.
- Trigger execution of user callbacks... or of executable graphs (as explained below).
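The five capabilities above can be sketched with a small, self-contained matcher. This is a toy written for illustration, not the AtomSpace pattern engine or its API:
- Atoms are Python tuples, and strings starting with `$` are variables.
- A query is a list of clauses that must all be grounded consistently.
- A Python lambda stands in for a "greater than" condition or user callback.
- The rewrite step builds new atoms from the groundings.

```python
# Toy pattern matcher (hypothetical; not the AtomSpace pattern engine).
# Atoms are tuples; strings beginning with "$" are variables.

def is_var(x):
    return isinstance(x, str) and x.startswith("$")

def match(pattern, target, bindings):
    """Return bindings extended so that pattern equals target, or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == target else None
        return {**bindings, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if len(pattern) != len(target):
            return None
        for p, t in zip(pattern, target):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == target else None

def query(clauses, store, condition=lambda b: True):
    """Ground every clause against the store, sharing variables (a conjunction)."""
    results = [{}]
    for clause in clauses:
        extended = []
        for b in results:
            for atom in store:
                b2 = match(clause, atom, b)
                if b2 is not None:
                    extended.append(b2)
        results = extended
    # The condition plays the role of "greater than" or a user callback.
    return [b for b in results if condition(b)]

store = {
    ("Age", "alice", 34), ("Age", "bob", 17), ("Age", "carol", 40),
    ("Likes", "alice", "bob"), ("Likes", "bob", "carol"),
}
found = query([("Likes", "$x", "$y"), ("Age", "$x", "$age")], store,
              condition=lambda b: b["$age"] > 18)
# Graph rewriting: build new atoms from the search results.
rewritten = {("AdultLikes", b["$x"], b["$y"]) for b in found}
```

Here `found` holds the single grounding for Alice. Bob's grounding is rejected by the condition because his age is 17.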
A key difference: the AtomSpace is a metagraph store, not a graph store. Metagraphs can efficiently represent graphs, but not the other way around. This is explained carefully here, along with a precise definition of what a metagraph is and how it relates to a graph. As a side-effect, metagraphs open up many possibilities not available to ordinary graph databases. These are listed below; they are things that no one else does:
- Search queries are graphs. (The API to the pattern engine is a graph.) That is, every query, every search is also a graph. That means one can store a collection of searches in the database, and access them later. This allows a graph rule engine to be built up.
- Inverted searches. (DualLink.) Normally, a search is like "asking a question" and "getting an answer". For the inverted search, one "has an answer" and is looking for all "questions" for which it is a solution. This is pattern recognition, as opposed to pattern search. All chatbots do this as a matter of course, to handle chat dialog. No chatbot can host arbitrary graph data, or search it. The AtomSpace can. This is because queries are also graphs, and not just data.
- Both "meet" and "join" searches are possible: One can perform a "fill in the blanks" search (a meet, with MeetLink) and one can perform a "what contains this?" search (a join, with JoinLink.)
- Graphs are executable. Graph vertex types include "plus", "times", "greater than" and many other programming constructs. The resulting graphs encode "abstract syntax trees" and the resulting language is called Atomese. It resembles the intermediate representation commonly found in compilers, except that here it is explicitly exposed to the user as a storable, queryable, manipulable, executable graph.
- Graphs are typed (TypeNode and type constructors.) Graph elements have types, and there are half a dozen type constructors, including types for graphs that are functions. This resembles programming systems that have type constructors, such as Caml or Haskell.
- Graph nodes carry vectors. Values are mutable vectors of data. Each graph element (vertex or edge, node or link) can host an arbitrary collection of Values. That is, each graph element is also a key-value database.
- Graphs specify flows. Values can be static or dynamic. In the dynamic case, a given graph can be thought of as "pipes" or "plumbing"; the Values can "flow" along that graph. For example, the FormulaStream allows numeric vector operations ("formulas") to be defined. Accessing a FormulaStream provides the vector value at that instant.
- Unordered sets (UnorderedLink.) A graph vertex can be an unordered set. (Think of a list of edges, but they are not in any fixed order.) When searching for a matching pattern, one must consider all permutations of the set. This is easy if the search has only one unordered set. It is hard if they are nested and inter-linked: it becomes a constraint-satisfaction problem. The AtomSpace pattern engine handles all of these cases correctly.
- Alternative sub-patterns (ChoiceLink.) A search query can include a menu of sub-patterns to be matched. Such sets of alternatives can be nested and composed arbitrarily. (i.e. they can contain variables, etc.)
- Globby matching (GlobNode.) One can match zero, one or more subgraphs with globs. This is similar to the idea of globbing in a regex. Thus, a variable need not be grounded by only one subgraph: a variable can be grounded by an indeterminate range of subgraphs.
- Quotations (QuoteLink.) Executable graphs can be quoted. This is similar to quotations in functional programming languages. In this case, it allows queries to search for other queries, without triggering the query that was searched for. Handy for rule-engines that use rules to find other rules.
- Negation as failure (AbsentLink.) Reject matches to subgraphs having particular sub-patterns in them. That is, find all graphs of some shape, except those having these other sub-shapes.
- For-all predicate (AlwaysLink.) Require that all matches contain a particular subgraph or satisfy a particular predicate. For example: find all baskets that have only red balls in them. This requires not only finding the baskets and making sure they have balls in them, but also testing each and every ball in a basket to make sure they are all red.
- Frames (ChangeSets.) Store a sequence of graph rewrites and changes of values as a single changeset. The database itself is a collection of such changesets, or "Frames". Very roughly, a changeset resembles a git commit, but for the graph database. The word "Frame" is meant to evoke the idea of a stack frame, or a Kripke frame: the graph state, at this moment. By storing frames, it is possible to revert to an earlier graph state. It is possible to compare different branches and to explore different rewrite histories starting from the same base graph. Different branches may be merged, forming a set-union of their contents. This is useful for inference and learning algorithms, which explore long chains of large, complex graph rewrites.
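The inverted search in the DualLink item above can be illustrated with a toy. The "questions" are stored as ordinary tuples, and we ask which of them a given "answer" satisfies. This is a hypothetical sketch, not how DualLink is implemented or called. The query names and the `inverted_search` helper are made up for illustration.

```python
# Toy inverted search in the spirit of DualLink (hypothetical, not the real API).
# Queries are stored as ordinary data; we ask which of them an answer satisfies.

def is_var(x):
    return isinstance(x, str) and x.startswith("$")

def match(pattern, target, bindings):
    """Return bindings extended so that pattern equals target, or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == target else None
        return {**bindings, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if len(pattern) != len(target):
            return None
        for p, t in zip(pattern, target):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == target else None

# Stored "questions": each is just a tuple, so it can live in the database.
stored_queries = {
    "greeting": ("Says", "$who", "hello"),
    "anything-said": ("Says", "$who", "$what"),
    "weather": ("Asks", "$who", "weather"),
}

def inverted_search(queries, answer):
    """Return the names of every stored query that `answer` would satisfy."""
    return {name for name, pattern in queries.items()
            if match(pattern, answer, {}) is not None}

hits = inverted_search(stored_queries, ("Says", "alice", "hello"))
```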
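The meet and join searches from the list above differ in which direction they look:
- A meet fills in a blank inside a pattern.
- A join finds everything that contains a given piece.

This toy sketch shows both over nested tuples. The `meet`, `contains` and `join` helpers are hypothetical and only mimic the idea behind MeetLink and JoinLink.

```python
# Toy meet and join (hypothetical helpers, not MeetLink/JoinLink themselves).

def meet(store, pattern):
    """Fill in the blank: every value that can stand in for "$x" in a flat pattern."""
    i = pattern.index("$x")
    return {atom[i] for atom in store
            if len(atom) == len(pattern)
            and all(p == a for k, (p, a) in enumerate(zip(pattern, atom)) if k != i)}

def contains(atom, sub):
    """True if `sub` occurs anywhere inside `atom`, at any depth."""
    if atom == sub:
        return True
    return isinstance(atom, tuple) and any(contains(child, sub) for child in atom)

def join(store, sub):
    """What contains this? Every other stored atom that has `sub` inside it."""
    return {atom for atom in store if atom != sub and contains(atom, sub)}

fact = ("Eats", "cat", "fish")
store = {fact, ("Eats", "dog", "meat"), ("Says", "alice", fact)}
```

Here `meet(store, ("Eats", "$x", "fish"))` answers "who eats fish?". `join(store, fact)` answers "what mentions this fact?".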
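The "graphs are executable" item can be pictured with a tiny evaluator over nested tuples whose vertex types are "Plus", "Times" and "GreaterThan". This is a hypothetical sketch of the idea only. Real Atomese has many more types, and its own execution machinery is not shown here.

```python
# Toy evaluator for executable graphs (hypothetical; real Atomese is far richer).
import operator
from functools import reduce

def execute(expr):
    """Evaluate a nested-tuple expression such as ("Plus", ("Number", 2), ...)."""
    kind, *args = expr
    if kind == "Number":
        return args[0]
    values = [execute(a) for a in args]
    if kind == "Plus":
        return reduce(operator.add, values)
    if kind == "Times":
        return reduce(operator.mul, values)
    if kind == "GreaterThan":
        return values[0] > values[1]
    raise ValueError(f"unknown expression type: {kind}")

# The program is an ordinary data structure: it can be stored, searched and
# rewritten like any other graph before (or instead of) being executed.
program = ("GreaterThan",
           ("Plus", ("Number", 2), ("Times", ("Number", 3), ("Number", 4))),
           ("Number", 10))
result = execute(program)
```

Because `program` is plain data, the same tuple could also be handed to a pattern matcher as a search target. That dual role is the point of exposing the syntax tree as a graph.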
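The two Values items above describe two behaviors:
- Every atom carries its own key-value table.
- Some values are dynamic, recomputed when read, which is the role the FormulaStream plays.

This toy sketch shows both behaviors. `ValueTable`, `FormulaValue`, `put` and `read` are hypothetical names, not the AtomSpace Values API.

```python
# Toy per-atom key-value store with a dynamic value (hypothetical names throughout).
from typing import Callable, Dict, List

class FormulaValue:
    """Stand-in for a dynamic value: its formula runs again on every read."""
    def __init__(self, formula: Callable[[], List[float]]):
        self.formula = formula

class ValueTable:
    """Each atom owns its own key -> value table."""
    def __init__(self):
        self.tables: Dict[tuple, Dict[str, object]] = {}

    def put(self, atom, key, value):
        self.tables.setdefault(atom, {})[key] = value

    def read(self, atom, key):
        value = self.tables[atom][key]
        # Static values are returned as-is; dynamic ones are recomputed now.
        return value.formula() if isinstance(value, FormulaValue) else value

values = ValueTable()
sensor = ("Concept", "sensor")
values.put(sensor, "reading", [1.0, 2.0, 3.0])
values.put(sensor, "doubled",
           FormulaValue(lambda: [2 * x for x in values.read(sensor, "reading")]))
```

Changing `"reading"` and then reading `"doubled"` again yields the new doubled vector, because the value flows from its input at the moment it is accessed.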
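The UnorderedLink item notes that nested, inter-linked unordered sets turn matching into a constraint-satisfaction problem. This toy matcher treats a tuple of type "Set" as unordered and tries every permutation of its members. It yields all consistent groundings, not just the first. It is a hypothetical sketch using brute-force permutations, which is fine for tiny sets. It makes no claim about how the AtomSpace engine prunes this search.

```python
# Toy matcher with unordered links (hypothetical; not the AtomSpace engine).
# A tuple whose type is "Set" is unordered; strings starting with "$" are variables.
from itertools import permutations

def is_var(x):
    return isinstance(x, str) and x.startswith("$")

def match_all(pattern, target, bindings):
    """Yield every binding dict under which pattern matches target."""
    if is_var(pattern):
        if pattern not in bindings:
            yield {**bindings, pattern: target}
        elif bindings[pattern] == target:
            yield bindings
        return
    if not (isinstance(pattern, tuple) and isinstance(target, tuple)):
        if pattern == target:
            yield bindings
        return
    if len(pattern) != len(target) or pattern[0] != target[0]:
        return
    children = target[1:]
    # An unordered set must be tried against every ordering of its members.
    orderings = permutations(children) if pattern[0] == "Set" else [children]
    for order in orderings:
        yield from match_seq(pattern[1:], order, bindings)

def match_seq(patterns, targets, bindings):
    """Match children pairwise, threading bindings from left to right."""
    if not patterns:
        yield bindings
        return
    for b in match_all(patterns[0], targets[0], bindings):
        yield from match_seq(patterns[1:], targets[1:], b)

# Two nested sets that share the variable $y: a small constraint problem.
pattern = ("List", ("Set", "$x", "$y"), ("Set", "$y", "c"))
target = ("List", ("Set", "a", "b"), ("Set", "c", "b"))
results = list(match_all(pattern, target, {}))
```

The first set alone allows both `$y = "a"` and `$y = "b"`. Only `$y = "b"` also satisfies the second set, so a single grounding survives.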
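The ChoiceLink item describes a menu of alternative sub-patterns, which may themselves contain variables. In this toy sketch, a hypothetical "Choice" tuple matches whenever any of its alternatives matches, and every matching alternative contributes groundings. The tuple encoding and helper names are assumptions made for illustration.

```python
# Toy matcher with alternatives, in the spirit of ChoiceLink (hypothetical).

def is_var(x):
    return isinstance(x, str) and x.startswith("$")

def match_all(pattern, target, bindings):
    """Yield every binding dict under which pattern matches target."""
    if isinstance(pattern, tuple) and pattern[0] == "Choice":
        # Try each alternative; every one that matches contributes results.
        for alternative in pattern[1:]:
            yield from match_all(alternative, target, bindings)
        return
    if is_var(pattern):
        if pattern not in bindings:
            yield {**bindings, pattern: target}
        elif bindings[pattern] == target:
            yield bindings
        return
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if len(pattern) == len(target):
            yield from match_seq(pattern, target, bindings)
        return
    if pattern == target:
        yield bindings

def match_seq(patterns, targets, bindings):
    """Match children pairwise, threading bindings from left to right."""
    if not patterns:
        yield bindings
        return
    for b in match_all(patterns[0], targets[0], bindings):
        yield from match_seq(patterns[1:], targets[1:], b)

store = [("Eats", "cat", "fish"), ("Eats", "bob", ("Cooked", "rice")),
         ("Eats", "dog", "bone")]
# Who eats either fish, or something cooked (and what was cooked)?
pattern = ("Eats", "$who", ("Choice", "fish", ("Cooked", "$food")))
results = [b for atom in store for b in match_all(pattern, atom, {})]
```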
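The GlobNode item says a variable may be grounded by a whole run of subgraphs rather than exactly one. This toy sketch works over a flat sequence:
- `"$x"` grounds exactly one item.
- `"@g"` grounds a run of zero or more items.

The `@` prefix convention and the `glob_match` name are hypothetical, chosen only for this example.

```python
# Toy glob matching over a flat sequence, in the spirit of GlobNode (hypothetical).
# "$x" grounds exactly one item; "@g" grounds a run of zero or more items.

def glob_match(pattern, items, bindings=None):
    """Yield every binding dict under which the pattern covers all of items."""
    bindings = {} if bindings is None else bindings
    if not pattern:
        if not items:
            yield bindings
        return
    head, rest = pattern[0], pattern[1:]
    if head.startswith("@"):
        # A glob may swallow 0, 1, 2, ... items; try every length.
        for n in range(len(items) + 1):
            run = tuple(items[:n])
            if bindings.get(head, run) == run:
                yield from glob_match(rest, items[n:], {**bindings, head: run})
    elif items:
        if head.startswith("$"):
            if bindings.get(head, items[0]) == items[0]:
                yield from glob_match(rest, items[1:], {**bindings, head: items[0]})
        elif head == items[0]:
            yield from glob_match(rest, items[1:], bindings)

sentence = ("I", "really", "like", "you")
```

For example, `("I", "@what", "you")` matches `sentence` with `"@what"` grounded to `("really", "like")`. The same pattern matches `("I", "you")` with an empty run.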
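The QuoteLink item describes searching for queries without treating their variables as live. In this toy sketch, a hypothetical "Quote" wrapper makes the matcher compare a sub-pattern literally. A search can then find the stored query that literally contains the variable `"$x"`, while other parts of the pattern stay variable. The tuple encoding is an assumption for illustration.

```python
# Toy quotation in the spirit of QuoteLink (hypothetical; not the real API).

def is_var(x):
    return isinstance(x, str) and x.startswith("$")

def match(pattern, target, bindings):
    """Return bindings extended so that pattern equals target, or None."""
    if isinstance(pattern, tuple) and pattern[0] == "Quote":
        # A quoted sub-pattern is compared literally: nothing inside it binds.
        return bindings if pattern[1] == target else None
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == target else None
        return {**bindings, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if len(pattern) != len(target):
            return None
        for p, t in zip(pattern, target):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == target else None

def search(pattern, store):
    """Return the groundings of pattern against every atom in store."""
    results = []
    for atom in store:
        b = match(pattern, atom, {})
        if b is not None:
            results.append(b)
    return results

store = [
    ("Eats", "$x", "fish"),   # a stored query (it contains a variable)
    ("Eats", "cat", "fish"),  # a plain fact
    ("Eats", "$y", "milk"),   # another stored query
]
plain = search(("Eats", "$who", "$food"), store)
quoted = search(("Eats", ("Quote", "$x"), "$food"), store)
```

The unquoted search grounds `$who` against all three atoms. The quoted search finds only the stored query containing the literal variable `$x`.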
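The AbsentLink item describes negation as failure: keep a grounding only if some other clause cannot be satisfied under it. This toy sketch shows the idea with a single positive clause and a single absent clause. The `query_with_absent` helper and the tuple encoding are hypothetical.

```python
# Toy negation as failure, in the spirit of AbsentLink (hypothetical).

def is_var(x):
    return isinstance(x, str) and x.startswith("$")

def match(pattern, target, bindings):
    """Return bindings extended so that pattern equals target, or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == target else None
        return {**bindings, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if len(pattern) != len(target):
            return None
        for p, t in zip(pattern, target):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == target else None

def query_with_absent(store, present, absent):
    """Ground `present`, then drop any grounding under which `absent` also matches."""
    results = []
    for atom in store:
        b = match(present, atom, {})
        if b is None:
            continue
        # Negation as failure: if the absent clause can be grounded, reject.
        if any(match(absent, other, b) is not None for other in store):
            continue
        results.append(b)
    return results

store = [("Pet", "rex"), ("Pet", "tom"), ("Bites", "rex")]
safe = query_with_absent(store, ("Pet", "$x"), ("Bites", "$x"))
```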
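The AlwaysLink item's basket example can be written out directly. Finding a basket that holds balls is not enough; every ball in it must pass the predicate. This is a hypothetical, special-purpose sketch of the for-all idea over a few tuples, not a general implementation of AlwaysLink.

```python
# Toy for-all query in the spirit of AlwaysLink (hypothetical).
store = [
    ("In", "basket1", "ball1"), ("In", "basket1", "ball2"),
    ("In", "basket2", "ball3"), ("In", "basket2", "ball4"),
    ("Color", "ball1", "red"), ("Color", "ball2", "red"),
    ("Color", "ball3", "red"), ("Color", "ball4", "blue"),
]

def baskets_where_always(store, predicate):
    """Baskets that hold at least one ball, where every ball satisfies predicate."""
    contents = {}
    for kind, basket, ball in store:
        if kind == "In":
            contents.setdefault(basket, []).append(ball)
    colors = {ball: color for kind, ball, color in store if kind == "Color"}
    # Finding a basket is not enough: each and every ball in it is tested.
    return {basket for basket, balls in contents.items()
            if all(predicate(colors.get(ball)) for ball in balls)}

all_red = baskets_where_always(store, lambda color: color == "red")
```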
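The Frames item compares changesets to git commits. In this toy sketch, each hypothetical `Frame` records atoms added and removed relative to a parent, and the graph state is recomputed by walking the chain. The sketch shows three operations:
- Reverting means going back to an older frame.
- Branching means committing twice on the same parent.
- Merging is the set-union described above.

The class and method names are illustrative, not the AtomSpace frame API.

```python
# Toy frames/changesets (hypothetical; not the AtomSpace API).
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Frame:
    """A changeset: atoms added and removed relative to a parent frame."""
    parent: Optional["Frame"] = None
    added: FrozenSet = frozenset()
    removed: FrozenSet = frozenset()

    def state(self) -> frozenset:
        # Replay the chain of changesets from the root up to this frame.
        base = self.parent.state() if self.parent else frozenset()
        return (base - self.removed) | self.added

    def commit(self, added=(), removed=()) -> "Frame":
        return Frame(self, frozenset(added), frozenset(removed))

def merge(a: Frame, b: Frame) -> Frame:
    # Merge two branches by taking the set-union of their contents.
    return a.commit(added=b.state())

base = Frame().commit(added={"a", "b"})
branch1 = base.commit(added={"c"})
branch2 = base.commit(added={"d"}, removed={"a"})
merged = merge(branch1, branch2)
```

Both branches start from the same `base`, which is never modified, so "reverting" is just reading `base.state()` again.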
What it Isn't
Newcomers often struggle with the AtomSpace, because they bring preconceived notions of what they think it should be, and it's not that. So, here are a few things it is not.
- It's not JSON. JSON is a perfectly good way of representing structured data. JSON records data as key:value pairs, arranged hierarchically, with braces, or as lists, with square brackets. The AtomSpace is similar, except that there are no keys! The AtomSpace still organizes data hierarchically, and provides lists, but all entries are anonymous, nameless. Why? There are performance (CPU and RAM usage) and other design benefits in not using explicit named keys in the data structure. You can still have named values; it is just that they are not required. There are several different ways of importing JSON data into the AtomSpace. If your mental model of "data" is JSON, then you will be confused by the AtomSpace.
- It's not SQL. It's also not noSQL. Databases from 50 years ago organized structured data into tables, where the key is the label of a column, and different values sit in different rows. This is more efficient than JSON when you have many rows: you don't have to store the same key over and over again for each row. Of course, tabular data is impractical if you have zillions of tables, each with
