Acropolis
The Research & Education Space software stack.
![Current build status][travis] ![Apache 2.0 licensed][license] ![Implemented in C][language] ![Follow @RES_Project][twitter]
This “umbrella” project has been assembled to make it easier to maintain and run tests on all of the individual components which make up the Acropolis stack.
This software was developed as part of the Research & Education Space project and is actively maintained by a development team within BBC Design and Engineering. We hope you’ll find this project useful!
Table of Contents
- Requirements
- Using Acropolis
- Bugs and feature requests
- Building from source
- Automated builds
- Contributing
- Information for BBC Staff
- License
Requirements
In order to build Acropolis in its entirety, you will need:—
- A working build environment, including a C compiler such as Clang, along with Autoconf, Automake and Libtool
- The Jansson JSON library
- libfcgi
- The client libraries for PostgreSQL and/or MySQL; it may be possible to use MariaDB’s libraries in place of MySQL’s, but this is untested
- libcurl
- libxml2
- The Redland RDF library (a more up-to-date version than your operating system provides may be required for some components to function correctly)
Optionally, you may also wish to install:—
- Qpid Proton, if you wish to use AMQP messaging
- CUnit, to be able to compile some of the tests
- xsltproc and the DocBook 5 XSL Stylesheets
On a Debian-based system, the following should install all of the necessary dependencies:
$ sudo apt-get install -qq libjansson-dev libmysqlclient-dev libpq-dev libqpid-proton-dev libcurl4-gnutls-dev libxml2-dev librdf0-dev libltdl-dev uuid-dev libfcgi-dev automake autoconf libtool pkg-config libcunit1-ncurses-dev build-essential clang xsltproc docbook-xsl-ns
Acropolis has not yet been ported to non-Unix-like environments, and on macOS it will install as shared libraries rather than as a framework.
Much of it ought to build inside Cygwin on Windows, but this is untested.
Contributions for building properly with Visual Studio or Xcode, and so on, are welcome (provided they do not significantly complicate the standard build logic).
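A typical Autotools build of the umbrella project looks like the sketch below, wrapped in a shell function so the steps are easy to adapt. The repository URL, the use of `--recursive` for submodules, and the `/opt/res` prefix are assumptions based on the defaults mentioned elsewhere in this document; adjust them to your checkout.

```shell
# Sketch only: build and install the Acropolis stack from source.
build_acropolis() {
    git clone --recursive https://github.com/bbcarchdev/acropolis.git &&
    cd acropolis &&
    autoreconf -fi &&                 # generate the configure script
    ./configure --prefix=/opt/res &&  # match the default installation prefix
    make &&
    make check &&                     # run the component test suites
    sudo make install
}
```

`make check` exercises the tests this umbrella repository exists to run; skip it if you only need the installed artefacts.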
Using Acropolis
Once you have built and installed the Acropolis stack, you probably want to do something with it.
Acropolis consists of a number of different individual components, including libraries, command-line tools, web-based servers, and back-end daemons. They are:—
- liburi: a library for parsing URIs
- libsql: a library for accessing SQL databases
- liblod: a library for interacting with Linked Data servers
- libmq: a library for interacting with message queues
- libcluster: a library for implementing load-balancing clusters
- libawsclient: a library for accessing services which use Amazon Web Services authentication
- Anansi: a web crawler
- Twine: an RDF workflow engine
- Quilt: a Linked Data web server (via FastCGI)
- Spindle: a Linked Data indexing engine
- twine-anansi-bridge: a plug-in module for Twine which allows it to retrieve resources for processing from Anansi’s cache
Note that this repository exists for development and testing purposes only: in a production environment, each component is packaged and deployed individually.
Components
Anansi
Anansi is a web crawler. It uses a relational database to track URLs that will be fetched, their status, and cache IDs. Anansi can operate in resizeable clusters of up to 256 nodes via libcluster.
Anansi has the notion of a processor: a named implementation of the "business logic" of evaluating resources that have been retrieved and using them to add new entries to the queue.
In the Research & Education Space, Anansi is configured to use the lod (Linked Open Data) processor, which:
- Only accepts resources which can be parsed by librdf
- Can apply licensing checks (based upon a configured list)
- Adds any URIs that it finds in retrieved RDF to the crawl queue (allowing spidering to actually occur)
Twine
Twine is a modular RDF-oriented processing engine. It can be configured to do a number of different things, but its main purpose is to fetch some data, convert it if necessary, perform some processing, and then put it somewhere.
Twine can operate as a daemon, which will continuously fetch data to process from a queue of some kind (see libmq), or it can be invoked from the command-line to ingest data directly from a file.
Twine is extended through two different kinds of loadable modules (which reside in ${libdir}/twine, by default /opt/res/lib/twine):
- Handlers are responsible for taking some input and populating an RDF model based upon it. The mechanism is flexible enough to allow, for example, the input data to be a set of URLs which the module should fetch and parse, but it's also used for data conversion.
- Processors are modules which can manipulate the RDF model in some way, including dealing with storage and output.
Twine ships with a number of modules for interacting with SPARQL servers, ingesting XML data via XSLT transforms, and parsing and outputting RDF. More information can be found in the Twine README.
Twine is always configured with a workflow: a list of processors which should be invoked in turn for each item of data being processed. Like all configuration options, the workflow can be specified on the command-line.
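As a sketch of what such a workflow might look like, the fragment below writes a minimal Twine configuration naming the three Spindle processors described later in this document. The `[twine]` section and `workflow` key names are assumptions — consult the Twine README and the annotated sample configuration for the authoritative syntax — and the file is written to the current directory purely for illustration; in a deployment it would live under `${sysconfdir}` (by default /opt/res/etc).

```shell
# Illustrative only: section and key names are assumptions.
cat > twine.conf <<'EOF'
[twine]
; Processors are invoked in order for each item of data being processed
workflow=spindle-strip,spindle-correlate,spindle-generate
EOF
```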
In the Research & Education Space, the Spindle project provides additional Twine modules which implement the key logic of the platform.
Quilt
Quilt is a Linked Data server designed to efficiently serve RDF data in a variety of serialisations, including templated HTML. Like Twine, Quilt is modular (see ${libdir}/quilt), and in particular modules are used to provide engine implementations—these are the code responsible for populating an RDF model based upon the request parameters (Quilt itself then handles the serving of that model). The Spindle project includes a Quilt module which implements the Research & Education Space public API.
Spindle
Spindle is the core of the Research & Education Space. It includes three processor modules for Twine:
- spindle-strip uses a rulebase to decide which triples in an RDF model should be retained and which should be discarded.
- spindle-correlate processes graphs in conjunction with a SPARQL server (and optionally a PostgreSQL database) in order to aggregate co-references: the result is that distinct RDF descriptions of the same things are clustered together.
- spindle-generate performs indexing of RDF data, including dealing with media resource licensing, and is responsible for generating "proxy" sets of triples which summarise the Research & Education Space's interpretation of the various pieces of source data about each thing they describe.
It also includes a module for Quilt, which uses the data from spindle-correlate and spindle-generate in order to provide the Research & Education Space API.
Running the stack
Annotated configuration files are provided in the config directory which should help get you started. By default, the components expect to find these files in /opt/res/etc, but this can be altered by specifying the --prefix or --sysconfdir options when invoking the top-level configure script.
Requirements
You will need:
- A PostgreSQL database server, with databases for Anansi and Spindle (you do not need to define any tables or views; the schemas will be generated automatically)
- A RADOS or FakeS3 bucket for Anansi and one for Spindle. Note that, pending resolution of libawsclient#1, you can no longer use a real Amazon S3 bucket for storage.
- A SPARQL server, such as 4store
In production, the Research & Education Space uses PostgreSQL, RADOS, and 4store. It has been successfully used in development environments with FakeS3 and alternative SPARQL servers.
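The database side of these requirements can be provisioned along the lines below. The database names are hypothetical (check each component's configuration for the names it expects), and — as noted above — the daemons create their own tables on first run, so only empty databases are needed. Sketched as a function so the steps can be adapted:

```shell
# Hypothetical bootstrap: database names are illustrative only.
provision_backends() {
    createdb anansi &&   # empty PostgreSQL database for the crawler
    createdb spindle     # empty PostgreSQL database for the indexer
    # Buckets (RADOS or FakeS3) and a SPARQL server such as 4store must be
    # provisioned separately with your storage tooling of choice.
}
```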
Running Anansi
Important! Do not run the Anansi daemon (crawld) without first carefully checking the configuration to ensure that it doesn’t simply start crawling the web unchecked. If you're using the `l
