Relex
English Dependency Relationship Extractor
RelEx Semantic Relation Extractor
Version 1.6.3 circa 2016
RelEx is a dependency parser for the English language. It extracts dependency relations from Link Grammar, and adds some shallow semantic analysis. The primary use of RelEx is as a language input front-end to the OpenCog artificial general intelligence system.
There are multiple inter-related parts to RelEx. The core component extracts the dependency relationships. An experimental module provides some simple anaphora resolution suggestions. Output is provided in various formats, including one format suitable for later batch post-processing, another format suitable for input to OpenCog, and a W3C OWL format. There is also a small assortment of Perl scripts for cleaning up web and wiki pages, etc.
The main RelEx website is at
http://wiki.opencog.org/w/RelEx
It provides an overview of the project, as well as detailed documentation.
The source code management system is at
http://github.com/opencog/relex
Source tarballs may be downloaded from either of two locations:
https://launchpad.net/relex/+download
http://www.abisource.com/downloads/link-grammar/relex/
Build and install of the core package is discussed below.
Running the Relex Servers
Run via Docker
The easiest way to run RelEx is with Docker. The Docker system allows sandboxed containers to be easily created and deployed; the typical use of a container is to run some server. See the http://www.docker.io website for more info and tutorials.
OpenCog provides prebuilt RelEx images under the image tag opencog/relex.
Running the Plain-text Server
To have docker run the plain text server, type into a terminal:
$ docker run -it -p 3333:3333 opencog/relex /bin/sh plain-text-server.sh
To test the plain text server via telnet, type into another terminal:
telnet localhost 3333
This is a test sentence!
The server will return a plain-text analysis of the input sentence and disconnect the session.
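Telnet is convenient for a quick check; from Java, the same exchange can be scripted with a plain socket. The sketch below is an illustration only and is not part of RelEx: the class name RelexPlainTextClient is hypothetical, and it assumes (based on the telnet session above) that the server reads one newline-terminated sentence, replies in UTF-8, and closes the connection when it is done.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RelexPlainTextClient {

    /**
     * Sends one sentence to a RelEx server and returns everything the server
     * writes back before it closes the connection.
     */
    public static String parse(String host, int port, String sentence) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            // Send the sentence as a single line, just as typing it into telnet would.
            out.println(sentence);
            // Collect the reply line by line; readLine() returns null once the server disconnects.
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                reply.append(line).append('\n');
            }
            return reply.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the Docker container above is running and publishing port 3333.
        System.out.print(parse("localhost", 3333, "This is a test sentence!"));
    }
}
```

Since the OpenCog-format server described next is also driven by a plain sentence, the same client should work against it by passing port 4444 instead of 3333.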
Running the OpenCog-format Server
To have docker run the OpenCog format server, type:
$ docker run -it -p 4444:4444 opencog/relex /bin/sh opencog-server.sh
To test the OpenCog format server via telnet, type into another terminal:
telnet localhost 4444
This is a test sentence!
The server will return an OpenCog/Scheme version of the parse and disconnect the session.
Running the raw Link Grammar Server
To have docker run the raw link-grammar JSON-format server, type:
$ docker run -it -p 9000:9000 opencog/relex /bin/sh link-grammar-server.sh
You can now access the relex server with telnet.
The raw link-grammar server expects JSON-formatted input, beginning
with the five characters "text:"; it returns a JSON-formatted response.
To test the link-grammar JSON format server via telnet, type into another terminal:
telnet localhost 9000
text:This is a test sentence!
This will return a JSON formatted parse and then disconnect the session.
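The "text:" prefix is easy to forget when driving this server from code. The following Java sketch (hypothetical class LinkGrammarJsonClient, not part of RelEx) builds the request line and returns the raw reply. It assumes the server reads a single line per connection and closes the socket after replying, and it does not parse the JSON; hand the returned string to whatever JSON library your project already uses. InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LinkGrammarJsonClient {

    /** Builds the request: the literal prefix "text:" followed by the sentence on one line. */
    public static String buildRequest(String sentence) {
        // Assumption: the server reads exactly one line, so embedded line breaks are collapsed.
        String oneLine = sentence.replaceAll("[\\r\\n]+", " ").trim();
        return "text:" + oneLine + "\n";
    }

    /** Sends the request and returns the raw JSON text written before the server disconnects. */
    public static String query(String host, int port, String sentence) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(sentence).getBytes(StandardCharsets.UTF_8));
            out.flush();
            InputStream in = socket.getInputStream();
            // readAllBytes() blocks until end of stream, i.e. until the server closes the session.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the link-grammar server container is publishing port 9000.
        System.out.println(query("localhost", 9000, "This is a test sentence!"));
    }
}
```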
Docker Cheat-Sheet
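A docker cheat-sheet:
docker ps                    # list running containers
docker ps -a                 # list all containers, including stopped ones
docker rm <container-id>     # remove a stopped container
docker images                # list locally available images
docker rmi <image-id>        # remove an image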
Installation
Installing on Ubuntu/Debian
An installation script for Ubuntu/Debian is provided in the install-scripts directory.
Installing on all other systems
For other systems, follow the instructions below. To build and use RelEx, the following packages should be installed:
- libgetopt-java (GNU getopt)
- Link Parser
- WordNet 3.0
- JWNL Java wordnet library
- OpenNLP tools (optional, but recommended)
- W3C OWL (optional)
Pre-requisite dependencies
The following packages are required pre-requisites for building RelEx.
- Link Grammar Parser. Compile and install the Link Grammar Parser. This parser is described at
  http://abisource.com/projects/link-grammar/
  and sources are available for download at
  http://www.abisource.com/projects/link-grammar/#download
  Link Grammar version 5.2.1 or later is needed to obtain a variety of required fixes.
  The Link Grammar Parser is the underlying engine, providing the core sentence-parsing ability.
  If the parser is not installed in the default location, be sure to modify
  -Djava.library.path appropriately in relation-extractor.sh and other shell scripts.
- GNU getopt. This is a standard command-line option parsing library. For Ubuntu, install the
  libgetopt-java package.
- WordNet. WordNet is used by RelEx to provide basic English morphology analysis, such as singular forms of (plural) nouns, base forms (lemmas) of adjectives and adverbs, and infinitive forms of verbs.
  Download, unpack and install WordNet 3.0. The install directory needs to be specified in
  data/wordnet/file_properties.xml, using the name="dictionary_path" property in this file. Some typical install locations are:
    /opt/WordNet-3.0/data for RedHat and SuSE
    /usr/share/wordnet for Ubuntu and Debian
    C:\Program Files\WordNet\3.0\data for Windows
  The relex/Morphy/Morphy.java class provides a simple, easy-to-use wrapper around WordNet, providing the needed word-morphology info.
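To double-check the WordNet setting, the hypothetical helper below (WordNetPathCheck, not part of RelEx) reads file_properties.xml with the JDK's built-in DOM parser and inspects the configured directory. It assumes the property is written as a JWNL-style param element whose name attribute is dictionary_path and whose value attribute holds the directory, and it treats the presence of a file named index.noun as a rough heuristic for a WordNet 3.0 data directory.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WordNetPathCheck {

    /** Returns the value attribute of the first param element named dictionary_path, if any. */
    public static Optional<String> dictionaryPath(File propertiesXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(propertiesXml);
        // getElementsByTagName searches the whole tree, so nested param elements are found too.
        NodeList params = doc.getElementsByTagName("param");
        for (int i = 0; i < params.getLength(); i++) {
            Element param = (Element) params.item(i);
            if ("dictionary_path".equals(param.getAttribute("name"))) {
                return Optional.of(param.getAttribute("value"));
            }
        }
        return Optional.empty();
    }

    /** Rough heuristic (assumption): a WordNet 3.0 data directory contains index.noun. */
    public static boolean looksLikeWordNet(String dir) {
        return Files.isRegularFile(Path.of(dir, "index.noun"));
    }

    public static void main(String[] args) throws Exception {
        File xml = new File(args.length > 0 ? args[0] : "data/wordnet/file_properties.xml");
        Optional<String> path = dictionaryPath(xml);
        if (path.isEmpty()) {
            System.err.println("No dictionary_path parameter found in " + xml);
        } else if (!looksLikeWordNet(path.get())) {
            System.err.println("dictionary_path = " + path.get() + ", but index.noun was not found there");
        } else {
            System.out.println("WordNet data directory looks OK: " + path.get());
        }
    }
}
```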
The following packages are also required pre-requisites for building RelEx. Note that they are installed automatically if Maven is used.
- didion.jwnl. The didion JWNL is the "Java WordNet Library", and provides the Java programming API to access the WordNet data files. Its home page is at
  http://sourceforge.net/projects/jwordnet
  and it can be downloaded from
  http://sourceforge.net/project/showfiles.php?group_id=33824
  Verify that the final installed location of jwnl.jar is correctly specified in the build.xml file. Note that GATE also provides a jwnl.jar, but the GATE version of jwnl.jar is not compatible (welcome to Java DLL hell).
  When copying jwnl.jar, verify the file permissions! Be sure to issue the command chmod 644 jwnl.jar, as otherwise you'll get strange "java cannot unzip jar" error messages.
- Apache Commons Logging. The JWNL package requires that the Apache Commons Logging jar file be installed. In Debian/Ubuntu, this is supplied by the
  libcommons-logging-java package. In RedHat/CentOS systems, the package name is jakarta-commons-logging.
- SLF4J and Logback. RelEx uses SLF4J as a facade for the Logback logging framework. The SLF4J home page is at
  https://www.slf4j.org
  and it can be downloaded from
  https://www.slf4j.org/download.html
  The Logback home page is at
  https://logback.qos.ch
  and it can be downloaded from
  https://logback.qos.ch/download.html
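Where a shell is inconvenient, the chmod 644 step for jwnl.jar, plus a basic readability check, can also be done from Java. The sketch below (hypothetical class JarPermissionFix, not part of RelEx) uses the java.nio POSIX permission API, so it throws UnsupportedOperationException on non-POSIX file systems such as Windows NTFS. Opening the file with java.util.jar.JarFile serves as a stand-in for the JVM's own attempt to read the jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.jar.JarFile;

public class JarPermissionFix {

    /** Equivalent of `chmod 644`: owner read/write, group and others read-only (POSIX only). */
    public static void chmod644(Path jar) throws IOException {
        Files.setPosixFilePermissions(jar, PosixFilePermissions.fromString("rw-r--r--"));
    }

    /** Opens the jar and returns its entry count; throws IOException if it cannot be read. */
    public static int countEntries(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Path.of(args.length > 0 ? args[0] : "jwnl.jar");
        chmod644(jar);
        System.out.println(jar + ": " + countEntries(jar) + " entries readable");
    }
}
```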
Optional packages
The following packages are optional. If they are found, then additional parts of RelEx will be built, enabling additional function.
If you use Maven, these dependencies are already managed.
- OpenNLP. RelEx uses OpenNLP for sentence detection, giving RelEx the ability to find sentence boundaries in free text. If OpenNLP is not found, then the (far) less accurate
  java.text.BreakIterator class is used. Although Oracle documentation states that "Sentence boundary analysis allows selection with correct interpretation of periods within numbers and abbreviations", this is patently false, as it incorrectly breaks the sentence "Dr. Smith is late." into two sentences. Thus, OpenNLP is recommended. The OpenNLP home page is at
  http://opennlp.sourceforge.net/
  Download and install OpenNLP tools, and verify that the installed files are correctly identified in both
  build.xml and relation-extractor.sh. OpenNLP also requires the installation of maxent from
  http://maxent.sourceforge.net/
  You'll need maxent-3.0.0.jar and opennlp-tools-1.5.3.jar. The OpenNLP package is used solely in corpus/DocSplitter.java, which provides a simple, easy-to-use wrapper for splitting a document into sentences. Replace this file if an alternate sentence detector is desired.
- Trove. Some users may require GNU Trove to enable OpenNLP, although this depends on the installed JDK. GNU Trove is an implementation of the java.util class hierarchy, which may or may not be included in the installed JDK. If needed, download Trove from:
  http://trove4j.sourceforge.net/
  Since Trove is optimized, using it may improve performance and/or decrease memory usage, as compared to the standard Sun JDK implementation of the java.util hierarchy.
  IMPORTANT: OpenNLP expects GNU Trove version 1.0, and will not work with version 2.0!
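To see how the BreakIterator fallback behaves on your own JDK, the sketch below (hypothetical class BreakIteratorDemo, not part of RelEx) splits text with BreakIterator.getSentenceInstance(Locale.US). Sentence-break rules can differ between JDK versions and locales, so treat the "Dr. Smith" example above as something to verify locally rather than a guaranteed outcome.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorDemo {

    /** Splits text into sentences using the JDK's rule-based sentence BreakIterator. */
    public static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Each boundary returned by next() closes the segment that began at the previous boundary.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : sentences("Dr. Smith is late.")) {
            System.out.println("[" + s.trim() + "]");
        }
    }
}
```

If the output shows [Dr.] and [Smith is late.] on separate lines, the fallback is exhibiting the problem described above, which is one more reason to install OpenNLP.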
Building
With Maven
Maven manages almost all dependencies automatically. The only exception is the Link Grammar library, which must be added to the local Maven repository manually, using:
mvn install:install-file \
-Dfile=<linkgrammar-jar-folder/linkgrammar.jar> \
-DgroupId=org.opencog \
-DartifactId=linkgrammar \
-Dversion=<linkgrammar.version> \
-Dpackaging=jar
Then you can build and install relex.jar using:
mvn install
Using RelEx
RelEx is expected to be used in one of two different ways: a "batch processing" mode and a "custom Java development" mode.
In the "batch processing mode", RelEx is run once over a large text, and its output is saved to a file. This output can then be post-processed at a later time, to extract desired info. The goal here is to avoid the heavy CPU overhead of re-parsing a large text over and over. Example post-processing scripts are included (described below).
In the "custom Java development" mode, it is assumed that a capable
Java programmer can write new code to interface RelEx to meet their needs.
A good place to start is to review the workings of the output code in
src/java/relex/output/*.java.
The standard RelEx demo output is NOT SUITABLE for post-processing. It is meant to be a human-readable example of what the system gen
