LaMachine
LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
IMPORTANT NOTE: LaMachine is end-of-life and deprecated. There will be no further development and its usage is no longer recommended. See this post for reasons and alternative solutions
LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines.
The software included in LaMachine tends to be highly specialised and generally depends on a lot of other interdependent software. Installing all this software can be a daunting task, and compiling it from scratch even more so. LaMachine attempts to make this process easier by offering pre-built recipes for a wide variety of systems. Whether you are working on your home computer or setting up a dedicated production environment, LaMachine will save you a lot of work.
We address various audiences; the bulk of the software is geared towards data scientists who are not afraid of the command line and some programming. We give you the instruments and it is up to you to wield them. However, we also attempt to accommodate researchers who require more high-level interfaces by incorporating webservices and websites that expose some of the functionality to a larger audience.
Installation
A) Guided installation with custom build option (recommended)
To build your own LaMachine instance, in any of the possible flavours, or to download a pre-built image, open a terminal on your Linux, BSD or macOS system and run the following command:
bash <(curl -s https://raw.githubusercontent.com/proycon/LaMachine/master/bootstrap.sh)
The script asks you a number of questions about how you would like your LaMachine installation set up. It lets you include precisely the software you want or need and ensures that everything is up to date. A screenshot is shown at the end of this subsection.
Are you on Windows 10 or 2016? We do not support Windows natively, so you need to run this command in the Windows Subsystem for Linux. To do this, first install the Linux Subsystem with a distribution of your choice (we recommend Ubuntu) from the Microsoft Store. Follow the instructions here. Alternatively, you may want to opt for a pre-built Virtual Machine image as explained in installation path C.
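To confirm that a terminal is really running inside the Linux subsystem rather than in a native Windows shell, one common heuristic is to look for "microsoft" in the kernel version string. The sketch below is only that, a heuristic; the function names are illustrative and not part of LaMachine.

```shell
# Heuristic check: kernels under the Windows Linux subsystem typically
# mention "Microsoft" or "microsoft" in their version string.
# Function names are hypothetical helpers, not LaMachine commands.
is_wsl_version() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *microsoft*) return 0 ;;
        *) return 1 ;;
    esac
}

# Apply the check to the running kernel (Linux only; /proc/version must exist).
running_in_wsl() {
    [ -r /proc/version ] && is_wsl_version "$(cat /proc/version)"
}

# Usage (not executed here):
#   running_in_wsl && echo "Inside the Linux subsystem, the bootstrap command can be used"
```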
Building LaMachine can take quite some time, depending on your computer's resources, your internet connection, and the amount of software you selected to install. Half an hour to an hour is a normal build time. The bootstrap script also offers the option to download pre-built images instead (installation paths B and C).
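If you would rather read the bootstrap script before executing it, one option is to download it to a file first instead of piping it straight into bash. The sketch below assumes curl is available; the function name fetch_bootstrap and the default target path are illustrative and not part of LaMachine.

```shell
# URL of the bootstrap script, as given in the command above.
BOOTSTRAP_URL="https://raw.githubusercontent.com/proycon/LaMachine/master/bootstrap.sh"

# Download bootstrap.sh to a local file and print the path it was saved to.
# fetch_bootstrap is a hypothetical helper name.
fetch_bootstrap() {
    target="${1:-./bootstrap.sh}"
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
    curl -fsSL "$BOOTSTRAP_URL" -o "$target" || return 1
    printf '%s\n' "$target"
}

# Usage (not executed here):
#   script="$(fetch_bootstrap /tmp/lamachine-bootstrap.sh)" && less "$script" && bash "$script"
```

Running the saved file with bash should behave the same as the one-line command, since both execute the same script.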

B) Pre-built container image for Docker
We regularly build a basic LaMachine image and publish it to Docker Hub. Installation path A above also offers access to this image, but you may opt to fetch it directly.
To download and use it, run:
docker pull proycon/lamachine
docker run -p 8080:80 -h latest -t -i proycon/lamachine
This requires you to already have Docker installed and running on your system.
The pre-built image contains the stable version with only a basic set of common software rather than the full set. Run lamachine-add inside the container to select extra software to install. Alternatively, other specialised LaMachine builds may be available on Docker Hub.
If you want another release, specify its tag explicitly:
docker pull proycon/lamachine:develop
docker run -p 8080:80 -h develop -t -i proycon/lamachine:develop
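Both pairs of commands follow the same pattern: -p 8080:80 maps port 80 inside the container to port 8080 on the host, -h sets the container's hostname, and -t -i attach an interactive terminal. As a rough sketch, the pattern can be wrapped in a helper that prints the commands for a given tag. The function name lamachine_docker_cmds and its optional host-port argument are illustrative, and it spells out the latest tag explicitly where the commands above leave it implicit.

```shell
# Print (without running) the docker commands for a LaMachine image tag.
# lamachine_docker_cmds is a hypothetical helper, not a LaMachine command.
#   $1: image tag (default: latest)
#   $2: host port to map to container port 80 (default: 8080)
lamachine_docker_cmds() {
    tag="${1:-latest}"
    port="${2:-8080}"
    printf 'docker pull proycon/lamachine:%s\n' "$tag"
    printf 'docker run -p %s:80 -h %s -t -i proycon/lamachine:%s\n' "$port" "$tag" "$tag"
}

# Usage (not executed here):
#   lamachine_docker_cmds develop        # show the commands
#   lamachine_docker_cmds develop | sh   # run them
```

Printing instead of executing keeps the helper safe to try out; choosing a different host port is useful when 8080 is already taken on your machine.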
C) Pre-built Virtual Machine image for Vagrant (recommended for Windows users)
We regularly build a basic LaMachine image and publish it to the Vagrant Cloud. The above installation path A also offers (simplified) access to this (except on Windows), but you may opt to do it directly.
To download and use a LaMachine prebuilt image:
- Ensure you have Vagrant and VirtualBox installed on your system. Windows users also have to make sure that Hyper-V is disabled in Control Panel → Programs → Turn Windows features on or off → Hyper-V
- Open a terminal or command prompt
- Navigate to a folder of your choice (using cd); this will be the base folder, and files inside it will be shared within the VM under /vagrant
- Download this example vagrant file into that same folder. If you are on Linux or macOS you can download it directly from the command line like this: wget https://raw.githubusercontent.com/proycon/LaMachine/master/Vagrantfile.prebuilt.erb
- Run vagrant init --template Vagrantfile.prebuilt.erb proycon/lamachine from the terminal.
- Open Vagrantfile in a text editor and change the memory and CPU options to suit your system (the more resources the better!).
  - On an up-to-date Windows 10 installation (at least version 1809), you can use Notepad as a text editor, but on older Windows versions this won't work and you need a better text editor!
- Run vagrant up from the terminal to boot your VM
- Run vagrant ssh from the terminal to connect to the VM
The pre-built image contains only a basic set of common software rather than the full set. Run lamachine-stable-update --edit inside the virtual machine to select extra software to install.
To stop the VM when you're done, run: vagrant halt. Next time, navigate to the same base folder in your terminal and run vagrant up and vagrant ssh again.
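On Linux or macOS, the Vagrant steps above can be collected into a small script. This is a minimal sketch, not an official installer: the function names are illustrative, and it assumes that wget, vagrant and VirtualBox's VBoxManage are installed under those executable names. It stops before vagrant up so that you can still edit the Vagrantfile.

```shell
# List which of the given commands are not found on PATH, one per line.
# missing_commands and setup_lamachine_vm are hypothetical helper names.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Run from the base folder you chose for the VM.
setup_lamachine_vm() {
    # Assumption: the prerequisites are on PATH under these names.
    missing="$(missing_commands wget vagrant VBoxManage)"
    if [ -n "$missing" ]; then
        echo "Missing prerequisites: $missing" >&2
        return 1
    fi
    # Same commands as in the steps above.
    wget https://raw.githubusercontent.com/proycon/LaMachine/master/Vagrantfile.prebuilt.erb &&
    vagrant init --template Vagrantfile.prebuilt.erb proycon/lamachine &&
    echo "Now edit Vagrantfile (memory, CPUs), then run: vagrant up && vagrant ssh"
}

# Usage (not executed here):
#   cd ~/lamachine-vm && setup_lamachine_vm
```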
Included Software
LaMachine includes a wide variety of open-source NLP software. You can select which software you want to include during the installation procedure (or any subsequent update).
- by the Centre of Language and Speech Technology, Radboud University Nijmegen (CLST, RU)
- Timbl - Tilburg Memory Based Learner
- Ucto - Tokenizer
- Frog - Frog is an integration of various memory-based natural language processing (NLP) modules developed for Dutch. It can do Part-of-Speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis.
- Mbt - Memory-based Tagger
- Wopr - Memory-based Word Predictor
- FoLiA-tools - Command line tools for working with the FoLiA format
- PyNLPl - Python Natural Language Processing Library
- Colibri Core - Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
- C++ libraries - ticcutils, libfolia
- Python bindings - python-ucto, python-frog, python-timbl
- CLAM - Quickly build RESTful webservices
- Gecco - Generic Environment for Context-Aware Correction of Orthography
- Valkuil - A context-aware spelling corrector for Dutch
- Toad - Trainer Of All Data, training tools for Frog
- foliadocserve - FoLiA Document Server
- FLAT - FoLiA Linguistic Annotation Tool
- TICCLTools - Tools that together constitute the bulk of TICCL: Text Induced Corpus-Cleanup.
- PICCL - PICCL: A set of workflows for corpus building through OCR, post-correction (using TICCL) and Natural Language Processing.
- Labirinto - A web-based portal listing all available tools in LaMachine, an ideal starting point for LaMachine
- Oersetter - A Frisian<->Dutch Machine Translation system in collaboration with the Fryske Akademy
- by the University of Groningen
- Alpino - a dependency parser and tagger for Dutch
- by the Vrije Universiteit Amsterdam (VU)
- KafNafParserPy - A python module to parse NAF files
- by Utrecht University (UU)
- T-scan - T-scan is a Dutch text analytics tool for readability prediction (initially developed at TiCC, Tilburg University).
- by Meertens Instituut
- [Python Course for the Humanities](http:/
