Slate

A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python


This is a tool for labeling text documents. Slate supports annotation at different scales (spans of characters, tokens, and lines, or a document) and of different types (free text, labels, and links). This covers a range of tasks, such as Part-of-Speech tagging, Named Entity Recognition, Text Classification (including Sentiment Analysis), Discourse Structure, and more.

Why use this tool over the range of other text annotation tools out there?

  • Fast
  • Trivial installation
  • Focuses all of the screen space on annotation (good for large fonts)
  • Terminal based, so it works in constrained environments (e.g. only allowed ssh access to a machine)
  • Not difficult to configure and modify

Note - this repository is not for the "Segment and Link-based Annotation Tool, Enhanced", which was first presented at LREC 2010. See 'Citing' below for additional notes on that work.

Installation

Two options:

1. Install with pip

pip install slate-nlp

Then run from any directory in one of two ways:

slate
python -m slate

2. Or download and run without installing

Either download as a zip file:

curl https://codeload.github.com/jkkummerfeld/slate/zip/master -o "slate.zip"
unzip slate.zip
cd slate-master

Or clone the repository:

git clone https://github.com/jkkummerfeld/slate
cd slate

Then run with either of:

python slate.py
./slate.py

To run from another directory, use:

python PATH_TO_SLATE/slate.py
PATH_TO_SLATE/slate.py

Requirements

The code requires only Python (2 or 3) and can be run out of the box. Your terminal must be at least 80 characters wide and 20 tall to use the tool.

Citing

If you use this tool in your work, please cite:

@InProceedings{acl19slate,
  title     = {SLATE: A Super-Lightweight Annotation Tool for Experts},
  author    = {Jonathan K. Kummerfeld},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  location  = {Florence, Italy},
  month     = {July},
  year      = {2019},
  pages     = {7--12},
  doi       = {10.18653/v1/P19-3002},
  url       = {https://aclweb.org/anthology/papers/P/P19/P19-3002/},
  software  = {https://jkk.name/slate},
}

While presenting this work at ACL I learned of another annotation tool called SLATE. That tool was first described in "Annotation Process Management Revisited", Kaplan et al. (LREC 2010) and then in "Slate - A Tool for Creating and Maintaining Annotated Corpora", Kaplan et al. (JLCL 2011). It takes a very different approach, using a web based interface that includes a suite of project management tools as well as annotation.

Getting Started

Note: if you used pip to install, replace python slate.py with slate everywhere below.

Run python slate.py <filename> to start annotating <filename> with labels over spans of tokens. The entire interface is contained in your terminal; there is no GUI. With command line arguments you can vary properties such as the type of annotation (labels or links) and the scope of annotation (characters, tokens, lines, or documents).

The input file should be plain text, organised however you like. Prepare the data with your favourite sentence splitting and/or tokenisation software (e.g., SpaCy). If you use Python 3 then unicode should be supported, but the code has not been tested extensively with non-English text (please share any issues!).
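As a sketch, a minimal stand-in tokeniser in plain Python (a real pipeline would use SpaCy or similar; the function and file names here are illustrative, not part of slate):

```python
import re

def simple_tokenize(text):
    # Split off punctuation as separate tokens: runs of word characters,
    # or any single non-word, non-space character.
    return re.findall(r"\w+|[^\w\s]", text)

def prepare(raw_path, out_path):
    # Write one space-separated line of tokens per input line,
    # producing the plain-text format slate expects.
    with open(raw_path) as src, open(out_path, "w") as out:
        for line in src:
            tokens = simple_tokenize(line)
            if tokens:
                out.write(" ".join(tokens) + "\n")

print(simple_tokenize("Hello, world!"))
```

This keeps punctuation as its own tokens, so token-scope annotations can target it separately from the adjacent words.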

When you start the tool it displays a set of core commands by default. These are also specified below, along with additional commands.

The tool saves annotations in a separate file (<filename>.annotations by default; this can be changed with a file list, as described below). Annotation files contain one line per annotated item. The item is specified by a tuple of numbers. For labels, the item is followed by a hyphen and the list of labels. For links, two items appear on the line before the hyphen. For example, here are two annotation files, one with labels on token spans and the other with links between lines:

==> label.annotations <==
(2, 1) - label:a
((3, 5), (3, 8)) - label:a
(7, 8) - label:s label:a

==> link.annotations <==
13 0 - 
13 7 - 
16 7 - 

A few notes:

  • The second label annotation is on a span of tokens, going from 5 to 8 on line 3.
  • The third label annotation has two labels.
  • The line annotations only have one number to specify the item.
  • When the same line is linked to multiple other lines, each link is a separate item.
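Because label annotation files pair a Python-style tuple with a list of labels, they are easy to post-process. A minimal sketch of a reader for label files (the helper name is ours; link files, which have two items before the hyphen, would need slightly different handling):

```python
import ast

def parse_label_annotations(text):
    """Parse label-annotation lines like "(2, 1) - label:a label:s"
    into (item, labels) pairs."""
    annotations = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The item (a tuple of numbers) comes before " - ",
        # the space-separated labels come after it.
        item_part, _, label_part = line.partition(" - ")
        item = ast.literal_eval(item_part.strip())
        annotations.append((item, label_part.split()))
    return annotations

sample = """(2, 1) - label:a
((3, 5), (3, 8)) - label:a
(7, 8) - label:s label:a"""

for item, labels in parse_label_annotations(sample):
    print(item, labels)
```

ast.literal_eval safely handles both single positions like (2, 1) and nested span tuples like ((3, 5), (3, 8)).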

Tutorials

Included in this repository is a set of interactive tutorials that teach you how to use the tool from within the tool itself.

Task | Command
---- | --------
Named Entity Recognition annotation | python slate.py tutorial/ner.md -t categorical -s token -o -c ner-book.config -l log.tutorial.ner.txt -sl -sm
Labelling spans of text in a document | python slate.py tutorial/label.md -t categorical -s token -o -l log.tutorial.label.txt
Linking lines in a document | python slate.py tutorial/link.md -t link -s line -o -l log.tutorial.link.txt

Example Workflow

This tool has already been used for two annotation efforts involving multiple annotators (Durrett et al., 2017 and Kummerfeld et al., 2018). Our workflow was as follows:

  • Create a repository containing (1) the annotation guide and (2) the data to be annotated, divided into user-specific folders.
  • Each annotator downloaded slate and used it to do their annotations and commit the files to the repository.
  • Either the whole group or the project leader went through files that were annotated by multiple people, using the adjudication mode in the tool.

Comparing Annotations

The tool supports displaying annotations from multiple files for the purpose of adjudicating disagreements: you can request that a set of other annotation files be read, and whenever one of those files includes an annotation that your current adjudication does not, the text is shown in red. There are two steps involved.

Data list file creation

A data list file contains a series of lines in the format:

raw_file [output_file [cur_position [other_annotations]]]

For example, this line says that there is a raw text file my-book.txt, that the adjudications should be saved in annotations-adjudicated.txt, that annotation should start at the very beginning of my-book.txt, and that there are three existing annotation files to be compared:

my-book.txt annotations-adjudicated.txt ((0, 0), (0, 0)) my-book.txt.annotations1 my-book.txt.annotations2 my-book.txt.annotations3

Note: you can have as many "other_annotation" files as you want.
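As a sketch, a data list covering every .txt file in a folder could be generated with a short shell loop (the data/ directory and file names here are hypothetical examples, created only to make the sketch self-contained):

```shell
# Create sample inputs for the sketch.
mkdir -p data && touch data/a.txt data/b.txt

# One line per raw file, directing each file's annotations
# to a matching .annotations output file.
for f in data/*.txt; do
    printf '%s %s\n' "$f" "$f.annotations"
done > data-list-file

cat data-list-file
```

Fields beyond raw_file are optional, so omitting cur_position and other_annotations, as here, is fine for fresh annotation.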

Run slate with the data list file

Now run slate as follows:

python slate.py -d data-list-file [any other arguments]

Example

The tutorial folder contains two example data list files:

  • tutorial/data/list_with_disagreements.category.txt
  • tutorial/data/list_with_disagreements.link.txt

You can use them as follows:

cd tutorial/data
python ../../slate.py -d list_with_disagreements.category.txt -t categorical -s token

Efficiency Tip

You can save time by putting annotations that all annotators agreed on into the annotations-adjudicated.txt file. This bash pipeline will do that if you replace:

  • ANNOTATION_FILES with the names of all of your annotation files, separated by spaces
  • N_FILES with the number of annotation files you have
cat ANNOTATION_FILES | sort | uniq -c | awk -v count=N_FILES '$1 == count' | sed 's/^ *[0-9]* *//' > annotations-adjudicated.txt

Breaking this down, it does the following:

  • cat ANNOTATION_FILES : concatenate the annotation files and print them
  • sort : sort the combined contents so identical lines become adjacent
  • uniq -c : collapse each run of identical lines into one, prefixed with a count of how many times it occurred
  • awk -v count=N_FILES '$1 == count' : keep only lines whose count equals N_FILES, i.e. annotations present in every file
  • sed 's/^ *[0-9]* *//' > annotations-adjudicated.txt : strip the count added by uniq and write the result to annotations-adjudicated.txt

Detailed Usage Instructions

Invocation options

usage: slate.py [-h] [-d DATA_LIST [DATA_LIST ...]] [-t {categorical,link}]
                [-s {character,token,line,document}] [-c CONFIG_FILE] [-l LOG_PREFIX] [-ld]
                [-sh] [-sl] [-sp] [-sm] [-r] [-o] [-ps] [-pf] [--do-not-show-linked]
                [--alternate-comparisons]
                [data ...]

A tool for annotating text data.

positional arguments:
  data                  Files to be annotated

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_LIST [DATA_LIST ...], --data-list DATA_LIST [DATA_LIST ...]
                        Files containing lists of files to be annotated
  -t {categorical,link}, --ann-type {categorical,link}
                        The type of annotation being done.
  -s {character,token,line,document}, --ann-scope {character,token,line,document}
                        The scope of annotation being done.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        A file containing configuration information.
  -l LOG_PREFIX, --log-prefix LOG_PREFIX
                        Prefix for logging files
  -ld, --log-debug      Provide detailed logging.
  -sh, --show-help      Show help on startup.
  -sl, --show-legend    Start with legend showing.
  -sp, --show-progress  Start with progress showing.
  -sm, --show-mark      Start with mark showing.
  -r, --readonly        Do not allow changes or save annotations.
  -o, --overwrite       If they exist already, read and overwrite output files.
  -ps, --prevent-self-links
                        Prevent an item 
