
Blackstone

:black_circle: A spaCy pipeline and model for NLP on unstructured legal text.

<img src="https://iclr.s3-eu-west-1.amazonaws.com/assets/iclrand/sitecode.svg" width=20%> <img src="https://iclr.s3-eu-west-1.amazonaws.com/assets/iclrand/blackstone_seal.svg" height=75%>

Blackstone Built with spaCy

Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales' research lab, ICLR&D. Blackstone was written by Daniel Hoadley.

Contents

Why are we building Blackstone?

What's special about Blackstone?

Observations and other things worth noting

Installation

    Install the library

    Install the Blackstone model

About the model

    The pipeline

    Named-Entity Recogniser

    Text categoriser

Usage

    Applying the NER model

        Visualising entities

    Applying the text categoriser model

Custom pipeline extensions

    Abbreviation and long-form definition resolution

    Compound case reference detections

    Legislation linker

    Sentence segmenter

Why are we building Blackstone?

The past several years have seen a surge in activity at the intersection of law and technology. However, in the United Kingdom, the overwhelming bulk of that activity has taken place in law firms and other commercial contexts. The consequence is that, despite the never-ending flurry of development in the legal-informatics space, almost none of the research is made available on an open-source basis.

Moreover, the majority of research in the UK legal-informatics domain (whether open or closed) has focussed on the development of NLP applications for automating contracts and other legal documents that are transactional in nature. This is understandable, because the principal benefactors of legal NLP research in the UK are law firms and law firms tend not to find it difficult to get their hands on transactional documentation that can be harnessed as training data.

The problem, as we see it, is that legal NLP research in the UK has become over-concentrated on commercial applications. We think it is worth investing in legal NLP research on other kinds of legal text, such as judgments, scholarly articles, skeleton arguments and pleadings, and in making that research openly available.

What's special about Blackstone?

  • So far as we are aware, Blackstone is the first open source model specifically trained for use on long-form texts containing common law entities and concepts.
  • Blackstone is built on spaCy, which makes it easy to pick up and apply to your own data.
  • Blackstone has been trained on data spanning a considerable temporal period (as early as texts drafted in the 1860s). This is useful because an interesting quirk of the common law is that older writings (particularly, judgments) go on to remain relevant for many, many years.
  • It is free and open source.
  • It is imperfect and makes no attempt to hide that fact from you.

Observations and other things worth noting:

  • Perfection is the enemy of the good. This is a prototype release of a highly experimental project. As such, the accuracy of Blackstone's models leaves something to be desired (the F1 score of the NER component is approximately 70%). The accuracy of these models will improve over time.
  • The models have been trained on English case law and the library has been built with the peculiarities of the legal system of England and Wales in mind. That said, the model has generalised well and should do a reasonably good job on Australasian, Canadian and American content, too.
  • The data used to train Blackstone's models was derived from the Incorporated Council of Law Reporting for England and Wales' archive of case reports and unreported judgments. That archive is proprietary and this prevents us from releasing any of the data used to train Blackstone.
  • Blackstone is not a judge or litigation analytics tool.
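
To put the F1 figure quoted above in context, here is a minimal sketch of how an entity-level F1 score is typically computed. It assumes exact-match scoring, where a predicted entity only counts if its start, end and label all agree with the gold annotation; whether Blackstone's reported figure was produced in exactly this way is an assumption. The function name and the sample spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entities against gold entities by exact match.

    Each entity is a (start, end, label) tuple.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical character offsets and labels, for illustration only.
gold_ents = [
    (0, 13, "CASENAME"),
    (20, 41, "CITATION"),
    (50, 64, "INSTRUMENT"),
    (70, 79, "PROVISION"),
]
predicted_ents = [
    (0, 13, "CASENAME"),    # correct
    (20, 41, "CITATION"),   # correct
    (50, 64, "COURT"),      # right span, wrong label
    (90, 95, "JUDGE"),      # spurious entity
]
print(precision_recall_f1(gold_ents, predicted_ents))
```

In this toy example two of the four predictions are correct and two of the four gold entities are found, so precision, recall and F1 all come out at 0.5.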

Installation

Note! It is strongly recommended that you install Blackstone into a virtual environment! See here for more on virtual environments. Blackstone should be compatible with Python 3.6 and higher.

To install Blackstone follow these steps:

1. Install the library

The first step is to install the library, which at present contains a handful of custom spaCy components. Install the library like so:

```
pip install blackstone
```

2. Install the Blackstone model

The second step is to install the spaCy model. Install the model like so:

```
pip install https://blackstone-model.s3-eu-west-1.amazonaws.com/en_blackstone_proto-0.0.1.tar.gz
```
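
After both steps, a quick check along the following lines can confirm that the packages are visible to the active Python environment. This is a sketch rather than part of Blackstone itself: the helper names are hypothetical, and it assumes the library is importable as `blackstone` and the model as `en_blackstone_proto` (the name passed to `spacy.load` in the Usage section).

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


def missing_packages(package_names):
    """Return the package names that cannot be found, in the order given."""
    return [name for name in package_names if not is_installed(name)]


if __name__ == "__main__":
    # "blackstone" and "en_blackstone_proto" are assumed import names.
    missing = missing_packages(["spacy", "blackstone", "en_blackstone_proto"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Blackstone and its model appear to be installed.")
```

If anything is reported as missing, the usual cause is that `pip` installed into a different environment from the one running the script.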

Installing from source

If you are developing Blackstone, you can install from source like so:

```
pip install --editable .
pip install -r dev-requirements.txt
```

About the model

This is the very first release of Blackstone and the model is best viewed as a prototype; it is rough around the edges and represents the first step in a larger ongoing programme of open source research into NLP on legal texts being carried out by ICLR&D.

With that out of the way, here's a brief rundown of what's happening in the proto model.

The pipeline

The proto model included in this release has the following elements in its pipeline:

<img src="https://iclr.s3-eu-west-1.amazonaws.com/assets/iclrand/Blackstone/blackstone_pipeline.svg">

Owing to a scarcity of labelled part-of-speech and dependency training data for legal text, the tokenizer, tagger and parser pipeline components have been taken from spaCy's en_core_web_sm model. By and large, these components appear to do a decent job, but it would be good to revisit these components with custom training data at some point in the future.

The ner and textcat components are custom components trained especially for Blackstone.
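
The split between borrowed and custom components can be sketched as a small lookup. The helper below is hypothetical and simply restates the description above. It assumes the component names reported by spaCy's `nlp.pipe_names` are `tagger`, `parser`, `ner` and `textcat`; the tokenizer is handled separately by spaCy and does not appear in that list.

```python
# Where each component comes from, as described above.
# The component names are assumed, not read from the model.
COMPONENT_SOURCE = {
    "tagger": "en_core_web_sm",
    "parser": "en_core_web_sm",
    "ner": "Blackstone",
    "textcat": "Blackstone",
}


def describe_pipeline(pipe_names):
    """Pair each component name with its source ("unknown" if unlisted)."""
    return [(name, COMPONENT_SOURCE.get(name, "unknown")) for name in pipe_names]


for name, source in describe_pipeline(["tagger", "parser", "ner", "textcat"]):
    print(f"{name:8} {source}")
```

With the model installed, passing `nlp.pipe_names` from `nlp = spacy.load("en_blackstone_proto")` instead of the hard-coded list shows what the loaded pipeline actually contains.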

Named-Entity Recogniser

The NER component of the Blackstone model has been trained to detect the following entity types:

| Ent | Name | Examples |
| ------------- |-------------| -----:|
| CASENAME | Case names | e.g. Smith v Jones, In re Jones, In Jones' case |
| CITATION | Citations (unique identifiers for reported and unreported cases) | e.g. (2002) 2 Cr App R 123 |
| INSTRUMENT | Written legal instruments | e.g. Theft Act 1968, European Convention on Human Rights, CPR |
| PROVISION | Unit within a written legal instrument | e.g. section 1, art 2(3) |
| COURT | Court or tribunal | e.g. Court of Appeal, Upper Tribunal |
| JUDGE | References to judges | e.g. Eady J, Lord Bingham of Cornhill |
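
Once entities have been detected, it is often convenient to group them by type. The sketch below shows one way to do that. `group_entities` is a hypothetical helper, not part of Blackstone, and it accepts any objects with `.text` and `.label_` attributes, which is the shape of the spans in spaCy's `doc.ents`. The sample entities are stand-ins built from the examples in the table, not real model output.

```python
from collections import defaultdict, namedtuple


def group_entities(ents):
    """Group entity texts by label, preserving the order they appear in."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in for spaCy Span objects so the sketch runs without the model.
FakeEnt = namedtuple("FakeEnt", ["text", "label_"])
sample = [
    FakeEnt("Smith v Jones", "CASENAME"),
    FakeEnt("(2002) 2 Cr App R 123", "CITATION"),
    FakeEnt("section 1", "PROVISION"),
    FakeEnt("Theft Act 1968", "INSTRUMENT"),
    FakeEnt("art 2(3)", "PROVISION"),
]

for label, texts in group_entities(sample).items():
    print(label, texts)
```

With the model loaded, the same helper can be called as `group_entities(doc.ents)` on a processed document.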

Text Categoriser

This release of Blackstone also comes with a text categoriser. In contrast with the NER component (which has been trained to identify tokens and series of tokens of interest), the text categoriser classifies longer spans of text, such as sentences.

The Text Categoriser has been trained to classify text according to one of five mutually exclusive categories, which are as follows:

| Cat | Description |
| ------------- |-------------|
| AXIOM | The text appears to postulate a well-established principle |
| CONCLUSION | The text appears to make a finding, holding, determination or conclusion |
| ISSUE | The text appears to discuss an issue or question |
| LEGAL_TEST | The text appears to discuss a legal test |
| UNCAT | The text does not fall into one of the four categories above |
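
Because the categories are mutually exclusive, a natural way to read the categoriser's output is to take the highest-scoring label. The sketch below assumes the scores arrive as a dict mapping labels to floats, which is the shape of spaCy's `doc.cats`. The helper name and the sample scores are hypothetical and are not real model output.

```python
def get_top_cat(cats):
    """Return (label, score) for the highest-scoring category.

    Ties go to the label that appears first in the dict.
    """
    if not cats:
        raise ValueError("no category scores to choose from")
    label = max(cats, key=cats.get)
    return label, cats[label]


# Hypothetical scores for a single sentence.
scores = {
    "AXIOM": 0.02,
    "CONCLUSION": 0.91,
    "ISSUE": 0.03,
    "LEGAL_TEST": 0.01,
    "UNCAT": 0.03,
}
print(get_top_cat(scores))
```

Since the categoriser works on sentence-sized spans, one plausible pattern is to split a document into sentences, run each sentence through the model, and call `get_top_cat(doc.cats)` on each result.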

Usage

Applying the NER model

Here's an example of how the model is applied to some text taken from para 31 of the Divisional Court's judgment in R (Miller) v Secretary of State for Exiting the European Union (Birnie intervening) [2017] UKSC 5; [2018] AC 61:

```
import spacy

# Load the model
nlp = spacy.load("en_blackstone_proto")

text = """ 31 As we shall explain in more detail in examining the submission of the Secretary of State (see paras 77 and following), it is the Secretary of State’s case that nothing has been done by Parliament in the European Communities Act 1972 or any other statute to remove the prerogative power of the Cro
```
