spaCyOpenTapioca

A spaCy wrapper of OpenTapioca for named entity linking on Wikidata.

Installation
How to use
Local OpenTapioca
Visualization

Installation

pip install spacyopentapioca

or, from source:

git clone https://github.com/UB-Mannheim/spacyopentapioca
cd spacyopentapioca/
pip install .

How to use

After installation the OpenTapioca pipeline can be used without any other pipelines:

import spacy
nlp = spacy.blank("en")
nlp.add_pipe('opentapioca', config={"verify": False})
doc = nlp("Christian Drosten works in Germany.")
for span in doc.ents:
    print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

('Christian Drosten', 'Q1079331', 'PERSON', 'German virologist and university teacher', 3.6533377082098895)
('Germany', 'Q183', 'LOC', 'sovereign state in Central Europe', 2.1099332471902863)

Note that the optional verify parameter of config defaults to True. If the URL is not secure, verify must be set to False for the pipeline to work; this applies to the default URL. See https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings for more details.

The types and aliases are also available:

for span in doc.ents:
    print((span._.types, span._.aliases[0:5]))

({'Q43229': False, 'Q618123': False, 'Q5': True, 'P2427': False, 'P1566': False, 'P496': True}, ['كريستيان دروستين', 'Крістіан Дростен', 'Christian Heinrich Maria Drosten', 'کریستین دروستن', '크리스티안 드로스텐'])
({'Q43229': True, 'Q618123': True, 'Q5': False, 'P2427': False, 'P1566': True, 'P496': False}, ['IJalimani', 'R. F. A.', 'Alemania', '도이칠란트', 'Germaniya'])
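As the output above shows, span._.types maps Wikidata IDs to booleans. A minimal sketch, assuming you only want the IDs flagged True: the helper name active_types is hypothetical and not part of spaCyOpenTapioca, and the sample dictionary is copied from the first line of the output above so the snippet runs without calling the API.

```python
def active_types(types):
    """Return the IDs whose flag is True, in the dictionary's own order."""
    return [type_id for type_id, flag in types.items() if flag]


# Sample value of span._.types for 'Christian Drosten', copied from the output above.
drosten_types = {'Q43229': False, 'Q618123': False, 'Q5': True,
                 'P2427': False, 'P1566': False, 'P496': True}

print(active_types(drosten_types))  # ['Q5', 'P496']
```

With a live pipeline the same helper could be called as active_types(span._.types) for each span in doc.ents.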

The Wikidata QIDs are attached to tokens:

for token in doc:
    print((token.text, token.ent_kb_id_))

('Christian', 'Q1079331')
('Drosten', 'Q1079331')
('works', '')
('in', '')
('Germany', 'Q183')
('.', '')
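If only the token-level pairs above are at hand (for example after exporting them), entity mentions can be rebuilt by merging neighbouring tokens that share a QID. This is a rough sketch, not how spaCyOpenTapioca itself builds doc.ents; group_by_qid is a hypothetical helper, and the input is copied from the output above.

```python
from itertools import groupby


def group_by_qid(token_pairs):
    """Merge consecutive tokens that share a non-empty QID into (text, qid) pairs."""
    entities = []
    for qid, group in groupby(token_pairs, key=lambda pair: pair[1]):
        if qid:  # an empty string means the token is not linked to an entity
            entities.append((" ".join(text for text, _ in group), qid))
    return entities


# (token.text, token.ent_kb_id_) pairs copied from the output above.
token_pairs = [('Christian', 'Q1079331'), ('Drosten', 'Q1079331'), ('works', ''),
               ('in', ''), ('Germany', 'Q183'), ('.', '')]

print(group_by_qid(token_pairs))
# [('Christian Drosten', 'Q1079331'), ('Germany', 'Q183')]
```

Two limitations of this sketch: adjacent mentions of the same entity are merged into one, and tokens are joined with a single space regardless of the original whitespace. When the Doc is available, doc.ents is the more reliable source.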

The raw response of the OpenTapioca API can be accessed on the Doc and Span objects:

raw_annotations1 = doc._.annotations
raw_annotations2 = [span._.annotations for span in doc.ents]

Partial metadata for the response returned by the OpenTapioca API is available in:

doc._.metadata

The available span extensions are:

span._.annotations
span._.description
span._.aliases
span._.rank
span._.score
span._.types
span._.label
span._.extra_aliases
span._.nb_sitelinks
span._.nb_statements

Note that spaCyOpenTapioca applies light post-processing to the entities that appear in doc.ents. All entities returned by OpenTapioca can be found in doc.spans['all_entities_opentapioca'].

Batching

Pass a list of strings to nlp.pipe to send batched asynchronous requests to the OpenTapioca API:

import spacy
nlp = spacy.blank("en")
nlp.add_pipe('opentapioca', config={"verify": False})
docs = nlp.pipe(
    [
        "Christian Drosten works in Germany.",
        "Momofuku Ando was born in Japan.",
    ]
)
for doc in docs:
    for span in doc.ents:
        print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

('Christian Drosten', 'Q1079331', 'PERSON', 'German virologist and university teacher', 3.6533377082098895)
('Germany', 'Q183', 'LOC', 'sovereign state in Central Europe', 2.1099332471902863)
('Momofuku Ando', 'Q317858', 'PERSON', 'Taiwanese-Japanese businessman', 3.6012208212234302)
('Japan', 'Q17', 'LOC', 'sovereign state in East Asia, situated on an archipelago of five main and over 6,800 smaller islands', 2.349944834167907)
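Batched results are often filtered before further use. The sketch below keeps only rows at or above a score threshold. It assumes that a higher score indicates a more confident link, which this README does not state, and the threshold of 3.0 is arbitrary. filter_by_score is a hypothetical helper; the rows are copied from the output above.

```python
def filter_by_score(rows, min_score):
    """Keep rows whose score (the last tuple element) is at least min_score."""
    return [row for row in rows if row[-1] >= min_score]


# (text, kb_id_, label_, description, score) tuples copied from the output above.
rows = [
    ('Christian Drosten', 'Q1079331', 'PERSON',
     'German virologist and university teacher', 3.6533377082098895),
    ('Germany', 'Q183', 'LOC',
     'sovereign state in Central Europe', 2.1099332471902863),
    ('Momofuku Ando', 'Q317858', 'PERSON',
     'Taiwanese-Japanese businessman', 3.6012208212234302),
    ('Japan', 'Q17', 'LOC',
     'sovereign state in East Asia, situated on an archipelago of five main '
     'and over 6,800 smaller islands', 2.349944834167907),
]

for row in filter_by_score(rows, min_score=3.0):
    print(row[0], row[1])
# Christian Drosten Q1079331
# Momofuku Ando Q317858
```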

Local OpenTapioca

If OpenTapioca is deployed locally, specify the URL of the local OpenTapioca API in the config (OpenTapiocaAPI below stands for that URL string):

import spacy
nlp = spacy.blank("en")
nlp.add_pipe('opentapioca', config={"url": OpenTapiocaAPI, "verify": False})
doc = nlp("Christian Drosten works in Germany.")
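One way to switch between the default endpoint and a local deployment without editing code is to read the URL from the environment. This is only a sketch: the variable name OPENTAPIOCA_URL, the helper opentapioca_config, and the localhost address are all hypothetical placeholders, not part of spaCyOpenTapioca. Only the "url" and "verify" config keys come from the examples above.

```python
import os


def opentapioca_config(env=None):
    """Build a config dict for nlp.add_pipe('opentapioca', config=...)."""
    env = os.environ if env is None else env
    # verify=False mirrors the examples above.
    config = {"verify": False}
    # OPENTAPIOCA_URL is a hypothetical variable name; when it is unset,
    # no "url" key is passed and the component uses its default URL.
    url = env.get("OPENTAPIOCA_URL")
    if url:
        config["url"] = url
    return config


print(opentapioca_config({}))
# {'verify': False}

# The address below is a placeholder for a local deployment.
print(opentapioca_config({"OPENTAPIOCA_URL": "http://localhost:8080/api/annotate"}))
# {'verify': False, 'url': 'http://localhost:8080/api/annotate'}
```

The result would then be passed as nlp.add_pipe('opentapioca', config=opentapioca_config()).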

Visualization

NEL visualization was added to spaCy via pull request 9199 for issue 9129. It is supported by spaCy >= 3.1.4.

Use the manual option in displaCy:

import spacy
nlp = spacy.blank("en")
nlp.add_pipe('opentapioca', config={"verify": False})
doc = nlp("Christian Drosten works\n in Charité, Germany.")
params = {"text": doc.text,
          "ents": [{"start": ent.start_char,
                    "end": ent.end_char,
                    "label": ent.label_,
                    "kb_id": ent.kb_id_,
                    "kb_url": "https://www.wikidata.org/entity/" + ent.kb_id_}
                   for ent in doc.ents],
          "title": None}
spacy.displacy.serve(params, style="ent", manual=True)

The visualizer is serving on http://0.0.0.0:5000

In a Jupyter notebook, replace spacy.displacy.serve with spacy.displacy.render.
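The params dictionary for displaCy's manual mode can also be assembled by a small helper, which keeps the Wikidata URL construction in one place. A minimal sketch: build_displacy_params is a hypothetical name, and the entity offsets are written by hand for the example sentence rather than taken from a live pipeline.

```python
WIKIDATA_ENTITY_URL = "https://www.wikidata.org/entity/"


def build_displacy_params(text, ents, title=None):
    """Build the dict expected by displaCy's manual 'ent' style.

    ents is an iterable of (start_char, end_char, label, kb_id) tuples.
    """
    return {
        "text": text,
        "ents": [
            {
                "start": start,
                "end": end,
                "label": label,
                "kb_id": kb_id,
                "kb_url": WIKIDATA_ENTITY_URL + kb_id,
            }
            for start, end, label, kb_id in ents
        ],
        "title": title,
    }


text = "Christian Drosten works in Germany."
# Character offsets written by hand for this sentence.
ents = [(0, 17, "PERSON", "Q1079331"), (27, 34, "LOC", "Q183")]

params = build_displacy_params(text, ents)
print(params["ents"][1]["kb_url"])
# https://www.wikidata.org/entity/Q183
```

With a processed Doc, the tuples could instead be collected as (ent.start_char, ent.end_char, ent.label_, ent.kb_id_) for each ent in doc.ents, and the result passed to spacy.displacy.serve or spacy.displacy.render with style="ent" and manual=True.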
