extractacy - pattern extraction and named entity linking for spaCy
spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results)
Installation and usage
Install the library.
pip install extractacy
Import the library and spaCy.
import spacy
from spacy.pipeline import EntityRuler
from extractacy.extract import ValueExtractor
Load a spaCy language model and set up an EntityRuler for the example.
nlp = spacy.load("en_core_web_sm")
# Set up entity ruler
ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temperature"}]},
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temp"}]},
    {
        "label": "DISCHARGE_DATE",
        "pattern": [{"LOWER": "discharge"}, {"LOWER": "date"}],
    },
]
ruler.add_patterns(patterns)
Define which entities you would like to link patterns to. Each entity needs 3 things:
- patterns to search for (list). This relies on spaCy token matching syntax.
- n_tokens to search around a named entity (int or sent); the example below passes this under the key "n"
- direction (right, left, both)
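To make the roles of n and direction concrete, here is a minimal pure-Python sketch of how a search window around an entity could be computed. It is not extractacy's implementation: the function name search_window is hypothetical, and the sketch assumes the token list is a single sentence, so that "sent" simply means "everything in the list on that side of the entity".

```python
from typing import List, Union


def search_window(
    tokens: List[str],
    ent_start: int,
    ent_end: int,
    n: Union[int, str],
    direction: str,
) -> List[str]:
    """Return the tokens around tokens[ent_start:ent_end] that would be searched.

    Illustrative only. Assumes `tokens` holds exactly one sentence.
    """
    if direction not in {"left", "right", "both"}:
        raise ValueError(f"unknown direction: {direction!r}")

    if n == "sent":
        # Assumption: "sent" widens the window to the sentence boundaries.
        left_bound, right_bound = 0, len(tokens)
    else:
        left_bound = max(0, ent_start - n)
        right_bound = min(len(tokens), ent_end + n)

    left = tokens[left_bound:ent_start] if direction in {"left", "both"} else []
    right = tokens[ent_end:right_bound] if direction in {"right", "both"} else []
    return left + right


tokens = ["Discharge", "Date", ":", "11/15/2008", "."]
# The entity "Discharge Date" covers tokens[0:2]; look 2 tokens to the right.
print(search_window(tokens, 0, 2, 2, "right"))
```

Under these assumptions, n = 2 with direction "right" reaches past the colon to the date, which is why the DISCHARGE_DATE entry in the example uses those settings.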
# Define ent_patterns for value extraction
ent_patterns = {
    "DISCHARGE_DATE": {
        "patterns": [[{"SHAPE": "dd/dd/dddd"}], [{"SHAPE": "dd/d/dddd"}]],
        "n": 2,
        "direction": "right",
    },
    "TEMP_READING": {
        "patterns": [
            [
                {"LIKE_NUM": True},
                {"LOWER": {"IN": ["f", "c", "fahrenheit", "celsius", "centigrade", "degrees"]}},
            ]
        ],
        "n": "sent",
        "direction": "both",
    },
}
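The SHAPE patterns above match on a token's character shape rather than its text, which is why two date patterns are listed. The sketch below approximates how spaCy derives a shape (digits become d, lowercase letters x, uppercase letters X, other characters are kept, and runs longer than four are truncated). The function name approx_shape is hypothetical and the logic is a simplified stand-in, not a call into spaCy.

```python
def approx_shape(text: str) -> str:
    """Approximate a spaCy-style token shape for `text` (illustrative only)."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            shape_char = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            shape_char = "d"
        else:
            shape_char = ch
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        # Keep at most four repeats of the same shape character.
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


for date in ["11/15/2008", "11/5/2008", "1/15/2008"]:
    print(date, approx_shape(date))
```

Under this approximation, a date such as 1/15/2008 has the shape d/dd/dddd, which neither of the two DISCHARGE_DATE patterns covers, so additional SHAPE patterns would be needed for single-digit months.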
Add ValueExtractor to the spaCy processing pipeline.
nlp.add_pipe("valext", config={"ent_patterns":ent_patterns}, last=True)
doc = nlp("Discharge Date: 11/15/2008. Patient had temp reading of 102.6 degrees.")
for e in doc.ents:
    if e._.value_extract:
        print(e.text, e.label_, e._.value_extract)
## Discharge Date DISCHARGE_DATE 11/15/2008
## temp TEMP_READING 102.6 degrees
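For downstream use it is often handier to collect the extracted values by label than to print them. The helper below is a small sketch of that idea. The name collect_values is hypothetical, and it assumes only what the loop above shows: each entity has a label_ attribute and a _.value_extract attribute that is falsy when nothing was found. The stand-in objects exist so the sketch runs without spaCy installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def collect_values(ents):
    """Group extracted values by entity label, skipping entities without one."""
    grouped = defaultdict(list)
    for ent in ents:
        value = ent._.value_extract
        if value:
            grouped[ent.label_].append(value)
    return dict(grouped)


# Stand-ins that mimic the attributes used above; with a real pipeline,
# call collect_values(doc.ents) instead.
fake_ents = [
    SimpleNamespace(label_="DISCHARGE_DATE", _=SimpleNamespace(value_extract="11/15/2008")),
    SimpleNamespace(label_="PERSON", _=SimpleNamespace(value_extract=None)),
]
print(collect_values(fake_ents))
```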
Contributing
Authors
- Jeno Pizarro
License
