Affilparser
Conditional Random Field (CRF) Parser for Affiliation String in MEDLINE and Pubmed OA
Affiliation Parser
A Python Conditional Random Field (CRF) parser for affiliation strings in MEDLINE and PubMed OA. The parser is implemented with python-crfsuite. See this example on how to implement one.
Usage
Use the parse method of the AffiliationParser class to parse an affiliation string.
from affilparser import AffiliationParser
text = """
Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA 50011-3080, USA;
Department of Energy, Power Engineering and Environment, Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, Ivana Lucica 5, HR-10000 Zagreb, Croatia;
Department of Civil, Construction and Environmental Engineering, Iowa State University, Ames, IA 50011-3232, USA.
"""
parser = AffiliationParser()
parsed_affil = parser.parse(text)
Training dataset
We obtained 190k parsed affiliation strings in the following format
<aff xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<institution>Department of Emergency Medicine Hennepin County Medical Center</institution>
<addr-line>Minneapolis, MN</addr-line>
</aff>
from the PubMed Open-Access subset using pubmed_parser.
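To make the format concrete, here is a minimal sketch of how one such `<aff>` element could be turned into (label, text) pairs with Python's standard library. It is not the code used to build the dataset; the function name `aff_to_fields` is hypothetical, and the sketch assumes each child tag name (`institution`, `addr-line`, ...) serves directly as the label.

```python
import xml.etree.ElementTree as ET

# The <aff> element shown above, copied verbatim.
AFF_XML = """<aff xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<institution>Department of Emergency Medicine Hennepin County Medical Center</institution>
<addr-line>Minneapolis, MN</addr-line>
</aff>"""


def aff_to_fields(aff_xml):
    """Return (tag, text) pairs for the child elements of one <aff> string.

    Hypothetical helper: the tag name of each child is used as its label.
    """
    root = ET.fromstring(aff_xml)
    fields = []
    for child in root:
        # itertext() also collects text nested inside inline markup.
        text = "".join(child.itertext()).strip()
        if text:
            fields.append((child.tag, text))
    return fields


fields = aff_to_fields(AFF_XML)
for label, text in fields:
    print(label, "->", text)
```

Each pair gives a labeled span of the affiliation, which is the information needed to assign a label to every token in the preprocessing step.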
We preprocessed each affiliation into tokens in (text, postag, label) format before training the
Conditional Random Field. An example of the training data for one affiliation string is as follows.
[('Department', 'PROPN', 'department'),
('of', 'ADP', 'department'),
('Orthopaedics', 'PROPN', 'department'),
(',', 'PUNCT', 'unknown'),
('Chonnam', 'PROPN', 'institution'),
('National', 'PROPN', 'institution'),
('University', 'PROPN', 'institution'),
('Medical', 'PROPN', 'institution'),
('School', 'PROPN', 'institution'),
('and', 'CCONJ', 'institution'),
('Hospital', 'PROPN', 'institution'),
(',', 'PUNCT', 'unknown'),
('Gwangju', 'PROPN', 'addr-line'),
(',', 'PUNCT', 'unknown'),
('South', 'PROPN', 'country'),
('Korea', 'PROPN', 'country')]
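A CRF is trained on per-token feature dictionaries rather than on the raw tuples. The sketch below shows one common way to derive such features from a (text, postag, label) sequence. The feature set and the function names (`token_features`, `sent_to_features`, `sent_to_labels`) are illustrative assumptions, not necessarily what this package uses.

```python
# First four tokens of the training example above.
sample = [
    ("Department", "PROPN", "department"),
    ("of", "ADP", "department"),
    ("Orthopaedics", "PROPN", "department"),
    (",", "PUNCT", "unknown"),
]


def token_features(sent, i):
    """Build a feature dict for token i (illustrative feature set)."""
    text, postag, _ = sent[i]
    feats = {
        "bias": 1.0,
        "lower": text.lower(),
        "istitle": text.istitle(),
        "isupper": text.isupper(),
        "isdigit": text.isdigit(),
        "postag": postag,
    }
    if i > 0:
        # Context from the previous token.
        feats["-1:lower"] = sent[i - 1][0].lower()
        feats["-1:postag"] = sent[i - 1][1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(sent) - 1:
        # Context from the next token.
        feats["+1:lower"] = sent[i + 1][0].lower()
        feats["+1:postag"] = sent[i + 1][1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sent_to_features(sent):
    return [token_features(sent, i) for i in range(len(sent))]


def sent_to_labels(sent):
    return [label for _, _, label in sent]


X = sent_to_features(sample)
y = sent_to_labels(sample)
```

With python-crfsuite, a feature sequence and its label sequence of this shape can be passed to `pycrfsuite.Trainer.append(X, y)` before training.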
We also made the dataset available in JSON format, which you can download here.
Installation
Clone the repository and install using setup.py
git clone https://github.com/titipata/affilparser
cd affilparser
python setup.py install
Requirements
- python-crfsuite (imported as pycrfsuite)
- spaCy with an English model
Citation
If you use this package, please cite it like this
Titipat Achakulvisut, Daniel E. Acuna (2017) "Affiliation Parser" https://github.com/titipata/affilparser
