Affiliation Parser

A Python Conditional Random Field (CRF) parser for affiliation strings in MEDLINE and PubMed OA. The parser is implemented with python-crfsuite. See this example for implementation details.

Usage

Use the parse method of the AffiliationParser class to parse an affiliation string.

from affilparser import AffiliationParser

text = """
  Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA 50011-3080, USA;
  Department of Energy, Power Engineering and Environment, Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, Ivana Lucica 5, HR-10000 Zagreb, Croatia;
  Department of Civil, Construction and Environmental Engineering, Iowa State University, Ames, IA 50011-3232, USA.
"""

parser = AffiliationParser()
parsed_affil = parser.parse(text)

Training dataset

We obtained 190k parsed affiliation strings in the following format

<aff xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <institution>Department of Emergency Medicine Hennepin County Medical Center</institution>
  <addr-line>Minneapolis, MN</addr-line>
</aff>

from the PubMed Open-Access subset using pubmed_parser. We preprocessed them into tokens in (text, postag, label) format before training the Conditional Random Field. An example of the training data for one affiliation string follows.

[('Department', 'PROPN', 'department'),
 ('of', 'ADP', 'department'),
 ('Orthopaedics', 'PROPN', 'department'),
 (',', 'PUNCT', 'unknown'),
 ('Chonnam', 'PROPN', 'institution'),
 ('National', 'PROPN', 'institution'),
 ('University', 'PROPN', 'institution'),
 ('Medical', 'PROPN', 'institution'),
 ('School', 'PROPN', 'institution'),
 ('and', 'CCONJ', 'institution'),
 ('Hospital', 'PROPN', 'institution'),
 (',', 'PUNCT', 'unknown'),
 ('Gwangju', 'PROPN', 'addr-line'),
 (',', 'PUNCT', 'unknown'),
 ('South', 'PROPN', 'country'),
 ('Korea', 'PROPN', 'country')]
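
The preprocessing described above can be illustrated with a minimal sketch. It is not the repository's actual code: the function name, the tokenizer regex, and the rule that text between child elements is labeled 'unknown' are all assumptions made for illustration. It uses only the standard library and leaves out the part-of-speech tag, which would come from a separate tagger.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tokenizer: words (optionally hyphenated) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def aff_xml_to_labeled_tokens(xml_string):
    """Turn an <aff> element into (token, label) pairs.

    Tokens inside a child element take the child's tag as label.
    Text between child elements is labeled 'unknown' (an assumption).
    """
    root = ET.fromstring(xml_string)
    pairs = []
    # Text before the first child element, if any.
    for token in TOKEN_RE.findall(root.text or ""):
        pairs.append((token, "unknown"))
    for child in root:
        label = child.tag  # e.g. 'institution' or 'addr-line'
        for token in TOKEN_RE.findall(child.text or ""):
            pairs.append((token, label))
        # Text that follows the child's closing tag.
        for token in TOKEN_RE.findall(child.tail or ""):
            pairs.append((token, "unknown"))
    return pairs


SAMPLE_AFF = """
<aff xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <institution>Department of Emergency Medicine Hennepin County Medical Center</institution>
  <addr-line>Minneapolis, MN</addr-line>
</aff>
"""

if __name__ == "__main__":
    for token, label in aff_xml_to_labeled_tokens(SAMPLE_AFF.strip()):
        print(token, label)
```

In this sketch, punctuation inside a child element keeps that element's label, so the comma in "Minneapolis, MN" is labeled 'addr-line'. The real preprocessing may treat such tokens differently.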

We also made the dataset available in JSON format that you can download here.
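
For training, each (text, postag, label) token sequence has to be split into per-token features and a label sequence. The sketch below shows one common way to do that. The feature names and the choice of features are assumptions for illustration and are not taken from this repository. The output is a list of feature dictionaries and a list of label strings, which is the general shape that python-crfsuite accepts for a training sequence.

```python
def token_features(sent, i):
    """Build a feature dict for the i-th token of a (text, postag, label) sequence."""
    text, postag, _ = sent[i]
    features = {
        "bias": 1.0,
        "lower": text.lower(),
        "is_title": text.istitle(),
        "is_upper": text.isupper(),
        "is_digit": text.isdigit(),
        "postag": postag,
    }
    # Context features: neighbouring tokens, or sequence boundary markers.
    if i > 0:
        features["prev_lower"] = sent[i - 1][0].lower()
    else:
        features["BOS"] = True  # beginning of sequence
    if i < len(sent) - 1:
        features["next_lower"] = sent[i + 1][0].lower()
    else:
        features["EOS"] = True  # end of sequence
    return features


def sent2features(sent):
    """Feature dicts for every token in the sequence."""
    return [token_features(sent, i) for i in range(len(sent))]


def sent2labels(sent):
    """Label strings for every token in the sequence."""
    return [label for _, _, label in sent]


# First four tokens of the training example shown in this section.
example = [
    ("Department", "PROPN", "department"),
    ("of", "ADP", "department"),
    ("Orthopaedics", "PROPN", "department"),
    (",", "PUNCT", "unknown"),
]

if __name__ == "__main__":
    for feats, label in zip(sent2features(example), sent2labels(example)):
        print(label, feats)
```

Including the neighbouring tokens gives the model local context, which helps separate fields that share vocabulary, such as "University" appearing in both department and institution names.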

Installation

Clone the repository and install using setup.py

git clone https://github.com/titipata/affilparser
cd affilparser
python setup.py install

Requirements

Citation

If you use this package, please cite it as follows:

Titipat Achakulvisut, Daniel E. Acuna (2017) "Affiliation Parser" https://github.com/titipata/affilparser
