HAP
Code for "Hierarchical Attention Propagation for Healthcare Representation Learning", KDD 2020.
Hierarchical Attention Propagation (HAP) is a medical ontology embedding framework which generalizes GRAM by hierarchically propagating attention across the entire ontology structure, where a medical concept adaptively learns its embedding from all other concepts in the hierarchy instead of only its ancestors.
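The core idea can be sketched in a few lines: each concept's updated embedding is an attention-weighted mixture of its current embedding and those of its hierarchy neighbors, computed level by level in a bottom-up pass followed by a top-down pass. This is a minimal illustration only; the scoring function below (a shared vector applied to a tanh of concatenated embeddings) is a simplified stand-in, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def propagate(node, neighbors, embeds, u):
    """Attention-weighted update of one node's embedding from itself and
    its hierarchy neighbors. The compatibility score (shared vector u
    dotted with tanh of the concatenated pair) is a hypothetical choice."""
    ids = [node] + neighbors
    scores = np.array([u @ np.tanh(np.concatenate([embeds[node], embeds[j]]))
                       for j in ids])
    alpha = softmax(scores)                 # attention over the neighborhood
    return sum(a * embeds[j] for a, j in zip(alpha, ids))

# toy hierarchy: node 0 is the root with children 1 and 2
d = 4
rng = np.random.default_rng(0)
E = {i: rng.standard_normal(d) for i in range(3)}
u = rng.standard_normal(2 * d)
# the full model sweeps all levels bottom-up, then top-down; here we
# update just the root from its children and one child from the root
E[0] = propagate(0, [1, 2], E, u)
E[1] = propagate(1, [0], E, u)
```

In the full model the propagated embeddings then feed the downstream predictive network, so every concept's representation reflects its entire neighborhood in the ontology rather than only its ancestors.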
For more information, please check our paper:
M. Zhang, C. King, M. Avidan, and Y. Chen, Hierarchical Attention Propagation for Healthcare Representation Learning, Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20), 2020. [PDF]
Code Description
Like GRAM, the code trains an RNN (with Gated Recurrent Units) to predict, at each timestep (i.e., visit), the diagnosis/procedure codes occurring in the next visit. It uses the Multi-level Clinical Classifications Software (CCS) for ICD-9-CM as the domain knowledge.
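As a rough illustration of this training setup, the sketch below runs a GRU over one patient's visit sequence (each visit represented as the sum of its code embeddings) and produces per-code probabilities for the next visit. `E`, `W`, `U`, `b`, and `V` are random placeholders standing in for learned parameters, not the repository's actual ones:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step over a visit representation x and hidden state h."""
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])        # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])        # reset gate
    c = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
    return (1.0 - z) * h + z * c

rng = np.random.default_rng(1)
n_codes, d = 10, 8
E = rng.standard_normal((n_codes, d))   # code embeddings (HAP-propagated in the real model)
W = rng.standard_normal((3, d, d))
U = rng.standard_normal((3, d, d))
b = np.zeros((3, d))
V = rng.standard_normal((n_codes, d))   # output projection

visits = [[0, 3], [2, 5, 7], [1]]       # one patient's sequence of code sets
h = np.zeros(d)
for visit in visits[:-1]:
    x = E[visit].sum(axis=0)            # visit = sum of its code embeddings
    h = gru_step(x, h, W, U, b)
    p_next = sigmoid(V @ h)             # per-code probability for the next visit
```

Training then maximizes the likelihood of the codes actually observed in each following visit, treating the prediction as a multi-label classification at every timestep.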
Running HAP
STEP 1: Installation
- Install Python and Theano. We use Python 2.7 and Theano 0.8.2. Theano can be installed on Ubuntu following its official installation guide.
- If you plan to use GPU computation, install CUDA.
STEP 2: Run on MIMIC-III
- You will first need to request access to MIMIC-III, a publicly available electronic health record dataset collected from ICU patients over 11 years.
- You can use "process_mimic.py" located in "data/mimic3/" to process the MIMIC-III dataset and generate a training dataset suitable for HAP. Place the script in the same directory as the MIMIC-III CSV files and run:

      python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv mimic

  More instructions are described inside the script. You may use the already processed files included in "data/mimic3/"; otherwise, please copy your generated "mimic.*" files to "data/mimic3/".
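Conceptually, this processing step groups diagnosis codes by admission and orders each patient's admissions chronologically. The toy function below is a hypothetical reconstruction (tuple inputs instead of the real CSV parsing, and without the string-to-integer remapping the actual script performs) that shows the shape of the output:

```python
from collections import defaultdict

def build_sequences(admissions, diagnoses):
    """admissions: iterable of (subject_id, hadm_id, admittime);
    diagnoses: iterable of (hadm_id, icd9_code).
    Returns, per patient, the chronologically ordered list of
    per-visit diagnosis-code lists."""
    adm_time, patient_adms = {}, defaultdict(list)
    for pid, adm, t in admissions:
        adm_time[adm] = t
        patient_adms[pid].append(adm)
    adm_codes = defaultdict(list)
    for adm, code in diagnoses:
        adm_codes[adm].append(code)
    return {pid: [adm_codes[a] for a in sorted(adms, key=adm_time.get)]
            for pid, adms in patient_adms.items()}

seqs = build_sequences(
    admissions=[("p1", "a2", "2101-07-02"), ("p1", "a1", "2101-01-15")],
    diagnoses=[("a1", "4019"), ("a2", "25000"), ("a2", "4019")],
)
# seqs["p1"] == [["4019"], ["25000", "4019"]]
```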
- Use "build_trees.py" in "data/mimic3/" to build files that contain the ancestor information of each medical code. This requires "ccs_multi_dx_tool_2015.csv" (Multi-level CCS for ICD-9-CM), which can be downloaded from the HCUP website; we also include it in "data/mimic3/". Running this script re-maps the integer ids assigned to all medical codes, so it also needs the ".seqs" and ".types" files created by "process_mimic.py". The general form is python build_trees.py ccs_multi_dx_tool_2015.csv &lt;seqs file&gt; &lt;types file&gt; &lt;output path&gt;; for example:

      python build_trees.py ccs_multi_dx_tool_2015.csv mimic.seqs mimic.types remap

  This builds five files "remap.level#.pk" and a file "remap.p2c", which contain the level information and the parent-to-children mapping extracted from the hierarchy, and replaces the old "mimic.seqs" and "mimic.types" files with correctly re-mapped ones.
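The hierarchy these files describe can be pictured as follows: every ICD-9 code hangs under its chain of multi-level CCS categories beneath a single root, giving per-level node sets and a parent-to-children map. The sketch below is a hypothetical reconstruction of that data layout, not the script's actual serialization:

```python
from collections import defaultdict

def build_hierarchy(ccs_paths):
    """ccs_paths: {icd9_code: [level1, level2, ...]} as read from the
    multi-level CCS file. Returns per-level node sets and a
    parent -> sorted children map (the real files store remapped int ids)."""
    levels = defaultdict(set)
    p2c = defaultdict(set)
    for code, path in ccs_paths.items():
        chain = ["root"] + path + [code]   # root -> CCS levels -> leaf code
        for depth, node in enumerate(chain):
            levels[depth].add(node)
            if depth:
                p2c[chain[depth - 1]].add(node)
    return dict(levels), {p: sorted(c) for p, c in p2c.items()}

levels, p2c = build_hierarchy({
    "4019": ["7", "7.1", "7.1.2"],     # essential hypertension
    "25000": ["3", "3.2", "3.2.1"],    # diabetes without complication
})
# p2c["root"] == ["3", "7"]
```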
- Run HAP using the "remap.seqs" and "remap.p2c" files generated by "build_trees.py". The ".seqs" file contains the sequence of visits for each patient, where each visit consists of multiple diagnosis codes. The command is:

      python hap.py data/mimic3/ remap.seqs remap.seqs remap result/mimic3/HAP/ --p2c_file remap.p2c --sep_attention --L2 0 --n_epochs 50

  More commands for generating the experimental results are contained in "run_mimic.sh".
STEP 3: How to pretrain the code embedding
For sequential diagnosis prediction, it is very effective to pretrain the code embeddings with a co-occurrence based algorithm such as word2vec or GloVe. To pretrain the code embeddings with GloVe, do the following:
- Use "create_glove_comap.py" with the ".seqs" file generated by "build_trees.py". The execution command is:

      python create_glove_comap.py remap.seqs remap

  This will create a file "cooccurrenceMap.pk" that contains the co-occurrence information of codes and their ancestors.
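The co-occurrence statistics can be sketched as counting, within each visit, all pairs among the visit's codes and their ancestors. The `ancestors` lookup here is a hypothetical stand-in for the hierarchy files built earlier:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(seqs, ancestors):
    """seqs: per-patient lists of visits (each a list of code ids);
    ancestors: {code: set of ancestor ids}. Counts unordered pairs
    among each visit's codes plus all their ancestors."""
    counts = Counter()
    for patient in seqs:
        for visit in patient:
            nodes = set(visit)
            for c in visit:
                nodes |= ancestors.get(c, set())
            for a, b in combinations(sorted(nodes), 2):
                counts[(a, b)] += 1
    return counts

counts = cooccurrence(
    seqs=[[[0, 1]]],                   # one patient, one visit with codes 0 and 1
    ancestors={0: {10}, 1: {10, 11}},  # toy ancestor lookup
)
# nodes = {0, 1, 10, 11} -> 6 unordered pairs, each counted once
```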
- Use "glove.py" on the co-occurrence file generated by "create_glove_comap.py". The execution command is:

      python glove.py cooccurrenceMap.pk remap pretrained_embedding

- Use the pretrained embeddings when you train HAP by appending "--embed_file pretrained_embedding.npz" to your command.
Reference
If you find the code useful, please cite our paper:
@inproceedings{zhang2020hierarchical,
title={Hierarchical Attention Propagation for Healthcare Representation Learning},
author={Zhang, Muhan and King, Christopher R and Avidan, Michael and Chen, Yixin},
booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages={249--256},
year={2020}
}
Muhan Zhang, Washington University in St. Louis, muhan@wustl.edu, 11/2/2020
