HAP
Code for "Hierarchical Attention Propagation for Healthcare Representation Learning", KDD 2020.
Hierarchical Attention Propagation (HAP) is a medical ontology embedding framework which generalizes GRAM by hierarchically propagating attention across the entire ontology structure, where a medical concept adaptively learns its embedding from all other concepts in the hierarchy instead of only its ancestors.
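The core idea can be sketched in a few lines: each concept's updated embedding is an attention-weighted mixture of its current embedding and those of its hierarchy neighbors, computed level by level in a bottom-up pass followed by a top-down pass. This is a minimal illustration only; the scoring function below (a shared vector applied to a tanh of concatenated embeddings) is a simplified stand-in, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def propagate(node, neighbors, embeds, u):
    """Attention-weighted update of one node's embedding from itself and
    its hierarchy neighbors. The compatibility score (shared vector u
    dotted with tanh of the concatenated pair) is a hypothetical choice."""
    ids = [node] + neighbors
    scores = np.array([u @ np.tanh(np.concatenate([embeds[node], embeds[j]]))
                       for j in ids])
    alpha = softmax(scores)                 # attention over the neighborhood
    return sum(a * embeds[j] for a, j in zip(alpha, ids))

# toy hierarchy: node 0 is the root with children 1 and 2
d = 4
rng = np.random.default_rng(0)
E = {i: rng.standard_normal(d) for i in range(3)}
u = rng.standard_normal(2 * d)
# the full model sweeps all levels bottom-up, then top-down; here we
# update just the root from its children and one child from the root
E[0] = propagate(0, [1, 2], E, u)
E[1] = propagate(1, [0], E, u)
```

In the full model the propagated embeddings then feed the downstream predictive network, so every concept's representation reflects its entire neighborhood in the ontology rather than only its ancestors.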
For more information, please check our paper:
M. Zhang, C. King, M. Avidan, and Y. Chen, Hierarchical Attention Propagation for Healthcare Representation Learning, Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20), 2020. [PDF]
Code Description
Like GRAM, the code trains an RNN (with Gated Recurrent Units) to predict, at each timestep (i.e., visit), the diagnosis/procedure codes occurring in the next visit. It uses the Multi-level Clinical Classifications Software (CCS) for ICD-9-CM as the domain knowledge.
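As a rough illustration of this training setup, the sketch below runs a GRU over one patient's visit sequence (each visit represented as the sum of its code embeddings) and produces per-code probabilities for the next visit. `E`, `W`, `U`, `b`, and `V` are random placeholders standing in for learned parameters, not the repository's actual ones:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step over a visit representation x and hidden state h."""
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])        # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])        # reset gate
    c = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
    return (1.0 - z) * h + z * c

rng = np.random.default_rng(1)
n_codes, d = 10, 8
E = rng.standard_normal((n_codes, d))   # code embeddings (HAP-propagated in the real model)
W = rng.standard_normal((3, d, d))
U = rng.standard_normal((3, d, d))
b = np.zeros((3, d))
V = rng.standard_normal((n_codes, d))   # output projection

visits = [[0, 3], [2, 5, 7], [1]]       # one patient's sequence of code sets
h = np.zeros(d)
for visit in visits[:-1]:
    x = E[visit].sum(axis=0)            # visit = sum of its code embeddings
    h = gru_step(x, h, W, U, b)
    p_next = sigmoid(V @ h)             # per-code probability for the next visit
```

Training then maximizes the likelihood of the codes actually observed in each following visit, treating the prediction as a multi-label classification at every timestep.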
Running HAP
STEP 1: Installation
- Install Python and Theano. We use Python 2.7 and Theano 0.8.2. Theano can be installed on Ubuntu following its official installation guide.
- If you plan to use GPU computation, install CUDA.
STEP 2: Run on MIMIC-III
- You will first need to request access to MIMIC-III, a publicly available electronic health record dataset collected from ICU patients over 11 years.
- You can use "process_mimic.py" located in "data/mimic3/" to process the MIMIC-III dataset and generate a training dataset suitable for HAP. Place the script in the same directory as the MIMIC-III CSV files and run:

      python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv mimic

  More instructions are described inside the script. You may use the already processed files included in "data/mimic3/"; otherwise, please copy your generated "mimic.*" files to "data/mimic3/".
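Conceptually, this processing step groups diagnosis codes by admission and orders each patient's admissions chronologically. The toy function below is a hypothetical reconstruction (tuple inputs instead of the real CSV parsing, and without the string-to-integer remapping the actual script performs) that shows the shape of the output:

```python
from collections import defaultdict

def build_sequences(admissions, diagnoses):
    """admissions: iterable of (subject_id, hadm_id, admittime);
    diagnoses: iterable of (hadm_id, icd9_code).
    Returns, per patient, the chronologically ordered list of
    per-visit diagnosis-code lists."""
    adm_time, patient_adms = {}, defaultdict(list)
    for pid, adm, t in admissions:
        adm_time[adm] = t
        patient_adms[pid].append(adm)
    adm_codes = defaultdict(list)
    for adm, code in diagnoses:
        adm_codes[adm].append(code)
    return {pid: [adm_codes[a] for a in sorted(adms, key=adm_time.get)]
            for pid, adms in patient_adms.items()}

seqs = build_sequences(
    admissions=[("p1", "a2", "2101-07-02"), ("p1", "a1", "2101-01-15")],
    diagnoses=[("a1", "4019"), ("a2", "25000"), ("a2", "4019")],
)
# seqs["p1"] == [["4019"], ["25000", "4019"]]
```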
- Use "build_trees.py" in "data/mimic3/" to build files that contain the ancestor information of each medical code. This requires "ccs_multi_dx_tool_2015.csv" (Multi-level CCS for ICD-9-CM), which can be downloaded from the HCUP website; we also include it in "data/mimic3/". Running this script re-maps the integer ids assigned to all medical codes, so it also needs the ".seqs" and ".types" files created by "process_mimic.py". The general form is python build_trees.py ccs_multi_dx_tool_2015.csv &lt;seqs file&gt; &lt;types file&gt; &lt;output path&gt;; for example:

      python build_trees.py ccs_multi_dx_tool_2015.csv mimic.seqs mimic.types remap

  This builds five files "remap.level#.pk" and a file "remap.p2c", which contain the level information and the parent-to-children mapping extracted from the hierarchy, and replaces the old "mimic.seqs" and "mimic.types" files with correctly re-mapped ones.
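The hierarchy these files describe can be pictured as follows: every ICD-9 code hangs under its chain of multi-level CCS categories beneath a single root, giving per-level node sets and a parent-to-children map. The sketch below is a hypothetical reconstruction of that data layout, not the script's actual serialization:

```python
from collections import defaultdict

def build_hierarchy(ccs_paths):
    """ccs_paths: {icd9_code: [level1, level2, ...]} as read from the
    multi-level CCS file. Returns per-level node sets and a
    parent -> sorted children map (the real files store remapped int ids)."""
    levels = defaultdict(set)
    p2c = defaultdict(set)
    for code, path in ccs_paths.items():
        chain = ["root"] + path + [code]   # root -> CCS levels -> leaf code
        for depth, node in enumerate(chain):
            levels[depth].add(node)
            if depth:
                p2c[chain[depth - 1]].add(node)
    return dict(levels), {p: sorted(c) for p, c in p2c.items()}

levels, p2c = build_hierarchy({
    "4019": ["7", "7.1", "7.1.2"],     # essential hypertension
    "25000": ["3", "3.2", "3.2.1"],    # diabetes without complication
})
# p2c["root"] == ["3", "7"]
```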
- Run HAP using the "remap.seqs" and "remap.p2c" files generated by "build_trees.py". The ".seqs" file contains the sequence of visits for each patient, where each visit consists of multiple diagnosis codes. The command is:

      python hap.py data/mimic3/ remap.seqs remap.seqs remap result/mimic3/HAP/ --p2c_file remap.p2c --sep_attention --L2 0 --n_epochs 50

  More commands for generating the experimental results are contained in "run_mimic.sh".
STEP 3: How to pretrain the code embedding
For sequential diagnosis prediction, it is very effective to pretrain the code embeddings with a co-occurrence based algorithm such as word2vec or GloVe. To pretrain the code embeddings with GloVe, do the following:
- Use "create_glove_comap.py" with the ".seqs" file generated by "build_trees.py". The execution command is:

      python create_glove_comap.py remap.seqs remap

  This will create a file "cooccurrenceMap.pk" that contains the co-occurrence information of codes and their ancestors.
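The co-occurrence statistics can be sketched as counting, within each visit, all pairs among the visit's codes and their ancestors. The `ancestors` lookup here is a hypothetical stand-in for the hierarchy files built earlier:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(seqs, ancestors):
    """seqs: per-patient lists of visits (each a list of code ids);
    ancestors: {code: set of ancestor ids}. Counts unordered pairs
    among each visit's codes plus all their ancestors."""
    counts = Counter()
    for patient in seqs:
        for visit in patient:
            nodes = set(visit)
            for c in visit:
                nodes |= ancestors.get(c, set())
            for a, b in combinations(sorted(nodes), 2):
                counts[(a, b)] += 1
    return counts

counts = cooccurrence(
    seqs=[[[0, 1]]],                   # one patient, one visit with codes 0 and 1
    ancestors={0: {10}, 1: {10, 11}},  # toy ancestor lookup
)
# nodes = {0, 1, 10, 11} -> 6 unordered pairs, each counted once
```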
- Use "glove.py" on the co-occurrence file generated by "create_glove_comap.py". The execution command is:

      python glove.py cooccurrenceMap.pk remap pretrained_embedding

- Use the pretrained embeddings when you train HAP by appending "--embed_file pretrained_embedding.npz" to your command.
Reference
If you find the code useful, please cite our paper:
@inproceedings{zhang2020hierarchical,
title={Hierarchical Attention Propagation for Healthcare Representation Learning},
author={Zhang, Muhan and King, Christopher R and Avidan, Michael and Chen, Yixin},
booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages={249--256},
year={2020}
}
Muhan Zhang, Washington University in St. Louis, muhan@wustl.edu, 11/2/2020
