SemaTyP: a knowledge graph based literature mining method for drug discovery

This is the source code and data for the task of drug discovery as described in our paper: "SemaTyP: a knowledge graph based literature mining method for drug discovery"

Requirements

scikit-learn
numpy
tqdm
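
A quick way to confirm that the three packages above are importable before running anything is sketched below. It assumes the usual import names (scikit-learn is imported as `sklearn`); the helper name `missing_packages` is illustrative and not part of this repository.

```python
import importlib.util

# Map each requirement to its import name (scikit-learn is imported as "sklearn").
REQUIRED = {"scikit-learn": "sklearn", "numpy": "numpy", "tqdm": "tqdm"}


def missing_packages(required=REQUIRED):
    """Return the requirement names whose import name cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```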

Data

To use the code, you need to provide the following data:

Therapeutic Target Database (TTD): You do not need to download this yourself; the TTD 2016 version is already included in <./data/TTD>.
SemMedDB: You need to download the full "predications.txt" file from here (password: 1234) to obtain the whole knowledge graph. After downloading, replace the sample file <./data/SemedDB/predications.txt> with the newly downloaded file.
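
To check whether the sample file has been replaced by the full download, one option is to count the lines in "predications.txt". The sketch below assumes one relation per line, which is an assumption about the file layout rather than something this README states. It uses the path given above; `count_relations` is an illustrative helper, not part of the repository.

```python
from pathlib import Path

# Path as written in this README; adjust it if the directory is named differently.
PREDICATIONS = Path("./data/SemedDB/predications.txt")


def count_relations(path):
    """Count non-empty lines, assuming one relation per line."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    if PREDICATIONS.exists():
        print(f"{PREDICATIONS}: {count_relations(PREDICATIONS)} relations")
    else:
        print(f"{PREDICATIONS} not found")
```

The sample file shipped with the repository holds 100 relations and the full file holds 39,133,975, so a count near the larger figure suggests the replacement worked.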

Run the code

Install the dependencies.

pip install -r requirements.txt

Construct training and test data.

python experimental_data.py

Train and test the model.

python main.py
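
The two Python steps above can also be chained from a single script so that training only starts if data construction succeeds. This is a minimal sketch using the standard `subprocess` module; the `run_steps` helper and its `runner` parameter are illustrative and not part of the repository.

```python
import subprocess
import sys

# Script names as listed in this README, in the order they should run.
STEPS = ["experimental_data.py", "main.py"]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in steps:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code.
        runner([sys.executable, script], check=True)
```

Calling `run_steps()` from the repository root runs both scripts in order. The `runner` parameter exists only so the helper can be exercised without launching the real scripts.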

Illustration of feature selection

<div align=center><img width="800" height="300" src="https://github.com/ShengtianSang/SemaTyP/blob/main/figures/Illustration_of_Feature_selection.jpg"/></div> <p align="center"> An illustration of the features constructed in our work. </p>

File declaration

data/SemmedDB: contains all relations extracted from SemMedDB, which are used to construct the Knowledge Graph in our experiment. The full "predications.txt" contains 39,133,975 relations; only a small sample "predications.txt" with 100 relations is included here. The full "predications.txt" file can be downloaded from

data/TTD: contains the drug, target and disease relations retrieved from the Therapeutic Target Database.

experimental_data.py: constructs the drug-target-disease associations from TTD and the Knowledge Graph.

knowledge_graph.py: constructs the Knowledge Graph used in our experiment.

data_loader.py: used to load the training and test data.

main.py: used to train and test the models.
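
As a rough illustration of the kind of structure that knowledge_graph.py and experimental_data.py deal with, the sketch below builds a small directed graph from (subject, predicate, object) triples and enumerates two-edge drug-target-disease paths in it. The triples, the predicate labels, and the function names are placeholders invented for illustration; the actual file format and graph representation used by the repository are not described in this README and may differ.

```python
from collections import defaultdict


def build_graph(triples):
    """Map each subject to a list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def two_hop_paths(graph, start):
    """Return every (start, middle, end) path made of exactly two edges."""
    paths = []
    for _, middle in graph.get(start, []):
        for _, end in graph.get(middle, []):
            paths.append((start, middle, end))
    return paths


# Toy triples, invented for illustration only (not taken from SemMedDB or TTD).
triples = [
    ("drug_a", "INHIBITS", "target_x"),
    ("target_x", "ASSOCIATED_WITH", "disease_1"),
    ("target_x", "ASSOCIATED_WITH", "disease_2"),
    ("drug_b", "STIMULATES", "target_y"),
]

graph = build_graph(triples)
print(two_hop_paths(graph, "drug_a"))
```

In this toy graph, `drug_a` reaches two diseases through `target_x`, while `drug_b` yields no path because `target_y` has no outgoing edges.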

Cite

Please cite our paper if you use this code in your own work:

@article{sang2018sematyp,
  title={SemaTyP: a knowledge graph based literature mining method for drug discovery},
  author={Sang, Shengtian and Yang, Zhihao and Wang, Lei and Liu, Xiaoxia and Lin, Hongfei and Wang, Jian},
  journal={BMC bioinformatics},
  volume={19},
  number={1},
  pages={1--11},
  year={2018},
  publisher={Springer}
}
