GDP
Group lasso based Deep Neural Network for Cancer Prognosis
Install / Use
/learn @WGLab/GDPREADME
GDP
GDP stands for Group lasso regularized Deep learning for cancer Prognosis from multi-omics and clinical features, and it is a python package based on tensorflow to do survival analysis for the cancer survival data with high-dimensional features but smaller sample size.
PREREQUISITES
- Python 2.7
- Google Tensorflow 1.4.1
- Linux environment
- CUDA (only for GPU version, skip it if using CPU)
- lifelines (https://lifelines.readthedocs.io/en/latest/Quickstart.html)
Quick Start
Train GDP model
$bash run_train.sh
or under SGE high-performance cluster
$bash qsub_run_train.sh
Make prediction
$bash run_prediction.sh
or under SGE high-performance cluster
$bash qsub_run_prediction.sh
USAGE
Training GDP model
usage: train_model.py [-h] [--initial_learning_rate INITIAL_LEARNING_RATE]
[--alpha ALPHA] [--scale SCALE] [--delta DELTA]
[--reg_type REG_TYPE] [--activation ACTIVATION]
[--max_steps MAX_STEPS] [--hidden_nodes HIDDEN_NODES]
[--batch_size BATCH_SIZE] [--eval_steps EVAL_STEPS]
[--dropout_keep_rate DROPOUT_KEEP_RATE]
[--skip_eval SKIP_EVAL] [--model MODEL]
[--log_dir LOG_DIR] [--train_dir TRAIN_DIR]
[--train_file TRAIN_FILE]
[--evaluation_file EVALUATION_FILE]
[--evaluation_cindex_file EVALUATION_CINDEX_FILE]
[--output_evaluation_cindex OUTPUT_EVALUATION_CINDEX]
[--evaluation_dir EVALUATION_DIR]
optional arguments:
-h, --help show this help message and exit
--initial_learning_rate INITIAL_LEARNING_RATE
Initial learning rate.
--alpha ALPHA alpha for group regularizer proportion control
--scale SCALE Lambda scale for regularization
--delta DELTA delta value used for log transformation in partial
likelihood calculation
--reg_type REG_TYPE types of regularization (available: lasso, l2,
group_lasso, sparse_group_lasso,none)
--activation ACTIVATION
activation function (relu, sigmoid or tanh)
--max_steps MAX_STEPS
Number of steps to run trainer.
--hidden_nodes HIDDEN_NODES
Number of nodes in each hidden layer.
--batch_size BATCH_SIZE
Batch size. Must divide evenly into the dataset sizes.
--eval_steps EVAL_STEPS
number of steps used for evaluation average score
calculation
--dropout_keep_rate DROPOUT_KEEP_RATE
dropout keep rate(default:0.6)
--skip_eval SKIP_EVAL
YES or NO for if skipping evaluation
--model MODEL model used for the training, NN: neural network,
linear: linear regression (alpha should be set to 1
for group lasso)
--log_dir LOG_DIR Directory to put the log data.
--train_dir TRAIN_DIR
Directory for the input file to do model
training,validation and testing
--train_file TRAIN_FILE
input file for model training , validation and testing
--evaluation_file EVALUATION_FILE
evaluation scores output file(cost, cindex changes in
each epoch after one batch of data feeding)
--evaluation_cindex_file EVALUATION_CINDEX_FILE
evaluation cindex output file(cindex changes in each
training)
--output_evaluation_cindex OUTPUT_EVALUATION_CINDEX
whether or not to output evaluation cindex detail
information, default(NO), other choice YES
--evaluation_dir EVALUATION_DIR
evaluation scores output directory(cost, cindex
changes in each epoch after one batch of data feeding)
Making survival hazard prediction
usage: gdp_prediction.py [-h] [--reg_type REG_TYPE] [--activation ACTIVATION]
[--hidden_nodes HIDDEN_NODES]
[--saver_file_dir SAVER_FILE_DIR]
[--saver_file_prefix SAVER_FILE_PREFIX]
[--model MODEL] [--prediction_dir PREDICTION_DIR]
[--prediction_file PREDICTION_FILE]
[--out_dir OUT_DIR] [--out_file OUT_FILE]
optional arguments:
-h, --help show this help message and exit
--reg_type REG_TYPE types of regularization (available: lasso, l2,
group_lasso, sparse_group_lasso,none)
--activation ACTIVATION
activation function (relu, sigmoid or tanh)
--hidden_nodes HIDDEN_NODES
Number of nodes in each hidden layer.
--saver_file_dir SAVER_FILE_DIR
Directory to put the files with saved variables.
--saver_file_prefix SAVER_FILE_PREFIX
prefix of the saver files (saver files used for
reloading the model for prediction)
--model MODEL model used for the training, NN: neural network,
linear: linear regression (alpha should be set to 1
for group lasso)
--prediction_dir PREDICTION_DIR
Directory for the input file to do model
training,validation and testing
--prediction_file PREDICTION_FILE
input file for making prediction,the sample size in
this file should equal to the batch size in the
training process
--out_dir OUT_DIR output file directory
--out_file OUT_FILE output file name
About Input Data
simple example
Here is a simple example with 2 groups of features and 10 patients input file:
1,1,1,2,2,2
X1,X2,X3,X4,X5,X6,days,censors
-1.85902783053298,0.0994494352091435,-0.807210866434911,0.303156984244015,-0.275110805413045,0.579571367421607,27.7573213813725,1
0.86859158824812,-0.618129783705067,-0.308273526302284,-0.86749748596954,-1.19636100811427,-0.745102422373332,118.705992121249,1
-0.525185612462805,0.241140196288319,-1.01799171699816,0.991736132807852,-1.29180454442748,0.101100728946668,47.2236119511256,0
0.651322038444394,0.624104844298306,0.74730472763471,0.815543899943677,0.771588304339387,-0.466081795713416,195.641828371416,0
1.09407882660999,-0.0184001549027565,-0.0283117031875402,-0.27400025250167,-1.02635165546504,-1.65681317376197,8.72296970337629,1
0.797017366622877,0.647989168058519,1.96239849878692,0.65190924214759,-0.154590911151089,-1.35643031260092,103.915377632488,1
-0.827356228955125,-1.03067954961316,1.32470461704247,1.21053565312543,0.189932779563431,0.0174347878628153,23.692097151167,1
-0.431529861637166,-0.167016525491645,-0.205556633566519,-1.12039115892178,-0.502304556662159,0.410028189118801,85.330219194293,1
-0.558018708249103,0.832222076104534,0.346448789650193,0.384576276352321,0.856890192421964,-1.05987085048325,104.801233112812,1
-0.481907510681125,0.644392832802829,0.762831469086699,1.09207762024784,-0.799866311327226,0.42173466089875,172.114941469374,1

first line: providing group prior knowledge
eg: 1,1,1,2,2,2
The first line is the group prior knowledge line with group information provided as numbers. In this example, there are 4 groups,
and the group size of the 1 group is 2, and the group size for other groups (2,3,4) is 3 for each.
second line: header information for the name of features and days and censors
eg: X1,X2,X3,X4,X5,X6,days,censors
other lines: feature matrix + survival days and censoring information
eg: 0.1,1,1.5,2,2,3.3,3,3,9.2,2.1,1
For censors column: if 1 then means the survival date of the patient is censored, otherwise non-censored.
about the input file for making prediction
If the survival data and survival censoring status are not availabe for prediction data, then random numbers can be added to make the format consistent with the format used in the training process. And in order to load the model based on the training, the number of patients (samples) in this prediction input file should equal to the batch size used in the training process.
common error solution
CXXABI_1.3.9 import error
ImportError: /lib64/libstdc++.so.6: version 'CXXABI_1.3.9' not found (required by /home/gangcai/bin/miniconda2/lib/python2.7/site-packages/scipy/sparse/_sparsetools.so)
You can export your library path, and if you use conda, you can export by:
export LD_LIBRARY_PATH="/home/gangcai/bin/miniconda2/lib:$LD_LIBRARY_PATH"
Reference
Xie G, Dong C, Kong Y, Zhong JF, Li M, Wang K. GDP: Group lasso regularized Deep learning for cancer Prognosis from multi-omics and clinical features. Genes, 10(3):240, 2019
Related Skills
node-connect
344.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
99.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
344.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
344.4kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
