DeepMSProfiler
Welcome to DeepMSProfiler, an innovative data analysis tool focused on liquid chromatography-mass spectrometry (LC-MS) data. It harnesses the potential of deep learning to process complex data from different diseases and generate unique disease features.
Overview
Unlike traditional metabolomics data analysis tools, DeepMSProfiler is a tool for mining global features from raw metabolomics data. It takes raw metabolomics data from different disease groups as input and provides three main outputs:
- Sample disease type labels.
- Heatmaps depicting the correlation of different metabolite signals with diseases.
- Disease-associated metabolite-protein network plots.
Why DeepMSProfiler?
DeepMSProfiler stands out due to the following advantages:
- Superior analysis accuracy.
- Increased efficiency.
- User-friendliness.
- Suitable for both experts and beginners.
System Requirements
Ensure that your system meets these requirements before proceeding with installation and usage.
Hardware Requirements
The DeepMSProfiler package is designed for environments with CUDA support (CUDA version 10.1), but it also runs in CPU-only environments. The hardware requirements are as follows:
- Standard Computer: The package requires a standard computer with sufficient RAM to support in-memory operations.
Software Requirements
The DeepMSProfiler package development version has been tested on CentOS 7 but is also compatible with Windows environments. It is essential to ensure that the Python environment and dependencies are properly installed. The software requirements are as follows:
- Operating System Compatibility: CentOS 7 (tested), Windows (compatible)
- Python Version: Python >= 3.6
- TensorFlow Version: TensorFlow == 2.2.0
- Keras Version: Keras == 2.3.1
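The version constraints listed above can be checked before installation problems surface at import time. The following is a minimal sketch, not part of DeepMSProfiler: the function names `check_environment` and `installed_versions` are hypothetical, and the lookup helper relies on `importlib.metadata`, which is only available from Python 3.8 onward even though the package itself lists Python >= 3.6.

```python
import sys

# Requirements copied from the list above; the checking logic is an assumption.
REQUIRED_PYTHON = (3, 6)
PINNED = {"tensorflow": "2.2.0", "keras": "2.3.1"}


def check_environment(python_version, installed):
    """Return a list of problems; an empty list means every check passed.

    python_version: tuple such as sys.version_info[:2]
    installed: mapping of lowercase package name -> installed version string
    """
    problems = []
    major, minor = tuple(python_version)[:2]
    if (major, minor) < REQUIRED_PYTHON:
        problems.append("Python >= 3.6 is required, found {}.{}".format(major, minor))
    for name, wanted in PINNED.items():
        found = installed.get(name)
        if found is None:
            problems.append("{} is not installed (need {})".format(name, wanted))
        elif found != wanted:
            problems.append("{} {} found, but {} is required".format(name, found, wanted))
    return problems


def installed_versions(names):
    """Look up installed versions (hypothetical helper, needs Python 3.8+)."""
    from importlib import metadata

    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # a missing package is reported by check_environment
    return versions


if __name__ == "__main__":
    found = installed_versions(PINNED)
    for problem in check_environment(sys.version_info[:2], found):
        print("[WARN]", problem)
```

Keeping the comparison in a pure function makes it easy to test without touching the real environment.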
Python Dependencies
Please refer to the requirements.txt file for the list of Python dependencies to install.
Installation Guide
Install from PyPI:
pip install DeepMSProfiler
https://pypi.org/project/DeepMSProfiler/
Install from source code:
git clone https://github.com/yjdeng9/DeepMSProfiler
cd DeepMSProfiler
bash install_dependencies.sh
Install time: <10 minutes
Usage Guide
You can run DeepMSProfiler using the following command:
python mainRun.py -data ../example/data/ -label ../example/label.txt -out ../jobs -run_train -run_pred -run_feature
We provide a pre-trained model based on 859 serum metabolomics samples (210 healthy individuals, 323 lung nodules, 326 lung cancer) for academic use. Please contact the author for access.
Command Line Arguments:
- Data Options:
  - -data: Specifies the path to the raw metabolomics data. Default is ../example/data/.
  - -label: Specifies the path to the sample disease type labels file. Default is ../example/label.txt.
  - -out: Specifies the directory where the output results will be saved. Default is ../jobs.
- Run Options:
  - -run_train: Initiates the training process (Boolean, default is False).
  - -run_pred: Initiates the prediction process (Boolean, default is False).
  - -run_feature: Initiates the feature extraction process (Boolean, default is True).
- Model Parameters:
  - -arch: Specifies the model architecture, e.g., 'DenseNet121'.
  - -pretrain: Specifies the path to the pre-trained model, default is None.
  - -lr: Sets the learning rate, default is 1e-4.
  - -opt: Specifies the optimizer, e.g., 'adam'.
  - -batch: Sets the batch size, default is 8.
  - -epoch: Sets the number of training epochs, default is 2.
  - -run: Specifies the number of runs, default is 10.
- Other Options:
  - -models: Specifies the models to be used for prediction, e.g., 'use_old'.
  - -mode: Specifies the mode, e.g., 'ensemble'.
  - -boost: Enables boosting mode (Boolean).
  - -plot_auc: Plots the AUC curve (Boolean).
  - -plot_cm: Plots the confusion matrix (Boolean).
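To make the options and defaults listed above concrete, here is a minimal `argparse` sketch that mirrors them. It is an illustration only and is not the parser used in mainRun.py. Several choices are assumptions: `build_parser` is a hypothetical name, the Boolean options are modeled as `store_true` flags, 'DenseNet121' and 'adam' are used as defaults because they are the documented examples, and -models and -mode default to None because no default is documented.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented DeepMSProfiler options")

    # Data options, with the documented default paths
    parser.add_argument("-data", default="../example/data/")
    parser.add_argument("-label", default="../example/label.txt")
    parser.add_argument("-out", default="../jobs")

    # Run options, modeled as flags (assumption)
    parser.add_argument("-run_train", action="store_true")
    parser.add_argument("-run_pred", action="store_true")
    # The documented default is True, so in this sketch the flag is always on;
    # how the real script lets users switch it off is not documented.
    parser.add_argument("-run_feature", action="store_true", default=True)

    # Model parameters
    parser.add_argument("-arch", default="DenseNet121")  # default is an assumption
    parser.add_argument("-pretrain", default=None)
    parser.add_argument("-lr", type=float, default=1e-4)
    parser.add_argument("-opt", default="adam")  # default is an assumption
    parser.add_argument("-batch", type=int, default=8)
    parser.add_argument("-epoch", type=int, default=2)
    parser.add_argument("-run", type=int, default=10)

    # Other options (defaults of None are assumptions)
    parser.add_argument("-models", default=None)
    parser.add_argument("-mode", default=None)
    parser.add_argument("-boost", action="store_true")
    parser.add_argument("-plot_auc", action="store_true")
    parser.add_argument("-plot_cm", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-run_train", "-run_pred", "-epoch", "5"])
    print(args.data, args.run_train, args.epoch, args.batch)
```

Options that are not passed keep their defaults, so a command that gives only -run_train still trains with a batch size of 8 for 2 epochs over 10 runs.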
Demo
The demo data can be downloaded from the example directory or from Baidu Netdisk: https://pan.baidu.com/s/14v82CMsFZwcTI13iWaTWxA (password: acaa).
The demo files are in .npy format. If you supply a file in .mzML format, the script converts it to .npy format automatically.
Run with Demo Data (PyPI Version)
from DeepMSProfiler import *
run_train(datalist_path='DeepMSProfiler/example/datalist.txt',
          data_dir='DeepMSProfiler/example/data',
          job_dir='DeepMSProfiler/example/out/jobs007',
          epoch=2)
run_predict(job_dir='DeepMSProfiler/example/out/jobs007', plot_auc=True, plot_cm=True)
run_feature(job_dir='DeepMSProfiler/example/out/jobs007')
show_feature(job_dir='DeepMSProfiler/example/out/jobs007',mode='ensemble')
Run with Demo Data (Github Source Code Version)
python mainRun.py -data ../example/data/ -label ../example/label.txt -out ../jobs -run_train -run_pred -run_feature
Demo Log
[INFO] Start in 2024-02-23 09:38:15
[...]
2024-02-23 09:38:16.529642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5072 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0)
2024-02-23 09:38:16.532211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 6799 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0)
2024-02-23 09:40:14.107222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2024-02-23 09:40:15.990060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[INFO] run in linux platform!
[INFO] Start in ../jobs/jobs007! [2024-02-23 09:38:15]
FilePath Label Dataset
0 C01.npy cancer test
1 C02.npy cancer train
2 C03.npy cancer train
3 C04.npy cancer train
4 C05.npy cancer train
5 C06.npy cancer train
6 C07.npy cancer train
7 C08.npy cancer train
8 C09.npy cancer test
9 H01.npy health test
10 H02.npy health train
11 H03.npy health test
12 H04.npy health train
13 H05.npy health test
14 H06.npy health train
15 H07.npy health train
16 H08.npy health train
17 H09.npy health train
18 N01.npy nodule train
19 N02.npy nodule train
20 N03.npy nodule train
21 N04.npy nodule test
22 N05.npy nodule train
23 N06.npy nodule train
24 N07.npy nodule train
25 N08.npy nodule train
26 N09.npy nodule train
[INFO] Step1 Done! [2024-02-23 09:38:16]
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 0
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 1
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 2
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 3
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 4
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 5
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 6
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 7
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 8
[INFO] run model with args: DenseNet121 adam lr 1.0e-04 9
[...]
[INFO] Step2 Train Done! [2024-02-23 10:03:39]
[...]
[INFO] Step2 Pred Done! [2024-02-23 10:09:42]
[INFO] Heatmap Shape: (6, 341, 341)
[INFO] Save Heatmap to: ../jobs/jobs007/feature_results/ensemble_RISE.npy
[INFO] Step2 Feature Done! [2024-02-23 10:25:30]
[INFO] All Done! [2024-02-23 10:25:30]
[INFO] End in 2024-02-23 10:25:30
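The timestamps in the demo log give a rough idea of how long each stage takes on the hardware shown there (two Tesla P100 GPUs). The sketch below computes the stage durations from the log's own "Done" timestamps. The helper name `stage_durations` is hypothetical, and the timestamps are copied by hand from the log rather than parsed from a log file.

```python
from datetime import datetime

# Timestamps copied from the demo log above.
MILESTONES = [
    ("Start", "2024-02-23 09:38:15"),
    ("Step1", "2024-02-23 09:38:16"),
    ("Step2 Train", "2024-02-23 10:03:39"),
    ("Step2 Pred", "2024-02-23 10:09:42"),
    ("Step2 Feature", "2024-02-23 10:25:30"),
]


def stage_durations(milestones):
    """Return (stage name, seconds elapsed since the previous milestone) pairs."""
    fmt = "%Y-%m-%d %H:%M:%S"
    times = [(name, datetime.strptime(stamp, fmt)) for name, stamp in milestones]
    return [
        (name, int((current - previous).total_seconds()))
        for (_, previous), (name, current) in zip(times, times[1:])
    ]


for name, seconds in stage_durations(MILESTONES):
    minutes, rest = divmod(seconds, 60)
    print("{:<14} {:>2d} min {:02d} s".format(name, minutes, rest))
```

For this log, training accounts for about 25 minutes, prediction for about 6 minutes, and feature extraction for about 16 minutes, for a total of roughly 47 minutes.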
Run with Pretrain Model
python mainRun.py -data ../example/all_data/ -label ../example/all_label.txt -out ../jobs -run_pred -pretrain ../example/pretrain_model -plot_cm
Show Feature Heatmaps
python mainRun.py -data ../example/data/ -label ../example/label.txt -out ../jobs -run_feature
After run_feature, the heatmaps are saved in ../jobs/jobs007/feature_results/ensemble_RISE.npy, so the feature heatmaps for the different classes can then be displayed:
python showFeature.py
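The saved heatmap file can also be inspected directly with NumPy. The sketch below is not what showFeature.py does. It only assumes that the file holds a 3-D array (the demo log reports a shape of (6, 341, 341)) and treats the first axis as a stack of 2-D heatmaps. What that axis represents is not stated here, and `summarize_heatmaps` is a hypothetical helper name.

```python
import os

import numpy as np


def summarize_heatmaps(path):
    """Load a stack of 2-D heatmaps and report each one's peak value and position."""
    heatmaps = np.load(path)
    if heatmaps.ndim != 3:
        raise ValueError("expected a 3-D array, got shape {}".format(heatmaps.shape))
    summary = []
    for index, heatmap in enumerate(heatmaps):
        # Position of the largest value in this 2-D heatmap
        row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        summary.append({
            "index": index,
            "peak": float(heatmap[row, col]),
            "row": int(row),
            "col": int(col),
        })
    return summary


if __name__ == "__main__":
    # Path taken from the demo log; adjust it to your own job directory.
    path = "../jobs/jobs007/feature_results/ensemble_RISE.npy"
    if os.path.exists(path):
        for entry in summarize_heatmaps(path):
            print(entry)
```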

License
This project is licensed under the Apache License, Version 2.0, and is open for academic use. Please cite the related paper when using this project, and contact the authors for data access.
Yongjie Deng - dengyj9@mail2.sysu.edu.cn
Weizhong Li - liweizhong@mail.sysu.edu.cn
Citation
The paper is published in Nature Communications.
For usage of the package and associated manuscript, please cite: [**Deng, Y., Yao,
