Mda
Revealing Hidden Patterns in Deep Neural Network Feature Space Continuum via Manifold Learning (Nature Communications, 2023)
Manifold discovery and analysis
This code implements the manifold discovery and analysis (MDA) algorithm presented in "Revealing Hidden Patterns in Deep Neural Network Feature Space Continuum via Manifold Learning, Nature Communications, 2023". Run demo_MDA.ipynb to analyze deep learning features for five different tasks (medical image segmentation and superresolution, gene expression prediction, survival prediction, and COVID-19 x-ray image classification). You can also run our reproducible code on Code Ocean (https://doi.org/10.24433/CO.0076930.v1).
Citation
Md Tauhidul Islam, Zixia Zhou, Hongyi Ren, Masoud Badiei Khuzani, Daniel Kapp, James Zou, Lu Tian, Joseph C. Liao and Lei Xing. 2023, "Revealing Hidden Patterns in Deep Neural Network Feature Space Continuum via Manifold Learning", Nature Communications, 14(1), p.8506.
Installation
The easiest way to get started with MDA is to install it from PyPI with pip.
pip install MDA-learn
Required packages
scikit-learn, scipy, tensorflow, umap-learn, pandas, matplotlib, jupyter, jupyterlab and mat73. The tested package versions are: jupyter (1.0.0), jupyterlab (3.6.1), mat73 (0.62), matplotlib (3.4.1), pandas (1.2.4), scikit-learn (0.24.2), scipy (1.6.1), tensorflow (2.5.1), umap-learn (0.5.3).
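To compare a local environment against the tested versions listed above, a small check along the following lines may help. This is only a sketch: the helper names `installed_versions` and `report` are hypothetical and not part of MDA, and the package names are assumed to match their PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as tested in this README.
TESTED_VERSIONS = {
    "jupyter": "1.0.0",
    "jupyterlab": "3.6.1",
    "mat73": "0.62",
    "matplotlib": "3.4.1",
    "pandas": "1.2.4",
    "scikit-learn": "0.24.2",
    "scipy": "1.6.1",
    "tensorflow": "2.5.1",
    "umap-learn": "0.5.3",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(tested=TESTED_VERSIONS):
    """Print one line per package comparing installed and tested versions."""
    for name, have in installed_versions(tested).items():
        if have is None:
            status = "missing"
        elif have == tested[name]:
            status = "matches tested version"
        else:
            status = "differs from tested version"
        print(f"{name}: installed={have}, tested={tested[name]} ({status})")


if __name__ == "__main__":
    report()
```

A version that differs from the tested one is not necessarily a problem; the report only shows where the environment departs from the configuration the authors tested.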
Data
The analyzed data can be downloaded from https://drive.google.com/drive/folders/1MUvngB04qd1XU6oFV_aJwSaScj0KP2c3?usp=sharing.
Example code
The following code shows how to extract intermediate-layer features from a trained model:
# All trained models used in our experiments, including those for the super-resolution task, segmentation task,
# gene expression prediction task, survival prediction task, and classification task, have been uploaded to
# the provided data drive link (MDA_Datasets/data/Trained Models/...).
# Here, we provide an example of the feature extraction process from TensorFlow models for the segmentation task.
# For PyTorch models, the feature extraction process is demonstrated in example_pytorch_feature_extraction.py.
from tensorflow.keras.models import load_model
from sklearn.decomposition import PCA
import numpy as np
import tensorflow as tf
# Path to the pre-trained model
model_path='../data/trained_models/model_seg.h5'
# Load the pre-trained model
model=load_model(model_path)
# Load test data
X_test=np.load('../data/trained_models/X_test_seg.npy')
# Extract output from a specific layer ('conv2d_9') of the model
interlayer_output=model.get_layer('conv2d_9').output
# Create a new model that outputs the interlayer output
inter_model = tf.keras.Model(inputs=model.input, outputs=interlayer_output)
# Initialize an empty list to store outputs
inter_out=[]
# Loop through the test data to extract features
for i in range(len(X_test)):
    test_img = X_test[i]  # Get an individual test image
    test_img = test_img[np.newaxis, :, :]  # Add a leading batch dimension
    test_img = test_img / 255  # Normalize the image to [0, 1]
    test_out = inter_model.predict(test_img)  # Predict using the intermediate model
    test_out = np.squeeze(test_out)  # Remove single-dimensional entries
    inter_out.append(test_out)  # Append the output to the list
# Convert list to numpy array
inter_out=np.array(inter_out)
# Reshape the output for PCA
n1, h1, w1, c1 = inter_out.shape
inter_out = inter_out.reshape(-1, h1*w1*c1)
# Apply PCA for dimensionality reduction
pca = PCA(n_components=400, svd_solver='arpack')
inter_out = pca.fit_transform(inter_out)
print(inter_out.shape)
# Save the PCA-transformed features
np.save('../data/Seg/feature_test.npy',inter_out)
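The reshape-and-reduce step at the end of the extraction code can be tried without a trained model. The sketch below uses synthetic feature maps and a plain NumPy SVD as a stand-in for `sklearn.decomposition.PCA`; the helper names `flatten_feature_maps` and `pca_reduce` are hypothetical, and the array sizes are chosen only for illustration.

```python
import numpy as np


def flatten_feature_maps(feature_maps):
    """Reshape (n_samples, h, w, c) feature maps to (n_samples, h*w*c)."""
    n_samples = feature_maps.shape[0]
    return feature_maps.reshape(n_samples, -1)


def pca_reduce(features, n_components):
    """Project rows of `features` onto their top principal components.

    Uses a thin SVD of the mean-centered data, which is a stand-in for
    sklearn.decomposition.PCA in the example above, not the same solver.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for inter_out: 20 samples of 8x8 maps with 4 channels.
rng = np.random.default_rng(0)
fake_maps = rng.normal(size=(20, 8, 8, 4))

flat = flatten_feature_maps(fake_maps)
reduced = pca_reduce(flat, n_components=5)
print(flat.shape, reduced.shape)  # (20, 256) (20, 5)
```

One practical point when adapting the original code: with `svd_solver='arpack'`, scikit-learn requires `n_components` to be strictly smaller than both the number of samples and the number of flattened features, so `n_components=400` needs more than 400 test images.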
The following code shows the MDA analyses of deep neural network (DNN) features at intermediate layers for five different tasks:
# For the tasks below, five datasets analysed in the manuscript will be automatically loaded.
# However, you can upload your own dataset, and analyze it using MDA
# Our data were saved as .npy file to reduce the data size (normally .csv file needs more disk space).
# However, .csv or other type of files can also be loaded and analyzed using MDA
# Load all necessary python packages needed for the reported analyses
# in our manuscript
import warnings
# Disable all warnings
warnings.filterwarnings("ignore")
%matplotlib inline
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # FATAL
import matplotlib.pyplot as plt
import scipy
import scipy.io as sio
import sklearn
import umap
import pandas as pd
from umap.parametric_umap import ParametricUMAP
import numpy as np
from mda import *
# Font size for all the MDA visualizations shown below
FS = 16
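The setup comments note that features stored as .csv can be analyzed as well as .npy files. A minimal sketch of that conversion is given below, assuming a hypothetical CSV layout with one row per sample, one column per feature, and a single header row; in-memory buffers stand in for real files so the snippet runs on its own.

```python
import io

import numpy as np

# Hypothetical CSV: one row per sample, one column per feature, with a header row.
csv_text = "f1,f2,f3\n0.1,0.2,0.3\n0.4,0.5,0.6\n"

# In practice, pass a file path instead of the StringIO object.
features = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(features.shape)  # (2, 3)

# Round-trip through the .npy format used for the datasets in this README.
buffer = io.BytesIO()
np.save(buffer, features)
buffer.seek(0)
restored = np.load(buffer)
```

The resulting array has the same (samples, features) layout as the `.npy` feature files loaded in the examples that follow.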
Example 1 - MDA analysis of the DNN features in superresolution task
Superresolution Network
In the superresolution task, we employed the super resolution generative adversarial network (SRGAN) to enhance the resolution of dermoscopic images (ISIC-2019) from 32×32 to 64×64. The selected SRGAN is a well-established deep network for super resolution, which is composed of a generator and a discriminator. In our implementation, the generator contains 4 residual blocks (each with a shortcut connection, batch normalization, and PReLU) and 1 upsampling block; the discriminator contains 7 convolutional layers with leaky ReLU.
Dataset and feature selection
We adopted the ISIC-2019 dataset, which consists of a total of 25,331 dermoscopic images, including 4522 melanoma, 12,875 melanocytic nevus, 3323 basal cell carcinoma, 867 actinic keratosis, 2624 benign keratosis, 239 dermatofibroma, 253 vascular lesion, and 628 squamous cell carcinoma cases. To visualize the intermediate layers of the SRGAN, we selected the features at (a) the output of the first residual block, (b) the output of the third residual block, (c) the output of the fourth residual block, and (d) the output of the upsampling block in the generator. In this demo, feature (d) is given as an example.
# Number of neighbors in MDA analyses
neighborNum = 5
# Load feature data extracted by the SRGAN at the upsampling block from test images
testDataFeatures = np.load('../data/SR/feature4_test_pca.npy')
# Load data labels (target high resolution images) corresponding to low resolution test images
Y = np.load('../data/SR/y_test.npy')
# Reshape the target images into vectors so that they can be analyzed by MDA
Y = Y.reshape(Y.shape[0],-1)
# Load output images predicted by the SRGAN
Y_pred = np.load('../data/SR/y_test_pred_trained.npy')
# Reshape the predicted output images into vectors so that they can be analyzed by MDA
Y_pred = Y_pred.reshape(Y_pred.shape[0],-1)
# Create color map for MDA visualization from the target manifold topology
clusterIdx = discoverManifold(Y, neighborNum)
# Compute the outline of the output manifold
clusterIdx_pred = discoverManifold(Y_pred, neighborNum)
# Use the outline of the output manifold to generate the MDA visualization of the SRGAN features
Yreg = mda(testDataFeatures,clusterIdx_pred)
# Plot the MDA results
plt.figure(1)
plt.scatter(Yreg[:,0],Yreg[:,1],c=clusterIdx.T, cmap='jet', s=5)
plt.xlabel("MDA1")
plt.ylabel("MDA2")
plt.title('MDA visualization of the SRGAN features for superresolution task')
Visualization and analysis of SRGAN features for super resolution task:
Figure 1. MDA visualization of SRGAN features for the super resolution task after network training. Here, RB1 denotes the first residual block, RB3 denotes the third residual block, RB4 denotes the fourth residual block, and UB denotes the upsampling block. t-SNE, UMAP and MDA results are shown in (a), (b), and (c), respectively, for training and testing datasets at different network layers. The colorbar denotes the normalized manifold distance. (d) Pearson correlations between the geodesic distances among feature data points in the high-dimensional (HD) space and in the low-dimensional representations from the different methods, shown for training and testing data.
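The metric in panel (d) can be illustrated with a small NumPy sketch. It is not the authors' evaluation code: it assumes geodesic distances are approximated by shortest paths on a symmetric k-nearest-neighbor graph, computed the same way in both spaces, and all function names are hypothetical. The data are a synthetic 2-D spiral and its arc-length coordinate.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def geodesic_distances(points, n_neighbors):
    """Approximate geodesic distances as shortest paths on a k-NN graph."""
    dist = pairwise_distances(points)
    n = dist.shape[0]
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    # Column 0 of the argsort is the point itself (assuming no duplicates).
    neighbors = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    graph[rows, cols] = dist[rows, cols]
    graph[cols, rows] = dist[rows, cols]  # keep the graph symmetric
    # Floyd-Warshall: O(n^3), adequate for a few hundred points.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    return graph


def geodesic_correlation(high_dim, low_dim, n_neighbors=5):
    """Pearson correlation between the two sets of geodesic distances."""
    upper = np.triu_indices(high_dim.shape[0], k=1)
    d_high = geodesic_distances(high_dim, n_neighbors)[upper]
    d_low = geodesic_distances(low_dim, n_neighbors)[upper]
    finite = np.isfinite(d_high) & np.isfinite(d_low)  # drop disconnected pairs
    return np.corrcoef(d_high[finite], d_low[finite])[0, 1]


# Synthetic example: a 2-D spiral and its 1-D arc-length coordinate.
t = np.linspace(0.0, 3.0 * np.pi, 80)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
arc_length = 0.5 * (t * np.sqrt(1.0 + t ** 2) + np.arcsinh(t))
unrolled = arc_length[:, None]

score = geodesic_correlation(spiral, unrolled)
print(f"geodesic correlation: {score:.3f}")
```

A value close to 1 indicates that the low-dimensional layout preserves distances measured along the manifold; an embedding that scrambles the ordering of points would score much lower.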
Example 2 - MDA analysis of the DNN features in segmentation task
Segmentation Network
In the segmentation task, we employed Dense-UNet for automatic brain tumor segmentation from MR images. The Dense-UNet combines the U-Net with dense concatenation to deepen the network architecture and achieve feature reuse. The network consists of seven dense blocks (four in the encoder and three in the decoder), each of which stacks eight convolutional layers. Every two convolutional layers are linked together in a feed-forward mode to maximize feature reuse.
Dataset and feature selection
Here, we used the BraTS 2018 dataset, which provides multimodality 3D MRI images with tumor segmentation labels annotated by physicians. The dataset includes 484 cases in total, which can be divided into 210 high-grade gliomas (HGG) and 75 low-grade gliomas (LGG) cases. To visualize the intermediate layers of the Dense-UNet, we selected the features at (a) the second convolutional layer in the third dense block, (b) the 8th convolutional layer in the fourth dense block, (c) the second convolutional layer in the 6th dense block, and (d) the last convolutional layer before the final output. In this demo, feature (d) is given as an example.
# Load feature data extracted by the Dense-UNet from test images at the last layer before output
testDataFeatures = np.load('../data/Seg/feature4_test.npy')
# Load data labels (segmented images) corresponding to input test images
Y = np.load('../data/Seg/y_test.npy')
# Reshape the binary images into vectors
Y = Y.reshape(Y.shape[0],-1)
# Load output segmentation predicted by the Dense-UNet
Y_pred = np.load('../data/Seg/y_test_pred_trained.npy')
# Reshape the output binary images into vectors
Y_pred = Y_pred.reshape(Y_pred.shape[0],-1)
# Create color map for MDA visualization from the topology of the target manifold
clusterIdx = discoverManifold(Y, neighbo
