IFeature
iFeature is a comprehensive Python-based toolkit for generating various numerical feature representation schemes from protein or peptide sequences. iFeature is capable of calculating and extracting a wide spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. Furthermore, iFeature also integrates five kinds of frequently used feature clustering algorithms, four feature selection algorithms and three dimensionality reduction algorithms.
update news!!!
-
iFeatureOmega - an updated version of iFeature is now available! (2022-05-23)
iFeatureOmega is a comprehensive platform for generating, analyzing and visualizing more than 180 representations for biological sequences, 3D structures and ligands. To the best of our knowledge, iFeatureOmega supplies the largest number of feature extraction and analysis approaches for most molecule types compared to other pipelines. Three versions of iFeatureOmega (i.e. iFeatureOmega-Web, iFeatureOmega-GUI and iFeatureOmega-CLI) have been made available to cater to both experienced bioinformaticians and biologists with limited programming expertise. iFeatureOmega also expands its functionality by integrating 15 feature analysis algorithms (including ten clustering algorithms, three dimensionality reduction algorithms and two feature normalization algorithms) and providing nine types of interactive plots for visualizing statistical features (including histogram, kernel density plot, heatmap, boxplot, line chart, scatter plot, circular plot, protein three-dimensional structure plot and ligand structure plot). iFeatureOmega is an open-source platform for academic purposes. The web server can be accessed at https://ifeatureomega.erc.monash.edu, and the GUI and CLI versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI and https://github.com/Superzchen/iFeatureOmega-CLI, respectively.
iFeatureOmega flowchart:

-
iLearnPlus - an updated version of iFeature and iLearn is now available! (2021-02-28)
iLearnPlus is the first machine-learning platform with both graphical and web-based user interfaces that enables the construction of automated machine-learning pipelines for computational analysis and predictions using nucleic acid and protein sequences. To the best of our knowledge, iLearnPlus integrates 21 machine-learning algorithms (including 12 conventional classification algorithms, two ensemble-learning frameworks and seven deep-learning approaches) and 19 major sequence encoding schemes (147 feature descriptors in total), outnumbering all the current web servers and stand-alone tools for biological sequence analysis. In addition, the friendly GUI (Graphical User Interface) of iLearnPlus allows biologists to conduct their analyses smoothly, significantly increasing the effectiveness and user experience compared to the existing pipelines. iLearnPlus is an open-source platform for academic purposes and is available at https://github.com/Superzchen/iLearnPlus/. The iLearnPlus-Basic module is accessible online at http://ilearnplus.erc.monash.edu/.
iLearnPlus-Basic module interface:

-
iLearn - the updated version of iFeature is now available! (2019-03-13)
iLearn is a Python Toolkit and Web Server Integrating the Functionality of Feature Calculation, Extraction, Clustering, Feature Selection, Feature Normalization, Dimension Reduction and Model Construction for Classification, Best Model Selection, Ensemble Learning and Result Visualization for DNA, RNA and Protein Sequences. Please refer to https://github.com/Superzchen/iLearn for details.
iFeature: A python package and web server for features extraction and selection from protein and peptide sequences
Installation
- Download iFeature by
git clone https://github.com/Superzchen/iFeature
iFeature is an open-source Python-based toolkit that requires a Python environment (Python version 3.0 or above) and runs on multiple operating systems (such as Windows, Mac and Linux). Before running iFeature, users should make sure that all of the following packages are available in their Python environment: sys, os, shutil, scipy, argparse, collections, platform, math, re, numpy (1.13.1), sklearn (0.19.1), matplotlib (2.1.0), and pandas (0.20.1). Of these, sys, os, shutil, argparse, collections, platform, math and re ship with Python's standard library, so only scipy, numpy, sklearn, matplotlib and pandas need to be installed separately. For convenience, we strongly recommend installing the Anaconda distribution for Python 3.0 (or above) on your local computer. Anaconda can be freely downloaded from https://www.anaconda.com/download/.
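Before running any of the programs, it can help to confirm that the third-party packages listed above are importable. The following is a minimal sketch, not part of iFeature: the function name missing_packages and the REQUIRED list are made up for illustration, and the check only tests whether a package can be found, not whether its version matches the ones listed.

```python
import importlib.util

# Third-party packages named in the requirements above. "sklearn" is the
# import name of scikit-learn. This list is an illustrative assumption,
# not something iFeature itself defines.
REQUIRED = ["numpy", "scipy", "sklearn", "matplotlib", "pandas"]


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required third-party packages were found.")
```

Any package reported as missing would need to be installed (for example through Anaconda) before iFeature.py is run.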
For users who want to generate descriptors with the provided iFeature package:
Change (cd) to the iFeature folder, which contains iFeature.py, iFeaturePseKRAAC.py, cluster.py and feaSelector.py. All functions for feature extraction, feature or sample clustering, and feature selection analysis can be executed through these four main programs by specifying the parameter '--type'.
"iFeature.py" is the main program used to extract 37 different types of feature descriptors. For details of other parameters, run:
python iFeature.py --help
"iFeaturePseKRAAC.py" is the program used to extract the 16 types of pseudo K-tuple reduced amino acid composition (PseKRAAC) feature descriptors. For details of other parameters, run:
python iFeaturePseKRAAC.py --help
"cluster.py" is the program used for running the feature or sample clustering algorithms. For details of other parameters, run:
python cluster.py --help
"feaSelector.py" is the fourth main program used to implement the feature selection algorithms. For details of other parameters, run:
python feaSelector.py --help
Furthermore, the iFeature package contains other Python scripts to generate the position-specific scoring matrix (PSSM) profiles, predicted protein secondary structure and predicted protein disorder, which are often used in conjunction with sequence-derived information to improve the prediction performance of machine learning-based classifiers. The three dimensionality reduction algorithms are also included in the scripts directory.
Examples for users to extract descriptors from iFeature.py. All files in the example commands can be found in the examples directory.
The input protein or peptide sequences for iFeature.py and iFeaturePseKRAAC.py should be in FASTA format; an example can be found in the examples directory. iFeature.py accepts the following parameters:
--help              show help for 'iFeature.py'
--file              protein/peptide sequence file in FASTA format
--type              feature types for protein sequence analysis
--path              data file path used for 'PSSM', 'SSEB(C)', 'Disorder(BC)', 'ASA' and 'TA' encodings
--train             training file in FASTA format, only used for 'KNNprotein' or 'KNNpeptide' encodings
--label             sample label file, only used for 'KNNprotein' or 'KNNpeptide' encodings
--order             output order for Amino Acid Composition (i.e. AAC, EAAC, CKSAAP, DPC, DDE, TPC) descriptors
--userDefinedOrder  user-defined output order for Amino Acid Composition (i.e. AAC, EAAC, CKSAAP, DPC, DDE, TPC) descriptors
--out               the generated descriptor file
Run the following command to obtain the Composition of k-spaced Amino Acid Pairs (CKSAAP) descriptor:
python iFeature.py --file examples/test-protein.txt --type CKSAAP
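To make the CKSAAP idea concrete, the sketch below counts amino acid pairs separated by k residues and turns the counts into frequencies. It is an illustration written for this README, not iFeature's own code: the function name cksaap, the feature naming ("AC.gap0"), the max_gap default of 5 and the choice to normalize by the number of valid pairs are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(sequence, max_gap=5):
    """Return {"<pair>.gap<k>": frequency} for every gap k in 0..max_gap.

    For a gap k, the residues at positions i and i + k + 1 form one pair.
    Pairs containing characters outside the 20 standard amino acids are
    skipped (an assumption made for this sketch).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = {}
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(sequence) - gap - 1):
            pair = sequence[i] + sequence[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        for pair in pairs:
            # Guard against sequences too short to contain any pair.
            features[f"{pair}.gap{gap}"] = counts[pair] / total if total else 0.0
    return features
```

Each gap contributes 400 values (20 x 20 pairs), so a call with max_gap=1 returns 800 values.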
In general, users can generate different descriptors by changing the descriptor type with '--type'. For example, run the following command to generate the Dipeptide Deviation from Expected Mean (DDE) descriptor:
python iFeature.py --file examples/test-protein.txt --type DDE
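Because only the '--type' value changes between such commands, several descriptors can be generated in a loop. The following is a rough sketch that assumes it is run from the iFeature folder; the helper names build_command and run_descriptors and the "<type>.tsv" output naming are hypothetical, while --file, --type and --out are the parameters listed earlier.

```python
import subprocess
import sys


def build_command(fasta_file, descriptor, out_file=None):
    """Build the argument list for one iFeature.py call."""
    cmd = [sys.executable, "iFeature.py", "--file", fasta_file, "--type", descriptor]
    if out_file is not None:
        cmd += ["--out", out_file]
    return cmd


def run_descriptors(fasta_file, descriptors):
    """Run iFeature.py once per descriptor type, writing <type>.tsv each time."""
    for descriptor in descriptors:
        # check=True raises CalledProcessError if iFeature.py exits with an error.
        subprocess.run(build_command(fasta_file, descriptor, f"{descriptor}.tsv"), check=True)
```

For example, run_descriptors("examples/test-protein.txt", ["CKSAAP", "DDE"]) would issue the two commands shown above, each with its own output file.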
For some descriptors (e.g. PSSM, Disorder, ASA, TA and SSEB), the predicted protein property files must be supplied through the --path parameter. For example, run the following command to generate the PSSM descriptor:
python iFeature.py --file examples/test-peptide.txt --type PSSM --path examples/predictedProteinProperty
The KNNprotein and KNNpeptide descriptors require an extra training file and a label file, which are specified by --train and --label. Run the following command to generate the KNNpeptide descriptor:
python iFeature.py --file examples/test-peptide.txt --type KNNpeptide --train examples/train-peptide.txt --label examples/label.txt
For the six descriptors in the Amino Acid Composition group, users can specify the output order with --order and --userDefinedOrder. iFeature supplies three amino acid orders (i.e. alphabetical, polarity and side chain volume). Run the following command to generate the AAC descriptor with the 'polarity' order:
python iFeature.py --file examples/test-protein.txt --type AAC --order polarity
Run the following command to generate the AAC descriptor with a user-defined order:
python iFeature.py --file examples/test-protein.txt --type AAC --order userDefined --userDefinedOrder YWVTSRQPNMLKIHGFEDCA
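The effect of the output order is easiest to see on the AAC descriptor, which is the fraction of each amino acid in the sequence. The sketch below is an illustration only (the function name aac is hypothetical and this is not iFeature's implementation): the order string decides which amino acid each output column refers to, while the values themselves do not change.

```python
from collections import Counter


def aac(sequence, order="ACDEFGHIKLMNPQRSTVWY"):
    """Return amino acid frequencies, one value per letter of `order`."""
    counts = Counter(sequence)
    length = len(sequence)
    # Counter returns 0 for amino acids that do not occur in the sequence.
    return [counts[aa] / length if length else 0.0 for aa in order]


# Same sequence, two column orders: alphabetical and the user-defined
# order used in the command above.
alphabetical = aac("AAC")
user_defined = aac("AAC", order="YWVTSRQPNMLKIHGFEDCA")
```

With the alphabetical order, the frequency of A is the first value; with the user-defined order YWVTSRQPNMLKIHGFEDCA it becomes the last one.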
For some of the descriptors, users can adjust the default parameters through advanced usage. The detailed advanced usage for each type of descriptor can be found in iFeatureManual.pdf in the iFeature directory. For example, for the EAAC descriptor, advanced users can set the size of the sliding window to 3 (the default is 5) by running the following Python command:
python codes/EAAC.py examples/test-peptide.txt 3 EAAC.tsv
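The command above passes a window size of 3 to codes/EAAC.py. As a rough picture of what a sliding-window composition computes, the sketch below slides a fixed-size window along a peptide and records the amino acid frequencies inside each window. It is written for illustration under stated assumptions and is not iFeature's code: the function name eaac is hypothetical, and ignoring non-standard characters (such as gap symbols) when normalizing is an assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(sequence, window=5):
    """Return per-window amino acid frequencies, 20 values per window."""
    features = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        counts = Counter(fragment)
        # Normalize by the number of standard residues in the window.
        residues = sum(counts[aa] for aa in AMINO_ACIDS)
        features.extend(
            counts[aa] / residues if residues else 0.0 for aa in AMINO_ACIDS
        )
    return features
```

A peptide of length L yields L - window + 1 windows, so a smaller window produces more windows and therefore more feature columns.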
By default, the output is written to encoding.tsv; a different output file can be specified with --out. For example:
python iFeature.py --file examples/test-protein.txt --type AAC --out AAC.txt
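The generated descriptor file can then be loaded for downstream analysis. The sketch below assumes, without confirmation from this README, that the file is tab-separated with one header row and that the first column holds the sample name followed by numeric feature values; the function name read_descriptor_table and the sample data are hypothetical.

```python
import csv
import io


def read_descriptor_table(handle):
    """Parse a tab-separated descriptor table from an open text handle.

    Returns (feature_names, sample_names, rows), where rows is a list of
    lists of floats. The layout is an assumption; see the note above.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names, rows = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        sample_names.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return header[1:], sample_names, rows


# Small in-memory example with made-up values in the assumed layout.
demo = io.StringIO("#\tA\tC\nseq1\t0.5\t0.25\nseq2\t0.1\t0.0\n")
feature_names, sample_names, rows = read_descriptor_table(demo)
```

For a real file, the same function can be called on an opened file, for instance inside `with open("AAC.txt", newline="") as handle:`.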
Examples for users to extract descriptors from iFeaturePseKRAAC.py.
The 16 types of reduced amino acid alph
