GPD
This repository contains scripts used in the manuscript "Massive expansion of human gut bacteriophage diversity"
The Gut Phage Database (GPD)
Scripts used for characterizing human gut bacteriophages in the following manuscript:
Camarillo-Guerrero LF, Almeida A, Rangel-Pineros G, Finn RD, Lawley TD (2020) Massive expansion of human gut bacteriophage diversity.
Associated data can also be found on our FTP server.
classifier/classifier.py
Neural network that distinguishes phages from integrative and conjugative elements (ICEs).
<b>Requirements:</b>
- Python (tested v3.6.7)
- TensorFlow (tested v1.10)
- Keras (tested v2.2.4)
<b>Usage:</b>
classifier.py <input_features_file.txt>
<b>Notes:</b>
- input_features_file.txt: one feature vector per line, each representing a phage or an ICE. Each vector has 1026 dimensions: fraction of hypothetical proteins (1), gene density (1) and 5-mer signature (1024). <br />
- classifier/classifier_demo.py: runs a demo of the classifier on 50 example phages and 50 example ICEs. <br />
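The layout of one input line can be illustrated with the hypothetical helpers below (`build_feature_vector` and `write_features` are not part of this repository). This is a sketch under stated assumptions: the three feature groups are concatenated in the order listed above, and values are tab-separated. Check classifier.py for the delimiter it actually expects.

```python
# Hypothetical helpers (NOT part of this repository) that assemble one
# 1026-dimensional feature vector and write one vector per line.
# Assumption: features are ordered as in the README and tab-separated.
from typing import Iterable, List, Sequence

N_KMER = 4 ** 5  # 1024 possible 5-mers


def build_feature_vector(hypothetical_fraction: float,
                         gene_density: float,
                         kmer_signature: Sequence[float]) -> List[float]:
    """Concatenate [hypothetical fraction, gene density, 1024 k-mer values]."""
    if len(kmer_signature) != N_KMER:
        raise ValueError(f"expected {N_KMER} k-mer values, got {len(kmer_signature)}")
    return [hypothetical_fraction, gene_density, *kmer_signature]


def write_features(vectors: Iterable[Sequence[float]], path: str, sep: str = "\t") -> None:
    """Write one feature vector per line (the separator is an assumption)."""
    with open(path, "w") as handle:
        for vec in vectors:
            handle.write(sep.join(repr(float(x)) for x in vec) + "\n")
```

For example, `write_features([build_feature_vector(0.45, 1.3, signature)], "input_features_file.txt")` produces a single 1026-column line, where `signature` is a list of 1024 proportions.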
<b>Input features generation files:</b>
- getGeneDensity.py: takes a GFF3 file and returns the number of genes per kb. <br />
- getHypothetical.py: takes a GFF3 file and returns the fraction of hypothetical proteins. <br />
- getKmer.py: takes a DNA sequence and returns the proportion of each of the 1024 possible 5-mers. <br />
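As a rough illustration of what the two GFF3-based features measure, here is a minimal standard-library sketch. It is not the code in getGeneDensity.py or getHypothetical.py. The function names `gene_density` and `hypothetical_fraction` are hypothetical, and it assumes that genes are `CDS` features, that sequence lengths come from `##sequence-region` directives, and that hypothetical proteins are labelled `product=hypothetical protein` in column 9 (as in Prokka-style annotations).

```python
# Minimal sketch, NOT the repository's getGeneDensity.py / getHypothetical.py.
# Assumptions: genes are GFF3 features of type "CDS", sequence lengths come from
# "##sequence-region" directives, and hypothetical proteins carry
# "product=hypothetical protein" in the attributes column (Prokka-style).
from urllib.parse import unquote


def _read_gff3(path):
    """Return (total sequence length in bp, list of CDS attribute dicts)."""
    total_len = 0
    cds = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more features
            if line.startswith("##sequence-region"):
                _, _seqid, start, end = line.split()
                total_len += int(end) - int(start) + 1
                continue
            if not line or line.startswith("#"):
                continue  # blank lines and other comments/directives
            cols = line.split("\t")
            if len(cols) != 9 or cols[2] != "CDS":
                continue
            attrs = {}
            for field in cols[8].split(";"):
                if "=" in field:
                    key, value = field.split("=", 1)
                    attrs[key] = unquote(value)  # GFF3 values are percent-encoded
            cds.append(attrs)
    return total_len, cds


def gene_density(path):
    """Genes (CDS features) per kilobase of annotated sequence."""
    total_len, cds = _read_gff3(path)
    if total_len == 0:
        raise ValueError("no ##sequence-region lengths found")
    return len(cds) / (total_len / 1000)


def hypothetical_fraction(path):
    """Fraction of CDS features annotated as hypothetical proteins."""
    _, cds = _read_gff3(path)
    if not cds:
        return 0.0
    n_hyp = sum(1 for a in cds
                if a.get("product", "").lower() == "hypothetical protein")
    return n_hyp / len(cds)
```

With three CDS features on a 2,000 bp contig, two of them hypothetical, this sketch gives a gene density of 1.5 genes/kb and a hypothetical fraction of 2/3.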
<b>Usage:</b>
getGeneDensity(<gff3_file_name>)
getHypothetical(<gff3_file_name>)
getSignature_hash(<DNA_sequence>)
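The 5-mer signature can similarly be sketched as overlapping-window counting normalized to proportions. This is not getSignature_hash itself. The function name `kmer_signature` is hypothetical, and the sketch assumes a single strand (no reverse-complement merging), skips windows containing non-ACGT bases, and orders the 1024 values lexicographically from AAAAA to TTTTT. The repository's ordering may differ, so vectors from this sketch should not be fed to the trained classifier.

```python
# Minimal sketch, NOT the repository's getKmer.py / getSignature_hash.
# Assumptions: overlapping 5-mers on the given strand only (no reverse
# complement merging), windows with non-ACGT characters are skipped, and the
# 1024 values are ordered lexicographically (AAAAA, AAAAC, ..., TTTTT).
from itertools import product

K = 5
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**5 = 1024 k-mers
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_signature(seq: str) -> list:
    """Return the proportion of each of the 1024 possible 5-mers in seq."""
    seq = seq.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is None:
            continue  # skip windows containing N or other ambiguous bases
        counts[idx] += 1
        total += 1
    if total == 0:
        return [0.0] * len(KMERS)  # no valid window: return an all-zero vector
    return [c / total for c in counts]
```

For example, `"AAAAAAC"` has three windows (AAAAA twice, AAAAC once), so the sketch assigns 2/3 to AAAAA, 1/3 to AAAAC and 0 to the other 1022 k-mers.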
Other analysis and plotting scripts
<b>figures/</b>
- 'Figure 1.py': Distribution of MIUViG scores from CheckV analysis
- 'Figure 2.py': Viral diversity patterns across gut bacteria genera and broad host range VCs
- 'Figure 3.py': Gut phageome profiling across human populations and correlation with gut bacteria enterotypes
- 'Figure 4.py': Crass-like family global distribution and host-phage network of globally distributed VCs
- 'Figure 5.py': Phylogenetic structure and global distribution of the pX phage
- 'Figure S1.py': Quality control assessment of GPD
- 'Figure S2.py': Viral diversity patterns across gut bacteria phyla and host range analysis of gut phages
- 'Figure S3.py': Correlation between sequencing depth and number of phages detected in a sample
- 'Figure S4.py': Host range analysis of globally distributed phages