PredCAN
Ontology-based prediction of cancer driver genes
Ontology-based prediction of cancer driver genes: predCAN
Datasets
We used driver genes from IntOGen, among other sources, and the remaining genes from the Cellular Phenotype Database.
Dependencies
To install the Python dependencies, run: pip install -r requirements.txt
Workflow for predicting candidate cancer driver genes
We integrate the annotations from the Gene Ontology (GO), the Cellular Microscopy Phenotype Ontology (CMPO), and the Mammalian Phenotype Ontology (MP) using OPA2Vec.
We used the generated embeddings to test each ontology individually and evaluate its performance (AUC and F-score). We then considered only the genes that have a complete representation in GO, CMPO, and MP.
python OPA2Vec_Prediction.py "filename"
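The evaluation described above can be sketched in plain Python. This is a minimal illustration, not the code in OPA2Vec_Prediction.py: the function names, the 0.5 decision threshold, and the toy gene names and scores are all assumptions made for the example. It shows how to keep only the genes represented in all three ontologies, and how AUC and F-score can be computed from binary labels and predicted scores.

```python
def complete_genes(go, cmpo, mp):
    """Return the genes that appear in all three ontology-specific gene sets."""
    return set(go) & set(cmpo) & set(mp)


def auc(labels, scores):
    """ROC AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_score(labels, scores, threshold=0.5):
    """F1 score after thresholding the scores (the threshold is an assumption)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data only: 1 = known driver gene, 0 = non-driver gene.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
shared = complete_genes({"TP53", "KRAS"}, {"TP53", "MYC"}, {"TP53", "KRAS"})
print(sorted(shared), auc(toy_labels, toy_scores), f_score(toy_labels, toy_scores))
```

The pairwise AUC is quadratic in the number of genes, which is fine for a sanity check; a library routine would be the usual choice for larger sets.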
We then merge all three ontologies by running mergeOntologies.groovy, which produces outont.owl.
groovy mergeOntologies.groovy
As a result, we predict 112 new candidate driver genes across 20 cancer types, listed in Predicted112candidateDriverGenes.txt.
Validation on two cohorts
We follow the GATK pipeline with MuTect2 in tumor-only mode. Run VariantsCall-TumorOnlyMode.sh as a job, specifying the name of the folder that contains the VCF files:
sbatch VariantsCall-TumorOnlyMode.sh
A- First analysis (count the mutations)
Start by running ANNOVAR via annovarscript.sh:
chmod +x annovarscript.sh
./annovarscript.sh
Then run combinedhist.r to plot the figures; you may need to adjust the limits parameter.
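To make the counting step concrete, here is a small sketch of how mutations per gene could be tallied from an annotated, tab-separated table. It is not taken from annovarscript.sh or combinedhist.r. The column name Gene.refGene is the one ANNOVAR's table output uses with the refGene protocol, but whether the scripts here produce that layout is an assumption, as are the ";" separator for multi-gene annotations, the function name, and the toy rows.

```python
import csv
import io
from collections import Counter


def count_mutations_per_gene(handle, gene_column="Gene.refGene"):
    """Count annotated variant rows per gene in a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumption: a variant hitting several genes lists them as "GENE1;GENE2".
        for gene in row[gene_column].split(";"):
            if gene:
                counts[gene] += 1
    return counts


# Toy table for illustration only, not real data.
toy_table = (
    "Chr\tStart\tGene.refGene\n"
    "1\t100\tGENE_A\n"
    "1\t200\tGENE_A\n"
    "2\t300\tGENE_B;GENE_C\n"
)
counts = count_mutations_per_gene(io.StringIO(toy_table))
for gene, n in counts.most_common():
    print(gene, n)
```

For a real file, pass an open file handle instead of the io.StringIO wrapper.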
B- Second analysis (pathogenicity test)
Start with PrepareU-test.sh (it runs ANNOVAR, but independently of the previous analysis):
chmod +x PrepareU-test.sh
./PrepareU-test.sh
Then run ranksumtest.py to compute seven different p-values. It needs the files for the specific cancer type (all-driver and predicteddriver):
python ranksumtest.py
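For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. That ranksumtest.py uses this exact variant is an assumption based on the script's name; the function names and the toy scores are hypothetical. The sketch gives tied values the average of their ranks and applies no tie or continuity correction.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank-sum test, normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Toy pathogenicity scores for illustration only, not real data.
predicted_driver_scores = [1.0, 2.0, 3.0]
all_driver_scores = [4.0, 5.0, 6.0]
z, p = rank_sum_test(predicted_driver_scores, all_driver_scores)
print(z, p)
```

The normal approximation is rough for samples this small; with very few variants an exact test would be more appropriate.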
Final notes
For any comments or help needed with how to run the scripts, please send an email to: sara.althubaiti@kaust.edu.sa
