BioScripts

Cool Bioinformatics Scripts

#+TITLE: Cool Bioinformatics Scripts

  • Table of Contents :TOC:QUOTE:

#+BEGIN_QUOTE
  • [[#qqplot][qqplot]]
  • [[#downsampling-bam-files][downsampling bam files]]
  • [[#genotype-refinement-using-beagle-3][genotype refinement using beagle 3]]
  • [[#calculate-genotype-discordance][calculate genotype discordance]]
  • [[#r2-vs-maf-plot][R2 vs MAF plot]]
  • [[#phylogenetic-tree][Phylogenetic Tree]]
  • [[#fixref][fixref]]
#+END_QUOTE

  • [[file:qqplot.py][qqplot]]

You can make a QQ plot in the following ways.

  • one-liner for reading many millions of P values from a pipe

#+begin_src shell
# python
zcat pval.txt.gz | qqplot.py -out test -title "QQ plot on the fly"

# julia (recommended to run it in the REPL)
zcat pval.txt.gz | qqplot.jl --out test --title "QQ plot on the fly"
#+end_src

warning : If you have 100 billion P values to process, you should definitely use [[file:qqplot.jl][qqplot.jl]] instead of [[file:qqplot.py][qqplot.py]]. On my server, the Julia version processes about 5 billion lines per hour, while the Python version processes only about 700 million.

  • running in a Julia REPL (recommended)

#+begin_src julia
include("qqplot.jl")
# stream column 10 of the gzipped file, skipping the header line
cmd = pipeline(`zcat pval.gz`, `awk 'NR>1{print $10}'`)
sigp, expp = qqfly("test", cmd=cmd)
#+end_src

  • use qqplot.py in your script

#+begin_src python
import numpy as np
from qqplot import qq
p = np.random.random(1000000)
qq(x=p, figname="test.png")
#+end_src

[[file:image/qqplot.png]]
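
For readers who want to see what a QQ plot of P values computes, here is a minimal sketch. It is not the code in =qqplot.py=; the function name =qq_points= and the (i - 0.5) / n plotting positions are assumptions chosen for illustration. It pairs the sorted observed -log10(P) values with the quantiles expected under a uniform null.

```python
import numpy as np


def qq_points(pvals):
    """Return (expected, observed) -log10(P) values for a QQ plot.

    Hypothetical helper for illustration; not the API of qqplot.py.
    """
    p = np.asarray(pvals, dtype=float)
    # keep only valid P values; -log10 is undefined at 0
    p = p[(p > 0) & (p <= 1)]
    n = p.size
    # smallest P first, so the largest -log10(P) comes first
    observed = -np.log10(np.sort(p))
    # expected quantiles of a Uniform(0, 1) sample of size n
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    return expected, observed


rng = np.random.default_rng(0)
expected, observed = qq_points(rng.random(100_000))
```

Plotting =observed= against =expected= with any plotting library gives the QQ plot; under the null the points should follow the diagonal.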

  • [[file:downsample.sh][downsampling bam files]]

#+begin_src shell
Usage: downsample.sh [-b <bamlist>] [-d <depth>] [-n <cores>] [-o <outdir>]
#+end_src
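
The internals of =downsample.sh= are not shown here, so the following is only a sketch of one common way to downsample to a target depth: compute the fraction of reads to keep as target depth divided by current depth, and pass it to =samtools view -s=, which takes the seed as the integer part and the fraction as the decimals. The function name =subsample_arg= and the default seed are assumptions.

```python
def subsample_arg(current_depth, target_depth, seed=42):
    """Build the SEED.FRACTION value for `samtools view -s`.

    Hypothetical helper; returns None when no downsampling is needed.
    """
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    fraction = round(target_depth / current_depth, 4)
    if fraction >= 1:
        return None  # already at or below the target depth
    if fraction <= 0:
        raise ValueError("target depth is too small relative to current depth")
    # integer part = random seed, decimal part = fraction of reads to keep
    return f"{seed + fraction:.4f}"


arg = subsample_arg(current_depth=30, target_depth=3)
# the command is only assembled here, not executed
command = ["samtools", "view", "-b", "-s", arg, "-o", "out.bam", "in.bam"]
```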

  • [[file:beagle3-imputation.sh][genotype refinement using beagle 3]]

#+begin_src shell
Usage: beagle3-imputation.sh [options]
Pipeline of genotype refinement for median depth sequencing data using beagle3

-h,   Display help
-i,   Input VCF/BCF file
-o,   Output folder
-f,   MAF filters before imputation
#+end_src

  • [[file:calc_imputed_gt_discord.py][calculate genotype discordance]]

When you run an imputation analysis with =BEAGLE= (or another imputation tool), you may want to know the distribution of genotype discordance between the original VCF and the imputed VCF.

#+begin_src shell
usage: calc_imputed_gt_discord.py [-h] [-chr STRING] VCF1 VCF2 OUT
#+end_src

warning : Before running the script, make sure the two VCFs contain exactly the same sites and samples for each chromosome.

[[file:image/calc_imputed_gt_discord.png]]
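
As an illustration of the quantity being plotted, here is a minimal sketch of per-sample genotype discordance. It is not the logic of =calc_imputed_gt_discord.py=. It assumes genotypes are already encoded as alternate-allele counts (0, 1, 2) in a sites x samples matrix, with -1 for missing calls, and that both matrices cover the same sites and samples in the same order.

```python
import numpy as np


def genotype_discordance(orig, imputed):
    """Fraction of discordant genotypes per sample (columns).

    Hypothetical helper: inputs are sites x samples arrays of
    alt-allele counts, with -1 marking a missing genotype.
    """
    orig = np.asarray(orig)
    imputed = np.asarray(imputed)
    if orig.shape != imputed.shape:
        raise ValueError("both inputs need the same sites and samples")
    # compare only sites where both genotypes are called
    called = (orig >= 0) & (imputed >= 0)
    mismatches = ((orig != imputed) & called).sum(axis=0)
    compared = called.sum(axis=0)
    # NaN for samples without any comparable site
    out = np.full(compared.shape, np.nan)
    return np.divide(mismatches, compared, out=out, where=compared > 0)


orig = [[0, 1, 2], [1, 1, -1], [2, 0, 0], [0, 0, 1]]
imputed = [[0, 1, 1], [1, 2, 0], [2, 0, 0], [0, 1, 1]]
discord = genotype_discordance(orig, imputed)
```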

  • [[file:plot-r2-vs-maf.R][R2 vs MAF plot]]

Plot INFO/R2 against MAF after imputation with =BEAGLE= or a similar tool.

[[file:image/r2-vs-maf.png]]
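
A plot like this is usually built by grouping sites into MAF bins and summarising R2 within each bin. The sketch below shows that idea in Python. It is not a port of =plot-r2-vs-maf.R=, and the bin edges and the function name =mean_r2_by_maf_bin= are assumptions.

```python
import numpy as np


def mean_r2_by_maf_bin(maf, r2, edges=(0.0, 0.01, 0.05, 0.1, 0.2, 0.5)):
    """Mean R2 per MAF bin; NaN for bins that contain no site.

    Hypothetical helper with illustrative bin edges.
    """
    maf = np.asarray(maf, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) - 1
    # bins are [low, high); values at the top edge fall into the last bin
    idx = np.clip(np.digitize(maf, edges) - 1, 0, n_bins - 1)
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            means[b] = r2[in_bin].mean()
    return means


maf = [0.005, 0.02, 0.03, 0.3, 0.5]
r2 = [0.2, 0.6, 0.8, 0.95, 0.99]
means = mean_r2_by_maf_bin(maf, r2)
```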

  • [[file:njtree.R][Phylogenetic Tree]]

[[file:image/njtree-circular.png]]

  • [[file:fixref.py][fixref]]

Before running =bcftools merge=, you may need to fix the REF and ALT alleles and the corresponding genotypes; otherwise =bcftools= will surprise you.

#+begin_src shell
usage: fixref.py [-h] REF_VCF IN_VCF OUT_VCF
#+end_src
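
To make the idea concrete, here is a minimal sketch of what fixing a single biallelic record can look like. It is not the code in =fixref.py=; the function name =fix_record= is an assumption, and strand flips and multiallelic sites are deliberately ignored. When REF and ALT are swapped relative to the reference VCF, the alleles are swapped back and every 0 and 1 in the genotypes is exchanged.

```python
def fix_record(ref_alleles, ref, alt, genotypes):
    """Align one biallelic record to the reference (REF, ALT) pair.

    Hypothetical helper. Returns (ref, alt, genotypes), or None when
    the alleles cannot be matched by a simple swap.
    """
    if (ref, alt) == ref_alleles:
        return ref, alt, list(genotypes)
    if (alt, ref) == ref_alleles:
        # exchange allele indices; separators and missing calls are untouched
        swap = str.maketrans("01", "10")
        return alt, ref, [gt.translate(swap) for gt in genotypes]
    return None


fixed = fix_record(("A", "G"), "G", "A", ["0/0", "0/1", "1|1", "./."])
```

Records for which the function returns None would need separate handling, for example dropping them or checking for strand issues.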
