Gtc2vcf
Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
gtc2vcf
A set of tools to convert Illumina and Affymetrix DNA microarray intensity data files into VCF files <b>without</b> using Microsoft Windows. You can use the final output to run the pipeline to detect mosaic chromosomal alterations. If you use this tool in your publication, please cite this website. For any feedback or questions, contact the author.

- Usage
- Installation
- Software Installation
- Identifying chip type for IDAT and CEL files
- Convert Illumina IDAT files to GTC files
- Convert Illumina GTC files to VCF
- Convert Affymetrix CEL files to CHP files
- Convert Affymetrix CHP files to VCF
- Using an alternative genome reference
- Detect contamination
- Plot variants
- Illumina GenCall
- Acknowledgements
Usage
Illumina data tool:
Usage: bcftools +gtc2vcf [options] [<A.gtc> ...]
Plugin options:
-l, --list-tags list available FORMAT tags with description for VCF output
-t, --tags LIST list of output FORMAT tags [GT,GQ,IGC,BAF,LRR,NORMX,NORMY,R,THETA,X,Y]
-b, --bpm <file> BPM manifest file
-c, --csv <file> CSV manifest file (can be gzip compressed)
-e, --egt <file> EGT cluster file
-f, --fasta-ref <file> reference sequence in fasta format
--set-cache-size <int> select fasta cache size in bytes
--gc-window-size <int> window size in bp used to compute the GC content (-1 for no estimate) [200]
-g, --gtcs <dir|file> GTC genotype files from directory or list from file
-i, --idat input IDAT files rather than GTC files
--capacity <int> number of variants to read from intensity files per I/O operation [32768]
--adjust-clusters adjust cluster centers in (Theta, R) space (requires --bpm and --egt)
--use-gtc-sample-names use sample name in GTC files rather than GTC file name
--do-not-check-bpm do not check whether BPM and GTC files match manifest file name
--do-not-check-eof do not check whether the BPM and EGT readers reach the end of the file
--genome-studio <file> input a GenomeStudio final report file (in matrix format)
--no-version do not append version and command line to the header
-o, --output <file> write output to a file [standard output]
-O, --output-type u|b|v|z|t[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF
t: GenomeStudio tab-delimited text output, 0-9: compression level [v]
--threads <int> number of extra output compression threads [0]
-x, --extra <file> write GTC metadata to a file
-v, --verbose print verbose information
-W, --write-index[=FMT] Automatically index the output files [off]
Manifest options:
--beadset-order output BeadSetID normalization order (requires --bpm and --csv)
--fasta-flank output flank sequence in FASTA format (requires --csv)
-s, --sam-flank <file> input flank sequence alignment in SAM/BAM format (requires --csv)
--genome-build <assembly> genome build ID used to update the manifest file [GRCh38]
Examples:
bcftools +gtc2vcf -i 5434246082_R03C01_Grn.idat
bcftools +gtc2vcf 5434246082_R03C01.gtc
bcftools +gtc2vcf -b HumanOmni2.5-4v1_H.bpm -c HumanOmni2.5-4v1_H.csv
bcftools +gtc2vcf -e HumanOmni2.5-4v1_H.egt
bcftools +gtc2vcf -c GSA-24v3-0_A1.csv -e GSA-24v3-0_A1_ClusterFile.egt -f human_g1k_v37.fasta -o GSA-24v3-0_A1.vcf
bcftools +gtc2vcf -c HumanOmni2.5-4v1_H.csv -f human_g1k_v37.fasta 5434246082_R03C01.gtc -o 5434246082_R03C01.vcf
bcftools +gtc2vcf -f human_g1k_v37.fasta --genome-studio GenotypeReport.txt -o GenotypeReport.vcf
Examples of manifest file options:
bcftools +gtc2vcf -b GSA-24v3-0_A1.bpm -c GSA-24v3-0_A1.csv --beadset-order
bcftools +gtc2vcf -c GSA-24v3-0_A1.csv --fasta-flank -o GSA-24v3-0_A1.fasta
bwa mem -M GCA_000001405.15_GRCh38_no_alt_analysis_set.fna GSA-24v3-0_A1.fasta -o GSA-24v3-0_A1.sam
bcftools +gtc2vcf -c GSA-24v3-0_A1.csv --sam-flank GSA-24v3-0_A1.sam -o GSA-24v3-0_A1.GRCh38.csv
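The options above can be combined to convert a whole directory of GTC files in a single call. The following is a minimal sketch rather than the author's recommended pipeline: the file names, the `gtcs` directory, the `cohort` output prefix, the `run` helper, and the `RUN` switch are all placeholders introduced for illustration. It only prints the assembled command unless `RUN=1` is set in the environment.

```shell
# Hypothetical helper: print a command, and execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Placeholder paths; replace them with the manifest, cluster file,
# and reference that match your array and genome build.
bpm_file="GSA-24v3-0_A1.bpm"
csv_file="GSA-24v3-0_A1.csv"
egt_file="GSA-24v3-0_A1_ClusterFile.egt"
ref_fasta="human_g1k_v37.fasta"
gtc_dir="gtcs"
out_prefix="cohort"

# Only flags listed in the usage text above are used here.
run bcftools +gtc2vcf \
  --bpm "$bpm_file" \
  --csv "$csv_file" \
  --egt "$egt_file" \
  --fasta-ref "$ref_fasta" \
  --gtcs "$gtc_dir" \
  --extra "$out_prefix.tsv" \
  --output-type b \
  --output "$out_prefix.bcf"
```

Printing the command first makes it easy to review the chosen manifest and reference files before a long conversion starts.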
Affymetrix data tool:
Usage: bcftools +affy2vcf [options] --csv <file> --fasta-ref <file> [<A.chp> ...]
Plugin options:
-l, --list-tags list available FORMAT tags with description for VCF output
-t, --tags LIST list of output FORMAT tags [GT,CONF,BAF,LRR,NORMX,NORMY,DELTA,SIZE]
-c, --csv <file> CSV manifest file (can be gzip compressed)
-f, --fasta-ref <file> reference sequence in fasta format
--set-cache-size <int> select fasta cache size in bytes
--gc-window-size <int> window size in bp used to compute the GC content (-1 for no estimate) [200]
--probeset-ids tab delimited file with column 'probeset_id' specifying probesets to convert
--calls <file> apt-probeset-genotype calls output (can be gzip compressed)
--confidences <file> apt-probeset-genotype confidences output (can be gzip compressed)
--summary <file> apt-probeset-genotype summary output (can be gzip compressed)
--snp <file> apt-probeset-genotype SNP posteriors output (can be gzip compressed)
--chps <dir|file> input CHP files rather than tab delimited files
--cel <file> input CEL files rather than CHP files
--adjust-clusters adjust cluster centers in (Contrast, Size) space (requires --snp)
--no-version do not append version and command line to the header
-o, --output <file> write output to a file [standard output]
-O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
--threads <int> number of extra output compression threads [0]
-x, --extra <file> write CHP metadata to a file (requires CHP files)
-v, --verbose print verbose information
-W, --write-index[=FMT] Automatically index the output files [off]
Manifest options:
--fasta-flank output flank sequence in FASTA format (requires --csv)
-s, --sam-flank <file> input flank sequence alignment in SAM/BAM format (requires --csv)
Examples:
bcftools +affy2vcf \
--csv GenomeWideSNP_6.na35.annot.csv \
--fasta-ref human_g1k_v37.fasta \
--chps cc-chp/ \
--snp AxiomGT1.snp-posteriors.txt \
--output AxiomGT1.vcf \
--extra report.tsv
bcftools +affy2vcf \
--csv GenomeWideSNP_6.na35.annot.csv \
--fasta-ref human_g1k_v37.fasta \
--calls AxiomGT1.calls.txt \
--confidences AxiomGT1.confidences.txt \
--summary AxiomGT1.summary.txt \
--snp AxiomGT1.snp-posteriors.txt \
--output AxiomGT1.vcf
Examples of manifest file options:
bcftools +affy2vcf -c GenomeWideSNP_6.na35.annot.csv --fasta-flank -o GenomeWideSNP_6.fasta
bwa mem -M GCA_000001405.15_GRCh38_no_alt_analysis_set.fna GenomeWideSNP_6.fasta -o GenomeWideSNP_6.sam
bcftools +affy2vcf -c GenomeWideSNP_6.na35.annot.csv -s GenomeWideSNP_6.sam -o GenomeWideSNP_6.na35.annot.GRCh38.csv
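Both plugins list BAF and LRR among their default FORMAT tags, so the resulting VCF can be inspected with standard BCFtools commands. As a hedged sketch (the helper `query_format` and the file name `cohort.bcf` are hypothetical and not part of gtc2vcf), the function below builds a `bcftools query` format string for a chosen list of FORMAT tags and prints the command that would use it:

```shell
# Hypothetical helper: build a `bcftools query -f` format string that prints
# CHROM, POS, ID and then the requested FORMAT tags for every sample.
query_format() {
  fmt='%CHROM\t%POS\t%ID['
  for tag in "$@"; do
    # Append a literal backslash-t followed by %TAG for each requested tag.
    fmt="${fmt}\\t%${tag}"
  done
  # Close the per-sample loop and end the record with a literal backslash-n.
  printf '%s' "${fmt}]\\n"
}

# Print the command that would extract BAF and LRR (file name is a placeholder).
printf '%s\n' "bcftools query -f '$(query_format BAF LRR)' cohort.bcf"
```

The square brackets in the format string make `bcftools query` repeat the enclosed fields once per sample, which gives one row per variant.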
Installation
Install basic tools (Debian/Ubuntu specific if you have admin privileges)
sudo apt install wget unzip git g++ zlib1g-dev bwa samtools msitools cabextract mono-devel libgdiplus icu-devtools bcftools
Optionally, you can install these libraries to activate further HTSlib features
sudo apt install libbz2-dev libssl-dev liblzma-dev libgsl0-dev
Preparation steps
mkdir -p $HOME/bin $HOME/GRCh3{7,8} && cd /tmp
We recommend compiling the source code but, wherever this is not possible, Linux x86_64 pre-compiled binaries are available for download here. Note, however, that you will need BCFtools version 1.20 or newer. You can also download a previous version of the plugin through bioconda.
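To check whether an already installed BCFtools meets the 1.20 requirement, something like the following sketch could be used. The helper name `version_ge` is an illustrative assumption, and the comparison relies on `sort -V` from GNU coreutils, which may be missing on non-GNU systems.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# `bcftools --version` prints "bcftools <version>" on its first line.
if command -v bcftools >/dev/null 2>&1; then
  installed=$(bcftools --version | head -n 1 | cut -d ' ' -f 2)
  if version_ge "$installed" 1.20; then
    echo "bcftools $installed is recent enough"
  else
    echo "bcftools $installed is older than 1.20; please upgrade"
  fi
else
  echo "bcftools not found in PATH"
fi
```

A plain numeric comparison would rank 1.9 above 1.20, which is why the sketch uses a version-aware sort.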
Download latest version of HTSlib and BCFtools (if not downloaded already)
wget http://github.com/samtools/bcftools/releases/download/1.20/bcftools-1.20.tar.bz2
tar xjvf bcftools-1.20.tar.bz2
Download and compile plugins code (make sure you are using gcc version 5 or newer)
cd bcftools-1.20/
/bin/rm -f plugins/{idat2gtc.c,gtc2vcf.{c,h},affy2vcf.c}
wget -P plugins http://raw.githubusercontent.com/freeseek/gtc2vcf/master/{idat2gtc.c,gtc2vcf.{c,h},affy2vcf.c,BAFregress.c}
make
/bin/cp bcftools plugins/{idat2gtc,gtc2vcf,affy2vcf,BAFregress}.so $HOME/bin/
Make sure the directory with the pl