Deindexer
An easy deindexing tool for Illumina multiple-barcode sequencing
########################## A Standalone Demultiplexing Tool for Illumina Platform ##########################
###Features###
- No need to edit SampleSheet.csv files
- Support your own barcode library
- Support many mismatches on barcode[s] (Illumina only allows 1 mismatch)
- Easy to use: support single barcode, dual barcode, or even multiple barcode protocols [even ones that do not exist yet]
- Fast: deindexing 90 million dual-indexed reads takes 30 mins; for GA/GAIIX, one lane of a flow cell using 400 barcodes takes less than 1 min.
- Grouping output files: customize the output layout within the config file

###Compile###
make
###Run examples###
for help:
$ ./deindexer -h

create a folder for output:
$ mkdir output
$ cd output

for single index:
$ ../deindexer -f "rb" -c ../noinstall/config_examples/Single_Index_example.cfg -b ../noinstall/barcode_lib/ILMN_SINGLE12.txt ../noinstall/Reads/*

for dual indexes:
$ ../deindexer -f "rbbr" -c ../noinstall/config_examples/Dual_index_example.cfg -b ../noinstall/barcode_lib/ILMN_dual_N700.txt -b ../noinstall/barcode_lib/ILMN_dual_N500.txt ../noinstall/Reads_dual_index/*

for multiple indexes:
$ ../deindexer -f "rbbbrr" -c my_customized_config.cfg -b barlib_1.txt -b barlib_2.txt -b barlib_3.txt R1*.fastq R2*.fastq R3*.fastq R4*.fastq R5*.fastq R6*.fastq

for mismatches on barcode option:
$ ../deindexer -m 1 ...omitted
$ ../deindexer -m 2 ...omitted
$ ../deindexer -m 5 ...omitted

###Options###
-f : formatter, default "rb".
Only two characters are allowed, 'r' or 'b':
'r' is a non-barcode read
'b' is a barcode read
The number of 'r' + 'b' characters should be the same as the number of non-getopt (positional) parameters.
Non-getopt parameters should be grouped this way: R1gz R2gz R3gz
For single end: -f "rb" or -f "br"
'b' and 'r' should be in the same order as R[1-3] appear.
"rbrrrr" is also legal, as long as you provide enough non-option parameters and pack them the right way.
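
The pairing rule for -f can be illustrated with a short Python sketch. This is not the tool's own code: the function name assign_roles and the file names are hypothetical, and the sketch assumes exactly one file per format character (how deindexer groups several files per read is not modeled here).

```python
def assign_roles(fmt, files):
    """Pair each positional file with its role from a -f style format string.

    Assumption: one file per format character, in the same order.
    """
    if not fmt or set(fmt) - {"r", "b"}:
        raise ValueError("format may only contain 'r' and 'b'")
    if len(fmt) != len(files):
        raise ValueError(
            f"format has {len(fmt)} characters but {len(files)} files were given"
        )
    names = {"r": "read", "b": "barcode"}
    return [(names[ch], path) for ch, path in zip(fmt, files)]


# Hypothetical file names for a dual-index run with -f "rbbr".
for role, path in assign_roles("rbbr", ["R1.fastq", "R2.fastq", "R3.fastq", "R4.fastq"]):
    print(role, path)
```

With "rbbr", the first and last files are treated as sequence reads and the two middle files as barcode reads.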
-c : config file.
Instead of a SampleSheet.csv-like format, the config file refers to barcodes by number only (their 1-based index in the barcode lib).
The first column is the index into the barcode lib given by the first -b option.
The second column, if a second -b option is provided, is the index into the second barcode lib.
The column after the barcode index columns is the output filename prefix, a.k.a. the sample name.

For a dual indexing example:
-c config.txt -b ILMN_N700.txt -b ILMN_N500.txt R1* R2* R3* R4*

###In config.txt###
1 2 mysample1
7 2 mysample2

-b : barcode libs. You can have one or multiple barcode lists in the format below. The number of barcode libs needs to be the same as the number of leading index columns in the config file.

###In ILMN_N700.txt###
TAAGGCGA
CGTACTAG
AGGCAGAA
TCCTGAGC
GGACTCCT
TAGGCATG
CTCTCTAC
CAGAGAGG
GCTACGCT
CGAGGCTG
AAGAGGCA
GTAGAGGA

###In ILMN_N500.txt###
TAGATCGC
CTCTCTAT
TATCCTCT
AGAGTAGA
GTAAGGAG
ACTGCATA
AAGGAGTA
CTAAGCCT

###single index config.txt###
#illumina single index 12 barcode
#./deindexer -c config.single.txt -b nonInstall/illumina_bar12.txt -f "rb" R1*.gz R2*.gz
1 sample1
11 sample2

-m : mismatch tolerance. Any number is accepted; it is the user's responsibility to choose a safe range. However, an automatic collision detection mechanism is also implemented: if a fuzzy barcode matches multiple barcodes within the allowed [mismatches], the tuple is thrown away.

-o : outputdir. Optional top-level output dir.

-z : if this parameter is present (no value needed), all files will be gzipped afterwards (needs the gzip program installed).
Adds noindex_R[1-4].fastq.gz files to the output dir.
Saves the non-parsed reads into non_index.R*.fastq.gz.

#advanced parameters
-C [int] : accepts 1, 2, or 3; default 2. Chooses the collision handling strategy; see collision_handle.h.
-D [int] : accepts 0 or a non-zero int value; default 0. Chooses the distance function: 0 == Hamming distance, non-zero == edit distance.

#advanced config file features
In the config file, if an entry starts with 0, which is not a valid barcode lib index (indexes are 1-based), its layout is used for non-collision, non-indexed reads.
If an entry starts with -1, its layout is used for the output of collision reads.
Note: use as many 0s or -1s as there are barcode libs, to pad the layout the same way as barcode entries. This is not necessary, but keeps the file consistent.
Any other value less than 0 makes the program exit.
If no layout is provided for the 0 or -1 entries, these two categories will not be handled.
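
The mismatch tolerance and collision rule can be sketched in Python as follows. This is a simplified illustration using Hamming distance only, not the code in collision_handle.h, and it does not reproduce the -C strategies. The function names are hypothetical, and returning 0 for "no match" and -1 for "collision" is a choice made here to mirror the config file convention.

```python
def hamming(a, b):
    """Count differing positions; a length difference counts as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def match_barcode(observed, library, max_mismatches):
    """Return the 1-based index of the single library barcode within tolerance.

    Returns 0 when no barcode is close enough and -1 when more than one is
    (a collision), so the read cannot be assigned unambiguously.
    """
    hits = [
        i
        for i, barcode in enumerate(library, start=1)
        if hamming(observed, barcode) <= max_mismatches
    ]
    if not hits:
        return 0
    if len(hits) > 1:
        return -1
    return hits[0]


# Toy library (not a real Illumina set) to show the three outcomes.
toy_library = ["AAAA", "AATT", "GGGG"]
print(match_barcode("GGGC", toy_library, 1))  # unique match
print(match_barcode("AAAT", toy_library, 1))  # collision: close to two barcodes
print(match_barcode("CCCC", toy_library, 1))  # no match
```

The sketch shows why a large -m value is risky: the more mismatches are allowed, the more observed barcodes fall within tolerance of several library entries and end up as collisions.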
Examples:
dual indexed:
noinstall/config_examples/Dual_with_Nonindex_example.cfg
single indexed:
noinstall/config_examples/Single_Index_example.nonIndex_handle.cfg
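
As a rough illustration of the config layout, the following Python sketch parses config lines into (indexes, sample name) pairs. It is not the tool's parser: the function name parse_config is hypothetical, and the sketch assumes whitespace-separated columns, a single-token sample name, and that lines starting with '#' are comments.

```python
def parse_config(lines, n_libs):
    """Parse config lines into a list of (index_tuple, sample_prefix).

    n_libs is the number of -b barcode libs, i.e. the number of leading
    index columns. Indexes are 1-based; 0 marks the non-indexed layout and
    -1 the collision layout. Anything below -1 is rejected.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines are ignored
        fields = line.split()
        if len(fields) <= n_libs:
            raise ValueError(
                f"expected {n_libs} index column(s) and a sample name: {line!r}"
            )
        indexes = tuple(int(field) for field in fields[:n_libs])
        if any(i < -1 for i in indexes):
            raise ValueError(f"index below -1 is not allowed: {line!r}")
        entries.append((indexes, fields[n_libs]))
    return entries


# Dual-index example from this README, plus a hypothetical non-index layout.
config_lines = ["1 2 mysample1", "7 2 mysample2", "0 0 noindex"]
for indexes, sample in parse_config(config_lines, n_libs=2):
    print(indexes, sample)
```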
TODO:
Fix/unify the Q-score output format
Support qseq.txt files
AUTHOR:
Jingtao Liu@EG
EMAIL:
jingtao09 et g m a i l.com
You are welcome to use this tool; feel free to report bugs.
