Jbwa
Java Bindings (JNI) for bwa
Author: Pierre Lindenbaum PhD. @yokofakun (Institut du Thorax, Nantes, France)
BWA is written by Heng Li (Broad Institute).
Motivation
BWA (http://bio-bwa.sourceforge.net/) contains a small C example (https://github.com/lh3/bwa/blob/master/example.c) for running bwa-mem as a library (bwamem-lite). I created these JNI bindings to see whether I could bind the C bwa library to Java and get the same output as bwamem-lite.
Compilation
I've tested this code under Linux with:
- Oracle JDK 8
- GNU Make 3.81
- gcc 4.8.2
- wget
The Apache2-licensed branch of BWA (https://github.com/lh3/bwa/tree/Apache2) will be downloaded.
Typing make should download the BWA sources, compile them and run some tests.
See also
- https://github.com/broadinstitute/gatk/issues/1517
Contribute
- Issue Tracker: http://github.com/lindenb/jbwa/issues
- Source Code: http://github.com/lindenb/jbwa
License
The project is licensed under the Apache2 license.
Example (Two FASTQs)
System.loadLibrary("bwajni");
// load the index
BwaIndex index=new BwaIndex(new File(args[0]));
// load the bwa engine
BwaMem mem=new BwaMem(index);
// open the two fastq files
KSeq kseq1=new KSeq(new File(args[1]));
KSeq kseq2=new KSeq(new File(args[2]));
// buffers for the current batch of reads: L1 = forward, L2 = reverse
List<ShortRead> L1=new ArrayList<ShortRead>();
List<ShortRead> L2=new ArrayList<ShortRead>();
// loop until one of the fastq files is exhausted
for(;;)
    {
    // read the next pair of reads
    ShortRead read1=kseq1.next();
    ShortRead read2=kseq2.next();
    // should we align and dump the current batch?
    if(read1==null || read2==null || L1.size()>100)
        {
        if(!L1.isEmpty())
            for(String sam:mem.align(L1,L2)) // get the SAM records
                {
                System.out.print(sam);
                }
        // end of input: stop after the last batch
        if(read1==null || read2==null) break;
        L1.clear();
        L2.clear();
        }
    L1.add(read1);
    L2.add(read2);
    }
// release the native resources
kseq1.dispose();
kseq2.dispose();
index.close();
mem.dispose();
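The batching logic of the loop above can be tried without the native library. The following is only a sketch: PairedBatcher and forEachBatch are hypothetical names that are not part of jbwa, and plain strings stand in for ShortRead. It reads two inputs in lockstep, hands each full batch to a callback (where the real code would call mem.align), and flushes the last partial batch when either input runs out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical helper (not part of jbwa): batches two inputs read in lockstep. */
public class PairedBatcher {

    /**
     * Reads both iterators pair by pair and passes batches of at most
     * batchSize pairs to the callback. Stops as soon as either input is
     * exhausted. Returns the number of batches handed to the callback.
     */
    public static <T> int forEachBatch(Iterator<T> in1, Iterator<T> in2, int batchSize,
                                       BiConsumer<List<T>, List<T>> callback) {
        List<T> batch1 = new ArrayList<>();
        List<T> batch2 = new ArrayList<>();
        int batches = 0;
        while (in1.hasNext() && in2.hasNext()) {
            batch1.add(in1.next());
            batch2.add(in2.next());
            if (batch1.size() >= batchSize) {
                // pass copies so the callback may keep the lists
                callback.accept(new ArrayList<>(batch1), new ArrayList<>(batch2));
                batches++;
                batch1.clear();
                batch2.clear();
            }
        }
        // flush the last, possibly partial, batch
        if (!batch1.isEmpty()) {
            callback.accept(new ArrayList<>(batch1), new ArrayList<>(batch2));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // made-up read names standing in for ShortRead objects
        List<String> forward = Arrays.asList("r1/1", "r2/1", "r3/1");
        List<String> reverse = Arrays.asList("r1/2", "r2/2", "r3/2");
        int n = forEachBatch(forward.iterator(), reverse.iterator(), 2,
                (b1, b2) -> System.out.println(b1 + " " + b2));
        System.out.println(n + " batches");
    }
}
```

With three pairs and a batch size of 2, the callback is invoked twice: once with two pairs and once with the remaining pair. Note that the loop in the example above flushes when L1.size()>100, so its batches hold 101 pairs.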
Testing
Here is the output of the Java version:
java -Djava.library.path=src/main/native -cp src/main/java com.github.lindenb.jbwa.jni.Example2 \
human_g1k_v37.fasta tmp1.fq tmp2.fq
HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192 121 1 229568362 37 13S87M = 229568362 0 GCTCTTCCGATCTGGCACGTTGAAGGTCTCAAACATGATCTGGGTCATCTTCTCGCGGTTGGCCTTGGGATTGAGGGGGGCCTCGGTGAGCAGGGNGGGG AB?DDDDDDDBDCDDDDDDDDDDCDDDDCCC>(DCDDDDDDBDDDCCCCBDDDFFEEJIHIJIIHJIJJJJJJIJJJJJJJJJJJJJHHHHHDA2#FCCC NM:i:1 AS:i:85 XS:i:61
HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192 181 1 229568362 0 * = 229568362 0 GCTCTTCCGATCTCCCCACCCTGCTCACCGAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAGNNNNNNNNNNNNNNNNNNAACGTGCC ?DDDDDDDDDDDDDDB?9BDDDDDDDBBB?8,,######################################?12##################FFFFFCCC AS:i:0 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:1424:2423 69 X 16753128 0 * = 16753128 0 AGATNGGAAGAGCACACGTCTGAACTCCAGTCACCAAGGAGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACAAATACGGATGAGACATG CCCF#2ADHHHHHJJJJJJJJJJJJJJ>9:1*1C3C8D600)0*0*/00-.8B)--5B().).=).?CFFFBBBDB######################## AS:i:0 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:1424:2423 137 X 16753128 0 58S34M8S = 16753128 0 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAACAAAAAAAGAGATGAACAAGCAAA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJHIHIIJJJJJJJHJJIIJJJHFFFFEEEEEEEDDDD################################## NM:i:0 AS:i:34 XS:i:29
HWI-1KL149:20:C1CU7ACXX:4:1101:2908:2463 97 12 110765491 60 70M30S = 110765491 70 AATTNGGGGAACAGCTTTCCAAAGTCATCTCCCTTATTTGCATTGCAGTCTGGATCATAAATATTGGGCAAGATCGGAAGAGCACACGTCTGAACTCCAG CCCF#4BDHGHHHJJJJJJJJJIJHIJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJIIIIHIJJJJIIIJIJJGHEHFFFEDDEEAA@BDDDCDDDD:C@ NM:i:1 AS:i:68 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:2908:2463 145 12 110765491 60 30S70M = 110765491 -70 CTCTTTCCCTACACGACGCTCTTCCGATCTAATTTGGGGAACAGCTTTCCAAAGTCATCTCCCTTATTTGCATTGCAGTCTGGATCATAAATATTGGGCA DDDDDDDDDCAB=DDBDEEFFFFHHHJJJGHHGGFJJJJJIIIIJJJJJIJJJJJIJIIIJJJJJJJJJJJJIJJHHHFHEEJJIJJHHHHHFFFFFCBC NM:i:0 AS:i:70 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:4663:2297 81 4 114279632 60 100M = 114279455 -277 GATTCCTACTGCACCCATGGAGAATGTGCCTTTTACTGAAAGCAAATCCAAAATTCCTGTAAGGACTATGCCCACTTCCACCCCAGCACCTCCATNTGCA DCDDDDCACCDBCBCDDDCDDCCA?EEDDDFFDFFFHHHGHHHJJJJJJJJIJJIJIJJIJIJJJJJJJJJJIGJJIIHFIJJJJHGDHHDHDA2#FCCB NM:i:1 AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:4663:2297 161 4 114279455 60 100M = 114279632 277 CGTGCAAACGGGTGATATACCTCCTCTCTCTGGTGTAAAGCAGATATCCTGCCCCGACTCTTCTGAACCAGCTGTACAAGTCCAGTTAGATTTTTCCACA CCBFFFFFHHHHFHIJJJJJIIJJJJJJJJJJJHIGIJIIJJJJJJJJJJHIJJJJJJJHHHHHHFDDDFDDEEEDDDADCCDDDCCDCCDEDDDCACCC NM:i:0 AS:i:100 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:6872:2320 81 2 179597667 60 100M = 179597628 -139 GGCTGTGCCTTCCACAAATGCTATCCTGTATCTGTCAGAAGCAGCTATTTCTTTGCCATCCTTAAACCAGGACACCCTCATGGGGAGGGAGCCTGNAATT ABDDDDDBDDDDDDEDDEDDDEECEEFFFFFFHGHHHHJJIJJJJJIIJJIJJJJJJJJJIIJIHGJJJJJHHEJJIHJJJJJJJJJHHHHHDA2#FCCC NM:i:1 AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:6872:2320 161 2 179597628 60 100M = 179597667 139 CCCTGCATCATTCATGTCTACTCTGATGATCTCCAAAGAGGCTGTGCCTTCCACAAATGCTATCCTGTATCTGTCAGAAGCAGCTATTTCTTTGCCATCC CCCFFFFFHHHHHJJJJJJJJJJJJIJJJJJJJJJJJJJJIIJJIIHJJJJIJJGIIIJJJIIJIIIHGIJJJJJIIEHHHHHHFBFFDEFECDECCDDA NM:i:0 AS:i:100 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9215:2408 97 2 220283746 60 100M = 220283863 217 CAGCNGCTCAAGGCCAAGTGAGGGCCCGGCACCCCAGACTCCTCTTTCTGCGGGCAGGGCACAGGAGGCTAGGCCTGGGGGCTGGGGTCCCGCTGTCAGC CCCF#2ADHHHHHFIJIIHIGIJJJJJJJJIIJJJJIJJJJJJJIIIJJIGFFFDDDDDDDBDDD?BDBDCBBDDCDDDDDBDDDBB>BBDDDDB@CDCD NM:i:2 AS:i:93 XS:i:23
HWI-1KL149:20:C1CU7ACXX:4:1101:9215:2408 145 2 220283863 60 100M = 220283746 -217 GCCCGGGACCCTCTCCTGCCCCATGTGGAGAAAGGGTCCTCCACCTGTGTGTTTCAAGGGGCCGTGACCTCCAGGTCTCTCCCCCTGCGATCCCATCTTG BDDBDBC?DDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDCADDDDDBEEEEEFFFFHHIJJJIHGJJJIJJJJJIIIIJIJJJJJHHHGHFFFFFCCC NM:i:0 AS:i:100 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9815:2325 97 22 46114322 60 100M = 46114410 188 AAAGNCCGGAATTGGTACAAGCCATGTTTCCCAAACTGAACAATCAAGAAAGGTAACCCCCCAACCAGCGTGGTCTGGAGTATTTAGCATTCCATATAGG CCCF#2ADHHHHHJJGHIJJJJJJJJIGJJJJJJJJJJJJJJJJJJJJGHIJJHIJJIIJJHFFFFDDCD?BDDDCCDCD>ACDEEDDDEDDEDCCCCCD NM:i:1 AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9815:2325 145 22 46114410 60 100M = 46114322 -188 ATTCCATATAGGGTATTCGATGCACGTGACTGAAAAGCTGTGTGGTTTCTGAGTTGGCACAGAATCTCTAAATACATGTTTCTGTGTTGGTAATGGTTTT DDCDEDCCDDDDCDDEEDEFFFFFHHHHIJJJJJJJIJJJJIIJJJIIGGJJJJJIJJJJJJJJIIHJJJJJIIJJJJJJJIIJIJIHFHHHFFFFFCCC NM:i:0 AS:i:100 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11401:2488 97 3 38763808 60 100M = 38763855 147 CCACNATACGGTAGCAAGTCTTGCGCACCTGCCAGCCCACATCCCATGGACTCTTCGTGGTATCCAGTTTGCAGCAGGGACAGTGGCGAATGCATCCTGT CCCF#4ADHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJEIJJIJJJHHHFFFFFFFEEEEEEEDABBDDDBBCCDBD>BDDDDEDDDD> NM:i:2 AS:i:93 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11401:2488 145 3 38763855 60 100M = 38763808 -147 GGACTCTTCGTGGTATCCAGTTTGCAGCAGGGACAGTGGCGAATGCATCCTGTGGGGAGAGGTGACTGATGGTGGGTGATGGCCAGTGGGCAAAGGGGAT DDCDDDB?DCCCDECDDCDDDCDDEEDEFFFFFFHHHJJIJJJIJIIJIJJJIJJIJJJJJJJJIJJJJJJJJJJIJJJJJJJJJJJHHHHHFFFFFCCC NM:i:1 AS:i:95 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11658:2375 97 7 35293037 60 100M = 35293129 192 CAGCNAGGGGCACAGACGGATGCGCAGCATCCCCAGTCCTCGGCGGACAGCCGGGTAGCCCAACTTACCCAGGGGTTTGATTGTGTTCTCCGTCGCCTCC CCCF#2ADHHHHHJIIJJJJIJJJJJJJJJIJJJJJIJJJJJJJJDDDDDDDDDDBBDDDDDDDDDDDDDDDDDDDBBBDDDDDDDDCEDCB?ABDBDD1 NM:i:1 AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11658:2375 145 7 35293129 60 100M = 35293037 -192 TCGCCTCCTTCTCCTTAGAGCCGCCGCTCGACATGAGCGCGGCAATGGAGAAGGCGTTGGCCCGGGAGGAGAGTTGGGGCTTGGGGGACGCCGTGAACTC DDBBBDDCA8DDDCC@DDDBDDDDDDDDDDEDDDDDDDDDDDDEDDDDCCDDDDFFFHHJJJJJJJJJHJJJJJJJJJJJJJJJJJJHHHHHFFFFDCBB NM:i:1 AS:i:95 XS:i:20
HWI-1KL149:20:C1CU7ACXX:4:1101:12054:2300 97 2 40401764 60 100M = 40401971 307 CAAGNTACATAAGATGTAGGTTTGGATTGATGGTTAAGGGTATTTGGGGAAAAATAAGGAACATTAAAAAAATAAGTCTTACCAAACAGGTATTTTCCTT CCCF#4=DHHHHHIJJHIJJHIJJJHIJJIIJJEGHJJJJDGIJJJJJJGHHIJJIIJJJIIIIJIJJHHFDEDECDDEEDDDDDDDDDDCCDEEEDDCD NM:i:1 AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:12054:2300 145 2 40401971 60 100M = 40401764 -307 TTGTGAAGCCACCTAAAAAAGAAAAAAACAACAACAAATGTTATAATTTGACACTCTACATAACAAATACCAGTGACATCAGACTGCCTGACAACCCACC @CC@DDDDDDDDDDDDDDDDDDFHHHHEIIHIIIJJJIJJJJJJJJJIHDIJJJJJIIJJJJIJJJJHFJJJJJJIJJJJJJJJJJJHHHHHFFFFDBCB NM:i:0 AS:i:100 XS:i:0
And the output of the native C version:
bwa mem human_g1k_v37.fasta tmp1.fq tmp2.fq 2> /dev/null | grep -v -E '^@'
HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192 121 1
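To check that the two outputs agree, the SAM records can be compared line by line once the header lines are dropped, as grep -v -E '^@' does above. The following is a minimal sketch, not part of jbwa: SamDiff and its methods are hypothetical names, and the two file paths passed to main are assumed to be the saved Java and native outputs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of jbwa): compares two SAM outputs record by record. */
public class SamDiff {

    /** Drops SAM header lines, i.e. the lines starting with '@'. */
    public static List<String> records(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("@")) out.add(line);
        }
        return out;
    }

    /**
     * Returns the 0-based index of the first record that differs,
     * or -1 if both outputs contain the same records in the same order.
     */
    public static int firstDifference(List<String> a, List<String> b) {
        List<String> ra = records(a);
        List<String> rb = records(b);
        int n = Math.min(ra.size(), rb.size());
        for (int i = 0; i < n; i++) {
            if (!ra.get(i).equals(rb.get(i))) return i;
        }
        // one output is a strict prefix of the other
        return ra.size() == rb.size() ? -1 : n;
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1]: assumed paths to the saved Java and native outputs
        int i = firstDifference(Files.readAllLines(Paths.get(args[0])),
                                Files.readAllLines(Paths.get(args[1])));
        System.out.println(i < 0 ? "identical" : "first difference at record " + i);
    }
}
```

The comparison is exact, so any difference in optional tags or field order is reported as a mismatch.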
