SkillAgentSearch skills...

Gonetics

Go / Golang Bioinformatics Library

Install / Use

/learn @pbenner/Gonetics

README

Gonetics

Gonetics is a bioinformatics library for the Go programming language (golang). It provides native data structures for handling genetic data and methods for handling common file formats such as BAM, GTF, BED, BigWig, and Wig. The documentation is available here.

Tools

Executables are available here.

| Tool | Description | | ------------------------ | ------------------------------------------------------------------------ | | bamCheckBin | check bin records of a bam file | | bamGenome | print the genome (sequence table) of a bam file | | bamToBigWig | convert bam to bigWig (estimate fragment length if required) | | bamView | print contents of a bam file | | bigWigEditChromNames | edit chromosome names of a bigWig file (i.e. replace chr1 by just 1) | | bigWigExtract | extract regions from a bigWig file and save them as table or bigWig file | | bigWigExtractChroms | extract a subset of the chromosomes from a bigWig file | | bigWigGenome | print the genome (sequence table) of a bigWig file | | bigWigHistogram | compute a histogram of the values in a bigWig file | | bigWigNil | read bigWig and output it to a new file | | bigWigMap | apply a function to a set of bigWig files | | bigWigPositive | simple peak finding (i.e. every region with a value above a threshold) | | bigWigQuantileNormalize | quantile normalize a bigWig file to a reference | | bigWigQuery | retrieve data from a bigWig file | | bigWigQuerySequence | retrieve sequences from a bigWig file | | bigWigStatistics | compute summary statistics of a bigWig file | | chromHmmTablesToBigWig | convert chromHmm output (posteriors / binariezed bams) to bigWig | | countKmers | count kmers in a set of DNA sequences | | drawGenomicRegions | draw random genomic regions | | fastaExtract | extract regions from a fasta file | | fastaUnresolvedRegions | identify regions that are not resolved (i.e. stretches of 'NNNN...') | | gtfToBed | convert GTF files to Bed6 format | | memeExtract | extract PWM/PPM motifs from MEME/DREME xml files | | observedOverExpectedCpG | compute CpG scores as defined by Gardiner-Garden and Frommer (1987) | | pwmScanSequences | scan sequences for PWM hits | | pwmScanRegions | scan regions for multiple PWMs | | segmentationDifferential | extract differential regions from multiple chromatin segmentations |

GRanges

Create a GRanges object with three ranges on the first chromosome:

  seqnames := []string{"chr1", "chr1", "chr1"}
  from     := []int{100000266, 100000271, 100000383}
  to       := []int{100000291, 100000296, 100000408}
  strand   := []byte{'+', '+', '-'}

  granges  := NewGRanges(seqnames, from, to, strand)
  fmt.Println(granges)
  seqnames                 ranges strand
1     chr1 [100000266, 100000291)      +
2     chr1 [100000271, 100000296)      +
3     chr1 [100000383, 100000408)      -

Add some meta data to the GRanges object:

  granges.AddMeta("data", []float64{1.0, 2.0, 3.0})
  seqnames                 ranges strand |          data
1     chr1 [100000266, 100000291)      + |      1.000000
2     chr1 [100000271, 100000296)      + |      2.000000
3     chr1 [100000383, 100000408)      - |      3.000000

Find overlaps of two GRanges objects:

  rSubjects := NewGRanges(
    []string{"chr4", "chr4", "chr4", "chr4"},
    []int{100, 200, 300, 400},
    []int{150, 250, 350, 450},
    []byte{})
  rQuery := NewGRanges(
    []string{"chr1", "chr4", "chr4", "chr4", "chr4", "chr4"},
    []int{100, 110, 190, 340, 390, 450},
    []int{150, 120, 220, 360, 400, 500},
    []byte{})

  queryHits, subjectHits := FindOverlaps(rQuery, rSubjects)
  queryHits: [1 2 3 4 5]
subjectHits: [0 1 2 3 3]

Genes

Download gene list from UCSC and export it to file:

  genes := ImportGenesFromUCSC("hg19", "ensGene")
  genes.WriteTable("hg19.knownGene.txt", true, false)
  fmt.Println(genes)
                 names seqnames          transcripts                  cds strand
     1 ENST00000456328     chr1 [   11868,    14409) [   14409,    14409)      +
     2 ENST00000515242     chr1 [   11871,    14412) [   14412,    14412)      +
     3 ENST00000518655     chr1 [   11873,    14409) [   14409,    14409)      +
     4 ENST00000450305     chr1 [   12009,    13670) [   13670,    13670)      +
     5 ENST00000423562     chr1 [   14362,    29370) [   29370,    29370)      -
                   ...      ...                  ...                  ...       
204936 ENST00000420810     chrY [28695571, 28695890) [28695890, 28695890)      +
204937 ENST00000456738     chrY [28732788, 28737748) [28737748, 28737748)      -
204938 ENST00000435945     chrY [28740997, 28780799) [28780799, 28780799)      -
204939 ENST00000435741     chrY [28772666, 28773306) [28773306, 28773306)      -
204940 ENST00000431853     chrY [59001390, 59001635) [59001635, 59001635)      +

Import expression data from a GTF file:

  genes.ImportGTF("genesExpr_test.gtf.gz", "transcript_id", "FPKM", false)
                 names seqnames          transcripts                  cds strand |          expr
     1 ENST00000456328     chr1 [   11868,    14409) [   14409,    14409)      + |      0.073685
     2 ENST00000515242     chr1 [   11871,    14412) [   14412,    14412)      + |      0.000000
     3 ENST00000518655     chr1 [   11873,    14409) [   14409,    14409)      + |      0.000000
     4 ENST00000450305     chr1 [   12009,    13670) [   13670,    13670)      + |      0.000000
     5 ENST00000423562     chr1 [   14362,    29370) [   29370,    29370)      - |     10.413931
                   ...      ...                  ...                  ...        |           ...
204936 ENST00000420810     chrY [28695571, 28695890) [28695890, 28695890)      + |      0.000000
204937 ENST00000456738     chrY [28732788, 28737748) [28737748, 28737748)      - |      0.000000
204938 ENST00000435945     chrY [28740997, 28780799) [28780799, 28780799)      - |      0.000000
204939 ENST00000435741     chrY [28772666, 28773306) [28773306, 28773306)      - |      0.000000
204940 ENST00000431853     chrY [59001390, 59001635) [59001635, 59001635)      + |      0.000000

Peaks

Import peaks from a MACS xls file:

  peaks := ImportXlsPeaks("peaks_test.xls")
   seqnames             ranges strand |  abs_summit     pileup -log10(pvalue) fold_enrichment -log10(qvalue)
 1       2L [   5757,    6001)      * |        5865  33.000000      19.809300        6.851880      17.722200
 2       2L [  47233,   47441)      * |       47354  36.000000      19.648200        6.263150      17.566200
 3       2L [  66379,   67591)      * |       66957 252.000000     350.151250       50.986050     346.525450
 4       2L [  72305,   72838)      * |       72525 170.000000     208.558240       34.460930     205.734390
 5       2L [  72999,   73218)      * |       73130  25.000000      12.711700        5.239670      10.700880
        ...                ...        |         ...        ...            ...             ...            ...
12       2R [3646319, 3646794)      * |     3646442  37.000000      23.176910        7.455710      21.063850
13       2R [3666770, 3668041)      * |     3667119 215.000000     279.229060       41.551060     276.108090
14       2R [3668231, 3668441)      * |     3668363  22.000000       9.943110        4.476950       7.976070
15       2R [3670063, 3670393)      * |     3670180  38.000000      19.474590        5.901360      17.393440
16       2R [3670470, 3670927)      * |     3670719 227.000000     305.243350       45.831180     301.974760

Track

Import ChIP-seq reads from bed files and create a track with the normalized signal:

  fmt.Fprintf(os.Stderr, "Parsing reads (treatment) ...\n")
  treatment1 := GRanges{}
  treatment1.ImportBed6("SRR094207.bed")
  treatment2 := GRanges{}
  treatment2.ImportBed6("SRR094208.bed")
  fmt.Fprintf(os.Stderr, "Parsing reads (control)   ...\n")
  control1   := GRanges{}
  control1.ImportBed6("SRR094215.bed")
  control2   := GRanges{}
  control2.ImportBed6("SRR094216.bed")

  genome  := Genome{}
  genome.Import("Data/hg19.genome")
  d       := 200 // d=200 (see *_peaks.xls)
  binsize := 100 // binsize of the track
  pcounts := 1   // pseudocounts
  track := NormalizedTrack("H3K4me3",
    []GRanges{treatment1, treatment2}, []GRanges{control1, control2},
    genome, d, binsize, pcounts, pcounts, false)

Export track to wig or bigWig:

  track.WriteWiggle("track.wig", "track description")
  track.WriteBigWig("track.bw",  "track description")

BigWig Files

BigWig files contain data in a binary format optimized for fast random access. In addition to the raw data, bigWig files typically contain several zoom levels for which the data has been summarized. The BigWigReader class allows to query data and it automatically selects an appropriate zoom level for the g

View on GitHub
GitHub Stars45
CategoryDevelopment
Updated1y ago
Forks1

Languages

Go

Security Score

80/100

Audited on Dec 4, 2024

No findings