fgbio
A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
<p> <a href="https://fulcrumgenomics.com"><img src=".github/logos/fulcrumgenomics.svg" alt="Fulcrum Genomics" height="100"/></a> </p>
Visit us at Fulcrum Genomics to learn more about how we can power your Bioinformatics with fgbio and beyond.
<a href="mailto:contact@fulcrumgenomics.com?subject=[GitHub inquiry]"><img src="https://img.shields.io/badge/Email_us-brightgreen.svg?&style=for-the-badge&logo=gmail&logoColor=white"/></a> <a href="https://www.fulcrumgenomics.com"><img src="https://img.shields.io/badge/Visit_Us-blue.svg?&style=for-the-badge&logo=wordpress&logoColor=white"/></a>
This README is mostly for developers, contributors, and those building the project from source. Detailed user documentation, including tool usage and descriptions of the metrics produced, is available on the project website. Detailed developer documentation can be found here.
<!---toc start-->
- Quick Installation
- Goals
- Overview
- List of tools
- Building
- Command line
- Include fgbio in your project
- Contributing
- Authors
- License
- Sponsorship
Quick Installation
The conda package manager (configured with bioconda channels) can be used to quickly install fgbio:
conda install fgbio
To install fgbio without extra dependencies (e.g. R), use the command:
conda install fgbio-minimal
Goals
There are many toolkits available for analyzing genomic data; fgbio does not aim to be all things to all people but is specifically focused on providing:
- Robust, well-tested tools.
- An easy-to-use command line.
- Clear and thorough documentation for each tool.
- Open source development for the benefit of the community and our clients.
Overview
Fgbio is a set of command line tools to perform bioinformatic/genomic data analysis.
The tools within fgbio are used by our customers and others both for ad-hoc data analysis and within production pipelines.
These tools typically operate on read-level data (e.g. FASTQ, SAM, or BAM) or variant-level data (e.g. VCF or BCF).
They range from simple tools that filter reads in a BAM file to tools that compute consensus reads from reads sharing the same molecular index/tag.
See the list of tools for more detail.
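To make the idea of a consensus read concrete, here is a minimal toy sketch in Python. It is not fgbio's implementation: the fgbio tools operate on SAM/BAM records and are considerably more sophisticated. The sketch assumes reads are plain `(umi, sequence)` pairs, that UMIs are grouped by exact match, that all reads in a group have the same length, and that a simple per-position majority vote is good enough. All function names (`group_by_umi`, `majority_consensus`, `call_consensus_reads`) and the `min_reads` parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs by exact UMI match (an assumption of this sketch)."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def majority_consensus(seqs, no_call="N"):
    """Call one base per position by strict majority vote.

    Positions without a strict majority (e.g. ties) are reported as no_call.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count * 2 > len(column) else no_call)
    return "".join(consensus)


def call_consensus_reads(reads, min_reads=1):
    """Return {umi: consensus} for every UMI group with at least min_reads reads."""
    groups = group_by_umi(reads)
    return {
        umi: majority_consensus(seqs)
        for umi, seqs in groups.items()
        if len(seqs) >= min_reads
    }


if __name__ == "__main__":
    # Hypothetical toy data: two UMI families, one with a sequencing error.
    reads = [
        ("AACG", "ACGTAC"),
        ("AACG", "ACGTAC"),
        ("AACG", "ACCTAC"),
        ("TTGA", "GGGTTT"),
        ("TTGA", "GGATTT"),
    ]
    for umi, consensus in call_consensus_reads(reads).items():
        print(umi, consensus)
```

In this toy data the single discordant base in the `AACG` family is outvoted, while the two-read `TTGA` family disagrees at one position and that base is masked as `N`. Raising `min_reads` drops small families entirely, which is one simple way to trade yield for confidence.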
List of tools
For a full list of available tools please see the tools section of the project website.
Below we highlight a few tools that you may find useful.
- Tools for working with Unique Molecular Indexes (UMIs, aka Molecular IDs or Molecular Barcodes):
  - Annotate/Extract UMIs from read-level data: FastqToBam, AnnotateBamWithUmis, ExtractUmisFromBam, and CopyUmiFromReadName.
  - Manipulate read-level data containing UMIs: CorrectUmis, GroupReadsByUmi, CallMolecularConsensusReads, CallDuplexConsensusReads, and FilterConsensusReads.
  - Collect metrics and review consensus reads: CollectDuplexSeqMetrics and ReviewConsensusVariants.
- Tools to manipulate read-level data:
  - FASTQ manipulation: FastqToBam, ZipperBams, and DemuxFastqs (see fqtk, our Rust re-implementation for sample demultiplexing).
  - Filter, clip, randomize, sort, and update metadata for read-level data: FilterBam, ClipBam, RandomizeBam, SortBam, SetMateInformation, and UpdateReadGroups.
- Tools for quality control assessment:
  - Detailed substitution error rate evaluation: ErrorRateByReadPosition.
  - Sample pooling QC: EstimatePoolingFractions.
  - Splice-aware insert size QC for RNA-seq libraries: EstimateRnaSeqInsertSize.
- Tools for adding or manipulating alternate contig names:
  - Extract contig names from an NCBI Assembly Report: CollectAlternateContigNames.
  - Update contig names in common file formats: UpdateFastaContigNames, UpdateVcfContigNames, UpdateGffContigNames, UpdateIntervalListContigNames, and UpdateDelimitedFileContigNames.
- Miscellaneous tools:
  - Pick molecular indices (e.g. sample barcodes or molecular indexes): PickIlluminaIndices and PickLongIndices.
  - Find technical/synthetic or switch-back sequences in read-level data: FindTechnicalReads and FindSwitchbackReads.
  - Make synthetic mixture VCFs: MakeMixtureVcf and MakeTwoSampleMixtureVcf.
