Gtz

A high-performance, high-compression-ratio compressor for genomic data, powered by GTXLab of Genetalks.

GTX.Zip Professional Version (Latest Version GTZ 4.x)

Please see the GTX.Zip website, where you can download the latest version of GTX.Zip, read the user manual, ask questions, and receive technical support. This GitHub repository is no longer maintained.

Chinese documentation.

<table style="width:100%"> <tr> <td> <h3>QQ group(s): 934492381 </h3> <img src="https://i.loli.net/2018/12/10/5c0df947eddde.png" alt="GTX.Zip QQ groups"/> </td> <td> <h3>WeChat group(s):</h3> <img src="https://i.loli.net/2018/12/10/5c0e0afa8d12d.jpg" alt="GTX.Zip WeChat groups"/> </td> </tr> </table> Powered by GTXLab of Genetalks.

Product Series<span id="product-series"></span>

Product | Version | Description | How to Get
--- | --- | --- | ---
GTX.Zip | V4.x | Companies, institutions, and individual users with large volumes of local sequencing data | Download

The following content is outdated.

Index<span id="index"></span>

What is GTX.Zip?<span id="intro"></span>

GTX.Zip (GTZ for short) is a high-performance lossless compression tool for arbitrary files. It achieves a particularly high compression ratio on genomic data: it can compress a FASTQ file to about 2% of its original size (roughly 1/6 to 1/8 the size of the corresponding fastq.gz) at a speed of up to 1100 MB/s. GTX.Zip can also recompress fastq.gz files directly.

-Back to Top-


Product Series<span id="product-series"></span>

Product | Version | Description | How to Get
--- | --- | --- | ---
GTX.Zip Professional | V3.0.0 | Companies, institutions, and individual users with large volumes of local sequencing data | Install
GTX.Zip Enterprise | V1.0.1 | Large enterprises and data centers with PB-scale sequencing data that need distributed compression on their own computing clusters | Contact Us
GTX.Zip Cloud | V1.0.1 | Companies that distribute and store large amounts of sequencing data in the cloud | http://gtz.io

-Back to Top-


Supported Bioinformatics Analysis Software<span id="supported-softwares"></span>

  • BWA 0.7 for GTX.Zip: BWA is a widely used software package for mapping DNA sequences, and this version can read XXX.gtz files directly. It consists of two programs: bwa 0.7 and bwa-opt 0.7.
    • bwa-opt 0.7 is an optimized version that is about 30% faster than standard bwa and produces mapping results fully consistent with those of standard bwa.
  • BOWTIE / BOWTIE2 for GTX.Zip: Bowtie is an ultrafast, memory-efficient tool for aligning sequencing reads to long reference sequences. This version can read XXX.gtz files directly, and you can use it just as you would use the official version.
    • BOWTIE for GTX.Zip is based on BOWTIE 1.2.2.
    • BOWTIE2 for GTX.Zip is based on BOWTIE2 2.3.4.3.

-Back to Top-


Feature<span id="feature"></span>

GTX.Zip compressor system features:

  • High Compression Ratio: The system implements context-model compression with a variety of optimized prediction models, balancing concurrency against memory consumption to achieve an extremely high compression ratio. For FASTQ files, GTX.Zip can compress a file to as little as 2.53% of its original size. Its output is about 3-6 times smaller than that of gzip, which can save up to 80% of storage space and transfer costs.

Data List | GTX.Zip size (% of original) | fastq.gz size (% of original)
--- | :--: | ---:
Nova_wes_1.fq | 2.53% | 17.15%
Nova_wes_2.fq | 3.45% | 18.34%
nova_wgs_1.fq | 3.18% | 17.55%
nova_wgs_2.fq | 3.93% | 18.66%
nova_rna_1.fq | 4.56% | 17.70%
nova_rna_2.fq | 5.39% | 18.94%
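
The relationship between the two columns of the table can be checked with a few lines of arithmetic. The sketch below only reuses the percentages listed above; the function name `gain_over_gzip` is made up for illustration and is not part of GTX.Zip.

```python
# Percent-of-original sizes copied from the table above: (GTX.Zip, fastq.gz).
SIZES = {
    "Nova_wes_1.fq": (2.53, 17.15),
    "Nova_wes_2.fq": (3.45, 18.34),
    "nova_wgs_1.fq": (3.18, 17.55),
    "nova_wgs_2.fq": (3.93, 18.66),
    "nova_rna_1.fq": (4.56, 17.70),
    "nova_rna_2.fq": (5.39, 18.94),
}


def gain_over_gzip(gtz_pct, gz_pct):
    """Return (how many times smaller than fastq.gz, fraction of fastq.gz space saved)."""
    return gz_pct / gtz_pct, 1.0 - gtz_pct / gz_pct


if __name__ == "__main__":
    for name, (gtz_pct, gz_pct) in SIZES.items():
        times, saved = gain_over_gzip(gtz_pct, gz_pct)
        print(f"{name}: {times:.1f}x smaller than fastq.gz, {saved:.0%} saved")
```

For these six files the output is roughly 3.5 to 6.8 times smaller than fastq.gz, which corresponds to saving about 72% to 85% of the space the fastq.gz file would occupy.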

  • High Performance: GTX.Zip fully exploits CPU concurrency, the Haswell CPU architecture, and newer instruction sets such as AVX2 and BMI2. This gives GTX.Zip a high compression speed even on an ordinary server, with a throughput of 1100 MB/s on a single compression node. GTX.Zip Enterprise supports large-scale distributed compression.

  • Safety Guarantee: Because GTX.Zip is fast, it runs a decompress-and-restore test during compression. Compression completes only after the restored data has been confirmed to be identical to the source data. MD5 validation is also performed to ensure data integrity.

  • Software Ecosystem: GTX.Zip provides command-line and GUI decompression software for Linux, Mac OS X, and Windows. It also provides SDK interfaces for languages such as Python, C, and C++, so third-party developers can read and write gtz files (the GTX.Zip compression format) directly. For example, community-supported gtz versions of bcl2fastq, fastp, and BWA are now available.
    To get this software, please go to -GTZ Ecology Softwares-.

  • Nirvana Plan:
    As enterprise-level software, GTX.Zip includes a Nirvana Plan for high-availability requirements, ensuring that users can restore compressed data to the original data even under extreme conditions. The plan's dual availability protection strategy is as follows:

    • GTX.Zip is multi-site hosted. The http://gtz.io website, GitHub, and other sites will permanently host all versions of GTX.Zip, so that it is always available to everyone, free of charge.
    • To ensure that compressed data can be restored to the original file under any conditions, a pre-embedded micro decompression program can first be extracted from the compressed data and then used to decompress the file.
    • Please click -here- for usage.
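
The Safety Guarantee item above mentions MD5 validation. As an independent check on the user side, the original file and the file restored by decompression can be compared by checksum. The following is a minimal sketch using only Python's standard library; it is not GTX.Zip's internal validation, and the helper names are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large FASTQ files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original_path, restored_path):
    """True if both files have the same MD5 digest."""
    return md5_of_file(original_path) == md5_of_file(restored_path)
```

A typical use would be to call `same_content("sample.fq", "restored/sample.fq")` after decompressing into a separate directory; both paths here are placeholders.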

-Back to Top-


System Environment Requirements<span id="environment"></span>

  • 64-bit Linux system (CentOS >= 6.1; Ubuntu >= 12.04, < 18.04)

  • On Linux, a server with 32 cores and 64 GB of memory (or a configuration equivalent to an AWS C4.8xlarge machine) is recommended for good performance.

  • 64-bit Windows system (Win7, Win10)

How to Install (Linux) <span id="install"></span>

  • Mode 1: Install directly from the command line (recommended installation method)

Please visit this website to download the installation package:
www.gtxlab.com

  • Verify that the installation was successful

Run the following command. If the software version information appears, the installation was successful:

gtz --version
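
When the installation check needs to run from a script, for example before starting a pipeline, the same command can be wrapped as below. This is a sketch that assumes `gtz` is on `PATH`; the function name `gtz_version` is illustrative, and the format of the version string is not specified here, so it is returned as-is.

```python
import shutil
import subprocess


def gtz_version(executable="gtz"):
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print version information to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = gtz_version()
    print("gtz not found" if version is None else version)
```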

-Back to Top-


Quick Start (Linux)<span id="quick-start"></span>

GTX.Zip Professional must be installed on the current machine. If it is not, please see -How to Install-.

1. Download the sample file to be compressed. Sample download: -sample.fq-

<font size=1>* 2 GB FASTQ file, extracted from real WES data produced by NovaSeq</font>

Reference genome download: -GCF_000001405.37_GRCh38.p11_genomic.fna.gz-

2. Start compression

gtz sample.fq --ref GCF_000001405.37_GRCh38.p11_genomic.fna.gz

<font size=1>* gtz can also compress fastq.gz files directly</font>

3. Decompress

gtz -d sample.fq.gtz

How to use <span id="use"></span>

Command navigation:

<table style="width:100%"> <tr> <td> <h3>High compression rate using a FASTA reference; decompression no longer needs the FASTA (recommended)</h3> <img src="https://i.loli.net/2019/08/21/HXWTwqVya4dMFmL.png" alt=""/> </td> <tr> <tr> <tr> <td> <h3>Higher compression rate; decompression requires exactly the same FASTA that was used for compression (Note: you and your client must store the FASTA file safely for future decompression)</h3> <img src="https://i.loli.net/2019/08/21/lJNmCwEhFU3XszI.png" alt=""> </td> <tr> <tr> <tr> <td> <h3>Compress BAM; decompression no longer needs the FASTA (recommended)</h3> <img src="https://i.loli.net/2019/08/21/ejNtE4JimApkxKs.png" alt=""> </td> <tr> <tr> <tr> <td> <h3>Decompression requires exactly the same FASTA that was used for compression (Note: you and your client must store the FASTA file safely for future decompression)</h3> <img src="https://i.loli.net/2019/08/21/BUfoJHru5jQGwpA.png" alt=""> </td> <tr> <tr> <tr> <td> <h3>Lower compression rate than the modes above, but can compress arbitrary files</h3> <img src="https://i.loli.net/2019/08/21/CqTi7a3QA8YDlsV.png"> </td> </tr> </table>

Usage example:

1. Compressing fastq/fastq.gz (high compression ratio)

1.1 Default compression mode for fastq

gtz /data/nova.fastq.gz --ref /fasta/genomic.fna(.gz)

The --ref parameter specifies the reference genome FASTA file for the species that nova.fastq.gz comes from; the FASTA file may also be gzip-compressed. Note: after compression, the FASTA file is no longer needed for decompression.

1.2 Default compression mode for bam

gtz /data/nova.bam --ref /fasta/genomic.fna(.gz)

The --ref parameter specifies the reference genome FASTA file for the species that nova.bam comes from, and it is required. After compression, the FASTA file is no longer needed for decompression.

1.3 Specify the output file name

gtz /data/nova.fastq.gz --ref /fasta/genomic.fna -o /out/nova.gtz

-o parameter specifies the output file name, note that the lowercase
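
To compress many files with explicit output names, the command line from 1.3 can be assembled programmatically. The sketch below uses only the `--ref` and `-o` options shown above. The naming rule (strip the input suffix, add `.gtz`, write into one output directory) is an assumption modeled on the `/out/nova.gtz` example, and `compress_command` is a hypothetical helper, not part of GTX.Zip.

```python
from pathlib import PurePosixPath

# Input suffixes to strip before appending ".gtz" (an assumption for illustration).
KNOWN_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq", ".bam")


def compress_command(source, reference, out_dir):
    """Build the argument list for: gtz <source> --ref <reference> -o <out_dir>/<stem>.gtz"""
    name = PurePosixPath(source).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    output = PurePosixPath(out_dir) / (name + ".gtz")
    return ["gtz", str(source), "--ref", str(reference), "-o", str(output)]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for fastq in ("/data/nova.fastq.gz", "/data/other.fq"):
        print(" ".join(compress_command(fastq, "/fasta/genomic.fna", "/out")))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. Note that a FASTQ file and a BAM file with the same stem would map to the same output name under this rule.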
