I002C

Telomere-to-Telomere diploid Indian Genome

We have sequenced an EBV-immortalized human male cell line from SG10k samples on different platforms. For the child sample (I002C), the data contains ~106x of PacBio HiFi, ~64x of Oxford Nanopore (ONT) Duplex, ~222x of ONT Ultralong (ULONT), ~100x of MGI WGS short reads, ~33x of Illumina WGS short reads and ~120x of Omni-C. Statistics of BioNano will be added soon.

For parental samples:

SampleType | SampleID | HiFi (REVIO) | Duplex | MGI
---|:---:|:---:|:---:|:---:
Father|I002A|60x|21x|35x
Mother|I002B|61x|20x|37x

The data statistics are provided in an Excel file.

Reads

Reads can be downloaded from SRA (PRJNA1150503). Please cite the article below if you use the dataset.

Sarashetti, P., Lipovac, J., Tomas, F. et al. Evaluating data requirements for high-quality haplotype-resolved genomes for creating robust pangenome references. Genome Biol 25, 312 (2024). https://doi.org/10.1186/s13059-024-03452-y

Assembly releases

v0.7

The latest version of the assembly, with a combined QV of 82.

Annotations

:black_small_square: Gene Annotations

| File |Description|Link|
|:----|:----|:----:|
|I002C_Maternal_v0.7_LiftOver.gff3.gz|GRCh38 GENCODE (v48) annotation for maternal haplotype|⬇️download|
|I002C_Maternal_v0.7_bambu.gtf.gz|bambu annotations for maternal haplotype|⬇️download|
|I002C_Paternal_v0.7_LiftOver.gff3.gz|GRCh38 GENCODE (v48) annotation for paternal haplotype|⬇️download|
|I002C_Paternal_v0.7_bambu.gtf.gz|bambu annotations for paternal haplotype|⬇️download|

:black_small_square: Repeat Annotations

Method: RepeatMasker (v4.1.5) using Dfam (v3.7)

| File |Description|Link|
|:----|:----|:----:|
|I002C_Maternal_v0.7.fasta.out.gz|Maternal RM annotation in default .out format|⬇️download|
|I002C_Maternal_v0.7.fasta_rm.bed.gz|Maternal RM annotation in .bed format|⬇️download|
|I002C_Paternal_v0.7.fasta.out.gz|Paternal RM annotation in default .out format|⬇️download|
|I002C_Paternal_v0.7.fasta_rm.bed.gz|Paternal RM annotation in .bed format|⬇️download|
|I002C_Maternal_v0.7_Centromere.bed|Maternal centromere annotations|⬇️download|
|I002C_Paternal_v0.7_Centromere.bed|Paternal centromere annotations|⬇️download|
|I002C_centromere_methylation_dip_coordinates.bed|Centromere methylation dip regions (Maternal and Paternal combined)|⬇️download|

:black_small_square: Coverage issues files

| File |Description|Link|
|:----|:----|:----:|
|I002C_Maternal_v0.7_Flagger_regions.bed.gz|Maternal issues track based on Flagger|⬇️download|
|I002C_Paternal_v0.7_Flagger_regions.bed.gz|Paternal issues track based on Flagger|⬇️download|
|I002C_Maternal_v0.7_coverage_issues.bed|Maternal issues track based on coverage analysis^1|⬇️download|
|I002C_Paternal_v0.7_coverage_issues.bed|Paternal issues track based on coverage analysis|⬇️download|

:black_small_square: Chain files

| File |Description|Link|
|:----|:----|:----:|
|GRCh38.p14 (GENCODE v48) <-> I002C|||
|HG38ToMat.chain.gz|hg38 to maternal haplotype|⬇️download|
|HG38ToPat.chain.gz|hg38 to paternal haplotype|⬇️download|
|HG38ToHaploid.chain.gz|hg38 to haploid haplotype|⬇️download|
|MatToHG38.chain.gz|Maternal haplotype to hg38|⬇️download|
|PatToHG38.chain.gz|Paternal haplotype to hg38|⬇️download|
|HaploidToHG38.chain.gz|Haploid haplotype to hg38|⬇️download|
|CHM13 (v2.0) <-> I002C|||
|CHM13ToMat.chain.gz|CHM13 to maternal haplotype|⬇️download|
|CHM13ToPat.chain.gz|CHM13 to paternal haplotype|⬇️download|
|CHM13ToHaploid.chain.gz|CHM13 to haploid haplotype|⬇️download|
|MatToCHM13.chain.gz|Maternal haplotype to CHM13|⬇️download|
|PatToCHM13.chain.gz|Paternal haplotype to CHM13|⬇️download|
|HaploidToCHM13.chain.gz|Haploid haplotype to CHM13|⬇️download|
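Before lifting coordinates, it can help to confirm which assembly a chain file maps from and to. The following is a minimal sketch, assuming the files follow the standard UCSC chain format, where each header line reads `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`. The function names and the sequence names in the usage note are hypothetical and not taken from this repository.

```python
import gzip
from collections import Counter
from typing import Iterable


def summarize_chain(lines: Iterable[str]) -> Counter:
    """Count chains per (target sequence, query sequence) pair."""
    pairs = Counter()
    for line in lines:
        fields = line.split()
        # Only header lines start with the literal word "chain";
        # alignment block lines hold one or three integers and are skipped.
        if len(fields) >= 12 and fields[0] == "chain":
            target_name, query_name = fields[2], fields[7]
            pairs[(target_name, query_name)] += 1
    return pairs


def summarize_chain_file(path: str) -> Counter:
    """Same summary, read from a gzip-compressed chain file."""
    with gzip.open(path, "rt") as handle:
        return summarize_chain(handle)
```

For example, `summarize_chain_file("MatToHG38.chain.gz").most_common(5)` would list the most frequent sequence pairs. In liftOver-style chains the target fields usually describe the assembly being lifted from, but this is worth checking against the sequence names in the file itself.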

:black_small_square: Methylation files

Method: HiFi data was processed using pb-cpg-tools and ONT data using modbam2bed, both with default parameters. The BED files below exclude reference positions with no methylation calls.

| File |Description|Link|
|:----|:----|:----:|
|I002C_PacBio_HiFi_REVIO_5mC.bedMethyl.gz|PacBio HiFi genome-wide methylation bed file against I002C diploid genome|⬇️download|
|I002C_ONT_R10_5mC.bedMethyl.gz|ONT R10 simplex genome-wide methylation bed file against I002C diploid genome|⬇️download|

v0.4

This assembly version results from performing two rounds of polishing, as outlined in the procedure from[^2].

The v0.4 assembly has improved QV values, with Maternal at 72.35 and Paternal at 70.97. QV values are estimated using hybrid k-mers generated from PacBio HiFi and MGI WGS data as described in[^3]. Chromosome-wise QV values are listed in an Excel file.
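To make the QV figures easier to interpret, here is a minimal sketch of the Phred-scale conversion (QV = -10 log10 of the per-base error probability) together with a Merqury-style k-mer estimate. The function names, the k-mer counts and the choice of k = 21 are illustrative assumptions. They are not the values used for this assembly.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled QV."""
    if error_rate <= 0:
        return math.inf  # no evidence of any error
    return -10 * math.log10(error_rate)


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Merqury-style QV from k-mers seen in the assembly but not in the reads."""
    shared_fraction = 1 - asm_only_kmers / total_asm_kmers
    # A base is taken as correct with probability shared_fraction ** (1 / k).
    error_rate = 1 - shared_fraction ** (1 / k)
    return error_rate_to_qv(error_rate)


if __name__ == "__main__":
    for label, qv in [("Maternal", 72.35), ("Paternal", 70.97)]:
        print(f"{label}: QV {qv} ~ {qv_to_error_rate(qv):.2e} errors per base")
    # Hypothetical counts, for illustration only.
    print(f"k-mer QV: {kmer_qv(150, 3_000_000_000, k=21):.2f}")
```

On this scale every 10 QV points correspond to a tenfold drop in error rate, so a QV of 70 means roughly one error per ten million bases.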

v0.2

This version of the assembly contains Telomere-to-Telomere chromosomes for both maternal and paternal haplotypes, as well as a mitochondrial genome. In this version of the genomes, rDNAs have not been resolved.

| |Maternal|Paternal
---|:---:|:---:
T2T Chromosomes|23|23
Size|3,022,465,370|2,934,829,127
NG50|154,891,367|146,273,588
%GC|40.82|40.79

Assembly files (zipped):

Downloading

If you wish to download the files using wget, you may use `wget -O <fileName> -U "Mozilla/5.0" --no-check-certificate <link>`
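For scripted downloads, the same request can be sketched with the Python standard library. This is only an illustration of what the wget flags do: `-U` sets the User-Agent header, `--no-check-certificate` disables TLS certificate verification, and `-O` names the output file. The helper names are hypothetical.

```python
import shutil
import ssl
import urllib.request


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Equivalent of wget's -U flag: send a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def insecure_context() -> ssl.SSLContext:
    """Equivalent of wget's --no-check-certificate: skip TLS verification."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def download(url: str, file_name: str) -> None:
    """Stream the response body to file_name (wget's -O flag)."""
    request = build_request(url)
    with urllib.request.urlopen(request, context=insecure_context()) as response:
        with open(file_name, "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling `download("<link>", "<fileName>")` with a link from the tables in this README mirrors the wget command. Because certificate checks are turned off, it is worth verifying the downloaded files by other means where possible.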

Assembly QV is calculated with the yak tool using the I002C MGI WGS dataset. Per-chromosome QV values are provided in an Excel file.

Note: The data available on this GitHub page can be accessed from either the LHG or the LBCB GitHub pages.

[^2]: Mc Cartney, Ann M., et al. "Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies." Nature Methods 19.6 (2022): 687-695.

[^3]: https://github.com/arangrhie/T2T-Polish/tree/master/merqury
