SkillAgentSearch skills...

SomaticoEP1

Exercício Somático EP1

Install / Use

/learn @renatopuga/SomaticoEP1
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Pipeline somático: Exercício 1

Tempo de Processamento

| Infraestrutura | CPU | Memória (GB) | Storage (GB) | Etapa | Tempo | | -------------- | ---- | ------------ | ------------ | ----------------- | --------------- | | gitpod | 16 | 64 | ?DD 30 | FASTQ -> SAM | 55-61min (chr9) | | | | | | SAM -> BAM | 13-18min | | | | | | BAM -> SORT_BAM | ~6-12min | | | | | | SORT_BAM -> RMDUP | ~12 min | | servidor local | 16 | 32 | SSD 30 | todas as etapas | ~2h (hg19) |

Shotgun e Amplicon

<img width="1438" alt="Screen Shot 2022-10-28 at 22 55 11" src="https://user-images.githubusercontent.com/8321336/198762626-8a788ca0-d5f9-495b-a96b-7ae807ba071b.png">

Workflow

  1. O arquivo no NCBI da amostra WP312 (tumor)

    1. Projeto SRA: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP190048&o=bases_l%3Aa

    2. ID: SRR8856724

    3. Precisa instalar o sratoolkit (dentro do gitpod)

      1. # instalar utilizando o brew install (gitpod)
        brew install sratoolkit
        
        # parallel fastq-dump
        # https://github.com/rvalieris/parallel-fastq-dump
        pip install parallel-fastq-dump
        
        # caso o vdb-config não funcione
        wget -c https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.0.0/sratoolkit.3.0.0-ubuntu64.tar.gz
        tar -zxvf sratoolkit.3.0.0-ubuntu64.tar.gz
        export PATH=$PATH://workspace/somaticoEP1/sratoolkit.3.0.0-ubuntu64/bin/
        echo "Aexyo" | sratoolkit.3.0.0-ubuntu64/bin/vdb-config
        
        
      2. # fastq-dump: comando que faz o download do arquivo utilizando o SRR ID da amostra
        # --sra-id: SRR 
        # --threads 4: paraleliza em 4 partes o download
        # --gzip: fastq compactados
        # --split-files: ele vai separar os arquivos fastq em 1 e 2 (paired)
        time parallel-fastq-dump --sra-id SRR8856724 --threads 4 --outdir ./ --split-files --gzip
        
      3. Google Colab sratoolkit (modo alternativo)

# download do binário ubuntu 64bits
%%bash
wget -c https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.0.0/sratoolkit.3.0.0-ubuntu64.tar.gz
          
# -z unzip
# -x extract
# -v verbose
# -f force?
tar -zxvf sratoolkit.3.0.0-ubuntu64.tar.gz
          
# validate
./sratoolkit.3.0.0-ubuntu64/bin/vdb-config
          
# adicionando o diretorio dos programas no path
export PATH=$PATH:/content/sratoolkit.3.0.0-ubuntu64/bin/
	  
# downlaod parallel
parallel-fastq-dump --sra-id SRR8856724 --threads 6 --outdir ./ --split-files --gzip --tmpdir /content/
  1. AS Referências do Genoma hg38 (FASTA, VCFs)

    1. Os arquivos de Referência: Panel of Normal (PoN), Gnomad AF e Exac common:

      1. https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38?project=broad-dsde-outreach
      wget -c https://storage.googleapis.com/gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz
      
      wget -c https://storage.googleapis.com/gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz.tbi
      
      wget -c https://storage.googleapis.com/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz
      
      wget -c https://storage.googleapis.com/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz.tbi
      
      wget -c https://storage.googleapis.com/gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz
      
      wget -c https://storage.googleapis.com/gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz.tbi
      
    2. Arquivo no formato FASTA do genoma humano hg38

      1. Diretório Download UCSC hg38: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

        1. chr9.fa.gz: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr9.fa.gz

          wget -c https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr9.fa.gz
          
  2. BWA para mapeamento dos arquivos FASTQ (gitpod)

    1. Instalando o BWA no gitpod
    brew install bwa 
    
    1. Google Colab

    Abri um CODE

    !sudo apt-get install bwa
    
  3. BWA index do arquivo chr9.fa.gz

    # descompacta o arquivo gz
    gunzip chr9.fa.gz
    
    # bwa index para indexar o arquivo .fa (~5min)
    bwa index chr9.fa
    
  4. Samtools faidx

    1. Install Gitpod
    brew install samtools 
    
    1. Install Google Colab
    !sudo apt-get install samtools
    
    1. Samtools faidx chr9.fa
    samtools faidx chr9.fa
    
  5. BWA-mem para fazer o alinhamento (FASTQ -> BAM)

NOME=WP312; Biblioteca=Nextera; Plataforma=illumina;

bwa mem -t 16 -M -R "@RG\tID:$NOME\tSM:$NOME\tLB:$Biblioteca\tPL:$Plataforma" \
chr9.fa \
SRR8856724_1.fastq.gz \
SRR8856724_2.fastq.gz > WP312.sam

samtools: fixmate, sort e index (~min)

# -@ numero de cores utilizados
time samtools fixmate -@10 WP312.sam WP312.bam
# ordenando o arquivo fixmate
time samtools sort -O bam -@6 -m2G -o WP312_sorted.bam WP312.bam
# indexando o arquivo BAM ordenado (sort)
time samtools index WP312_sorted.bam
# abordagem de target sequencing utilizamos o rmdup para remover duplicata de PCR
time samtools rmdup WP312_sorted.bam WP312_sorted_rmdup_F4.bam
# indexando o arquivo BAM rmdup
time samtools index WP312_sorted_rmdup_F4.bam 

Alternativa: combinar com pipes: bwa + samtools view e sort

bwa mem -t 10 -M -R "@RG\tID:$NOME\tSM:$NOME\tLB:$Biblioteca\tPL:$Plataforma" chr9.fa SRR8856724_1.fastq.gz SRR8856724_2.fastq.gz | samtools view -F4 -Sbu -@2 - | samtools sort -m4G -@2 -o WP312_sorted.bam

NOTA: se utilizar a opção alternativa, não esquecer de rodar o samtools para as etapas: rmdup e index (do rmdup).

  1. Cobertura - make BED files

Instalação do bedtools

Gitpod

brew install bedtools

Google Colab

!sudo apt-get install bedtools

Gerando BED do arquivo BAM

bedtools bamtobed -i WP312_sorted_rmdup_F4.bam > WP312_sorted_rmdup.bed
bedtools merge -i WP312_sorted_rmdup.bed > WP312_sorted_rmdup_merged.bed
bedtools sort -i WP312_sorted_rmdup_merged.bed > WP312_sorted_rmdup_merged_sorted.bed

Cobertura Média

Pegando arquivo view -F4 do BAM WP312.

git clone https://github.com/circulosmeos/gdown.pl.git
./gdown.pl/gdown.pl  https://drive.google.com/file/d/1pTMpZ2eIboPHpiLf22gFIQbXU2Ow26_E/view?usp=drive_link WP312_sorted_rmdup_F4.bam
./gdown.pl/gdown.pl  https://drive.google.com/file/d/10utrBVW-cyoFPt5g95z1gQYQYTfXM4S7/view?usp=drive_link WP312_sorted_rmdup_F4.bam.bai
bedtools coverage -a WP312_sorted_rmdup_merged_sorted.bed \
-b WP312_sorted_rmdup_F4.bam -mean \
> WP312_coverageBed.bed

Filtro por total de reads >=20

cat WP312_coverageBed.bed | \
awk -F "\t" '{if($4>=20){print}}' \
> WP312_coverageBed20x.bed
  1. GATK4 instalação (MuTect2 com PoN)

Download

wget -c https://github.com/broadinstitute/gatk/releases/download/4.2.2.0/gatk-4.2.2.0.zip

Descompactar

unzip gatk-4.2.2.0.zip 

Gerar arquivo .dict

./gatk-4.2.2.0/gatk CreateSequenceDictionary -R chr9.fa -O chr9.dict

Gerar interval_list do chr9

./gatk-4.2.2.0/gatk ScatterIntervalsByNs -R chr9.fa -O chr9.interval_list -OT ACGT

Converter Bed para Interval_list

./gatk-4.2.2.0/gatk BedToIntervalList -I WP312_coverageBed20x.bed \
-O WP312_coverageBed20x.interval_list -SD chr9.dict

GATK4 - CalculateContamination

./gatk-4.2.2.0/gatk GetPileupSummaries \
	-I WP312_sorted_rmdup_F4.bam  \
	-V af-only-gnomad.hg38.vcf.gz \
	-L WP312_coverageBed20x.interval_list \
	-O WP312.table
./gatk-4.2.2.0/gatk CalculateContamination \
-I WP312.table \
-O WP312.contamination.table

GATK4 - MuTect2 Call

./gatk-4.2.2.0/gatk Mutect2 \
  -R chr9.fa \
  -I WP312_sorted_rmdup_F4.bam \
  --germline-resource af-only-gnomad.hg38.vcf.gz  \
  --panel-of-normals 1000g_pon.hg38.vcf.gz \
  -L WP312_coverageBed20x.interval_list \
  -O WP312.somatic.pon.vcf.gz

GATK4 - MuTect2 FilterMutectCalls

./gatk-4.2.2.0/gatk FilterMutectCalls \
	-R chr9.fa \
	-V WP312.somatic.pon.vcf.gz \
	--contamination-table WP312.contamination.table \
	-O WP312.filtered.pon.vcf.gz

somaticoEP1 - LiftOverVCF

Download dos arquivos VCFs da versão hg19 da análise antiga do Projeto LMA Brasil:

https://drive.google.com/drive/folders/1m2qmd0ca2Nwb7qcK58ER0zC8-1_9uAiE?usp=sharing

WP312.filtered.vcf.gz
WP312.filtered.vcf.gz.tbi

Download liftOver

  • MacOS: https://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/liftOver
wget -c https://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/liftOver
  • Linux x86_64: https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver
wget -c https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver

Alterar a permissão de execução do arquivo liftOver

chmod +x liftOver

Para testar execute:

./liftOver
liftOver - Move annotations from one assembly to another
usage:
   liftOver oldFile map.chain newFile unMapped
oldFile and newFile are in bed format by default, but can be in GFF and
maybe eventually others with the appropriate flags below.
The map.chain file has the old genome as the target and the new genome
as the query.

***********************************************************************
WARNING: liftOver was
View on GitHub
GitHub Stars6
CategoryDevelopment
Updated1y ago
Forks6

Languages

Shell

Security Score

55/100

Audited on Jan 18, 2025

No findings