SigProfilerTopography

SigProfilerTopography allows evaluating the effect of chromatin organization, histone modifications, transcription factor binding, DNA replication, and DNA transcription on the activities of different mutational processes. SigProfilerTopography elucidates the unique topographical characteristics of mutational signatures. The tool seamlessly integrates with other SigProfiler tools including SigProfilerMatrixGenerator, SigProfilerSimulator, and SigProfilerAssignment. Detailed documentation can be found at: https://sigprofilersuite.github.io/SigProfilerTopography/

SigProfilerTopography provides topography analyses for mutations such as

  • Single Base Substitutions (SBS)
  • Doublet Base Substitutions (DBS)
  • Small insertions and deletions (indels, ID)

and carries out the following analyses:

  • Epigenomics Occupancy (e.g., histone modifications, transcription factors, open chromatin regions)
  • Nucleosome Occupancy
  • Replication Timing
  • Replication Strand Asymmetry
  • Transcription Strand Asymmetry
  • Genic versus Intergenic Regions
  • Strand-coordinated Mutagenesis

PREREQUISITES

The framework is written in PYTHON; however, it also requires the following software at the given versions (or newer):

  • PYTHON version 3.8 or newer
  • WGET version 1.9 or newer, or RSYNC if you are behind a firewall
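
A quick way to confirm these prerequisites before installing is sketched below. The helper name `check_prerequisites` is hypothetical and is not part of SigProfilerTopography. It uses only the Python standard library to compare the running interpreter against version 3.8 and to look for `wget` or `rsync` on the `PATH`. It does not verify the WGET version number.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), tools=("wget", "rsync")):
    """Return a dict describing whether the documented prerequisites are met.

    Hypothetical helper for illustration; not part of SigProfilerTopography.
    """
    # shutil.which returns None when the executable is not on the PATH.
    found = {tool: shutil.which(tool) is not None for tool in tools}
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "tools_found": found,
        # The prerequisites ask for WGET or RSYNC, so one of them is enough.
        "downloader_ok": any(found.values()),
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If `python_ok` or `downloader_ok` comes back `False`, it is probably worth resolving that before running the installation steps.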

QUICK START GUIDE

This section will guide you through the minimum steps required to run SigProfilerTopography:

  1. For the most recent stable PyPI version of this tool, install the Python package using pip:
    $ pip install SigProfilerTopography
    
    If you have installed SigProfilerTopography before, upgrade using pip:
    $ pip install SigProfilerTopography --upgrade
    
  1. Import the example data provided by SigProfilerTopography. This data can be used to run the example program and to confirm that the environment is set up correctly.

    >>> from SigProfilerTopography import Topography as topography
    >>> topography.install_example_data()
    

    This downloads 21BRCA.zip into the current working directory. Once 21BRCA.zip has been downloaded, unzip the file. The unzipped 21BRCA folder contains two folders: 21BRCA_vcfs and 21BRCA_probabilities. The folder 21BRCA_vcfs contains 21 VCF files in GRCh37 (one per breast cancer sample), and 21BRCA_probabilities contains probability matrix files for single base substitutions and doublet base substitutions.

  2. Install your desired reference genome from the command line/terminal as follows (available reference genomes are: GRCh37, GRCh38, mm9, and mm10):

    $ python
    >>> from SigProfilerMatrixGenerator import install as genInstall
    >>> genInstall.install('GRCh37')
    

    This installs the human GRCh37 assembly as a reference genome.

  3. Import the nucleosome library file required for nucleosome occupancy analyses, passing the genome that you would like to import:

    >>> from SigProfilerTopography import Topography as topography
    >>> topography.install_nucleosome("GRCh37")
    

    By default, install_nucleosome imports nucleosome data of the K562 cell line for the GRCh37 and GRCh38 genome assemblies.

  4. Import the open chromatin library file required for epigenomics analyses, passing the genome that you would like to import:

    >>> from SigProfilerTopography import Topography as topography
    >>> topography.install_atac_seq("GRCh37")
    

    By default, install_atac_seq imports open chromatin data of breast epithelium tissue for GRCh37 and left lung tissue for GRCh38.

  5. Import the replication timing library file required for replication timing analyses, passing the genome that you would like to import:

    >>> from SigProfilerTopography import Topography as topography
    >>> topography.install_repli_seq("GRCh37")
    

    By default, install_repli_seq imports replication timing data of the MCF7 and IMR90 cell lines for GRCh37 and GRCh38, respectively.

  6. Run topography analyses for your samples. Here is an example of a call to runAnalyses that generates all of the different analyses.

    >>> from SigProfilerTopography import Topography as topography
    
    >>> genome = "GRCh37"
    >>> inputDir = "path/to/21BRCA_vcfs"
    >>> outputDir = "path/to/results"
    >>> jobname = "21BRCA_SPT"
    >>> numofSimulations = 5
    
    >>> if __name__ == "__main__":
    ...     topography.runAnalyses(genome,
    ...                            inputDir,
    ...                            outputDir,
    ...                            jobname,
    ...                            numofSimulations,
    ...                            epigenomics=True,
    ...                            nucleosome=True,
    ...                            replication_time=True,
    ...                            strand_bias=True,
    ...                            processivity=True)
    

    If probability files are not provided, SigProfilerTopography utilizes SigProfilerAssignment by default to attribute the activities of known reference mutational signatures from the Catalogue Of Somatic Mutations In Cancer (COSMIC) database to each examined sample.

  7. Here is an example of a call to runAnalyses with probability files. It uses the 21 VCF files in the subfolder 21BRCA_vcfs as input and the probability files in the subfolder 21BRCA_probabilities.

    >>> from SigProfilerTopography import Topography as topography
    
    >>> genome = "GRCh37"
    >>> inputDir = "path/to/21BRCA_vcfs"
    >>> outputDir = "path/to/results"
    >>> jobname = "21BRCA_SPT_with_probability_matrices"
    >>> numofSimulations = 5
    >>> sbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_SBS96_Decomposed_Mutation_Probabilities.txt"
    >>> dbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_DBS78_Decomposed_Mutation_Probabilities.txt"
    
    >>> if __name__ == "__main__":
    ...     topography.runAnalyses(genome,
    ...                            inputDir,
    ...                            outputDir,
    ...                            jobname,
    ...                            numofSimulations,
    ...                            sbs_probabilities=sbs_probability_file,
    ...                            dbs_probabilities=dbs_probability_file,
    ...                            epigenomics=True,
    ...                            nucleosome=True,
    ...                            replication_time=True,
    ...                            strand_bias=True,
    ...                            processivity=True)
    

    SigProfilerTopography utilizes probability matrix files containing the probability of each signature to cause a specific mutation type in a cancer sample.
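
The steps above assume that 21BRCA.zip has been unzipped and that the paths passed to runAnalyses exist. A minimal sketch of how those two preparations could be scripted with the Python standard library follows. The function names `extract_example_data` and `preflight` are hypothetical and are not part of SigProfilerTopography, and the checks only look at the file system (they do not validate file contents).

```python
import zipfile
from pathlib import Path


def extract_example_data(zip_path, dest_dir="."):
    """Unzip an archive and return the top-level entries it contains.

    Hypothetical helper for illustration; not part of SigProfilerTopography.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest)
    # Zip member names always use "/" as the separator.
    top_level = sorted({name.split("/")[0] for name in names if name.strip("/")})
    return [dest / name for name in top_level]


def preflight(input_dir, probability_files=()):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    input_path = Path(input_dir)
    if not input_path.is_dir():
        problems.append(f"input directory not found: {input_path}")
    elif not any(entry.is_file() for entry in input_path.iterdir()):
        problems.append(f"no files in: {input_path}")
    for prob_file in probability_files:
        if not Path(prob_file).is_file():
            problems.append(f"probability file not found: {prob_file}")
    return problems


if __name__ == "__main__":
    # Placeholder paths, matching the examples above.
    issues = preflight(
        "path/to/21BRCA_vcfs",
        ["path/to/21BRCA_probabilities/COSMIC_SBS96_Decomposed_Mutation_Probabilities.txt"],
    )
    print(issues or "inputs look ready")
```

Running a check like this before a long analysis may save time, since a mistyped path is reported immediately rather than partway through the run.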

View the table below for the full list of runAnalyses parameters.

PARAMETERS

| Category | Parameter | Variable Type | Parameter Description |
| ------ | ----------- | ----------- | ----------- |
| Required | | | |
| | genome | String | The reference genome used for the topography analyses. Accepted values include: {"GRCh37", "GRCh38", "mm10"}. |
| | inputDir | String | The path to the directory containing the input files. SigProfilerTopography accepts all input files that SigProfilerMatrixGenerator can process. |
| | outputDir | String | The path of the directory where the output will be saved. If this directory doesn't exist, a new one will be created. |
| | jobname | String | The name of the directory containing all of the outputs under outputDir/jobname. If this directory doesn't exist, a new one will be created. |
| | numofSimulations | Integer | The number of simulations to be created. |
| Optional | | | |
| | epigenomics | Boolean | Generate epigenomics analysis when True. By default, this is set to False. |
| | nucleosome | Boolean | Generate nucleosome occupancy analysis when True. By default, this is set to False. |
| | replication_time | Boolean | Generate replication timing analysis when True. By default, this is set to False. |
| | strand_bias | Boolean | Generate replication and transcription strand asymmetry analysis when True. By default, this is set to False. |
| | replication_strand_bias | Boolean | Generate replication strand asymmetry analysis when True. By default, this is set to False. |
| | transcription_strand_bias | Boolean | Generate transcription strand asymmetry analysis (including genic versus intergenic regions) when True. By default, this is set to False. |
| | processivity | Boolean | Generate strand-coordinated mutagenesis when True. By default, this is set to False. |
| | epigenomics_files | List of Strings | Python list of paths for each epigenomics library file utilized in the epigenomics analysis. By default, epigenomics files of open chromatin, CTCF and histone modifications attained from "breast_epithelium" and "lung" tissue are utilized for GRCh37 and GRCh38, respectively. |
| | epigenomics_dna_elements | List of Strings | Python list of unique DNA element names for the epigenomics files utilized in the epigenomics analysis. Each DNA element name must be contained in at least one epigenomics library filename. E.g., DNA element is 'CTCF' for the epigenomics file of 'ENCFF782GCQ_breast_epithelium_Normal_CTCF-human.bed'. By default, DNA elements of ['H3K27me3', 'H3K36me3', 'H3K9me3', 'H3K27ac', 'H3K4me1', 'H3K4me3', 'C
