SigProfilerTopography
SigProfilerTopography allows evaluating the effect of chromatin organization, histone modifications, transcription factor binding, DNA replication, and DNA transcription on the activities of different mutational processes. SigProfilerTopography elucidates the unique topographical characteristics of mutational signatures. The tool seamlessly integrates with other SigProfiler tools including SigProfilerMatrixGenerator, SigProfilerSimulator, and SigProfilerAssignment. Detailed documentation can be found at: https://sigprofilersuite.github.io/SigProfilerTopography/
SigProfilerTopography provides topography analyses for mutations such as
- Single Base Substitutions (SBS)
- Doublet Base Substitutions (DBS)
- Small insertions and deletions, indels (ID)
and carries out the following analyses:
- Epigenomics Occupancy (e.g.: Histone Modifications, Transcription Factors, Open Chromatin Regions)
- Nucleosome Occupancy
- Replication Timing
- Replication Strand Asymmetry
- Transcription Strand Asymmetry
- Genic versus Intergenic Regions
- Strand-coordinated Mutagenesis
PREREQUISITES
The framework is written in PYTHON; however, it also requires the following software with the given versions (or newer):
- PYTHON version 3.8 or newer
- WGET version 1.9 or newer, or RSYNC if you are behind a firewall
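The prerequisites can be confirmed before installing with a small pre-flight check. The sketch below is hypothetical: `check_prerequisites` is not part of SigProfilerTopography. It only uses the standard library to compare the running Python version against 3.8 and to look for `wget` or `rsync` on the `PATH`. It does not check the `wget` version.

```python
import shutil
import sys


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper (not shipped with SigProfilerTopography).
    `version_info` and `which` are injectable so the check can be tested.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # The README asks for Python 3.8 or newer.
    if tuple(version_info[:2]) < (3, 8):
        problems.append(
            "Python 3.8+ required, found %d.%d" % (version_info[0], version_info[1])
        )
    # Either wget or rsync (behind a firewall) must be available for downloads.
    # Only presence is checked, not the wget version.
    if which("wget") is None and which("rsync") is None:
        problems.append("neither wget nor rsync was found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```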
QUICK START GUIDE
This section will guide you through the minimum steps required to run SigProfilerTopography:
- For the most recent stable PyPI version of this tool, install the Python package using pip:
  ```
  $ pip install SigProfilerTopography
  ```
  If you have installed SigProfilerTopography before, upgrade using pip:
  ```
  $ pip install SigProfilerTopography --upgrade
  ```
- Import the example data provided by SigProfilerTopography. This data can be used to run the example program and to ensure that the environment is set up.
  ```
  >>> from SigProfilerTopography import Topography as topography
  >>> topography.install_example_data()
  ```
  This imports `21BRCA.zip` into the current working directory. Once `21BRCA.zip` has been downloaded, unzip the file. The unzipped `21BRCA` folder contains two folders: `21BRCA_vcfs` and `21BRCA_probabilities`. The folder `21BRCA_vcfs` contains 21 VCF files (one per breast cancer sample) in GRCh37, and `21BRCA_probabilities` contains probability matrix files for single base substitutions and doublet base substitutions.
- Install your desired reference genome from the command line/terminal as follows (available reference genomes are GRCh37, GRCh38, mm9, and mm10):
  ```
  $ python
  >>> from SigProfilerMatrixGenerator import install as genInstall
  >>> genInstall.install('GRCh37')
  ```
  This will install the human 37 assembly as a reference genome.
- Import the nucleosome library file required for nucleosome occupancy analyses, passing the genome you would like to import:
  ```
  >>> from SigProfilerTopography import Topography as topography
  >>> topography.install_nucleosome("GRCh37")
  ```
  By default, `install_nucleosome` imports nucleosome data of the K562 cell line for the GRCh37 and GRCh38 genome assemblies.
- Import the open chromatin library file required for epigenomics analyses, passing the genome you would like to import:
  ```
  >>> from SigProfilerTopography import Topography as topography
  >>> topography.install_atac_seq("GRCh37")
  ```
  By default, `install_atac_seq` imports open chromatin data of breast epithelium tissue for GRCh37 and left lung tissue for GRCh38.
- Import the replication timing library file required for replication timing analyses, passing the genome you would like to import:
  ```
  >>> from SigProfilerTopography import Topography as topography
  >>> topography.install_repli_seq("GRCh37")
  ```
  By default, `install_repli_seq` imports replication timing data of MCF7 and IMR90 for GRCh37 and GRCh38, respectively.
- Conduct topography analyses for your samples. Here is an example of a call to `runAnalyses` that generates all of the different analyses:
  ```
  >>> from SigProfilerTopography import Topography as topography
  >>> genome = "GRCh37"
  >>> inputDir = "path/to/21BRCA_vcfs"
  >>> outputDir = "path/to/results"
  >>> jobname = "21BRCA_SPT"
  >>> numofSimulations = 5
  >>> if __name__ == "__main__":
  ...     topography.runAnalyses(genome, inputDir, outputDir, jobname, numofSimulations,
  ...                            epigenomics=True, nucleosome=True, replication_time=True,
  ...                            strand_bias=True, processivity=True)
  ```
  If probability files are not provided, SigProfilerTopography uses SigProfilerAssignment by default to attribute the activities of known reference mutational signatures from the Catalogue Of Somatic Mutations In Cancer (COSMIC) database to each examined sample.
- Here is an example of a call to `runAnalyses` with probability files, using the 21 VCF files in the subfolder `21BRCA_vcfs` as input and the probability files in the subfolder `21BRCA_probabilities`:
  ```
  >>> from SigProfilerTopography import Topography as topography
  >>> genome = "GRCh37"
  >>> inputDir = "path/to/21BRCA_vcfs"
  >>> outputDir = "path/to/results"
  >>> jobname = "21BRCA_SPT_with_probability_matrices"
  >>> numofSimulations = 5
  >>> sbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_SBS96_Decomposed_Mutation_Probabilities.txt"
  >>> dbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_DBS78_Decomposed_Mutation_Probabilities.txt"
  >>> if __name__ == "__main__":
  ...     topography.runAnalyses(genome, inputDir, outputDir, jobname, numofSimulations,
  ...                            sbs_probabilities=sbs_probability_file,
  ...                            dbs_probabilities=dbs_probability_file,
  ...                            epigenomics=True, nucleosome=True, replication_time=True,
  ...                            strand_bias=True, processivity=True)
  ```
  SigProfilerTopography uses probability matrix files containing the probability of each signature causing a specific mutation type in a cancer sample.
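The setup steps above can be bundled into a few helpers. The sketch below is hypothetical: `unzip_example_data`, `check_inputs`, and `install_reference_data` are not part of SigProfilerTopography. It assumes the archive unpacks into a top-level `21BRCA` folder, and it only looks for `.vcf` files as in the 21BRCA example, even though other SigProfilerMatrixGenerator input formats are accepted. The installer helper takes the imported modules as arguments so its calls can be exercised without downloading anything.

```python
import os
import zipfile


def unzip_example_data(zip_path="21BRCA.zip", dest_dir="."):
    """Extract the example archive and return the path of the 21BRCA folder.

    Assumption: the archive's top-level folder is named 21BRCA.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return os.path.join(dest_dir, "21BRCA")


def check_inputs(input_dir, probability_files=()):
    """Return a list of problems to fix before calling runAnalyses."""
    if not os.path.isdir(input_dir):
        return ["input directory not found: " + input_dir]
    problems = []
    # Only .vcf files are checked here, matching the 21BRCA example.
    vcfs = [name for name in os.listdir(input_dir) if name.lower().endswith(".vcf")]
    if not vcfs:
        problems.append("no .vcf files in " + input_dir)
    for path in probability_files:
        if not os.path.isfile(path):
            problems.append("probability file not found: " + path)
    return problems


def install_reference_data(genome, topography, gen_install):
    """Run the four install steps from the quick start guide for one genome.

    `topography` and `gen_install` are the imported modules, passed in
    so the helper can be tested with stand-in objects.
    """
    gen_install.install(genome)            # reference genome (SigProfilerMatrixGenerator)
    topography.install_nucleosome(genome)  # nucleosome occupancy library
    topography.install_atac_seq(genome)    # open chromatin library
    topography.install_repli_seq(genome)   # replication timing library
```

For example, after `from SigProfilerTopography import Topography as topography` and `from SigProfilerMatrixGenerator import install as genInstall`, calling `install_reference_data("GRCh37", topography, genInstall)` runs the install steps for GRCh37, and `check_inputs("path/to/21BRCA_vcfs", [sbs_probability_file, dbs_probability_file])` should return an empty list before `runAnalyses` is started.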
View the table below for the full list of runAnalyses parameters.
PARAMETERS
| Category | Parameter | Variable Type | Parameter Description |
| ------ | ----------- | ----------- | ----------- |
| Required | | | |
| | genome | String | The reference genome used for the topography analyses. Accepted values include: {"GRCh37", "GRCh38", "mm10"}. |
| | inputDir | String | The path to the directory containing the input files. SigProfilerTopography accepts all input files that SigProfilerMatrixGenerator can process. |
| | outputDir | String | The path of the directory where the output will be saved. If this directory doesn't exist, a new one will be created. |
| | jobname | String | The name of the directory containing all of the outputs under outputDir/jobname. If this directory doesn't exist, a new one will be created. |
| | numofSimulations | Integer | The number of simulations to be created. |
| Optional | | | |
| | epigenomics | Boolean | Generate epigenomics analysis when True. By default, this is set to False. |
| | nucleosome | Boolean | Generate nucleosome occupancy analysis when True. By default, this is set to False. |
| | replication_time | Boolean | Generate replication timing analysis when True. By default, this is set to False. |
| | strand_bias | Boolean | Generate replication and transcription strand asymmetry analysis when True. By default, this is set to False. |
| | replication_strand_bias | Boolean | Generate replication strand asymmetry analysis when True. By default, this is set to False. |
| | transcription_strand_bias | Boolean | Generate transcription strand asymmetry analysis (including genic versus intergenic regions) when True. By default, this is set to False. |
| | processivity | Boolean | Generate strand-coordinated mutagenesis when True. By default, this is set to False. |
| | epigenomics_files | List of Strings | Python list of paths for each epigenomics library file utilized in the epigenomics analysis. By default, epigenomics files of open chromatin, CTCF and histone modifications obtained from "breast_epithelium" and "lung" tissue are utilized for GRCh37 and GRCh38, respectively. |
| | epigenomics_dna_elements | List of Strings | Python list of unique DNA element names for the epigenomics files utilized in the epigenomics analysis. Each DNA element name must be contained in at least one epigenomics library filename. E.g., DNA element is 'CTCF' for the epigenomics file of 'ENCFF782GCQ_breast_epithelium_Normal_CTCF-human.bed'. By default, DNA elements of ['H3K27me3', 'H3K36me3', 'H3K9me3', 'H3K27ac', 'H3K4me1', 'H3K4me3', 'C
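
The optional parameters in the table are keyword arguments of `runAnalyses`. As a sketch, the hypothetical helpers below (not part of SigProfilerTopography) validate `genome` against the accepted values, switch on only the requested analyses, and check the rule that each DNA element name is contained in at least one epigenomics library filename. The `ANALYSIS_FLAGS` keys are our own labels for the analyses listed at the top of this README; the sketch uses the separate replication and transcription strand flags rather than the combined `strand_bias`.

```python
import os

# Hypothetical labels for the analyses listed in this README, mapped to the
# runAnalyses flags documented in the parameter table.
ANALYSIS_FLAGS = {
    "epigenomics": "epigenomics",
    "nucleosome": "nucleosome",
    "replication_timing": "replication_time",
    "replication_strand_asymmetry": "replication_strand_bias",
    # Also covers genic versus intergenic regions.
    "transcription_strand_asymmetry": "transcription_strand_bias",
    "strand_coordinated_mutagenesis": "processivity",
}

ACCEPTED_GENOMES = {"GRCh37", "GRCh38", "mm10"}


def build_run_kwargs(genome, num_simulations, analyses):
    """Validate required arguments and return the optional analysis flags."""
    if genome not in ACCEPTED_GENOMES:
        raise ValueError("unsupported genome: %r" % genome)
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(num_simulations, int) or isinstance(num_simulations, bool):
        raise TypeError("numofSimulations must be an integer")
    # Every analysis flag defaults to False, as stated in the table.
    kwargs = {flag: False for flag in ANALYSIS_FLAGS.values()}
    for analysis in analyses:
        kwargs[ANALYSIS_FLAGS[analysis]] = True  # KeyError for unknown labels
    return kwargs


def unmatched_dna_elements(dna_elements, epigenomics_files):
    """Return DNA element names that appear in no epigenomics filename."""
    basenames = [os.path.basename(path) for path in epigenomics_files]
    return [element for element in dna_elements
            if not any(element in name for name in basenames)]
```

The returned dictionary can then be passed as `topography.runAnalyses(genome, inputDir, outputDir, jobname, numofSimulations, **kwargs)`. With the file `ENCFF782GCQ_breast_epithelium_Normal_CTCF-human.bed` from the table, `unmatched_dna_elements(["CTCF"], [...])` returns an empty list, because `CTCF` is contained in the filename.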
