DiMA
A command-line tool that analyses the diversity and motifs of protein/nucleotide sequences.
DiMA - Diversity Motif Analyser
Table of Contents
- What is DiMA?
- Publications
- Installation
- Basic Usage
- Advanced Usage
- Command-Line Arguments
- Module Parameters
What is DiMA?
Protein sequence diversity is one of the major challenges in the design of diagnostic, prophylactic and therapeutic interventions against viruses. DiMA is a tool designed to facilitate the dissection of protein sequence diversity dynamics for viruses. It quantifies sequence diversity with Shannon’s entropy, computed over a sliding window of user-defined k-mer length across the alignment. The entropy value is then corrected for sample size bias by applying a statistical adjustment. DiMA further interrogates the diversity by dissecting the entropy value at each k-mer position into diversity motifs. The distinct k-mer sequences at each position are classified into the following motifs based on their incidence.
- Index: The predominant sequence.
- Major: The sequence with the second highest incidence after the Index.
- Minor: k-mers with an incidence between those of the Major and Unique motifs.
- Unique: k-mers that are seen only once at a particular k-mer position.
Moreover, the description line of each sequence in the alignment can be formatted to include metadata, which is then tagged to the diversity motifs. DiMA enables comparative analysis of diversity dynamics within and between the proteins of a virus species, and between the proteomes of different viral species.
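To make the entropy and motif definitions above concrete, here is a minimal sketch in plain Python. It is not DiMA's implementation: the function names (`kmers_at`, `shannon_entropy`, `classify_motifs`) are hypothetical, the sample-size correction is left out, and the handling of ties is an assumption. The two input sequences are taken from the first k-mer positions of the example results in this README.

```python
import math
from collections import Counter

def kmers_at(sequences, start, k):
    """Return the k-mer starting at 0-based column `start` of each aligned sequence."""
    return [seq[start:start + k] for seq in sequences if len(seq) >= start + k]

def shannon_entropy(counts):
    """Shannon entropy in bits of a list of counts (no sample-size correction)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def classify_motifs(kmers):
    """Label each distinct k-mer at one position by its incidence.

    Assumption: ties are broken by first occurrence, and a k-mer seen once
    is labelled Unique even if it has the second-highest incidence.
    """
    ranked = Counter(kmers).most_common()  # highest count first
    labels = {}
    major_assigned = False
    for rank, (kmer, count) in enumerate(ranked):
        if rank == 0:
            labels[kmer] = "Index"
        elif count == 1:
            labels[kmer] = "Unique"
        elif not major_assigned:
            labels[kmer] = "Major"
            major_assigned = True
        else:
            labels[kmer] = "Minor"
    return labels

# Five aligned sequences: four identical, one different.
sequences = ["MSASKEIKSF"] * 4 + ["SAGVYMGNLS"]
window = kmers_at(sequences, start=0, k=9)
print(shannon_entropy(list(Counter(window).values())))
print(classify_motifs(window))
```

For this input the entropy is approximately 0.722 bits, and the k-mer seen four times is labelled Index while the k-mer seen once is labelled Unique, which agrees with position 1 of the example results.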
Publications
- https://arxiv.org/abs/2205.13915
Installation
pip install dima-cli
Basic Usage
Shell Command
dima-cli -i aligned_sequences.afa -o results.json
Python
from dima import Dima
results = Dima(sequences="aligned_sequences.afa").run()
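Once the shell command has written `results.json`, the file can be explored with the standard library alone. The sketch below assumes the output has the field names shown in the example results in this README (`results`, `position`, `entropy`, `diversity_motifs`, `motif_long`, `sequence`); the helper names are hypothetical and are not part of the `dima` package.

```python
import json

def load_report(path):
    """Read a DiMA JSON results file into a dictionary."""
    with open(path) as fh:
        return json.load(fh)

def top_entropy_positions(report, n=3):
    """Return the n (position, entropy) pairs with the highest entropy."""
    ranked = sorted(report["results"], key=lambda r: r["entropy"], reverse=True)
    return [(r["position"], r["entropy"]) for r in ranked[:n]]

def index_sequences(report):
    """Map each k-mer position to its Index sequence, or None if it has none."""
    mapping = {}
    for r in report["results"]:
        index = [m["sequence"] for m in r["diversity_motifs"]
                 if m["motif_long"] == "Index"]
        mapping[r["position"]] = index[0] if index else None
    return mapping
```

A typical call would be `top_entropy_positions(load_report("results.json"))`, which lists the most diverse k-mer positions first.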
Results
<details> <summary>Click to view basic results</summary>
{
"sequence_count": 5,
"support_threshold": 30,
"low_support_count": 20,
"query_name": "Unknown Query",
"kmer_length": 9,
"average_entropy": 0.06854034285524647,
"highest_entropy": {
"position": 186,
"entropy": 1.3921472236645345
},
"results": [
{
"position": 1,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "MSASKEIKS",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
},
{
"sequence": "SAGVYMGNL",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
}
]
},
{
"position": 2,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "AGVYMGNLS",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
},
{
"sequence": "SASKEIKSF",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
}
]
},
{
"position": 3,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "GVYMGNLSS",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
},
{
"sequence": "ASKEIKSFL",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
}
]
},
{
"position": 4,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "VYMGNLSSQ",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
},
{
"sequence": "SKEIKSFLW",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
}
]
},
{
"position": 5,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "KEIKSFLWT",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
},
{
"sequence": "YMGNLSSQQ",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
}
]
},
{
"position": 6,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "MGNLSSQQL",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
},
{
"sequence": "EIKSFLWTQ",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
}
]
},
{
"position": 7,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "IKSFLWTQS",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
},
{
"sequence": "GNLSSQQLD",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
}
]
},
{
"position": 8,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "KSFLWTQSL",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
},
{
"sequence": "NLSSQQLDQ",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
}
]
},
{
"position": 9,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "SFLWTQSLR",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
},
{
"sequence": "LSSQQLDQR",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
}
]
},
{
"position": 10,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "SSQQLDQRR",
"count": 1,
"incidence": 20.0,
"motif_short": "U",
"motif_long": "Unique",
"metadata": null
},
{
"sequence": "FLWTQSLRR",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
"metadata": null
}
]
},
{
"position": 11,
"low_support": "LS",
"entropy": 0.7219280948873623,
"support": 5,
"distinct_variants_count": 1,
"distinct_variants_incidence": 100.0,
"total_variants_incidence": 20.0,
"diversity_motifs": [
{
"sequence": "LWTQSLRRE",
"count": 4,
"incidence": 80.0,
"motif_short": "I",
"motif_long": "Index",
