
DiMA

A command-line tool that analyses the diversity and motifs of protein/nucleotide sequences.


DiMA - Diversity Motif Analyser



What is DiMA?

Protein sequence diversity is one of the major challenges in the design of diagnostic, prophylactic and therapeutic interventions against viruses. DiMA is a tool designed to facilitate the dissection of protein sequence diversity dynamics for viruses. It provides a quantitative measure of sequence diversity using Shannon’s entropy, applied via a user-defined k-mer sliding window, and corrects the entropy value for sample-size bias by applying a statistical adjustment. DiMA then interrogates the diversity further by dissecting the entropy value at each k-mer position into diversity motifs. The distinct k-mer sequences at each position are classified into the following motifs based on their incidence:

  • Index: The predominant sequence.
  • Major: The sequence with the second-highest incidence after the Index.
  • Minor: k-mers with an incidence between those of the Major and Unique motifs.
  • Unique: k-mers seen only once at a particular k-mer position.
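
The motif definitions above can be illustrated with a minimal Python sketch. It is not DiMA's implementation: the function names (`kmers_at`, `shannon_entropy`, `classify_motifs`) are hypothetical, the entropy is the raw Shannon entropy in bits without the sample-size correction, and the tie-breaking and the precedence of Unique over Major are assumptions inferred from the sample output rather than documented behaviour.

```python
from collections import Counter
from math import log2


def kmers_at(sequences, start, k):
    """Return the k-mer starting at 0-based column `start` in each aligned sequence."""
    return [seq[start:start + k] for seq in sequences if len(seq) >= start + k]


def shannon_entropy(kmers):
    """Raw Shannon entropy in bits of the k-mer distribution (no bias correction)."""
    total = len(kmers)
    return sum((n / total) * log2(total / n) for n in Counter(kmers).values())


def classify_motifs(kmers):
    """Label each distinct k-mer as Index, Major, Minor or Unique by its count.

    Assumption: a k-mer seen once is Unique even if it ranks second, and ties
    are broken by first occurrence.
    """
    ranked = Counter(kmers).most_common()  # highest count first
    labels = {}
    major_assigned = False
    for rank, (kmer, count) in enumerate(ranked):
        if rank == 0:
            labels[kmer] = "Index"
        elif count == 1:
            labels[kmer] = "Unique"
        elif not major_assigned:
            labels[kmer] = "Major"
            major_assigned = True
        else:
            labels[kmer] = "Minor"
    return labels
```

For a toy alignment of four copies of `MSASKEIKSF` and one `SAGVYMGNLS`, calling `shannon_entropy(kmers_at(seqs, 0, 9))` gives approximately 0.7219 bits, and `classify_motifs` labels `MSASKEIKS` as Index and `SAGVYMGNL` as Unique.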

Moreover, the description line of each sequence in the alignment can be formatted to include metadata, which is then tagged to the diversity motifs. DiMA enables comparative analysis of diversity dynamics within and between proteins of a virus species, and between proteomes of different viral species.

Publications

  • https://arxiv.org/abs/2205.13915

Installation

pip install dima-cli

Basic Usage

Shell Command

dima-cli -i aligned_sequences.afa -o results.json

Python

from dima import Dima
results = Dima(sequences="aligned_sequences.afa").run()
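
Once the shell command has written `results.json`, the report can be post-processed with the Python standard library alone. The sketch below assumes the file has the shape of the sample output shown under Results (`results`, `position`, `entropy`, `diversity_motifs`, `motif_long`, `sequence`). The helper names `load_report`, `top_entropy_positions` and `index_kmers` are hypothetical and are not part of DiMA.

```python
import json


def load_report(path):
    """Read a DiMA JSON report from disk into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def top_entropy_positions(report, n=5):
    """Return the n (position, entropy) pairs with the highest entropy."""
    ranked = sorted(report["results"], key=lambda r: r["entropy"], reverse=True)
    return [(r["position"], r["entropy"]) for r in ranked[:n]]


def index_kmers(report):
    """Map each k-mer position to its Index sequence, where one is reported."""
    found = {}
    for record in report["results"]:
        for motif in record["diversity_motifs"]:
            if motif["motif_long"] == "Index":
                found[record["position"]] = motif["sequence"]
    return found
```

A typical call would be `top_entropy_positions(load_report("results.json"), n=10)` to list the ten most variable k-mer positions.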

Results

<details> <summary>Click to view basic results</summary>
{
  "sequence_count": 5,
  "support_threshold": 30,
  "low_support_count": 20,
  "query_name": "Unknown Query",
  "kmer_length": 9,
  "average_entropy": 0.06854034285524647,
  "highest_entropy": {
    "position": 186,
    "entropy": 1.3921472236645345
  },
  "results": [
    {
      "position": 1,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "MSASKEIKS",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        },
        {
          "sequence": "SAGVYMGNL",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        }
      ]
    },
    {
      "position": 2,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "AGVYMGNLS",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        },
        {
          "sequence": "SASKEIKSF",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        }
      ]
    },
    {
      "position": 3,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "GVYMGNLSS",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        },
        {
          "sequence": "ASKEIKSFL",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        }
      ]
    },
    {
      "position": 4,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "VYMGNLSSQ",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        },
        {
          "sequence": "SKEIKSFLW",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        }
      ]
    },
    {
      "position": 5,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "KEIKSFLWT",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        },
        {
          "sequence": "YMGNLSSQQ",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        }
      ]
    },
    {
      "position": 6,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "MGNLSSQQL",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        },
        {
          "sequence": "EIKSFLWTQ",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        }
      ]
    },
    {
      "position": 7,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "IKSFLWTQS",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        },
        {
          "sequence": "GNLSSQQLD",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        }
      ]
    },
    {
      "position": 8,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "KSFLWTQSL",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        },
        {
          "sequence": "NLSSQQLDQ",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        }
      ]
    },
    {
      "position": 9,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "SFLWTQSLR",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        },
        {
          "sequence": "LSSQQLDQR",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        }
      ]
    },
    {
      "position": 10,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "SSQQLDQRR",
          "count": 1,
          "incidence": 20.0,
          "motif_short": "U",
          "motif_long": "Unique",
          "metadata": null
        },
        {
          "sequence": "FLWTQSLRR",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
          "metadata": null
        }
      ]
    },
    {
      "position": 11,
      "low_support": "LS",
      "entropy": 0.7219280948873623,
      "support": 5,
      "distinct_variants_count": 1,
      "distinct_variants_incidence": 100.0,
      "total_variants_incidence": 20.0,
      "diversity_motifs": [
        {
          "sequence": "LWTQSLRRE",
          "count": 4,
          "incidence": 80.0,
          "motif_short": "I",
          "motif_long": "Index",
       
