DNASequenceAnalysisTool

A comprehensive Python tool for DNA sequence analysis that provides various molecular biology and bioinformatics functions.

🧬 DNA Sequence Analysis Tool

Python Version | License: MIT | Code style: black | Tests

A high-performance Python library and command-line tool for comprehensive DNA/RNA sequence analysis with advanced visualization capabilities. This toolkit is designed for both bioinformaticians and molecular biologists, providing a robust set of tools for sequence analysis, manipulation, and visualization.

✨ Key Features

Sequence Analysis

  • GC content calculation
  • Melting temperature prediction
  • Molecular weight calculation
  • Sequence validation and sanitization
  • Motif finding and pattern matching
  • ORF (Open Reading Frame) detection

Sequence Manipulation

  • Reverse complement generation
  • Transcription and translation
  • Sequence alignment
  • Primer design
  • Restriction site analysis

File I/O Support

  • FASTA/FASTQ format support
  • GZIP/BZ2 compression support
  • Batch processing of multiple files
  • Stream processing for large files
  • Configurable output formats
  • Parallel processing options

Visualization

  • GC content plots
  • Sequence logos
  • Restriction maps
  • Interactive sequence viewers

Command-Line Interface

  • User-friendly command-line tools
  • Batch processing support
  • Configurable output formats
  • Parallel processing options

🏗️ Project Structure

DNASequenceAnalysisTool/
├── dna_sequence_analysis_tool/     # Main package
│   ├── core/                      # Core functionality
│   │   ├── __init__.py
│   │   ├── sequence_analysis.py   # Sequence analysis functions
│   │   ├── sequence_io.py         # File I/O operations
│   │   ├── sequence_validation.py # Sequence validation
│   │   ├── sequence_statistics.py # Statistical analysis
│   │   ├── sequence_transformation.py # Sequence manipulation
│   │   └── visualization.py       # Visualization tools
│   ├── data/                      # Sample data
│   │   ├── __init__.py
│   │   └── sample_sequence.py     # Sample sequences
│   ├── tests/                     # Test suite
│   │   ├── __init__.py
│   │   └── test_sequence_analysis.py
│   ├── utils/                     # Utility functions
│   │   ├── __init__.py
│   │   └── file_io.py
│   ├── __init__.py
│   ├── cli.py                     # Command-line interface
│   ├── config.py                  # Configuration settings
│   ├── exceptions.py              # Custom exceptions
│   └── logging_config.py          # Logging configuration
├── examples/                      # Example scripts
│   ├── basic_sequence_analysis.py
│   ├── file_io_and_visualization.py
│   └── README.md
├── .gitignore
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── Makefile
├── pyproject.toml
├── requirements-dev.txt
├── requirements.txt
└── setup.py

📋 Requirements

  • Python 3.8+
  • Dependencies are listed in requirements.txt

Core Dependencies

  • NumPy >= 1.19.0
  • SciPy >= 1.5.0
  • Biopython >= 1.78
  • pandas >= 1.2.0
  • pydantic >= 1.8.0
  • pyyaml >= 5.4.1
  • click >= 8.0.0
  • rich >= 10.0.0
  • matplotlib >= 3.3.0
  • plotly >= 5.0.0

💻 Installation

From PyPI (recommended)

pip install dna-sequence-analysis-tool

From Source

  1. Clone the repository:

    git clone https://github.com/YanCotta/DNASequenceAnalysisTool.git
    cd DNASequenceAnalysisTool
    
  2. Install with pip in development mode:

    pip install -e .
    

Development Setup

  1. Install development dependencies:

    pip install -r requirements-dev.txt
    
  2. Set up pre-commit hooks:

    pre-commit install
    

🚀 Quick Start

Python API

from dna_sequence_analysis_tool import DNASequence, DNAToolkit

# Create a DNA sequence
sequence = DNASequence("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", "example_sequence")

# Get sequence information
print(f"Sequence ID: {sequence.id}")
print(f"Length: {sequence.length} bp")
print(f"GC content: {sequence.gc_content:.2f}%")

# Get reverse complement
rev_comp = sequence.reverse_complement()
print(f"Reverse complement: {rev_comp}")

# Find motifs
motif = "GGC"
positions = sequence.find_motif(motif)
print(f"Motif '{motif}' found at positions: {positions}")

# Analyze with toolkit
toolkit = DNAToolkit()
tm = toolkit.calculate_melting_temperature(sequence.sequence)
print(f"Melting temperature: {tm:.2f}°C")
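
For intuition about what a melting-temperature estimate involves, here is a minimal sketch using the Wallace rule, Tm = 2(A+T) + 4(G+C), a rule of thumb intended for short oligonucleotides. This is an assumption for illustration only: the README does not say which method `DNAToolkit.calculate_melting_temperature` uses, and `wallace_tm` is a hypothetical helper, not part of the package.

```python
def wallace_tm(sequence: str) -> float:
    """Estimate melting temperature in deg C with the Wallace rule.

    Hypothetical helper for illustration; the package may use a
    different (for example nearest-neighbor) method.
    """
    seq = sequence.upper()
    invalid = set(seq) - set("ACGT")
    if not seq or invalid:
        raise ValueError(f"Not a valid DNA sequence: {sequence!r}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    # 2 deg C per A/T base, 4 deg C per G/C base
    return 2.0 * at + 4.0 * gc


print(wallace_tm("ATGGCCATTG"))  # 30.0
```

The rule ignores salt and primer concentration, so its values can differ noticeably from those of more detailed models.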

Command Line Interface

# Analyze a sequence file
dnatool analyze sequences.fasta --output results.csv

# Generate a GC content plot
dnatool plot-gc sequences.fasta --output gc_plot.png

# Find ORFs in a sequence
dnatool find-orfs sequence.fasta --min-length 100

# Get help
dnatool --help
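
The README does not describe how `find-orfs` works internally. As a rough sketch of one common approach, the function below scans the three forward reading frames for an ATG followed by an in-frame stop codon and keeps ORFs of at least `min_length` nucleotides. The name `find_orfs`, the forward-strand-only scan, and the reading of `--min-length` as a nucleotide count are all assumptions made for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_length: int = 100) -> List[Tuple[int, int, str]]:
    """Return (start, end, orf) for forward-strand ORFs of >= min_length nt.

    start is 0-based inclusive; end is exclusive and includes the stop codon.
    Hypothetical sketch, not the package's implementation.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_length:
                    orfs.append((start, end, seq[start:end]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


example = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
for start, end, orf in find_orfs(example, min_length=10):
    print(start, end, orf)
```

A fuller implementation would usually also scan the reverse complement to cover all six reading frames.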

🔧 Configuration

The tool can be configured using a YAML configuration file located at ~/.dna_sequence_analysis/config.yaml.

Example configuration:

# General settings
log_level: INFO
max_sequence_length: 10000000

# File I/O settings
default_input_format: fasta
default_output_format: fasta
auto_detect_format: true

# Performance settings
chunk_size: 10000
max_workers: 4

# Visualization settings
plot_theme: default
default_figure_size: [10, 6]
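
To illustrate how such a file might be consumed, the sketch below merges user settings over a set of defaults and rejects unknown keys. It assumes that the example values above are also the defaults, which the README does not state; `DEFAULTS`, `merge_config`, and `load_config` are hypothetical names, not the package's API. PyYAML (listed under Core Dependencies) is imported only when a config file actually exists.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Assumed defaults, copied from the example configuration above.
DEFAULTS: Dict[str, Any] = {
    "log_level": "INFO",
    "max_sequence_length": 10_000_000,
    "default_input_format": "fasta",
    "default_output_format": "fasta",
    "auto_detect_format": True,
    "chunk_size": 10_000,
    "max_workers": 4,
    "plot_theme": "default",
    "default_figure_size": [10, 6],
}


def merge_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user settings on the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown configuration keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def load_config(path: Optional[Path] = None) -> Dict[str, Any]:
    """Load the YAML config if present, otherwise return the defaults."""
    if path is None:
        path = Path("~/.dna_sequence_analysis/config.yaml").expanduser()
    if not path.exists():
        return dict(DEFAULTS)
    import yaml  # PyYAML; imported lazily so the defaults work without it

    with path.open("r", encoding="utf-8") as handle:
        overrides = yaml.safe_load(handle) or {}
    return merge_config(overrides)
```

Rejecting unknown keys makes typos in the config file fail loudly instead of being silently ignored.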

📚 Documentation

Comprehensive documentation is available at Read the Docs.

To build the documentation locally:

cd docs
make html

🤝 Contributing

Contributions are welcome! Please see our Contributing Guide for details on how to contribute to this project.

🧪 Testing

Run the test suite with:

pytest

For a test coverage report:

pytest --cov=dna_sequence_analysis_tool --cov-report=term-missing

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📝 Changelog

See CHANGELOG.md for a history of changes to this project.

📬 Contact & Support

For support or questions, please open an issue on GitHub.


Made with ❤️ by the DNA Sequence Analysis Tool contributors

🌟 Features

Sequence Analysis

  • GC content calculation
  • Melting temperature prediction
  • ORF detection and analysis
  • Nucleotide composition analysis
  • Pattern recognition and motif finding

Molecular Biology Tools

  • DNA/RNA transcription
  • Codon-optimized protein translation
  • Sophisticated ORF detection
  • Advanced melting temperature calculations

File I/O Support

  • FASTA/FASTQ format support
  • GZIP/BZIP2 compression
  • Batch processing capabilities
  • Format conversion utilities

Visualization

  • GC content plots
  • Sequence logos
  • Multiple sequence alignments
  • Interactive visualizations

Command Line Interface

  • Intuitive command structure
  • Batch processing support
  • Multiple output formats (text, JSON, CSV)
  • Visualization export to image files

🏗️ Project Structure

dna_sequence_analysis_tool/
├── core/
│   ├── __init__.py
│   ├── sequence_analysis.py
│   ├── sequence_validation.py
│   └── visualization.py
├── data/
│   └── sample_sequences.fasta
├── utils/
│   ├── file_io.py
│   └── logging.py
├── tests/
│   ├── test_sequence_analysis.py
│   └── test_validation.py
├── cli.py
├── README.md
└── requirements.txt

📋 Requirements

  • Python 3.8+

Core Dependencies

  • NumPy >= 1.19.0
  • SciPy >= 1.5.0
  • Biopython >= 1.78
  • pandas >= 1.2.0
  • matplotlib >= 3.3.0 (for visualization)
  • click >= 8.0.0 (for CLI)
  • rich >= 10.0.0 (for rich CLI output)
  • plotly >= 5.0.0 (for interactive visualizations)

Optional Dependencies

  • python-magic (for file type detection)
  • python-magic-bin (Windows only, for file type detection)

📦 Installation

# Install from PyPI
pip install dna-sequence-analysis-tool

# Install from source
git clone https://github.com/YanCotta/DNASequenceAnalysisTool.git
cd DNASequenceAnalysisTool
pip install -e .

🔍 Quick Start

from dna_sequence_analysis_tool import DNAToolkit

# Initialize toolkit
toolkit = DNAToolkit()

# Analyze a sequence
sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
result = toolkit.analyze_sequence(sequence)
print(f"GC Content: {result.gc_content}%")

📊 API Documentation

Core Classes

DNASequence

class DNASequence:
    """
    Core class for DNA sequence analysis.
    
    Attributes:
        sequence (str): The DNA sequence
        length (int): Sequence length
        gc_content (float): GC content percentage
    """

Basic Functions

validate_sequence(sequence)

  • Validates DNA sequences (A, T, G, C)
  • Returns: (bool, str) - validity status and error message
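
A minimal sketch of a validator with the documented `(bool, str)` return shape is shown below. The case-insensitive matching, the rejection of empty input, and the wording of the error messages are assumptions; the package's own function may behave differently.

```python
from typing import Tuple

VALID_BASES = frozenset("ATGC")


def validate_sequence(sequence: str) -> Tuple[bool, str]:
    """Return (is_valid, error_message); the message is empty when valid."""
    if not isinstance(sequence, str):
        return False, "Sequence must be a string"
    if not sequence:
        return False, "Sequence is empty"
    invalid = sorted(set(sequence.upper()) - VALID_BASES)
    if invalid:
        return False, f"Invalid characters: {', '.join(invalid)}"
    return True, ""


ok, message = validate_sequence("ATGXC")
print(ok, message)  # False Invalid characters: X
```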

calculate_gc_content(dna_sequence)

  • Calculates GC content percentage
  • Raises ValueError for invalid sequences
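
The behavior described above can be sketched as follows. This is an illustrative stand-in rather than the package's code; in particular, treating an empty string as invalid is an assumption.

```python
def calculate_gc_content(dna_sequence: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = dna_sequence.upper()
    if not seq or set(seq) - set("ATGC"):
        raise ValueError(f"Invalid DNA sequence: {dna_sequence!r}")
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


print(calculate_gc_content("ATGC"))  # 50.0
```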

reverse_complement(dna_sequence)

  • Generates reverse complement of DNA sequence
  • Returns: the reverse-complement sequence as a string
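
One compact way to compute a reverse complement in plain Python is a translation table followed by a reversal, sketched below. This is an assumption about the approach, not the package's implementation; note that this version leaves characters other than A, T, G, and C unchanged instead of raising an error.

```python
# Map each base to its complement, preserving case.
_COMPLEMENT = str.maketrans("ATGCatgc", "TACGtacg")


def reverse_complement(dna_sequence: str) -> str:
    """Complement every base, then reverse the result."""
    return dna_sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```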

find_motif(dna_sequence, motif)

  • Finds all occurrences of a