
This repository contains solutions to bioinformatics coding challenges from [rosalind.info]. Problems are organised by [location][locations]:

[Python Village]: initial problems to learn a few basics about the Python programming language.
[Bioinformatics Stronghold]: problems to discover the algorithms underlying a variety of bioinformatics topics.
[Bioinformatics Armory]: unlike the Stronghold, problems in the Armory are solved using existing tools.
[Bioinformatics Textbook Track]: problems associated with [Bioinformatics Algorithms: An Active Learning Approach].
[Algorithmic Heights]: exercises to accompany the book [Algorithms].

Running the solutions

This repository is written as a Python module and uses [poetry] and [typer].

Solutions for each problem are located in individual files inside the directory for each location.

You can install the versions of dependencies used here with:

poetry install

To run a solution within this environment use, e.g.:

poetry run rosalind ini2 rosalind_ini2.txt
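As a rough illustration of what a per-problem solution file might contain, here is a minimal sketch for INI2 (sum of the squares of two integers). The function name `ini2` and the way the input file is read are assumptions made for this example, not necessarily how the code in this repository is organised.

```python
from pathlib import Path


def ini2(path):
    """Hypothetical solver: read two integers a and b from `path`
    and return a**2 + b**2 (the squared hypotenuse)."""
    a, b = map(int, Path(path).read_text().split())
    return a**2 + b**2


# Example use (assumes a downloaded dataset file exists):
# print(ini2("rosalind_ini2.txt"))
```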

To run the solution on the provided "Sample Dataset" from [rosalind.info] (which should reproduce the "Sample Output"), run the solution in "test" mode:

poetry run rosalind --test ini2

Testing

pytest-snapshot is used to test solutions to problems. In many cases the generated solution will, and should, exactly match the "Sample Output" given at rosalind.info. Where several outputs are valid (e.g. because ordering is not important), the expected solutions (in tests/expected) have been updated to match the output of the code used here; they are equally valid solutions.

To run the tests use:

poetry run pytest

To update the tests (adding or modifying snapshots / expected output) use:

poetry run pytest --snapshot-update

Note that some solutions (that use Entrez) require an email address. This should be set as an environment variable, e.g.:

export ENTREZ_EMAIL=rosalind.franklin@cam.ac.uk
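A solution might pick this variable up along the following lines. This is a sketch only: the helper name `entrez_email` and the error message are assumptions, and the repository may read the variable differently.

```python
import os


def entrez_email():
    """Hypothetical helper: return the address stored in ENTREZ_EMAIL,
    failing early with a clear message if it is not set."""
    email = os.environ.get("ENTREZ_EMAIL")
    if not email:
        raise RuntimeError("Set ENTREZ_EMAIL before running Entrez-based solutions")
    return email


# With Biopython installed, the value would typically be assigned as:
#   from Bio import Entrez
#   Entrez.email = entrez_email()
```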

About

My rosalind profile: https://rosalind.info/users/danhalligan/

Solutions

Python Village

Bioinformatics Stronghold

[x] DNA: Counting DNA Nucleotides
[x] RNA: Transcribing DNA into RNA
[x] REVC: Complementing a Strand of DNA
[x] FIB: Rabbits and Recurrence Relations
[x] GC: Computing GC Content
[x] HAMM: Counting Point Mutations
[x] IPRB: Mendel's First Law
[x] PROT: Translating RNA into Protein
[x] SUBS: Finding a Motif in DNA
[x] CONS: Consensus and Profile
[x] FIBD: Mortal Fibonacci Rabbits
[x] GRPH: Overlap Graphs
[x] IEV: Calculating Expected Offspring
[x] LCSM: Finding a Shared Motif
[x] LIA: Independent Alleles
[x] MPRT: Finding a Protein Motif
[x] MRNA: Inferring mRNA from Protein
[x] ORF: Open Reading Frames
[x] PERM: Enumerating Gene Orders
[x] PRTM: Calculating Protein Mass
[x] REVP: Locating Restriction Sites
[x] SPLC: RNA Splicing
[x] LEXF: Enumerating k-mers Lexicographically
[x] LGIS: Longest Increasing Subsequence
[x] LONG: Genome Assembly as Shortest Superstring
[x] PMCH: Perfect Matchings and RNA Secondary Structures
[x] PPER: Partial Permutations
[x] PROB: Introduction to Random Strings
[x] SIGN: Enumerating Oriented Gene Orderings
[x] SSEQ: Finding a Spliced Motif
[x] TRAN: Transitions and Transversions
[x] TREE: Completing a Tree
[x] CAT: Catalan Numbers and RNA Secondary Structures
[x] CORR: Error Correction in Reads
[x] INOD: Counting Phylogenetic Ancestors
[x] KMER: k-Mer Composition
[x] KMP: Speeding Up Motif Finding
[x] LCSQ: Finding a Shared Spliced Motif
[x] LEXV: Ordering Strings of Varying Length Lexicographically
[x] MMCH: Maximum Matchings and RNA Secondary Structures
[x] PDST: Creating a Distance Matrix
[x] REAR: Reversal Distance
[x] RSTR: Matching Random Motifs
[x] SSET: Counting Subsets
[x] ASPC: Introduction to Alternative Splicing
[x] EDIT: Edit Distance
[x] EVAL: Expected Number of Restriction Sites
[x] MOTZ: Motzkin Numbers and RNA Secondary Structures
[x] SCSP: Interleaving Two Motifs
[x] SETO: Introduction to Set Operations
[x] SORT: Sorting by Reversals
[x] SPEC: Inferring Protein from Spectrum
[x] TRIE: Introduction to Pattern Matching
[x] CONV: Comparing Spectra with the Spectral Convolution
[x] DBRU: Constructing a De Bruijn Graph
[x] EDTA: Edit Distance Alignment
[x] FULL: Inferring Peptide from Full Spectrum
[x] INDC: Independent Segregation of Chromosomes
[x] LREP: Finding the Longest Multiple Repeat
[x] RNAS: Wobble Bonding and RNA Secondary Structures
[x] AFRQ: Counting Disease Carriers
[x] CTEA: Counting Optimal Alignments
[x] GLOB: Global Alignment with Scoring Matrix
[x] PCOV: Genome Assembly with Perfect Coverage
[x] PRSM: Matching a Spectrum to a Protein
[x] SGRA: Using the Spectrum Graph to Infer Peptides
[x] SUFF: Encoding Suffix Trees
[x] GASM: Genome Assembly Using Reads
[x] GCON: Global Alignment with Constant Gap Penalty
[x] LING: Linguistic Complexity of a Genome
[x] LOCA: Local Alignment with Scoring Matrix
[x] MGAP: Maximizing the Gap Symbols of an Optimal Alignment
[x] MULT: Multiple Alignment
[x] PDPL: Creating a Restriction Map
[x] SEXL: Sex-Linked Inheritance
[x] WFMD: The Wright-Fisher Model of Genetic Drift
[x] ASMQ: Assessing Assembly Quality with N50 and N75
[x] EBIN: Wright-Fisher's Expected Behavior
[x] FOUN: The Founder Effect and Genetic Drift
[x] GAFF: Global Alignment with Scoring Matrix and Affine Gap Penalty
[x] [GREP: Genome Assembly with Perfect Coverage and Repeats](rosalind/bioinformatics_stronghold/grep
