Pydna
Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.
Pydna is a Python package that provides human-readable, formal descriptions of 🧬 cloning and genetic assembly strategies in Python 🐍 for simulation and verification. Pydna can be used as executable documentation for cloning.
Designing genetic constructs with many components and steps, like recombinant metabolic pathways 🧫, often makes accurate documentation difficult, as seen in the poor state of the scientific literature ☢️.
A cloning strategy expressed in pydna is complete, unambiguous and stable.
Pydna provides simulation of:
- Primer design
- PCR
- Restriction digestion
- Ligation
- Gel electrophoresis of DNA with generation of gel images
- Homologous recombination
- Gibson assembly
- Golden gate assembly (in progress)
Virtually any sub-cloning experiment can be described in pydna, and its execution yields the sequences of intermediate and final DNA molecules.
Pydna has been designed to be understandable for biologists with only a basic understanding of Python.
Pydna can formalize planning and sharing of cloning strategies and is especially useful for complex or combinatorial DNA molecule constructions.
<!-- docs/index.rst-end -->
Acknowledgement 🤝
If you use pydna in your research, please cite the paper:
Pereira, F., Azevedo, F., Carvalho, Â., Ribeiro, G. F., Budde, M. W., & Johansson, B. (2015). Pydna: a simulation and documentation tool for DNA assembly strategies using python. BMC Bioinformatics, 16(142), 142. doi:10.1186/s12859-015-0544-x
Documentation and usage 📚
Full documentation of all modules and classes can be found at https://pydna-group.github.io/pydna.
To get started, we recommend looking at the example notebooks. Begin with Dseq, Dseq_Features and Importing_Seqs, which cover the basics of working with sequences. The remaining notebooks show how to use pydna for different cloning strategies, such as Gibson assembly and restriction-ligation.
Most pydna functionality is implemented as methods for the double stranded DNA sequence record classes Dseq and Dseqrecord, which are subclasses of the Biopython Seq and SeqRecord classes.
These classes make PCR primer design, PCR simulation and cut-and-paste cloning very simple:
NOTE: You can run this example in this notebook
from pydna.dseqrecord import Dseqrecord
# Let's create a DNA sequence record, and add a feature to it
dsr = Dseqrecord("ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT")
dsr.add_feature(x=0, y=60, type="gene", label="my_gene")  # We add a feature to highlight the sequence as a gene
dsr.figure()
<pre>
Dseqrecord(-60)
<mark>ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT</mark>
TACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTA
</pre>
# This is how it would look as a genbank file
print(dsr.format("genbank"))
LOCUS name 60 bp DNA linear UNK 01-JAN-1980
DEFINITION description.
ACCESSION id
VERSION id
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
misc 1..60
/type="gene"
/label="my_gene"
ORIGIN
1 atgcaaacag taatgatgga tgacattcaa agcactgatt ctattgctga aaaagataat
//
# Now let's design primers to amplify it
from pydna.design import primer_design
# limit is the minimum length of the primer, target_tm is the desired melting temperature of the primer
amplicon = primer_design(dsr, limit=13, target_tm=55)
# Let's print the primers, and a figure that shows where they align with the template sequence
print("forward primer:", amplicon.forward_primer.seq)
print("reverse primer:", amplicon.reverse_primer.seq)
amplicon.figure()
forward primer: ATGCAAACAGTAATGATGGA
reverse primer: ATTATCTTTTTCAGCAATAGAATCA
5ATGCAAACAGTAATGATGGA...TGATTCTATTGCTGAAAAAGATAAT3
|||||||||||||||||||||||||
3ACTAAGATAACGACTTTTTCTATTA5
5ATGCAAACAGTAATGATGGA3
||||||||||||||||||||
3TACGTTTGTCATTACTACCT...ACTAAGATAACGACTTTTTCTATTA5
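The relationship between the designed primers and the template can be checked with plain string operations: the forward primer matches the start of the top strand, and the reverse primer is the reverse complement of its 3' end. The following is a minimal sketch in plain Python that does not use pydna; `reverse_complement` is a hypothetical helper written only for this illustration.

```python
# Plain-Python check of the primer/template relationship (no pydna required).
# reverse_complement is an illustrative helper, not a pydna function.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


template = "ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT"
forward_primer = "ATGCAAACAGTAATGATGGA"
reverse_primer = "ATTATCTTTTTCAGCAATAGAATCA"

# The forward primer is identical to the 5' end of the top strand.
forward_ok = template.startswith(forward_primer)

# The reverse primer anneals to the top strand's 3' end, so its reverse
# complement must equal the end of the template.
reverse_ok = template.endswith(reverse_complement(reverse_primer))

print("forward primer matches template start:", forward_ok)
print("reverse primer matches template end:  ", reverse_ok)
```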
# Let's say we don't want to just amplify it, but we want to add restriction sites to it!
from pydna.amplify import pcr
# We add the restriction sites to the primers
forward_primer = "ccccGGATCC" + amplicon.forward_primer
reverse_primer = "ttttGGATCC" + amplicon.reverse_primer
# We do the PCR
pcr_product = pcr(forward_primer, reverse_primer, dsr)
# The PCR product is of class `Amplicon`, a subclass of `Dseqrecord`.
# When doing a figure, it shows where primers anneal.
pcr_product.figure()
5ATGCAAACAGTAATGATGGA...TGATTCTATTGCTGAAAAAGATAAT3
|||||||||||||||||||||||||
3ACTAAGATAACGACTTTTTCTATTACCTAGGtttt5
5ccccGGATCCATGCAAACAGTAATGATGGA3
||||||||||||||||||||
3TACGTTTGTCATTACTACCT...ACTAAGATAACGACTTTTTCTATTA5
# If we want to see the sequence more clearly, we can turn it into a `Dseqrecord`
pcr_product = Dseqrecord(pcr_product)
pcr_product.figure()
<pre>
Dseqrecord(-80)
ccccGGATCC<mark>ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT</mark>GGATCCaaaa
ggggCCTAGGTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAGGtttt
</pre>
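To make it clearer where the 80 bp product comes from, here is a rough plain-Python sketch of PCR with tailed primers. It assumes a linear template, exact matches and a single annealing site per primer; `simple_pcr` and `reverse_complement` are hypothetical names used for illustration and are not intended to reproduce how `pydna.amplify.pcr` works internally.

```python
# Illustrative sketch only: exact-match PCR on a linear template.
# simple_pcr and reverse_complement are hypothetical helpers, not pydna API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def simple_pcr(fwd: str, rev: str, template: str, min_anneal: int = 13) -> str:
    """Return the top strand of the product, keeping the primer tails."""
    top = template.upper()

    # Forward primer: the longest 3' suffix that occurs in the top strand.
    for n in range(len(fwd), min_anneal - 1, -1):
        fwd_pos = top.find(fwd[-n:].upper())
        if fwd_pos != -1:
            fwd_tail = fwd[: len(fwd) - n]
            break
    else:
        raise ValueError("forward primer does not anneal")

    # Reverse primer: work with its reverse complement, whose longest
    # 5' prefix must occur in the top strand.
    rev_rc = reverse_complement(rev)
    for n in range(len(rev_rc), min_anneal - 1, -1):
        rev_pos = top.find(rev_rc[:n].upper())
        if rev_pos != -1:
            rev_tail = rev_rc[n:]
            rev_end = rev_pos + n
            break
    else:
        raise ValueError("reverse primer does not anneal")

    return fwd_tail + template[fwd_pos:rev_end] + rev_tail


template = "ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT"
forward_primer = "ccccGGATCC" + "ATGCAAACAGTAATGATGGA"
reverse_primer = "ttttGGATCC" + "ATTATCTTTTTCAGCAATAGAATCA"

product = simple_pcr(forward_primer, reverse_primer, template)
print(len(product), product)
```

The tails do not anneal to the template, but they become part of the product, which is why both ends of the product carry a GGATCC site.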
from Bio.Restriction import BamHI # cuts GGATCC
# a, payload, c are the cut fragments
a, payload, c = pcr_product.cut(BamHI)
print(a.figure())
print()
print(payload.figure())
print()
print(c.figure())
<pre>
Dseqrecord(-9)
ccccG
ggggCCTAG
Dseqrecord(-70)
GATCC<mark>ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT</mark>G
GTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAG
Dseqrecord(-9)
GATCCaaaa
Gtttt
</pre>
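The staggered ends in the figures above follow from where BamHI cuts: after the first G of GGATCC on each strand, leaving a four-base 5' overhang (GATC). The sketch below reproduces the three fragments with plain strings, assuming a linear molecule; `cut_bamhi` and `complement` are hypothetical helpers for illustration, not part of pydna or Biopython.

```python
# Illustrative sketch of a staggered BamHI cut (G^GATCC) on a linear molecule.
# cut_bamhi and complement are hypothetical helpers, not pydna/Biopython API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Complement without reversing, i.e. the bottom strand written 3'->5'."""
    return seq.translate(_COMPLEMENT)


def cut_bamhi(top: str) -> list[tuple[str, str]]:
    """Return (top, bottom) strand pairs for each fragment.

    The bottom strand is written 3'->5', aligned under the top strand.
    """
    site = "GGATCC"
    upper = top.upper()
    starts = []
    pos = upper.find(site)
    while pos != -1:
        starts.append(pos)
        pos = upper.find(site, pos + 1)

    # Top strand is cut 1 base into the site, bottom strand 5 bases into it.
    top_cuts = [0] + [s + 1 for s in starts] + [len(top)]
    bottom_cuts = [0] + [s + 5 for s in starts] + [len(top)]
    bottom = complement(top)

    return [
        (top[top_cuts[k]:top_cuts[k + 1]], bottom[bottom_cuts[k]:bottom_cuts[k + 1]])
        for k in range(len(starts) + 1)
    ]


gene = "ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT"
pcr_top_strand = "ccccGGATCC" + gene + "GGATCCaaaa"

fragments = cut_bamhi(pcr_top_strand)
for top_part, bottom_part in fragments:
    print(top_part)
    print(bottom_part)
    print()
```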
# We create a circular vector to insert the amplicon into
vector = Dseqrecord("aatgtttttccctCCCGGGcaaaatAGATCTtgctatgcatcatcgatct", circular=True, name="vect")
vector.figure()
Dseqrecord(o50)
aatgtttttccctCCCGGGcaaaatAGATCTtgctatgcatcatcgatct
ttacaaaaagggaGGGCCCgttttaTCTAGAacgatacgtagtagctaga
from Bio.Restriction import BglII # cuts AGATCT
linear_vector_bgl = vector.cut(BglII)[0] # Linearize the vector at BglII (produces only one fragment)
# Ligate the fragment of interest to the vector, and call looped() to circularize it
# synced is used to place the origin coordinate (0) in the same place for rec_vector and vector
rec_vector = (linear_vector_bgl + payload).looped().synced(vector)
rec_vector.figure()
<pre>
Dseqrecord(o116)
aatgtttttccctCCCGGGcaaaatAGATCC<mark>ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT</mark>GGATCTtgctatgcatcatcgatct
ttacaaaaagggaGGGCCCgttttaTCTAGGTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAGAacgatacgtagtagctaga
</pre>
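The BamHI-cut insert can be ligated into the BglII-cut vector because both enzymes leave the same GATC overhang. The resulting junctions, however, are hybrid sites that neither enzyme recognizes. A small plain-Python sketch of that reasoning follows; `sticky_overhang` is a hypothetical helper, and the sketch assumes both enzymes cut one base into their palindromic sites (G^GATCC and A^GATCT).

```python
# Illustrative check of overhang compatibility between BamHI and BglII.
# sticky_overhang is a hypothetical helper, not part of pydna or Biopython.
BAMHI_SITE = "GGATCC"  # cut as G^GATCC
BGLII_SITE = "AGATCT"  # cut as A^GATCT


def sticky_overhang(site: str, cut_after: int = 1) -> str:
    """5' overhang left by a palindromic site cut `cut_after` bases in."""
    return site[cut_after:len(site) - cut_after]


bam_overhang = sticky_overhang(BAMHI_SITE)
bgl_overhang = sticky_overhang(BGLII_SITE)
compatible = bam_overhang == bgl_overhang

# Junction sequences after ligating a BamHI insert into a BglII vector:
# vector left end (A) + insert start (GATCC), and
# insert end (G) + vector right end (GATCT).
left_junction = BGLII_SITE[:1] + BAMHI_SITE[1:]
right_junction = BAMHI_SITE[:1] + BGLII_SITE[1:]

print("overhangs:", bam_overhang, bgl_overhang, "compatible:", compatible)
print("junctions:", left_junction, right_junction)
print("junctions recut by BamHI or BglII:",
      any(j in (BAMHI_SITE, BGLII_SITE) for j in (left_junction, right_junction)))
```

These hybrid junctions are the AGATCC and GGATCT sequences flanking the insert in the recombinant vector shown above.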
# Let's simulate a Gibson assembly
from pydna.assembly import Assembly
fragments = [
Dseqrecord('aatgtttttccctCACTACGtgctatgcatcat', name="fragment_A"),
Dseqrecord('tgctatgcatcatCTATGGAcactctaataatg', name="fragment_B"),
Dseqrecord('cactctaataatgTTACATAaatgtttttccct', name="fragment_C"),
]
# limit is the min. homology length between fragments in the assembly
asm = Assembly(fragments, limit=10)
# From the assembly object, which can generate all possible products, get the first circular product
product, *rest = asm.assemble_circular()
# We can print a figure that shows the overlaps between fragments
product.figure()
-|fragment_A|13
| \/
| /\
| 13|fragment_B|13
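The 13 bp overlaps reported in the figure can be found with a simple end-to-start comparison. The sketch below is a simplified illustration that only considers overlaps between the end of one fragment and the start of the next, in the order given, and then joins the fragments into a circle. `terminal_overlap` and `join_circular` are hypothetical helpers and are not intended to reproduce pydna's `Assembly`, which handles the general case.

```python
# Simplified illustration of overlap-based circular assembly.
# terminal_overlap and join_circular are hypothetical helpers, not pydna API.


def terminal_overlap(left: str, right: str, limit: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Comparison is case-insensitive; returns 0 if shorter than `limit`.
    """
    left_u, right_u = left.upper(), right.upper()
    for n in range(min(len(left), len(right)), limit - 1, -1):
        if left_u[-n:] == right_u[:n]:
            return n
    return 0


def join_circular(fragments: list[str], limit: int) -> str:
    """Join fragments in the given order into one circular sequence."""
    pieces = []
    for i, frag in enumerate(fragments):
        nxt = fragments[(i + 1) % len(fragments)]  # last fragment wraps to first
        n = terminal_overlap(frag, nxt, limit)
        if n == 0:
            raise ValueError(f"no overlap of at least {limit} bp after fragment {i}")
        pieces.append(frag[:-n])  # drop the part repeated at the start of `nxt`
    return "".join(pieces)


fragments = [
    "aatgtttttccctCACTACGtgctatgcatcat",  # fragment_A
    "tgctatgcatcatCTATGGAcactctaataatg",  # fragment_B
    "cactctaataatgTTACATAaatgtttttccct",  # fragment_C
]

overlaps = [
    terminal_overlap(fragments[i], fragments[(i + 1) % 3], limit=10)
    for i in range(3)
]
product = join_circular(fragments, limit=10)
print(overlaps, len(product))
print(product)
```

Each fragment is 33 bp and each junction shares 13 bp, so the circle contains 3 × 33 − 3 × 13 = 60 bp.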

