Pywfa
Python wrapper for wavefront alignment using WFA2-lib
Install / Use
/learn @kcleal/PywfaREADME
===== pyWFA
A python wrapper for wavefront alignment using WFA2-lib <https://github.com/smarco/WFA2-lib/>_
Installation
To download from pypi::
pip install pywfa
From conda::
conda install -c bioconda pywfa
Build from source::
git clone https://github.com/kcleal/pywfa
cd pywfa
pip install .
Overview
Alignment of pattern and text strings can be performed by accessing WFA2-lib functions directly:
.. code-block::
from pywfa import WavefrontAligner
pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
a = WavefrontAligner(pattern)
score = a.wavefront_align(text)
assert a.status == 0 # alignment was successful
assert a.cigarstring == "3M1X4M1D7M1I9M1X6M"
assert a.score == -24
a.cigartuples
>>> [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
a.cigar_print_pretty()
.. code-block::
>>> ALIGNMENT 3M1X4M1D7M1I9M1X6M
ALIGNMENT.COMPACT 1X1D1I1X
PATTERN TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
||| |||| ||||||| ||||||||| ||||||
TEXT TCTATACT-GCGCGTTTGGAGAAATAAAATAGT
The output of cigar_pretty_print can be directed to a file, rather than stdout using:
.. code-block::
a.cigar_print_pretty("file.txt")
To obtain a python str of this print out, access the results object (see below).
Cigartuples follow the convention:
.. list-table:: :widths: 15 15 :header-rows: 1
-
- Operation
- Code
-
- M
- 0
-
- I
- 1
-
- D
- 2
-
- N
- 3
-
- S
- 4
-
- H
- 5
-
- =
- 7
-
- X
- 8
-
- B
- 9
For convenience, a results object can be obtained by calling the WavefrontAligner with a pattern and text:
.. code-block::
pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
a = WavefrontAligner(pattern)
result = a(text) # alignment result
result.__dict__
>>> {'pattern_length': 32, 'text_length': 32, 'pattern_start': 0, 'pattern_end': 32, 'text_start': 0, 'text_end': 32, 'cigartuples': [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)], 'score': -24, 'pattern': 'TCTTTACTCGCGCGTTGGAGAAATACAATAGT', 'text': 'TCTATACTGCGCGTTTGGAGAAATAAAATAGT', 'status': 0}
# Alignment can also be called with a pattern like this:
a(text, pattern)
# obtain a string in the same format as cigar_print_pretty
a.pretty
>>> 3M1X4M1D7M1I9M1X6M ALIGNMENT
1X1D1I1X ALIGNMENT.COMPACT
PATTERN TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
|||*|||| ||||||| |||||||||*||||||
TEXT TCTATACT-GCGCGTTTGGAGAAATAAAATAGT
Configure
To configure the WaveFrontAligner, options can be provided during initialization:
.. code-block::
from pywfa import WavefrontAligner
a = WavefrontAligner(scope="score",
distance="affine2p",
span="end-to-end",
heuristic="adaptive")
Supported distance metrics are "affine" (default) and "affine2p". Scope can be "full" (default) or "score". Span can be "ends-free" (default) or "end-to-end". Heuristic can be None (default), "adaptive" or "X-drop".
When using heuristic functions it is recommended to check the status attribute:
.. code-block::
pattern = "AAAAACCTTTTTAAAAAA"
text = "GGCCAAAAACCAAAAAA"
a = WavefrontAligner(heuristic="adaptive")
a(pattern, text)
a.status
>>> 0 # successful alignment, -1 indicates the alignment was stopped due to the heuristic
Default options
The WavefrontAligner will be initialized with the following default options:
.. list-table:: :widths: 15 10 :header-rows: 1
-
- Parameter
- Default value
-
- pattern
- None
-
- distance
- "affine"
-
- match
- 0
-
- gap_opening
- 6
-
- gep_extension
- 2
-
- gap_opening2
- 24
-
- gap_extension2
- 1
-
- scope
- "full"
-
- span
- "ends-free"
-
- pattern_begin_free
- 0
-
- pattern_end_free
- 0
-
- text_begin_free
- 0
-
- text_end_free
- 0
-
- heuristic
- None
-
- min_wavefront_length
- 10
-
- max_distance_threshold
- 50
-
- steps_between_cutoffs
- 1
-
- xdrop
- 20
Modifying the cigar
If desired the cigar can be modified so the end operation is either a soft-clip or a match, this makes the alignment cigar resemble those produced by bwa, for example:
.. code-block::
pattern = "AAAAACCTTTTTAAAAAA"
text = "GGCCAAAAACCAAAAAA"
a = WavefrontAligner(pattern)
res = a(text, clip_cigar=False)
print(cigartuples_to_str(res.cigartuples))
>>> 4I7M5D6M
res(text, clip_cigar=True)
print(cigartuples_to_str(res.cigartuples))
>>> 4S7M5D6M
An experimental feature is to trim short matches at the end of alignments. This results in alignments that approximate local alignments:
.. code-block::
pattern = "AAAAAAAAAAAACCTTTTAAAAAAGAAAAAAA"
text = "ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA"
a = WavefrontAligner(pattern)
# The unmodified cigar may have short matches at the end:
res = a(text, clip_cigar=False)
res.cigartuples
>>> [(0, 1), (1, 5), (8, 6), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
res.aligned_text
>>> ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA
res.text_start, res.text_end
>>> 0, 32
# The minimum allowed block of matches can be set at e.g. 5 bp, which will trim off short matches
res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5)
res.cigartuples
>>> [(4, 12), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
res.aligned_text
>>> AAAAACCAAAAAAAAAAAAA
res.text_start, res.text_end
>>> 12, 32
# Mismatch operations X can also be elided, note this occurs after the clip_cigar stage
res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5, elide_mismatches=True)
res.cigartuples
>>> [(4, 12), (0, 7), (2, 5), (0, 13)]
res.aligned_text
>>> AAAAACCAAAAAAAAAAAAA
Notes: The alignment score is not modified currently by trimming the cigar, however the pattern_start, pattern_end, test_start and text_end are modified when the cigar is modified.
Related Skills
node-connect
343.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
90.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
343.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
343.1kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
