CfdnaPattern
Pattern Recognition for Cell-free DNA
Predict whether a fastq is cfdna or not
# predict a single file
python predict.py <single_fastq_file>
# predict files
python predict.py <fastq_file1> <fastq_file2> ...
# predict files with wildcard
python predict.py *.fq
Warning: this tool doesn't work on trimmed fastq files.
prediction output
For each file given on the command line, this tool outputs one line of the form <prediction>: <filename>, like:
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz
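If the output is consumed by another script, each line can be split back into its two parts. The following is a minimal sketch, assuming the output is exactly the `<prediction>: <filename>` format shown above; the helper names `parse_prediction_line` and `group_by_prediction` are hypothetical and not part of CfdnaPattern.

```python
def parse_prediction_line(line):
    """Split one '<prediction>: <filename>' line into (prediction, filename)."""
    # Split on the first ': ' only, so a filename containing ': ' stays intact.
    prediction, sep, filename = line.strip().partition(": ")
    if not sep:
        raise ValueError("not a prediction line: %r" % line)
    return prediction, filename


def group_by_prediction(lines):
    """Collect filenames under each prediction label, skipping blank lines."""
    groups = {}
    for line in lines:
        if not line.strip():
            continue
        prediction, filename = parse_prediction_line(line)
        groups.setdefault(prediction, []).append(filename)
    return groups
```

For example, the captured stdout of predict.py could be passed in as `group_by_prediction(output.splitlines())`.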
Add -q or --quite to enable quiet output mode, in which it only outputs files whose prediction conflicts with their filename:
- a file with cfdna in its name, but the prediction is not-cfdna
- a file without cfdna in its name, but the prediction is cfdna
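The rule for what quiet mode reports can be sketched as a small check. This is an illustration only: the function name `conflicts_with_name` is hypothetical, and matching the flag case-insensitively against the base filename (rather than the full path) is an assumption, not confirmed behavior of predict.py.

```python
import os


def conflicts_with_name(filename, prediction, flag="cfdna"):
    """Return True when the prediction disagrees with what the filename suggests."""
    # Assumption: only the base filename is inspected, case-insensitively.
    named_cfdna = flag in os.path.basename(filename).lower()
    predicted_cfdna = (prediction == "cfdna")
    return named_cfdna != predicted_cfdna
```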
Train a model
This tool ships with a pre-trained model (cfdna.model), which can be used for prediction, but you can also train a model yourself:
- prepare/link all your fastq files in some folder
- for files from cfdna, include cfdna (case-insensitive) in the filename, like 20160220-cfdna-015_S15_R1_001.fq
- for files from genomic DNA, include gdna (case-insensitive) in the filename, like 20160220-gdna-002_S2_R1_001.fq
- for files from FFPE DNA, include ffpe (case-insensitive) in the filename, like 20160123-ffpe-040_S0_R1_001.fq
- run:
python train.py /fastq_folder/*.fq
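Before training, it can help to check which label each file would receive under the naming convention above. The sketch below is an assumption-laden illustration, not the tool's own code: `label_from_filename` is a hypothetical helper, the semicolon-separated defaults mirror the documented --cfdna_flag and --other_flag values, and checking the cfdna flags first when both kinds of flag match is an arbitrary choice.

```python
import os


def label_from_filename(path, cfdna_flags="cfdna", other_flags="gdna;ffpe"):
    """Return 'cfdna', 'not-cfdna', or None if the filename carries no known flag."""
    name = os.path.basename(path).lower()
    # Flags are semicolon-separated and matched case-insensitively.
    for flag in cfdna_flags.lower().split(";"):
        if flag and flag in name:
            return "cfdna"
    for flag in other_flags.lower().split(";"):
        if flag and flag in name:
            return "not-cfdna"
    return None
```

A file for which this returns None has no recognizable flag in its name and is worth renaming or re-linking before running train.py.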
Citation
If you use CfdnaPattern in a publication, please cite: https://doi.org/10.1109/TCBB.2017.2723388
Full options:
python train.py <fastq_files> [options]
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MODEL_FILE, --model=MODEL_FILE
                        specify which file to store the built model.
  -a ALGORITHM, --algorithm=ALGORITHM
                        specify which algorithm to use for classification,
                        candidates are svm/knn/rbf/rf/gnb/benchmark, rbf means
                        svm using rbf kernel, rf means random forest, gnb
                        means Gaussian Naive Bayes, benchmark will try every
                        algorithm and plot the score figure, default is knn.
  -c CFDNA_FLAG, --cfdna_flag=CFDNA_FLAG
                        specify the filename flag of cfdna files, separated by
                        semicolon. default is: cfdna
  -o OTHER_FLAG, --other_flag=OTHER_FLAG
                        specify the filename flag of other files, separated by
                        semicolon. default is: gdna;ffpe
  -p PASSES, --passes=PASSES
                        specify how many passes to do training and validating,
                        default is 10.
  -n, --no_cache_check  if the cache file exists, use it without checking the
                        identity with input files
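When training is driven from another script, the options above can be assembled into a command line programmatically. This is a minimal sketch under stated assumptions: `build_train_command` is a hypothetical helper, the interpreter is assumed to be reachable as `python`, and only the long option forms listed above are used.

```python
ALGORITHMS = ("svm", "knn", "rbf", "rf", "gnb", "benchmark")


def build_train_command(fastq_files, model=None, algorithm=None,
                        passes=None, no_cache_check=False):
    """Build an argument list for train.py from the documented options."""
    if not fastq_files:
        raise ValueError("at least one fastq file is required")
    cmd = ["python", "train.py"] + list(fastq_files)
    if model is not None:
        cmd.append("--model=" + model)
    if algorithm is not None:
        # Reject values outside the candidates listed in the help text.
        if algorithm not in ALGORITHMS:
            raise ValueError("unknown algorithm: %r" % algorithm)
        cmd.append("--algorithm=" + algorithm)
    if passes is not None:
        cmd.append("--passes=%d" % passes)
    if no_cache_check:
        cmd.append("--no_cache_check")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains train.py; passing a list rather than a shell string avoids quoting problems with unusual filenames.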
