Fastman
fastman
Description
An R package for fast, efficient visualization of GWAS results as Q-Q and Manhattan plots, generated directly from PLINK output files.
- Fast: Drastically reduces time in plot generation compared to qqman. On a typical imputed PLINK assoc file of 10 million SNPs, plotting time is reduced from 737s in qqman to 60s.
- Efficient: Optimized memory management.
- Versatile: Handles a range of inputs, from p-values and logarithms of p-values to F<sub>ST</sub> scores. Also plots other genome-wide population genetic parameters (e.g. F<sub>ST</sub>, π and D statistics). Supports two-sided scores, i.e. scores with negative values.
- Non-model friendly: Additional support for results from genomes of non-model organisms (often with hundreds of contigs or many scaffolds), alphabetical and other ordering options.
- Annotation and Highlight versatility: Has a wide set of options to customize annotating and highlighting SNPs of interest.
- Familiar: Has a similar set of input arguments and code structure compared to qqman.
- Missing Value Handling: Can handle missing values in the input data frame.
- Gene Annotation Capability: Can annotate with gene names instead of SNP names if the user prefers. Currently, the package provides in-built support for human builds GRCh36, 37 and 38, but other builds, e.g., for non-model organisms, can also be provided as input by the user.
Installation
If you are using RStudio, you can install the package with the following code.
devtools::install_github('adhikari-statgen-lab/fastman', build_vignettes = TRUE)
If you are not using RStudio, we recommend installing the package without building the vignette.
devtools::install_github('adhikari-statgen-lab/fastman', build_vignettes = FALSE)
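The commands above assume that devtools is already installed. A minimal sketch of a wrapper that installs it first when it is missing is given here; `install_fastman` is a hypothetical helper name used only for illustration and is not part of the package, and the sketch assumes a CRAN mirror and GitHub are reachable.

```r
# Hypothetical helper (not part of fastman): install devtools if needed,
# then install fastman from GitHub.
install_fastman <- function(build_vignettes = FALSE) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("adhikari-statgen-lab/fastman",
                           build_vignettes = build_vignettes)
}

# The function is only defined here; call install_fastman() interactively,
# or install_fastman(build_vignettes = TRUE) when working in RStudio.
```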
Functions:
1. fastman
Description
Creates a Manhattan plot directly from a PLINK assoc output (or any data frame with chromosome, position, and p-value).
Usage
fastman(m, chr = "CHR", bp = "BP", p = "P", snp, chrlabs, speedup = TRUE, logp = TRUE, scattermore = FALSE,
col = "matlab", maxP = 14, sortchr = TRUE, bybp = FALSE, chrsubset, bprange, highlight,
annotateHighlight = FALSE, annotatePval, colAbovePval = FALSE, col2 = "greys", annotateTop = TRUE,
annotationWinMb, annotateN, annotationCol, annotationAngle = 45, baseline = NULL, suggestiveline,
genomewideline, cex = 0.4, cex.text = 0.4, cex.axis = 0.6, scattermoresize = c(3000, 1800),
geneannotate = FALSE, build, sep = "|", border = 0, xlab, ylab, xlim, ylim, gap.axis = NA, ...)
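As a rough illustration of the simplest call, the sketch below simulates a small table with the default PLINK column names ("CHR", "BP", "P", "SNP") and passes it to fastman with all other arguments left at their defaults. The data are random and purely illustrative, and the plot is drawn only if the package is installed.

```r
# Simulated association table with PLINK-style column names.
# The values are random; they are not real GWAS results.
set.seed(1)
n_per_chr <- 200
gwas <- data.frame(
  CHR = rep(1:22, each = n_per_chr),
  BP  = rep(seq(1e5, by = 5e4, length.out = n_per_chr), times = 22),
  P   = runif(22 * n_per_chr),
  SNP = paste0("rs", seq_len(22 * n_per_chr)),
  stringsAsFactors = FALSE
)

# Draw the Manhattan plot only when fastman is available.
# The default column names match, so chr, bp and p are not specified.
if (requireNamespace("fastman", quietly = TRUE)) {
  fastman::fastman(gwas)
}
```

If your columns are named differently, pass the names through the chr, bp, p and snp arguments described below.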
Parameters:
- m = A data frame containing the data for the Manhattan plot. It must contain at least three columns: base pair position, chromosome ID, and p-value. The default column names are "BP", "CHR", and "P", following the PLINK assoc file convention. An optional ID column, e.g. the SNP ID (default "SNP"), is needed if annotations are required. See the next four parameters if your column names differ; for non-model organisms, for example, you can use the contig ID instead of the chromosome ID.
- chr = A string denoting the column name for the chromosome. Defaults to "CHR", which corresponds to the PLINK --assoc command output. For non-model organisms, this could be the contig ID. If your chromosome column is numeric but was converted to string when the data was read into R, pay close attention to the sorting order of the chromosomes. To sort chromosomes in increasing order of chromosome number, convert the column to numeric before calling the function. If your data is already sorted, you do not need to convert the column; specify `sortchr = FALSE` in your input arguments instead.
- bp = A string denoting the column name for the chromosomal position. Defaults to "BP", which corresponds to the PLINK --assoc command output. The column must be numeric.
- p = A string denoting the column name for the p-values or scores for the SNP association tests. Defaults to "P", which corresponds to the PLINK --assoc command output. The column must be numeric. You can also provide an already-computed log of p-values, e.g. from published summary statistics.
- snp = A string denoting the column name for the SNP name (rs number). The column must be character.
- chrlabs = An optional character vector of length equal to the number of chromosomes, specifying the chromosome labels. For example, you can provide `c(1:22, "X", "Y", "MT")` to convert the PLINK numerical notation of 23=X, 24=Y, etc. This character vector is used to create the axis labels of the Manhattan plot, so sort it in the order in which you want the chromosome labels to appear in the final plot. For example, if your input data frame has chromosomes in a particular order that you want to keep, and you have used the option `sortchr = FALSE` to preserve that order in the final plot, then your `chrlabs` vector should follow the same order.
- speedup = A logical value; if TRUE, the function employs the faster method where input values at the extreme 0.2% are rounded to 3 digits, and the rest are rounded to 2 digits. The default value of this parameter is TRUE.
- logp = A logical value; if TRUE, negative logarithms (base 10) of p-values are plotted. The default value of this parameter is TRUE, so to plot F<sub>ST</sub>-type scores or already-computed logarithms of p-values directly, logp must be set to FALSE.
- scattermore = A logical value; if TRUE, uses the `scattermore` package to further speed up plot generation. To use this feature, the `scattermore` package needs to be installed and loaded before running the command. The default value of this parameter is FALSE.
- col = A string indicating the colour scheme of the plot. Defaults to "matlab". There are various options available for users. See below for details.
- maxP = A numeric value indicating the maximum y-value to display. The default value of this parameter is 14. If the data has negative values, both sides are truncated at the absolute value of the parameter. The user can provide NULL as input if truncation is not required.
- sortchr = A logical value; if TRUE, the table is sorted by chromosome number before plotting. If not specified by the user, the function takes the default value TRUE.
- bybp = A logical value; if TRUE, the y-values are plotted against chromosome positions. In this case, the table is not sorted by chromosome number before plotting. If not specified by the user, the function takes the default value FALSE. This feature is useful, especially for plots where the user might be interested in studying the association p-values across contigs.
- chrsubset = The subset of chromosome numbers to be plotted.
- bprange = The range of chromosome positions to be plotted. In case the user wants to subset the X-axis by region, then this should be the parameter of choice, not xlim.
- highlight = A character vector of SNPs in the dataset to highlight. These SNPs should all be in the dataset.
- annotateHighlight = A logical value; if TRUE, annotates all highlighted SNPs in case more specific annotation instructions are not provided.
- annotatePval = A numeric value, if set, SNPs with p-values below this will be annotated on the plot. In the case of p-value, the user can provide either the p-value or the negative logarithm of the p-value as input for this argument, whichever is convenient. In the case of scores, the user can provide the score cutoff directly as input.
- colAbovePval = A logical value, if TRUE, will colour all hits above the specified p-value threshold using the colour scheme chosen in col argument (default "matlab"), while the points below the threshold will be coloured using the colour scheme chosen in col2 argument below (default "greys"). Defaults to FALSE.
- col2 = A string indicating the colour scheme of the part of the plot below the specified p-value threshold. Defaults to "greys". There are various options available for users. See below for details.
- annotateTop = A logical value; if TRUE, only annotates the top hit on each chromosome that is below the annotatePval threshold. This is just a modifier, and it works only when used with either annotatePval, annotateHighlight or annotateN.
- annotationWinMb = A numeric value, if set, will determine the megabase window within which the top SNP will be annotated. This is just a modifier, and it works only when used with either annotatePval, annotateHighlight or annotateN.
- annotateN = A numeric value, if set, this number of top SNPs will be annotated on the plot.
- annotationCol = A string indicating the colour of the annotation or the column name containing the colour information for the individual rows. The user can provide a column as a part of the input data frame which contains annotation colour information corresponding to each SNP and specify the column name in this parameter. If the user does not want annotation for some particular SNP, NA can be provided in this column corresponding to that SNP. In case the user provides just a string indicating the annotation colour instead of a column name, then the same colour will be used for annotating all the SNPs. Defaults to grey.
- annotationAngle = The angle of annotation, defaults to 45 degrees.
- baseline = The position to draw a baseline in black. Defaults to NULL, as a typical Manhattan plot already has a baseline at y=0. If the data has a left tail (e.g. two-sided scores), the user might want to provide a baseline position for reference. If multiple baselines are required, the user can provide a vector of positions.
- suggestiveline = The position to draw a GWAS "suggestive significance" line. Defaults to
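The notes on chr, chrlabs and logp above can be sketched as follows. This is an illustrative example on simulated data, assuming a table whose chromosome column was read as text; the object names (`res`, `labs`, `LOG10P`) are hypothetical, and the plots are drawn only if fastman is installed.

```r
# Simulated results where chromosome codes 1..23 were read in as text.
set.seed(42)
chr_codes <- as.character(1:23)
res <- data.frame(
  CHR = rep(chr_codes, each = 100),
  BP  = rep(seq(1e5, by = 1e5, length.out = 100), times = 23),
  P   = runif(2300),
  SNP = paste0("rs", 1:2300),
  stringsAsFactors = FALSE
)

# Text sorting places "10" directly after "1" ...
text_order <- sort(unique(res$CHR))      # "1" "10" "11" ...

# ... so convert to numeric to get increasing chromosome numbers.
res$CHR <- as.numeric(res$CHR)
numeric_order <- sort(unique(res$CHR))   # 1 2 3 ...

# One label per chromosome, in plotting order; code 23 is shown as "X".
labs <- c(1:22, "X")

# An already-computed -log10(p) column, to be plotted with logp = FALSE.
res$LOG10P <- -log10(res$P)

if (requireNamespace("fastman", quietly = TRUE)) {
  # Raw p-values with custom chromosome labels (logp defaults to TRUE).
  fastman::fastman(res, chrlabs = labs)
  # Values that are already on the log scale.
  fastman::fastman(res, p = "LOG10P", logp = FALSE)
}
```

Converting the column once, before plotting, keeps the default `sortchr = TRUE` behaviour meaningful and lets `chrlabs` line up with the sorted chromosomes.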