
Volcano plots are a compact way to visualize the results of large-scale comparisons, especially in fields like bioinformatics and genomics. They let you quickly spot data points that are both statistically significant and large in magnitude, such as genes with large fold changes in expression. This guide is a step-by-step tutorial on how to create and customize publication-ready volcano plots in R, based on the tutorial by biostatsquid.com.
A volcano plot is a type of scatterplot that simultaneously displays the statistical significance (p-value) and the magnitude of change (fold change) of data points. In genomics, for example, each point on the plot represents a gene. The y-axis typically shows the -log10 of the p-value, so more significant genes appear higher on the plot. The x-axis represents the log2 fold change, with upregulated genes on the right and downregulated genes on the left.
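To make the two axis transformations concrete, here is a small sketch in base R. The p-values and fold changes are made-up illustrative numbers, not values from any dataset.

```r
# Illustrative values only (not taken from the tutorial's dataset)
pvals <- c(0.5, 0.05, 0.001)

# y-axis: smaller p-values map to larger values of -log10(p)
neg_log10_p <- -log10(pvals)
round(neg_log10_p, 2)
# 0.30 1.30 3.00

# x-axis: a fold change of 0.5 (halved), 1 (unchanged) and 2 (doubled)
fold_changes <- c(0.5, 1, 2)
log2_fc <- log2(fold_changes)
log2_fc
# -1  0  1
```

On the log2 scale, halving and doubling sit at equal distances from zero, which is why the plot is symmetric around the unchanged genes.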
This visualization is useful for interpreting the results of a differential gene expression analysis: it lets researchers quickly pick out the genes that combine a large change in expression with strong statistical support, which are the natural candidates for follow-up.
Before you can create a volcano plot, you need to set up your R environment by loading the necessary libraries. The tidyverse collection of packages, which includes ggplot2 and dplyr, handles data manipulation and plotting. You will also need RColorBrewer for custom color palettes and ggrepel to prevent text labels from overlapping.
library(tidyverse)
library(RColorBrewer)
library(ggrepel)
Next, you need to load your data into R. The data should be from a differential gene expression analysis and contain columns for gene names, log2 fold change, and p-values. For this tutorial, we will use a sample dataset from a human COVID T cell single-cell RNA-seq study.
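The loading step might look like the sketch below. The tutorial's actual file is not reproduced here, so the example simulates a small results table, writes it to a temporary CSV, and reads it back. The column names `log2fc` and `pval` match the plotting code in this guide; the `gene_symbol` column name, the simulated values, and the temporary file are assumptions made for illustration.

```r
# Simulated stand-in for a differential expression results table.
# log2fc and pval match the plotting code in this guide; gene_symbol
# and all values are made up for illustration.
set.seed(42)
n_genes <- 500
sim <- data.frame(
  gene_symbol = paste0("GENE", seq_len(n_genes)),
  log2fc      = rnorm(n_genes, mean = 0, sd = 1),
  pval        = runif(n_genes, min = 0, max = 1)
)

# Write to a temporary CSV so the read step below is runnable;
# with real data, point read.csv() at your own results file instead.
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)

df <- read.csv(csv_path)

# Quick sanity checks before plotting
str(df)
stopifnot(all(c("gene_symbol", "log2fc", "pval") %in% names(df)))
stopifnot(all(df$pval >= 0 & df$pval <= 1, na.rm = TRUE))
```

Base R's `read.csv()` is used here to keep the sketch dependency-free; `read_csv()` from the tidyverse would work in much the same way.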
With your data loaded, you can now create a basic volcano plot using ggplot2. You'll map the log2 fold change to the x-axis and the -log10 of the p-value to the y-axis.
ggplot(data = df, aes(x = log2fc, y = -log10(pval))) +
  geom_point()
You should also add threshold lines to your plot to show which genes pass your cutoffs: a horizontal line for statistical significance (e.g., p-value < 0.05, which sits at -log10(0.05) on the y-axis) and two vertical lines for the fold change (e.g., log2 fold change > 0.6 or < -0.6).
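A minimal sketch of the threshold lines, using ggplot2's `geom_vline()` and `geom_hline()`. The data frame is simulated so the example runs on its own, and the line colour and dashed style are arbitrary choices rather than part of the original tutorial.

```r
library(ggplot2)

# Simulated data for illustration only; replace with your own results table
set.seed(1)
df <- data.frame(
  log2fc = rnorm(300),
  pval   = runif(300)
)

# Example cutoffs taken from the text above
fc_cutoff <- 0.6
p_cutoff  <- 0.05

p_thresholds <- ggplot(data = df, aes(x = log2fc, y = -log10(pval))) +
  geom_point() +
  # vertical lines mark the fold change cutoffs on both sides
  geom_vline(xintercept = c(-fc_cutoff, fc_cutoff),
             colour = "grey40", linetype = "dashed") +
  # horizontal line marks the p-value cutoff on the -log10 scale
  geom_hline(yintercept = -log10(p_cutoff),
             colour = "grey40", linetype = "dashed")

p_thresholds
```

Points above the horizontal line and outside the two vertical lines are the ones that pass both cutoffs.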
Now for the fun part: making your volcano plot publication-ready! You can customize almost every aspect of your plot.
Coloring Points: Color the points based on whether they are upregulated, downregulated, or not significant.
Adding Gene Annotations: Use ggrepel to add labels for specific genes of interest without them overlapping.
Changing the Theme: Apply a different theme to your plot for a cleaner look.
Editing Axis Labels and Titles: Add a title and customize the axis labels, including subscripts for log2 and log10.
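The four customizations above can be combined as in the sketch below. It is one possible arrangement, not the original tutorial's exact code: the data are simulated, and the column names `gene_symbol`, `diffexpressed` and `delabel`, the hex colours, the choice of `theme_classic()`, and the rule of labelling the ten most significant genes are all assumptions made for illustration.

```r
library(ggplot2)
library(ggrepel)

# Simulated data for illustration only
set.seed(7)
n <- 400
df <- data.frame(
  gene_symbol = paste0("GENE", seq_len(n)),
  log2fc      = rnorm(n),
  pval        = 10^(-rexp(n, rate = 1))  # values in (0, 1], a minority very small
)

# Coloring points: classify each gene with the example cutoffs from the text
df$diffexpressed <- "NO"
df$diffexpressed[df$log2fc >  0.6 & df$pval < 0.05] <- "UP"
df$diffexpressed[df$log2fc < -0.6 & df$pval < 0.05] <- "DOWN"

# Gene annotations: label only the 10 classified genes with the smallest p-values
sig <- df[df$diffexpressed != "NO", ]
top_genes <- head(sig$gene_symbol[order(sig$pval)], 10)
df$delabel <- ifelse(df$gene_symbol %in% top_genes, df$gene_symbol, NA)

p_custom <- ggplot(df, aes(x = log2fc, y = -log10(pval),
                           colour = diffexpressed, label = delabel)) +
  geom_vline(xintercept = c(-0.6, 0.6), colour = "grey60", linetype = "dashed") +
  geom_hline(yintercept = -log10(0.05), colour = "grey60", linetype = "dashed") +
  geom_point(size = 1.5) +
  scale_colour_manual(
    breaks = c("DOWN", "NO", "UP"),
    values = c("#2166AC", "grey70", "#B2182B"),
    labels = c("Downregulated", "Not significant", "Upregulated")
  ) +
  # ggrepel nudges labels apart; rows with NA labels are dropped silently
  geom_text_repel(max.overlaps = Inf, na.rm = TRUE, show.legend = FALSE) +
  # Changing the theme
  theme_classic() +
  # Axis labels and title, with subscripts written in plotmath
  labs(
    title  = "Volcano plot (simulated data)",
    colour = "Expression",
    x = expression("log"[2] * " fold change"),
    y = expression("-log"[10] * " p-value")
  )

p_custom
```

Labelling a short list of genes keeps the plot readable; for a real analysis you might instead pass a hand-picked vector of genes of interest to the `%in%` step.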
Once you are happy with your volcano plot, you can save it in various formats, such as PDF or PNG, for example with ggplot2's ggsave() function. This makes it easy to include the plot in publications, presentations, or reports.
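One common way to export a ggplot2 figure is `ggsave()`, which picks the output format from the file extension. The sketch below assumes that approach; the file names, the 6 x 5 inch size, and the 300 dpi resolution are placeholder choices, and the plot is built from simulated data so the example runs on its own.

```r
library(ggplot2)

# Simulated data and a basic plot, for illustration only
set.seed(3)
df <- data.frame(log2fc = rnorm(200), pval = runif(200))
p <- ggplot(df, aes(x = log2fc, y = -log10(pval))) +
  geom_point()

# Placeholder output paths; tempdir() keeps the example self-contained
pdf_path <- file.path(tempdir(), "volcano_plot.pdf")
png_path <- file.path(tempdir(), "volcano_plot.png")

# Vector output (PDF) scales cleanly for print
ggsave(pdf_path, plot = p, width = 6, height = 5, units = "in")

# Raster output (PNG) needs an explicit resolution
ggsave(png_path, plot = p, width = 6, height = 5, units = "in", dpi = 300)
```

Passing the plot explicitly through the `plot` argument avoids relying on whichever plot happened to be displayed last.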

