Introduction to R Graphics - From Basic to High-Quality Plots
— The post is part of my presentation for the Computational Biology Workshop for Clinicians (March 6, 2025) at AIIMS Kalyani.
R provides a versatile and flexible system for creating a variety of plots. In this section, we start with scatter plots, which are one of the simplest ways to visualize relationships between two numeric variables.
Before plotting, we generate some data for demonstration:
# Generate data
x <- 1:30
y <- rnorm(30, mean = x) # Normally distributed values centered around x
y2 <- rnorm(30, mean = x, sd = sqrt(x)) # More variation based on x
Here,
y is generated using rnorm() with a mean equal to x, simulating a linear trend with some randomness.
y2 has an increasing variance as x increases, using sd = sqrt(x).
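Because rnorm() draws new random numbers on every call, the plots in this section will look slightly different each time the code is run. A minimal sketch of making the data reproducible, assuming a fixed seed is acceptable (the seed value 42 is arbitrary and not part of the original workshop code):

```r
# Fix the random number generator so the same data are drawn on every run
set.seed(42)

x  <- 1:30
y  <- rnorm(30, mean = x)                 # constant spread (sd = 1)
y2 <- rnorm(30, mean = x, sd = sqrt(x))   # spread grows with x

# Re-seeding and repeating the first call reproduces exactly the same draws
set.seed(42)
y_again <- rnorm(30, mean = x)
identical(y, y_again)
```

Any integer works as a seed; what matters is that it is set before the first random draw.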
plot(x, y)
plot(x, y) creates a simple scatter plot with x on the x-axis and y on the y-axis.
A scatter plot with default black circles representing the data points.
plot(x, y, type = "l")
type = "l" changes the representation from points to lines connecting the data points.
A line plot instead of points.
plot(x, y, type = "b")
type = "b" displays both points and connecting lines.
A plot where points and lines are shown together.
plot(x, y, type = "b", pch = 4)
pch = 4 changes the point shape to an “X” marker.
A scatter plot with lines and X-shaped points.
plot(x, y, type = "b", pch = 2, col = "blue")
col = "blue" changes the color of the points and lines to blue.
pch = 2 changes the point shape to a triangle.
A blue scatter plot with lines and triangle-shaped points.
abline(c(0, 1)) # Intercept = 0, Slope = 1
abline(c(0,1)) adds a reference line with intercept 0 and slope 1.
A scatter plot with a diagonal reference line passing through the origin.
points(x, y2, col = "red")
points(x, y2, col = "red") adds another dataset (y2) in red without replacing the existing plot.
A scatter plot where the original dataset is in blue, and the new dataset is in red.
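When a second series is added with points(), the axis range has already been fixed by the first plot() call, so values of y2 outside that range are not shown. The sketch below, which regenerates the data with an arbitrary seed, widens the y-axis up front and adds a legend; the legend position and labels are illustrative choices rather than part of the original code:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x  <- 1:30
y  <- rnorm(30, mean = x)
y2 <- rnorm(30, mean = x, sd = sqrt(x))

# Widen the y-axis so that both series fit into the plotting region
plot(x, y, type = "b", pch = 2, col = "blue", ylim = range(y, y2))
points(x, y2, col = "red")

# Label the two series; pch = 1 is the default open circle used by points()
lg <- legend("topleft", legend = c("y", "y2"),
             col = c("blue", "red"), pch = c(2, 1))
```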
plot(x, y2, col = "orange", xlab = "my x-label", ylab = "yyy")
xlab = "my x-label" sets a custom x-axis label.
ylab = "yyy" sets a custom y-axis label.
col = "orange" colors the points orange (no lines are drawn, because type keeps its default value "p").
A scatter plot with custom axis labels and orange points.
plot(x, y2, xlim = c(1,10), ylim = c(1,5))
xlim = c(1,10) sets the visible x-axis range to 1 to 10; points outside this range are not drawn, but the data themselves are unchanged.
ylim = c(1,5) sets the visible y-axis range to 1 to 5 in the same way.
A scatter plot showing only the points whose x-values lie between 1 and 10 and whose y-values lie between 1 and 5.
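To see how many points actually fall inside such a window, the limits can be applied as a logical filter. This is a small sketch with regenerated data (arbitrary seed); the variable name inside is only illustrative:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x  <- 1:30
y2 <- rnorm(30, mean = x, sd = sqrt(x))

# Axis limits change what is displayed, not the data
plot(x, y2, xlim = c(1, 10), ylim = c(1, 5))

# Points that lie inside the displayed window
inside <- x >= 1 & x <= 10 & y2 >= 1 & y2 <= 5
sum(inside)   # number of points visible in the plot
length(y2)    # the full dataset still has 30 values
```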
Histograms are useful for visualizing the distribution of a dataset. They show how data points are distributed across different bins, helping to understand frequency patterns, skewness, and variability.
In this section, we will generate a dataset using a Poisson distribution and explore different ways to customize histograms in R.
# Create a random dataset of 100 numbers from a Poisson distribution with mean 3
d1 <- rpois(100, lambda = 3)
The rpois() function generates 100 random numbers from a Poisson distribution with a mean (λ) of 3.
This simulates count data, often used in biological and ecological studies.
hist(d1)
hist(d1) creates a default histogram of d1, automatically setting the number of bins.
A histogram displaying the frequency distribution of values in d1, with automatically chosen bins.
hist(d1, breaks = 4)
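breaks = 4 asks R for about 4 bins. A single number is treated as a suggestion: R picks "pretty" break points, so the actual number of bins may differ.
A histogram with roughly 4 bins, which gives a coarser view of the distribution.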
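A short sketch of the difference between suggesting a number of bins and fixing the bin edges, using regenerated data with an arbitrary seed. The evenly spaced edges built with seq() are one possible way to get exactly four bins, not the only one:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
d1 <- rpois(100, lambda = 3)

# A single number is only a suggestion; hist() rounds to "pretty" break points
h_suggested <- hist(d1, breaks = 4, plot = FALSE)
length(h_suggested$counts)   # may differ from 4

# An explicit vector of break points is used exactly as given
edges   <- seq(min(d1), max(d1), length.out = 5)   # 5 edges = 4 bins
h_exact <- hist(d1, breaks = edges, plot = FALSE)
length(h_exact$counts)
```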
hist(d1, breaks = c(0, 1, 3, 5, 7, 11, 21))
breaks = c(0, 1, 3, 5, 7, 11, 21) manually defines the edges of the bins instead of relying on automatic calculations.
A histogram with irregular bin widths, allowing finer control over how data is grouped. Because the bins are unequal, hist() plots densities rather than counts by default.
hist(d1, freq = TRUE)
freq = TRUE ensures the histogram shows absolute frequencies (the count of data points in each bin).
This is the default setting when bins are of equal size.
A histogram displaying counts on the y-axis.
hist(d1, freq = FALSE)
freq = FALSE normalizes the histogram to show density instead of raw counts.
The area under the histogram sums to 1, making it useful for comparing distributions.
A histogram with density values on the y-axis instead of counts.
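The claim that the area sums to 1 can be checked from the object hist() returns: each bar's area is its density multiplied by its bin width. A small sketch with regenerated data (arbitrary seed):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
d1 <- rpois(100, lambda = 3)

h <- hist(d1, freq = FALSE)

# Area of each bar = density * bin width; the areas add up to 1
bin_widths <- diff(h$breaks)
total_area <- sum(h$density * bin_widths)
total_area
```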
z <- hist(d1, plot = FALSE)
plot = FALSE prevents R from displaying the histogram but still stores the results in z.
This is useful when you need to extract histogram properties without visualizing it.
z$counts # Number of values in each bin
z$mids # Midpoints of the bins
z$counts gives the number of values in each bin.
z$mids returns the midpoints of the bins.
These values can be used for further analysis, such as overlaying a density curve.
> z$counts # Number of values in each bin
[1] 25 22 17 13 11 7 4 0 0 0 1
> z$mids # Midpoints of the bins
[1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5
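As one example of reusing these stored values, the sketch below overlays a kernel density curve and marks each bin at its midpoint. The data are regenerated with an arbitrary seed, so the counts will not match the output shown above:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
d1 <- rpois(100, lambda = 3)
z  <- hist(d1, plot = FALSE)

# The stored counts always add up to the number of observations
sum(z$counts) == length(d1)

# Draw the density-scaled histogram, then add a kernel density estimate
hist(d1, freq = FALSE)
lines(density(d1), col = "red", lwd = 2)

# Mark each bin's density at its midpoint, using the stored values
points(z$mids, z$density, pch = 16)
```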
Bar plots are useful for visualizing categorical data by representing counts, proportions, or numerical values associated with categories. They help compare group sizes, distributions, and trends effectively. A bar plot is distinct from a histogram as it is specifically used for categorical data. Each bar represents a distinct category in a bar plot, and spaces separate the bars. In contrast, a histogram is used for continuous data, where data is grouped into bins, and the bars are adjacent to show the frequency distribution of values.
In this section, we will explore how to create bar plots in R using different datasets and customization options.
data(islands) # Load dataset containing areas of various islands
barplot(islands)
The islands dataset contains land areas (in 1000 square miles) of various world islands.
barplot(islands) creates a vertical bar plot, where each bar represents an island's area.
A vertical bar plot showing different islands and their corresponding areas.
barplot(islands, horiz = TRUE)
Setting horiz = TRUE makes the bars horizontal, which is useful when dealing with long labels or emphasizing ranking.
A horizontal bar plot with the same data.
barplot(islands, horiz = TRUE, las = 1)
las = 1 rotates axis labels horizontally for better readability.
Useful when category names are long or crowded.
A horizontal bar plot with horizontally aligned labels.
data(iris) # Load dataset
barplot(height = iris$Petal.Width, beside = TRUE, col = iris$Species)
The iris dataset contains measurements of Sepal and Petal dimensions across three species (setosa, versicolor, virginica).
height = iris$Petal.Width plots the petal width values as bars.
beside = TRUE only has an effect when height is a matrix; iris$Petal.Width is a vector, so each of the 150 flowers gets its own bar and the argument is ignored here.
col = iris$Species colors bars based on species, distinguishing groups visually.
A bar plot with one bar per flower, where bar height is the Petal Width and bar color indicates the species.
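For bars that are grouped by species, barplot() needs a matrix as height. A possible sketch, assuming the quantity of interest is the mean petal length and width per species (the choice of summary and the colors are illustrative):

```r
# Mean petal length and width for each species (a 2 x 3 matrix)
means <- sapply(split(iris[, c("Petal.Length", "Petal.Width")], iris$Species),
                colMeans)
means

# With a matrix, beside = TRUE places the bars of each column next to each other
barplot(means, beside = TRUE, col = c("steelblue", "orange"),
        legend.text = TRUE, args.legend = list(x = "topleft"),
        ylab = "Mean (cm)")
```

Each column of the matrix becomes one group of bars, and each row becomes one bar within the group.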
Box plots (also called box-and-whisker plots) are useful for visualizing the distribution, spread, and potential outliers in numerical data across different categories. They summarize key statistics, including:
Median (Q2) – the middle value
Interquartile Range (IQR) – the range between Q1 (25th percentile) and Q3 (75th percentile)
Whiskers – extend from the box to the most extreme values within 1.5 × IQR of Q1 and Q3
Outliers – values beyond the whiskers, drawn as individual points
xx <- data.frame(iris) # Load iris dataset
boxplot(xx$Petal.Width ~ xx$Species, col = c("red", "green", "blue"))
The iris dataset contains petal and sepal measurements for three flower species: setosa, versicolor, and virginica.
xx$Petal.Width ~ xx$Species groups Petal Width by Species, displaying separate box plots for each species.
col = c("red", "green", "blue") assigns different colors to each species for clear differentiation.
A colored box plot showing the distribution of Petal Width across the three species:
setosa (red)
versicolor (green)
virginica (blue)
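The numbers behind each box can be inspected without drawing anything by passing plot = FALSE. A brief sketch using the built-in iris data:

```r
# Compute the box plot statistics without drawing the plot
b <- boxplot(Petal.Width ~ Species, data = iris, plot = FALSE)

b$names   # group labels, one per box
b$stats   # one column per group; rows are lower whisker, lower hinge (Q1),
          # median, upper hinge (Q3), upper whisker
b$out     # values that would be drawn as outliers
b$group   # index of the box each outlier belongs to
```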
Density plots help visualize the distribution of continuous data in a smooth and interpretable way. Unlike histograms, density plots use kernel density estimation (KDE) to create a continuous probability distribution.
In this section, we explore 2D density visualization using filled.contour() to represent a 3D surface in 2D.
# Generate normally distributed data
x <- sort(rnorm(100)) # 100 random values sorted
y <- sort(rnorm(50)) # 50 random values sorted
# Compute the outer product to create a density grid
z <- x %o% y
# 3D density plot in 2D
filled.contour(z)
rnorm(n) generates n normally distributed random values (mean = 0, sd = 1 by default).
sort() ensures the values are arranged in increasing order.
x %o% y computes the outer product, a 100 × 50 matrix with z[i, j] = x[i] * y[j]. It serves here as a demonstration surface of height values; it is not a kernel density estimate of the data.
filled.contour(z) produces a 2D filled contour plot, where:
Contours join points of equal height (like a topographic map).
Colors indicate the height of the surface; with the default palette in recent R versions, darker shades mark higher values.
A smooth filled contour plot where different colors indicate different levels of z.
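For an actual kernel density estimate of two variables, one option is kde2d() from the MASS package, which ships with most R installations. The sketch below is an assumption about how the idea could be extended, not part of the original workshop code; the sample size, grid size, and seed are arbitrary:

```r
library(MASS)  # provides kde2d(); assumed to be installed

set.seed(1)    # arbitrary seed, only for reproducibility
x <- rnorm(200)
y <- rnorm(200)

# Estimate the joint density of (x, y) on a 50 x 50 grid
dens <- kde2d(x, y, n = 50)

# dens$x and dens$y hold the grid coordinates, dens$z the estimated density
filled.contour(dens$x, dens$y, dens$z,
               xlab = "x", ylab = "y", main = "2D kernel density estimate")
```

Unlike the outer product, kde2d() needs x and y to have the same length, because each pair (x[i], y[i]) is treated as one observation.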
Advanced plotting in R improves visualization mainly through scaling, labeling, legends, themes, and facets. These features help in making plots more informative, readable, and aesthetically appealing.
In this section, we explore scaling options in ggplot2 to control axes, colors, and transformations for enhanced data visualization.
Example: Scatter Plot with Continuous Scaling
# Load ggplot2
library(ggplot2)
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100))
# Scatter plot with continuous scaling
ggplot(data, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(name = "X-axis label") +
scale_y_continuous(name = "Y-axis label")
scale_x_continuous() and scale_y_continuous() explicitly define the axis labels for continuous variables.
This ensures proper interpretation of the numeric scale on both axes.
Example: Bar Plot with Discrete Scaling
# Create a dataset
data <- data.frame(category = c("A", "B", "C", "D"), value = c(10, 15, 8, 12))
# Bar plot with discrete scaling
ggplot(data, aes(x = category, y = value, fill = category)) +
geom_bar(stat = "identity") +
scale_x_discrete(name = "Categories") +
scale_y_continuous(name = "Values")
scale_x_discrete() is used when the x-axis represents categorical data (e.g., "A", "B", "C", "D").
The fill aesthetic is mapped to the category variable, coloring the bars accordingly.
Example: Line Plot with Logarithmic Scaling
# Create a dataset
data <- data.frame(x = 1:10, y = exp(1:10))
# Line plot with logarithmic scaling
ggplot(data, aes(x = x, y = y)) +
geom_line() +
scale_y_log10(name = "Logarithmic Scale")
scale_y_log10() applies a log transformation to the y-axis, useful for datasets with exponential growth.
This technique helps visualize skewed distributions or large numeric ranges.
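Why exponential data become a straight line on a log axis can be checked numerically: on the log10 scale, consecutive values differ by a constant. The sketch also shows the base graphics counterpart of a logarithmic y-axis, log = "y":

```r
x <- 1:10
y <- exp(x)

# On the log10 scale every step adds the same amount, log10(e)
steps <- diff(log10(y))
steps

# Base graphics counterpart of a logarithmic y-axis
plot(x, y, type = "l", log = "y")
```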
Example: Scatter Plot with Color Scaling
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100), color_var = rnorm(100))
# Scatter plot with color scaling
ggplot(data, aes(x = x, y = y, color = color_var)) +
geom_point() +
scale_color_continuous(name = "Color Legend")
scale_color_continuous() maps a continuous variable to color intensity.
The color gradient visually represents variations in the third numeric variable (color_var).
Proper labeling in plots improves clarity, interpretation, and presentation. In this section, we cover:
Axis Labels
Plot Titles
Legend Titles
Example: Scatter Plot with Axis Labels
# Load ggplot2
library(ggplot2)
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100))
# Scatter plot with labeled axes
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(x = "X-axis label", y = "Y-axis label")
The labs(x = "...", y = "...") function adds custom labels for the x and y axes.
Helps in understanding variable meanings on each axis.
Example: Bar Plot with a Title
# Create a dataset
data <- data.frame(category = c("A", "B", "C", "D"), value = c(10, 15, 8, 12))
# Bar plot with a title
ggplot(data, aes(x = category, y = value, fill = category)) +
geom_bar(stat = "identity") +
labs(title = "Bar Plot with Title", x = "Categories", y = "Values")
labs(title = "...") adds a descriptive title to the plot.
Titles make plots more informative by summarizing insights.
Example: Scatter Plot with a Legend Title
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100), group = rep(c("A", "B"), each = 50))
# Scatter plot with legend title
ggplot(data, aes(x = x, y = y, color = group)) +
geom_point() +
labs(title = "Scatter Plot with Legend Title",
x = "X-axis label",
y = "Y-axis label",
color = "Group") +
theme(legend.title = element_text(size = 12))
labs(color = "Group") customizes the legend title.
theme(legend.title = element_text(size = 12)) sets the font size of the legend title.
Legends improve visualization by clarifying groupings and aesthetics. Here, we explore:
Color Legends
Shape Legends
Size Legends
Fill Legends
Legend Positioning
Example: Scatter Plot with Color Legend
library(ggplot2)
# Create a dataset
data <- data.frame(x = rnorm(150), y = rnorm(150), group = rep(c("A", "B", "C"), each = 50))
# Scatter plot with color legend
ggplot(data, aes(x = x, y = y, color = group)) +
geom_point() +
labs(title = "Scatter Plot with Color Legend", x = "X-axis label", y = "Y-axis label")
aes(color = group) assigns different colors to groups A, B and C.
ggplot2 automatically generates a legend for the color aesthetic.
Example: Scatter Plot with Shape Legend
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100), group = rep(c("A", "B"), each = 50))
# Scatter plot with shape legend
ggplot(data, aes(x = x, y = y, shape = group)) +
geom_point(size = 3) +
labs(title = "Scatter Plot with Shape Legend", x = "X-axis label", y = "Y-axis label")
aes(shape = group) assigns different shapes to groups.
Useful when printing in black and white (avoids reliance on colors).
Example: Scatter Plot with Size Legend
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100), size_var = runif(100, 2, 8))
# Scatter plot with size legend
ggplot(data, aes(x = x, y = y, size = size_var)) +
geom_point() +
labs(title = "Scatter Plot with Size Legend", x = "X-axis label", y = "Y-axis label")
aes(size = size_var) scales point size by a continuous variable.
Ideal for emphasizing importance (e.g., population size).
Example: Bar Plot with Fill Legend
# Create a dataset
data <- data.frame(category = c("A", "B", "C", "D"), value = c(10, 15, 8, 12), group = rep(c("X", "Y"), each = 2))
# Bar plot with fill legend
ggplot(data, aes(x = category, y = value, fill = group)) +
geom_bar(stat = "identity") +
labs(title = "Bar Plot with Fill Legend", x = "Categories", y = "Values")
aes(fill = group) colors bars by group.
Helps distinguish categories visually.
Example: Moving the Legend to the Bottom-Right
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100), group = rep(c("A", "B"), each = 50))
# Scatter plot with repositioned legend
ggplot(data, aes(x = x, y = y, color = group)) +
geom_point() +
labs(title = "Scatter Plot with Bottom-Right Legend", x = "X-axis label", y = "Y-axis label") +
theme(legend.position = "bottom", legend.justification = "right")
legend.position = "bottom" moves the legend.
legend.justification = "right" aligns it to the right side.
Themes in ggplot2 allow customization of plot appearance. Here, we explore:
Default Theme
Minimal Theme
Classic Theme
Dark Theme
Custom Themes
library(ggplot2)
# Create a dataset
data <- data.frame(x = rnorm(100), y = rnorm(100))
# Scatter plot with default theme
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Scatter Plot with Default Theme", x = "X-axis label", y = "Y-axis label")
The default ggplot2 theme (theme_gray()) has a gray background with white gridlines.
Good for quick exploratory analysis.
# Scatter plot with minimal theme
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Scatter Plot with Minimal Theme", x = "X-axis label", y = "Y-axis label") +
theme_minimal()
theme_minimal() removes the background fill and panel border but keeps the gridlines.
Best for presentations and reports.
# Scatter plot with classic theme
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Scatter Plot with Classic Theme", x = "X-axis label", y = "Y-axis label") +
theme_classic()
theme_classic() removes gridlines and background but keeps axis lines.
Good for academic papers.
# Scatter plot with dark theme
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Scatter Plot with Dark Theme", x = "X-axis label", y = "Y-axis label") +
theme_dark()
theme_dark() uses a dark panel background, which makes light or brightly colored points stand out.
Helps in low-light environments.
# Scatter plot with custom theme
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Scatter Plot with Custom Theme", x = "X-axis label", y = "Y-axis label") +
theme(
axis.text = element_text(size = 12, color = "blue"),
plot.title = element_text(hjust = 0.5, size = 16, face = "bold")
)
element_text(size = 12, color = "blue") changes axis text size and color.
plot.title = element_text(hjust = 0.5, size = 16, face = "bold") centers and bolds the title.
Highly customizable for publications and branding.
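When the same look is needed across several figures, theme settings can be stored in an object and added to each plot. A minimal sketch, where my_theme is a hypothetical name and the specific settings are only examples:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
data <- data.frame(x = rnorm(100), y = rnorm(100))

# Store theme settings once; my_theme is a hypothetical name
my_theme <- theme_classic(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

# Add the stored theme like any other plot component
p <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Scatter Plot with a Reusable Theme") +
  my_theme
p
```

A complete theme such as theme_classic() should come before the theme() tweaks, since a complete theme replaces earlier settings.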
Facets allow visualization of subsets of data in separate panels within the same figure. We explore:
Facet Wrap (single categorical variable)
Facet Grid (two categorical variables)
Free Scales in Facets
library(ggplot2)
# Create a dataset
data <- data.frame(x = rnorm(200), y = rnorm(200), category = rep(c("A", "B"), each = 100))
# Scatter plot with facet wrap
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_wrap(~ category) +
labs(title = "Scatter Plot with Facet Wrap", x = "X-axis label", y = "Y-axis label")
facet_wrap(~ category) creates separate plots for each category (A & B).
Useful when you have one categorical variable.
Arranges plots in a flexible grid.
# Create a dataset with two categorical variables
data <- data.frame(x = rnorm(200), y = rnorm(200), category1 = rep(c("A", "B"), each = 100), category2 = rep(c("X", "Y"), times = 100))
# Scatter plot with facet grid
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_grid(category1 ~ category2) +
labs(title = "Scatter Plot with Facet Grid", x = "X-axis label", y = "Y-axis label")
facet_grid(category1 ~ category2) creates a matrix-like layout.
Each combination of category1 (A & B) and category2 (X & Y) gets a separate panel.
Good for structured comparisons.
# Scatter plot with facet wrap and free scales
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_wrap(~ category1, scales = "free") +
labs(title = "Scatter Plot with Facet Wrap and Free Scales", x = "X-axis label", y = "Y-axis label")
scales = "free" allows each facet to have independent axes.
Useful when data ranges vary greatly between groups.
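Panel strips show only the level names by default, which can be ambiguous when two variables are faceted. The sketch below uses the label_both labeller so that each strip reads as variable: value; the data are regenerated with an arbitrary seed:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
data <- data.frame(x = rnorm(200), y = rnorm(200),
                   category1 = rep(c("A", "B"), each = 100),
                   category2 = rep(c("X", "Y"), times = 100))

# label_both prints "category1: A" instead of just "A" in the strips
p <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  facet_grid(category1 ~ category2, labeller = label_both)
p

# Number of observations that end up in each of the four panels
table(data$category1, data$category2)
```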
These plots use the above-mentioned advanced features to create high-quality graphics. Please note that all input data files used in the construction of these plots can be downloaded from: https://github.com/utpalmtbi/R-Graphics
This scatter plot visualizes the relationship between Gene Expression and Cell Viability, with different colors representing distinct treatment groups.
# Load ggplot2 package
library(ggplot2)
# Read dataset
biological_data <- read.csv('adv_plot_1.csv')
# Scatter plot
ggplot(biological_data, aes(x = GeneExpression, y = CellViability, color = Treatment)) +
geom_point(size = 3, alpha = 0.8) + # Customizing point size and transparency
labs(title = "Scatter Plot of Gene Expression vs. Cell Viability",
x = "Gene Expression",
y = "Cell Viability") +
theme_minimal() + # Using a minimal theme for a clean look
scale_color_manual(values = c("blue", "red")) # Setting custom colors for treatments
Plots Gene Expression vs. Cell Viability:
The x-axis represents Gene Expression levels.
The y-axis represents Cell Viability percentages.
Colors Different Treatment Groups:
Each point represents a sample.
The color of the point indicates the treatment group (e.g., Control vs. Treated).
Customizations for Better Readability:
Point Size (size = 3) → Increases point visibility.
Transparency (alpha = 0.8) → Lets overlapping points show through each other.
Minimal Theme (theme_minimal()) → Makes the plot cleaner by removing the gray background and panel border.
Manual Color Mapping (scale_color_manual(values = c("blue", "red"))) → Assigns blue to the first Treatment level and red to the second. Levels are ordered alphabetically by default, so Control would be blue and Treated red if those are the group names.
Coloring treatment groups makes it easier to distinguish patterns visually.
Adding transparency ensures that overlapping points don’t obscure the data.
The minimal theme reduces clutter, making the graph more readable.
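To tie each color to a specific group regardless of level order, scale_color_manual() also accepts a named vector. The sketch below uses simulated data as a stand-in for adv_plot_1.csv; the values and the level names Control and Treated are assumptions made for illustration:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated stand-in for adv_plot_1.csv; values and level names are made up
toy <- data.frame(
  GeneExpression = runif(40, 0, 10),
  Treatment      = rep(c("Treated", "Control"), each = 20)
)
toy$CellViability <- 100 - 5 * toy$GeneExpression + rnorm(40, sd = 5)

p <- ggplot(toy, aes(x = GeneExpression, y = CellViability, color = Treatment)) +
  geom_point(size = 3, alpha = 0.8) +
  # Names tie each color to a level, independent of the level order
  scale_color_manual(values = c(Control = "blue", Treated = "red")) +
  theme_minimal()
p
```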
This line plot visualizes how Gene Expression changes over time for different groups. Each group is represented by a separate line, showing trends in expression levels across different time points.
# Load ggplot2 package
library(ggplot2)
# Read dataset
time_series_data <- read.csv('adv_plot_2.csv')
# Line plot
ggplot(time_series_data, aes(x = Time, y = GeneExpression, color = Group, group = Group)) +
geom_line(linewidth = 1.5) + # Customizing line thickness
geom_point(size = 3, shape = 21, fill = "white") + # Customizing point appearance
labs(title = "Time Series Plot of Gene Expression",
x = "Time Points",
y = "Gene Expression") +
theme_bw() + # Using a black and white theme for a classic look
scale_color_manual(values = c("blue", "green")) + # Setting custom colors for groups
guides(color = guide_legend(override.aes = list(shape = 21, size = 3))) # Adjusting legend appearance (color is the mapped aesthetic; fill is a constant here)
Plots Gene Expression over Time:
The x-axis represents Time Points.
The y-axis represents Gene Expression levels.
Each line represents a different Group (e.g., Control vs. Treated).
Uses Lines and Points for Clarity:
A line connects the points for each group to show trends over time.
Points are added to indicate actual data values at each time point.
Customizations for Better Readability:
Line Thickness (linewidth = 1.5) → Makes the trend lines clearer.
Point Style (shape = 21, fill = "white") → Uses hollow circles to highlight data points.
Black & White Theme (theme_bw()) → Provides a high-contrast, clean background.
Manual Color Mapping (scale_color_manual(values = c("blue", "green"))) → Ensures different groups have distinct colors.
Custom Legend (guides()) → Makes the legend easier to interpret.
Using both lines and points makes it easier to interpret trends while ensuring data points remain visible.
The black-and-white theme provides a clean and professional look.
Manually setting colors ensures consistency and better visual contrast.
A well-structured legend improves readability and clarity.
This box plot visually summarizes the distribution of Gene Expression levels for different treatment groups. It highlights key statistical properties such as medians, quartiles, and outliers.
# Load ggplot2 package
library(ggplot2)
# Read dataset
biological_data <- read.csv('adv_plot_3.csv')
# Box plot
ggplot(biological_data, aes(x = Treatment, y = GeneExpression, fill = Treatment)) +
geom_boxplot(width = 0.6, notch = TRUE, outlier.shape = 16, outlier.size = 3) + # Customize box appearance
labs(title = "Box Plot of Gene Expression by Treatment",
x = "Treatment",
y = "Gene Expression") +
theme_minimal() + # Use a minimal theme
scale_fill_manual(values = c("lightblue", "lightcoral")) + # Set custom fill colors
guides(fill = guide_legend(override.aes = list(shape = NA))) # Remove legend symbols
Compares Gene Expression Across Treatment Groups:
The x-axis represents Treatment Groups (e.g., Control vs. Treated).
The y-axis represents Gene Expression levels.
Each box represents the interquartile range (IQR), with the median shown as a horizontal line inside the box.
Identifies Data Spread & Outliers:
Whiskers extend to the smallest and largest values within 1.5 × IQR of the box edges (Q1 and Q3).
Outliers (dots) are values outside this range.
Customizations for Clarity:
Box Width (width = 0.6) → Adjusts the width for better spacing.
Notched Boxes (notch = TRUE) → Adds a notch to compare medians visually.
Outlier Shape & Size (outlier.shape = 16, outlier.size = 3) → Ensures outliers are clearly visible.
Minimal Theme (theme_minimal()) → Removes the gray background and panel border for a clean look.
Manual Fill Colors (scale_fill_manual()) → Ensures consistency in treatment group colors.
Legend Cleanup (guides(fill = guide_legend(override.aes = list(shape = NA)))) → Intended to drop point symbols from the legend keys. Because the fill legend repeats the x-axis categories, it can also be hidden entirely with theme(legend.position = "none").
Notches give a rough visual check: if the notches of two boxes do not overlap, their medians likely differ. This is a guide, not a formal test.
Outlier customization ensures anomalies stand out.
Manual color selection keeps treatment colors consistent across figures.
A minimal theme keeps the focus on the data rather than on background decoration.
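The notch is an approximate confidence interval around the median, commonly computed as median ± 1.58 × IQR / sqrt(n). The sketch below reproduces this in base R on the built-in iris data (used here as a stand-in for adv_plot_3.csv). Base R uses hinges from fivenum(), whereas ggplot2 uses quartiles, so the two can differ slightly:

```r
# Sepal length of one species, used as example data
x <- iris$Sepal.Length[iris$Species == "setosa"]

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
h <- fivenum(x)

# Notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
manual_notch <- h[3] + c(-1, 1) * 1.58 * (h[4] - h[2]) / sqrt(length(x))
manual_notch

# The same interval as computed by base R
boxplot.stats(x)$conf

# Notched box plots for all three species
boxplot(Sepal.Length ~ Species, data = iris, notch = TRUE)
```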
This bar plot visually represents the mean gene expression across different treatment groups. The addition of error bars provides insights into data variability.
# Load ggplot2 package
library(ggplot2)
# Read dataset
biological_data <- read.csv('adv_plot_4.csv')
# Bar plot
ggplot(biological_data, aes(x = Treatment, y = MeanExpression, fill = Treatment)) +
geom_bar(stat = "identity", position = "dodge", width = 0.6, color = "black") + # Customize bar appearance
geom_errorbar(aes(ymin = MeanExpression - SDExpression, ymax = MeanExpression + SDExpression),
position = position_dodge(0.6), width = 0.25) + # Add error bars
labs(title = "Bar Plot of Mean Gene Expression by Treatment",
x = "Treatment",
y = "Mean Gene Expression") +
theme_minimal() + # Use a minimal theme
scale_fill_manual(values = c("lightblue", "lightgreen", "lightcoral")) + # Set custom fill colors
guides(fill = guide_legend(override.aes = list(shape = NA))) # Remove legend symbols
Compares Mean Gene Expression Across Treatment Groups:
The x-axis represents Treatment Groups (e.g., Control, Treated, etc.).
The y-axis represents Mean Gene Expression levels.
Each bar represents the mean expression for a given treatment.
Includes Error Bars to Show Variability:
Standard deviation (SD) is displayed using vertical error bars.
The top and bottom of each error bar show Mean ± SD, indicating variability.
Customizations for Better Visualization:
Dodged Bars (position = "dodge") → Places bars side by side when several share an x position. With one bar per treatment, as here, it has no visible effect.
Bar Width (width = 0.6) → Ensures proper spacing between bars.
Bar Borders (color = "black") → Enhances contrast for better visibility.
Minimal Theme (theme_minimal()) → Removes the gray background and panel border.
Custom Colors (scale_fill_manual()) → Assigns distinct colors for each treatment.
Legend Cleanup (guides(fill = guide_legend(override.aes = list(shape = NA)))) → Intended to drop point symbols from the legend keys.
Error bars provide a measure of data spread, making the plot more informative.
Dodging keeps bars from overlapping when several groups share an x position, making group comparisons clearer.
Manual color assignment ensures consistency in treatment representation.
Adding borders to bars enhances visual clarity.
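The plot above expects one row per treatment with the mean and SD already computed. If only raw replicate measurements are available, such a table can be built first. A base R sketch with simulated data; the group names and values are made up, and the column names MeanExpression and SDExpression follow those used in the plotting code:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated replicate measurements; group names and values are made up
raw <- data.frame(
  Treatment  = rep(c("Control", "DrugA", "DrugB"), each = 6),
  Expression = c(rnorm(6, mean = 10, sd = 1),
                 rnorm(6, mean = 14, sd = 2),
                 rnorm(6, mean = 8,  sd = 1.5))
)

# Mean and standard deviation per treatment group
means <- tapply(raw$Expression, raw$Treatment, mean)
sds   <- tapply(raw$Expression, raw$Treatment, sd)

# One row per treatment, ready for geom_bar() and geom_errorbar()
summary_df <- data.frame(
  Treatment      = names(means),
  MeanExpression = as.numeric(means),
  SDExpression   = as.numeric(sds)
)
summary_df
```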
This violin plot is a powerful way to visualize the distribution of gene expression across treatment groups. It combines features of a box plot and a density plot, showing both summary statistics and the full data distribution.
# Load ggplot2 package
library(ggplot2)
# Read dataset
biological_data <- read.csv('adv_plot_5.csv')
# Violin plot
ggplot(biological_data, aes(x = Treatment, y = GeneExpression, fill = Treatment)) +
geom_violin(width = 0.8, trim = FALSE, draw_quantiles = c(0.25, 0.5, 0.75)) + # Violin appearance; fill comes from aes(fill = Treatment)
geom_jitter(position = position_jitter(width = 0.2), size = 2, color = "black") + # Add jittered points
labs(title = "Violin Plot of Gene Expression by Treatment",
x = "Treatment",
y = "Gene Expression") +
theme_minimal() + # Use a minimal theme
scale_fill_manual(values = c("lightblue", "lightcoral")) + # Set custom fill colors
guides(fill = guide_legend(override.aes = list(shape = NA))) # Remove legend symbols
Displays Distribution Shape & Density:
The violin shape shows where most data points are concentrated.
A wider section indicates a higher density of values.
Adds Statistical Summaries:
Quartiles (draw_quantiles = c(0.25, 0.5, 0.75)) → Draws lines at the first quartile, median, and third quartile.
No trimming (trim = FALSE) → Draws the full tails of the density estimate, which extend beyond the observed minimum and maximum.
Includes Individual Data Points:
Jittered points (geom_jitter()) add small random horizontal offsets, which reduces overlap and improves readability.
Each black dot represents an individual observation.
Customizations for Better Visualization:
Violin Width (width = 0.8) → Ensures proper spacing.
Custom Fill Colors (scale_fill_manual()) → Different colors for treatment groups.
Minimal Theme (theme_minimal()) → Reduces distractions.
Legend Cleanup (guides()) → Removes unnecessary legend symbols.
Violin plots provide a more informative alternative to box plots by showing both summary statistics and full data distribution.
Jittered points reduce overlap, so individual data points are easier to see.
Displaying quartiles helps in statistical interpretation.
With trim = FALSE the tails of each violin are not cut off at the data range; the extended tails come from the smoothed density estimate, not from additional observations.
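The effect of untrimmed tails can be illustrated with base R's density(), which by default also evaluates the estimate beyond the observed data. This is a sketch of the general idea with made-up values; geom_violin() computes its own density, so the exact numbers will differ:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
values <- rnorm(50, mean = 10, sd = 2)  # made-up expression values

d <- density(values)

range(values)  # observed minimum and maximum
range(d$x)     # the density curve extends beyond both ends

plot(d, main = "Kernel density with untrimmed tails")
rug(values)    # tick marks show where the observations actually lie
```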
The heatmap visualization provides an intuitive way to analyze biological data patterns across samples. Though there are several ways to visualize heatmaps, here we discuss two versions and their key features:
Version 1: ggplot2 + viridis Heatmap
# Load necessary libraries
library(ggplot2)
library(viridis)
library(reshape2)
# Read biological dataset
df <- read.csv('adv_plot_6.csv')
# Reshape data for ggplot2
melted_data <- melt(df, id.vars = "Gene")
# Generate heatmap
heatmap_plot <- ggplot(melted_data, aes(x = variable, y = Gene)) +
geom_tile(aes(fill = value), color = "white") + # Heatmap grid
scale_fill_viridis_c() + # Use viridis color scale
theme_minimal() +
labs(title = "Biological Data Heatmap",
x = "Samples", y = "Genes")
heatmap_plot
Modify Axis Labels & Titles
heatmap_plot + labs(title = "Customized Heatmap", x = "Samples", y = "Genes")
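Change Tile Size (set width and height in the geom_tile() call itself; adding a second geom_tile() layer to heatmap_plot would draw over the full-size tiles without revealing gaps)
ggplot(melted_data, aes(x = variable, y = Gene)) +
geom_tile(aes(fill = value), width = 0.8, height = 0.8, color = "white") + scale_fill_viridis_c()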
Add Value Annotations
heatmap_plot + geom_text(aes(label = round(value, 2)), vjust = 1)
Customize Legend
heatmap_plot + guides(fill = guide_colorbar(title = "Expression Level"))
This version creates a customizable heatmap using ggplot2, where:
Data is first reshaped (reshape2::melt) for plotting.
geom_tile() fills the grid based on expression values.
The color scale is enhanced with viridis for better contrast.
Customizations include axis labels, tile size, annotations, and legends.
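The reshaping step is what turns a gene-by-sample table into the one-row-per-tile layout that geom_tile() needs. The base R sketch below shows the same wide-to-long conversion on a made-up table; the gene names, sample names, and values are hypothetical:

```r
# Made-up expression table in wide format: one row per gene, one column per sample
df <- data.frame(Gene = c("G1", "G2", "G3"),
                 S1 = c(1.2, 3.4, 0.5),
                 S2 = c(2.1, 2.9, 0.7))

sample_cols <- setdiff(names(df), "Gene")

# Long format: one row per gene-sample pair, mirroring melt(df, id.vars = "Gene")
long <- data.frame(
  Gene     = rep(df$Gene, times = length(sample_cols)),
  variable = rep(sample_cols, each = nrow(df)),
  value    = unlist(df[sample_cols], use.names = FALSE)
)
long
```

The result has the same three columns that melt() produces (Gene, variable, value), except that melt() stores variable as a factor.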
Version 2: pheatmap for Clustered Heatmap
# Load library
library(pheatmap)
# Example dataset: Extract gene expression-like data
data <- mtcars
heatmap_data <- data[c(1:7,9,11)]
annotation_data <- data[c(8,10)] # Metadata for annotation
# Define annotation colors
annotate <- list(
vs = c("0" = "blue", "1" = "red"),
gear = gray.colors(100, start = 1, end = 0) # palette() would return the previous palette, not these colors
)
# Generate clustered heatmap
pheatmap(
heatmap_data,
annotation_row = annotation_data,
annotation_colors = annotate,
color = colorRampPalette(c("white", "blue", "red"))(100),
cellwidth = 40,
cellheight = 12,
fontsize_row = 5,
cluster_rows = TRUE,
cluster_cols = TRUE
)
This version uses pheatmap, which:
Automatically clusters genes/samples based on expression patterns.
Allows row and column annotations for extra metadata.
Supports custom color palettes (colorRampPalette(c("white","blue","red"))(100)).
Hierarchical Clustering (cluster_rows = TRUE, cluster_cols = TRUE)
Custom Color Palettes (colorRampPalette)
Row Annotations (annotation_row) for metadata
Adjustable Cell & Font Sizes (cellwidth, cellheight, fontsize_row)
Use ggplot2 version if you need highly customizable static heatmaps with precise control over aesthetics.
Use pheatmap version if you need automatic clustering and metadata annotations for gene/sample relationships.
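The clustering behind such a heatmap can be reproduced with base R. The sketch below assumes pheatmap's documented defaults of Euclidean distance and complete linkage, which match the defaults of dist() and hclust(); it is meant to illustrate the idea rather than replicate pheatmap's internals:

```r
# Same columns of mtcars as used for the heatmap
heatmap_data <- mtcars[c(1:7, 9, 11)]

# Hierarchical clustering of rows (cars) and columns (variables)
row_tree <- hclust(dist(heatmap_data))
col_tree <- hclust(dist(t(heatmap_data)))

row_tree$order  # row order that a clustered heatmap would use
plot(row_tree, cex = 0.6, main = "Row dendrogram")

# The variables are on very different scales; standardizing them first
# keeps large-valued columns such as disp from dominating the distances
scaled_tree <- hclust(dist(scale(heatmap_data)))
```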
This volcano plot is useful for visualizing differential expression in biological datasets. It helps identify significantly upregulated and downregulated genes based on fold change and statistical significance.
# Load necessary library
library(ggplot2)
# Read biological dataset
biological_data <- read.csv('adv_plot_7.csv')
# Generate volcano plot
ggplot(biological_data, aes(x = FoldChange, y = -log10(PValue), color = abs(FoldChange) > 2 & PValue < 0.05)) +
geom_point(alpha = 0.7, size = 3, shape = 16) + # Adjust point appearance
scale_color_manual(values = c("grey", "red")) + # Manually set color values
labs(title = "Volcano Plot of Differential Expression",
x = "Log2 Fold Change",
y = "-log10(P-Value)") +
theme_minimal() + # Minimal theme for clarity
theme(legend.position = "none") + # Remove legend
geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "blue") + # Significance threshold
annotate("text", x = 3, y = -log10(0.05) + 0.5, label = "Significance Threshold", color = "blue") # Threshold label
Add gene labels for top significant hits
ggplot(biological_data, aes(x = FoldChange, y = -log10(PValue), color = abs(FoldChange) > 3 & PValue < 0.05)) +
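geom_point(alpha = 0.7, size = 3) +
scale_color_manual(values = c("grey", "red")) + # Grey for non-significant, red for significant genes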
geom_text(aes(label = ifelse(abs(FoldChange) > 3 & PValue < 0.05, Gene, "")), hjust = 0.5, vjust = -0.5) +
theme_minimal() + # Minimal theme for clarity
theme(legend.position = "none") + # Remove legend
geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "blue") + # P-value cutoff line
geom_vline(xintercept = c(-3, 3), linetype = "dashed", color = "darkgreen") # Fold Change cutoff lines
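The logical condition used for coloring can also be stored as a column, which makes it easy to count or export the hits. A base R sketch on a made-up results table; the gene names and values are hypothetical, the column names follow those used in the plotting code, and the cutoffs of 2 and 0.05 match the first volcano plot:

```r
# Made-up results table; column names follow those used in the plots above
toy <- data.frame(
  Gene       = c("g1", "g2", "g3", "g4", "g5"),
  FoldChange = c(3.5, -2.8, 0.4, 2.5, -3.1),
  PValue     = c(0.001, 0.010, 0.500, 0.200, 0.030)
)

# Classify each gene by direction of change and significance
toy$Status <- "Not significant"
toy$Status[toy$PValue < 0.05 & toy$FoldChange >  2] <- "Up"
toy$Status[toy$PValue < 0.05 & toy$FoldChange < -2] <- "Down"

table(toy$Status)

# Height of the dashed significance line on the -log10 scale (about 1.3)
-log10(0.05)
```

A column like Status could then be mapped to color in place of the logical expression, which also gives separate colors for up- and downregulated genes.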