GGPLOT2 Tutorial
Introduction
This is a tutorial for using R and ggplot2 to make publication-ready plots for use in the social sciences.1 The tutorial assumes moderate familiarity with R and base functions. I primarily made this as a learning tool for my POSC 3003/6003 courses at Arkansas State University. As such, my focus is on applications relevant to political scientists, but I suspect this could be of interest to many others interested in R graphics. The associated video vignette (for students in my course) on the topic provides additional details and a walkthrough of every aspect of the code provided here. We will be using built-in (or generated) datasets for this entire document as I want this to remain as self-contained as possible. I hope this document can serve both as a guide for learning ggplot2, along with some tidyverse operations, and as a long-term resource on how to make a given graph. Thanks go to the many, many folks who have provided their code openly and freely for the rest of us to learn from. All credit goes to them.
I should note that I am far from the best expert on this topic, and there are likely better tutorials out there. It is also worth noting that there are many plot types I will not include. Finally, I will not be covering every single possible aspect of plot customization. In any case, I hope some folks find this useful. If you find issues or have suggestions please hit me up on GitHub (@cwimpy) or via e-mail.
I move along in a series of sections. First I introduce ggplot2 for the one reader that has never heard of it. Then I show a series of examples for getting started along with important features. After that I show how to make a plot look nice before moving into as many examples as I can conjure. I am always adding to this section. Finally, I conclude by showing some extensions and other helpful tips such as exporting plots and using packages written to deal with special ggplot2 problems.
The source code for this document can be found in my GitHub here: https://github.com/cwimpy/ggplot2-tutorial. Please also use this if you have suggestions or improvements!
The Grammar of Graphics
Now let’s move on to learning about ggplot2. ggplot2 is an R graphics package developed by Hadley Wickham to implement (a layered version of) the grammar of graphics system of data visualization proposed by Leland Wilkinson.2 The general idea is that all plots should have some basic features that make them interchangeable in the same way, no matter what is being plotted. Think of it as a language for reading and writing plots. In practice ggplot2 has become the gold standard for plotting statistical data in most statistics-heavy disciplines such as biostatistics, economics, political science, and statistics. The basic ggplot2 implementation of the grammar of graphics is summarized as follows:
- a default dataset with aesthetic mappings,
- one or more layers, each with a geometric object (“geom”), a statistical transformation (“stat”), and a dataset with aesthetic mappings (possibly defaulted),
- a scale for each aesthetic mapping (which can be automatically generated),
- a coordinate system, and
- a facet specification.
We will tackle the meaning of these items in practical terms as we proceed. The basic idea is that a plot can (perhaps should) be built in layers and should consist of a core set of items in order to adhere to the grammar of graphics.
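To make the list above a little more concrete, here is a minimal sketch that writes out every component explicitly using the built-in mtcars data. In everyday use most of these lines are left off because ggplot2 supplies them by default, so treat this as an illustration of the grammar and not as recommended style.

```r
library(ggplot2)

# Every piece of the grammar written out by hand
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # default data and aesthetic mappings
  geom_point(stat = "identity") +  # one layer: a point geom with the identity stat
  scale_x_continuous() +           # a scale for the x aesthetic
  scale_y_continuous() +           # a scale for the y aesthetic
  coord_cartesian() +              # the coordinate system
  facet_null()                     # the facet specification (a single panel)

p
```

Dropping the last four lines should give the same picture, which is a handy way to convince yourself that the defaults are doing this work behind the scenes.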
Getting Started
We will use the mtcars dataset for many of our example plots. Let’s have a look at what is in those data. Note that I loaded the tidyverse package quietly at the beginning of this document and that it includes ggplot2 and other tools we will be using throughout (dplyr, for example).
glimpse(mtcars)
## Rows: 32
## Columns: 11
## $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
## $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
## $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
## $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
## $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
## $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
Now we can draw a blank plot where our mapped aesthetics are miles per gallon (mpg) and car weight (wt).
ggplot(mtcars, aes(wt, mpg))
Now we can get started with ggplot(), the primary (and necessary) engine underlying the ggplot2 package.
Aesthetics
Our plot above, created by ggplot(mtcars, aes(wt, mpg)), included aesthetic mappings for our two variables, car weight and miles per gallon. The mapping provided by aes(x, y) is by far the most important part of aesthetics, but it is also the easiest. In most common use cases of ggplot2 the (x, y) portion of the code simply refers to our x and y variables, two things we should already know! We will start adding other aesthetics as we move along but these are absolutely key.
A Note on Coding Style for ggplot2
Before moving further, I want to make sure that the code structure is clearly understood. Experienced R users (or otherwise experienced programmers) will likely struggle with this far less than I did for the longest time. Let’s first look at the core aspects of the ggplot() code structure:
ggplot(data = data, mapping = aes(x = x, y = y, ...), ...)
The first part of what we see here is the ggplot() function itself. The second data, x, and y in each pair (the ones to the right of the equals signs) all refer to those in our data. Inside the outer set of parentheses we first point the plotting function to our data (this needs to be an R data frame, or a tibble, that is available in the workspace). This is the first place where we see major variations in code. Some people will leave off the data = part and just type the name of the data object as a shortcut. I do this as well, but it can be confusing for folks new to ggplot2. The next block is for mapping our aesthetics. We rarely see the mapping = part of the code, and we can again just shortcut things by using aes(). Within the aes() parentheses, we then map our x and y variables by following the same convention. That is, we either say aes(x = x, y = y) or simply aes(x, y). Most of the remaining parts of the function typically need to be specific, if for nothing else human readability. We will get to those later.
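As a quick sketch of the two spellings side by side, using mtcars, both calls below should describe the same plot. The object names p_long and p_short are just labels I made up for the comparison.

```r
library(ggplot2)

# Fully spelled out: every argument is named
p_long <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Shortcut: the same arguments matched by position
p_short <- ggplot(mtcars, aes(wt, mpg))
```

Which one you type is a matter of taste; the named version is longer but leaves no doubt about what goes where.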
The next place we see major variation in code is whether folks assign their plot to an object or just run the code outright. This is not a huge distinction for actually making a plot when using well-structured code, but how we read the code certainly changes. Since everything in R is an object, we can assign the whole plot to an object just like we can a dataset, variable, function, etc. The best way to see how this works is with an example. Returning to where we were above, we could instead assign our blank plot to an object called “p.” Why “p?” No reason other than it is most commonly used in these cases. You could call it “plot,” or anything you like as long as it adheres to R’s object naming standards (Wickham (2019)).
p <- ggplot(mtcars, aes(wt, mpg))
Now, using R’s object assignment operator, <-, we have assigned our plot to the object called “p.” But notice also that the plot did not automatically return. To get that we need to simply ask R for it using the object name.
p
Now, where things sometimes get confusing, and this is jumping ahead, is when folks add additional layers to the plot. If we wanted to see some points on the plot with our initial code above we could do this:
ggplot(mtcars, aes(wt, mpg)) +
geom_point()
Notice that we do this by adding a + (telling ggplot() there is a new layer incoming) and then we type the new code. I like to put each new layer on its own line and indent a little after the first line. That helps human readability and allows us to root out bugs in the code. Using the object method we could do something similar.
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_point()
Can you see any issue with this approach? It does work, but in my mind using the object + method is somewhat void of utility (short of some type of exhibition of functionality for vignettes) given that to get the new layer we have to call the object and the new layer together. What if, though, we reassigned the new layer to our object like this?
p <- ggplot(mtcars, aes(wt, mpg))
p <- p + geom_point()
Now, the new geom_point() layer is also assigned to p and when we call p we should get it all:
p
There are uses to this approach, but it needs to be thought through. It makes reading the code a bit more difficult, and all of the assignment operators and plus signs can get busy. Keep in mind that you could also create new objects with each layer should that be useful. We will return to this later when I talk about saving our plots, but that should really only be a final step taken with care. For now, just know that when you see something like p <- ggplot(...) you should be able to just return p to get to the same place as you would be by using ggplot(...) directly.
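The idea of a new object for each layer might look something like the sketch below. The object names (base_plot, scatter, with_fit) are hypothetical and only there to show that each step keeps its own copy of the plot.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(wt, mpg))   # axes only, no layers yet
scatter   <- base_plot + geom_point()       # the base plus a layer of points
with_fit  <- scatter +
  geom_smooth(formula = y ~ x, method = lm) # the points plus a linear fit

scatter    # draws the scatterplot
with_fit   # draws the scatterplot with the fitted line
```

The earlier objects are left untouched, so you can go back to base_plot and branch off in a different direction without rebuilding anything.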
Finally, with the advent of the so-called pipe (%>%) operator from the magrittr package and later dplyr, among others, we will sometimes see folks do this:
mtcars %>%
ggplot(aes(wt, mpg))
Indeed, I do this on occasion. Do not be scared off by the code, however, as we are just replacing our ggplot(data = data, ...) with data %>% ggplot(...) and thus it is a simple reordering of the code. Since we cannot work with pipes beyond this in ggplot2 it really just becomes a niche habit and not particularly useful otherwise.3
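For what it is worth, here is a small sketch of the same reordering with a data step in front of the plot. The filter on cyl is purely an illustration I added; the point is only that the pipe carries the data up to ggplot() and the + takes over from there.

```r
library(dplyr)
library(ggplot2)

p <- mtcars %>%
  filter(cyl != 6) %>%      # data step: drop the six-cylinder cars
  ggplot(aes(wt, mpg)) +    # from here on, layers are added with +
  geom_point()

p
```

Mixing up %>% and + at the handoff is an easy mistake to make when working this way, so it is worth reading the line endings carefully.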
Geometric Shapes (geom) and Statistical Transformations (stat)
Let’s get back to how to make plots. Our next step is adding geometric shapes and statistical transformations. We had a glimpse at this earlier, but let’s resume with the same plot:
ggplot(mtcars, aes(wt, mpg)) +
geom_point()
In this case, we have asked ggplot() to add points to the plot and thus we now have a scatterplot. The “geoms” are numerous, and they are the key to what type of plot you end up with whether it be a bar graph, line graph, histogram, or more. The ggplot2 reference documentation lists them all. Let’s add a linear best fit (regression) line to our scatterplot using another geom called geom_smooth().
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm)
Note that geom_smooth() needs to know what kind of method we are using and the formula for the calculation of the slope. In this case we are using similar syntax to that of base R’s regression model function, lm. Our formula simply tells ggplot() we want to see the relationship denoted by \(y = f(x)\). Notice also that this is our first example of layering two geoms on the same plot. This is key to understanding how the ggplot2 system works. If we want to plot points and labels (see below), we first plot the points and then we add a layer of labels that are relative to the points. We do not plot points and label them at the same time like we would in most other graphic implementations. Once you get this, you will begin to think of your plots as a set of layers that can be added or removed based on your needs.
What if we instead wanted a horizontal line in the middle of the scatterplot? Note that this type of geom requires that you know the coordinates of the data you are plotting.
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_abline(intercept = 22.5, slope = 0)
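As an aside, ggplot2 also has a dedicated geom for horizontal lines, geom_hline(), which takes the y position directly. This is a swapped-in alternative to the geom_abline() call above, not a correction of it; the sketch below should draw the same line.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_hline(yintercept = 22.5)  # same line as geom_abline(intercept = 22.5, slope = 0)

p
```

geom_vline() is the vertical counterpart and takes xintercept in the same way.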
Let’s look at some other options starting with a bar plot. We will start by looking at the number of cars for each number of possible carburetors.
ggplot(mtcars, aes(carb)) +
geom_bar()
We will get back to some more interesting bar plots later, as this one is akin to a histogram, which we can also easily do since there is a geom for it. Returning to mpg:
ggplot(mtcars, aes(mpg)) +
geom_histogram(bins = 10)
Relatedly, we can easily make a density plot:
ggplot(mtcars, aes(mpg)) +
geom_density()
There are many types of geoms, but hopefully now you are getting an idea of how we use them. We will see quite a few more by the time we are done so for now I want to move on to the next basic building block of a plot.
Scales
Scales are used within ggplot2 to determine how the data being plotted looks to those of us reading the plot. I use scales in two primary ways, one to control the colors of the geoms and the other to format the axes. There are more extensions as well, but let’s just focus on these two for now to create an illustrative example.
ggplot(mtcars, aes(wt, mpg, color = carb)) +
geom_point() +
scale_color_continuous()
Notice that first I added to our aesthetics by using color = carb to say I wanted to color the mapped data by the number of carburetors. I then tell ggplot() that we want to treat this as a continuous scale. There are lots of variants to scale_color_... and I will cover at least several of them below. The other way in which I use scales is for axis formatting. Since our x-axis here is measured in thousands of pounds it gives us a nice chance to fix things up. Rather than writing something like (in thousands) in the axis title, we can just build some formatting right into our code. A nifty feature of ggplot2 is that we can do mathematical operations on variables right in the plotting function. So, I am going to multiply the wt variable by 1,000.
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
scale_color_continuous()
That’s much better, and we will work on the axis titles later. We can go further, however, by adding a comma separator. This involves using another package, scales, but we will call it using the :: syntax and thus using library() to explicitly load the package is not necessary.
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
scale_color_continuous() +
scale_x_continuous(labels = scales::comma)
Notice here that our additional line of code was scale_x_continuous(labels = scales::comma). With this we are saying we want to rescale the x-axis (we can do y as well). Then, we are using the labels = option to call the comma formatting from the scales package. This is another really cool use of ggplot2 as we can do similar formatting for percents and currency, for example. Those are two places I see folks really mail it in graphing.4
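Here is a minimal sketch of the percent and currency formatting just mentioned, using scales::percent and scales::dollar. The little data frame is generated for illustration only, and its name and columns (votes, turnout, spending) are hypothetical.

```r
library(ggplot2)

# Generated data for illustration only; not a real dataset
votes <- data.frame(
  turnout  = c(0.42, 0.55, 0.61, 0.68),
  spending = c(120000, 250000, 410000, 560000)
)

p <- ggplot(votes, aes(spending, turnout)) +
  geom_point() +
  scale_x_continuous(labels = scales::dollar) +  # 250000 is shown as $250,000
  scale_y_continuous(labels = scales::percent)   # 0.55 is shown as 55%

p
```

The pattern is the same as with scales::comma: the data stay as plain numbers and only the axis labels change.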
Coordinate Systems
You have probably already noticed that ggplot2 builds graphs on Cartesian coordinates by default, and it explicitly draws a grid of these coordinates thus mapping the x and y aesthetics quite clearly. We can decide whether we need this level of detail for the coordinates later, but for now it is important to understand a few basics. First and foremost, the default is Cartesian and these can be flipped (using the aptly named coord_flip() function) such that x and y swap places. The most basic example is to use our bar plot from above:
ggplot(mtcars, aes(carb)) +
geom_bar() +
coord_flip()
We will get to some nicer examples of flipped plots later, but there is just a bit more to say on coordinates. You can do some basic tweaking of the default coordinates by fixing the aspect ratio (coord_fixed()), projecting maps (coord_map()), or using coord_trans() to do transformations of the coordinates (e.g., logarithms). Finally, all of this may make you wonder how in the world you would make a pie chart using ggplot2. The answer from many (and sometimes me) would be that you shouldn’t. But, if you must, coord_polar() is a good place to start.5
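If you really must, one common way to get there is to draw a single stacked bar and wrap it around a circle. The sketch below does that for the number of cylinders in mtcars; it is one approach among several and the choice of variable is mine.

```r
library(ggplot2)

# A stacked bar with a single x position, wrapped around a circle
pie <- ggplot(mtcars, aes(x = "", fill = factor(cyl))) +
  geom_bar(width = 1) +        # one stacked bar counting cars per cylinder level
  coord_polar(theta = "y") +   # map the bar's height to the angle
  labs(x = NULL, y = NULL, fill = "Cylinders")

pie
```

Leaving off the coord_polar() line shows the stacked bar underneath, which makes it easier to see that a pie chart is just a bar chart in a different coordinate system.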
Facets
Facets are essentially a way of paneling your plot based on different categories or levels of variables. Let’s return to our scatterplot from above for an example that will make things clear. The difference is that this time we will split the observations into two plots, one for manual transmissions (1) and one for automatic (0).
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
facet_wrap(vars(am))
There are two types of faceting, facet_grid() and facet_wrap(). In this case they are essentially equivalent but when we get to more categories below we will see the distinction more clearly. In terms of the syntax, much of the older code you will see would instead have something like this facet_wrap(.~am) but the newer versions of the package allow for the cleaner usage of vars(). You can facet by multiple variables if you wish, and you can control many things about how the facets look and are organized. We will have more examples below.
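To give a flavor of faceting by more than one variable, here is a short sketch with facet_grid(), putting transmission type on the rows and number of cylinders on the columns. The choice of cyl as the second variable is just for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(rows = vars(am), cols = vars(cyl))  # 2 rows (am) by 3 columns (cyl)

p
```

This is where the two functions part ways: facet_grid() lays the panels out as a strict rows-by-columns table, while facet_wrap() would wrap the same six panels into whatever rectangle fits.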
Making Your Plot Look Good
There are many, many opinions on how plots should look. As such, the style we end up with here will mainly be my own preference. Hopefully between this and the next section I can give you the tools you need to make the plots look as you would like them to. This is also where the library of options within ggplot2 explodes, so I am certainly not calling this a comprehensive treatment of plot improvement beyond the default.6
Virtually everything about a plot can be changed. We will cover the fundamentals but I am sure some things will be left off. Luckily an online search often points you to a Stack Overflow thread where someone has experienced a similar issue before. Let’s get started with some basics. Almost everything we want to change about the look of a plot happens within the theme() function, but the main exception is the labels for the axes, title, subtitle, and caption. These use the labs() function. Let’s see how that works.7
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous() +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data")
These are fairly easy, just pay attention to the quotation marks and commas. Now, what if we want a better legend title? As with some things in ggplot2 there are multiple routes to get there. I typically just use name = "" inside of the scaling function as follows:
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data")
We will deal with many more legend examples as we go along. Before getting into that, however, let me talk about some of the basics in terms of plot styling. As I mentioned above, most of our remaining action here will be inside of the theme() function. One of the original hallmarks of ggplot2 was the grey background in the plotting (panel) area with the white grid lines intersecting on both major and minor grids corresponding with the axis tick labels and halfway between them. That is what we have been seeing so far. Let’s start by changing the background to white.
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme(panel.background = element_rect(fill = "#FFFFFF"))
Notice that I started here with theme() and then referenced the specific aspect of the theme I wanted to change. panel.background refers to the plotting area. plot.background, on the other hand, refers to the area outside the panel. For example, we could do this to fill in the area around the panel with A-State red:
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme(panel.background = element_rect(fill = "#FFFFFF"), plot.background = element_rect(fill = "#cc092f"))
Notice here that I added the additional theme() option within the same function, but for clarity you could also just have a series of theme functions each on their own line. In any case, you would want to make your code legible as the options can get lengthy quite quickly. Now, if you ask me, no color at all in the background is the way to roll. That will leave us with a blank (albeit white) canvas for both the plot and panel.
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme(panel.background = element_blank(),
plot.background = element_blank())
Notice here that we have gone from specifying a color for the rectangle elements (element_rect()) to just making those elements blank. This lingo will become clearer the more time you spend with it, so if it is confusing now please just hang around a little longer. I like the grid lines as a default, so let’s bring them back in a casual grey but just for the major grids:
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme(panel.background = element_blank(),
plot.background = element_blank(),
panel.grid.major = element_line(color = "#C4C4C4"))
Notice that we used panel.grid.major here. As you may suspect, panel.grid.minor would take care of those minor lines if we wanted them back. But, what may be less obvious, is that panel.grid would handle the major and minor grids together if you so wished. This goes for lots of things in the themes. For example, if we wanted to make our x-axis text 12-point we would do this: theme(axis.text.x = element_text(size = 12)) but we could make all the axis text 12-point by doing this: theme(axis.text = element_text(size = 12)). Again, this same pattern applies to many of the elements for which there are multiple iterations of the same thing.
But, ugh, now our grid lines conflict with our axis tick colors. We have two options: we could remove the ticks or change the colors (a lame third option would be to leave things as is). I prefer to just remove the ticks in this case:
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme(panel.background = element_blank(),
plot.background = element_blank(),
panel.grid.major = element_line(color = "#C4C4C4"),
axis.ticks = element_blank())
This gets us to a fairly nice looking plot. We will explore some additional theme() options below.
Themes
In terms of the overall look of the plot, ggplot2 includes a number of built-in themes. Let’s take a look at most of them here.8 Keep in mind you can do additional customization to any of these themes but your options will need to be added after the theme itself. Otherwise the change in theme would overwrite all of your changes.
Black & White
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_bw()
Classic
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_classic()
Dark
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_dark()
Light
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_light()
Linedraw
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_linedraw()
Minimal
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_minimal()
Void
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_point() +
geom_smooth(formula = y ~ x, method = lm) +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(labels = scales::comma) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_void()
Building Your Own Theme
The best theme is your own theme. —Cameron Wimpy
Seriously, I think the best way to really enjoy ggplot2 is to really take control of how the plot looks. This matters less for exploratory efforts and drafting, but good looking plots can set your work apart in finished products. If we return to all the things we did to our theme earlier, we could actually roll those into our own theme function and thus our tweaks for a given plot would be minor since the new default (using our function) would already look close to how we like it. Notice, inside the theme() function below I am just calling a bunch of theme options, some of which you already know.9 The rest of what I am doing here is just creating the function. I am overwriting the minimal theme as a starting point and giving you a place to update your fonts. Since fonts can be tricky and are often user/OS-specific, I will let you learn about that separately. But, as an example, I use the “SF Pro Text” font from Apple. To use this here, I would just replace "Helvetica" below with "SF Pro Text".10
theme_cam <- function() {
theme_minimal(base_size = 10, base_family = "Helvetica") %+replace%
theme(
panel.background = element_blank(),
panel.border = element_blank(),
plot.background = element_blank(),
plot.margin = unit(c(1,2,1,1), "cm"),
legend.box.background = element_rect(fill = "white", color = NA),
legend.key = element_rect(fill = NA, color = NA, size = 4),
legend.text = element_text(color = "black", size = 12),
legend.title = element_text(color = "black", size = 14),
legend.position = "bottom",
legend.direction = "horizontal",
panel.grid.major = element_line(color = "#C4C4C4"),
panel.grid.minor = element_blank(),
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black", size = 14,
margin = margin(t = 0, r = 10, b = 0, l = 0)),
plot.title = element_text(size = 16, color = "black", hjust = 0.5,
vjust = 2, margin = margin(b = 10)),
plot.subtitle = element_text(size = 14, color = "black", hjust = 0.5,
margin = margin(b = 12)),
plot.caption = element_text(size = 9, margin = margin(t = 10), color =
"black", hjust = 0)
) }
Now that we have done that let’s look to see how we end up. I will talk about some of the new theme() options below.
ggplot(mtcars, aes(wt * 1000, mpg, color = carb)) +
geom_smooth(formula = y ~ x, method = lm) +
geom_point() +
scale_color_continuous(name = "Carburetors") +
scale_x_continuous(breaks = c(1500,2000,2500,3000,3500,4000,4500,5000,5500), labels = scales::comma) +
scale_y_continuous(breaks = c(5,10,15,20,25,30,35)) +
labs(x = "Car Weight in Pounds", y = "Miles per Gallon",
title = "The Relationship Between Car Weight and Miles Per Gallon",
subtitle = "1974 Data") +
theme_cam()
Notice that one thing I did here was to plot the linear fit first so the points sit on top. I also specified the exact breaks I wanted on the axis scales.
Otherwise, let’s talk a bit more about the theme options I have added. Most of them have some explanation built right into their names. plot.margin allows for adjusting the extra space around the outside of the plot. I like to move the legend to the bottom by using legend.position = "bottom" and then make it horizontal by using legend.direction = "horizontal". The other aspects worth mentioning here are the adjustments in plot.title, plot.subtitle, and plot.caption. hjust = 0.5 moves the text to the center of the plot. In this case, 0 would be the far left and 1 would be the far right. 0.5, of course, falls right in the middle. vjust = 2 just moves the title (in this case) a little higher. margin = margin(b = 10), for example, adds 10 points of space in the margin below. For these, b stands for bottom, t for top, l for left, and r for right. Most of the other items should make sense based on what we have seen so far but we will see a bit more on theme options in the section that follows.
Example Code for Lots of Plots
In this section I simply make as many plots as I can to provide example code and useful starting points. It is important to again stress that some of these examples are my own takes on other examples provided online by other generous R/ggplot2 users. The credit should certainly go to them. In a few cases I also provide original examples or something from my own work.11 I expect this section to be the most living part of this document so check back (probably not often) should you be interested in updates. In a number of cases we need to do some minor data work before making our plot; hopefully you find those examples to be useful as well.
Bar Plots
Let’s start with a basic bar chart.
ggplot(mtcars, aes(carb)) +
geom_bar() +
labs(x = "Number of Carburetors", y = "Count") +
theme_cam()
What if we wanted to treat each level for the number of carburetors as discrete? We can just wrap it in factor() and then use fill = inside our aesthetic (aes()) function.
ggplot(mtcars, aes(factor(carb), fill = factor(carb))) +
geom_bar() +
labs(x = "Number of Carburetors", y = "Count") +
theme_cam()
Ugh, who needs that ugly legend title? We will get to those in some other examples but for now let’s just kill the legend.
ggplot(mtcars, aes(factor(carb), fill = factor(carb))) +
geom_bar() +
labs(x = "Number of Carburetors", y = "Count") +
theme_cam() +
theme(legend.position = "none")
There is more than one way to remove the legend but this one is the clearest in my view. Notice that by putting theme(legend.position = "none")
below theme_cam
we overrode my preference for the legend in my custom theme function. Had we put it above, the legend would still be there. Ordering matters in these things.
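To make the ordering point concrete, here is a small sketch that builds the same plot both ways. It uses the built-in theme_minimal() as a stand-in for the custom theme (an assumption, so the snippet runs on its own), and it also shows two other common ways to drop a legend that do not depend on where the theme call sits.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(factor(carb), fill = factor(carb))) +
  geom_bar() +
  labs(x = "Number of Carburetors", y = "Count")

# Legend removed: the tweak comes after the complete theme
hidden <- base + theme_minimal() + theme(legend.position = "none")

# Legend returns: the complete theme replaces the earlier tweak
shown <- base + theme(legend.position = "none") + theme_minimal()

# Two alternatives that do not depend on theme ordering
via_guides <- base + theme_minimal() + guides(fill = "none")
via_layer <- ggplot(mtcars, aes(factor(carb), fill = factor(carb))) +
  geom_bar(show.legend = FALSE) +
  theme_minimal()
```

Printing hidden and shown side by side should make the difference obvious. With a different complete theme, the legend in the second version would reappear wherever that theme places it.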
Grouped Bar Chart
How about a grouped bar chart for horsepower by cylinder but grouped by transmission type?
ggplot(mtcars, aes(cyl, hp, fill=factor(am))) +
geom_bar(stat = "identity", position = position_dodge()) +
labs(title="Grouped Bar Chart",
x = "Cylinders",
y = "Horsepower") +
scale_fill_discrete(guide = guide_legend(title = "Transmission", nrow = 1),
labels=c("Automatic", "Manual")) +
scale_x_continuous(breaks = c(4,6,8)) +
theme_cam()
Notice some new options here. This is the first time I have used a discrete scale (scale_fill_discrete
), and this allows us to name the legend easily using the options available in guide =
and then provide some custom labels for the discrete scale. Let’s go one step further and manually define the colors using scale_fill_manual
.
ggplot(mtcars, aes(cyl, hp, fill=factor(am))) +
geom_bar(stat = "identity", position = position_dodge()) +
labs(title="Grouped Bar Chart",
x = "Cylinders",
y = "Horsepower") +
scale_fill_manual(guide = guide_legend(title = "Transmission", nrow = 1),
labels=c("Automatic", "Manual"),
values = c("#008000","#ff0000")) +
scale_x_continuous(breaks = c(4,6,8)) +
theme_cam()
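One thing worth knowing about the grouped charts above: they pass every car in mtcars to the bar layer, so several bars land on the same spot and, as far as I can tell, only the tallest one in each group is visible. If the goal is a summary such as mean horsepower, a possible variation is to aggregate first. The sketch below does that with base R, uses a named color vector so each transmission level is tied to its color explicitly, and swaps in theme_minimal() as a stand-in theme (an assumption).

```r
library(ggplot2)

# Average horsepower for each cylinder/transmission combination (base R)
hp_means <- aggregate(hp ~ cyl + am, data = mtcars, FUN = mean)

# Named vector: level "0" (automatic) is green, level "1" (manual) is red
trans_colors <- c("0" = "#008000", "1" = "#ff0000")

p_means <- ggplot(hp_means, aes(factor(cyl), hp, fill = factor(am))) +
  geom_col(position = position_dodge()) +
  labs(title = "Grouped Bar Chart",
       x = "Cylinders",
       y = "Mean Horsepower") +
  scale_fill_manual(name = "Transmission",
                    labels = c("Automatic", "Manual"),
                    values = trans_colors) +
  theme_minimal()

p_means
```

Here geom_col() is shorthand for geom_bar(stat = "identity"), and treating cyl as a factor removes the need to set the axis breaks by hand.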
Stacked Bar Chart
For this one I am going to switch to a similar dataset, mpg
. I am going to capitalize the first letter of each manufacturer name using a helper function, but first I am going to make a working copy of the data.
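# Work on a copy so the built-in mpg data stays untouched
mpg1 <- mpg
# Helper: uppercase the first character of each string
capitalize <- function(string) {
substr(string, 1, 1) <- toupper(substr(string, 1, 1))
string
}
mpg1$manufacturer <- capitalize(mpg1$manufacturer)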
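A quick check of what the helper does, as a small sketch: it is vectorized and changes only the first character of each string, so a two-word name keeps its lowercase second word. The input strings below are just illustrative values.

```r
# Same helper as above: uppercase only the first character of each string
capitalize <- function(string) {
  substr(string, 1, 1) <- toupper(substr(string, 1, 1))
  string
}

capitalize(c("audi", "land rover"))
# "Audi" "Land rover"
```

If full title case were wanted instead, tools::toTitleCase() from base R is one option to look into.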
Now we can make the plot. Notice some changes here. First, I am calling the fill aesthetic within geom_bar
, and I also adjust the bar width. From there I am doing some adjusting on the theme based on how this plot turns out.
ggplot(mpg1, aes(manufacturer)) +
geom_bar(aes(fill = class), width = 0.5) +
labs(title="Stacked Bar Chart",
subtitle="Manufacturer of Vehicles",
x = "Manufacturer",
y = "Count") +
scale_fill_discrete(guide = guide_legend(title = "Class")) +
theme_cam() +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
theme(legend.position = "right", legend.direction = "vertical")
Scatterplots
We already have a working scatterplot example, so I will start with something a little different and go from there.
Dot Charts
Line Charts
Extensions
One of the most amazing aspects of ggplot2
is the sheer number of extensions that smart folks have created for it. In and of itself, ggplot2
solves more plotting problems than almost any other plotting package I know of (within R
or not). When you add the literally dozens of additional extensions (usually prefixed with gg
) it becomes a tour de force in data visualization. The place to start is the extensions gallery on the main ggplot2
website. Some really incredible examples include gganimate
for making animated gifs, ggthemes
for some ready-made additional themes (including one that makes your graph look like it was made in default Stata!), ggforce
for some serious added functionality, and ggTimeSeries
for time-specific graphs. There are, of course, many more that are worth exploring. In the few examples that follow I am not showing the code, as the graphs are just proofs of concept, mainly from the package vignettes. If you really want to learn these packages I recommend exploring them directly. If you really want the code, it is in the .Rmd document that generates this site.
gganimate
ggthemes
ggforce
ggTimeSeries
It also warrants mentioning that many packages include ggplot2
objects in their own flavor of making graphs. The syntax ends up being different in most cases to get started, but the output is a ggplot2
object that can then be manipulated. One of my favorites in this regard is the sjPlot
package from @strengejacke
which I use for plotting Likert scales (see my example below). Others include urbnmapr
from the Urban Institute for making easy-peasy U.S. state maps (who wants to do that anyway?), choroplethr, also for maps, and ggplotly() from the plotly package for interactive graphics.
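To illustrate the idea that a plot returned by another package can still be manipulated, here is a minimal sketch. The quick_hist() function is hypothetical and simply stands in for any package function that builds a ggplot2 object for you; it is not taken from any of the packages named above.

```r
library(ggplot2)

# Hypothetical stand-in for a package function that builds a plot for you
quick_hist <- function(data, column) {
  ggplot(data, aes(.data[[column]])) +
    geom_histogram(bins = 10)
}

p <- quick_hist(mtcars, "mpg")

# Because the result is an ordinary ggplot object, the usual additions still apply
p2 <- p +
  labs(x = "Miles per Gallon", y = "Count") +
  theme_minimal()

p2
```

The same pattern should carry over to real packages whenever their plotting functions return a ggplot2 object, though each package's documentation is the place to confirm that.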
sjPlot
Where to Learn More
There is no shortage of other ggplot2
tutorials available online. Just snoop around until you find one you like via your search engine of choice. There are, however, a few stops I recommend along the way:
References & Notes
The tidyverse website is a wonderful place to spend your time.
If you really want to learn about this I suggest buying the respective books from Wickham, Wilkinson, and possibly others (Wilkinson (2006), Wickham (2016), Wickham and Grolemund (2017)). The R for Data Science book is also a great resource. I do not know these authors but highly respect their work (obviously, that’s the point of all this!).
Note that you need to load a package that uses the pipe in order to use it as ggplot2 does not load it automatically.
Personally, I am not a fan of using whole numbers and then putting “pct” as an axis title. Even worse is when one uses a decimal and calls it a percentage, but hey, we all have our quirks.
ggforce has some nice implementations of pie charts, see example in this document.
Years ago when I was a graduate student you wanted to use the default to show you could use R and ggplot2. These days, since everyone is doing it, the goal is often to move as far from the default as possible.
The caption here comes from the official description of mtcars.
For more themes, see the section in this document where I reference the ggthemes package. To be really cool, see the section that follows on creating your own theme.
Note that you would need to install the font and make sure R knows how to find it and that you are using the exact right name. Again, if you are into this sort of thing, I suggest learning about it separate and apart from ggplot2.
In these cases to fully replicate the plot you will want to fork my repo and get the data.