Strategy | No. students |
---|---|
Cognitive Reappraisal | 17 |
Expressive Suppression | 8 |
Rumination | 13 |
Problem Solving | 6 |
Acceptance | 10 |
Distraction | 8 |
Workshop
Introduction to RStudio and your first graphs
Exercises
🧠 Emotion‑Regulation Example
The goal
A sample of 62 first‑year psychology students completed the “Emotion Regulation Questionnaire” and were classified according to their primary emotion‑regulation strategy. You are going to type the data into R, then summarise and plot it.
The data are given as a frequency table (shown at the top of this page).
You will create a figure like this:
Creating the data
Start by making a vector called strategy that holds the strategy names.
Write the following in your script:
# strategy names
strategy <- c("Cognitive Reappraisal",
"Expressive Suppression",
"Rumination",
"Problem Solving",
"Acceptance",
"Distraction")
Remember, the shortcut for <- is Alt+- (hold the Alt key down, then hit the minus key).
Notice I have used a comment. Comment your code as much as possible!
Ensure your cursor is on the line with the command and press Control+Enter to send the command to the console to be executed.
Examine the ‘structure’ of the strategy object using str():
str(strategy)
chr [1:6] "Cognitive Reappraisal" "Expressive Suppression" "Rumination" ...
It’s a vector of 6 character values, chr.
Create a vector called freq containing the number of students using each strategy and examine it with str().
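One way to do this is sketched below. It assumes the counts are typed in the same order as the names in strategy, using the values from the frequency table.

```r
# number of students using each strategy,
# in the same order as the names in strategy
freq <- c(17, 8, 13, 6, 10, 8)

# examine the structure of freq
str(freq)
```

str() should report a numeric (num) vector of 6 values.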
Check that sum(freq) gives the answer you expect:
# the total number of students
sum(freq)
[1] 62
Packages
Commands like c(), sum(), and str() are in packages which are part of the ‘base’ R system. A package is a collection of related commands. Base packages are installed automatically when you install R.
Other packages, such as ggplot2 (Wickham 2016), need to be installed once and then loaded each session. ggplot2 is one of the tidyverse (Wickham et al. 2019) packages.
If you are working on a University computer (or the VDS) you do not need to install tidyverse.
If you are working on your own computer or using RStudio cloud you do need to install tidyverse.
To install a package:
Go to the Packages tab in the lower right pane. Click Install and type tidyverse into the box that appears. Do not do this if you are working on a University computer.
Wait until you get the prompt back. It will take a few moments, so be patient!
To use a package which is installed you have to load it with the library() function. You will need to do this whether you are working on your own computer or on a University computer.
Load the tidyverse:
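The command for this step is probably the following. It assumes tidyverse is already installed on the computer you are using.

```r
# load the tidyverse packages (including ggplot2) for this session
library(tidyverse)
```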
You will likely be warned of some function name conflicts but these will not be a problem for you.
Plotting the data with ggplot()
ggplot() takes a dataframe as an argument.
We can make a dataframe of the two vectors, strategy and freq, using the data.frame() function.
Make a dataframe called strategy_data:
strategy_data <- data.frame(strategy, freq)
Check the structure of
strategy_data
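A minimal sketch of this check, using str() as before. The vectors and the dataframe are recreated here only so that the snippet runs on its own.

```r
# recreate the two vectors and the dataframe
strategy <- c("Cognitive Reappraisal",
              "Expressive Suppression",
              "Rumination",
              "Problem Solving",
              "Acceptance",
              "Distraction")
freq <- c(17, 8, 13, 6, 10, 8)
strategy_data <- data.frame(strategy, freq)

# check the structure of the dataframe
str(strategy_data)
```

In R 4.0 or later this should describe a dataframe with 6 observations of 2 variables: strategy (chr) and freq (num).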
Click on strategy_data in the Environment to open a spreadsheet-like view of it.
A barplot
Create a simple barplot using
ggplot
like this:
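The plot code was lost from this page, so the following is a sketch of what a simple barplot of these data could look like, built from the pieces explained next. The data are recreated so that the snippet runs on its own.

```r
library(tidyverse)

# the data
strategy <- c("Cognitive Reappraisal", "Expressive Suppression",
              "Rumination", "Problem Solving",
              "Acceptance", "Distraction")
freq <- c(17, 8, 13, 6, 10, 8)
strategy_data <- data.frame(strategy, freq)

# a simple barplot: strategy on the x axis, freq as the bar height
ggplot(data = strategy_data, aes(x = strategy, y = freq)) +
  geom_col()
```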
ggplot() alone creates a blank plot.
ggplot(data = strategy_data) looks the same.
aes() gives the ‘aesthetic mappings’: how variables (columns) are mapped to visual properties (aesthetics), e.g., axes, colour, shapes.
Thus ggplot(data = strategy_data, aes(x = strategy, y = freq)) produces a plot with axes.
geom_col() is a ‘geom’ (geometric object). Geoms give the visual representations of the data: points, lines, bars, boxplots etc.
Note that ggplot2 is the name of the package and ggplot() is its most important command.
Using the help manual
‘Arguments’ can be added to the geom_col() command inside the brackets.
Commands do something, and their arguments (in brackets) can specify:
- what object to do it to
- how exactly to do it
Many arguments have defaults so you don’t always need to supply them.
Open the manual page for geom_col() using:
?geom_col
The manual page has several sections.
- Description: an overview of what the command does
- Usage: lists the arguments
- form: argument name = default value
- some arguments MUST be supplied, others have defaults
- ... means etc and includes arguments that can be passed to many ‘geoms’
- Arguments: gives the detail about the arguments
- Details: describes how the command works in more detail
- Value: gives the output of the command
Don’t be too perturbed by not fully understanding the information.
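If you want a quick look at the arguments without opening the manual page, base R can list them in the console. This is an optional extra rather than part of the exercise, and the exact arguments shown will depend on your version of ggplot2.

```r
library(tidyverse)

# print the arguments of geom_col() with their defaults,
# in the same "argument name = default value" form as the Usage section
args(geom_col)

# check that ... is one of the arguments
"..." %in% names(formals(geom_col))
```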
Customising the plot
Bar colour
Change the fill of the bars using fill:
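A sketch of this step, assuming "lightblue" as the colour (the colour used in the later examples on this page). The data are recreated so that the snippet runs on its own.

```r
library(tidyverse)

# the data
strategy <- c("Cognitive Reappraisal", "Expressive Suppression",
              "Rumination", "Problem Solving",
              "Acceptance", "Distraction")
freq <- c(17, 8, 13, 6, 10, 8)
strategy_data <- data.frame(strategy, freq)

# set the fill of every bar to one fixed colour inside geom_col()
ggplot(data = strategy_data, aes(x = strategy, y = freq)) +
  geom_col(fill = "lightblue")
```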
Colours can be given by their name, “lightblue” or code, “#ADD8E6”.
Look up colour by name or code
Change the bars to a colour you like.
fill is one of the arguments covered by .... fill is an ‘aesthetic’. If you look for ... in the list of arguments you will see it says:
Other arguments passed on to layer(). These are often aesthetics, used to set an aesthetic to a fixed value, like colour = “red” or size = 3. They may also be parameters to the paired geom/stat.
We just set the fill aesthetic to a fixed value.
Further down the manual, there is a section on Aesthetics which lists those understood by geom_col().
We can set the fill aesthetic to a fixed colour inside geom_col(), or map it to a variable from the dataframe inside aes() instead. Mapping means the colour will be different for different values in that variable.
Map the fill aesthetic to the strategy variable:
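A sketch of the mapping, with fill moved from geom_col() into aes(). The data are recreated so that the snippet runs on its own.

```r
library(tidyverse)

# the data
strategy <- c("Cognitive Reappraisal", "Expressive Suppression",
              "Rumination", "Problem Solving",
              "Acceptance", "Distraction")
freq <- c(17, 8, 13, 6, 10, 8)
strategy_data <- data.frame(strategy, freq)

# map fill to the strategy variable: each strategy gets its own colour
ggplot(data = strategy_data,
       aes(x = strategy, y = freq, fill = strategy)) +
  geom_col()
```

Because fill is now mapped to a variable, ggplot adds a legend showing which colour belongs to which strategy.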
Note that we have taken fill = "lightblue" out of the geom_col() and instead put fill = strategy in the aes().
Use the manual to put the bars next to each other. Look for the argument that will mean there is no space between the bars.
Use the manual to change the colour of the lines around each bar to black.
Changing the axes
We can make changes to the axes using:
- Changes to a discrete x axis: scale_x_discrete()
- Changes to a continuous y axis: scale_y_continuous()
ggplot automatically extends the axes slightly. You can turn this behaviour off with the expand argument in scale_x_discrete() and scale_y_continuous().¹
Remove the gap between the axes and the data:
ggplot(data = strategy_data, aes(x = strategy, y = freq)) +
  geom_col(fill = "lightblue",
           width = 1,
           colour = "black") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
Each ‘layer’ is added to the ggplot() command with a +.
Make your code easier to read by using white space and new lines:
- put spaces around = and -> and after ,
- use a newline after every comma in a command with lots of arguments
Look up scale_x_discrete in the manual and work out how to change the axis title from “strategy” to “Strategy”. Also change the y-axis title.
I would prefer to see the y-axis extend a little beyond the data. We can change the axis “limits” in scale_y_continuous():
ggplot(data = strategy_data, aes(x = strategy, y = freq)) +
  geom_col(fill = "lightblue",
           width = 1,
           colour = "black") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Strategy") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of students",
                     limits = c(0, 25))
Getting rid of the grey background
The grey grid background is useful for examining plots on a screen but for a report or publication you will want a more scientific style. Every aspect of the “theme” of a plot - the non-data elements such as fonts, background colours, axis line colours etc - can be controlled individually² but there are some handy built-in themes that apply several changes at once. One of these is theme_classic().
Add theme_classic() to the plot:
ggplot(data = strategy_data, aes(x = strategy, y = freq)) +
  geom_col(width = 1,
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Strategy") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of students",
                     limits = c(0, 25)) +
  theme_classic()
Changing the order of bars
The default ordering of a categorical variable like strategy is alphabetical. Often we want to change the order. For example, you might want a “Control” on the left or the categories might have an inherent order (e.g., small, medium and large). We can alter (mutate) the strategy variable using fct_relevel().
Make “Distraction” the first category:
strategy_data <- strategy_data |>
  mutate(strategy = fct_relevel(strategy, "Distraction"))
|> is called the “pipe”. A keyboard shortcut is Control+Shift+M.
The pipe puts the output of one command (on the left) as input to another command (on the right). It can be read as “and then”. You will learn more about it next week.
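As a small illustration of “and then”, the two commands below do the same thing. This sketch reuses the freq values from earlier and assumes R 4.1 or later, which is needed for |>.

```r
# number of students using each strategy
freq <- c(17, 8, 13, 6, 10, 8)

# without the pipe: freq is written inside the brackets
sum(freq)

# with the pipe: take freq "and then" sum it
freq |> sum()
```

Both commands give 62.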
Now plot again.
If you wanted Distraction and then Rumination you would do fct_relevel(strategy, c("Distraction", "Rumination"))
We can also order the categories by the values in another variable by using reorder() in the plot code.
Reorder the categories in strategy by the value in freq:
ggplot(data = strategy_data,
       aes(x = reorder(strategy, freq, decreasing = TRUE),
           y = freq)) +
  geom_col(width = 1,
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Strategy") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of students",
                     limits = c(0, 25)) +
  theme_classic()
We have one more thing to do before we finish. The overlapping category labels are not great. One solution for this is to add “wrapping” to the labels so that they are not too long. We can use the scales package function label_wrap(10) to do this.
Add labels = scales::label_wrap(10) to the scale_x_discrete() command.
strategy_data <- data.frame(strategy, freq)
ggplot(data = strategy_data,
       aes(x = reorder(strategy, freq, decreasing = TRUE),
           y = freq)) +
  geom_col(colour = "black", fill = "lightblue", width = 1) +
  scale_x_discrete(name = "Strategy",
                   expand = c(0, 0),
                   labels = scales::label_wrap(10)) +
  scale_y_continuous(name = "Number of students",
                     expand = c(0, 0),
                     limits = c(0, 25)) +
  theme_classic()
The number is the maximum number of characters in a line before it wraps to the next line.
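To see what the wrapping does, you can try it on a couple of labels outside a plot. This is a small sketch: wrap_10 and wrapped are names chosen for illustration, and it assumes the scales package is installed (it is normally installed along with the tidyverse).

```r
# a label-wrapping function with a width of 10
# (wrap_10 is a name chosen for this sketch)
wrap_10 <- scales::label_wrap(10)

# apply it to two of the strategy names
wrapped <- wrap_10(c("Cognitive Reappraisal", "Rumination"))

# cat() prints the labels with any added line breaks shown
cat(wrapped, sep = "\n\n")
```

The two-word label is split over two lines, while the single word “Rumination” is left on one line.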
References
Footnotes
There are also scale_x_continuous() and scale_y_discrete() functions for when you have those types of variable.