Introduction to R and RStudio

Published

14 July, 2025

Introduction

  • Finding your way round RStudio
  • Typing in data and plotting it
  • Importing data: working directories and paths

Finding your way round RStudio

RStudio: live demonstration

  • the panels
    • Top left: Script - write and edit code and comments to keep
    • Bottom left: Console - where commands get executed and can be typed
    • Top right: Environment - where you can see the objects you have created; History - of commands
    • Bottom right: Files - a file explorer; Plots; Packages; Help
  • making yourself comfortable
  • typing in the console: sending commands
  • using R as a calculator
  • assigning values
  • where to see objects
  • using a script - make sure to execute
  • comments #
  • data types and structures
  • functions c()
  • types of R files: .R, .RData, .Rhistory
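If you want to recap a few of these ideas in code, here is a small sketch of using R as a calculator, assigning values and checking data types. The object names x and v are just illustrative choices.

```r
# Using R as a calculator
3 + 4 * 2            # 11: multiplication happens before addition

# Assigning a value to an object with <-
x <- 5
x * 2                # 10

# Some common data types
class(x)             # "numeric"
class("black")       # "character"
class(TRUE)          # "logical"

# c() combines values into a vector (a one-dimensional data structure)
v <- c(2, 4, 6)
length(v)            # 3
```

Run each line with Control+Enter and watch the objects x and v appear in the Environment.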

Overview of demonstration

There are several ways you can recap the demo at a later date:

🖼️ Refer to this infographic (a larger format version is also available)

📖 Read First Steps in RStudio

📹 Watch

Typing in data and plotting it

🎬 We code together

🛝 You try!

The goal 🐈 plot

We will work with some data on the coat colour of 62 cats. You are going to type the data into R, summarise them and plot them.

The data are given as a frequency table:

Frequency of coat colours in 62 cats

Coat colour      No. cats
black                  23
white                  15
tabby                   8
ginger                 10
tortoiseshell           5
calico                  1

You will create a figure like this:

Getting set up

  • 🎬 In RStudio do File | New project | New directory. Be purposeful about where you create it and what you name it. I suggest cats-1.

  • 🎬 Make a new script and save it as type-data-and-plot.R to carry out the rest of the work.

Creating the data

Start by making a vector called coat that holds the coat colours.

🎬 Write the following in your script:

# coat colours
coat <- c("black", "white", "tabby", "ginger", "tortoiseshell", "calico")

The shortcut for <- is Alt+- (hold the Alt key down, then press the minus key). On a Mac use Option+-.

Make sure your cursor is on the line with the command and press Control+Enter (Cmd+Enter on a Mac) to send the command to the console to be executed.

I have used a comment: anything after a # is ignored by R. Comment your code as much as possible!

🛝 Create a vector called freq containing the number of cats with each coat colour.

Answer - don’t look until you have tried!
# numbers of cats with each coat colour
freq <- c(23, 15, 8, 10, 5, 1)
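Once both vectors exist, a couple of quick checks are a good habit: each coat colour should have exactly one frequency, and the frequencies should add up to the 62 cats. A minimal sketch (the two vectors are recreated so it runs on its own):

```r
# Recreate the two vectors so this snippet runs on its own
coat <- c("black", "white", "tabby", "ginger", "tortoiseshell", "calico")
freq <- c(23, 15, 8, 10, 5, 1)

# Each coat colour needs exactly one frequency
length(coat) == length(freq)   # TRUE

# The frequencies should add up to the 62 cats
sum(freq)                      # 62

# Proportion of cats with each coat colour, rounded to 2 decimal places
round(freq / sum(freq), 2)
```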

Packages

Commands like c() and sum() are in packages which are part of the ‘base’ R system. A package is a collection of related commands. Base packages are installed automatically when you install R.

Other packages, such as ggplot2 (Wickham 2016), need to be installed once and then loaded in each session. ggplot2 is one of the tidyverse (Wickham et al. 2019) packages.

🎬 Load the tidyverse:

library(tidyverse)

You will likely be warned of some function name conflicts but these will not be a problem for you.

Plotting the data with ggplot()

  • ggplot() takes a dataframe for an argument
  • We can make a dataframe of the two vectors, coat and freq using the data.frame() function.

🎬 Make a dataframe called coat_data

coat_data <- data.frame(coat, freq)

🎬 Click on coat_data in the Environment to open a spreadsheet-like view of it.
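You can also inspect a dataframe from code rather than the spreadsheet view. This sketch recreates coat_data so it runs on its own and uses base R functions to show its structure:

```r
# Recreate the dataframe so this snippet runs on its own
coat <- c("black", "white", "tabby", "ginger", "tortoiseshell", "calico")
freq <- c(23, 15, 8, 10, 5, 1)
coat_data <- data.frame(coat, freq)

# The column names come from the names of the vectors
names(coat_data)   # "coat" "freq"

# One row per coat colour
nrow(coat_data)    # 6

# A compact overview: column types and the first few values
str(coat_data)

# Pull out a single column with $
coat_data$freq
```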

🎬 Create a simple barplot using ggplot like this:

ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col()

  • ggplot() alone creates a blank plot.
  • ggplot(data = coat_data) looks the same.
  • aes() gives the ‘aesthetic mappings’: how variables (columns) are mapped to visual properties (aesthetics) such as the axes, colour and shape.
  • Thus ggplot(data = coat_data, aes(x = coat, y = freq)) produces a plot with axes but no data drawn.
  • geom_col(): a ‘geom’ (geometric object) gives the visual representation of the data: points, lines, bars, boxplots etc. geom_col() draws bars.
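To see that a ggplot is built up in layers, you can store the plot in an object and add the geom afterwards. In this sketch p is an illustrative name; p$layers is the list of layers ggplot2 keeps inside the plot object.

```r
library(ggplot2)

coat_data <- data.frame(
  coat = c("black", "white", "tabby", "ginger", "tortoiseshell", "calico"),
  freq = c(23, 15, 8, 10, 5, 1)
)

# Data and aesthetic mappings only: axes but no bars
p <- ggplot(data = coat_data, aes(x = coat, y = freq))
length(p$layers)        # 0

# Adding a geom with + adds one layer that draws the bars
p_bars <- p + geom_col()
length(p_bars$layers)   # 1

p_bars
```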

Note that ggplot2 is the name of the package and ggplot() is its most important command.

Using the help manual

The manual tells us about a function. ‘Arguments’ can be added to the geom_col() command inside the brackets.

Commands do something, and their arguments (in brackets) can specify:

  • what object to do it to
  • how exactly to do it

Many arguments have defaults so you don’t always need to supply them.

🎬 Open the manual page for geom_col() using:

?geom_col

The manual page has several sections.

  • Description: an overview of what the command does
  • Usage: lists the arguments
    • form: argument name = default value
    • some arguments MUST be supplied; others have defaults
    • ... means “etc.” and includes arguments that can be passed to many ‘geoms’
  • Arguments: gives the details about the arguments
  • Details: describes how the command works in more detail
  • Value: gives the output of the command
  • Don’t be too perturbed by not fully understanding the information
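There are two other ways to get at the same information that you may find handy: help() with the package named explicitly, and args(), which prints just the arguments and their defaults (roughly the Usage section). A short sketch:

```r
library(ggplot2)

# Open the manual page, naming the package explicitly
help("geom_col", package = "ggplot2")

# Print just the arguments and their default values
args(geom_col)
```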

Customising a plot

Bar colour

🎬 Change the fill of the bars using fill:

ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue")

Colours can be given by their name, “lightblue” or code, “#ADD8E6”.

Look up by name or code
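You can check that a name and a code describe the same colour in R itself. This sketch uses colours(), which lists the colour names R knows, and col2rgb(), which converts a colour to its red, green and blue values (0 to 255):

```r
# "lightblue" is one of the colour names R knows
"lightblue" %in% colours()                        # TRUE

# Its red, green and blue values
col2rgb("lightblue")                              # red 173, green 216, blue 230

# The hex code "#ADD8E6" gives the same values
all(col2rgb("lightblue") == col2rgb("#ADD8E6"))   # TRUE
```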

🎬 Change the bars to a colour you like.

fill is one of the arguments covered by ... (the dots). fill is an ‘aesthetic’. If you look for ... in the list of arguments you will see it says:

Other arguments passed on to layer(). These are often aesthetics, used to set an aesthetic to a fixed value, like colour = “red” or size = 3. They may also be parameters to the paired geom/stat.

We just set the fill aesthetic to a fixed value.

Further down the manual, there is a section on Aesthetics which lists those understood by geom_col().

We can set the fill aesthetic to a fixed colour inside geom_col(), or map it to a variable from the dataframe inside aes() instead. Mapping means the colour will be different for different values of that variable.

🎬 Map the fill aesthetic to the coat variable:

ggplot(data = coat_data, aes(x = coat, y = freq, fill = coat)) +
  geom_col()

Note that we have taken fill = "lightblue" out of the geom_col() and instead put fill = coat in the aes().
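One way to see the difference between setting and mapping is to ask ggplot2 which fill colours it actually used. This sketch uses layer_data(), which computes the data for a layer without drawing the plot; p_set and p_map are illustrative names.

```r
library(ggplot2)

coat_data <- data.frame(
  coat = c("black", "white", "tabby", "ginger", "tortoiseshell", "calico"),
  freq = c(23, 15, 8, 10, 5, 1)
)

# Setting: fill is a fixed value given in the geom, so every bar is lightblue
p_set <- ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue")

# Mapping: fill is linked to the coat column inside aes(),
# so each coat colour gets its own fill and a legend is added
p_map <- ggplot(data = coat_data, aes(x = coat, y = freq, fill = coat)) +
  geom_col()

# The fill colours ggplot2 computed for the bars
unique(layer_data(p_set)$fill)          # "lightblue"
length(unique(layer_data(p_map)$fill))  # 6, one per coat colour
```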

🛝 Use the manual to put the bars next to each other. Look for the argument that will mean there is no space between the bars.

Answer - don’t look until you have tried!
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", width = 1)

🛝 Use the manual to change the colour of the lines around each bar to black.

Answer - don’t look until you have tried!
ggplot(data = coat_data,
       aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black")

Changing the axes

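We can make changes to the axes using the scale functions: scale_x_discrete() for the x-axis, which shows a discrete (categorical) variable, coat, and scale_y_continuous() for the y-axis, which shows a continuous variable, freq.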

ggplot automatically extends the axes slightly beyond the data. You can turn this behaviour off with the expand argument in scale_x_discrete() and scale_y_continuous().[1]

🎬 To remove the gap between the axes and the data:

ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black") +
  scale_x_discrete(expand = c(0, 0)) + 
  scale_y_continuous(expand = c(0, 0)) 

Each ‘layer’ is added to the ggplot() command with a +.

Top Tip

Make your code easier to read by using white space and new lines:

  • put spaces around = and <-, and after commas
  • use a newline after every comma in a command with lots of arguments

🛝 Look up scale_x_discrete() in the manual and work out how to change the x-axis title from “coat” to “Coat colour”. Also change the y-axis title.

Answer - don’t look until you have tried!
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") + 
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats") 

🎬 I would prefer the y-axis to extend a little beyond the data. We can do this by changing the axis “limits” in scale_y_continuous():

ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") + 
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) 
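An alternative to hard-coding limits = c(0, 25) is to expand only the top of the y-axis. This sketch assumes ggplot2 3.3.0 or later, which provides expansion(): mult = c(0, 0.05) adds no space below zero and 5% extra space above the tallest bar, so the axis still fits if the data change.

```r
library(ggplot2)

coat_data <- data.frame(
  coat = c("black", "white", "tabby", "ginger", "tortoiseshell", "calico"),
  freq = c(23, 15, 8, 10, 5, 1)
)

p_expand <- ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue",
           width = 1,
           colour = "black") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") +
  # no gap below 0, 5% headroom above the tallest bar
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)),
                     name = "Number of cats")

p_expand
```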

Getting rid of the grey background

The grey grid background is useful for examining plots on a screen, but for a report or publication you will want a more scientific style. Every aspect of the “theme” of a plot (the non-data elements such as fonts, background colours and axis line colours) can be controlled individually[2], but there are some handy built-in themes that apply several changes at once. One of these is theme_classic().

🎬 Add theme_classic() to the plot:

ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(width = 1, 
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") + 
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) +
  theme_classic()
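Individual theme elements can still be adjusted after a built-in theme with theme(). As a sketch, the text sizes below (14 and 12) are arbitrary example values, and element_text() is the ggplot2 function for describing text elements:

```r
library(ggplot2)

coat_data <- data.frame(
  coat = c("black", "white", "tabby", "ginger", "tortoiseshell", "calico"),
  freq = c(23, 15, 8, 10, 5, 1)
)

p_classic <- ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(width = 1,
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) +
  theme_classic() +
  # add theme() after theme_classic() so these changes are not overwritten
  theme(axis.title = element_text(size = 14),
        axis.text = element_text(size = 12))

p_classic
```

The order matters: a complete theme such as theme_classic() replaces earlier theme settings, so tweaks go after it.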

Importing data from files

Typing in data is not practical when you have a lot of it. Far more commonly, we import data from a file into R. This requires you to know two pieces of information:

  1. What format the data are in

    The format of the data determines what function you will use to import. The file extension often indicates format.

    • .txt a plain text file[3], where the columns are often separated by a space but might also be separated by a tab, a backslash or forward slash, or some other character
    • .csv a plain text file where the columns are separated by commas
    • .xlsx an Excel file
  2. Where the file is relative to your working directory

    R can only read in a file if you tell it where the file is, i.e., you give its path relative to the working directory.

We will first save the data file for this workshop to our Project folder and read it in. Then we will create a new folder inside our Project folder called data-raw, move the data file into it and read it in from there. This will show you how the file path needs to be modified when a file is not in your working directory.
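The format decides the import function. This sketch writes two small example files to a temporary folder (using the cat coat frequencies from earlier) so it runs anywhere, then reads them with readr functions, which are part of the core tidyverse:

```r
library(readr)

# Example files written to a temporary folder so this snippet runs anywhere
csv_file <- tempfile(fileext = ".csv")
writeLines(c("coat,freq", "black,23", "white,15"), csv_file)

txt_file <- tempfile(fileext = ".txt")
writeLines(c("coat\tfreq", "black\t23", "white\t15"), txt_file)

# .csv: comma-separated columns
csv_data <- read_csv(csv_file)

# .txt with tab-separated columns
tsv_data <- read_tsv(txt_file)

# .txt with some other separator: say what it is with delim
delim_data <- read_delim(txt_file, delim = "\t")

# .xlsx: the readxl package is installed with the tidyverse but is not
# loaded by library(tidyverse), e.g. readxl::read_excel("my-file.xlsx")
# ("my-file.xlsx" is a hypothetical file name)
```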

The goal

Getting set up

  • 🎬 In RStudio do File | New project | New directory. Be purposeful about where you create it and what you name it. I suggest cats-2.

  • 🎬 Make a new script and save it as import-data-and-summarise-plot.R.

  • 🎬 Save this file to your cats-2 folder: The coat colour and mass of 62 cats - cat-coats.csv

Learning about working directories and paths

Reading in data from the Project folder cats-2

CSV files can be read in with the core tidyverse (Wickham et al. 2019) function read_csv().

🎬 Load the packages:

library(tidyverse)

A .csv extension suggests this is a plain text file with comma-separated columns. However, before we attempt to read it in, we should take a look at it. We can do this from RStudio.

🎬 Go to the Files pane (bottom right), click on the cat-coats.csv file and choose View File[4]

🎬 Read in the csv file with:

cats <- read_csv("cat-coats.csv")

The data from the file will be read into a dataframe called cats and you will be able to see it in the Environment.
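Once cats is in the Environment, the script name suggests summarising it before plotting. This sketch assumes only that cats has the columns coat and mass used in the boxplot below; the small stand-in dataframe at the top is made up purely so the snippet runs on its own, so skip it and use the cats you read in with read_csv(). It needs R 4.1 or later for the |> pipe.

```r
library(dplyr)

# Made-up stand-in data so this snippet runs on its own.
# With the real file, skip this and use cats from read_csv().
cats <- tibble(
  coat = c("black", "black", "white", "tabby"),
  mass = c(4.0, 4.6, 3.9, 5.1)
)

# Column names, types and first values
glimpse(cats)

# Number of cats with each coat colour
count(cats, coat)

# Number of cats and mean mass for each coat colour
cat_summary <- cats |>
  group_by(coat) |>
  summarise(n = n(),
            mean_mass = mean(mass))

cat_summary
```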

Reading in data from a different folder

To help you understand relative file paths, we will now move the data file.

🎬 First remove the dataframe you just created to make it easier to see whether you can successfully read in the file from a different place:

rm(cats)

🎬 Now make a new folder called data-raw. You can do this on the Files Pane by clicking New Folder and typing into the box that appears.

🎬 Check the box next to cat-coats.csv, choose More | Move… and select the data-raw folder.

The file will move. To import data from a file in the data-raw folder, you need to give the relative path to the file from the working directory. The working directory is the Project folder, cats-2, so the relative path is data-raw/cat-coats.csv.
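If an import fails with a “does not exist” error, a few base R functions help you check where R is looking. A short sketch: getwd() shows the working directory, list.files() shows what R can see there, file.exists() tests a relative path, and file.path() builds a path from its parts.

```r
# The working directory: in an RStudio Project this is the Project folder
getwd()

# Files and folders R can see from there
list.files()

# Does the relative path point at the file? TRUE once it is in data-raw
file.exists("data-raw/cat-coats.csv")

# file.path() builds the same relative path from its parts
file.path("data-raw", "cat-coats.csv")   # "data-raw/cat-coats.csv"
```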

🎬 Import the cat-coats.csv data like this:

cats <- read_csv("data-raw/cat-coats.csv")

Creating a boxplot

🎬 Create a simple boxplot using ggplot like this:

ggplot(data = cats, aes(x = coat, y = mass)) +
  geom_boxplot()

🛝 Can you format the boxplot as shown above?

Answer - don’t look until you have tried!
ggplot(data = cats, aes(x = coat, y = mass)) +
  geom_boxplot() +
  scale_y_continuous(name = "Mass (g)", 
                     limits = c(0, 10), 
                     expand = c(0, 0)) +
  scale_x_discrete(name = "Coat colour") +
  theme_classic()

Summary

🔧 RStudio Interface

  • Script – Where you write and save your code (.R or .qmd files)
  • Console – Where commands are executed and immediate output appears
  • Environment – Lists your objects and data
  • Files / Plots / Help – Where you can manage files, view plots, read help docs

🧠 Core ideas

  • Assignment – Use <- to store values: x <- 5
  • Vectors – One-dimensional data: c(1, 2, 3)
  • Dataframes – Table-like structures that are most widely used: data.frame(name = c("A", "B"))
  • Functions – Commands that perform actions: mean(x), ggplot()
  • Data import functions read data from files to dataframes for analysis
  • Packages like tidyverse are loaded with library()
  • Always use RStudio Projects and paths relative to the project folder so your code is reproducible and portable

📊 Plotting with ggplot2

Pages made with R (R Core Team 2025), Quarto (Allaire et al. 2024), knitr (Xie 2024, 2015, 2014), kableExtra (Zhu 2024)

References

Allaire, J. J., Charles Teague, Carlos Scheidegger, Yihui Xie, Christophe Dervieux, and Gordon Woodhull. 2024. “Quarto.” https://doi.org/10.5281/zenodo.5960048.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2016. “Ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman & Hall/CRC.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. https://yihui.org/knitr/.
Xie, Yihui. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Zhu, Hao. 2024. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

Footnotes

  1. There are also scale_x_continuous() and scale_y_discrete() functions for when you have those types of variable.

  2. See “Modify components of a theme”, the ggplot2 manual page for theme().

  3. Plain text files can be opened in Notepad or a similar editor and still be readable.

  4. Do not be tempted to import data this way. Unless you are careful, your data import will not be scripted, or will not be scripted correctly.