Intro to R and R Studio

In this module we will:

  • get an overview of R and R Studio,

  • do basic interactive R tasks,

  • learn about R data types, installing packages, reading data into and working with dataframes,

  • learn how to create and use basic R scripts to document a series of analysis steps,

  • write our own “tutorials” or “learning guides” using the magic of R Markdown documents along with the knitr package.

Readings

  • Intro to R (I2R) - Chapters 1-3

  • See the Additional Resources section below, especially the stuff on asking good questions

Downloads

Activities

We will work through a series of R scripts and R Markdown documents as we start to learn the basics of R and R Studio. In the Downloads file you’ll find both “shell” versions of files that we’ll fill in during class and fully completed versions you can refer to later.

Note

As of 2024-11-03, I have started replacing R Markdown documents with Quarto documents and updating the screencasts to reflect this change. Quarto is the future and it’s time to move to it.

Overview of R

I’ll use the slides for structure.

The obligatory hello world example

We’ll see a basic R script and learn how to run commands from it.
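As a rough idea of what such a script might look like, here is a minimal sketch. The file name hello.R and the variable names are made up for illustration and are not the actual class file.

```r
# hello.R: a tiny script you can run one line at a time or all at once

# Store a character string in a variable and print it
greeting <- "Hello, world!"
print(greeting)

# R also works as a calculator; results can be stored the same way
total <- 2 + 3
print(total)
```

In R Studio, placing the cursor on a line and pressing Ctrl+Enter (Cmd+Return on a Mac) sends that line to the console. Calling source("hello.R") runs the whole file.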

R Studio and Quarto

Time to get familiar with working in R Studio with Quarto markdown documents. We’ll go over:

  • R Studio interface

  • code chunks

  • basics of markdown

We’ll use:

Vectors

Vectors are a fundamental data structure in R. We need to start thinking in vectors.
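To make "thinking in vectors" a bit more concrete, here is a small sketch with made-up temperature values. The point is that arithmetic and comparisons apply to every element at once, with no explicit loop.

```r
# A numeric vector created with c() (values are invented for illustration)
temps_f <- c(68, 72.5, 80, 55)

# Arithmetic is applied element by element
temps_c <- (temps_f - 32) * 5 / 9

# Comparisons return a logical vector of the same length
warm <- temps_f > 70

# A logical vector can be used to pick out elements
warm_temps <- temps_f[warm]

# Many functions summarize a whole vector
mean(temps_f)
length(temps_f)
```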

Dataframes

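Dataframes are the primary data structure for tabular data in R. You can think of a dataframe as something like a table in a database or a sheet in a spreadsheet: each column is a vector of a single type and each row is a record.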
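Here is a minimal sketch of building and poking at a dataframe. The patients data and its column names are invented for illustration.

```r
# Build a small dataframe from three equal-length vectors
patients <- data.frame(
  id = 1:4,
  dept = c("ER", "ICU", "ER", "Surgery"),
  los_days = c(2.5, 7.0, 1.0, 3.5)
)

# Inspect the structure: column names, types, and first few values
str(patients)

# Pull out a single column as a vector with $
avg_los <- mean(patients$los_days)

# Select rows with a logical condition; leave the column slot empty to keep all columns
er_rows <- patients[patients$dept == "ER", ]
print(er_rows)
```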

Reading text files into R dataframes

This is a fundamental R task and we’ll get our first look at it here. We will also see an example of bringing data into R from a database.
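A minimal sketch of the text-file case, using base R only. So that it runs anywhere, it first writes a tiny made-up CSV to a temporary file. In practice you would pass the path of your own file to read.csv. The database case is not shown here.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,dept,los_days",
    "1,ER,2.5",
    "2,ICU,7",
    "3,ER,1"),
  csv_path
)

# Read the file into a dataframe; the first line is used as the column names
visits <- read.csv(csv_path)

# Check what we got: dimensions, column types, and the first rows
str(visits)
head(visits)
```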

Note

The default for stringsAsFactors was changed from TRUE to FALSE in R 4.0. Having FALSE as the default is particularly nice when you are reading in a table that has string fields with a gazillion unique values that you would never use as a factor. You’ll see that in many vids we do some string to factor work with as.factor(). Personally, I don’t have a strong opinion as to whether the default should be TRUE or FALSE, just that it’s important to know what the default is when you use read.csv. Note that read_csv from the readr package has no stringsAsFactors argument: it reads strings in as character columns unless you specify otherwise with col_types. Either way, you just adapt on the fly and convert to whatever makes the most sense for the specific data set you are reading.
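To see the difference in practice, here is a small sketch that reads the same made-up CSV both ways and then does the kind of as.factor() conversion mentioned above. The file contents and variable names are assumptions for illustration.

```r
# A tiny CSV with one string column, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,dept", "1,ER", "2,ICU", "3,ER"), csv_path)

# Being explicit about the argument avoids depending on the R version
as_chr <- read.csv(csv_path, stringsAsFactors = FALSE)
as_fct <- read.csv(csv_path, stringsAsFactors = TRUE)

class(as_chr$dept)  # character
class(as_fct$dept)  # factor

# Convert a character column to a factor after the fact
as_chr$dept <- as.factor(as_chr$dept)
levels(as_chr$dept)
```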

Datetime conversions

This last Rmd file has some additional details on datetime conversions. Dates and times can be tricky, so we’ll learn about converting between character and datetime data types.

  • File: chardate_POSIXct_conversion.Rmd
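As a taste of what these conversions look like, here is a minimal sketch that is not taken from the Rmd file above. The timestamp and the choice of the UTC time zone are assumptions made to keep the example predictable.

```r
# A timestamp stored as a character string
stamp_chr <- "2024-11-03 14:30:00"

# Character -> datetime: describe the layout with a format string
stamp <- as.POSIXct(stamp_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(stamp)

# POSIXct values are stored as seconds, so adding 3600 moves forward one hour
later <- stamp + 3600

# Datetime -> character: format() with whatever layout you need
later_chr <- format(later, "%m/%d/%Y %H:%M")

# Differences between datetimes can be expressed in a chosen unit
elapsed <- difftime(later, stamp, units = "mins")
```

Setting tz explicitly is a deliberate choice here: without it, R uses the time zone of the machine running the code, which is a common source of surprises.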

Additional Resources

Jenny Bryan developed and taught R-related courses at the University of British Columbia. In particular, she was the driving force behind a course called STAT 545 and an online textbook. Now she works at Posit on Hadley Wickham’s team, but her course lives on. Many of the resources she created for STAT 545 are still available and will continue to evolve. I highly recommend them as they really get at the heart of effectively using R to do analysis. They also include invaluable practical R-related information that is rarely found all in one place.

Asking good questions

One particularly helpful page Jenny Bryan created in the old 545 course was one on “How to get unstuck”. Unfortunately, while the new course website includes a link to it, the link is broken. Using the magical Wayback Machine, I was able to locate an archived copy, which I’m including here until the link gets fixed. One of the resources she links to in her “unstuck” page is a classic document entitled “How to Ask Questions the Smart Way” by Eric Raymond and Rick Moen. It is not for the easily offended, but as Jenny says, they “speak truth”.

Reproducible examples (reprex)

A big part of asking good questions is creating something known as a minimal reproducible example, or reprex for short. When people post questions to places like StackOverflow or a GitHub Issue, they are strongly encouraged (on SO it’s almost a must) to include a reprex. There’s a very good FAQ entry in the R Studio Community Forum that both discusses the reprex in general terms and provides links to the R reprex package, which makes it easy to create a reprex that can be pasted from the clipboard into a question or forum post.
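To give a sense of what "minimal" and "reproducible" mean, here is a sketch of the kind of snippet you might post. The data and the date-parsing problem are made up. The key features are that the data is created inline and that nothing depends on files or objects only you have.

```r
# Small inline data, so anyone can run this as-is
df <- data.frame(
  id = 1:3,
  when = c("2024-01-15", "2024-02-20", "not a date")
)

# The one step that shows the behavior being asked about:
# the third value cannot be parsed and comes back as NA
parsed <- as.Date(df$when, format = "%Y-%m-%d")
print(parsed)

# If the reprex package is installed, copying the code above to the
# clipboard and running the following renders it, with output, for pasting:
# reprex::reprex()
```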

DataCamp

There’s an Intro to R course at DataCamp that covers much of what we do in this first session. There are many R courses available on DataCamp.

Explore