Intro to R and RStudio

Intro and Objectives

In this module we will:

  • get an overview of R and RStudio,

  • do basic interactive R tasks,

  • learn about R data types, installing packages, and reading data into and working with dataframes,

  • learn how to create and use basic R scripts to document a series of analysis steps,

  • write our own “tutorials” or “learning guides” using the magic of R Markdown documents along with the knitr package.

Readings

  • R for Everyone (RfE) - Chapters 1-6

  • Practical Data Science with R (PDSwR) - Chapter 2

  • See the Additional Resources section below, especially the stuff on asking good questions

Downloads

Activities

We will work through a series of R scripts and R Markdown documents as we start to learn the basics of R and RStudio. In the Downloads file you’ll find both “shell” versions of files that we’ll complete in class and fully completed versions you can refer to later.

As of 2022-09-11, all of the screencasts and files have been updated for R 4.x. See the note below related to the part on reading CSV files into R for more details.

As of 2024-11-03, I have started replacing R Markdown documents with Quarto documents and updating the screencasts to reflect this change. Quarto is the future and it’s time to move to it.

  • Overview of R
  • HelloWorld.R
  • interactive1_shell.qmd
  • vectors_shell.qmd
    • a fundamental data structure in R

    • start to learn to think in vector terms

    • SCREENCAST: R Vectors (21:17)

  • dataframes_shell.qmd
  • readcsv_shell.qmd
    • getting data into R from text files

    • NOTE: The default for stringsAsFactors changed from TRUE to FALSE in R 4.0. A default of FALSE is particularly nice when you are reading a table that has string fields with a gazillion unique values that you would never use as a factor. You’ll see that in many vids we do some string to factor work with as.factor(). Personally, I don’t have a strong opinion as to whether the default should be TRUE or FALSE; what matters is knowing how the function you are using, read.csv or read_csv from the readr package, treats strings by default. Then you just adapt on the fly and convert to whatever makes the most sense for the specific data set you are reading.

    • also getting data from databases

    • SCREENCAST: Intro to Getting Data into R (14:28)
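To make the stringsAsFactors note above concrete, here is a minimal sketch using base R's read.csv. The file contents and the column names (id, color) are made up for illustration, and the comments about defaults assume you are running R 4.0 or later.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The columns (id, color) are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,color", "1,red", "2,blue", "3,red"), csv_path)

# In R 4.0 and later, strings are read as character by default.
df_default <- read.csv(csv_path)
class(df_default$color)

# Ask for factors explicitly at read time.
df_factor <- read.csv(csv_path, stringsAsFactors = TRUE)
class(df_factor$color)

# Or convert a single column after reading, as we do in many of the vids.
df_default$color <- as.factor(df_default$color)
levels(df_default$color)
```

Converting one column at a time with as.factor() is often the better choice when only a few string fields are really categorical.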

This last Rmd file has some additional details on datetime conversions.

  • chardate_POSIXct_conversion.Rmd

    • dates and times can be tricky

    • converting between character and datetime data types
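As a small taste of what these conversions look like, here is a sketch using base R only. It is not taken from the Rmd file itself, and the timestamp value is made up. Being explicit about both the format string and the time zone avoids most surprises.

```r
# A character timestamp (made-up value).
ts_chr <- "2024-11-03 14:30:00"

# Character -> POSIXct. State the format and time zone explicitly.
ts <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(ts)

# POSIXct -> character, using a different layout.
ts_out <- format(ts, "%m/%d/%Y %H:%M")
ts_out

# POSIXct is stored as seconds since 1970-01-01 UTC, so adding 3600 adds an hour.
ts_plus_hour <- ts + 3600
ts_plus_hour
```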

Additional Resources

Jenny Bryan developed and taught R-related courses at the University of British Columbia. In particular, she was the driving force behind a course called STAT 545 and an online textbook. Now she works at Posit on Hadley Wickham’s team, but her course lives on. Many of the resources she created for STAT 545 are still available and will continue to evolve. I highly recommend them as they really get at the heart of effectively using R to do analysis. They also include invaluable practical R-related information that is rarely found all in one place.

Asking good questions

One particularly helpful page Jenny Bryan created in the old 545 course was one on “How to get unstuck”. Unfortunately, while the new course website includes a link to it, the link is broken. Using the magical Wayback Machine I was able to locate an archived copy which I’m including here until the link gets fixed. One of the resources she links to in her “unstuck” page is a classic document entitled “How to Ask Questions the Smart Way” by Eric Raymond and Rick Moen. It is not for the easily offended, but as Jenny says, they “speak truth”.

Reproducible examples (reprex)

A big part of asking good questions is creating something known as a minimal reproducible example, or reprex for short. When people post questions to places like Stack Overflow or a GitHub Issue they are strongly encouraged (in SO it’s almost a must) to include a reprex. There’s a very good FAQ entry in the R Studio Community Forum that discusses the reprex in general terms and provides links to the R reprex package, which makes it easy to create a reprex that can be pasted from the clipboard into a question or forum.
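Here is a rough sketch of what the body of a reprex might look like. The data frame and the question being asked are hypothetical. The key idea is that the example builds its own tiny data set instead of relying on a file that only exists on your machine.

```r
# Build a tiny data set inline (made-up values) so anyone can run this as is.
df <- data.frame(
  group = c("a", "b", "a"),
  value = c(1.5, 2.0, 3.5)
)

# The smallest piece of code that shows the behavior you are asking about.
result <- aggregate(value ~ group, data = df, FUN = mean)
print(result)

# dput() prints R code that recreates an object, which is handy for sharing
# a small slice of your real data in a question.
dput(df)

# If the reprex package is installed: copy the code above to the clipboard,
# then run reprex::reprex() in the console to render it, output included,
# ready for pasting.
```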

DataCamp

There’s an Intro to R course at DataCamp that covers much of what we do in this first session. There are many R courses available on DataCamp.

Explore