Intro to R and R Studio¶
In this module we will:
get an overview of R and R Studio,
do basic interactive R tasks,
learn about R data types, installing packages, reading data into and working with dataframes,
learn how to create and use basic R scripts to document a series of analysis steps,
write our own “tutorials” or “learning guides” using the magic of R Markdown documents along with the knitr package.
Readings¶
Intro to R (I2R) - Chapters 1-3
See the Additional Resources section below, especially the stuff on asking good questions
Downloads¶
R Markdown cheat sheet (R Studio)
Activities¶
We will work through a series of R scripts and R Markdown documents as we start to learn the basics of R and R Studio. In the Downloads file you’ll find both “shell” versions of files that we’ll fill in during class and fully completed versions you can refer to later.
Note
As of 2024-11-03, I have started replacing R Markdown documents with Quarto documents and updating the screencasts to reflect this change. Quarto is the future and it’s time to move to it.
Overview of R¶
I’ll use the slides for structure.
SCREENCAST: Overview of R (17:11)
The obligatory hello world example¶
We’ll see a basic R script and learn how to run commands from it.
File: HelloWorld.R
SCREENCAST: HelloWorld.R (8:34)
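For a rough idea of what a first script looks like, here is a minimal sketch. It is not the contents of HelloWorld.R; the variable names `greeting` and `x` are made up for illustration.

```r
# A guess at what a first R script might contain (not the actual HelloWorld.R).

# Store a string in a variable and print it to the console
greeting <- "Hello, world!"
print(greeting)

# R also works as a calculator; typing a variable name shows its value
x <- 2 + 3
x
```

In R Studio you can run the line under the cursor with Ctrl+Enter (Cmd+Return on a Mac), which is the usual way to step through a script one command at a time.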
R Studio and Quarto¶
Time to get familiar with working in R Studio with Quarto markdown documents. We’ll go over:
R Studio interface
code chunks
basics of markdown
We’ll use:
File: interactive1_shell.qmd
Vectors¶
Vectors are a fundamental data structure in R. We need to start thinking in vectors.
File: vectors_shell.qmd
SCREENCAST: R Vectors (21:17)
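To make "thinking in vectors" a bit more concrete, here is a small sketch using made-up temperature readings. It is not taken from vectors_shell.qmd.

```r
# Made-up data for illustration: four temperature readings in Fahrenheit
temps_f <- c(68, 72, 75, 80)

# Arithmetic is applied to every element at once, no loop needed
temps_c <- (temps_f - 32) * 5 / 9

# Indexing starts at 1 in R
second_reading <- temps_f[2]

# A logical condition produces a TRUE/FALSE vector that can be used to subset
warm <- temps_f[temps_f > 70]

# Many functions summarize a whole vector
avg_temp <- mean(temps_f)
n_readings <- length(temps_f)
```

The key habit is to operate on the whole vector in one expression rather than writing a loop over its elements.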
Dataframes¶
Dataframes are the primary data structure in R. You can think of them kind of like a table in a database or a spreadsheet.
File: dataframes_shell.qmd
SCREENCAST: R Dataframes (10:27)
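The table analogy can be sketched with a tiny dataframe. The `patients` data and its column names below are invented for illustration and are not from dataframes_shell.qmd.

```r
# Invented example data: each column is a vector, each row is one record
patients <- data.frame(
  id = 1:4,
  dept = c("ER", "ICU", "ER", "Surgery"),
  los_days = c(1.5, 6.0, 0.5, 3.0)
)

# str() gives a compact overview of column names and types
str(patients)

# Pull out a single column as a vector with $
los <- patients$los_days

# Filter rows with a logical condition; the empty slot after the comma keeps all columns
er_only <- patients[patients$dept == "ER", ]

# Summarize a column
mean_los <- mean(patients$los_days)
```

Because each column is just a vector, everything that works on vectors also works on dataframe columns.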
Reading text files into R dataframes¶
This is a fundamental R task and we’ll get our first look at it here. We will also see an example of bringing data into R from a database.
File: readcsv_shell.qmd
Note
The default for stringsAsFactors was changed to FALSE in R 4.0. Having FALSE as the default is particularly nice when you are reading in a table that has string fields with a gazillion unique values that you would never use as a factor. You’ll see that in many vids we do some string to factor work with as.factor(). For me personally, I don’t have a strong opinion as to whether the default should be TRUE or FALSE, just that it’s important to understand what the default is if you are using read.csv or read_csv from the readr package. Then you just adapt on the fly and set it to whatever makes the most sense for the specific data set you are reading.
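Here is a small sketch of how the stringsAsFactors setting plays out with base R's read.csv. The CSV content is made up and passed in through the `text` argument so the example does not depend on any file. It assumes R 4.0 or later.

```r
# Made-up CSV content, supplied as a string instead of a file
csv_text <- "name,dept\nAnn,ER\nBob,ICU\nCal,ER"

# With the R >= 4.0 default, string columns come in as character vectors
df_default <- read.csv(text = csv_text)
class(df_default$dept)

# Asking for factors explicitly converts every string column to a factor
df_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(df_factor$dept)

# Or read strings as character and convert only the columns you want
dept_factor <- as.factor(df_default$dept)
levels(dept_factor)
```

Converting selected columns with as.factor() after reading tends to be the more targeted approach, since stringsAsFactors = TRUE converts all string columns, including ones like `name` that you probably do not want as factors.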
Datetime conversions¶
This last Rmd file has some additional details on datetime conversions. Dates and times can be tricky, and we’ll learn about converting between character and datetime data types.
File: chardate_POSIXct_conversion.Rmd
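As a minimal sketch of character to datetime conversion (not taken from the Rmd file above), the example below uses made-up timestamps and fixes the time zone to UTC so results do not depend on your machine's settings.

```r
# Made-up timestamp stored as a character string
stamp_chr <- "2024-11-03 14:30:00"

# Convert character to POSIXct; the format string describes how the text is laid out
stamp <- as.POSIXct(stamp_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(stamp)

# A different text layout just needs a different format string
stamp_us <- as.POSIXct("11/03/2024 14:30", format = "%m/%d/%Y %H:%M", tz = "UTC")

# Convert back to character with format()
date_part <- format(stamp, "%Y-%m-%d")
time_part <- format(stamp, "%H:%M")

# POSIXct values are stored as seconds, so adding 3600 moves forward one hour
later <- stamp + 3600
elapsed <- difftime(later, stamp, units = "mins")
```

If the format string does not match the text, as.POSIXct returns NA rather than raising an error, so it is worth checking for NA values after converting a column.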
Additional Resources¶
Jenny Bryan developed and taught R related courses at the University of British Columbia. In particular, she was the driving force behind a course called STAT 545 and an online textbook. Now she works at Posit on Hadley Wickham’s team, but her course lives on. Many of the resources she created for STAT 545 are still available and will continue to evolve. I highly recommend them as they really get at the heart of effectively using R to do analysis. They also include invaluable practical R related information that is rarely found all in one place.
Asking good questions¶
One particularly helpful page Jenny Bryan created in the old 545 course was one on “How to get unstuck”. Unfortunately, while the new course website includes a link to it, the link is broken. Using the magical wayback machine I was able to locate an archived copy which I’m including here until the link gets fixed. One of the resources she links to in her “unstuck” page is a classic document entitled “How to Ask Questions the Smart Way” by Eric Raymond and Rick Moen. It is not for the easily offended, but as Jenny says, they “speak truth”.
How to write a great online question - great advice from Kevin Markham at Data School
Reproducible examples (reprex)¶
A big part of asking good questions is creating something known as a minimal reproducible example, or reprex for short. When people post questions to places like StackOverflow or a GitHub Issue they are strongly encouraged (on SO it’s almost a must) to include a reprex. There’s a very good FAQ entry in the R Studio Community Forum that discusses the reprex in general terms and provides links to the R reprex package, which makes it easy to create a reprex that can be pasted from the clipboard into a question or forum post.
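To give a feel for what "minimal" and "reproducible" mean in practice, here is a sketch of the kind of snippet you might post. The data and the question being illustrated are invented. The point is that everything needed to run the code is in the snippet itself.

```r
# Build a tiny dataset inline instead of pointing to a file nobody else has
df <- data.frame(
  group = c("a", "a", "b"),
  value = c(1, 2, 10)
)

# Include only the code that shows the behavior you are asking about
result <- aggregate(value ~ group, data = df, FUN = mean)
result

# Mention your R version so others can match your setup
r_version <- R.version.string
```

If you have the reprex package installed, copying code like this to the clipboard and calling reprex::reprex() will run it in a fresh session and put the rendered code and output back on the clipboard, ready to paste.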
DataCamp¶
There’s an Intro to R course at DataCamp that covers much of what we do in this first session. There are many R courses available on DataCamp.
Explore¶
R-bloggers - Aggregation site for R related blogs
Simply Statistics - Roger Peng and two other biostats guys from Johns Hopkins blog on data science and R. Peng has a super popular online R course through Coursera and these folks have launched a multi-course series on data science in R on Coursera.
Introducing R to a non-programmer in one hour - Just what it says.