Intro to R and R Studio
Intro and Objectives
In this module we will:
get an overview of R and R Studio,
do basic interactive R tasks,
learn about R data types, installing packages, reading data into and working with dataframes,
learn how to create and use basic R scripts to document a series of analysis steps,
write our own “tutorials” or “learning guides” using the magic of R Markdown documents along with the knitr package.
Readings
R for Everyone (RfE) - Chapters 1-6
Practical Data Science with R (PDSwR) - Chapter 2
See the Additional Resources section below, especially the stuff on asking good questions
Downloads
R Markdown cheat sheet (R Studio)
Activities
We will work through a series of R scripts and R Markdown documents as we start to learn the basics of R and R Studio. In the Downloads file you’ll find both “shell” versions of the files, which we’ll fill in during class, and fully completed versions you can refer to later.
As of 2022-09-11, all of the screencasts and files have been updated for R 4.x. See the note below on reading CSV files into R for more details.
As of 2024-11-03, I have started replacing R Markdown documents with Quarto documents and updating the screencasts to reflect this change. Quarto is the future and it’s time to move to it.
- Overview of R
see PDF slides
SCREENCAST: Overview of R (17:11)
- HelloWorld.R
a basic R script
learn how to run commands.
SCREENCAST: HelloWorld.R (8:34)
- interactive1_shell.qmd
R Studio interface
code chunks
basics of markdown
- vectors_shell.qmd
a fundamental data structure in R
start to learn to think in vector terms
SCREENCAST: R Vectors (21:17)
- dataframes_shell.qmd
the primary data structure in R
like a table in a database or a spreadsheet
SCREENCAST: R Dataframes (10:27)
- readcsv_shell.qmd
getting data into R from text files
NOTE: The default for stringsAsFactors was changed to FALSE in R 4.0. Having FALSE as the default is particularly nice when you are reading in a table that has string fields with a gazillion unique values that you would never use as a factor. You’ll see that in many of the videos we do some string-to-factor conversion with as.factor(). Personally, I don’t have a strong opinion as to whether the default should be TRUE or FALSE; what matters is understanding what the default is when you use read.csv, or read_csv from the readr package (read_csv has no stringsAsFactors argument and leaves strings as character unless you ask for factors through its col_types argument). Then you just adapt on the fly and set it to whatever makes the most sense for the specific data set you are reading.
also getting data from databases
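A minimal sketch of what the stringsAsFactors default means in practice, assuming R 4.0 or later. To keep it self-contained, it reads a tiny made-up CSV from a string through read.csv's text argument instead of from a file; the column names id and color are purely illustrative.

```r
# Minimal sketch, assuming R >= 4.0 (where stringsAsFactors defaults to FALSE).
# The CSV below is made-up illustration data, read from a string via `text =`.
csv_text <- "id,color\n1,red\n2,blue\n3,red\n"

# Default read: the string column stays character
df_chr <- read.csv(text = csv_text)
str(df_chr)

# Explicitly ask for the pre-4.0 behavior for this one read
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(df_fct$color)  # "blue" "red"

# Often nicer: keep strings as character and convert only the columns you need
df_chr$color <- as.factor(df_chr$color)
str(df_chr)
```

Setting stringsAsFactors explicitly in the call also documents your intent, so the script behaves the same whether a reader is on R 3.x (default TRUE) or R 4.x (default FALSE).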
The last file, an R Markdown (Rmd) document, has some additional details on datetime conversions.
chardate_POSIXct_conversion.Rmd
dates and times can be tricky
converting between character and datetime data types
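A minimal base-R sketch of converting character timestamps to POSIXct and back. The timestamps are made-up values, and the time zone is set to UTC as an assumption; if tz is left out, R uses the session’s local time zone, which is one common source of the trickiness mentioned above.

```r
# Made-up character timestamps for illustration
ts_chr <- c("2024-11-03 08:15:00", "2024-11-03 17:45:30")

# Character -> POSIXct: give the format and time zone explicitly
ts <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(ts)  # "POSIXct" "POSIXt"

# Datetime arithmetic works once the values are POSIXct
elapsed <- difftime(ts[2], ts[1], units = "hours")
as.numeric(elapsed)  # 9.508333

# POSIXct -> character in a different layout
format(ts, "%d/%m/%Y %H:%M")  # "03/11/2024 08:15" "03/11/2024 17:45"

# A format string that doesn't match the text gives NA rather than an error
as.POSIXct("11/03/2024", format = "%Y-%m-%d", tz = "UTC")  # NA
```

The silent NA in the last line is worth remembering: after any conversion, a quick check such as sum(is.na(x)) catches format mismatches before they propagate.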
Additional Resources
Jenny Bryan developed and taught R-related courses at the University of British Columbia. In particular, she was the driving force behind a course called STAT 545 and an online textbook. She now works at Posit on Hadley Wickham’s team, but her course lives on: many of the resources she created for STAT 545 are still available and will continue to evolve. I highly recommend them, as they really get at the heart of using R effectively for analysis. They also include invaluable practical R-related information that is rarely found all in one place.
Asking good questions
One particularly helpful page Jenny Bryan created for the old STAT 545 course was “How to get unstuck”. Unfortunately, while the new course website includes a link to it, the link is broken. Using the magical Wayback Machine, I was able to locate an archived copy, which I’m including here until the link gets fixed. One of the resources she links to from her “unstuck” page is a classic document entitled “How to Ask Questions the Smart Way” by Eric Raymond and Rick Moen. It is not for the easily offended, but as Jenny says, they “speak truth”.
How to write a great online question - great advice from Kevin Markham at Data School
Reproducible examples (reprex)
A big part of asking good questions is creating something known as a minimal reproducible example, or reprex for short. When people post questions to places like Stack Overflow or a GitHub Issue, they are strongly encouraged (on SO it’s almost a must) to include a reprex. There’s a very good FAQ entry in the R Studio Community Forum that discusses reprexes in general terms and links to the R reprex package, which makes it easy to create a reprex that can be pasted from the clipboard into a question or forum post.
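A sketch of the shape a reprex usually takes: a tiny data set built inline, the code that surprises you, and nothing else. The data and the “question” are made up for illustration. dput() is base R and prints code that recreates an object, which is handy for sharing a small slice of real data. The final reprex::reprex() call assumes the reprex package is installed and only runs in an interactive session, since it renders the code with its output and copies the result to the clipboard.

```r
# Made-up data, built inline so anyone can run the example
df <- data.frame(
  group = c("a", "a", "b"),
  value = c(1, NA, 3)
)

# The surprising result: the mean for group "a" is NA
tapply(df$value, df$group, mean)

# The fix someone would likely suggest: drop missing values
tapply(df$value, df$group, mean, na.rm = TRUE)

# dput() prints R code that recreates an object; paste its output into a
# question instead of a screenshot of your data
dput(df)

# Assumes the reprex package is installed; reprex() renders the code plus its
# output and copies the result to the clipboard, so only run it interactively
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex({
    df <- data.frame(group = c("a", "a", "b"), value = c(1, NA, 3))
    tapply(df$value, df$group, mean)
  })
}
```

Because the code inside reprex() must run on its own, writing it this way forces you to include every library() call and data definition your question depends on.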
DataCamp
There’s an Intro to R course at DataCamp that covers much of what we do in this first session. There are many R courses available on DataCamp.
Explore
R-bloggers - Aggregation site for R related blogs
Simply Statistics - Roger Peng and two other biostats guys from Johns Hopkins blog about data science and R. Peng has a super popular online R course on Coursera, and these folks have also launched a multi-course series on data science in R on Coursera.
Introducing R to a non-programmer in one hour - Just what it says.