EDA with R

Note

The notes for this session have been updated to use Quarto instead of R Markdown. You can find the old version here.

Intro and Objectives

We will begin to do exploratory data analysis in R. After completing the activities in this module, you should be able to explore a dataset using:

  • descriptive statistics,

  • simple R scripts including writing your own functions,

  • basic (and not so basic) plots with ggplot2.

We are going to explore a dataset related to New York City condo evaluations for fiscal year 2011-2012. It was obtained from the NYC Open Data initiative - https://data.cityofnewyork.us/.

Readings

  • I2R - Chapters 4-7

  • R4DS - Chapters 1-2

Downloads and other resources

Activities

We will work through two tutorials on EDA, with a short detour on creating user-defined functions in R.

Summary statistics

R makes it easy to compute summary statistics. We will also see how to create R Projects to help you organize your R work.
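
As a rough sketch of what this looks like in practice, the snippet below computes a few summary statistics with base R. The `condos` data frame, its column names, and all of its values are made up for illustration; they are not taken from the NYC dataset used in the tutorials.

```r
# Hypothetical toy data; NOT the actual NYC condo dataset
condos <- data.frame(
  borough = c("Manhattan", "Manhattan", "Brooklyn",
              "Brooklyn", "Queens", "Queens"),
  market_value = c(950000, 1200000, 640000, 700000, 480000, 530000)
)

# Min, quartiles, median, mean, and max for one numeric column
summary(condos$market_value)

# Individual statistics
mean_value <- mean(condos$market_value)
median_value <- median(condos$market_value)
sd_value <- sd(condos$market_value)

# Mean of market_value within each borough (base R group-by)
mean_by_borough <- tapply(condos$market_value, condos$borough, mean)
print(mean_by_borough)
```

With real data you would typically start with `str()` and `summary()` on the whole data frame before drilling into individual columns.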

Writing your own functions

We will do a brief introduction to writing functions in R.
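
To give a flavor of what a user-defined function looks like, here is a minimal sketch. The function name `coef_var` and the sample values are hypothetical, chosen only for illustration; the tutorial itself may use different examples.

```r
# Hypothetical helper: coefficient of variation (standard deviation / mean)
coef_var <- function(x, na.rm = TRUE) {
  # Fail early with a clear message if the input is not numeric
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Made-up sample values, including a missing value
values <- c(10, 12, 14, NA)
cv <- coef_var(values)
print(cv)
```

Giving `na.rm` a default value means callers can ignore it in the common case but still override it, for example `coef_var(values, na.rm = FALSE)`, which returns `NA` for this input.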

Note

The video above makes a few references to the “R for Everyone” text, which we are no longer using. Instead, see Chapter 7 of the online textbook An Introduction to R (https://intro2r.com/).

Plots and graphs

Now we are going to see an area where R really shines - plotting.
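
As a small preview, the sketch below draws a boxplot with ggplot2. It assumes the ggplot2 package is installed, and the `condos` data frame is made up for illustration rather than taken from the NYC dataset.

```r
library(ggplot2)

# Hypothetical toy data; NOT the actual NYC condo dataset
condos <- data.frame(
  borough = c("Manhattan", "Manhattan", "Brooklyn",
              "Brooklyn", "Queens", "Queens"),
  market_value = c(950000, 1200000, 640000, 700000, 480000, 530000)
)

# Map columns to aesthetics, then add a geometry layer and axis labels
p <- ggplot(condos, aes(x = borough, y = market_value)) +
  geom_boxplot() +
  labs(x = "Borough", y = "Market value (USD)")

print(p)
```

The same pattern (data, aesthetic mapping, one or more `geom_*` layers) carries over to histograms, scatterplots, and most other plot types.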

Explore (OPTIONAL)

Data visualization

R Markdown

Percentiles

It’s easy to get enamored with averages. They don’t tell the whole story. Look at percentiles, too.
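
A quick illustration of the point, using made-up values with one large outlier (the `wait_times` name and numbers are hypothetical):

```r
# Hypothetical values; one extreme observation at the end
wait_times <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 61)

avg <- mean(wait_times)    # pulled upward by the single outlier
med <- median(wait_times)  # the 50th percentile, far less affected

# Selected percentiles (R's default interpolation method, type 7)
pct <- quantile(wait_times, probs = c(0.5, 0.9, 0.95))

print(avg)
print(med)
print(pct)
```

Here the mean is 10 while the median is 4.5, so the average alone would suggest a typical value more than twice as large as what most observations show.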