EDA with R

Intro and Objectives

We will begin doing exploratory data analysis (EDA) in R. After completing the activities in this module, you should be able to explore a dataset using:

  • descriptive statistics,

  • simple R scripts including writing your own functions,

  • basic (and not so basic) plots with ggplot2.

We are going to explore a dataset of New York City condo evaluations for fiscal year 2011-2012. It was obtained from the NYC Open Data initiative (https://data.cityofnewyork.us/).

Readings

  • I2R - Chapters 4-7

  • R4DS - Chapters 1-2

Downloads and other resources

Other Resources:

Activities

We will work through two tutorials on EDA, with a short detour into creating user-defined functions in R.

Summary statistics

R makes it easy to compute summary statistics. We will also see how to create R Projects to help you organize your R work.
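Here is a minimal sketch of the summary statistics R computes out of the box. Because this module does not list the condo dataset's columns, the example uses R's built-in `mtcars` data frame as a stand-in; substitute your own data frame and column names.

```r
# mtcars ships with R (datasets package); it stands in for the condo data here
cars <- mtcars

# Min, quartiles, median and mean for every column
print(summary(cars))

# Individual statistics for one numeric column
mpg <- cars$mpg
mpg_stats <- c(
  mean   = mean(mpg),
  median = median(mpg),
  sd     = sd(mpg),
  min    = min(mpg),
  max    = max(mpg)
)
print(round(mpg_stats, 2))

# Frequency table of a categorical variable (number of cylinders)
cyl_counts <- table(cars$cyl)
print(cyl_counts)

# Mean mpg within each cylinder group
mpg_by_cyl <- tapply(cars$mpg, cars$cyl, mean)
print(mpg_by_cyl)
```

`summary()` gives a quick overview of every column at once, while `table()` and `tapply()` are useful once you want counts or group-wise statistics.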

Writing your own functions

We will do a brief introduction to writing functions in R.

Note

The video above makes a few references to the “R for Everyone” text, which we are no longer using. Instead, see Chapter 7 of the online textbook An Introduction to R (https://intro2r.com/).
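As a rough illustration of what a user-defined function looks like, the sketch below defines two small helpers. The names `coef_var` and `describe` are hypothetical, chosen for this example rather than taken from the course materials, and the input vector is made-up data.

```r
# Hypothetical helper: coefficient of variation (sd relative to the mean).
# `na.rm` is passed through so missing values can be ignored.
coef_var <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric")
  }
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Hypothetical helper returning several statistics as a named vector
describe <- function(x, na.rm = TRUE) {
  c(
    n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = na.rm),
    median = median(x, na.rm = na.rm),
    cv     = coef_var(x, na.rm = na.rm)
  )
}

# Made-up values, including one missing observation
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
print(coef_var(x, na.rm = TRUE))
print(describe(x))
```

Note the default argument (`na.rm = FALSE`), the input check with `stop()`, and that the last evaluated expression is the function's return value; these are the same building blocks Chapter 7 covers in more depth.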

Plots and graphs

Now we are going to see an area where R really shines: plotting.
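A minimal ggplot2 sketch, assuming the package is already installed (`install.packages("ggplot2")`). It again uses the built-in `mtcars` data as a stand-in for the condo dataset, whose column names are not given here.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# Scatter plot of fuel economy against weight, colored by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    title = "Fuel economy vs. weight (mtcars)"
  )
print(p)

# A histogram shows the distribution of a single variable
h <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Miles per gallon", y = "Count")
print(h)
```

Every ggplot2 plot follows the same pattern: a data frame, an aesthetic mapping with `aes()`, and one or more layers such as `geom_point()` or `geom_histogram()` added with `+`.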

Explore (OPTIONAL)

Data visualization

R Markdown

Percentiles

It’s easy to get enamored with averages, but they don’t tell the whole story: a few extreme values can pull the mean far away from a typical observation. Look at percentiles, too.
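A small sketch of the point above, using made-up values with a single large outlier. `quantile()` computes percentiles (by default with R's interpolating "type 7" definition), and `IQR()` gives the spread of the middle 50% of the data.

```r
# Made-up, right-skewed values: one large outlier among small ones
x <- c(1, 2, 3, 4, 100)

avg <- mean(x)     # pulled up by the single large value
med <- median(x)   # the 50th percentile, unaffected by it
cat("mean:", avg, " median:", med, "\n")

# Several percentiles at once
qs <- quantile(x, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
print(qs)

# Interquartile range: 75th minus 25th percentile
spread <- IQR(x)
print(spread)
```

Here the mean (22) is larger than four of the five values, while the median (3) and the quartiles describe the typical observation much better.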