# Resource Center

Here are links to a variety of data science and business analytics resources.

## General business analytics, data science, statistical modeling

Analytics Magazine - Published by INFORMS, the Institute for Operations Research and the Management Sciences. INFORMS is the premier professional society for analytics and it’s inexpensive to join as a student. Full disclosure: I’ve been an INFORMS member since about 1986 (when it was still ORSA/TIMS). We were doing analytics before it was called analytics. :)

Data science - A first introduction - a nice, free online text by a group of faculty at the University of British Columbia that gets you going with data science using R and the tidyverse

Awesome Data Science - giant curated list of data science resources

Statistical Modeling, Causal Inference, and Social Science - Andrew Gelman’s very high quality blog. Lots of stuff on doing stats correctly.

Statistical Modeling: The Two Cultures - paper by L. Breiman (2001) that gets at the heart of the statistics vs ML tension.

Open data science curriculum - collection of free courses on various data science, math and stat topics

## Careers

- There are numerous groups on Reddit related to analytics and data science. These can be very good resources for unvarnished conversations/opinions about careers, grad school as well as technical advice.

December 2022: Data Science Career Resources and Essay on data science hiring process by Eric Ma who writes the Data Science Programming Newsletter on Substack.

## Programming tutorial hubs

Software Carpentry and Data Carpentry - helping scientists learn to do computational work with R, Python, SQL and other tools

## Online courses

There are numerous online courses available through DataCamp, Coursera, EdX, Udemy and others. Here are a few Python and R ones I’ve checked out over the years.

Intro to Data Science in Python - I did this short course in Feb 2017 (Coursera UMich). Great fun. If you want a good pandas/python learning challenge, try the assignments.

Python for Everybody course - This site includes a bunch of videos and supplementary files. The whole thing was created by a professor at University of Michigan and is meant to be a totally open set of freely available learning materials for Python in the context of data analysis.

Coursera has some well regarded R based data science courses

## Learning the command line

As you’ve no doubt gathered from this class, I’m a big fan of using the command line for certain data related tasks and think that command line skills are really important. There’s a second edition of the O’Reilly book “Data Science at the Command Line”, and it is FREELY available online.

https://datascienceatthecommandline.com/ - home page for the book

https://datascienceatthecommandline.com/2e/ - the free 2nd edition

Chapter 1 is a great overview of why you should become adept at using the command line.

## Learning R

### Online R tutorials, books and examples for getting started

R-bloggers - The aggregator for R related blogs.

R for Data Science - Free, online version of the book **R for Data Science** by Hadley Wickham and Garrett Grolemund.

Quick-R - This is a great site dedicated to helping R newbies get over the somewhat steep R learning curve.

fasteR: The fast lane to learning R - created by Norm Matloff who is a big proponent of learning base R first before doing things with tidyverse packages.

STAT 545 - Data wrangling, exploration, and analysis with R - Jenny Bryan’s course, developed at UBC and still used even though she has moved on to RStudio. Not only does it cover R, it also gets into things like version control, web scraping and Shiny.

Cookbook for R - Another great site for learning R. In their words: “The goal of the cookbook is to provide solutions to common tasks and problems in analyzing data.”

Webinars from RStudio - The creators of the hugely popular RStudio IDE have a ton of learning resources on their site.

The Official R Manuals - These are accessible from the main R Project page in the Documentation section.

Contributed Documentation - Many people have written tutorials, books, and other free documentation for various aspects of R. This is part of the magic of the R community.

Introducing R to a non-programmer in one hour - Just what it says.

Teach yourself Shiny - A somewhat recent development by the folks at RStudio is something called a Shiny web app. Learn to create interactive, R driven web apps!

### The base R vs tidyverse debate

The tidyverse has become increasingly popular, and with this popularity has come more scrutiny. In particular, there’s a healthy debate on whether new R users should first learn base R and then move on to the tidyverse or whether they should immediately be taught the tidyverse approach. It really isn’t an either-or question, and in this course you will use both base R and tidyverse approaches. I do start with base R because I think you need a good understanding of things like vectors to make the most of the R language. At the end of the day, we use R to solve problems, and the more tools you have to tackle those problems, the better off you will be. A few good resources on this debate include the following.

The TidyverseSkeptic project by Norm Matloff (his fasteR project is described above) is a well known essay on why new R learners should be taught base R first. Check out the Issues for some heated discussion.

David Robinson argues the tidyverse-first side in posts such as http://varianceexplained.org/r/teach-tidyverse/ and http://varianceexplained.org/r/why-I-use-ggplot2/ (yes, technically ggplot2 predates the tidyverse).

Data Carpentry has a post on base R and tidy equivalents

Caret vs tidymodels also (kinda) falls into this debate - see On not using tidymodels, and Caret vs tidymodels: the old and the new, and this Reddit post.

One criticism of the tidyverse is that it can lead to dependency bloat - learn more from this essay about the “tinyverse”.

I did a short blog post on base vs tidy

There’s no doubt that ggplot is awesome, but check out what can be done if you have a good grasp of base plotting in R. When I read this, it felt a bit like matplotlib, the venerable Python based plotting package.

There are a few Reddit threads that address this topic including this one and this other one

### Packages

The R ecosystem relies on high quality packages and its community of package developers. Here are some collections of package descriptions and links.

RStartHere - A very comprehensive and well organized list of packages for doing data science in R.

Awesome R - Curated list of R packages by category (IDE, data manipulation, etc.)

## Learning Python

### Online Python tutorials, books and examples for getting started

Software Carpentry - Lessons - Software Carpentry is one of my all time favorite resources for teaching and learning practical programming skills. This link takes you to their list of “Lessons” (really entire mini-courses). In addition to a lesson on Python, you’ll find lessons on tons of stuff that is useful for business analytics and data science. Highly, highly recommended.

Whirlwind Tour of Python - Jake VanderPlas - Free 100-page PDF and associated Jupyter notebooks for those who want to learn Python for data science and have some prior programming knowledge.

Python for Everybody - Charles Severance - This is a remixed, freely available, textbook on learning Python to do data analysis.

Think Python (Downey) - terrific book for newish Python learners

Automate the Boring Stuff with Python (Sweigart) - another really good free online book

Ted Petrou’s GitHub repos - I stumbled on this via LinkedIn. I went through his Jupyter notebooks in the Learn-Pandas repo and they were outstanding.

### Blogs and listservs

Practical Business Python - Super relevant blog for business students learning Python.

Pycoders Weekly - Weekly email newsletter. Always has interesting stuff and almost always something directly data science related.

### Libraries

Awesome Python - A curated list of awesome Python frameworks, libraries, software and resources

## Statistics

If you are rusty on statistics, there’s a really good OpenIntro Stats book available as a free online book or you can pay what you want for a paperback copy. It includes R based material.

You can also find high quality free online statistics courses through the Open Learning Initiative as well as places like Coursera and EdX.

Cross Validated is a great Q&A forum for all things statistics. Lots of R related content.

## Publicly available data

DrivenData Competitions - not suggesting you compete (you can) but these are a great source of high quality datasets. You’ll need to create a free account to be able to download data. I used this site as a motivation for my series of blog posts on algal bloom detection from satellite imagery.

Kaggle Datasets - need to create a free Kaggle account

Data is Plural - links to many interesting datasets

Modern plain text computing - this course has a list of practice data sources on the main page (check out tidy Tuesday)

Data.gov - US government data

Census.gov - US census data

Bureau of Transportation Statistics - tons of transportation related data

OpenML Datasets - site with many ML resources

cs109 Resources (2014) - Many links to datasets (as well as links to Python and misc data science stuff)

https://github.com/rstudio/RStartHere#data - From the RStartHere site

## Workflow and reproducible analysis

Modern plain text computing - a course by Kieran Healy

Data Science Workflow: Overview and Challenges - Blog post by Philip Guo who did his dissertation on this topic.

Cookiecutter Data Science - “A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.”