Textbooks
I’ve been teaching this course since 2016. Since then, the R and Python ecosystems have contributed a growing number of really good, free, web-based books and other resources. At this point, there is no need for me to require any non-free textbooks for the course.
At the bottom of this page I’ll list the books I’ve used at some point in the past - but you do NOT need them.
Required texts for Winter 2025
All of the texts have free versions available online. All these books also have official websites from which you can buy print, PDF, or eBooks. Of course, you can also find them at numerous places on the web. I’ve listed approximate pricing from checking a few of the online booksellers.
An Introduction to R (Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau) (I2R)
This online book also has a companion website with exercises, more tutorials and other resources.
Introduction to Statistical Learning (with Applications in R) (Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani) (ISLR)
This is a text that does a great job of explaining the main statistical learning techniques at an accessible mathematical level. You can download a free PDF from the link above. They’ve even created a whole set of video lectures accompanying the book - https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/.
ISLR Tidymodels lab (Emil Hvitfeldt)
This is a new addition to the ISLR family. It’s a terrific online text that translates all of the examples in ISLR to use tidymodels.
R for Data Science (2e) (Wickham, Cetinkaya-Rundel and Grolemund) (R4DS)
The second edition of this book was released in the summer of 2023. It’s a pretty big overhaul and it’s freely available online. The lead author, Hadley Wickham, is one of the giants in the R community: he has created some of the most widely used R packages and has had a tremendous influence on the use of R for data science.
A Whirlwind Tour of Python (Jake VanderPlas) (WToP)
One of JVP’s contributions is this very nice, concise intro to programming in Python.
Python Data Science Handbook (Jake VanderPlas) (PDSH)
There is a free online edition at https://jakevdp.github.io/PythonDataScienceHandbook/.
You can also purchase a paperback copy from O’Reilly for ~$45
This is another newish book, written by a scientist who has been a big contributor to the Python data science world. This book covers all the main essentials for doing data science work in Python.
More good books (NOT required)
R for Everyone (2nd Edition) (Jared Lander) (RforE)
~$33 new, less for used (IMPORTANT: Make sure you get the Second Edition.)
This provides an accessible, modern and thorough introduction to the world of the R statistical computing platform. I’ve used this book the past three years.
Practical Data Science with R (2ed) (Nina Zumel and John Mount) (PDSwR)
~$40 new, less for used
This is a newish book (2014 with 2ed just out a few years ago) that does just what the title suggests. It is structured around typical business analytics or data science projects and covers the main statistical learning techniques along with tons of practical advice on doing data science projects.
Python for Data Analysis (Wes McKinney)
Free online but print also available
This is a somewhat more advanced book on using Python for data analysis. It was written by the developer of the hugely popular Python package, pandas. In addition to thorough coverage of pandas, it covers numpy, IPython, and even an intro to the Python language. This is the 3rd edition, which came out in 2022.
A few years ago we used the following book along with RforE and PDSwR. It’s not required for the class this year, but I do highly recommend it for those interested in more advanced web scraping and other data wrangling tasks. See the description below.
Some older books
[DWwP] Data Wrangling with Python - http://shop.oreilly.com/product/0636920032861.do
Jacqueline Kazil & Katharine Jarmul ~$30 new + ebook, less for used
Finally, a problem-driven book that introduces the Python language as it’s needed to solve these problems. Tons of practical advice and written in a style that matches how this work is really done - lots of trying stuff and partially succeeding and then trying other stuff … (repeat till happy). I believe this is a great way to learn to be an effective programmer, get useful things done, and have fun while doing it.
[DDS] Doing Data Science: Straight Talk from the Frontline - http://shop.oreilly.com/product/0636920028529.do
Cathy O’Neil & Rachel Schutt ~$25
This book is more of a collection of chapters written by the authors and various data science practitioners. It’s very readable and full of insights on the practice of data science.
[PCfB] Practical Computing for Biologists - http://practicalcomputing.org/
Steve Haddock & Casey Dunn ~$40-55
I fell in love with this book immediately and found myself wishing that someone would write a similar book for business. It is aimed at scientists who realize that they need to get better at computing to deal with all the data they need to process and analyze. That sounds like many business analysts. It’s Mac and Linux based and is crammed full of useful information on text files, using the command line, regular expressions, shell scripts, Python programming, dealing with image files, relational databases and even working with physical data collection devices.