Intro to Python for data analytics
Intro and Objectives
Python is a full-featured programming language used for all kinds of things. Our focus will be on learning Python from the perspective of doing data analytics work. Python has several advantages for data science work:
easy to learn
tons of data science related libraries
many Python data science tutorials
Python, being a full-featured, general-purpose language, is more powerful than R for many of the data science related tasks you’ll face
it is widely used
In this first part of the intro we’ll learn about fundamental Python programming concepts such as:
Jupyter notebooks and Jupyter lab
variables, numpy, math, peek at plotting
looping and conditional logic
lists and dictionaries
reading files
We’ll start learning to use the Spyder IDE as we start to create more complex programs.
I’m assuming that if you are in this class, you have some familiarity with computer programming. So, this is a “whirlwind” intro to Python. The syntax is really quite easy to learn, so I focus instead on some of the concepts and data structures that those of you who have only a VBA background might not be familiar with, such as lists, dictionaries, and sequential reading of files. The Jupyter notebooks upon which this session is based are for you to use as a starting point for further exploration and learning.
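As a small taste of those three ideas, here is a minimal sketch. It is not taken from the course notebooks; the data and variable names are made up for illustration, and `io.StringIO` stands in for a real file on disk so that the example runs anywhere.

```python
import io

# A list is an ordered, mutable sequence; indexing starts at 0.
wait_times = [12, 7, 30, 18]   # hypothetical wait times in minutes
wait_times.append(25)          # lists can grow after they are created
first_two = wait_times[:2]     # slicing returns a new list

# A dictionary maps keys to values.
visits_by_clinic = {"north": 0, "south": 0}

# Sequential reading: process the lines one at a time.
# io.StringIO plays the role of a file you would normally get from open(...).
fake_file = io.StringIO("clinic,minutes\nnorth,12\nsouth,7\nnorth,30\n")
header = fake_file.readline().strip().split(",")  # consume the header row

total_minutes = 0
for line in fake_file:
    clinic, minutes = line.strip().split(",")
    visits_by_clinic[clinic] += 1     # count visits per clinic
    total_minutes += int(minutes)     # text must be converted to a number

print(first_two)
print(visits_by_clinic)
print(total_minutes)
```

With a real data file you would replace the `io.StringIO(...)` line with something like `open("your_file.csv")`, where the file name is whatever you are working with.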
Readings
Whirlwind Tour of Python (WToP) - pages 1-40 - available as a free PDF. You can find the associated Jupyter notebooks in the Downloads folder for this session.
Whirlwind Tour of Python notebooks as Kaggle Kernels - JVP’s free WToP book has this set of notebooks to accompany it. Kaggle has distributed them as Kernels.
StackOverflow is THE number one Q&A site for all things programming. There are tags for every conceivable programming language. It is essential that you learn how to ask good questions on sites like this or when asking questions of me or in our Help Forum.
An xkcd comic on error messages in programs. Remember to do the mouse-over after reading the main comic.
Downloads
Inside the Download file, in addition to the files needed for this session, you’ll also find a folder with all of the Jupyter notebooks for the Whirlwind Tour of Python book listed above.
Activities
Note
Using Jupyter notebooks with conda virtual environments is an evolving thing. As of now (Fall 2024), and as described at this part of our software page, I am using a dedicated conda virtual env named jupyter for launching Jupyter Lab.
$ conda activate jupyter
$ jupyter lab
Then, within Jupyter Lab, I’m changing the notebook kernel to our datasci conda virtual env.
But, as described at the link above, if you end up launching Jupyter Lab from the base conda env, you’ll be fine.
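If you are not sure which environment a notebook is actually using, one quick check is to ask Python where it is running from. This is just a sketch and not part of the course materials. When the kernel is set to the datasci env, the printed paths would normally include that env's name, though the exact paths depend on how and where you installed things.

```python
import sys

# Path of the Python interpreter behind the current kernel (or script).
interpreter_path = sys.executable

# Root folder of the environment that interpreter belongs to.
env_root = sys.prefix

print(interpreter_path)
print(env_root)
```

Run this in a notebook cell before and after switching kernels to see the paths change.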
We will begin with an overview of Python and its use in data analytics. Then we’ll start to learn Python in the context of data analysis, by working through a number of Jupyter notebooks together. While working through the notebooks, the topic of Conda virtual environments will come up. Conda is the package and environment management system for the Anaconda Python distro. Here is a nice newbie intro to Conda virtual environments.
If you end up using Anaconda from Windows or Mac, eventually you’ll learn about creating virtual environments (see the pcda VM page). Until then, things will work just fine in the base environment. You may have to install a few libraries but that’s no big deal.
Intro to Python
see the intro_to_python_for_business_analytics.html file in the Downloads folder
SCREENCAST: Intro to Python (23:47)
Jupyter Notebook vs. JupyterLab
we will be using JupyterLab, which is the latest evolution in the Jupyter Notebook ecosystem
here’s a nice overview post about this evolution
it is important to note that the underlying Jupyter notebooks themselves have not changed. What has changed is the interface for working with them.
SCREENCAST: The relationship between Jupyter Lab and Jupyter Notebook (9:24)
01-basics-lookahead-pcda.ipynb
variables, numpy, math, peek at plotting
02-loop-conditionals-pcda.ipynb
repeating actions
03-lists-pcda.ipynb
flexible data storage, indexing and slicing
SCREENCAST: Lists (10:11)
04-intro-dictionaries-readingfiles-pcda.ipynb
another storage container, more on reading data files
SCREENCAST: Dictionaries (8:40)
SCREENCAST: File reading (17:25)
I didn’t make screencasts for these last two. They are very short. Just explore them.
05-file-globbing-pcda.ipynb
processing a bunch of data files by globbing
06-more-on-conditions-pcda.ipynb
if-elif-else logic (Python has no “then” keyword)
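To give a feel for what those last two notebooks cover, here is a minimal sketch that combines file globbing with conditional branches. It is not the notebook code; the file names and the size cutoffs are invented for illustration, and the files are created in a temporary folder so that the example is self-contained.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Make a few small files to work with (hypothetical names and contents).
    for name, n_lines in [("jan.csv", 2), ("feb.csv", 5), ("notes.txt", 1)]:
        with open(os.path.join(folder, name), "w") as f:
            f.write("row\n" * n_lines)

    # Globbing: the * wildcard matches every file name ending in .csv.
    csv_paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

    sizes = {}
    for path in csv_paths:
        with open(path) as f:
            n = len(f.readlines())
        # Exactly one of these branches runs for each file.
        if n == 0:
            label = "empty"
        elif n < 3:
            label = "small"
        else:
            label = "large"
        sizes[os.path.basename(path)] = label

print(sizes)
```

Note that `notes.txt` is skipped because it does not match the `*.csv` pattern.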
Explore
The next billion programmers - Interesting musings on Excel and Python
The MBA Data Science Toolkit: 8 resources to go from spreadsheet to command line
Data Carpentry - Started as a sibling organization to Software Carpentry, focused on helping people build better data science skills. The two have since merged to form The Carpentries.