***********************************
Intro to PCDA course
***********************************

Welcome to my Practical Computing for Data Analytics (PCDA) class. We'll do several things the first week of class:

* overview of the field of business analytics / data science
* course overview and logistics
* get some hands-on experience with some of the technology we'll use in the course
* start to learn how to use the Linux shell for basic file management and putting together Linux commands to accomplish simple analytical tasks

Objectives
====================

Through this module you will:

* explore the syllabus and course web sites so that you know how this course will operate,
* get a preview of some of the types of things you'll learn and the activities you'll do in this course,
* begin to get hands-on experience with some of the technical computing tools used in this course,
* be ready to learn all kinds of cool business analytics things.

Readings
========

We'll start using the Linux bash shell during the first week of class, so we might as well get going on learning the basics. For now, read Section 1 of the Software Carpentry tutorial entitled `The Unix Shell `_. In Week 2 we'll be learning the things covered in Sections 1-4, so feel free to skim those if you want to get a head start. See the Explore section below for additional Linux shell related resources.

Downloads
=========

* `Download_Session01_Intro.tar.gz `_
* `Download_Session01_Intro.zip `_ - just a zip version of the same compressed archive

There will always be one or more "Download" files for each class. Each one is a compressed archive containing all the files we'll need for the session. In the Windows world, this would usually be a ``.zip`` file. However, in the Linux world, we often use "gzipped tarballs", which have a ``.tar.gz`` extension.
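
As a minimal sketch of what working with a gzipped tarball looks like, the commands below build a tiny stand-in archive in a temporary directory, list its contents, and then extract it. The archive name ``demo_session.tar.gz`` and the files inside it are made up for illustration and are not the course files. The sketch assumes a Linux shell with the standard ``tar`` and ``mktemp`` utilities available.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing in your home folder is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Build a tiny stand-in archive (hypothetical contents, NOT the real course files).
mkdir -p demo_session/data
echo "hello from the demo archive" > demo_session/readme.txt
echo "a,b,c" > demo_session/data/sample.csv

# c = create, z = compress with gzip, f = name of the archive file.
tar -czf demo_session.tar.gz demo_session
rm -r demo_session

# t = list the contents without extracting anything.
tar -tzf demo_session.tar.gz

# x = extract into the current directory.
tar -xzf demo_session.tar.gz

# Confirm the files came back.
cat demo_session/readme.txt
```

For the real download, the same ``tar -tzf`` and ``tar -xzf`` commands should work when pointed at ``Download_Session01_Intro.tar.gz`` from the folder where you saved it.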
We'll extract these in our Linux virtual machine (as part of our Week 1 intro), though you can certainly extract these files in Windows as well using the free utility 7-Zip.

Activities
================================

.. note::

   Our SBA web server has some issues that sometimes lead to problems loading our course webpages or my faculty home page. If this happens, you can usually fix the problem by clearing your browser cache and reloading the page. Or, you can use one of the alternative links - `https://pcda.misken.org `_ or `https://mis5470.netlify.app `_.

Overview of pcda class
----------------------

I'll present an overview of this class as well as the general topic of data science / business analytics.

.. warning::

   If you are using the VM, do **NOT** watch the screencasts from within the pcda VM. Watch them from a browser opened in your host OS (i.e. Windows or Mac).

- `SCREENCAST: Overview of business analytics and data science `_ (15:03)

Class logistics
---------------

Between the "Course welcome video" and the "Week 1 Welcome Video" (both available via Moodle), all of the course logistics are covered. So, if you haven't watched these yet, please do so ASAP. Also, read the syllabus carefully (again, in Moodle). Finally, review the first two Announcements I made in Moodle.

The **pcda** computing appliance
--------------------------------

We'll discuss the things that led to the pcda appliance:

- why's and what's of Linux
- why's and what's of R and Python
- open source facilitates contributed packages with latest and greatest statistical techniques, bug fixes, domain specific tools, etc.
- free, like speech and like beer
- efficiency of command line and scripts vs GUI
- reproducible analysis/research

You should go through (if you haven't already) the screencasts and instructions on the `pcda VM page `_, which covers installation and gives an overview of VirtualBox and the Lubuntu desktop.
The screencasts below are from Fall 2020, but nothing has changed except the name of the VM.

- `SCREENCAST: Intro to the pcda VM `_ (11:36)

Preview of data science with R and R Studio
-------------------------------------------

You'll get your first peek at these tools and a preview of a typical analysis project that involves building and comparing predictive models. This will serve as a preview of much of what this course is about.

- `SCREENCAST: Preview of R and R Studio `_ (8:20)

Preview of Python and Anaconda
------------------------------

We'll just do a quick look so that those who are curious can start to tinker around. We'll be learning Python later in the semester.

- `SCREENCAST: Preview of Python `_ (10:59)

Explore (OPTIONAL)
==================

A few more Linux shell tutorials that I've found useful are:

* `Learn Enough Command Line to be Dangerous `_
* `Linux Tutorial from Ryan's Tutorials `_
* `Unix Tutorial for Beginners `_

.. note::

   In the "Learn Enough Command Line to be Dangerous..." tutorial, there are two nice boxes describing the `"magic of computers" `_ and `"technical sophistication" `_. **READ THEM.**

This section will typically have links related to the topic, ... or not. Have fun exploring and learning more.

* `Mapping the CRAN social network with R `_
* Are "super nerds" killing baseball? This `article `_ raises some thought-provoking issues about analytics in sports.
* `Hurricane models `_ - it's not just one model
* `Advice for constructing an online portfolio for analytics job seekers `_ - Q&A on Quora. Another thread on Quora discussed the types of `classes one might take to learn data science `_.
* `Getting started in data science `_ - Short blog post. No hype. Good advice. For another dose of advice, check out this podcast from TalkPython on paths to a `data science career `_.