MSDS621 Introduction to Machine Learning

"In God we trust; all others bring data." — Attributed to W. Edwards Deming and George Box

<img src="images/iris-TD-5-X-Arial.svg" width="300" align="right">This course introduces students to the key processes, models, and concepts of machine learning with a focus on:

  • Regularization of linear models (finishing the linear regression topic from the previous class)
  • Gradient descent loss minimization
  • Naive Bayes
  • Nonparametric methods such as k-nearest neighbors
  • Decision trees
  • Random forests
  • Model interpretation
  • Vanilla neural networks using PyTorch

We study a few key models deeply, rather than providing a broad but superficial survey of models. As part of the lab you will learn about data cleaning, feature engineering, and model assessment.

<img src="images/feynman.png" width="150" align="right" style="padding-top:10px">As part of this course, students implement linear and logistic regression with regularization through gradient descent, a Naive Bayes model for text sentiment analysis, decision trees, and random forest models. Implementing these models yourself is critical to truly understanding them. As Richard Feynman wrote, "What I cannot create, I do not understand." (From his blackboard at the time of his death.) With an intuition for how these models work, you'll be able to understand and predict their behavior much more easily.

Class details

INSTRUCTOR. Terence Parr. I’m a professor in the computer science department and the data science program, and I was the founding director of the MS in Analytics program at USF (which became the MS in Data Science program). Please call me Terence or Professor (“Terry” is not ok).

SPATIAL COORDINATES<br>

  • Class is held at 101 Howard 5th floor classroom 529.
  • Exams will be via HonorLock online/remote.
  • My office is room 525 @ 101 Howard on 5th floor.

TEMPORAL COORDINATES<br>

Classes run Tue Oct 21 through Tue Dec 7. Due to exams and the Thanksgiving break, I believe we will have 12 class lectures.

  • Lectures: 10AM-11:50AM (section 1) and 1-2:50PM (section 2)
  • Exam 1: Wed Nov 10, 2021
  • Exam 2: Wed Dec 8, 2021
<!-- * Exams: Fri 5-6PM Nov 8; Fri 10-11:30AM Dec 6; Room 154-156 -->

INSTRUCTION FORMAT. Class runs for 1 hour and 50 minutes, 2 days/week. Instructor-student interaction during lecture is encouraged, and we'll mix in mini-exercises / labs during class. All programming will be done in the Python 3 programming language, unless otherwise specified.

COURSE BOOK. There is no textbook for the course, but you might find _The Elements of Statistical Learning_ and _The Mechanics of Machine Learning_ (in progress) useful.

TARDINESS. Please be on time for class. It is a big distraction if you come in late.

Student evaluation

| Artifact | Grade Weight | Due date |
|--------|--------|--------|
| Linear models | 10% | Sun Oct 31, 11:59PM |
| Naive Bayes | 8% | Tue Nov 9, 1:00PM (start of PM class) |
| Decision trees | 15% | Wed Nov 24, 1:00PM |
| Random Forest | 12% | Sun Dec 5, 11:59PM |
| Exam 1 | 27% | Wed Nov 10, 10AM-10PM |
| Exam 2 | 28% | Wed Dec 8, 10AM-10PM |

All projects will be graded with the specific inputs or tests given in the project description, so you understand precisely what is expected of your program. Consequently, projects are graded in binary fashion: they either work or they do not. Each failed unit test costs a fixed amount; there is no partial credit. (Attention to detail is critical. For example, if you return an integer from a function and my code expects a string, your code makes my code fail.) The only exception is when your program does not run on the grader's machine or mine because of some cross-platform issue or some obviously trivial problem, typically because a student has hardcoded a file name or directory into their program. In that case, we will take off a minimum of 10% instead of giving you a 0, depending on the severity of the mistake. Please go to GitHub and verify that the website has the proper files for your solution; that is what I will download for testing.

For some projects, I run a small set of hidden tests that you do not have. These are typically worth 10% and the only way to get into the 90% range is to produce code that works on more than just the tests you're given.

No partial credit. Students are sometimes frustrated about not getting partial credit for solutions they labored on that do not actually work. Unfortunately, "almost working" just never counts in a job situation because nonfunctional solutions have no value. We are not writing essays in English that have some value even if they are not superb. When it comes to software, there is no fair way to assign such partial credit, other than a generic 30% or whatever for effort. The only way to determine what is wrong with your project is for me to fix and/or complete the project. That is just not possible for 90 students. Even if that were possible, there is no way to fairly assign partial credit between students. A few incorrect but critical characters can mean the difference between perfection and absolute failure. If it takes a student 20 hours to find that problem, is that worth more or less partial credit than another project that is half-complete but could be finished in five hours? To compensate, I try to test multiple pieces of the functionality in an effort to approximate partial credit.

Each project has a hard deadline and only those projects working correctly before the deadline get credit. My grading script pulls from github at the deadline. All projects are due at the start of class on the day indicated, unless otherwise specified.

I reserve the right to change projects until the day they are assigned.

Grading standards. I consider an A grade to be above and beyond what most students have achieved. A B grade is an average grade for a student or what you could call "competence" in a business setting. A C grade means that you either did not or could not put forth the effort to achieve competence. Below C implies you did very little work or had great difficulty with the class compared to other students.

Syllabus

Getting started

The first lecture is an overview of the entire machine learning process:

Overview (Day 1)

Regularization for linear models

<img align="right" src="images/L1L2contour.png" width="180">

This topic more or less completes the linear regression material from the course you just finished.
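To make the effect of L1 versus L2 regularization concrete, here is a minimal sketch using scikit-learn on synthetic data (the data and `alpha` values are illustrative assumptions, not course material): the L2 (ridge) penalty shrinks every coefficient a little, while the L1 (lasso) penalty tends to zero out coefficients of useless features entirely.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: y depends only on the first of five features; the other four are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives useless coefficients to exactly zero

print("ridge coefficients:", np.round(ridge.coef_, 2))
print("lasso coefficients:", np.round(lasso.coef_, 2))
```

Comparing the two printed coefficient vectors shows the qualitative difference: the ridge noise coefficients are small but nonzero, while the lasso noise coefficients are exactly zero.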

Training linear models with gradient descent

<img src="https://explained.ai/gradient-boosting/images/directions.png" width="120" align="right">This topic is required so we can train regularized linear models, and it is critical to understanding the neural networks you'll study in a future class.
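As a rough illustration of the idea (a minimal sketch, not the course's project solution; the function name, learning rate, and step count are my assumptions), gradient descent trains a ridge-regularized linear model by repeatedly stepping the coefficient vector against the gradient of the penalized loss:

```python
import numpy as np

def ridge_gd(X, y, alpha=0.0, lr=0.01, steps=2000):
    """Minimize MSE(b) + alpha * ||b||^2 with plain (full-batch) gradient descent.
    Sketch only: no intercept term, no feature normalization, fixed learning rate."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(steps):
        # Gradient of the penalized loss: -2/n * X^T (y - Xb) from the MSE term,
        # plus 2*alpha*b from the L2 penalty.
        grad = -2.0 / n * X.T @ (y - X @ b) + 2.0 * alpha * b
        b -= lr * grad
    return b

# Noiseless demo: gradient descent should recover the true coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_b = np.array([2.0, -1.0, 0.5])
y = X @ true_b
print(ridge_gd(X, y))
```

With `alpha=0` this converges to the ordinary least-squares solution; increasing `alpha` visibly shrinks the recovered coefficients toward zero.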

Models

<img align="right" src="images/boston-TD-3-X-Arial.png" width="300">

We will learn three models in depth in this course: Naive Bayes, decision trees, and random forests. We will also examine k-nearest neighbors (kNN) briefly.
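For a quick feel of how these three models behave, here is a hedged sketch comparing scikit-learn's off-the-shelf versions on the classic iris dataset (in the course you implement the models yourself; this only illustrates the kind of output each produces, and the particular hyperparameters are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = {
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Mean 5-fold cross-validation accuracy for each model
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

All three score well on this easy dataset; the interesting differences (bias, variance, interpretability) are what we dig into during the course.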
