EC524W20
Masters-level applied econometrics course, focusing on prediction, at the University of Oregon (EC424/524, Winter quarter 2020), taught by Ed Rubin.
EC 524, Winter 2020
Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Connor Lennon.
Schedule
Lecture Tuesday and Thursday, 10:00am–11:50am, 105 Peterson Hall
Lab Friday, 12:00pm–12:50pm, 102 Peterson Hall
Office hours
- Ed Rubin (PLC 519): Thursday (2pm–3pm); Friday (1pm–2pm)
- Connor Lennon (PLC 430): Monday (1pm–2pm)
Syllabus
Books
Required books
Suggested books
- R for Data Science
- Introduction to Data Science (not available without purchase)
- The Elements of Statistical Learning
Lecture notes
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
001 - Statistical learning foundations
- Model accuracy
- Loss for regression and classification
- The bias-variance tradeoff
- The Bayes classifier
- KNN
- Review
- The validation-set approach
- Leave-one-out cross-validation
- k-fold cross validation
- The bootstrap
In-class: Validation-set exercise (Kaggle)
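The resampling methods above can be sketched in a few lines of base R. A minimal k-fold cross-validation example on simulated data (all variable names here are illustrative, not from the course materials):

```r
# k-fold cross-validation for a linear model, in base R
set.seed(101)
n <- 200
x <- runif(n)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x, y)

k <- 5
folds <- sample(rep(1:k, length.out = n))  # random fold assignment

cv_mse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = dat[folds != i, ])        # train on k-1 folds
  pred <- predict(fit, newdata = dat[folds == i, ])  # predict held-out fold
  mean((dat$y[folds == i] - pred)^2)                 # fold-level MSE
})

mean(cv_mse)  # CV estimate of the test MSE
```

Setting `k <- n` turns the same loop into leave-one-out cross-validation.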
004 - Linear regression strikes back
- Returning to linear regression
- Model performance and overfit
- Model selection—best subset and stepwise
- Selection criteria
- Ridge regression
- Lasso
- Elasticnet
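Ridge regression has a closed form, which makes the shrinkage idea easy to sketch in base R. (In practice you would use a package such as glmnet; this standalone example, which for simplicity also penalizes the intercept, is only illustrative.)

```r
# Ridge regression via its closed form: b = (X'X + lambda * I)^{-1} X'y
set.seed(202)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))  # intercept + 2 predictors
beta <- c(1, 2, -1)
y <- X %*% beta + rnorm(n)

ridge <- function(X, y, lambda) {
  p <- ncol(X)
  # NOTE: real implementations leave the intercept unpenalized;
  # penalizing all columns keeps this sketch short
  solve(t(X) %*% X + lambda * diag(p), t(X) %*% y)
}

b_ols   <- ridge(X, y, 0)    # lambda = 0 recovers OLS
b_ridge <- ridge(X, y, 10)   # lambda > 0 shrinks coefficients toward zero
```

Lasso and elasticnet swap the squared-coefficient penalty for (a mix of) absolute values, which has no closed form but can zero coefficients out entirely.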
- Introduction to classification
- Why not regression?
- But also: Logistic regression
- Assessment: Confusion matrix, assessment criteria, ROC, and AUC
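The assessment tools above start from the confusion matrix. A base-R sketch using a logistic-regression classifier on simulated data (names and thresholds illustrative):

```r
# Confusion matrix and assessment criteria for a logistic classifier
set.seed(303)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 2 * x))  # true data-generating process

fit  <- glm(y ~ x, family = binomial)
phat <- predict(fit, type = "response")
yhat <- as.integer(phat > 0.5)           # classify at a 0.5 threshold

conf <- table(predicted = yhat, actual = y)
accuracy    <- mean(yhat == y)
sensitivity <- conf["1", "1"] / sum(conf[, "1"])  # true-positive rate
specificity <- conf["0", "0"] / sum(conf[, "0"])  # true-negative rate
```

Sweeping the threshold from 0 to 1 and plotting sensitivity against (1 - specificity) traces out the ROC curve; the area under it is the AUC.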
- Introduction to trees
- Regression trees
- Classification trees—including the Gini index, entropy, and error rate
- Introduction
- Bagging
- Random forests
- Boosting
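Bagging needs no tree package to illustrate: bootstrap the training data, fit a high-variance learner on each resample, and average the predictions. A base-R toy version using a hand-rolled one-split regression "stump" (everything here is illustrative):

```r
# Bagging, sketched in base R with a one-split regression "stump"
set.seed(404)
n <- 300
x <- runif(n, 0, 1)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Fit a stump: a single split on x minimizing squared error
fit_stump <- function(x, y) {
  splits <- quantile(x, probs = seq(0.1, 0.9, 0.1))
  sse <- sapply(splits, function(s) {
    sum((y[x <= s] - mean(y[x <= s]))^2) +
      sum((y[x > s] - mean(y[x > s]))^2)
  })
  s <- splits[which.min(sse)]
  list(split = s, left = mean(y[x <= s]), right = mean(y[x > s]))
}
predict_stump <- function(m, x) ifelse(x <= m$split, m$left, m$right)

# Bag B stumps: fit each on a bootstrap resample, average the predictions
B <- 100
models <- lapply(1:B, function(b) {
  i <- sample(n, replace = TRUE)
  fit_stump(x[i], y[i])
})
x_grid <- seq(0, 1, length.out = 50)
bagged_pred <- rowMeans(sapply(models, predict_stump, x = x_grid))
```

Random forests add one twist to this recipe (randomly restricting the candidate split variables); boosting instead fits learners sequentially to the previous residuals.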
- Hyperplanes and classification
- The maximal margin hyperplane/classifier
- The support vector classifier
- Support vector machines
Projects
Intro Predicting sales price in housing data (Kaggle)
Help: Kaggle notebooks
001 KNN and loss (Kaggle notebook) <br> You will need to sign into your Kaggle account and then hit "Copy and Edit" to add the notebook to your account. <br> Due 21 January 2020 before midnight.
002 Cross validation and linear regression (Kaggle notebook) <br> Due 04 February 2020 before midnight.
003 Model selection and shrinkage (Kaggle notebook) <br> Due 13 February 2020 before midnight.
004 Predicting heart disease (Kaggle competition) | Competition Due 20 February 2020 before midnight.
005 Classifying customer churn (Kaggle competition) | Competition Due in class, 27 February 2020.
Class project Due 12 March 2020 before class.
Lab notes
- General "best practices" for coding
- Working with RStudio
- The pipe (`%>%`)
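For reference, the magrittr pipe `%>%` (re-exported by dplyr) passes the left-hand side into the next call's first argument. The sketch below uses base R's native `|>` pipe (R ≥ 4.1), which behaves the same way for simple calls, so no package is needed:

```r
# The magrittr pipe %>% makes these two lines equivalent:
#   c(1, 2, 3.5) %>% mean() %>% round(1)
#   round(mean(c(1, 2, 3.5)), 1)
# Base R (>= 4.1) has a native pipe |> with the same behavior here:
piped  <- c(1, 2, 3.5) |> mean() |> round(1)
nested <- round(mean(c(1, 2, 3.5)), 1)
piped == nested  # both read the same computation, inside-out vs. left-to-right
```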
001 - dplyr and Kaggle notebooks
002 - Cross validation and simulation
- Cross-validation review
- CV and interdependence
- Writing functions
- Introduction to learning via simulation
- Simulation: CV and dependence
Additional R script for simulation
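"Learning via simulation" means generating data from a known model, applying an estimator many times, and inspecting the distribution of results. A minimal Monte Carlo sketch (illustrative, not the lab's actual script):

```r
# Monte Carlo simulation: the sampling distribution of an OLS slope
set.seed(606)
one_draw <- function(n = 50, beta = 2) {
  x <- rnorm(n)
  y <- 1 + beta * x + rnorm(n)
  coef(lm(y ~ x))["x"]   # return the estimated slope
}

slopes <- replicate(1000, one_draw())
mean(slopes)  # close to the true slope, 2
sd(slopes)    # simulation estimate of the slope's standard error
```

The same recipe underlies the lab's CV-and-dependence exercise: build dependence into the simulated draws, then watch how naive cross-validation misbehaves.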
004 - Data cleaning and workflow with tidymodels
005 - Perceptrons and neural nets
Additional Data cleaning in R (with caret)
- Converting numeric variables to categorical
- Converting categorical variables to dummies
- Imputing missing values
- Standardizing variables (centering and scaling)
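caret wraps these four steps in helpers such as `preProcess()` and `dummyVars()`; the dependency-free base-R sketch below (all data invented) shows the same operations:

```r
# Base-R versions of the four cleaning steps above
df <- data.frame(
  income = c(50, 60, NA, 80, 55),
  age    = c(23, 35, 41, 29, 52),
  city   = c("A", "B", "A", "C", "B")
)

# 1. Numeric -> categorical: bin age into groups
df$age_group <- cut(df$age, breaks = c(0, 30, 45, Inf),
                    labels = c("young", "mid", "older"))

# 2. Categorical -> dummies (model.matrix drops one reference level)
dummies <- model.matrix(~ city, data = df)[, -1]

# 3. Impute missing values with the column mean
df$income[is.na(df$income)] <- mean(df$income, na.rm = TRUE)

# 4. Standardize (center and scale)
df$income_std <- as.numeric(scale(df$income))
```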
Additional resources
R
- RStudio's recommendations for learning R, plus cheatsheets, books, and tutorials
- [YaRrr! The Pirate’s Guide to R](https://bookdown.org/ndph
