Source: https://drive.google.com/file/d/1kiISfUk70piO42XPdPw_gc858DLeDh3Q/view?usp=sharing
Machine Learning
First Steps to Supervised Prediction
- Data Partitioning
- Feature Selection
- Model Selection
- Testing the Model
Data Partitioning
Full Dataset
Training and Testing Set
Training Set, Validation Set, Testing Set
The training set is the data used to build your predictive model.
The testing set is a new and independent data set used to assess whether the prediction model is generalizable.
The validation set is data from the original dataset that was held out and not used in training the model; it is helpful for fine-tuning prediction accuracy.
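As a concrete sketch of a three-way split, assuming scikit-learn and synthetic placeholder data (the 60/20/20 proportions are an illustrative choice, not a rule from these notes):

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 1000 examples, 5 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# First split off the testing set (held out until the very end)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then split the remainder into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 600 200 200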
Feature Selection
Feature selection determines which variables are most predictive and includes them in the model.
Note: whatever you do to your training data (removing features, transforming, etc.), you will eventually do to your test data as well.
Variables that can be used for accurate prediction exploit the relationship between variables, but that relationship does NOT mean that one variable causes the other.
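One common approach is univariate feature selection. A minimal sketch with scikit-learn's SelectKBest (the score function and k=10 are illustrative assumptions); note that the selector is fit on the training data only and then applied unchanged to the test data, per the note above:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Score each feature against the label and keep the 10 most predictive
selector = SelectKBest(score_func=f_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)  # fit on training data only
X_test_sel = selector.transform(X_test)                 # same transform on test data

print(X_train_sel.shape, X_test_sel.shape)  # (426, 10) (143, 10)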
Model Selection
What variable are you trying to predict?
- Regression: predicting continuous variables (e.g., age)
- Classification: predicting categorical variables (e.g., education level)
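A minimal sketch of how that choice plays out in scikit-learn (the model classes and synthetic data are illustrative assumptions; age and education level stand in for any continuous or categorical target):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Continuous target (e.g., age in years): a regression model
age = 40 + 5 * X[:, 0] + rng.normal(size=200)
reg = LinearRegression().fit(X, age)
print(reg.predict(X[:2]))  # real-valued predictions

# Categorical target (e.g., education level): a classification model
level = np.where(X[:, 1] > 0, "college", "high school")
clf = LogisticRegression(max_iter=1000).fit(X, level)
print(clf.predict(X[:2]))  # label predictions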
Supervised machine learning: Tree-based algorithms
Decision tree
A decision tree is just a function that is displayed as a tree
Decision trees are composed of the following components:
A. Internal nodes: Test one feature of the input (e.g. height)
B. Branches: Possible values of the feature being tested
C. Leaf nodes: The label assigned to an input whose features follow that root-to-leaf path
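To make "a decision tree is just a function" concrete, here is a hand-written toy tree as a plain function (the features, thresholds, and labels are all made up for illustration):

def toy_tree(height_cm, weight_kg):
    # Internal node: test one feature of the input
    if height_cm <= 170:      # branches: height <= 170 vs. height > 170
        # Internal node: test a second feature
        if weight_kg <= 60:
            return "label A"  # leaf node: label for this root-to-leaf path
        return "label B"
    return "label C"

print(toy_tree(165, 55))  # label A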
How do we grow a decision tree from data?
Before considering decision trees, let's consider decision stumps.
Stumps are trees where only one split is allowed (decision trees of depth 1).
What would be the parameters for defining a stump?
How do we decide the optimal ‘rule’ for splitting?
A rule consists of 3 things:
- A feature to test
- A threshold value for that feature, if it is continuous
- The labels at the leaf nodes after splitting
First, we have to define a ‘score’ or ‘metric’ for evaluating the split:
- We want to learn decision stumps by finding the rule with the best ‘score’
- ‘score’ defines what is considered ‘best’
How do we grow a decision stump from data?
The prediction should improve after the split
Most intuitive score: “accuracy”
- Find the rule that maximizes the number of training examples that are classified correctly
- Several issues with accuracy:
  - Often, it's difficult to find decision rules that improve accuracy
  - Decision stumps divide the input space by horizontal/vertical lines
  - It could happen that no horizontal or vertical line improves accuracy
- In practice, more complex score measures like ‘information gain’ are used
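A from-scratch sketch of learning a decision stump with the "accuracy" score described above (binary integer labels assumed; candidate thresholds are taken from the observed feature values). As the list above notes, a practical learner would swap the accuracy score for something like information gain:

import numpy as np

def fit_stump(X, y):
    # Search every rule (feature, threshold, leaf labels) and keep the one
    # that classifies the most training examples correctly
    best_rule, best_correct = None, -1
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            # Leaf labels: majority class on each side of the split
            left_label = np.bincount(y[left]).argmax() if left.any() else 0
            right_label = np.bincount(y[~left]).argmax() if (~left).any() else 0
            pred = np.where(left, left_label, right_label)
            correct = int((pred == y).sum())
            if correct > best_correct:
                best_correct = correct
                best_rule = (feature, float(threshold), int(left_label), int(right_label))
    return best_rule

# Tiny example: feature 0 separates the classes at threshold 2
X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(fit_stump(X, y))  # (0, 2.0, 0, 1): split feature 0 at 2.0, leaves labeled 0 and 1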
Summary of Decision Trees
Good:
- Very interpretable (‘Rules’)
- Very fast (learning and classification)
- No data preprocessing
Bad:
- Have to rely on a greedy strategy (not optimal)
- Often not very accurate, but we can use a simple trick
Training the Model
Several examples of a cost function
- RMSE (Root Mean Squared Error): summarizes the distance between predicted and actual values; sensitive to outliers (lower = better). Variable type: Continuous
- Accuracy: what % were predicted correctly? (higher = better). Variable type: Categorical
- Sensitivity or Recall: of those that were actually positive, what % were predicted to be positive? (higher = better). Variable type: Categorical
- Specificity: of those that were actually negative, what % were predicted to be negative? (higher = better). Variable type: Categorical
- Precision or PPV: of those predicted to be positive, what % were actually positive? (higher = better). Variable type: Categorical
- F1 Score: combines Precision and Recall (0 = poor; 1 = best). Variable type: Categorical
Type I Error (false positive) vs. Type II Error (false negative)
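A minimal sketch computing several of these measures with scikit-learn (the toy labels and values are assumptions for illustration; specificity has no dedicated scikit-learn function, so it is computed here as recall of the negative class):

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

# Toy categorical example (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy_score(y_true, y_pred))             # % predicted correctly
print(recall_score(y_true, y_pred))               # sensitivity: TP / (TP + FN)
print(recall_score(y_true, y_pred, pos_label=0))  # specificity: recall of the negative class
print(precision_score(y_true, y_pred))            # PPV: TP / (TP + FP)
print(f1_score(y_true, y_pred))                   # combines precision and recall

# Toy continuous example: RMSE
actual = np.array([3.0, 5.0, 2.5])
predicted = np.array([2.5, 5.0, 3.0])
print(np.sqrt(mean_squared_error(actual, predicted)))  # lower = better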
Confusion Matrix
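A short sketch, reusing the toy labels above (rows are true classes, columns are predicted classes in scikit-learn's convention):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = true class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3 1], [1 3]]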
Final Steps to Supervised Prediction
- Training & Testing Final Model
- Deploying Final Model