one-hot encoding: encoding where the feature vector has at most a single “1” entry (one category serves as the all-zeros baseline)

  • feature = [0, 0, 0] for “male”
  • feature = [1, 0, 0] for “female”
  • feature = [0, 1, 0] for “other”
  • feature = [0, 0, 1] for “not specified”
  • Note that to capture 4 possible categories, we only need three dimensions (a dimension for “male” would be redundant)
  • This approach can be used to capture a variety of categorical feature types, along with objects that belong to multiple categories
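The encoding above can be sketched in a few lines. This is a minimal illustration of the scheme in the bullets (the category names and their ordering are taken from the example; the function name is made up):

```python
# "male" is the all-zeros baseline, so only three dimensions are needed
CATEGORIES = ["female", "other", "not specified"]

def encode_gender(value):
    """Return a length-3 feature vector with at most a single '1' entry."""
    feature = [0, 0, 0]
    if value in CATEGORIES:
        feature[CATEGORIES.index(value)] = 1
    return feature

encode_gender("male")   # [0, 0, 0]
encode_gender("other")  # [0, 1, 0]
```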

Features can be piecewise functions, allowing us to handle complex shapes, periodicity, etc.

  • Still a form of one-hot encoding
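One way to build such a piecewise (here, piecewise-constant) feature is to bucket a periodic value and one-hot encode the bucket. A minimal sketch, assuming we want to capture a yearly seasonal effect by month (the function name and baseline choice are illustrative):

```python
def month_features(month):
    """One-hot encode a month (1-12) into 11 dimensions; January is the
    all-zeros baseline. A linear model over these features can fit an
    arbitrary piecewise-constant (periodic) effect across the year."""
    feature = [0] * 11
    if month > 1:
        feature[month - 2] = 1
    return feature

month_features(1)   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
month_features(12)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```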

Regression Diagnostics

Mean-squared error (MSE)

Why MSE (and not mean absolute error, or something else)?

The MSE is a natural choice if we assume the errors follow a Gaussian distribution centered around 0 (mostly small errors; large errors are rare). (Not important for this course.) A Q-Q plot can be used to visualize the distribution of the errors and check this assumption.
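To make the comparison with mean absolute error concrete, here is a small sketch (the data values are made up): the MSE penalizes a single large error quadratically, while the MAE weighs all errors linearly.

```python
def mse(y, y_pred):
    """Mean-squared error: average of squared residuals."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred)) / len(y)

def mae(y, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(yi - pi) for yi, pi in zip(y, y_pred)) / len(y)

y      = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]  # one large error of size 4

mse(y, y_pred)  # 4.0  (the outlier contributes 16, dominating the average)
mae(y, y_pred)  # 1.0
```

In practice, a Q-Q plot of the residuals (e.g. via `scipy.stats.probplot`) can check whether the Gaussian-error assumption is plausible.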

How low does the MSE have to be before it's "low enough"?

It depends on the data: the trivial predictor (always predicting the mean label) achieves an MSE equal to the variance of the labels, so the MSE should be judged relative to that variance

Coefficient of determination

The relevant statistics:

  • Mean: ȳ = (1/n) Σᵢ yᵢ
  • Variance: Var(y) = (1/n) Σᵢ (yᵢ − ȳ)²
  • MSE: MSE(f) = (1/n) Σᵢ (yᵢ − f(xᵢ))²

FVU = fraction of variance unexplained: FVU(f) = MSE(f) / Var(y)

  • FVU(f) = 1: trivial predictor
  • FVU(f) = 0: perfect predictor

R² = 1 − FVU(f)

  • R² = 0: trivial predictor
  • R² = 1: perfect predictor
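Putting the definitions together, a minimal sketch of R² (the data values are made up for illustration):

```python
def r_squared(y, y_pred):
    """R^2 = 1 - FVU, where FVU = MSE / Var(y)."""
    y_mean = sum(y) / len(y)
    var = sum((yi - y_mean) ** 2 for yi in y) / len(y)
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred)) / len(y)
    return 1 - mse / var

y = [1.0, 2.0, 3.0, 4.0]
r_squared(y, y)          # perfect predictor: 1.0
r_squared(y, [2.5] * 4)  # trivial predictor (predicts the mean): 0.0
```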

Can't we get an R² of 1 by throwing in a bunch of random features?

Yes (on the training data), which motivates Occam's razor: “Among competing hypotheses, the one with the fewest assumptions should be selected”
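This effect can be demonstrated directly. The sketch below (my own illustration, not from the notes) fits least squares to pure-noise labels: adding enough random features drives the training R² toward 1, even though there is nothing real to explain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
y = rng.normal(size=n)  # labels are pure noise: nothing to explain

def train_r2(num_random_features):
    """Fit least squares on an intercept plus random features;
    return R^2 measured on the training data itself."""
    X = np.hstack([np.ones((n, 1)),
                   rng.normal(size=(n, num_random_features))])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ theta
    return 1 - (residuals @ residuals) / ((y - y.mean()) @ (y - y.mean()))

# Training R^2 climbs toward 1 as we add meaningless features:
[train_r2(k) for k in (0, 10, 49)]
```

With 49 random features plus an intercept the design matrix is square and (almost surely) full rank, so the fit is exact and training R² reaches 1 despite the model being useless for prediction.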