one-hot encoding: encoding where the feature vector has at most a single "1" entry (one category is represented by the all-zeros vector)
- feature = [0, 0, 0] for “male”
- feature = [1, 0, 0] for “female”
- feature = [0, 1, 0] for “other”
- feature = [0, 0, 1] for “not specified”
- Note that to capture four possible categories we need only three dimensions (a separate dimension for "male" would be redundant)
- This approach can be used to capture a variety of categorical feature types, along with objects that belong to multiple categories
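A minimal sketch of this encoding, using the four categories from the example above (the function name and category list are illustrative):

```python
# Encode a categorical feature with K categories using K-1 dimensions.
# The first category maps to the all-zeros vector, so no dimension is redundant.
categories = ["male", "female", "other", "not specified"]

def encode(value, categories):
    """Return a (K-1)-dimensional one-hot vector for `value`."""
    vec = [0] * (len(categories) - 1)
    idx = categories.index(value)  # raises ValueError for unseen categories
    if idx > 0:
        vec[idx - 1] = 1
    return vec

print(encode("male", categories))           # [0, 0, 0]
print(encode("not specified", categories))  # [0, 0, 1]
```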
Features can be piecewise functions, allowing us to handle complex shapes, periodicity, etc.
- Still a form of one-hot encoding
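One way to sketch a piecewise feature is to bucketize a continuous value and emit one indicator per bucket (the boundaries below are hypothetical); since exactly one indicator is active, this is still a one-hot encoding:

```python
def bucket_feature(x, boundaries):
    """One-hot indicators for which interval x falls in.
    len(boundaries)+1 buckets: (-inf, b0), [b0, b1), ..., [b_last, inf)."""
    feats = [0] * (len(boundaries) + 1)
    i = 0
    while i < len(boundaries) and x >= boundaries[i]:
        i += 1
    feats[i] = 1  # exactly one entry is set
    return feats

print(bucket_feature(5.5, [2, 4, 6]))  # [0, 0, 1, 0]
```

A linear model over such features fits a step function, which is how piecewise shapes and periodic effects (e.g. one bucket per month) can be captured.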
Regression Diagnostics
Mean-squared error (MSE)
Why MSE (and not mean absolute error, or something else)?
Minimizing the MSE corresponds to assuming the errors follow a Gaussian distribution (centered around 0: mostly small errors, with large errors being rare). A Q-Q plot can be used to visualize the distribution of the errors and check this assumption.
How low does the MSE have to be before it's "low enough"?
It depends. The MSE should be judged relative to the variance of the labels: the trivial predictor (always output the mean) has MSE equal to the variance.
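A small sketch of this point, with made-up labels and predictions: the trivial predictor's MSE equals the variance, so a fitted model's MSE is meaningful only in comparison to it.

```python
# Hypothetical labels and predictions from some fitted model.
y = [2.0, 4.0, 6.0, 8.0]
preds = [2.5, 3.5, 6.5, 7.5]

n = len(y)
mean_y = sum(y) / n

# MSE of the trivial predictor f(x) = mean(y) is exactly Var(y).
var_y = sum((yi - mean_y) ** 2 for yi in y) / n
mse_trivial = sum((yi - mean_y) ** 2 for yi in y) / n

# MSE of the fitted model, hopefully much smaller than Var(y).
mse_model = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n

print(var_y, mse_trivial, mse_model)  # 5.0 5.0 0.25
```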
Coefficient of determination
The R^2 statistic:
- Mean: mean(y) = (1/N) * sum_i y_i
- Variance: Var(y) = (1/N) * sum_i (y_i - mean(y))^2
- MSE: MSE(f) = (1/N) * sum_i (y_i - f(x_i))^2
- FVU = fraction of variance unexplained: FVU(f) = MSE(f) / Var(y)
  - FVU(f) = 1: trivial predictor
  - FVU(f) = 0: perfect predictor
- R^2 = 1 - FVU(f) = 1 - MSE(f) / Var(y)
  - R^2 = 0: trivial predictor
  - R^2 = 1: perfect predictor
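Putting the definitions together, here is a minimal sketch of computing FVU and R^2 (the labels and predictions are made up for illustration):

```python
# Hypothetical labels and model predictions.
y = [2.0, 4.0, 6.0, 8.0]
preds = [2.5, 3.5, 6.5, 7.5]

n = len(y)
mean_y = sum(y) / n
var_y = sum((yi - mean_y) ** 2 for yi in y) / n           # variance of the labels
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n # MSE of the predictor

fvu = mse / var_y  # fraction of variance unexplained
r2 = 1 - fvu       # coefficient of determination

print(fvu, r2)  # 0.05 0.95
```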
Can't we get an R^2 of 1 by throwing in a bunch of random features?
Yes, on the training data. But recall Occam's razor: "Among competing hypotheses, the one with the fewest assumptions should be selected"