More Simple Linear Regression
Correlation
Quantifying patterns in scatter plots
- Correlation coefficient, $r$
- A measure of the strength of the linear association of two variables, $x$ and $y$.
- Intuitively, it measures how tightly clustered a scatter plot is around a straight line.
- Ranges between -1 and 1.
- $r$ negative: negative association. $r$ positive: positive association.
The correlation coefficient
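- The correlation coefficient, $r$, is defined as the average of the product of $x$ and $y$, when both are in standard units.
- Let $\sigma_x$ be the standard deviation of the $x_i$s, and $\bar{x}$ be the mean of the $x_i$s. $x_i$ in standard units is $\frac{x_i - \bar{x}}{\sigma_x}$.
- The correlation coefficient, then, is:

$$r = \frac{1}{n} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\sigma_x} \right) \left( \frac{y_i - \bar{y}}{\sigma_y} \right)$$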
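The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `standard_units` and `correlation` and the data values are made up for the example, and the standard deviation is assumed to be the population version (dividing by $n$).

```python
import numpy as np

def standard_units(values):
    """Convert values to standard units: (value - mean) / SD."""
    values = np.asarray(values, dtype=float)
    # np.std divides by n by default (population SD), the assumption made here.
    return (values - values.mean()) / values.std()

def correlation(x, y):
    """Average of the product of x and y, both in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = correlation(x, y)
print(r)                        # close to 1: strong positive association
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in value, for comparison
```

Flipping the sign of one variable, as in `correlation(x, -y)`, flips the sign of the result, which matches the negative/positive association reading.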
Why multiply the product of SUs?
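- Top right: $x_i$ in standard units is positive and $y_i$ in standard units is positive, so their product is positive.
- Bottom left: $x_i$ in standard units is negative and $y_i$ in standard units is negative, so their product is also positive.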
Another way to express $w_1^*$
- It turns out that $w_1^*$, the optimal slope for the linear hypothesis function when using squared loss (i.e. the regression line), can be written in terms of $r$:

$$w_1^* = r \frac{\sigma_y}{\sigma_x}$$
TODO
Proof that $w_1^* = r \frac{\sigma_y}{\sigma_x}$
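One way the argument can go is sketched below. It assumes two facts not restated in this section: the least squares slope can be written as $w_1^* = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$, and $\sigma_x^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$. Starting from the right-hand side and substituting the definition of $r$:

```latex
\begin{aligned}
r \frac{\sigma_y}{\sigma_x}
&= \left[ \frac{1}{n} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\sigma_x} \right) \left( \frac{y_i - \bar{y}}{\sigma_y} \right) \right] \frac{\sigma_y}{\sigma_x} \\
&= \frac{1}{n \sigma_x^2} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \\
&= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\
&= w_1^*
\end{aligned}
```

The second line cancels $\sigma_y$ and collects the two factors of $\sigma_x$. The third line uses $n \sigma_x^2 = \sum_{i=1}^n (x_i - \bar{x})^2$.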
Interpreting the formulas
Interpreting the slope
- The units of the slope are units of $y$ per units of $x$.
- In our commute times example, $w_1^* = -8.19$, so our predicted commute time decreases by 8.19 minutes per hour.
- Since $\sigma_x \geq 0$ and $\sigma_y \geq 0$, the slope’s sign is $r$’s sign.
- As the $y$ values get more spread out, $\sigma_y$ increases, so the slope gets steeper.
- As the $x$ values get more spread out, $\sigma_x$ increases, so the slope gets shallower.
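These interpretations can be checked numerically. The sketch below assumes the slope is written as $r \frac{\sigma_y}{\sigma_x}$ and compares it with the slope returned by `np.polyfit`. The data and the helper `slope_from_r` are hypothetical, chosen only to show a negative association; they are not the commute times data.

```python
import numpy as np

def slope_from_r(x, y):
    """Slope computed as r * (SD of y) / (SD of x)."""
    r = np.corrcoef(x, y)[0, 1]
    return r * np.std(y) / np.std(x)

# Hypothetical data with a negative association.
x = np.array([6.0, 7.0, 8.0, 9.0, 10.0, 11.0])
y = np.array([100.0, 92.0, 81.0, 77.0, 64.0, 60.0])

base = slope_from_r(x, y)
least_squares = np.polyfit(x, y, 1)[0]  # slope of the degree-1 least squares fit

# Spreading the y values out around their mean doubles the SD of y: steeper slope.
y_spread = y.mean() + 2 * (y - y.mean())
steeper = slope_from_r(x, y_spread)

# Spreading the x values out around their mean doubles the SD of x: shallower slope.
x_spread = x.mean() + 2 * (x - x.mean())
shallower = slope_from_r(x_spread, y)

print(base, least_squares)  # the two slopes agree, and both are negative
print(steeper, shallower)   # twice the base slope, and half the base slope
```

Rescaling either variable by a positive factor leaves $r$ unchanged, so only the ratio of standard deviations moves the slope in this example.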
Interpreting the intercept
- What are the units of the intercept?
- Units of $y$: minutes.
- What is the value of $H^*(\bar{x})$, the regression line’s prediction at $x = \bar{x}$?
Correlation and mean squared error
- Claim: Suppose that $w_0^*$ and $w_1^*$ are the optimal intercept and slope for the regression line. Then,

$$\frac{1}{n} \sum_{i=1}^n \left( y_i - (w_0^* + w_1^* x_i) \right)^2 = \sigma_y^2 (1 - r^2)$$

- That is, the mean squared error of the regression line’s predictions and the correlation coefficient, $r$, always satisfy the relationship above.
- Even if it’s true, why do we care?
- In machine learning, we often use both the mean squared error and $r^2$ to compare the performances of different models.
- If we can prove the above statement, we can show that finding models that minimize mean squared error is equivalent to finding models that maximize $r^2$.
- Important for DSC 80.
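Before proving the claim, it can be sanity-checked numerically. The sketch below assumes the claim takes the form "mean squared error of the regression line equals $\sigma_y^2 (1 - r^2)$", with $\sigma_y^2$ the population variance of the $y$ values. The data are hypothetical, and the fit uses `np.polyfit` rather than any course-specific helper.

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([3.0, 4.5, 4.0, 7.5, 8.0, 11.0, 10.5])

# Least squares line: polyfit returns [slope, intercept] for degree 1.
w1, w0 = np.polyfit(x, y, 1)

# Mean squared error of the regression line's predictions.
mse = np.mean((y - (w0 + w1 * x)) ** 2)

# Right-hand side of the claim, using the population variance (np.var divides by n).
r = np.corrcoef(x, y)[0, 1]
claimed = np.var(y) * (1 - r ** 2)

print(mse, claimed)  # the two values agree up to floating point error
```

Since $\sigma_y^2$ is fixed by the data, a larger $r^2$ corresponds to a smaller mean squared error for the regression line, which is the equivalence described above.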
Connections to related models
Exercise
Suppose we chose the model
What is the optimal model parameter,
TODO
Comparing mean squared errors
TODO