More Simple Linear Regression

Correlation

Quantifying patterns in scatter plots

  • Correlation coefficient
  • A measure of the strength of the linear association of two variables, $x$ and $y$.
  • Intuitively, it measures how tightly clustered a scatter plot is around a straight line.
  • Ranges between -1 and 1.
  • Negative $r$: negative association.
  • Positive $r$: positive association.

The correlation coefficient

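  • The correlation coefficient, $r$, is defined as the average of the product of $x$ and $y$, when both are in standard units.
  • Let $\sigma_x$ be the standard deviation of the $x_i$s, and $\bar{x}$ be the mean of the $x_i$s.
  • $x_i$ in standard units is $\frac{x_i - \bar{x}}{\sigma_x}$.
  • The correlation coefficient, then, is: $r = \frac{1}{n} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\sigma_x} \right) \left( \frac{y_i - \bar{y}}{\sigma_y} \right)$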
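
As a rough sketch of the definition above, the following computes $r$ as the mean of the product of $x$ and $y$ in standard units. The data and the helper names `standard_units` and `correlation` are made up for illustration and are not from the lecture.

```python
import numpy as np

def standard_units(values):
    """Convert an array to standard units: (value - mean) / SD."""
    values = np.asarray(values, dtype=float)
    # np.std divides by n by default (population SD), which matches the definition used here.
    return (values - values.mean()) / values.std()

def correlation(x, y):
    """Correlation coefficient: mean of the product of x and y in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# Made-up illustrative data (an assumption, not the lecture's dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = correlation(x, y)
print(r)

# Cross-check against NumPy's built-in Pearson correlation.
print(np.corrcoef(x, y)[0, 1])
```

Both printed values should agree up to floating-point error, since `np.corrcoef` computes the same quantity.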

Why multiply the product of SUs?

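Top right: $x_i$ and $y_i$ in standard units are both positive, so their product is positive.
Bottom left: $x_i$ and $y_i$ in standard units are both negative, so their product is again positive.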

Another way to express $w_1^*$

  • It turns out that $w_1^*$, the optimal slope for the linear hypothesis function when using squared loss (i.e. the regression line), can be written in terms of $r$: $w_1^* = r \frac{\sigma_y}{\sigma_x}$

    TODO
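
The relationship between the optimal slope and the correlation coefficient, slope $= r \cdot \sigma_y / \sigma_x$, can be checked numerically. The sketch below uses made-up data and compares that expression with the slope returned by `np.polyfit`, which fits a least-squares line; it is a sanity check under those assumptions, not a proof.

```python
import numpy as np

# Made-up illustrative data (an assumption, not the lecture's dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Correlation coefficient and population SDs (np.std divides by n by default).
r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std() / x.std()

# Degree-1 least-squares fit; polyfit returns coefficients from highest degree down.
slope_fit, intercept_fit = np.polyfit(x, y, 1)

print(slope_from_r, slope_fit)
```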

Proof that $w_1^* = r \frac{\sigma_y}{\sigma_x}$
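
One possible derivation of the identity $w_1^* = r \frac{\sigma_y}{\sigma_x}$ is sketched below. It assumes the standard least-squares slope formula $w_1^* = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$ as its starting point (that formula is not restated in this section), and it uses the population standard deviation, which divides by $n$.

```latex
\begin{align*}
r &= \frac{1}{n} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\sigma_x} \right) \left( \frac{y_i - \bar{y}}{\sigma_y} \right)
  &&\implies \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = n \, r \, \sigma_x \sigma_y \\
\sigma_x^2 &= \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2
  &&\implies \sum_{i=1}^n (x_i - \bar{x})^2 = n \, \sigma_x^2 \\
w_1^* &= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
  = \frac{n \, r \, \sigma_x \sigma_y}{n \, \sigma_x^2}
  = r \frac{\sigma_y}{\sigma_x}
\end{align*}
```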







Interpreting the formulas

Interpreting the slope

  • The units of the slope are units of $y$ per units of $x$.
  • In our commute times example, the slope is $w_1^* = -8.19$, so our predicted commute time decreases by 8.19 minutes per hour.
  • Since $\sigma_x \geq 0$ and $\sigma_y \geq 0$, the slope’s sign is $r$’s sign.
  • As the $y$ values get more spread out, $\sigma_y$ increases, so the slope gets steeper.
  • As the $x$ values get more spread out, $\sigma_x$ increases, so the slope gets shallower.
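
To illustrate how spread affects the slope, the sketch below stretches made-up data by a factor of 3 in each direction. The helper `best_slope` is a hypothetical name and computes the least-squares slope as $r \cdot \sigma_y / \sigma_x$; multiplying a variable by a positive constant leaves $r$ unchanged and only scales the corresponding standard deviation.

```python
import numpy as np

def best_slope(x, y):
    """Least-squares slope, computed as r * SD(y) / SD(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return r * np.std(y) / np.std(x)

# Made-up illustrative data (an assumption, not the lecture's dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

base = best_slope(x, y)           # original slope
steeper = best_slope(x, 3 * y)    # y values three times as spread out
shallower = best_slope(3 * x, y)  # x values three times as spread out

print(base, steeper, shallower)
```

In this example the slope triples when the $y$ values are stretched and shrinks to a third when the $x$ values are stretched.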

Interpreting the intercept

  • What are the units of the intercept?
  • Units of $y$: minutes.
  • What is the value of $H^*(\bar{x})$, the regression line’s prediction at the mean of the $x$ values?
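
One fact that helps when interpreting the intercept: a least-squares line always passes through the point $(\bar{x}, \bar{y})$, because the optimal intercept equals $\bar{y}$ minus the optimal slope times $\bar{x}$. The sketch below checks this on made-up data using `np.polyfit`; the variable names are illustrative.

```python
import numpy as np

# Made-up illustrative data (an assumption, not the lecture's dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# The intercept can also be written as mean(y) - slope * mean(x).
intercept_from_means = y.mean() - slope * x.mean()

# Prediction at the mean of x, which should equal the mean of y.
prediction_at_mean = intercept + slope * x.mean()

print(intercept, intercept_from_means)
print(prediction_at_mean, y.mean())
```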



Correlation and mean squared error

  • Claim: Suppose that $w_0^*$ and $w_1^*$ are the optimal intercept and slope for the regression line. Then, $\frac{1}{n} \sum_{i=1}^n \left( y_i - (w_0^* + w_1^* x_i) \right)^2 = \sigma_y^2 (1 - r^2)$
  • That is, the mean squared error of the regression line’s predictions and the correlation coefficient, $r$, always satisfy the relationship above.
  • Even if it’s true, why do we care?
  • In machine learning, we often use both the mean squared error and $r^2$ to compare the performances of different models.
  • If we can prove the above statement, we can show that finding models that minimize mean squared error is equivalent to finding models that maximize $r^2$.
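
The identity relating the regression line's mean squared error to the correlation coefficient, MSE $= \sigma_y^2 (1 - r^2)$, can be sanity-checked numerically. The sketch below does so on made-up data, using `np.polyfit` for the least-squares line; it illustrates the claim but does not prove it.

```python
import numpy as np

# Made-up illustrative data (an assumption, not the lecture's dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit the least-squares line and compute its mean squared error directly.
slope, intercept = np.polyfit(x, y, 1)
predictions = intercept + slope * x
mse = np.mean((y - predictions) ** 2)

# Right-hand side of the claim: variance of y (dividing by n) times (1 - r^2).
r = np.corrcoef(x, y)[0, 1]
mse_from_r = np.var(y) * (1 - r ** 2)

print(mse, mse_from_r)
```

Because $\sigma_y^2$ is fixed by the data, a larger $r^2$ corresponds to a smaller mean squared error in this relationship.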

Important for DSC 80


Exercise

Suppose we chose the model $H(x) = w_1 x$ and squared loss.
What is the optimal model parameter, $w_1^*$?


TODO

Comparing mean squared errors

TODO