Source: https://dsc40a.com/resources/lectures/lec04/lec04-filled.pdf

Simple Linear Regression


Recap - Center and Spread

The relationship between $h^*$ and $R(h^*)$

  • Recall, for a general loss function $L(y_i, h)$ and the constant model $H(x) = h$, empirical risk is of the form: $R(h) = \frac{1}{n} \sum_{i=1}^n L(y_i, h)$.
  • $h^*$, the value of $h$ that minimizes empirical risk, represents the center of the dataset in some way.
  • $R(h^*)$, the smallest possible value of empirical risk, represents the spread of the dataset in some way.
  • The specific center and spread depend on the choice of loss function, as the sketch below illustrates.
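To make this concrete, here is a minimal Python sketch of computing empirical risk for a constant prediction; the dataset and helper names are made up for illustration, not from the lecture.

```python
import numpy as np

def empirical_risk(y, h, loss):
    """R(h): average loss of the constant prediction h over the dataset."""
    return np.mean([loss(yi, h) for yi in y])

y = [1, 2, 2, 5]  # hypothetical dataset

squared = lambda yi, h: (yi - h) ** 2
absolute = lambda yi, h: abs(yi - h)

print(empirical_risk(y, 2, squared))   # R(2) = 2.5 under squared loss
print(empirical_risk(y, 2, absolute))  # R(2) = 1.0 under absolute loss
```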

Examples

When using squared loss:

  • $h^* = \text{Mean}(y_1, y_2, \dots, y_n)$.
  • $R(h^*) = \frac{1}{n} \sum_{i=1}^n \left(y_i - \text{Mean}(y_1, \dots, y_n)\right)^2$, the variance of the dataset.

When using absolute loss:

  • $h^* = \text{Median}(y_1, y_2, \dots, y_n)$.
  • $R(h^*) = \frac{1}{n} \sum_{i=1}^n \left|y_i - \text{Median}(y_1, \dots, y_n)\right|$, the mean absolute deviation from the median.
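A quick numerical check of both facts, again with a made-up dataset: sweeping over candidate values of $h$, the squared-loss risk is minimized at the mean (with minimum value equal to the variance), and the absolute-loss risk is minimized at the median.

```python
import numpy as np

y = np.array([1, 2, 2, 5])          # hypothetical dataset
hs = np.linspace(0, 6, 6001)        # candidate constant predictions h

mse = [np.mean((y - h) ** 2) for h in hs]
mae = [np.mean(np.abs(y - h)) for h in hs]

print(hs[np.argmin(mse)], np.mean(y))    # 2.5 2.5  -> the mean
print(hs[np.argmin(mae)], np.median(y))  # 2.0 2.0  -> the median
print(min(mse), np.var(y))               # 2.25 2.25 -> the variance
```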

0-1 Loss

  • The empirical risk for the 0-1 loss is: $R(h) = \frac{1}{n} \sum_{i=1}^n L(y_i, h)$, where $L(y_i, h) = 0$ if $y_i = h$ and $1$ otherwise.
  • This is the proportion (between 0 and 1) of data points not equal to $h$.
  • $R(h)$ is minimized when $h^* = \text{Mode}(y_1, y_2, \dots, y_n)$.
  • Therefore, $R(h^*)$ is the proportion of data points not equal to the mode.
  • Example: What’s the proportion of values not equal to the mode in the dataset? (See the sketch below.)
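A sketch of that computation in Python, on a hypothetical dataset (not necessarily the one from the slide):

```python
from collections import Counter

y = [2, 3, 3, 4, 7]                     # hypothetical dataset
mode, count = Counter(y).most_common(1)[0]

# R(h*) under 0-1 loss: the proportion of points not equal to the mode
print(mode, (len(y) - count) / len(y))  # 3 0.6
```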

A poor way to measure spread

  • The minimum value of $R(h)$ under 0-1 loss, $R(h^*)$, is the proportion of data points not equal to the mode.
  • A higher value means less of the data is clustered at the mode.
  • Just as the mode is a very basic way of measuring the center of the data, $R(h^*)$ for 0-1 loss is a very basic and uninformative way of measuring spread.

Summary of center and spread

  • Different loss functions lead to different empirical risk functions $R(h)$, which are minimized at various measures of center, $h^*$.
  • The minimum values of empirical risk, $R(h^*)$, are various measures of spread.
  • Larger values of $R(h^*)$ mean the data is more spread out.
  • There are many different ways to measure both center and spread; these are sometimes called descriptive statistics.

Simple linear regression

What’s next?

  • In Lecture 1, we introduced the idea of a hypothesis function, $H(x)$.
  • We’ve focused on finding the best constant model, $H(x) = h$.
  • Now that we understand the modeling recipe, we can apply it to find the best simple linear regression model, $H(x) = w_0 + w_1 x$.
  • This will allow us to make predictions that aren’t the same for every data point.

Recap: Hypothesis functions and parameters

A hypothesis function, $H(x)$, takes in an $x$ as input and returns a predicted $y$.
Parameters define the relationship between the input and output of a hypothesis function.

The simple linear regression model, $H(x) = w_0 + w_1 x$, has two parameters: the intercept $w_0$ and the slope $w_1$.
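In code, a hypothesis function is just a function of $x$ whose output is controlled by the parameters; a minimal sketch:

```python
def H(x, w0, w1):
    """Simple linear regression hypothesis: predicted y for input x."""
    return w0 + w1 * x

# Same input, different parameters -> different predictions.
print(H(3, w0=1.0, w1=2.0))   # 7.0
print(H(3, w0=0.0, w1=-1.0))  # -3.0
```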

The modeling recipe

  1. Choose a model.
    Before: $H(x) = h$
    Now: $H(x) = w_0 + w_1 x$
  2. Choose a loss function.

  3. Minimize average loss to find optimal model parameters.

Minimizing mean squared error for the simple linear model

  • We’ll choose squared loss, since it’s the easiest to minimize.
  • Our goal, then, is to find the linear hypothesis function that minimizes empirical risk: $R(H) = \frac{1}{n} \sum_{i=1}^n (y_i - H(x_i))^2$.
  • Since linear hypothesis functions are of the form $H(x) = w_0 + w_1 x$, we can rewrite $R$ as a function of $w_0$ and $w_1$: $R(w_0, w_1) = \frac{1}{n} \sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right)^2$.
  • How do we find the parameters $w_0^*$ and $w_1^*$ that minimize $R(w_0, w_1)$? (A code version of $R$ is sketched below.)
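As a sketch, $R(w_0, w_1)$ translates directly into code; the variable names and data are my own, for illustration:

```python
import numpy as np

def mse(w0, w1, x, y):
    """R(w0, w1): mean squared error of H(x) = w0 + w1*x on the data."""
    return np.mean((y - (w0 + w1 * x)) ** 2)

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 5, 8])   # hypothetical data
print(mse(0.0, 2.0, x, y))   # 0.25
```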

Loss surface

For the constant model, the graph of $R(h)$ looked like a parabola.
What does the graph of $R(w_0, w_1)$ look like for the simple linear regression model? It’s a bowl-shaped “loss surface” in three dimensions.
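One way to see the bowl shape is to evaluate $R(w_0, w_1)$ over a grid of parameter values and plot it; a matplotlib sketch with hypothetical data:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 5, 8])   # hypothetical data

# Evaluate R(w0, w1) on a grid of parameter values.
w0s, w1s = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-2, 5, 100))
R = np.mean([(yi - (w0s + w1s * xi)) ** 2 for xi, yi in zip(x, y)], axis=0)

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(w0s, w1s, R, cmap="viridis")
ax.set_xlabel("$w_0$"); ax.set_ylabel("$w_1$"); ax.set_zlabel("$R(w_0, w_1)$")
plt.show()
```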


Minimizing mean squared error for the simple linear model

Minimizing multivariate functions

  • Our goal is to find the parameters $w_0$ and $w_1$ that minimize mean squared error: $R(w_0, w_1) = \frac{1}{n} \sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right)^2$.
  • $R$ is a function of two variables: $w_0$ and $w_1$.
  • To minimize a function of multiple variables:
    • Take partial derivatives with respect to each variable.
    • Set all partial derivatives to 0.
    • Solve the resulting system of equations.
    • Ensure that you’ve found a minimum, rather than a maximum or saddle point (using the second derivative test for multivariate functions).

Example. Find the point at which the following function is minimized.



Minimized at
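To illustrate the procedure on a made-up function (not the one from the lecture), here is a sympy sketch: take both partials, solve the system, and confirm a minimum with the second derivative test.

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + y**2 - 2*x + 4*y   # made-up example function

# Set both partial derivatives to 0 and solve the resulting system.
critical = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
print(critical)   # {x: 1, y: -2}

# Second derivative test: a positive definite Hessian means a minimum.
print(sp.hessian(f, (x, y)).is_positive_definite)   # True
```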

Minimizing mean squared error


To find the $w_0^*$ and $w_1^*$ that minimize $R(w_0, w_1)$, we’ll:

  1. Find $\frac{\partial R}{\partial w_0}$ and set it equal to 0.
  2. Find $\frac{\partial R}{\partial w_1}$ and set it equal to 0.
  3. Solve the resulting system of equations.

The partial derivatives of $R(w_0, w_1) = \frac{1}{n} \sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right)^2$ are:

$\frac{\partial R}{\partial w_0} = -\frac{2}{n} \sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right)$

$\frac{\partial R}{\partial w_1} = -\frac{2}{n} \sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right) x_i$




Strategy

We have a system of two equations and two unknowns ($w_0$ and $w_1$):

$\sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right) = 0 \qquad \sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right) x_i = 0$

To proceed, we’ll:

  1. Solve for $w_0$ in the first equation.
    The result becomes $w_0^*$, because it’s the “best intercept.”
  2. Plug $w_0^*$ into the second equation and solve for $w_1$.
    The result becomes $w_1^*$, because it’s the “best slope.”
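Before doing the algebra, a sanity check: on any concrete dataset this system is linear in $(w_0, w_1)$, so it can be solved numerically. A sketch with hypothetical data:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 5, 8])   # hypothetical data

# Expanding the sums, the system is linear in (w0, w1):
#   n * w0         + (sum of x)   * w1 = sum of y
#   (sum of x) * w0 + (sum of x^2) * w1 = sum of x*y
A = np.array([[len(x), x.sum()], [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
print(np.linalg.solve(A, b))   # [0.  1.9]  -> w0* = 0, w1* = 1.9
```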

Solving for $w_0^*$

Setting the first equation equal to 0 and solving:

$\sum_{i=1}^n \left(y_i - (w_0 + w_1 x_i)\right) = 0$

$\sum_{i=1}^n y_i - n w_0 - w_1 \sum_{i=1}^n x_i = 0$

$w_0 = \frac{1}{n} \sum_{i=1}^n y_i - w_1 \cdot \frac{1}{n} \sum_{i=1}^n x_i$

$w_0^* = \bar{y} - w_1^* \bar{x}, \quad \text{where } \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \text{ and } \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$

Solving for $w_1^*$

Goal: isolate $w_1$ in the second equation:

$\sum_{i=1}^n \left(y_i - w_0 - w_1 x_i\right) x_i = 0$

Use $w_0^* = \bar{y} - w_1 \bar{x}$:

$\sum_{i=1}^n \left(y_i - \bar{y} + w_1 \bar{x} - w_1 x_i\right) x_i = 0$

$\sum_{i=1}^n (y_i - \bar{y}) x_i - w_1 \sum_{i=1}^n (x_i - \bar{x}) x_i = 0$

$w_1^* = \frac{\sum_{i=1}^n (y_i - \bar{y}) x_i}{\sum_{i=1}^n (x_i - \bar{x}) x_i}$

An equivalent formula for $w_1^*$

Claim: $w_1^* = \dfrac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$

Key idea: $\sum_{i=1}^n (x_i - \bar{x}) = 0$, and similarly $\sum_{i=1}^n (y_i - \bar{y}) = 0$.

Proof:

For the numerator: $\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^n (y_i - \bar{y}) = \sum_{i=1}^n (y_i - \bar{y}) x_i$, which matches the numerator from the previous slide.

The denominator follows a similar argument: $\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \bar{x}) x_i - \bar{x} \sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x}) x_i$.
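A quick numerical sanity check of the claim, on hypothetical data:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 5, 8])   # hypothetical data
xbar, ybar = x.mean(), y.mean()

w1_derived = ((y - ybar) * x).sum() / ((x - xbar) * x).sum()           # derived form
w1_claimed = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum() # claimed form
print(w1_derived, w1_claimed, np.isclose(w1_derived, w1_claimed))      # 1.9 1.9 True
```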

Least squares solutions

  • The least squares solutions for the intercept and slope are: $w_0^* = \bar{y} - w_1^* \bar{x}$ and $w_1^* = \dfrac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$.
  • We say $w_0^*$ and $w_1^*$ are optimal parameters, and the resulting line is called the regression line.
  • The process of minimizing empirical risk to find optimal parameters is also called “fitting to the data.”
  • To make predictions about the future, we use $H^*(x) = w_0^* + w_1^* x$.
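Putting the least squares formulas together, here is a sketch of fitting and predicting, cross-checked against numpy’s built-in degree-1 polynomial fit (the data is hypothetical):

```python
import numpy as np

def fit_regression_line(x, y):
    """Return the least squares intercept w0* and slope w1*."""
    xbar, ybar = x.mean(), y.mean()
    w1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
    w0 = ybar - w1 * xbar
    return w0, w1

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 5, 8])       # hypothetical data

w0, w1 = fit_regression_line(x, y)
print(w0, w1)                    # 0.0 1.9
print(np.polyfit(x, y, 1))       # [1.9 0. ]  (slope first) -- agrees
print(w0 + w1 * 5)               # H*(5) = 9.5, a prediction for x = 5
```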

Causality

Can we conclude that leaving later causes you to get to school earlier?
No! This is just an observed pattern; an association alone doesn’t establish causation.