Source: https://dsc40a.com/resources/lectures/lec07/lec07-filled.pdf

Orthogonal Projections

Spans and Projections

Projecting onto a single vector

  • Let $\vec{x}$ and $\vec{y}$ be two vectors in $\mathbb{R}^n$.
  • The span of $\vec{x}$ is the set of all vectors of the form:
    $$w \vec{x}$$
    where $w$ is a scalar.
  • Question: What vector in $\text{span}(\vec{x})$ is closest to $\vec{y}$?
  • The vector in $\text{span}(\vec{x})$ that is closest to $\vec{y}$ is the orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$.

Projection error

  • Let $\vec{e} = \vec{y} - w\vec{x}$ be the projection error: that is, the vector that connects $w\vec{x}$ to $\vec{y}$.
  • Goal: Find the $w$ that makes $\vec{e}$ as short as possible.
  • That is, minimize:
    $$\|\vec{e}\| = \|\vec{y} - w\vec{x}\|$$
  • Equivalently, minimize:
    $$\|\vec{e}\|^2 = \|\vec{y} - w\vec{x}\|^2$$
  • Idea: To make $\vec{e}$ as short as possible, it should be orthogonal to $\vec{x}$.

Minimizing projection error

  • Goal: Find the $w$ that makes $\vec{e} = \vec{y} - w\vec{x}$ as short as possible.
  • Idea: To make $\vec{e}$ as short as possible, it should be orthogonal to $\vec{x}$.
  • Can we prove that making $\vec{e}$ orthogonal to $\vec{x}$ minimizes $\|\vec{e}\|$?
  • Pythagorean theorem: if $\vec{a}$ and $\vec{b}$ are orthogonal, then
    $$\|\vec{a} + \vec{b}\|^2 = \|\vec{a}\|^2 + \|\vec{b}\|^2.$$
  • Now we know that to minimize $\|\vec{e}\|$, $\vec{e}$ must be orthogonal to $\vec{x}$.
  • Given this fact, how can we solve for $w$?
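    A sketch of the full argument, in the notation above (with $w^*$ denoting the minimizing scalar and $\vec{e} = \vec{y} - w^*\vec{x}$): for any other scalar $w'$,
    $$\vec{y} - w'\vec{x} = (\vec{y} - w^*\vec{x}) + (w^* - w')\vec{x} = \vec{e} + (w^* - w')\vec{x},$$
    and if $\vec{e}$ is orthogonal to $\vec{x}$, the Pythagorean theorem gives
    $$\|\vec{y} - w'\vec{x}\|^2 = \|\vec{e}\|^2 + (w^* - w')^2\|\vec{x}\|^2 \geq \|\vec{e}\|^2,$$
    so no other choice of scalar produces a shorter error. Solving the orthogonality condition for $w^*$:
    $$\vec{x} \cdot (\vec{y} - w^*\vec{x}) = 0 \quad\Longrightarrow\quad w^* = \frac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}.$$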


Orthogonal projection

  • Question: What vector in $\text{span}(\vec{x})$ is closest to $\vec{y}$?
  • Answer: It is the vector $w^*\vec{x}$, where:
    $$w^* = \frac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$$
  • Note that $w^*$ is the solution to a minimization problem, specifically, this one:
    $$w^* = \arg\min_{w} \|\vec{y} - w\vec{x}\|^2$$
  • We call $w^*\vec{x}$ the orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$.
  • Think of $w^*\vec{x}$ as the “shadow” of $\vec{y}$ (a small numerical example follows below).
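A minimal NumPy sketch of this computation (the vectors are hypothetical, not from the slides):

```python
# Orthogonal projection of y onto span(x), using w* = (x · y) / (x · x).
import numpy as np

x = np.array([2.0, 1.0])
y = np.array([3.0, 4.0])

w_star = np.dot(x, y) / np.dot(x, x)   # the minimizing scalar
proj = w_star * x                      # the orthogonal projection of y onto span(x)
error = y - proj                       # the projection error e

print(w_star)               # 2.0
print(proj)                 # [4. 2.]
print(np.dot(x, error))     # 0.0: e is orthogonal to x
```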

Exercise

Let $\vec{x}$ and $\vec{y}$ be two given vectors.
What is the orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$?
Your answer should be of the form $w^*\vec{x}$, where $w^*$ is a scalar.

The orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$ is $w^*\vec{x}$, where $w^* = \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$.

Moving to multiple dimensions

  • Let’s now consider three vectors, $\vec{x}^{(1)}$, $\vec{x}^{(2)}$, and $\vec{y}$, all in $\mathbb{R}^n$.
  • Question: What vector in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
  • Vectors in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ are of the form $w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}$, where $w_1$ and $w_2$ are scalars.

Minimizing projection error in multiple dimensions

  • Question: What vector in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
  • That is, what vector minimizes $\|\vec{e}\|$, where:
    $$\vec{e} = \vec{y} - (w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)})$$
  • Answer: It’s the vector $w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}$ such that $\vec{e}$ is orthogonal to $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$.
  • Issue: Solving for $w_1$ and $w_2$ directly from the condition “$\vec{e}$ is orthogonal to $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$” is difficult, since the span contains infinitely many vectors.
  • Observation: All we really need is for $\vec{e}$ to individually be orthogonal to $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$.
  • That is, it’s sufficient for $\vec{e}$ to be orthogonal to the spanning vectors themselves.
  • If $\vec{x}^{(1)} \cdot \vec{e} = 0$ and $\vec{x}^{(2)} \cdot \vec{e} = 0$, then every vector in the span is orthogonal to $\vec{e}$:
    $$(w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}) \cdot \vec{e} = w_1 (\vec{x}^{(1)} \cdot \vec{e}) + w_2 (\vec{x}^{(2)} \cdot \vec{e}) = 0$$
  • Equivalently, the closest vector is the one whose error vector $\vec{e}$ is orthogonal to both $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$:
    $$\vec{x}^{(1)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
    $$\vec{x}^{(2)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
  • This is a system of two equations in two unknowns ($w_1$ and $w_2$), but it still looks difficult to solve.

Now what?

  • We’re looking for the scalars $w_1$ and $w_2$ that satisfy the following equations:
    $$\vec{x}^{(1)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
    $$\vec{x}^{(2)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
  • In this example, we just have two spanning vectors, $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$.
  • If we had any more, this system of equations would get extremely messy, extremely quickly.
  • Idea: Rewrite the above system of equations as a single equation, involving matrix-vector products.

Matrices

Matrices

  • An $m \times n$ matrix is a table of numbers with $m$ rows and $n$ columns.
  • We use upper-case letters, like $A$, to denote matrices.
  • For example, if $A$ has two rows and three columns, we say $A \in \mathbb{R}^{2 \times 3}$.
  • Key idea: Think of a matrix as several column vectors, stacked next to each other.

Matrix addition and scalar multiplication

  • We can add two matrices only if they have the same dimensions.
  • Addition occurs elementwise.
  • Scalar multiplication occurs elementwise, too (see the sketch below).
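A minimal NumPy sketch of both operations (the matrices are hypothetical examples, not the ones worked on the slide):

```python
# Elementwise matrix addition and scalar multiplication in NumPy.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

print(A + B)    # elementwise sum: [[11 22 33], [44 55 66]]
print(2 * A)    # every entry scaled by 2: [[2 4 6], [8 10 12]]
```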

Matrix-matrix multiplication

  • Key idea: We can multiply matrices $A$ and $B$ if and only if the number of columns of $A$ equals the number of rows of $B$.
  • If $A$ is $m \times n$ and $B$ is $n \times p$, then $AB$ is $m \times p$.
  • Example: the slide works through the product of specific matrices; a NumPy sketch of the same dimension-checking idea appears below.
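A minimal NumPy sketch of the dimension rule (hypothetical matrices):

```python
# Checking matrix-matrix dimension compatibility.
# A is 2x3 and B is 3x2, so A @ B is valid (2x2) and B @ A is valid (3x3),
# but A @ A is invalid because A has 3 columns and only 2 rows.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])           # shape (3, 2)

print((A @ B).shape)   # (2, 2): columns of A match rows of B
print((B @ A).shape)   # (3, 3)

try:
    A @ A                        # mismatched dimensions
except ValueError as err:
    print(err)
```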

Matrix-vector multiplication

  • A vector $\vec{v} \in \mathbb{R}^n$ is a matrix with $n$ rows and 1 column.
  • Suppose $\vec{v} \in \mathbb{R}^n$ and $M$ is a matrix.
  • What must the dimensions of $M$ be in order for the product $M\vec{v}$ to be valid?
  • What must the dimensions of $M$ be in order for the product $\vec{v}^T M$ to be valid?

One view of matrix-vector multiplication

  • One way of thinking about the product $M\vec{v}$ is that it is the dot product of $\vec{v}$ with every row of $M$.

Another view of matrix-vector multiplication

  • Another way of thinking about the product $M\vec{v}$ is that it is a linear combination of the columns of $M$, using the weights in $\vec{v}$.

Matrix-vector products create linear combinations of columns!

  • Key idea: It’ll be very useful to think of the matrix-vector product $M\vec{v}$ as a linear combination of the columns of $M$, using the weights in $\vec{v}$.
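A small NumPy sketch of the two views (hypothetical matrix and vector):

```python
# Two views of the matrix-vector product M @ v.
# View 1: each entry of M @ v is the dot product of v with a row of M.
# View 2: M @ v is a linear combination of the columns of M, weighted by v.
import numpy as np

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # shape (3, 2)
v = np.array([10, -1])          # shape (2,)

product = M @ v

rows_view = np.array([np.dot(row, v) for row in M])    # dot product with each row
columns_view = v[0] * M[:, 0] + v[1] * M[:, 1]         # weighted sum of the columns

print(product)                                # [ 8 26 44]
print(np.array_equal(product, rows_view))     # True
print(np.array_equal(product, columns_view))  # True
```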


Spans and projections, revisited

Moving to multiple dimensions

  • Let’s now consider three vectors, $\vec{x}^{(1)}$, $\vec{x}^{(2)}$, and $\vec{y}$, all in $\mathbb{R}^n$.
  • Question: What vector in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
  • That is, what values of $w_1$ and $w_2$ minimize $\|\vec{y} - (w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)})\|$?
    Answer: $w_1$ and $w_2$ such that the error vector $\vec{e} = \vec{y} - (w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)})$ is orthogonal to both spanning vectors:
    $$\vec{x}^{(1)} \cdot \vec{e} = 0, \qquad \vec{x}^{(2)} \cdot \vec{e} = 0$$

Matrix-vector products create linear combinations of columns!

  • Combining $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ into a single matrix gives:
    $$X = \begin{bmatrix} \vec{x}^{(1)} & \vec{x}^{(2)} \end{bmatrix}$$
  • Then, if $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$, linear combinations of $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ can be written as $X\vec{w}$.
  • The span of the columns of $X$, or $\text{span}(X)$, consists of all vectors that can be written in the form $X\vec{w}$.
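A small NumPy sketch of this idea (hypothetical vectors):

```python
# Stacking two column vectors into a matrix X and checking that X @ w
# equals the linear combination w1*x1 + w2*x2.
import numpy as np

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
X = np.column_stack([x1, x2])        # columns of X are x1 and x2; shape (3, 2)

w = np.array([3.0, -2.0])            # w1 = 3, w2 = -2

print(X @ w)                         # [ 3. -2.  4.]
print(3.0 * x1 + (-2.0) * x2)        # same vector: X @ w is a combination of the columns
```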

Minimizing projection error in multiple dimensions

  • Goal: Find the vector $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
  • As we’ve seen, $\vec{w}$ must be such that:
    $$\vec{x}^{(1)} \cdot (\vec{y} - X\vec{w}) = 0, \qquad \vec{x}^{(2)} \cdot (\vec{y} - X\vec{w}) = 0$$
  • How can we use our knowledge of matrices to rewrite this system of equations as a single equation?

Simplifying the system of equations, using matrices




  1. $w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}$ can be written as $X\vec{w}$, so $\vec{e} = \vec{y} - X\vec{w}$.
  2. The condition that $\vec{e}$ must be orthogonal to each column of $X$ is equivalent to the condition that $X^T\vec{e} = \vec{0}$:
     $$X^T\vec{e} = \begin{bmatrix} \vec{x}^{(1)} \cdot \vec{e} \\ \vec{x}^{(2)} \cdot \vec{e} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
     The rows of $X^T$ are the columns of $X$, so each entry of $X^T\vec{e}$ is the dot product of one spanning vector with $\vec{e}$.

The normal equations

  • Goal: Find the vector $\vec{w}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
  • We now know that it is the vector $\vec{w}$ such that:
    $$X^T\vec{e} = \vec{0}$$
    $$X^T(\vec{y} - X\vec{w}) = \vec{0}$$
    $$X^T\vec{y} - X^TX\vec{w} = \vec{0}$$
    $$X^TX\vec{w} = X^T\vec{y}$$
  • The last statement is referred to as the normal equations.

The general solution to the normal equation

  • Goal, in general: Find the vector $\vec{w}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
  • We now know that it is the vector $\vec{w}$ such that:
    $$X^TX\vec{w} = X^T\vec{y}$$
  • Assuming $X^TX$ is invertible, this is the vector:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
  • This is a big assumption, because it requires $X$ to be full rank.
  • If $X$ is not full rank, then there are infinitely many solutions to the normal equations, $X^TX\vec{w} = X^T\vec{y}$.
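A minimal NumPy sketch of solving the normal equations (the data are hypothetical; `np.linalg.lstsq` is shown only as a cross-check):

```python
# Solve X^T X w = X^T y. np.linalg.solve is preferred over explicitly
# inverting X^T X.
import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0]])          # two spanning columns
y = np.array([3.0, 4.0, 7.0, 10.0])

w_star = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # same answer via least squares

print(w_star)
print(np.allclose(w_star, w_lstsq))              # True
print(X.T @ (y - X @ w_star))                    # ~[0, 0]: error orthogonal to the columns
```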

What does it mean?

  • Original question: What vector in $\text{span}(X)$ is closest to $\vec{y}$?
  • Final answer: It is the vector $X\vec{w}^*$, where:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$

An optimization problem, solved

  • We just used linear algebra to solve an optimization problem.
  • Specifically, the function we minimized is:
    $$f(\vec{w}) = \|\vec{y} - X\vec{w}\|$$
  • This is a function whose input is a vector, $\vec{w}$, and whose output is a scalar.
  • The input, $\vec{w}^*$, to $f$ that minimizes it is:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
  • We’re going to use this frequently!

Regression and linear algebra

Wait… why do we need linear algebra?

  • Soon, we’ll want to make predictions using more than one feature.
  • Example: Predicting commute times using departure hour and temperature.
  • Thinking about linear regression in terms of matrices and vectors will allow us to find hypothesis functions that:
  • Use multiple features (input variables).
  • Are non-linear in the features, e.g. $H(x) = w_0 + w_1 x + w_2 x^2$.
  • Let’s see if we can put what we’ve just learned to use.

Simple linear regression, revisited

  • Model: $H(x) = w_0 + w_1 x$.
  • Loss function: $L_{sq}(y_i, H(x_i)) = (y_i - H(x_i))^2$.
  • To find $w_0^*$ and $w_1^*$, we minimized empirical risk, i.e. average loss:
    $$R_{sq}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w_0 + w_1 x_i)\right)^2$$
  • Observation: $R_{sq}$ kind of looks like the formula for the norm of a vector,
    $$\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$

Regression and linear algebra

Let’s define a few new terms:

  • The observation vector is the vector $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$. This is the vector of observed “actual values”.
  • The hypothesis vector is the vector $\vec{h}$ with components $H(x_i)$. This is the vector of predicted values.
  • The error vector is the vector $\vec{e}$ with components:
    $$e_i = y_i - H(x_i)$$
  • Key idea: We can rewrite the mean squared error of $H$ as:
    $$R_{sq}(H) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - H(x_i)\right)^2 = \frac{1}{n} \|\vec{e}\|^2 = \frac{1}{n} \|\vec{y} - \vec{h}\|^2$$

The hypothesis vector

  • The hypothesis vector is the vector $\vec{h}$ with components $H(x_i)$. This is the vector of predicted values.
  • For the linear hypothesis function $H(x) = w_0 + w_1 x$, the hypothesis vector can be written:
    $$\vec{h} = \begin{bmatrix} w_0 + w_1 x_1 \\ w_0 + w_1 x_2 \\ \vdots \\ w_0 + w_1 x_n \end{bmatrix} = w_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + w_1 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Rewriting the mean squared error

  • Define the design matrix $X$ as:
    $$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$
  • Define the parameter vector $\vec{w}$ to be $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$.
  • Then, $\vec{h} = X\vec{w}$, so the mean squared error becomes:
    $$R_{sq}(\vec{w}) = \frac{1}{n} \|\vec{y} - \vec{h}\|^2 = \frac{1}{n} \|\vec{y} - X\vec{w}\|^2$$
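A minimal NumPy sketch of the design matrix and this version of the mean squared error (hypothetical data and parameters):

```python
# Build the design matrix X for simple linear regression and compute
# (1/n) * ||y - Xw||^2 for a given parameter vector w.
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])       # feature values (e.g. departure hours)
y = np.array([60.0, 65.0, 80.0, 95.0])   # observed values (e.g. commute times)

X = np.column_stack([np.ones_like(x), x])   # design matrix: a column of 1s, then x
w = np.array([45.0, 7.0])                   # hypothetical w0 = 45, w1 = 7

mse = np.mean((y - X @ w) ** 2)             # (1/n) * ||y - Xw||^2
print(mse)
```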

What’s next?

  • To find the optimal model parameters for simple linear regression, $w_0^*$ and $w_1^*$, we previously minimized:
    $$R_{sq}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w_0 + w_1 x_i)\right)^2$$
  • Now that we’ve reframed the simple linear regression problem in terms of linear algebra, we can find $w_0^*$ and $w_1^*$ by minimizing:
    $$R_{sq}(\vec{w}) = \frac{1}{n} \|\vec{y} - X\vec{w}\|^2$$
  • We’ve already solved this problem! Assuming $X^TX$ is invertible, the best $\vec{w}$ is:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
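A minimal NumPy sketch, on hypothetical data, of finding $\vec{w}^*$ by solving the normal equations and cross-checking against the closed-form slope and intercept formulas from earlier lectures:

```python
# Fit simple linear regression by solving X^T X w = X^T y.
# w_star[0] is the intercept w0*, w_star[1] is the slope w1*.
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])
y = np.array([60.0, 65.0, 80.0, 95.0])

X = np.column_stack([np.ones_like(x), x])           # design matrix
w_star = np.linalg.solve(X.T @ X, X.T @ y)          # [w0*, w1*]

print(w_star)
print(np.mean((y - X @ w_star) ** 2))               # minimized mean squared error

# Cross-check against the closed-form simple linear regression formulas.
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()
print(np.allclose(w_star, [w0, w1]))                # True
```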