Source: https://dsc40a.com/resources/lectures/lec07/lec07-filled.pdf

Orthogonal Projections

Spans and Projections

Projecting onto a single vector

  • Let $\vec{x}$ and $\vec{y}$ be two vectors in $\mathbb{R}^n$.
  • The span of $\vec{x}$ is the set of all vectors of the form:
    $$w \vec{x}$$
    where $w$ is a scalar.
  • Question: What vector in $\text{span}(\vec{x})$ is closest to $\vec{y}$?
  • The vector in $\text{span}(\vec{x})$ that is closest to $\vec{y}$ is the orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$.

Projection error

  • Let $\vec{e} = \vec{y} - w\vec{x}$ be the projection error: that is, the vector that connects $w\vec{x}$ to $\vec{y}$.
  • Goal: Find the $w$ that makes $\vec{e}$ as short as possible.
  • That is, minimize:
    $$\|\vec{e}\| = \|\vec{y} - w\vec{x}\|$$
  • Equivalently, minimize:
    $$\|\vec{e}\|^2 = \|\vec{y} - w\vec{x}\|^2$$
  • Idea: To make $\vec{e}$ as short as possible, it should be orthogonal to $\vec{x}$.

Minimizing projection error

  • Goal: Find the $w$ that makes $\vec{e} = \vec{y} - w\vec{x}$ as short as possible.
  • Idea: To make $\vec{e}$ as short as possible, it should be orthogonal to $\vec{x}$.
  • Can we prove that making $\vec{e}$ orthogonal to $\vec{x}$ minimizes $\|\vec{e}\|$?
  • Pythagorean theorem: if $\vec{a}$ and $\vec{b}$ are orthogonal, then
    $$\|\vec{a} + \vec{b}\|^2 = \|\vec{a}\|^2 + \|\vec{b}\|^2.$$
  • Now we know that to minimize $\|\vec{e}\|$, $\vec{e}$ must be orthogonal to $\vec{x}$.
  • Given this fact, how can we solve for $w$?
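    A sketch of the full argument, in the notation above (with $w^*$ denoting the minimizing scalar and $\vec{e} = \vec{y} - w^*\vec{x}$): for any other scalar $w'$,
    $$\vec{y} - w'\vec{x} = (\vec{y} - w^*\vec{x}) + (w^* - w')\vec{x} = \vec{e} + (w^* - w')\vec{x},$$
    and if $\vec{e}$ is orthogonal to $\vec{x}$, the Pythagorean theorem gives
    $$\|\vec{y} - w'\vec{x}\|^2 = \|\vec{e}\|^2 + (w^* - w')^2\|\vec{x}\|^2 \geq \|\vec{e}\|^2,$$
    so no other choice of scalar produces a shorter error. Solving the orthogonality condition for $w^*$:
    $$\vec{x} \cdot (\vec{y} - w^*\vec{x}) = 0 \quad\Longrightarrow\quad w^* = \frac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}.$$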


Orthogonal projection

  • Question: What vector in $\text{span}(\vec{x})$ is closest to $\vec{y}$?
  • Answer: It is the vector $w^*\vec{x}$, where:
    $$w^* = \frac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$$
  • Note that $w^*$ is the solution to a minimization problem, specifically, this one:
    $$w^* = \arg\min_{w} \|\vec{y} - w\vec{x}\|^2$$
  • We call $w^*\vec{x}$ the orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$.
  • Think of $w^*\vec{x}$ as the “shadow” of $\vec{y}$ (a small numerical example follows below).
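A minimal NumPy sketch of this computation (the vectors are hypothetical, not from the slides):

```python
# Orthogonal projection of y onto span(x), using w* = (x · y) / (x · x).
import numpy as np

x = np.array([2.0, 1.0])
y = np.array([3.0, 4.0])

w_star = np.dot(x, y) / np.dot(x, x)   # the minimizing scalar
proj = w_star * x                      # the orthogonal projection of y onto span(x)
error = y - proj                       # the projection error e

print(w_star)               # 2.0
print(proj)                 # [4. 2.]
print(np.dot(x, error))     # 0.0: e is orthogonal to x
```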

Exercise

Let $\vec{x}$ and $\vec{y}$ be two given vectors.
What is the orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$?
Your answer should be of the form $w^*\vec{x}$, where $w^*$ is a scalar.

The orthogonal projection of $\vec{y}$ onto $\text{span}(\vec{x})$ is $w^*\vec{x}$, where $w^* = \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$.

Moving to multiple dimensions

  • Let’s now consider three vectors, $\vec{x}^{(1)}$, $\vec{x}^{(2)}$, and $\vec{y}$, all in $\mathbb{R}^n$.
  • Question: What vector in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
  • Vectors in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ are of the form $w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}$, where $w_1$ and $w_2$ are scalars.

Minimizing projection error in multiple dimensions

  • Question: What vector in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
  • That is, what vector minimizes $\|\vec{e}\|$, where:
    $$\vec{e} = \vec{y} - (w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)})$$
  • Answer: It’s the vector $w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}$ such that $\vec{e}$ is orthogonal to $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$.
  • Issue: Solving for $w_1$ and $w_2$ directly from the condition “$\vec{e}$ is orthogonal to $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$” is difficult, since the span contains infinitely many vectors.
  • Observation: All we really need is for $\vec{e}$ to individually be orthogonal to $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$.
  • That is, it’s sufficient for $\vec{e}$ to be orthogonal to the spanning vectors themselves.
  • If $\vec{x}^{(1)} \cdot \vec{e} = 0$ and $\vec{x}^{(2)} \cdot \vec{e} = 0$, then every vector in the span is orthogonal to $\vec{e}$:
    $$(w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}) \cdot \vec{e} = w_1 (\vec{x}^{(1)} \cdot \vec{e}) + w_2 (\vec{x}^{(2)} \cdot \vec{e}) = 0$$
  • Equivalently, the closest vector is the one whose error vector $\vec{e}$ is orthogonal to both $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$:
    $$\vec{x}^{(1)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
    $$\vec{x}^{(2)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
  • This is a system of two equations in two unknowns ($w_1$ and $w_2$), but it still looks difficult to solve.

Now what?

  • We’re looking for the scalars $w_1$ and $w_2$ that satisfy the following equations:
    $$\vec{x}^{(1)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
    $$\vec{x}^{(2)} \cdot (\vec{y} - w_1 \vec{x}^{(1)} - w_2 \vec{x}^{(2)}) = 0$$
  • In this example, we just have two spanning vectors, $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$.
  • If we had any more, this system of equations would get extremely messy, extremely quickly.
  • Idea: Rewrite the above system of equations as a single equation, involving matrix-vector products.

Matrices

Matrices

  • An $m \times n$ matrix is a table of numbers with $m$ rows and $n$ columns.
  • We use upper-case letters, like $A$, to denote matrices.
  • For example, if $A$ has two rows and three columns, we say $A \in \mathbb{R}^{2 \times 3}$.
  • Key idea: Think of a matrix as several column vectors, stacked next to each other.

Matrix addition and scalar multiplication

  • We can add two matrices only if they have the same dimensions.
  • Addition occurs elementwise.
  • Scalar multiplication occurs elementwise, too (see the sketch below).
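A minimal NumPy sketch of both operations (the matrices are hypothetical examples, not the ones worked on the slide):

```python
# Elementwise matrix addition and scalar multiplication in NumPy.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

print(A + B)    # elementwise sum: [[11 22 33], [44 55 66]]
print(2 * A)    # every entry scaled by 2: [[2 4 6], [8 10 12]]
```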

Matrix-matrix multiplication

  • Key idea: We can multiply matrices $A$ and $B$ if and only if the number of columns of $A$ equals the number of rows of $B$.
  • If $A$ is $m \times n$ and $B$ is $n \times p$, then $AB$ is $m \times p$.
  • Example: the slide works through the product of specific matrices; a NumPy sketch of the same dimension-checking idea appears below.
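A minimal NumPy sketch of the dimension rule (hypothetical matrices):

```python
# Checking matrix-matrix dimension compatibility.
# A is 2x3 and B is 3x2, so A @ B is valid (2x2) and B @ A is valid (3x3),
# but A @ A is invalid because A has 3 columns and only 2 rows.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])           # shape (3, 2)

print((A @ B).shape)   # (2, 2): columns of A match rows of B
print((B @ A).shape)   # (3, 3)

try:
    A @ A                        # mismatched dimensions
except ValueError as err:
    print(err)
```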

Matrix-vector multiplication

  • A vector $\vec{v} \in \mathbb{R}^n$ is a matrix with $n$ rows and 1 column.
  • Suppose $\vec{v} \in \mathbb{R}^n$ and $M$ is a matrix.
  • What must the dimensions of $M$ be in order for the product $M\vec{v}$ to be valid?
  • What must the dimensions of $M$ be in order for the product $\vec{v}^T M$ to be valid?

One view of matrix-vector multiplication

  • One way of thinking about the product $M\vec{v}$ is that it is the dot product of $\vec{v}$ with every row of $M$.

Another view of matrix-vector multiplication

  • Another way of thinking about the product $M\vec{v}$ is that it is a linear combination of the columns of $M$, using the weights in $\vec{v}$.

Matrix-vector products create linear combinations of columns!

  • Key idea: It’ll be very useful to think of the matrix-vector product $M\vec{v}$ as a linear combination of the columns of $M$, using the weights in $\vec{v}$.
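A small NumPy sketch of the two views (hypothetical matrix and vector):

```python
# Two views of the matrix-vector product M @ v.
# View 1: each entry of M @ v is the dot product of v with a row of M.
# View 2: M @ v is a linear combination of the columns of M, weighted by v.
import numpy as np

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # shape (3, 2)
v = np.array([10, -1])          # shape (2,)

product = M @ v

rows_view = np.array([np.dot(row, v) for row in M])    # dot product with each row
columns_view = v[0] * M[:, 0] + v[1] * M[:, 1]         # weighted sum of the columns

print(product)                                # [ 8 26 44]
print(np.array_equal(product, rows_view))     # True
print(np.array_equal(product, columns_view))  # True
```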


Spans and projections, revisited

Moving to multiple dimensions

  • Let’s now consider three vectors, $\vec{x}^{(1)}$, $\vec{x}^{(2)}$, and $\vec{y}$, all in $\mathbb{R}^n$.
  • Question: What vector in $\text{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
  • That is, what values of $w_1$ and $w_2$ minimize $\|\vec{y} - (w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)})\|$?
    Answer: $w_1$ and $w_2$ such that the error vector $\vec{e} = \vec{y} - (w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)})$ is orthogonal to both spanning vectors:
    $$\vec{x}^{(1)} \cdot \vec{e} = 0, \qquad \vec{x}^{(2)} \cdot \vec{e} = 0$$

Matrix-vector products create linear combinations of columns!

  • Combining $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ into a single matrix gives:
    $$X = \begin{bmatrix} \vec{x}^{(1)} & \vec{x}^{(2)} \end{bmatrix}$$
  • Then, if $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$, linear combinations of $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ can be written as $X\vec{w}$.
  • The span of the columns of $X$, or $\text{span}(X)$, consists of all vectors that can be written in the form $X\vec{w}$.
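A small NumPy sketch of this idea (hypothetical vectors):

```python
# Stacking two column vectors into a matrix X and checking that X @ w
# equals the linear combination w1*x1 + w2*x2.
import numpy as np

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
X = np.column_stack([x1, x2])        # columns of X are x1 and x2; shape (3, 2)

w = np.array([3.0, -2.0])            # w1 = 3, w2 = -2

print(X @ w)                         # [ 3. -2.  4.]
print(3.0 * x1 + (-2.0) * x2)        # same vector: X @ w is a combination of the columns
```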

Minimizing projection error in multiple dimensions

  • Goal: Find the vector $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
  • As we’ve seen, $\vec{w}$ must be such that:
    $$\vec{x}^{(1)} \cdot (\vec{y} - X\vec{w}) = 0, \qquad \vec{x}^{(2)} \cdot (\vec{y} - X\vec{w}) = 0$$
  • How can we use our knowledge of matrices to rewrite this system of equations as a single equation?

Simplifying the system of equations, using matrices




  1. $w_1 \vec{x}^{(1)} + w_2 \vec{x}^{(2)}$ can be written as $X\vec{w}$, so $\vec{e} = \vec{y} - X\vec{w}$.
  2. The condition that $\vec{e}$ must be orthogonal to each column of $X$ is equivalent to the condition that $X^T\vec{e} = \vec{0}$:
     $$X^T\vec{e} = \begin{bmatrix} \vec{x}^{(1)} \cdot \vec{e} \\ \vec{x}^{(2)} \cdot \vec{e} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
     The rows of $X^T$ are the columns of $X$, so each entry of $X^T\vec{e}$ is the dot product of one spanning vector with $\vec{e}$.

The normal equations

  • Goal: Find the vector $\vec{w}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
  • We now know that it is the vector $\vec{w}$ such that:
    $$X^T\vec{e} = \vec{0}$$
    $$X^T(\vec{y} - X\vec{w}) = \vec{0}$$
    $$X^T\vec{y} - X^TX\vec{w} = \vec{0}$$
    $$X^TX\vec{w} = X^T\vec{y}$$
  • The last statement is referred to as the normal equations.

The general solution to the normal equation

  • Goal, in general: Find the vector $\vec{w}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
  • We now know that it is the vector $\vec{w}$ such that:
    $$X^TX\vec{w} = X^T\vec{y}$$
  • Assuming $X^TX$ is invertible, this is the vector:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
  • This is a big assumption, because it requires $X$ to be full rank.
  • If $X$ is not full rank, then there are infinitely many solutions to the normal equations, $X^TX\vec{w} = X^T\vec{y}$.
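A minimal NumPy sketch of solving the normal equations (the data are hypothetical; `np.linalg.lstsq` is shown only as a cross-check):

```python
# Solve X^T X w = X^T y. np.linalg.solve is preferred over explicitly
# inverting X^T X.
import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0]])          # two spanning columns
y = np.array([3.0, 4.0, 7.0, 10.0])

w_star = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # same answer via least squares

print(w_star)
print(np.allclose(w_star, w_lstsq))              # True
print(X.T @ (y - X @ w_star))                    # ~[0, 0]: error orthogonal to the columns
```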

What does it mean?

  • Original question: What vector in $\text{span}(X)$ is closest to $\vec{y}$?
  • Final answer: It is the vector $X\vec{w}^*$, where:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$

An optimization problem, solved

  • We just used linear algebra to solve an optimization problem.
  • Specifically, the function we minimized is:
    $$f(\vec{w}) = \|\vec{y} - X\vec{w}\|$$
  • This is a function whose input is a vector, $\vec{w}$, and whose output is a scalar.
  • The input, $\vec{w}^*$, to $f$ that minimizes it is:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
  • We’re going to use this frequently!

Regression and linear algebra

Wait… why do we need linear algebra?

  • Soon, we’ll want to make predictions using more than one feature.
  • Example: Predicting commute times using departure hour and temperature.
  • Thinking about linear regression in terms of matrices and vectors will allow us to find hypothesis functions that:
  • Use multiple features (input variables).
  • Are non-linear in the features, e.g. $H(x) = w_0 + w_1 x + w_2 x^2$.
  • Let’s see if we can put what we’ve just learned to use.

Simple linear regression, revisited

  • Model: $H(x) = w_0 + w_1 x$.
  • Loss function: $L_{sq}(y_i, H(x_i)) = (y_i - H(x_i))^2$.
  • To find $w_0^*$ and $w_1^*$, we minimized empirical risk, i.e. average loss:
    $$R_{sq}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w_0 + w_1 x_i)\right)^2$$
  • Observation: $R_{sq}$ kind of looks like the formula for the norm of a vector,
    $$\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$

Regression and linear algebra

Let’s define a few new terms:

  • The observation vector is the vector $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$. This is the vector of observed “actual values”.
  • The hypothesis vector is the vector $\vec{h}$ with components $H(x_i)$. This is the vector of predicted values.
  • The error vector is the vector $\vec{e}$ with components:
    $$e_i = y_i - H(x_i)$$
  • Key idea: We can rewrite the mean squared error of $H$ as:
    $$R_{sq}(H) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - H(x_i)\right)^2 = \frac{1}{n} \|\vec{e}\|^2 = \frac{1}{n} \|\vec{y} - \vec{h}\|^2$$

The hypothesis vector

  • The hypothesis vector is the vector $\vec{h}$ with components $H(x_i)$. This is the vector of predicted values.
  • For the linear hypothesis function $H(x) = w_0 + w_1 x$, the hypothesis vector can be written:
    $$\vec{h} = \begin{bmatrix} w_0 + w_1 x_1 \\ w_0 + w_1 x_2 \\ \vdots \\ w_0 + w_1 x_n \end{bmatrix} = w_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + w_1 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Rewriting the mean squared error

  • Define the design matrix $X$ as:
    $$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$
  • Define the parameter vector $\vec{w}$ to be $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$.
  • Then, $\vec{h} = X\vec{w}$, so the mean squared error becomes:
    $$R_{sq}(\vec{w}) = \frac{1}{n} \|\vec{y} - \vec{h}\|^2 = \frac{1}{n} \|\vec{y} - X\vec{w}\|^2$$
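A minimal NumPy sketch of the design matrix and this version of the mean squared error (hypothetical data and parameters):

```python
# Build the design matrix X for simple linear regression and compute
# (1/n) * ||y - Xw||^2 for a given parameter vector w.
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])       # feature values (e.g. departure hours)
y = np.array([60.0, 65.0, 80.0, 95.0])   # observed values (e.g. commute times)

X = np.column_stack([np.ones_like(x), x])   # design matrix: a column of 1s, then x
w = np.array([45.0, 7.0])                   # hypothetical w0 = 45, w1 = 7

mse = np.mean((y - X @ w) ** 2)             # (1/n) * ||y - Xw||^2
print(mse)
```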

What’s next?

  • To find the optimal model parameters for simple linear regression, $w_0^*$ and $w_1^*$, we previously minimized:
    $$R_{sq}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w_0 + w_1 x_i)\right)^2$$
  • Now that we’ve reframed the simple linear regression problem in terms of linear algebra, we can find $w_0^*$ and $w_1^*$ by minimizing:
    $$R_{sq}(\vec{w}) = \frac{1}{n} \|\vec{y} - X\vec{w}\|^2$$
  • We’ve already solved this problem! Assuming $X^TX$ is invertible, the best $\vec{w}$ is:
    $$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
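A minimal NumPy sketch, on hypothetical data, of finding $\vec{w}^*$ by solving the normal equations and cross-checking against the closed-form slope and intercept formulas from earlier lectures:

```python
# Fit simple linear regression by solving X^T X w = X^T y.
# w_star[0] is the intercept w0*, w_star[1] is the slope w1*.
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])
y = np.array([60.0, 65.0, 80.0, 95.0])

X = np.column_stack([np.ones_like(x), x])           # design matrix
w_star = np.linalg.solve(X.T @ X, X.T @ y)          # [w0*, w1*]

print(w_star)
print(np.mean((y - X @ w_star) ** 2))               # minimized mean squared error

# Cross-check against the closed-form simple linear regression formulas.
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()
print(np.allclose(w_star, [w0, w1]))                # True
```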