Source: https://dsc40a.com/resources/lectures/lec07/lec07-filled.pdf
Orthogonal Projections
Spans and Projections
Projecting onto a single vector
- Let $\vec{x}$ and $\vec{y}$ be two vectors in $\mathbb{R}^n$.
- The span of $\vec{x}$ is the set of all vectors of the form $w\vec{x}$, where $w$ is a scalar.
- Question: What vector in $\operatorname{span}(\vec{x})$ is closest to $\vec{y}$?
- The vector in $\operatorname{span}(\vec{x})$ that is closest to $\vec{y}$ is the orthogonal projection of $\vec{y}$ onto $\operatorname{span}(\vec{x})$.
Projection error
- Let $\vec{e} = \vec{y} - w\vec{x}$ be the projection error: that is, the vector that connects $w\vec{x}$ to $\vec{y}$.
- Goal: Find the $w$ that makes $\vec{e}$ as short as possible.
- That is, minimize:
$$\|\vec{e}\| = \|\vec{y} - w\vec{x}\|$$
- Equivalently, minimize:
$$\|\vec{e}\|^2 = \|\vec{y} - w\vec{x}\|^2$$
- Idea: To make $\vec{e}$ as short as possible, it should be orthogonal to $\vec{x}$.
Minimizing projection error
- Goal: Find the $w$ that makes $\|\vec{e}\| = \|\vec{y} - w\vec{x}\|$ as short as possible.
- Idea: To make $\vec{e}$ as short as possible, it should be orthogonal to $\vec{x}$.
- Can we prove that making $\vec{e}$ orthogonal to $\vec{x}$ minimizes $\|\vec{e}\|$?
- Pythagorean theorem: if $\vec{e} = \vec{y} - w\vec{x}$ is orthogonal to $\vec{x}$, then for any other scalar $w'$, the error $\vec{e}\,' = \vec{y} - w'\vec{x} = \vec{e} + (w - w')\vec{x}$ satisfies
$$\|\vec{e}\,'\|^2 = \|\vec{e}\|^2 + \|(w - w')\vec{x}\|^2 \geq \|\vec{e}\|^2$$
- Now we know that to minimize $\|\vec{e}\|$, $\vec{e}$ must be orthogonal to $\vec{x}$.
- Given this fact, how can we solve for $w$?
Orthogonal projection
- Question: What vector in $\operatorname{span}(\vec{x})$ is closest to $\vec{y}$?
- Answer: It is the vector $w^*\vec{x}$, where, setting $\vec{x} \cdot \vec{e} = \vec{x} \cdot (\vec{y} - w\vec{x}) = 0$ and solving for $w$:
$$w^* = \frac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$$
- Note that $w^*$ is the solution to a minimization problem, specifically, this one:
$$w^* = \arg\min_{w} \|\vec{y} - w\vec{x}\|^2$$
- We call $w^*\vec{x}$ the orthogonal projection of $\vec{y}$ onto $\operatorname{span}(\vec{x})$.
- Think of $w^*\vec{x}$ as the “shadow” of $\vec{y}$.
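As a quick numerical check of the formula above, here is a minimal NumPy sketch; the vectors below are made up for illustration.

```python
import numpy as np

# Hypothetical vectors; any two vectors in R^n work.
x = np.array([2.0, 1.0])
y = np.array([3.0, 4.0])

# w* = (x . y) / (x . x), the minimizer of ||y - w x||^2.
w_star = np.dot(x, y) / np.dot(x, x)
proj = w_star * x          # orthogonal projection of y onto span(x)
e = y - proj               # projection error

print(w_star)              # 2.0
print(np.dot(x, e))        # ~0: the error is orthogonal to x
```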
Exercise
Let $\vec{x} = \dots$ and $\vec{y} = \dots$
What is the orthogonal projection of $\vec{y}$ onto $\operatorname{span}(\vec{x})$?
Your answer should be of the form $w^*\vec{x}$, where $w^*$ is a scalar.
Orthogonal projection of $\vec{y}$ onto $\operatorname{span}(\vec{x})$
Moving to multiple dimensions
- Let’s now consider three vectors, $\vec{x}^{(1)}$, $\vec{x}^{(2)}$, and $\vec{y}$, all in $\mathbb{R}^n$.
- Question: What vector in $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
- Vectors in $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ are of the form $w_1\vec{x}^{(1)} + w_2\vec{x}^{(2)}$, where $w_1$ and $w_2$ are scalars.
Minimizing projection error in multiple dimensions
- Question: What vector in $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
- That is, what vector minimizes $\|\vec{e}\|$, where:
$$\vec{e} = \vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}$$
- Answer: It’s the vector $w_1\vec{x}^{(1)} + w_2\vec{x}^{(2)}$ such that $\vec{e}$ is orthogonal to $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$.
- Issue: Solving for $w_1$ and $w_2$ in the condition “$\vec{e}$ is orthogonal to $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$” is difficult, because it involves every vector in the span.
- Observation: All we really need is for $\vec{e}$ to be orthogonal to $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ individually.
- That is, it’s sufficient for $\vec{e}$ to be orthogonal to the spanning vectors themselves.
- If $\vec{x}^{(1)} \cdot \vec{e} = 0$ and $\vec{x}^{(2)} \cdot \vec{e} = 0$, then for any vector $w_1\vec{x}^{(1)} + w_2\vec{x}^{(2)}$ in the span:
$$\big(w_1\vec{x}^{(1)} + w_2\vec{x}^{(2)}\big) \cdot \vec{e} = w_1\big(\vec{x}^{(1)} \cdot \vec{e}\big) + w_2\big(\vec{x}^{(2)} \cdot \vec{e}\big) = 0$$
- Question: What vector in $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
- Answer: It’s the vector $w_1\vec{x}^{(1)} + w_2\vec{x}^{(2)}$ such that $\vec{e}$ is orthogonal to $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$.
- Equivalently, it’s the vector such that $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ are both orthogonal to $\vec{e}$:
$$\vec{x}^{(1)} \cdot \big(\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\big) = 0, \qquad \vec{x}^{(2)} \cdot \big(\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\big) = 0$$
- This is a system of two equations in two unknowns ($w_1$ and $w_2$), but it still looks difficult to solve.
Now what?
- We’re looking for the scalars $w_1$ and $w_2$ that satisfy the following equations:
$$\vec{x}^{(1)} \cdot \big(\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\big) = 0$$
$$\vec{x}^{(2)} \cdot \big(\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\big) = 0$$
- In this example, we just have two spanning vectors, $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$.
- If we had any more, this system of equations would get extremely messy, extremely quickly (see the sketch below).
- Idea: Rewrite the above system of equations as a single equation, involving matrix-vector products.
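Before introducing matrix notation, here is a small NumPy sketch (with made-up vectors) of what solving this system looks like by hand: expanding the two dot-product conditions gives a $2 \times 2$ linear system in $w_1$ and $w_2$.

```python
import numpy as np

# Hypothetical spanning vectors and target, all in R^3.
x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
y  = np.array([3.0, 1.0, 2.0])

# Expanding the two orthogonality conditions gives a 2x2 linear system:
#   (x1.x1) w1 + (x1.x2) w2 = x1.y
#   (x2.x1) w1 + (x2.x2) w2 = x2.y
A = np.array([[x1 @ x1, x1 @ x2],
              [x2 @ x1, x2 @ x2]])
b = np.array([x1 @ y, x2 @ y])
w1, w2 = np.linalg.solve(A, b)

e = y - (w1 * x1 + w2 * x2)
print(x1 @ e, x2 @ e)   # both ~0: the error is orthogonal to each spanning vector
```

With more spanning vectors, this pattern of dot products grows quickly; it is exactly the structure that the matrix form introduced below will package as $X^TX$ and $X^T\vec{y}$.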
Matrices
- An $m \times n$ matrix is a table of numbers with $m$ rows and $n$ columns.
- We use upper-case letters to denote matrices.
- For example, if a matrix $A$ has two rows and three columns, we say $A \in \mathbb{R}^{2 \times 3}$.
- Key idea: Think of a matrix as several column vectors, stacked next to each other.
Matrix addition and scalar multiplication
- We can add two matrices only if they have the same dimensions.
- Addition occurs elementwise, e.g.:
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$
- Scalar multiplication occurs elementwise, too:
$$2\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}$$
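The same operations in NumPy, for reference (the arrays are arbitrary):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)   # elementwise sum:      [[ 6  8] [10 12]]
print(2 * A)   # elementwise scaling:  [[ 2  4] [ 6  8]]
```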
Matrix-matrix multiplication
- Key idea: We can multiply matrices $A$ and $B$ if and only if the number of columns of $A$ equals the number of rows of $B$.
- If $A$ is $m \times n$ and $B$ is $n \times p$, then $AB$ is $m \times p$.
- Example: If $A$ is the $2 \times 3$ matrix from above, what is $A^TA$?
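A quick NumPy illustration of the dimension rule, with arbitrarily chosen shapes:

```python
import numpy as np

A = np.ones((2, 3))       # A is 2 x 3
B = np.ones((3, 4))       # B is 3 x 4

print((A @ B).shape)      # (2, 4): columns of A (3) match rows of B (3)
print((A.T @ A).shape)    # (3, 3)
# B @ A raises an error: columns of B (4) don't match rows of A (2).
```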
Matrix-vector multiplication
- A vector $\vec{v} \in \mathbb{R}^n$ is a matrix with $n$ rows and 1 column.
- Suppose $A \in \mathbb{R}^{m \times n}$.
- What must the dimensions of $\vec{v}$ be in order for the product $A\vec{v}$ to be valid?
- What must the dimensions of $\vec{v}$ be in order for the product $\vec{v}^TA$ to be valid?
One view of matrix-vector multiplication
- One way of thinking about the product $A\vec{v}$ is that it is the dot product of $\vec{v}$ with every row of $A$.
Another view of matrix-vector multiplication
- Another way of thinking about the product $A\vec{v}$ is that it is a linear combination of the columns of $A$, using the weights in $\vec{v}$.
Matrix-vector products create linear combinations of columns!
- Key idea: It’ll be very useful to think of the matrix-vector product $A\vec{v}$ as a linear combination of the columns of $A$, using the weights in $\vec{v}$.
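Here is a short NumPy sketch (with arbitrary values) confirming that both views of $A\vec{v}$ give the same result:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([1.0, 0.0, -1.0])

product = A @ v

# View 1: dot product of v with every row of A.
view_rows = np.array([row @ v for row in A])

# View 2: linear combination of the columns of A, weighted by the entries of v.
view_cols = sum(v[j] * A[:, j] for j in range(A.shape[1]))

print(product, view_rows, view_cols)   # all equal: [-2. -2.]
```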
Spans and projections, revisited
Moving to multiple dimensions
- Let’s now consider three vectors, $\vec{x}^{(1)}$, $\vec{x}^{(2)}$, and $\vec{y}$, all in $\mathbb{R}^n$.
- Question: What vector in $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
- That is, what values of $w_1$ and $w_2$ minimize $\|\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\|$?
- Answer: the $w_1$ and $w_2$ such that:
$$\vec{x}^{(1)} \cdot \big(\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\big) = 0, \qquad \vec{x}^{(2)} \cdot \big(\vec{y} - w_1\vec{x}^{(1)} - w_2\vec{x}^{(2)}\big) = 0$$
Matrix-vector products create linear combinations of columns!
- Combining $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ into a single matrix gives:
$$X = \begin{bmatrix} \mid & \mid \\ \vec{x}^{(1)} & \vec{x}^{(2)} \\ \mid & \mid \end{bmatrix}$$
- Then, if $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$, linear combinations of $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ can be written as $X\vec{w}$.
- The span of the columns of $X$, or $\operatorname{span}(X)$, consists of all vectors that can be written in the form $X\vec{w}$.
Minimizing projection error in multiple dimensions
- Goal: Find the vector $\vec{w}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
- As we’ve seen, $\vec{e}$ must be such that:
$$\vec{x}^{(1)} \cdot \vec{e} = 0, \qquad \vec{x}^{(2)} \cdot \vec{e} = 0$$
- How can we use our knowledge of matrices to rewrite this system of equations as a single equation?
Simplifying the system of equations, using matrices
- $w_1\vec{x}^{(1)} + w_2\vec{x}^{(2)}$ can be written as $X\vec{w}$, so $\vec{e} = \vec{y} - X\vec{w}$.
- The condition that $\vec{e}$ must be orthogonal to each column of $X$ is equivalent to the condition that $X^T\vec{e} = \vec{0}$.
- This is because the rows of $X^T$ are the columns of $X$, so each component of $X^T\vec{e}$ is the dot product of one column of $X$ with $\vec{e}$.
The normal equations
- Goal: Find the vector $\vec{w}$ such that $\|\vec{e}\| = \|\vec{y} - X\vec{w}\|$ is minimized.
- We now know that it is the vector $\vec{w}$ such that:
$$X^T\vec{e} = \vec{0} \iff X^T\big(\vec{y} - X\vec{w}\big) = \vec{0} \iff X^TX\vec{w} = X^T\vec{y}$$
- The last statement is referred to as the normal equations.
The general solution to the normal equations
- Goal, in general: Find the vector $\vec{w}$ such that $\|\vec{y} - X\vec{w}\|$ is minimized.
- We now know that it is the vector $\vec{w}$ such that:
$$X^TX\vec{w} = X^T\vec{y}$$
- Assuming $X^TX$ is invertible, this is the vector:
$$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
- This is a big assumption, because it requires $X$ to be full rank.
- If $X$ is not full rank, then there are infinitely many solutions to the normal equations, $X^TX\vec{w} = X^T\vec{y}$.
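Here is a minimal NumPy sketch of solving the normal equations, using the same made-up vectors as the earlier sketch, now stacked as the columns of $X$; solving $X^TX\vec{w} = X^T\vec{y}$ directly is generally preferable to forming $(X^TX)^{-1}$ explicitly.

```python
import numpy as np

# Hypothetical data: the columns of X are the spanning vectors, y is the target.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])
y = np.array([3.0, 1.0, 2.0])

# Solve the normal equations X^T X w = X^T y.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ w_star
print(w_star)
print(X.T @ e)   # ~0 vector: the error is orthogonal to every column of X
```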
What does it mean?
- Original question: What vector in $\operatorname{span}(\vec{x}^{(1)}, \vec{x}^{(2)})$ is closest to $\vec{y}$?
- Final answer: It is the vector $X\vec{w}^*$, where:
$$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
An optimization problem, solved
- We just used linear algebra to solve an optimization problem.
- Specifically, the function we minimized is:
$$f(\vec{w}) = \|\vec{y} - X\vec{w}\|^2$$
- This is a function whose input is a vector, $\vec{w}$, and whose output is a scalar.
- The input, $\vec{w}^*$, to $f$ that minimizes it is:
$$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
- We’re going to use this frequently!
Regression and linear algebra
Wait… why do we need linear algebra?
- Soon, we’ll want to make predictions using more than one feature.
- Example: Predicting commute times using departure hour and temperature.
- Thinking about linear regression in terms of matrices and vectors will allow us to find hypothesis functions that:
- Use multiple features (input variables).
- Are non-linear in the features, e.g. $H(x) = w_0 + w_1 x + w_2 x^2$.
- Let’s see if we can put what we’ve just learned to use.
Simple linear regression, revisited
- Model: $H(x) = w_0 + w_1 x$.
- Loss function: $L_{\text{sq}}(y_i, H(x_i)) = (y_i - H(x_i))^2$.
- To find $w_0^*$ and $w_1^*$, we minimized empirical risk, i.e. average loss:
$$R_{\text{sq}}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - (w_0 + w_1 x_i)\big)^2$$
- Observation: $R_{\text{sq}}(w_0, w_1)$ kind of looks like the formula for the norm of a vector,
$$\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$$
Regression and linear algebra
Let’s define a few new terms:
- The observation vector is the vector $\vec{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}^T$. This is the vector of observed “actual values”.
- The hypothesis vector is the vector $\vec{h}$ with components $h_i = H(x_i)$. This is the vector of predicted values.
- The error vector is the vector $\vec{e}$ with components:
$$e_i = y_i - H(x_i)$$
- Key idea: We can rewrite the mean squared error of $H$ as:
$$R_{\text{sq}}(H) = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - H(x_i)\big)^2 = \frac{1}{n}\|\vec{e}\|^2 = \frac{1}{n}\|\vec{y} - \vec{h}\|^2$$
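A quick NumPy check, using made-up data and a hypothetical hypothesis function, that the mean squared error and the $\frac{1}{n}\|\vec{y} - \vec{h}\|^2$ form agree:

```python
import numpy as np

# Hypothetical data and a hypothetical hypothesis function H(x) = 2 + 3x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.5, 7.0, 12.0, 13.5])
h = 2 + 3 * x                                    # hypothesis vector: predictions H(x_i)

mse_sum  = np.mean((y - h) ** 2)                 # (1/n) * sum of squared errors
mse_norm = np.linalg.norm(y - h) ** 2 / len(y)   # (1/n) * ||y - h||^2

print(mse_sum, mse_norm)   # same value (up to floating point)
```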
The hypothesis vector
- The hypothesis vector is the vector $\vec{h}$ with components $h_i = H(x_i)$. This is the vector of predicted values.
- For the linear hypothesis function $H(x) = w_0 + w_1 x$, the hypothesis vector can be written:
$$\vec{h} = \begin{bmatrix} w_0 + w_1 x_1 \\ w_0 + w_1 x_2 \\ \vdots \\ w_0 + w_1 x_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$$
Rewriting the mean squared error
- Define the design matrix $X$ as:
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$
- Define the parameter vector $\vec{w}$ to be $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$.
- Then, $\vec{h} = X\vec{w}$, so the mean squared error becomes:
$$R_{\text{sq}}(\vec{w}) = \frac{1}{n}\|\vec{y} - X\vec{w}\|^2$$
What’s next?
- To find the optimal model parameters for simple linear regression, $w_0^*$ and $w_1^*$, we previously minimized:
$$R_{\text{sq}}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - (w_0 + w_1 x_i)\big)^2$$
- Now that we’ve reframed the simple linear regression problem in terms of linear algebra, we can find $w_0^*$ and $w_1^*$ by minimizing:
$$R_{\text{sq}}(\vec{w}) = \frac{1}{n}\|\vec{y} - X\vec{w}\|^2$$
- We’ve already solved this problem! Assuming $X^TX$ is invertible, the best $\vec{w}$ is:
$$\vec{w}^* = (X^TX)^{-1}X^T\vec{y}$$
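Putting it all together, here is a minimal sketch (with invented data) that builds the design matrix for simple linear regression and solves for $\vec{w}^*$, cross-checked against NumPy’s least-squares routine:

```python
import numpy as np

# Hypothetical dataset of (x, y) pairs.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (for w0) next to the feature column (for w1).
X = np.column_stack([np.ones_like(x), x])

# w* = (X^T X)^{-1} X^T y, computed by solving the normal equations.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
w0, w1 = w_star
print(w0, w1)

# Same answer from NumPy's least-squares routine.
print(np.linalg.lstsq(X, y, rcond=None)[0])
```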