A matrix is $m \times n$ where $m$ is the number of rows and $n$ is the number of columns.

Definitions

$Ax = x_1 a_1 + \dots + x_n a_n$, where $a_i$ is the $i$-th column of the matrix.

$AB = A[\,b_1 \;\cdots\; b_p\,] = [\,Ab_1 \;\cdots\; Ab_p\,]$, i.e. $A$ acting on each column vector of $B$.

For an inverse to exist, $Ax = b$ needs to have a unique solution for every $b$.

Independent vectors, i.e. there is no $x \neq 0$ s.t. $Ax = 0$ (no nontrivial combination of the columns equals zero).

$n$ independent vectors form a basis in $\mathbb{R}^n$. Every vector in the space is a unique combination of those basis vectors.

Here are particular bases for $\mathbb{R}^n$ among all the choices we could make:

  • Standard basis = columns of the identity matrix
  • General basis = columns of any invertible matrix
  • Orthonormal basis = columns of any orthogonal matrix

If $A$ is invertible, then $A^T A$ is invertible (follows from $A$ invertible), symmetric (always), and positive definite (since $x^T A^T A x = \|Ax\|^2 > 0$ for every $x \neq 0$ when $A$ is invertible).
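A quick numerical check of these three properties (a minimal sketch; the invertible matrix below is an arbitrary example, not from the notes):

```python
import numpy as np

# Arbitrary invertible example matrix (det = 8)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

AtA = A.T @ A

# Symmetric: A^T A equals its transpose
print(np.allclose(AtA, AtA.T))                     # True

# Positive definite: all eigenvalues of A^T A are strictly positive
print(np.all(np.linalg.eigvalsh(AtA) > 0))         # True

# Invertible: A^T A has full rank because ||Ax||^2 > 0 for x != 0
print(np.linalg.matrix_rank(AtA) == A.shape[1])    # True
```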

Similar matrices

  • Two similar matrices describe the same linear map, i.e. their mappings are isomorphic and the isomorphism is the matrix $M$; this is also called a change of basis.
  • Two square matrices $A$ and $B$ are called similar if there is an invertible matrix $M$ s.t. $B = M^{-1} A M$.
  • Property: Similar matrices have the same characteristic polynomial (i.e. same eigenvalues).
    • Proof: $\det(B - \lambda I) = \det(M^{-1} A M - \lambda M^{-1} M) = \det(M^{-1}) \det(A - \lambda I) \det(M) = \det(A - \lambda I)$ (using the multiplicative property of the determinant). A numerical check is sketched below.
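A minimal sketch of this property; $A$ and $M$ below are arbitrary random matrices used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = rng.standard_normal((4, 4))          # a random square matrix is almost surely invertible

B = np.linalg.inv(M) @ A @ M             # B = M^{-1} A M, similar to A

# Similar matrices have the same eigenvalues (up to ordering and floating-point error)
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))   # True
```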
Column and row space

Definition: The column space $C(A)$ contains all the linear combinations of the columns of $A$.

Useful decomposition if $A$ is not full rank: one can always decompose a matrix as $A = CR$, a column matrix times a row matrix, where $A$ is $m \times n$, $C$ is $m \times r$, and $R$ is $r \times n$ (with $r$ the rank of $A$).

For a $3 \times 3$ matrix with rank $2$, one can decompose it into $A = CR$ where $C$ is $3 \times 2$ (columns of $C$ are a basis for the column space) and $R$ is $2 \times 3$ (rows of $R$ are a basis for the row space).

$R$ is the identity if $A$ is full-rank (then $C = A$). Otherwise, it is a block matrix $R = [\,I \;\; F\,]$ where the columns of $F$ describe how to get columns $r+1, \dots, n$ by using the previous columns.
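A sketch of building $A = CR$ from the reduced row echelon form, using sympy; the example matrix is made up (its third column is the sum of the first two, so the rank is 2):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 9],
               [7, 8, 15]])                         # rank 2: col3 = col1 + col2

R_full, pivots = A.rref()                           # reduced row echelon form and pivot columns
r = len(pivots)

C = A.extract(list(range(A.rows)), list(pivots))    # basis columns of the column space (m x r)
R = R_full[:r, :]                                   # nonzero rows of rref (r x n), a basis for the row space

assert C * R == A                                   # A = CR
print(C, R, sep="\n")
```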

Orthogonality of null-spaces and column spaces

  • Remember that $Ax = 0$ means that the dot product between $x$ and every row of $A$ equals zero: the nullspace $N(A)$ is orthogonal to the row space $C(A^T)$, and likewise $N(A^T) \perp C(A)$.
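A small numerical check of this orthogonality (minimal sketch; the rank-deficient matrix is a made-up example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])         # rank 1, so the nullspace of A is 2-dimensional

# Nullspace basis from the SVD: right singular vectors with (numerically) zero singular value
_, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
N = Vt[rank:].T                          # columns span the nullspace of A

# Every row of A is orthogonal to every nullspace vector
print(np.allclose(A @ N, 0))             # True
```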

Motivation for least squares

Suppose $A$ is tall and thin ($m > n$). The $n$ columns are likely to be independent. But if $b$ is not in the column space, $Ax = b$ has no solution. The least squares method minimizes $\|Ax - b\|^2$ by solving $A^T A \hat{x} = A^T b$ (i.e. it projects $b$ onto the column space of $A$).
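A minimal numpy sketch of this (the tall matrix $A$ and the vector $b$ are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))         # tall and thin: m = 10 > n = 3
b = rng.standard_normal(10)              # generally not in the column space of A

# Normal equations: A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer from the library least-squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_ref))         # True
```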

Orthogonal vectors

  • $Q$ has orthogonal columns: $q_i^T q_j = 0$ for $i \neq j$ (dot product between columns).
  • If $Q$ is orthonormal, the columns are also unit vectors, so $Q^T Q = I$.
  • If $Q$ is square, then $Q Q^T = I$ also, and thus $Q^{-1} = Q^T$.
  • Orthogonal matrices are rotation (or reflection) transforms. Indeed, $\|Qx\|^2 = x^T Q^T Q x = \|x\|^2$, so lengths are preserved.
  • For the eigenvalues of $Q$: $\|Qx\| = \|x\|$ ⇒ $\|\lambda x\| = \|x\|$ ⇒ $|\lambda| = 1$.
  • For $A$ full-rank, we can orthogonalize its columns:
    • $A = QR$. Then the columns of $Q$ are orthonormal. $R$ is upper-triangular (by the Gram-Schmidt iterative construction).
    • Example for least squares:
      • $Ax = b$: $m$ equations, $n$ unknowns, minimize $\|Ax - b\|^2$.
      • Normal equations for the best $\hat{x}$: $A^T A \hat{x} = A^T b$, or $\hat{x} = (A^T A)^{-1} A^T b$.
      • If $A = QR$, then $A^T A = R^T R$ and $A^T b = R^T Q^T b$, which leads to $R \hat{x} = Q^T b$ ($R$ is much easier to invert since it is triangular). A sketch of this appears after the list.
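A minimal sketch of solving least squares through $QR$ (the data is again an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 10x3 with orthonormal columns, R is 3x3 upper-triangular

# R x_hat = Q^T b (back-substitution in practice; np.linalg.solve used here for brevity)
x_hat = np.linalg.solve(R, Q.T @ b)

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_ref))         # True
```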

Eigenvalues and Eigenvectors

  • An eigenvector $x$ with eigenvalue $\lambda$ of matrix $A$ (only for square matrices) satisfies $Ax = \lambda x$.
  • To find the eigenvalues, we need $A - \lambda I$ to have a nontrivial nullspace, i.e. some $x \neq 0$ s.t. $(A - \lambda I)x = 0$.
  • There exists such a nullspace iff $A - \lambda I$ is not invertible iff $\det(A - \lambda I) = 0$. This is the characteristic equation, and we solve it for $\lambda$.
  • Property: The eigenvalues of a triangular matrix are the entries on its main diagonal.
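A minimal numpy sketch of the triangular-matrix property and the characteristic equation (the matrix $T$ is an arbitrary example):

```python
import numpy as np

T = np.array([[3.0, 1.0, 5.0],
              [0.0, -2.0, 4.0],
              [0.0, 0.0, 7.0]])          # upper-triangular

# Eigenvalues of a triangular matrix are its diagonal entries
eigvals = np.linalg.eigvals(T)
print(np.allclose(np.sort(eigvals), np.sort(np.diag(T))))    # True

# Each eigenvalue is a root of the characteristic equation det(T - lambda I) = 0
lam = 7.0
print(np.isclose(np.linalg.det(T - lam * np.eye(3)), 0.0))   # True
```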

If not symmetric

  • $A^2$, $A^{-1}$ (when it exists), and $A + cI$ have the same eigenvectors as $A$, with eigenvalues $\lambda^2$, $1/\lambda$, and $\lambda + c$.

Spectral theorem

  • Let $S$ be a symmetric matrix.

  • Then $S$ has orthogonal eigenvectors: $x_i^T x_j = 0$ for $\lambda_i \neq \lambda_j$. (easy proof)

  • Let $q_1, \dots, q_n$ be the orthonormal eigenvectors of $S$; then $SQ = Q\Lambda$ and thus $S = Q \Lambda Q^T = \sum_i \lambda_i q_i q_i^T$ (spectral theorem). This is a sum of rank-one matrices formed by the eigenvectors, $q_i q_i^T$. A numerical check is sketched below.
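A minimal check of the rank-one expansion ($S$ below is an arbitrary symmetric example, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = (B + B.T) / 2                        # symmetric matrix

lam, Q = np.linalg.eigh(S)               # eigh returns orthonormal eigenvectors for symmetric matrices

# Eigenvectors are orthonormal: Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(4)))   # True

# Spectral theorem: S = Q Lambda Q^T = sum_i lambda_i q_i q_i^T (sum of rank-one matrices)
S_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
print(np.allclose(S, S_rebuilt))         # True
```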

Singular values

  • $A^T A$ is square, symmetric, nonnegative definite.
  • With $S = A^T A$ (thus symmetric), this will lead to the singular values of $A$. SVD: $A = U \Sigma V^T$ with $U^T U = I$ and $V^T V = I$.
  • We have $A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T$.
  • Indeed, the $v_i$ are eigenvectors of $A^T A$, and $A^T A$ is symmetric. $A^T A\, v_i = \sigma_i^2 v_i$, so the $\sigma_i^2$ are the eigenvalues of $A^T A$.
  • We then have $A v_i = \sigma_i u_i$ and thus $AV = U\Sigma$, i.e. $A = U \Sigma V^T$. A numerical check is sketched below.
  • SVD: $A = U \Sigma V^T = \sum_i \sigma_i u_i v_i^T$, a sum of rank-one matrices (analogous to the spectral theorem).
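A minimal numpy sketch tying these relations together ($A$ is an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)    # A = U Sigma V^T

# Singular values squared are the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)               # ascending order
print(np.allclose(np.sort(s**2), eigvals))          # True

# A v_i = sigma_i u_i, i.e. A V = U Sigma
print(np.allclose(A @ Vt.T, U * s))                 # True

# Reconstruction: A = U Sigma V^T
print(np.allclose(A, (U * s) @ Vt))                 # True
```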

Trace

  • $\operatorname{tr}(A)$ is the divergence of the vector field created by $x \mapsto Ax$?
  • divergence = rate of area change/area (of a local area around a point that evolves in a vector field).
  • Usually, divergence is a quantity dependent on the position of the point within the vector field.
  • But for the vector field generated by matrix $A$, $\operatorname{tr}(A)$ is a constant.

Determinant

  • Property 1. $\det(I) = 1$.

  • Property 2. Exchanging rows of $A$ reverses the sign of $\det A$. Thus, for permutation matrices, $\det P = \pm 1$.

  • Property 3a. Scaling one row scales the determinant: $\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix}$.

  • Property 3b. Adding within one row adds the determinants: $\begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}$.

  • $\det$ is a linear operator in each row, while keeping the other rows the same.

  • Property 4. If there are 2 equal rows ⇒ $\det A = 0$ (test for invertibility). Proof: exchange the two equal rows. The determinant must change sign but the matrix is the same ⇒ $\det A = -\det A$ ⇒ $\det A = 0$. A quick numerical check follows below.
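A quick numerical illustration of properties 2 and 4 (minimal sketch; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

# Property 2: exchanging two rows reverses the sign of the determinant
A_swapped = A[[1, 0, 2], :]
print(np.isclose(np.linalg.det(A_swapped), -np.linalg.det(A)))   # True

# Property 4: two equal rows => determinant is 0
A_equal = A.copy()
A_equal[1] = A_equal[0]
print(np.isclose(np.linalg.det(A_equal), 0.0))                   # True
```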

PCA

  • Given a data matrix $X$ with $n$ datapoints and $d$ features, we can project it into a smaller-dimensional space by optimally linearly combining the features, where optimal means least squares: the best low-rank approximation in the Frobenius norm.
  • $X_k = X V_k$, where $V_k$ (the top $k$ right singular vectors) comes from the SVD decomposition $X = U \Sigma V^T$,
  • or from the eigenvector decomposition of the covariance matrix (don't forget to de-mean the data matrix first), because the covariance matrix is symmetric and positive semi-definite. A minimal sketch follows below.
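A minimal PCA sketch along these lines, assuming randomly generated data for illustration; both routes (SVD of the de-meaned data, eigenvectors of the covariance matrix) give the same projection up to sign:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))        # 100 datapoints, 5 features
k = 2                                    # target dimension

Xc = X - X.mean(axis=0)                  # de-mean the data matrix

# SVD route: the top-k right singular vectors are the principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_proj = Xc @ Vt[:k].T                   # 100 x 2 projection

# Covariance route gives the same subspace (eigh returns ascending eigenvalues)
C = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(C)
top = eigvecs[:, ::-1][:, :k]            # top-k eigenvectors (largest eigenvalues first)

# The two projections agree up to the sign of each component
print(np.allclose(np.abs(X_proj), np.abs(Xc @ top)))   # True
```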

Big picture

  • elimination: $A = LU$
  • orthogonalization: $A = QR$
  • eigenvalues: $A = X \Lambda X^{-1}$ (or $S = Q \Lambda Q^T$ for symmetric $S$)
  • singular values: $A = U \Sigma V^T$