$A$ is $m \times n$, where $m$ is the number of rows and $n$ the number of columns.
Definitions
$Ax=a_{1}x_{1}+...+a_{n}x_{n}$ where $a_{i}$ is the ith column of the matrix.
$AB=[Ab_{1},...,Ab_{n}]$: $A$ acts on each column vector of $B$.
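A quick NumPy check of this column picture; the small matrices here are arbitrary examples:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
x = np.array([10., 20.])
B = np.array([[1., 0.], [2., 3.]])             # 2 x 2

# Ax is a combination of the columns of A: x1*a1 + x2*a2
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(A @ x, combo)

# AB is built column by column: (AB)_j = A @ b_j
AB = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
assert np.allclose(A @ B, AB)
```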
For an inverse to exist, $Ax=0$ needs to have the unique solution $x=0$.
Independent vectors $v_{1},...,v_{n}$: there do not exist $\lambda_{2},...,\lambda_{n}$ s.t. $v_{1}=\lambda_{2}v_{2}+...+\lambda_{n}v_{n}$ (and similarly, no $v_{i}$ is a combination of the others).
$n$ independent vectors form a basis of $\mathbb{R}^{n}$. Every vector in the space is a unique combination of those basis vectors.
Here are particular bases for $\mathbb{R}^{n}$ among all the choices we could make:
 Standard basis = columns of the identity matrix
 General basis = columns of any invertible matrix
 Orthonormal basis = columns of any orthogonal matrix
If $A$ is invertible, then $A^{T}A$ is invertible (follows from invertibility of $A$), symmetric (always), and positive definite (since $x^{T}A^{T}Ax=\|Ax\|^{2}>0$ for $x\neq 0$ when $A$ is invertible).
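A numerical sanity check of these three properties; the random matrix is nudged away from singularity by adding a multiple of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # almost surely invertible

S = A.T @ A
# symmetric (always)
assert np.allclose(S, S.T)
# positive definite when A is invertible: x^T A^T A x = ||Ax||^2 > 0 for x != 0
eigvals = np.linalg.eigvalsh(S)
assert np.all(eigvals > 0)
# invertible follows: this would raise LinAlgError for a singular matrix
S_inv = np.linalg.inv(S)
assert np.allclose(S @ S_inv, np.eye(4))
```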
Similar matrices
 Two similar matrices describe the same linear map in different bases, i.e. they are related by the isomorphism $S$, also called a change of basis.
 Two square matrices $A,B$ are called similar if there is an invertible matrix $S$ s.t. $A=S^{-1}BS$.
 Property: Similar matrices have the same characteristic polynomial (i.e. same eigenvalues).
 Proof: $P_{A}(\lambda)=\det(A-\lambda I)=\det(S^{-1}BS-\lambda I)=\det(S^{-1}(B-\lambda I)S)=\det(S^{-1})\det(B-\lambda I)\det(S)=\det(B-\lambda I)=P_{B}(\lambda)$, using the multiplicative property of the determinant and $\det(S^{-1})\det(S)=\det(I)=1$.
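The equal-eigenvalues property is easy to verify numerically; the random $B$ and invertible $S$ below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
S = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # invertible change of basis

A = np.linalg.inv(S) @ B @ S                     # A is similar to B
# same characteristic polynomial => same eigenvalues (up to ordering)
eigA = np.sort_complex(np.linalg.eigvals(A))
eigB = np.sort_complex(np.linalg.eigvals(B))
assert np.allclose(eigA, eigB)
```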

Column and row space
Definition: The column space contains all linear combinations of the columns.
Useful decomposition if $A$ is not full rank: one can always decompose a matrix $A$ of rank $r$ into a column matrix times a row matrix, $A=CR$, where $A$ is $m \times n$, $C$ is $m \times r$, $R$ is $r \times n$.
For a $3\times3$ matrix with rank $r=2$, one can decompose $A=CR$ where $C$ is $3\times2$ (columns of $C$ are a basis for the column space) and $R$ is $2\times3$ (rows of $R$ are a basis for the row space).
$R$ is the identity $I$ if $A$ has full column rank. Otherwise, it is a block matrix $[I_{r}\ D]$ (possibly after a column permutation), where the columns of $D$ describe how to get columns $r+1,\dots,n$ from the previous columns.
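A small sketch of the $A=CR$ factorization for a rank-2 example; the specific matrix is made up for illustration (its third column is the sum of the first two):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])           # rank 2: col3 = col1 + col2

C = A[:, :2]                           # basis for the column space (3 x 2)
R = np.array([[1., 0., 1.],            # R = [I | D]; D says col3 = col1 + col2
              [0., 1., 1.]])           # rows of R span the row space (2 x 3)

assert np.linalg.matrix_rank(A) == 2
assert np.allclose(A, C @ R)
```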
Orthogonality of nullspaces and column spaces
 Remember that $Ax=0$ means the dot product between $x$ and every row of $A$ is zero, so the nullspace is orthogonal to the row space (and, by the same argument applied to $A^{T}$, the left nullspace is orthogonal to the column space).
Motivation for least squares
Suppose $A$ is tall and thin ($m > n$). The $n$ columns are likely to be independent. But if $b$ is not in the column space, $Ax = b$ has no solution. The least squares method minimizes $\|b-Ax\|_{2}$ by solving $A^{T}A\hat{x}=A^{T}b$ (i.e. it projects $b$ onto the column space).
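A minimal sketch comparing the normal equations with NumPy's built-in least-squares solver; the tall-and-thin data is random, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))      # tall and thin, m > n
b = rng.standard_normal(6)           # almost surely not in the column space

# normal equations: A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_ls)

# the residual e = b - A x_hat is orthogonal to the column space
assert np.allclose(A.T @ (b - A @ x_hat), 0)
```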
Orthogonal vectors
 If the columns of $Q$ are orthogonal, $Q^{T}Q$ is diagonal (entry $(i,j)$ is the dot product between columns $i$ and $j$).
 If the columns are also unit vectors (orthonormal), $Q^{T}Q=I$.
 If $Q$ is square, then $QQ^{T}=I$ also, and thus $Q^{T}=Q^{-1}$.
 Such $Q$ are rotation (or reflection) transforms: they preserve length, $\|Qx\|_{2}=\|x\|_{2}$.
 For the eigenvalues of $Q$: $Qx=\lambda x \Rightarrow \|Qx\|_{2}=|\lambda|\,\|x\|_{2} \Rightarrow |\lambda|=1$ (real eigenvalues are $\pm1$).
 For $A$ full rank, we can orthogonalize the columns:
 $A=QR$, where the columns of $Q$ are orthonormal and $R$ is upper triangular (by the iterative Gram-Schmidt construction).
 Example for least squares:
 $m>n$: $m$ equations $Ax=b$, $n$ unknowns; minimize $\|b-Ax\|_{2}=\|e\|_{2}$.
 Normal equations for the best $\hat{x}$: $A^{T}e=0$, i.e. $A^{T}A\hat{x}=A^{T}b$, or $\hat{x}=(A^{T}A)^{-1}A^{T}b$.
 If $A=QR$, then $R^{T}Q^{T}QR\hat{x}=R^{T}Q^{T}b$, which simplifies to $R\hat{x}=Q^{T}b$ ($R$ is triangular, much easier to invert).
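The QR route to least squares can be checked numerically; the data is random, for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2))
b = rng.standard_normal(6)

Q, R = np.linalg.qr(A)               # reduced QR: Q is 6x2, R is 2x2
# R x_hat = Q^T b -- R is triangular, so back-substitution suffices
x_hat = np.linalg.solve(R, Q.T @ b)

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_ls)
```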
Eigenvalues and Eigenvectors
 An eigenvector $x$ with eigenvalue $\lambda$ of a square matrix $A$ satisfies $Ax=\lambda x$.
 To find the eigenvectors, we look for a nonzero $x$ in the nullspace of $(A-\lambda I)$, i.e. $x$ s.t. $(A-\lambda I)x=0$.
 A nonzero nullspace exists iff $(A-\lambda I)$ is not invertible iff $\det(A-\lambda I)=0$. This is the characteristic equation, which we solve for $\lambda$.
 Property: The eigenvalues of a triangular matrix are the entries on its main diagonal.
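A quick check of the triangular-matrix property on a made-up example:

```python
import numpy as np

T = np.array([[2., 5., 1.],
              [0., 3., 4.],
              [0., 0., 7.]])          # upper triangular, diagonal = 2, 3, 7
eig = np.sort(np.linalg.eigvals(T).real)
assert np.allclose(eig, [2., 3., 7.])
```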
If not symmetric (but diagonalizable)
 $A=X\Lambda X^{-1}$
 $A^{2},A^{3},\dots$ have the same eigenvectors as $A$: $A^{n}=X\Lambda^{n}X^{-1}$.
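A sketch of computing a matrix power through diagonalization; the $2\times2$ example matrix is arbitrary:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
lam, X = np.linalg.eig(A)            # A = X Lambda X^{-1}

# A^5 = X Lambda^5 X^{-1}: only the eigenvalues get raised to the power
A5 = X @ np.diag(lam**5) @ np.linalg.inv(X)
assert np.allclose(A5, np.linalg.matrix_power(A, 5))
```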
Spectral theorem

Let $S$ be a symmetric matrix

Then $S=S^{T}$ has orthogonal eigenvectors: $x^{T}y=0$ for eigenvectors $x,y$ with distinct eigenvalues. (easy proof)

Let $Q$ hold the eigenvectors of $S$ in its columns; then $SQ=Q\Lambda$ and thus $S=Q\Lambda Q^{-1}=Q\Lambda Q^{T}$. (spectral theorem). This is a sum of rank-one matrices: $S=Q\Lambda Q^{T}=\lambda_{1}q_{1}q_{1}^{T}+...+\lambda_{n}q_{n}q_{n}^{T}$
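The spectral theorem checked numerically with `np.linalg.eigh`; the symmetric matrix is random, for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
S = M + M.T                          # symmetric by construction

lam, Q = np.linalg.eigh(S)           # eigh returns orthonormal eigenvectors
assert np.allclose(Q.T @ Q, np.eye(3))          # Q is orthogonal
assert np.allclose(S, Q @ np.diag(lam) @ Q.T)   # S = Q Lambda Q^T

# sum of rank-one pieces lambda_i q_i q_i^T
S_sum = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(3))
assert np.allclose(S, S_sum)
```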
Singular values
 $A^{T}A$ is square, symmetric, and positive semidefinite.
 With $S=A^{T}A$ (thus symmetric), its eigendecomposition leads to the singular values of $A$. SVD: $A=U\Sigma V^{T}$ with $U^{T}U=I$ and $V^{T}V=I$.
 We have $A^{T}A=V\Sigma^{T}U^{T}U\Sigma V^{T}=V\Sigma^{2}V^{T}$.
 Indeed, the $v_{i}$ are eigenvectors of the symmetric matrix $A^{T}A$: $A^{T}Av_{i}=\lambda_{i}v_{i}=\sigma_{i}^{2}v_{i}$, with $V^{T}V=I$.
 We then have $AV=U\Sigma$ and thus $u_{i}=Av_{i}/\sigma_{i}$.
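These SVD relations can be verified directly; the matrix is random, for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(3):
    v = Vt[i]
    # v_i is an eigenvector of A^T A with eigenvalue sigma_i^2
    assert np.allclose(A.T @ A @ v, s[i]**2 * v)
    # u_i = A v_i / sigma_i, from AV = U Sigma
    assert np.allclose(U[:, i], A @ v / s[i])
```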
Trace
 $\mathrm{tr}(A)$ is the divergence of the vector field $v(x)=Ax$.
 divergence = rate of area change / area (of a small region around a point as it evolves in the vector field).
 Usually, divergence depends on the position of the point within the vector field.
 But for the linear field generated by matrix $A$, the divergence is the constant $\mathrm{tr}(A)$.
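A finite-difference sketch of this claim: for the linear field $v(x)=Ax$, the numerically estimated divergence equals $\mathrm{tr}(A)$ at every point. The matrix and sample points below are arbitrary:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])             # tr(A) = 5

def divergence(field, x, h=1e-6):
    # central finite differences: sum_i d(field_i)/dx_i at point x
    n = len(x)
    div = 0.0
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        div += (field(x + e)[i] - field(x - e)[i]) / (2 * h)
    return div

field = lambda x: A @ x
for point in [np.array([0., 0.]), np.array([3., -1.])]:
    # same value at every point for a linear field
    assert abs(divergence(field, point) - np.trace(A)) < 1e-6
```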
Determinant

Property 1. $det(I)=1$.

Property 2. Exchanging two rows of $A$ reverses the sign of $\det(A)$. Thus, for permutation matrices, $\det(P)=\pm 1$.

$\begin{vmatrix} a & b \\ c & d \end{vmatrix}=ad-bc$

Property 3a. For $t\in\mathbb{R}$, $\begin{vmatrix} ta & tb \\ c & d \end{vmatrix}=t\begin{vmatrix} a & b \\ c & d \end{vmatrix}$

Property 3b. For $a',b'\in\mathbb{R}$, $\begin{vmatrix} a+a' & b+b' \\ c & d \end{vmatrix}=\begin{vmatrix} a & b \\ c & d \end{vmatrix}+\begin{vmatrix} a' & b' \\ c & d \end{vmatrix}$

$\det$ is a linear operator in each row separately, with the other rows held fixed.

Property 4. If a matrix has two equal rows $\Rightarrow$ $\det=0$ (a test for singularity). Proof: exchange the two equal rows; the determinant must change sign but the matrix is the same $\Rightarrow$ $\det=0$.
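The properties above are easy to confirm numerically on a $2\times2$ example:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
assert np.isclose(np.linalg.det(A), 1*4 - 2*3)           # ad - bc

# Property 2: swapping rows flips the sign
assert np.isclose(np.linalg.det(A[::-1]), -np.linalg.det(A))

# Property 3a: scaling one row scales the determinant by the same factor
B = A.copy()
B[0] *= 5
assert np.isclose(np.linalg.det(B), 5 * np.linalg.det(A))

# Property 4: two equal rows => det = 0
assert np.isclose(np.linalg.det(np.array([[1., 2.], [1., 2.]])), 0)
```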
PCA
 Given a data matrix $X$ ($n\times p$) with $n$ datapoints and $p$ features, we can project $X$ into a lower-dimensional space by optimally combining the features linearly, in the least-squares sense (best low-rank approximation under the Frobenius norm).
 $\hat{X}=XV$ where $V$ comes from the SVD decomposition $X=U\Sigma V^{T}$
 or from the eigendecomposition of the covariance matrix (don't forget to demean the data matrix first): $X^{T}X=V\Lambda V^{T}$, which works because the covariance matrix is symmetric and positive semidefinite.
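A sketch showing that the two routes (SVD of the centered data vs. eigendecomposition of $X^{T}X$) give the same principal directions up to sign; the data is random with deliberately unequal feature scales:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 3)) * np.array([2., 1., 0.1])  # unequal scales
Xc = X - X.mean(axis=0)              # demean first

# route 1: SVD of the centered data; principal directions are rows of Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                   # X_hat = X V

# route 2: eigendecomposition of the (unnormalized) covariance matrix
lam, V = np.linalg.eigh(Xc.T @ Xc)   # eigh returns ascending eigenvalues
idx = np.argsort(lam)[::-1]          # reorder to match descending sigma

# directions agree up to sign: |v_i . v'_i| = 1
for i in range(3):
    assert np.isclose(abs(V[:, idx[i]] @ Vt[i]), 1.0)
```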
Big picture
 elimination $A=LU$
 orthogonalization $A=QR$
 eigenvalues $A=X\Lambda X^{-1}$
 singular values $A=U\Sigma V^{T}$