In this section we consider bases of \(\RR^n\) that are associated with eigenvectors of a square matrix \(A\text{,}\) known as eigenbases (see Definition 2.1.14). The benefit of using such a basis instead of the standard basis of \(\RR^n\) is that the eigenbasis makes products involving \(A\) much simpler through a process known as diagonalization.
Subsection Eigenbases
Recall that a basis of \(\RR^n\) is a linearly independent collection of \(n\) vectors in \(\RR^n\) (see also Definition 1.4.7). The defining characteristic of a basis is this: if \(\qty{\vb{b}_1,\ldots,\vb{b}_n}\) is a basis of \(\RR^n\) and if \(\vb{x}\in\RR^n\text{,}\) then there exists a unique set of scalars \(c_1,\ldots,c_n\) such that
\begin{equation*}
\vb{x} = c_1\vb{b}_1 + c_2\vb{b}_2 + \cdots + c_n\vb{b}_n\text{.}
\end{equation*}
This makes it possible to use the basis as a coordinate system in \(\RR^n\text{.}\) Therefore we may view an eigenbasis (Definition 2.1.14) of a matrix \(A\) as a particular coordinate system that is well-suited to calculations involving \(A\text{,}\) an idea which we make precise below.
Example 2.4.1. Using an Eigenbasis to Compute a Matrix Product.
Let
\begin{equation*}
A = \mqty[-2 & 3 \\ -4 & 5]\text{ and }\vb{b} = \mqty[-2 \\ 10]\text{.}
\end{equation*}
Given that
\begin{equation*}
\vv_1 = \mqty[1\\1]\text{ and }\vv_2 = \mqty[3\\4]
\end{equation*}
are eigenvectors of \(A\) with corresponding eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 2\text{,}\) find \(A^{100}\bb\text{.}\)
First, note that \(\vv_1\) and \(\vv_2\) are linearly independent (neither is a scalar multiple of the other), so \(\qty{\vv_1,\vv_2}\) is a basis of \(\RR^2\text{.}\) Since each vector is an eigenvector of \(A\text{,}\) it is therefore an eigenbasis.
Since \(\qty{\vv_1,\vv_2}\) is a basis of \(\RR^2\text{,}\) there exist scalars \(c_1,c_2\) such that \(\bb = c_1\vv_1 + c_2\vv_2\text{.}\) We can find these scalars by row reducing the augmented matrix \(\mqty[\vv_1 & \vv_2 & \bb]\text{.}\) This reduces to
\begin{equation*}
\mqty[1 & 0 & -38 \\ 0 & 1 & 12]\text{,}
\end{equation*}
so \(c_1 = -38\) and \(c_2 = 12\text{.}\) Therefore
\begin{equation*}
A^{100}\bb = c_1\lambda_1^{100}\vv_1 + c_2\lambda_2^{100}\vv_2 = -38\vv_1 + 12\cdot2^{100}\vv_2 = \mqty[-38 + 36\cdot2^{100} \\ -38 + 48\cdot2^{100}]\text{.}
\end{equation*}
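The computation above can also be checked numerically. The following Octave sketch (the variable names are ours) finds the coordinates of \(\bb\) in the eigenbasis and uses them to rebuild \(A^{100}\bb\text{:}\)

A = [-2 3; -4 5];  b = [-2; 10];
P = [1 3; 1 4];                % the eigenvectors v1, v2 as columns
lambda = [1; 2];               % the corresponding eigenvalues
c = P \ b                      % coordinates of b in the eigenbasis: c = [-38; 12]
x = P * (lambda.^100 .* c);    % c1*1^100*v1 + c2*2^100*v2
norm(x - A^100 * b) / norm(x)  % relative difference is tiny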
Example 2.4.1 shows that the existence of an eigenbasis can greatly simplify certain computations. Unfortunately, not every matrix has a corresponding eigenbasis (see Definition 2.1.12). However, the following theorem gives a simple condition that can be used to guarantee the existence of an eigenbasis.
Theorem 2.4.2. Distinct Eigenvalues and Eigenbases.
Let \(A\) be an \(n\times n\) matrix and suppose that \(A\) has \(n\) distinct eigenvalues (equivalently, no eigenvalue is repeated). Then \(A\) has an eigenbasis.
The proof of this statement follows from the fact that eigenvectors corresponding to distinct eigenvalues must be linearly independent, so we'll prove this first. Let \(\qty{\lambda_i}_{i=1}^n\) denote the eigenvalues of \(A\) and let \(\xx_i\) denote an eigenvector of \(A\) corresponding to \(\lambda_i\text{.}\) We'll show that \(\qty{\xx_i}_{i=1}^n\) is a linearly independent set. Since this is then a set of \(n\) linearly independent vectors in \(\RR^n\text{,}\) it must also be a basis.
Suppose that we have scalars \(\qty{c_i}\) such that \(\sum_{i=1}^n c_i\xx_i = \vb{0}\text{.}\) We need to show that \(c_1=\ldots=c_n=0\text{.}\) Now, since each \(\xx_i\) is an eigenvector of \(A\) with eigenvalue \(\lambda_i\text{,}\) applying \(A - \lambda_1 I\) to both sides of this equation gives
\begin{equation*}
\vb{0} = \sum_{i=1}^n c_i(A-\lambda_1 I)\xx_i = \sum_{i=2}^n (\lambda_i-\lambda_1)c_i\xx_i = \sum_{i=2}^n d_i\xx_i\text{,}
\end{equation*}
where \(d_i = (\lambda_i-\lambda_1)c_i\) and the \(i=1\) term vanishes because \((A-\lambda_1 I)\xx_1 = \vb{0}\text{.}\) Applying \(A-\lambda_2 I\) to this new equation in the same way gives
\begin{equation*}
\vb{0} = \sum_{i=3}^n e_i\xx_i\text{,}
\end{equation*}
where \(e_i = (\lambda_i-\lambda_2)d_i = (\lambda_i-\lambda_2)(\lambda_i-\lambda_1)c_i\text{.}\) Continuing this process, we are left with the equation
\begin{equation*}
(\lambda_n-\lambda_1)(\lambda_n-\lambda_2)\cdots(\lambda_n-\lambda_{n-1})c_n\xx_n = \vb{0}\text{.}
\end{equation*}
Since the eigenvalues are distinct and \(\xx_n\neq\vb{0}\text{,}\) this forces \(c_n = 0\text{.}\) The original equation therefore becomes
\begin{equation*}
\sum_{i=1}^{n-1} c_i\xx_i = \vb{0}
\end{equation*}
since we can safely disregard \(c_n\xx_n\text{.}\) But there's nothing stopping us from applying the previous trick to this new sum, which will eventually show that \(c_{n-1} = 0\text{,}\) and then \(c_{n-2} = 0\text{,}\) and so on. Therefore
\begin{equation*}
c_1 = c_2 = \cdots = c_n = 0
\end{equation*}
and the set \(\qty{\xx_i}_{i=1}^n\) must be linearly independent, which was what we needed to prove.
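As a quick numerical illustration of Theorem 2.4.2, the following Octave snippet uses a small upper triangular matrix (an arbitrary example, not one from the text) whose eigenvalues are visibly distinct, and checks that its eigenvectors are linearly independent:

A = [2 1 0; 0 3 1; 0 0 5];   % upper triangular, so its eigenvalues 2, 3, 5 are distinct
[V, D] = eig(A);             % the columns of V are eigenvectors of A
disp(diag(D)')               % the three distinct eigenvalues
disp(rank(V))                % rank 3: the eigenvectors form an eigenbasis of R^3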
Subsection Diagonalization
Now we'll take a closer look at just what we did in Example 2.4.1 to compute \(A^{100}\bb\) (or, more simply, \(A\bb\)). First, we found \(c_1,c_2\) such that \(\bb = c_1\vv_1 + c_2\vv_2\text{.}\) If we let \(P = \smqty[\vv_1&\vv_2]\) then this is equivalent to solving the matrix equation
\begin{equation*}
P\mqty[c_1 \\ c_2] = \bb\text{, or equivalently }\mqty[c_1 \\ c_2] = P^{-1}\bb\text{.}
\end{equation*}
We therefore view \(P^{-1}\bb\) as the coordinates of \(\bb\) with respect to the eigenbasis \(\qty{\vv_1,\vv_2}\text{.}\) Once we had these coordinates, finding \(A\bb\) amounted to multiplying \(c_1\) and \(c_2\) by \(\lambda_1\) and \(\lambda_2\) respectively. In matrix notation, this is equivalent to computing
\begin{equation*}
D\mqty[c_1 \\ c_2]\text{ where } D = \mqty[\lambda_1 & 0 \\ 0 & \lambda_2]\text{.}
\end{equation*}
Finally, we used the weights \(\lambda_1 c_1, \lambda_2 c_2\) to reconstruct \(A\bb\) from \(\vv_1,\vv_2\text{:}\)
\begin{equation*}
A\bb = \lambda_1 c_1\vv_1 + \lambda_2 c_2\vv_2 = P\qty(D\mqty[c_1 \\ c_2]) = PDP^{-1}\bb\text{.}
\end{equation*}
Since this equation is true for any \(\bb\in\RR^2\text{,}\) it follows that \(A = PDP^{-1}\text{.}\)
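We can confirm this factorization numerically in Octave using the \(P\) and \(D\) from Example 2.4.1:

P = [1 3; 1 4];  D = diag([1 2]);
P * D * inv(P)               % recovers A = [-2 3; -4 5]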
The process outlined above and in Example 2.4.1 is known as diagonalization. This is only possible when \(A\) has an eigenbasis to work with, but it can lead to vastly more efficient computations involving \(A\) by making use of the formula
\begin{equation*}
A^k = PD^kP^{-1}\text{,}
\end{equation*}
since raising diagonal matrices to a power is much simpler than raising general matrices to a power. Finding \(A^{100}\bb\) in Example 2.4.1 was equivalent to the following computation:
\begin{equation*}
A^{100}\bb = PD^{100}P^{-1}\bb = \mqty[1 & 3 \\ 1 & 4]\mqty[1^{100} & 0 \\ 0 & 2^{100}]\mqty[-38 \\ 12] = \mqty[-38 + 36\cdot2^{100} \\ -38 + 48\cdot2^{100}]\text{.}
\end{equation*}
Theorem 2.4.4 states that \(A\) is diagonalizable if and only if \(A\) has an eigenbasis. For the forward direction of the proof, suppose that \(A\) is diagonalizable, so that there exist an \(n\times n\) diagonal matrix \(D\) with diagonal entries \(\lambda_1,\ldots,\lambda_n\) and an \(n\times n\) matrix \(P = \smqty[\vv_1 & \cdots & \vv_n]\) such that \(P\) is invertible and \(A = PDP^{-1}\text{.}\)
We want to show that \(A\) must have an eigenbasis. We'll do this by showing that each column \(\vv_i\) of \(P\) must be an eigenvector of \(A\) with eigenvalue \(\lambda_i\text{.}\) Now, since \(\vv_i = 0\vv_1 + 0\vv_2 + \cdots + 1\vv_i + \cdots + 0\vv_n\) it follows that \(P^{-1}\vv_i\) must be the vector \(\vb{e}_i\) with a single \(1\) in the \(i^\th\) entry and \(0\)s elsewhere. Therefore
\begin{equation*}
A\vv_i = PDP^{-1}\vv_i = PD\vb{e}_i = \lambda_i P\vb{e}_i = \lambda_i\vv_i\text{,}
\end{equation*}
and so \(\vv_i\) must be an eigenvector of \(A\) with eigenvalue \(\lambda_i\) for each \(i\) from \(1\) to \(n\text{.}\) Since each column of \(P\) is an eigenvector of \(A\) and since \(P\) is invertible, it follows that the columns must be a basis and, hence, an eigenbasis for \(A\text{.}\)
Now we prove the reverse direction. So assume that \(A\) has an eigenbasis \(\qty{\vv_i}_{i=1}^n\) with corresponding eigenvalues \(\qty{\lambda_i}_{i=1}^n\text{.}\) We will show that \(A\) is diagonalized by
\begin{equation*}
P = \mqty[\vv_1 & \vv_2 & \cdots & \vv_n]\text{ and } D = \mqty[\lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n]\text{.}
\end{equation*}
Let \(\xx\in\RR^n\text{.}\) Then we can find the coordinates of \(\xx\) with respect to the eigenbasis \(\qty{\vv_i}\) by computing \(P^{-1}\xx\text{.}\) It follows that applying \(A\) to \(\xx\) is equivalent to applying \(PD\) to \(P^{-1}\xx\text{:}\) multiplying by \(D\) scales each coordinate of \(\xx\) with respect to the eigenbasis by the corresponding eigenvalue, and multiplying by \(P\) then reconstructs \(A\xx\) as a linear combination of the columns of \(P\) with weights \(DP^{-1}\xx\text{.}\) Therefore \(A\xx=PDP^{-1}\xx\) for every \(\xx\text{,}\) and so \(A = PDP^{-1}\text{.}\)
Example 2.4.5. Diagonalizing a Matrix.
Find matrices \(P\) and \(D\) (if possible) that diagonalize
By Theorem 2.4.4, \(A\) is diagonalizable if and only if \(A\) has an eigenbasis. Using Octave we quickly see that the eigenvalues of \(A\) are \(\lambda_1 = 0, \lambda_2 = \lambda_3 = 3\text{.}\) Now we need to row reduce the augmented matrices \(\smqty[A - 0I & \vb{0}]\) and \(\smqty[A - 3I & \vb{0}]\text{:}\)
Subsection Diagonalizations of Symmetric and Hermitian Matrices
Symmetric matrices have particularly nice diagonalizations. First, their eigenvalues must be limited to real numbers.
Theorem 2.4.6. Eigenvalues of Symmetric Matrices.
Let \(A\) be a symmetric matrix with real entries and let \(\lambda\) be an eigenvalue of \(A\text{.}\) Then \(\lambda\) must be a real number.
The proof of Theorem 2.4.6 is relatively simple but requires us to expand our terminology and notation a bit. First, we redefine the inner product so that it also applies to complex vectors in \(\CC^n\text{.}\)
Definition 2.4.7. Complex Inner Product.
Let \(\xx,\yy\in\CC^n\text{.}\) The (complex) inner product of \(\xx\) and \(\yy\) is the (complex) scalar
\begin{equation*}
\dotprod{\xx,\yy} = \yy^*\xx\text{,}
\end{equation*}
where \(\yy^*\) denotes the conjugate transpose of \(\yy\text{.}\)
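With this convention, the inner product is a one-line computation in Octave, since the operator ' takes the conjugate transpose (the vectors below are arbitrary examples):

x = [1 + 2i; 3];
y = [2; 1 - 1i];
ip = y' * x                  % y' is the conjugate transpose, so this is y^* x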
Now we expand our definition of symmetric matrices to include the complex case as well.
Definition 2.4.8. Hermitian Matrices.
Let \(A\) denote a square matrix. Then \(A\) is Hermitian if \(A = A^{*}\text{.}\)
Definition 2.4.8 generalizes the definition of a real symmetric matrix since \(A^* = A^T\) if \(A\) only has real entries. The conjugate transpose and inner product also share many properties, including the following.
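Here is a small illustrative Hermitian matrix in Octave (not one of the text's examples); note that its eigenvalues come out real:

A = [2, 1-2i; 1+2i, 3];
isequal(A, A')               % true: A equals its conjugate transpose, so A is Hermitian
eig(A)                       % both eigenvalues are real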
Proposition 2.4.9. Conjugate Transpose and Inner Product.
Let \(\xx,\yy\in\CC^n\) and let \(A\) be a square matrix with complex entries. Then
\begin{equation*}
\dotprod{A\xx,\yy} = \dotprod{\xx,A^*\yy}\text{.}
\end{equation*}
To prove Theorem 2.4.6, suppose that \(A\xx = \lambda\xx\) with \(\xx\neq\vb{0}\text{.}\) Then
\begin{equation*}
\lambda\dotprod{\xx,\xx} = \dotprod{A\xx,\xx} = \dotprod{\xx,A^*\xx} = \dotprod{\xx,A\xx} = \overline{\lambda}\dotprod{\xx,\xx}\text{.}
\end{equation*}
Since \(\dotprod{\xx,\xx} = \norm{\xx}^2\neq0\text{,}\) it follows that \(\lambda = \overline{\lambda}\text{.}\) Therefore \(\lambda\in\RR\text{.}\)
The eigenvectors of a Hermitian matrix also have nice geometric properties.
Theorem 2.4.11. Eigenvectors of a Hermitian Matrix.
Let \(A\) be a Hermitian matrix. Suppose that \(\xx\) and \(\yy\) are eigenvectors for two distinct eigenvalues of \(A\text{,}\) say \(\lambda_i\) and \(\lambda_j\text{.}\) Then \(\xx\) and \(\yy\) are orthogonal.
Using Proposition 2.4.9 and the fact that \(\lambda_j\) must be real, we have
\begin{equation*}
\lambda_i\dotprod{\xx,\yy} = \dotprod{A\xx,\yy} = \dotprod{\xx,A\yy} = \lambda_j\dotprod{\xx,\yy}\text{,}
\end{equation*}
or just \((\lambda_i-\lambda_j)\dotprod{\xx,\yy} = 0\text{.}\) Since \(\lambda_i\neq \lambda_j\text{,}\) it follows that \(\dotprod{\xx,\yy} = 0\text{.}\)
Theorem 2.4.11 leads to an extremely useful fact about real symmetric matrices: they can always be orthogonally diagonalized. This means that we can choose an eigenbasis that is also an orthonormal basis. As an example, consider the eigenbasis found in Example 2.4.5, replicated here as columns of the matrix \(P\text{:}\)
Note that the first column is orthogonal to the other two, which is guaranteed by Theorem 2.4.11 since these vectors correspond to different eigenvalues.
Although the last two columns are not orthogonal, they can be orthogonalized. One way of doing so is to replace them with the vectors
Both vectors are still eigenvectors of \(A\) corresponding to the repeated eigenvalue, and the two of them, together with \(\smqty[1\\1\\1]\text{,}\) still form an eigenbasis of \(\RR^3\text{.}\) However, this new basis is orthogonal.
If we go one step further and normalize the vectors in this eigenbasis, we then get an orthonormal eigenbasis. Collecting these unit eigenvectors as the columns of a matrix \(U\text{,}\) we obtain an orthogonal matrix (so \(U^{-1} = U^T\)) with
\begin{equation*}
A = UDU^{-1} = UDU^T\text{.}
\end{equation*}
Such a diagonalization is always possible for real symmetric and Hermitian matrices, a result known as the Spectral Theorem.
Theorem 2.4.12. Spectral Theorem.
Let \(A\) be a (real) symmetric matrix. Then there exists an orthogonal matrix \(U\) and a diagonal matrix \(D\) such that
\begin{equation*}
A = UDU^T\text{.}
\end{equation*}
Equivalently, if \(\qty{\uu_i}_{i=1}^n\) is an orthonormal eigenbasis of \(A\) with corresponding eigenvalues \(\qty{\lambda_i}_{i=1}^n\text{,}\) then
\begin{equation*}
A = \sum_{i=1}^n \lambda_i \uu_i\uu_i^T\text{.}
\end{equation*}
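The Spectral Theorem is easy to check numerically for a small symmetric matrix (the matrix below is an arbitrary example):

A = [2 1 0; 1 2 1; 0 1 2];       % a symmetric matrix
[U, D] = eig(A);                 % for symmetric A, Octave returns orthonormal eigenvectors
norm(U' * U - eye(3))            % near zero: the columns of U are orthonormal
norm(A - U * D * U')             % near zero: A = U*D*U'
S = zeros(3);                    % rebuild A from the rank-one pieces lambda_i*u_i*u_i'
for i = 1:3
  S = S + D(i,i) * U(:,i) * U(:,i)';
end
norm(A - S)                      % near zero as well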
Example 2.4.13. A Signal Processing Application.
In the mathematics of signal processing, signals are often represented as particular vectors in \(\RR^n\text{.}\) The problem of signal transmission then reduces to sending a list of numbers \(c_1,\ldots, c_m\) such that the receiver can use these to reconstruct a signal \(\xx\text{.}\) Such a scheme can be implemented by choosing an appropriate \(n\times m\) matrix \(F = \smqty[\vb{f}_1 & \ldots & \vb{f}_m]\) and then computing and transmitting the coordinates of
The important properties of the columns of \(F\) can then be encoded in the Gram matrix \(G = F^TF = \mqty[\dotprod{\vb{f}_i,\vb{f}_j}]\text{.}\) Find a \(3\times 4\) matrix \(F\) for which the Gram matrix is
This problem can be solved by diagonalizing \(G\text{:}\) if \(G = UDU^T\) then we can define \(F\) by using the rows of \(U\sqrt{D}\text{.}\) So we'll diagonalize \(G\) with the help of the Octave cell below. Doing so, we see that the eigenvalues of \(G\) are in fact nonnegative and so \(U\sqrt{D}\) will only contain real values. Furthermore, \(U\sqrt{D}\) only has \(3\) nonzero columns since \(G\) only has three nonzero eigenvalues. By removing the zero column from \(U\sqrt{D}\text{,}\) we obtain a \(4\times3\) matrix which we define to be \(F^T\text{.}\) Then \(F^TF = G\text{.}\)
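The construction can be tested in Octave on a placeholder Gram matrix, built here from a hypothetical \(3\times4\) frame matrix F0 rather than the specific \(G\) of this example:

F0 = [1 0 1 -1; 0 1 1 1; 1 1 0 2];     % hypothetical 3x4 frame matrix
G  = F0' * F0;                          % its 4x4 Gram matrix (rank 3)
[U, D] = eig(G);                        % orthogonal diagonalization G = U*D*U'
d  = diag(D);
keep = d > 1e-10;                       % keep the three nonzero eigenvalues
Ft = U(:, keep) * diag(sqrt(d(keep)));  % 4x3 matrix, taken to be F'
F  = Ft';                               % the desired 3x4 matrix
norm(F' * F - G)                        % near zero: F'*F = G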
Symmetric matrices are also useful in analyzing quadratic forms, which are expressions of the form
\begin{equation*}
\sum_{i=1}^n\sum_{j=i}^n c_{ij}x_ix_j\text{,}
\end{equation*}
where the \(x_i\) are variables and the \(c_{ij}\) are the coefficients. Such an expression can be rewritten as \(\xx^T A\xx\) where \(A\) is a symmetric matrix determined from the coefficients.
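For example, the quadratic form \(x_1^2 + 4x_1x_2 + 3x_2^2\) (a small illustrative case) can be written as
\begin{equation*}
x_1^2 + 4x_1x_2 + 3x_2^2 = \mqty[x_1 & x_2]\mqty[1 & 2 \\ 2 & 3]\mqty[x_1 \\ x_2] = \xx^TA\xx\text{,}
\end{equation*}
where the coefficient of the cross term \(x_1x_2\) is split evenly between the two off-diagonal entries of \(A\text{.}\)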
If \(A\) denotes a square matrix, and if \(f(x)\) is analytic at \(0\text{,}\) then we can try to make sense of the expression \(f(A)\) by using the power series for \(f(x)\text{:}\)
\begin{equation*}
f(A) = \sum_{k=0}^\infty \frac{f^{(k)}(0)}{k!}A^k\text{.}
\end{equation*}
In particular, taking \(f(x) = e^x\) gives the matrix exponential \(e^A = \sum_{k=0}^\infty \frac{1}{k!}A^k\text{.}\) This can be verified using the command expm in Octave as below.
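A minimal sketch of such a check, comparing expm against a truncated power series for an arbitrary small matrix:

A = [0 1; -2 3];           % an arbitrary 2x2 matrix
E = expm(A);               % built-in matrix exponential
S = zeros(2);  T = eye(2); % running sum and current term A^k/k!
for k = 1:25
  S = S + T;               % add the term A^(k-1)/(k-1)!
  T = T * A / k;           % next term
end
norm(E - S)                % near zero: the truncated series matches expm(A)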
Example 2.4.16. Function of a Diagonal Matrix.
Let \(f(x)\) be a function analytic at \(0\text{.}\) Let \(D\) be a diagonal matrix whose diagonal entries are within the interval of convergence of the power series for \(f(x)\) centered at \(0\text{.}\) Find \(f(D)\text{.}\)
If the diagonal entries of \(D\) are \(d_1,\ldots,d_n\text{,}\) then each power \(D^k\) is the diagonal matrix with diagonal entries \(d_1^k,\ldots,d_n^k\text{.}\) Applying the power series for \(f(x)\) entry by entry therefore shows that \(f(D)\) is the diagonal matrix with diagonal entries \(f(d_1),\ldots,f(d_n)\text{.}\)
The last example shows that if \(f(x)\) is analytic at \(0\) then it is relatively straightforward to find \(f(D)\text{,}\) assuming that the diagonal entries of \(D\) are within the interval of convergence for the series representation of \(f(x)\) at \(0\text{.}\) Therefore we can find functions of diagonalizable matrices as well.
Theorem 2.4.17. Analytic Functions of Diagonalizable Matrices.
Let \(f(x)\) be a function that is analytic at \(0\) and whose series representation at \(0\) has interval of convergence \(I\text{.}\) Let \(A\) be a diagonalizable matrix diagonalized by \(P\) and \(D\) with eigenvalues contained in \(I\text{.}\) Then
\begin{equation*}
f(A) = Pf(D)P^{-1}\text{.}
\end{equation*}
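For instance, taking \(f(x) = e^x\) and the diagonalizable matrix from Example 2.4.1, a quick Octave check of Theorem 2.4.17 looks like this:

A = [-2 3; -4 5];
P = [1 3; 1 4];  D = diag([1 2]);      % A = P*D*inv(P)
fA = P * diag(exp(diag(D))) * inv(P);  % f(A) = P*f(D)*P^{-1} with f = exp
norm(fA - expm(A))                     % near zero: agrees with expm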
Theorem 2.4.17 allows for straightforward computations of matrix exponentials of diagonalizable matrices. This is useful when solving linear systems of ODEs.
Theorem 2.4.18. Exponential Solutions of Linear Systems of ODEs.
Let \(A\) be a (constant) square matrix and consider the first-order system \(\xx^\prime = A\xx\) with initial condition \(\xx(0) = \xx_0\text{.}\) Then the solution of this initial value problem is
\begin{equation*}
\xx(t) = e^{At}\xx_0\text{.}
\end{equation*}
If we let \(\xx = \smqty[x_1 \\ x_2]\) and \(A = \smqty[3 & -4 \\ -4 & 3]\text{,}\) then the system can be written as the matrix ODE \(\xx^\prime = A\xx\) with initial condition \(\xx_0 = \smqty[2 \\ 1]\text{.}\) The matrix \(A\) is symmetric and can be orthogonally diagonalized by
\begin{equation*}
U = \mqty[\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}]\text{ and } D = \mqty[-1 & 0 \\ 0 & 7]
\end{equation*}
as seen in the code cell below this example. Therefore \(A = UDU^T\text{,}\) so \(e^{At} = Ue^{Dt}U^T\) and the solution of the system must be
\begin{equation*}
\xx(t) = e^{At}\xx_0 = Ue^{Dt}U^T\xx_0 = \mqty[\frac{3}{2}e^{-t} + \frac{1}{2}e^{7t} \\ \frac{3}{2}e^{-t} - \frac{1}{2}e^{7t}]\text{.}
\end{equation*}
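With \(A = UDU^T = \smqty[3 & -4 \\ -4 & 3]\) as above, this formula can be checked against expm at a particular time, say \(t = 1\text{:}\)

A = [3 -4; -4 3];  x0 = [2; 1];  t = 1;
x_formula = [1.5*exp(-t) + 0.5*exp(7*t); 1.5*exp(-t) - 0.5*exp(7*t)];
norm(expm(A*t) * x0 - x_formula)   % near zero (up to rounding)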