Section 2.4 Diagonalization of Matrices

In this section we consider bases of \(\RR^n\) that consist of eigenvectors of a square matrix \(A\text{,}\) known as eigenbases (see Definition 2.1.14). The benefit of using such a basis instead of the standard basis of \(\RR^n\) is that an eigenbasis makes computations involving \(A\) much simpler through a process known as diagonalization.

Subsection Eigenbases

Recall that a basis of \(\RR^n\) is a linearly independent collection of \(n\) vectors in \(\RR^n\) (see also Definition 1.4.7). The defining characteristic of a basis is this: if \(\qty{\vb{b}_1,\ldots,\vb{b}_n}\) is a basis of \(\RR^n\) and if \(\vb{x}\in\RR^n\text{,}\) then there exists a unique set of scalars \(c_1,\ldots,c_n\) such that
\begin{equation*} \xx = \sum_{i=1}^{n}c_i\vb{b}_i\text{.} \end{equation*}
This makes it possible to use the basis as a coordinate system in \(\RR^n\text{.}\) Therefore we may view an eigenbasis (Definition 2.1.14) of a matrix \(A\) as a particular coordinate system that is well-suited to calculations involving \(A\text{,}\) an idea which we make precise below.

Example 2.4.1. Using an Eigenbasis to Compute a Matrix Product.

Let
\begin{equation*} A = \mqty[-2 & 3 \\ -4 & 5]\text{ and }\vb{b} = \mqty[-2 \\ 10]\text{.} \end{equation*}
Given that
\begin{equation*} \vv_1 = \mqty[1\\1]\text{ and }\vv_2 = \mqty[3\\4] \end{equation*}
are eigenvectors of \(A\) with corresponding eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 2\text{,}\) find \(A^{100}\bb\text{.}\)
Solution.
First, note that \(\qty{\vv_1,\vv_2}\) is a basis of \(\RR^2\) since the two vectors are linearly independent. Because each vector is also an eigenvector of \(A\text{,}\) this basis is an eigenbasis.
Since \(\qty{\vv_1,\vv_2}\) is a basis of \(\RR^2\text{,}\) there exist scalars \(c_1,c_2\) such that \(\bb = c_1\vv_1 + c_2\vv_2\text{.}\) We can find these scalars by row reducing the augmented matrix \(\mqty[\vv_1 & \vv_2 & \bb]\text{.}\) This reduces to
\begin{equation*} \mqty[1 & 0 & -38 \\ 0 & 1 & 12]\text{,} \end{equation*}
and so \(c_1 = -38, c_2 = 12\) and
\begin{equation*} \bb = -38\vv_1 + 12\vv_2\text{.} \end{equation*}
Now that we've written \(\bb\) in terms of the eigenbasis \(\qty{\vv_1,\vv_2}\text{,}\) the computation of \(A^{100}\bb\) becomes almost trivial:
\begin{align*} A^{100}\bb & = A^{100}(-38\vv_1 + 12\vv_2) \\ & = -38 A^{100}\vv_1 + 12 A^{100}\vv_2 \\ & = -38 \vv_1 + 12\cdot 2^{100}\vv_2 \\ & = \mqty[-38 + 36(2^{100}) \\ -38 + 48(2^{100})] \text{.} \end{align*}
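As a quick numerical sanity check (not part of the original example), the same computation can be carried out in Octave; note that entries on the order of \(2^{100}\) exceed the range of exact double-precision integers, so the comparison is only approximate.

% Sketch: verify the eigenbasis computation of A^100*b numerically.
A  = [-2 3; -4 5];   b = [-2; 10];
v1 = [1; 1];         v2 = [3; 4];
c  = [v1 v2] \ b                      % coordinates of b: c = [-38; 12]
c(1)*1^100*v1 + c(2)*2^100*v2         % eigenbasis computation of A^100*b
A^100 * b                             % direct computation, for comparison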
Example 2.4.1 shows that the existence of an eigenbasis can greatly simplify certain computations. Unfortunately, not every matrix has an eigenbasis (see Definition 2.1.12). However, the following theorem gives a simple condition that guarantees the existence of an eigenbasis: if an \(n\times n\) matrix \(A\) has \(n\) distinct eigenvalues, then \(A\) has an eigenbasis.
The proof of this statement follows from the fact that eigenvectors corresponding to distinct eigenvalues must be linearly independent, so we'll prove that fact first. Let \(\qty{\lambda_i}_{i=1}^n\) denote the \(n\) distinct eigenvalues of \(A\) and let \(\xx_i\) denote an eigenvector of \(A\) corresponding to \(\lambda_i\text{.}\) We'll show that \(\qty{\xx_i}_{i=1}^n\) is a linearly independent set. Since this is then a set of \(n\) linearly independent vectors in \(\RR^n\text{,}\) it must also be a basis, and hence an eigenbasis.
Suppose that we have scalars \(\qty{c_i}\) such that \(\sum_{i=1}^n c_i\xx_i = \vb{0}\text{.}\) We need to show that \(c_1=\ldots=c_n=0\text{.}\) Now, since each \(\xx_i\) is an eigenvector of \(A\) with eigenvalue \(\lambda_i\text{,}\) it follows that
\begin{equation*} A\qty(\sum_{i=1}^n c_i\xx_i) = \sum_{i=1}^n c_i A\xx_i = \sum_{i=1}^n c_i\lambda_i\xx_i\text{.} \end{equation*}
Since \(A\vb{0} = \vb{0}\) as well, we have
\begin{equation*} \sum_{i=1}^n c_i\lambda_i\xx_i = \vb{0}\text{.} \end{equation*}
We can also multiply the original equation \(\sum_{i=1}^{n}c_i\xx_i = \vb{0}\) by \(\lambda_1\) to get
\begin{equation*} \sum_{i=1}^{n}\lambda_1 c_i\xx_i = \vb{0}\text{.} \end{equation*}
Subtracting the previous two equations allows us to write
\begin{equation*} \vb{0} = \sum_{i=1}^n c_i (\lambda_i - \lambda_1)\xx_i = \sum_{i=2}^n c_i(\lambda_i-\lambda_1)\xx_i = \sum_{i=2}^{n}d_i \xx_i \end{equation*}
where \(d_i = c_i(\lambda_i-\lambda_1)\text{.}\) Now we can repeat the above process and write
\begin{equation*} \sum_{i=2}^n d_i\lambda_i\xx_i = \vb{0} = \sum_{i=2}^n d_i\lambda_2\xx_i \end{equation*}
which gives (after subtracting)
\begin{equation*} \vb{0} = \sum_{i=2}^n d_i (\lambda_i-\lambda_2)\xx_i = \sum_{i=3}^n d_i (\lambda_i-\lambda_2)\xx_i = \sum_{i=3}^n e_i\xx_i \end{equation*}
where \(e_i = (\lambda_i-\lambda_2)d_i = (\lambda_i-\lambda_2)(\lambda_i-\lambda_1)c_i\text{.}\) Continuing this process, we are left with the equation
\begin{equation*} \vb{0} = (\lambda_n-\lambda_{n-1})(\lambda_n-\lambda_{n-2})\cdots(\lambda_n-\lambda_1)c_n\xx_n\text{,} \end{equation*}
which forces \(c_n = 0\) since the eigenvalues are distinct (so each factor \(\lambda_n - \lambda_i\) is nonzero) and \(\xx_n\neq\vb{0}\text{.}\)
Now we are left with
\begin{equation*} \sum_{i=1}^{n-1}c_{i}\xx_i = \vb{0} \end{equation*}
since we can safely disregard \(c_n\xx_n\text{.}\) But there's nothing stopping us from applying the previous trick to this new sum, which will eventually show that \(c_{n-1} = 0\text{,}\) and then \(c_{n-2} = 0\text{,}\) and so on. Therefore
\begin{equation*} c_1=\ldots = c_n = 0 \end{equation*}
and the set \(\qty{\xx_i}_{i=1}^n\) must be linearly independent, which was what we needed to prove.
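The claim can also be illustrated numerically in Octave (an informal check, not a substitute for the proof): for a matrix with distinct eigenvalues, the matrix whose columns are its eigenvectors has full rank.

% Sketch: eigenvectors for distinct eigenvalues are linearly independent.
A = [-2 3; -4 5];        % eigenvalues 1 and 2 (distinct), from Example 2.4.1
[V, L] = eig(A);         % columns of V are eigenvectors of A
rank(V)                  % returns 2, so the eigenvectors form a basis of R^2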

Subsection Diagonalization

Now we'll take a closer look at just what we did in Example 2.4.1 to compute \(A^{100}\bb\) (or, more simply, \(A\bb\)). First, we found \(c_1,c_2\) such that \(\bb = c_1\vv_1 + c_2\vv_2\text{.}\) If we let \(P = \smqty[\vv_1&\vv_2]\) then this is equivalent to solving the matrix equation
\begin{equation*} P\mqty[c_1 \\ c_2] = \bb\implies \mqty[c_1 \\ c_2] = P^{-1}\bb\text{.} \end{equation*}
We therefore view \(P^{-1}\bb\) as the coordinates of \(\bb\) with respect to the eigenbasis \(\qty{\vv_1,\vv_2}\text{.}\) Once we had the coordinates of \(\bb\) with respect to the eigenbasis, finding \(A\bb\) amounted to multiplying \(c_1\) and \(c_2\) by \(\lambda_1\) and \(\lambda_2\) respectively. In matrix notation, this is equivalent to computing
\begin{equation*} D\mqty[c_1 \\ c_2]\text{ where } D = \mqty[\lambda_1 & 0 \\ 0 & \lambda_2]\text{.} \end{equation*}
Finally, we used the weights \(\lambda_1 c_1, \lambda_2 c_2\) to reconstruct \(A\bb\) from \(\vv_1,\vv_2\text{:}\)
\begin{equation*} A\bb = \lambda_1 c_1\vv_1 + \lambda_2 c_2\vv_2\text{.} \end{equation*}
Therefore
\begin{align*} A\bb & = \lambda_1 c_1\vv_1 + \lambda_2 c_2\vv_2\\ & = P\mqty[\lambda_1c_1 \\ \lambda_2c_2] \\ & = PD\mqty[c_1 \\ c_2] \\ & = PDP^{-1}\bb \text{.} \end{align*}
Since this equation is true for any \(\bb\in\RR^2\text{,}\) it follows that \(A = PDP^{-1}\text{.}\)
The process outlined above and in Example 2.4.1 is known as diagonalization. It is only possible when \(A\) has an eigenbasis to work with, but it can lead to vastly more efficient computations involving \(A\) by making use of the formula
\begin{equation*} A^n = PD^nP^{-1} \end{equation*}
since raising diagonal matrices to a power is much simpler than raising general matrices to a power. Finding \(A^{100}\bb\) in Example 2.4.1 was equivalent to the following computations:
\begin{align*} A^{100}\bb & = PD^{100}P^{-1}\bb \\ & = \mqty[1 & 3 \\ 1 & 4]\mqty[1^{100} & 0 \\ 0 & 2^{100}]\mqty[4 & -3 \\ -1 & 1]\mqty[-2 \\ 10] \\ & = \mqty[1 & 3 \\ 1 & 4]\mqty[1 & 0 \\ 0 & 2^{100}]\mqty[-38 \\ 12] \\ & = \mqty[1 & 3 \\ 1 & 4]\mqty[-38 \\ 12(2^{100})] \\ & = \mqty[-38 + 36(2^{100}) \\ -38 + 48(2^{100})] \text{.} \end{align*}
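The same chain of matrix products can be reproduced in Octave (an illustrative sketch, separate from the text's own code cells):

% Sketch: diagonalization of A from Example 2.4.1 and the product A^100*b.
A = [-2 3; -4 5];    b = [-2; 10];
P = [1 3; 1 4];      D = diag([1 2]);
norm(A - P*D*inv(P))           % should be 0, confirming A = P*D*P^(-1)
P * D^100 * (P \ b)            % matches A^100*b up to rounding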

Definition 2.4.3. Diagonalization.

A matrix \(A\) is diagonalizable if there exists a matrix \(P\) and a diagonal matrix \(D\) such that
\begin{equation*} A = PDP^{-1}\text{.} \end{equation*}
As mentioned above, a matrix \(A\) is diagonalizable if and only if \(A\) has an eigenbasis.
First, assume that \(A\) is diagonalizable. Then there exist matrices \(P\) and \(D\text{,}\) say
\begin{equation*} P = \mqty[\vv_1&\ldots&\vv_n]\text{ and }D = \mqty[\lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \lambda_n] \end{equation*}
such that \(P\) is invertible and \(A = PDP^{-1}\text{.}\)
We want to show that \(A\) must have an eigenbasis. We'll do this by showing that each column \(\vv_i\) of \(P\) must be an eigenvector of \(A\) with eigenvalue \(\lambda_i\text{.}\) Now, since \(\vv_i = 0\vv_1 + 0\vv_2 + \cdots + 1\vv_i + \cdots + 0\vv_n\) it follows that \(P^{-1}\vv_i\) must be the vector with a single \(1\) in the \(i^\th\) entry and \(0\)s elsewhere. Therefore
\begin{align*} A\vv_i & = PDP^{-1}\vv_i \\ & = PD\mqty[0\\0\\\vdots\\1\\\vdots\\0] \\ & = P\mqty[0\\0\\\vdots\\\lambda_i\\\vdots\\0] \\ & = \lambda_i\vv_i \end{align*}
and so \(\vv_i\) must be an eigenvector of \(A\) with eigenvalue \(\lambda_i\) for each \(i\) from \(1\) to \(n\text{.}\) Since each column of \(P\) is an eigenvector of \(A\) and since \(P\) is invertible, it follows that the columns must be a basis and, hence, an eigenbasis for \(A\text{.}\)
Now we prove the reverse direction. So assume that \(A\) has an eigenbasis \(\qty{\vv_i}_{i=1}^n\) with corresponding eigenvalues \(\qty{\lambda_i}_{i=1}^n\text{.}\) We will show that \(A\) is diagonalized by
\begin{equation*} P = \mqty[\vv_1&\ldots&\vv_n]\text{ and }D = \mqty[\lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \lambda_n]\text{.} \end{equation*}
Let \(\xx\in\RR^n\text{.}\) Then we can find the coordinates of \(\xx\) with respect to the eigenbasis \(\qty{\vv_i}\) by computing \(P^{-1}\xx\text{.}\) It follows that applying \(A\) to \(\xx\) is equivalent to applying \(PD\) to \(P^{-1}\xx\text{:}\) \(DP^{-1}\xx\) will multiply the coordinates of \(\xx\) with respect to the eigenbasis by the corresponding eigenvalues, and \(PDP^{-1}\xx\) reconstructs \(A\xx\) using the weights \(DP^{-1}\xx\) to form a linear combination of the columns of \(P\text{.}\) Therefore \(A\xx=PDP^{-1}\xx\) and so \(A = PDP^{-1}\text{.}\)

Example 2.4.5. Diagonalizing a Matrix.

Find matrices \(P\) and \(D\) (if possible) that diagonalize
\begin{equation*} A = \mqty[2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2]\text{.} \end{equation*}
Solution.
By Theorem 2.4.4, \(A\) is diagonalizable if and only if \(A\) has an eigenbasis. Using Octave we quickly see that the eigenvalues of \(A\) are \(\lambda_1 = 0, \lambda_2 = \lambda_3 = 3\text{.}\) Now we need to row reduce \(\smqty[A - 0I & \vb{0}]\) and \(\smqty[A - 3I & \vb{0}]\text{:}\)
\begin{equation*} \mqty[A&\vb{0}] \sim \mqty[1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0]\text{ and }\mqty[A-3I & \vb{0}] \sim \mqty[1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0]\text{.} \end{equation*}
Therefore
\begin{equation*} \nul(A) = \spn{\qty{\mqty[1\\1\\1]}}\text{ and }\nul(A-3I) = \spn{\left\{\mqty[-1\\1\\0],\mqty[-1\\0\\1]\right\}}\text{.} \end{equation*}
Now we have everything we need to diagonalize \(A\text{.}\) Define
\begin{equation*} P = \mqty[1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1]\text{ and }D = \mqty[0 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3]\text{.} \end{equation*}
Then \(A = PDP^{-1}\text{.}\)
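In Octave the diagonalization can also be obtained directly with eig; a short sketch (the eigenvectors returned by eig are unit vectors, so they differ from the columns of \(P\) above by scaling, but \(PDP^{-1}\) still reproduces \(A\)):

% Sketch: diagonalize A from Example 2.4.5 and check the factorization.
A = [2 -1 -1; -1 2 -1; -1 -1 2];
[V, L] = eig(A)                  % eigenvalues 0, 3, 3 on the diagonal of L
P = [1 -1 -1; 1 1 0; 1 0 1];
D = diag([0 3 3]);
norm(A - P*D*inv(P))             % should be (numerically) zero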

Subsection Diagonalizations of Symmetric and Hermitian Matrices

Symmetric matrices have particularly nice diagonalizations. First, their eigenvalues must be limited to real numbers.
The proof of Theorem 2.4.6 is relatively simple but requires us to expand our terminology and notation a bit. First, we redefine the inner product so that it also applies to complex vectors in \(\CC^n\text{.}\)

Definition 2.4.7. Complex Inner Product.

Let \(\xx,\yy\in\CC^n\text{.}\) The (complex) inner product of \(\xx\) and \(\yy\) is the (complex) scalar
\begin{equation*} \langle\xx,\yy\rangle = \yy^{*}\xx \end{equation*}
where \(\yy^*\) denotes the conjugate transpose of \(\yy\text{.}\)
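In Octave the prime operator computes the conjugate transpose, so the complex inner product can be evaluated directly (a small illustrative example with arbitrary vectors):

% Sketch: complex inner product <x, y> = y' * x (the ' operator is the conjugate transpose).
x = [1 + 2i; 3];
y = [1i; 4 - 1i];
y' * x                        % equals conj(y(1))*x(1) + conj(y(2))*x(2)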
Now we expand our definition of symmetric matrices to include the complex case as well.

Definition 2.4.8. Hermitian Matrices.

Let \(A\) denote a square matrix. Then \(A\) is Hermitian if \(A = A^{*}\text{.}\)
Definition 2.4.8 generalizes the definition of a real symmetric matrix since \(A^* = A^T\) if \(A\) only has real entries. The conjugate transpose and inner product also share many useful properties, including the fact that \(\dotprod{A\xx,\yy} = \dotprod{\xx,A^*\yy}\) for any square matrix \(A\) and any \(\xx,\yy\in\CC^n\text{.}\)
Now we can prove that real symmetric matrices, and more generally Hermitian matrices, always have real eigenvalues.
Let \(\xx\) denote an eigenvector of \(A\) with eigenvalue \(\lambda\text{.}\) Then
\begin{align*} \lambda\dotprod{\xx,\xx} & = \dotprod{\lambda\xx,\xx} \\ & = \dotprod{A\xx,\xx} \\ & = \dotprod{\xx, A^*\xx} \\ & = \dotprod{\xx, A\xx} \\ & = \dotprod{\xx, \lambda \xx} \\ & = \overline{\lambda}\dotprod{\xx, \xx} \end{align*}
Since \(\dotprod{\xx,\xx} = \norm{\xx}^2\neq0\text{,}\) it follows that \(\lambda = \overline{\lambda}\text{.}\) Therefore \(\lambda\in\RR\text{.}\)
The eigenvectors of a Hermitian matrix also have a nice geometric property: eigenvectors corresponding to distinct eigenvalues are orthogonal.
So suppose that \(\xx\) and \(\yy\) are eigenvectors of a Hermitian matrix \(A\) corresponding to distinct eigenvalues \(\lambda_i\) and \(\lambda_j\text{.}\) We need to show that \(\dotprod{\xx,\yy} = 0\text{.}\) Now,
\begin{equation*} \lambda_i\dotprod{\xx,\yy} = \dotprod{A\xx,\yy} = \dotprod{\xx,A\yy} = \lambda_j\dotprod{\xx,\yy} \end{equation*}
or just \((\lambda_i-\lambda_j)\dotprod{\xx,\yy} = 0\text{.}\) Since \(\lambda_i\neq \lambda_j\text{,}\) it follows that \(\dotprod{\xx,\yy} = 0\text{.}\)
Theorem 2.4.11 leads to an extremely useful fact about real symmetric matrices: they can always be orthogonally diagonalized. This means that we can choose an eigenbasis that is also an orthonormal basis. As an example, consider the eigenbasis found in Example 2.4.5, replicated here as columns of the matrix \(P\text{:}\)
\begin{equation*} P = \mqty[1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1]\text{.} \end{equation*}
Note that the first column is orthogonal to the other two, which is guaranteed by Theorem 2.4.11 since these vectors correspond to different eigenvalues.
Although the last two columns are not orthogonal, they can be orthogonalized. One way of doing so is to replace them with the vectors
\begin{equation*} \mqty[-1\\1\\0]\text{ and }\mqty[-\frac{1}{2} \\ -\frac{1}{2} \\ 1] = \mqty[-1\\0\\1]-\frac{1}{2}\mqty[-1\\1\\0]\text{.} \end{equation*}
Both vectors are still eigenvectors with eigenvalue \(3\text{,}\) and the two of them, together with \(\smqty[1\\1\\1]\text{,}\) still form an eigenbasis of \(\RR^3\text{.}\) However, this new basis is orthogonal.
If we go one step further and normalize the vectors in this eigenbasis we then get the orthonormal eigenbasis
\begin{equation*} \qty{\mqty[\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}}], \mqty[-\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0], \mqty[-\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \sqrt{\frac{2}{3}}]}\text{.} \end{equation*}
This provides the orthogonal diagonalization
\begin{equation*} U = \mqty[\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & 0 & \sqrt{\frac{2}{3}}]\text{ and }D = \mqty[0 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3] \end{equation*}
which gives
\begin{equation*} A = UDU^{-1} = UDU^T\text{.} \end{equation*}
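These matrices can be checked numerically in Octave (an illustrative sketch using the \(U\) and \(D\) just constructed):

% Sketch: verify the orthogonal diagonalization A = U*D*U'.
A = [2 -1 -1; -1 2 -1; -1 -1 2];
U = [1/sqrt(3), -1/sqrt(2), -1/sqrt(6);
     1/sqrt(3),  1/sqrt(2), -1/sqrt(6);
     1/sqrt(3),  0,          sqrt(2/3)];
D = diag([0 3 3]);
norm(U'*U - eye(3))           % U is orthogonal: U'*U = I
norm(A - U*D*U')              % A = U*D*U^T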
Such a diagonalization is always possible for real symmetric and Hermitian matrices, a result known as the Spectral Theorem.

Example 2.4.13. A Signal Processing Application.

In the mathematics of signal processing, signals are often represented as particular vectors in \(\RR^n\text{.}\) The problem of signal transmission then reduces to sending a list of numbers \(c_1,\ldots, c_m\) such that the receiver can use these to reconstruct a signal \(\xx\text{.}\) Such a scheme can be implemented by choosing an appropriate \(n\times m\) matrix \(F = \smqty[\vb{f}_1 & \ldots & \vb{f}_m]\) and then computing and transmitting the coordinates of
\begin{equation*} F^T\xx = \mqty[\langle \xx,\vb{f}_i\rangle]_{1\leq i\leq m}\text{.} \end{equation*}
The important properties of the columns of \(F\) can then be encoded in the Gram matrix \(G = F^TF = \mqty[\dotprod{\vb{f}_i,\vb{f}_j}]\text{.}\) Find a \(3\times 4\) matrix \(F\) for which the Gram matrix is
\begin{equation*} G = \mqty[1 & -\frac{1}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & 1 & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & 1 & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & -\frac{1}{3} & 1]\text{.} \end{equation*}
Solution.
This problem can be solved by diagonalizing \(G\text{:}\) if \(G = UDU^T\) then we can define \(F\) by using the rows of \(U\sqrt{D}\text{.}\) So we'll diagonalize \(G\) with the help of the Octave cell below. Doing so, we see that the eigenvalues of \(G\) are in fact nonnegative, so \(U\sqrt{D}\) will only contain real values. Furthermore, \(U\sqrt{D}\) has only three nonzero columns since \(G\) has only three nonzero eigenvalues. By removing the zero column from \(U\sqrt{D}\text{,}\) we obtain a \(4\times3\) matrix which we define to be \(F^T\text{.}\) Then \(F^TF = G\text{.}\)
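Since the original Octave cell is not reproduced here, the following sketch shows one way to carry out the computation (the tolerance 1e-10 for detecting the zero eigenvalue is an arbitrary choice):

% Sketch: build a 3x4 matrix F with F'*F = G from the eigendecomposition of G.
G = [1 -1/3 -1/3 -1/3; -1/3 1 -1/3 -1/3; -1/3 -1/3 1 -1/3; -1/3 -1/3 -1/3 1];
[U, D] = eig(G);
d = diag(D);                           % eigenvalues of G: 0, 4/3, 4/3, 4/3
keep = d > 1e-10;                      % keep the three nonzero eigenvalues
Ft = U(:, keep) * diag(sqrt(d(keep))); % 4x3: the nonzero columns of U*sqrt(D)
F = Ft';                               % the desired 3x4 matrix
norm(F'*F - G)                         % should be (numerically) zero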
Symmetric matrices are also useful in analyzing quadratic forms, which are expressions of the form
\begin{equation*} \sum_{i,j}c_{ij} x_ix_j \end{equation*}
where the \(x_i\) are variables and the \(c_{ij}\) are the coefficients. Such an expression can be rewritten as \(\xx^T A\xx\) where \(A\) is a symmetric matrix determined from the coefficients.

Example 2.4.14. Analyzing a Quadratic Form.

Describe the curve given by
\begin{equation*} 9x^2 + 6xy + y^2 = 10\text{.} \end{equation*}
Solution.
The left hand side is a quadratic form with variables \(x_1 = x,x_2 = y\) and coefficients
\begin{equation*} c_{11} = 9, c_{12} = c_{21} = \frac{6}{2}=3\text{ and }c_{22} = 1\text{.} \end{equation*}
Therefore we can write \(9x^2 + 6xy + y^2 = \xx^T A\xx\) where
\begin{equation*} \xx = \mqty[x\\y]\text{ and }A = \mqty[9 & 3 \\ 3 & 1]\text{.} \end{equation*}
To help us describe the curve we will “disentangle” the variables \(x\) and \(y\) by diagonalizing \(A\text{.}\)
Since \(A\) is symmetric, we know that \(A\) can be orthogonally diagonalized. One such diagonalization is given by \(A = UDU^T\) where
\begin{equation*} U = \mqty[\frac{1}{\sqrt{10}} & \frac{3}{\sqrt{10}} \\ -\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}}]\text{ and } D = \mqty[0 & 0 \\ 0 & 10]\text{.} \end{equation*}
The quadratic form \(\xx^T A\xx\) then becomes
\begin{equation*} \xx^T UDU^T\xx = (U^T\xx)^T D (U^T \xx)\text{.} \end{equation*}
Now we define \(\yy = U^T\xx\) as a change-of-variables. In particular,
\begin{equation*} \yy = \mqty[X\\Y] = \mqty[\frac{1}{\sqrt{10}}x - \frac{3}{\sqrt{10}}y \\ \frac{3}{\sqrt{10}}x + \frac{1}{\sqrt{10}}y]\text{.} \end{equation*}
Our quadratic form is now
\begin{equation*} \yy^T D\yy = 0X^2 + 10Y^2 = 10Y^2 \end{equation*}
and our original equation becomes \(10Y^2 = 10\) or just \(Y = \pm1\text{.}\) Therefore the original equation describes the two different lines
\begin{equation*} \frac{3}{\sqrt{10}}x + \frac{1}{\sqrt{10}}y = -1\text{ and }\frac{3}{\sqrt{10}}x + \frac{1}{\sqrt{10}}y = 1\text{.} \end{equation*}
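A quick numerical check of this change of variables (illustrative only, with an arbitrary test point):

% Sketch: verify that 9x^2 + 6xy + y^2 = 10*Y^2 where Y = (3x + y)/sqrt(10).
x = 1.7;  y = -0.4;                 % arbitrary test point
q = 9*x^2 + 6*x*y + y^2;            % original quadratic form
Y = (3*x + y)/sqrt(10);             % the new variable from the change of coordinates
abs(q - 10*Y^2)                     % should be (numerically) zero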

Subsection Analytic Functions of Matrices

A function \(f(x)\) is analytic at \(x = a\) if it has a power series representation on some interval centered around \(a\text{:}\)
\begin{equation*} f(x) = \sum_{k=0}^\infty c_k(x-a)^k = c_0 + c_1(x-a) + c_2(x-a)^2 + \cdots\text{ for }x\approx a\text{.} \end{equation*}
If \(A\) denotes a square matrix, and if \(f(x)\) is analytic at \(a=0\text{,}\) then we can try to make sense of the expression \(f(A)\) by using the power series for \(f(x)\text{:}\)
\begin{equation*} f(A) = \sum_{k=0}^{\infty}c_k A^k = c_0I + c_1A + c_2A^2 + \cdots\text{,} \end{equation*}
assuming this sum actually exists.

Example 2.4.15. Exponential of a Matrix.

Let
\begin{equation*} A = \mqty[0 & 2 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 0]\text{.} \end{equation*}
Find \(e^A\text{.}\)
Solution.
By definition,
\begin{equation*} e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots\text{,} \end{equation*}
so we can find \(e^A\) by looking at the powers of \(A\text{.}\) In this case,
\begin{equation*} A^2 = \mqty[0 & 0 & 4 \\ 0 & 0 & 0 \\ 0 & 0 & 0]\text{ and }A^3 = A^4 = \ldots = \mqty[0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0]\text{.} \end{equation*}
Therefore
\begin{equation*} e^A = I + A + \frac{1}{2}A^2 = \mqty[1 & 2 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1]\text{.} \end{equation*}
This can be verified using the command expm in Octave as below.
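The original interactive cell is not reproduced here; a minimal equivalent check is:

% Sketch: verify e^A with Octave's built-in matrix exponential.
A = [0 2 3; 0 0 2; 0 0 0];
expm(A)                       % returns [1 2 5; 0 1 2; 0 0 1]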

Example 2.4.16. Function of a Diagonal Matrix.

Let \(f(x)\) be a function analytic at \(0\text{.}\) Let \(D\) be a diagonal matrix whose diagonal entries are within the interval of convergence of the power series for \(f(x)\) centered at \(0\text{.}\) Find \(f(D)\text{.}\)
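As a sketch of the computation (writing \(d_1,\ldots,d_n\) for the diagonal entries of \(D\)): each power \(D^k\) is diagonal with entries \(d_i^k\text{,}\) so the series can be applied entry-by-entry along the diagonal,
\begin{equation*} f(D) = \sum_{k=0}^{\infty}c_k D^k = \mqty[\sum_k c_k d_1^k & 0 & \ldots & 0 \\ 0 & \sum_k c_k d_2^k & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sum_k c_k d_n^k] = \mqty[f(d_1) & 0 & \ldots & 0 \\ 0 & f(d_2) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & f(d_n)]\text{.} \end{equation*}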
The last example shows that if \(f(x)\) is analytic at \(0\) then it is relatively straightforward to find \(f(D)\text{,}\) assuming that the diagonal entries of \(D\) are within the interval of convergence for the series representation of \(f(x)\) at \(0\text{.}\) Therefore we can find functions of diagonalizable matrices as well.
Let \(f(x) = \sum_{k=0}^{\infty}c_k x^k\) and recall that \(A^k = PD^kP^{-1}\text{.}\) Then
\begin{equation*} f(A) = \sum_{k=0}^{\infty}c_kA^k = P\qty(\sum_{k=0}^{\infty}c_k D^k)P^{-1} = Pf(D)P^{-1}\text{.} \end{equation*}
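As an illustrative check of this formula in Octave, using \(f(x) = e^x\) and the matrix from Example 2.4.5 (the choice of example is ours, not the text's):

% Sketch: f(A) = P*f(D)*P^(-1) with f = exp, for A from Example 2.4.5.
A = [2 -1 -1; -1 2 -1; -1 -1 2];
P = [1 -1 -1; 1 1 0; 1 0 1];
D = diag([0 3 3]);
fD = diag(exp(diag(D)));         % apply exp to each diagonal entry of D
norm(expm(A) - P*fD*inv(P))      % should be (numerically) zero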
Theorem 2.4.17 allows for straightforward computations of matrix exponentials of diagonalizable matrices. This is useful in differential equations when solving linear systems of ODEs (see j-oldroyd.github.io/wvwc-differential-equations/systems-of-odes.html).
The proof follows quickly from the fact that \(\dv{}{t}(e^{At}) = Ae^{At}\text{.}\) Using this, we differentiate \(\xx(t) = e^{At}\xx_0\) to get
\begin{equation*} \dv{\xx}{t} = Ae^{At}\xx_0 = A\xx\text{.} \end{equation*}
Furthermore, \(\xx(0) = e^{\vb{0}}\xx_0 = \xx_0\text{.}\) Therefore \(\xx(t) = e^{At}\xx_0\) is a solution of the initial value problem.
Theorems 2.4.17 and 2.4.18 allow us to find solutions of linear systems that involve diagonalizable matrices in terms of the matrix exponential.

Example 2.4.19. Solving a First-Order System.

Solve
\begin{equation*} x^\prime = 3x - 4y \text{ and } y^\prime = -4x + 3y \end{equation*}
with initial condition \(x(0) = 2\) and \(y(0) = -1\text{.}\)
Solution.
This can be solved very easily using the matrix exponential and diagonalization. If we let
\begin{equation*} \xx = \mqty[x \\ y]\text{ and }A = \mqty[3 & -4 \\ -4 & 3] \end{equation*}
then the system can be written as the matrix ODE \(\xx^\prime = A\xx\) with initial condition \(\xx_0 = \smqty[2 \\ -1]\text{.}\) The matrix \(A\) is symmetric and can be orthogonally diagonalized by
\begin{equation*} U = \mqty[\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}]\text{ and } D = \mqty[-1 & 0 \\ 0 & 7] \end{equation*}
as seen in the code cell below this example. Therefore \(A = UDU^T\text{,}\) \(e^{At} = Ue^{Dt}U^T\) and the solution of the system must be
\begin{equation*} \xx = \mqty[\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}]\mqty[e^{-t} & 0 \\ 0 & e^{7t}]\mqty[\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}]\mqty[2 \\ -1]\text{.} \end{equation*}
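As a numerical sanity check (illustrative; it compares the diagonalized formula with expm at a sample time):

% Sketch: compare the diagonalized solution with expm(A*t)*x0 at t = 0.5.
A = [3 -4; -4 3];   x0 = [2; -1];
U = [sqrt(2)/2, sqrt(2)/2; sqrt(2)/2, -sqrt(2)/2];
D = diag([-1 7]);
t = 0.5;
x1 = U * diag([exp(-t), exp(7*t)]) * U' * x0;  % solution from the diagonalization
x2 = expm(A*t) * x0;                           % solution via the matrix exponential
norm(x1 - x2)                                  % should be (numerically) zero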