These are reading notes on *Linear algebra and its applications*.
As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a factorization $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful matrix factorizations in applied linear algebra.
The singular value decomposition is based on the following property of the ordinary diagonalization that can be imitated for rectangular matrices: the absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks certain vectors (the eigenvectors). If $A\boldsymbol x =\lambda \boldsymbol x$ and $\left\|\boldsymbol x\right\|= 1$, then

$$\left\|A\boldsymbol x\right\| = \left\|\lambda\boldsymbol x\right\| = |\lambda|\left\|\boldsymbol x\right\| = |\lambda|$$

If $\lambda_1$ is the eigenvalue with the greatest magnitude, then a corresponding unit eigenvector $\boldsymbol v_1$ identifies a direction in which the stretching effect of $A$ is greatest. This description of $\boldsymbol v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
EXAMPLE 1
If $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$, then the linear transformation $\boldsymbol x \mapsto A\boldsymbol x$ maps the unit sphere $\{\boldsymbol x:\left\|\boldsymbol x\right\|= 1\}$ in $\R^3$ onto an ellipse in $\R^2$, shown in Figure 1. Find a unit vector $\boldsymbol x$ at which the length $\left\|A\boldsymbol x\right\|$ is maximized, and compute this maximum length.
SOLUTION
Observe that $\left\|A\boldsymbol x\right\|^2 = (A\boldsymbol x)^T(A\boldsymbol x) = \boldsymbol x^TA^TA\boldsymbol x$. Also, $A^TA$ is a symmetric matrix. So the problem now is to maximize the quadratic form $\boldsymbol x^T(A^TA)\boldsymbol x$ subject to the constraint $\left\|\boldsymbol x\right\|= 1$. By Theorem 6 in Section 7.3, the maximum value is the greatest eigenvalue $\lambda_1$ of $A^TA$. Also, the maximum value is attained at a unit eigenvector of $A^TA$ corresponding to $\lambda_1$.

For the matrix $A$ above,

$$A^TA = \begin{bmatrix}80& 100& 40\\100& 170& 140\\40& 140& 200\end{bmatrix}$$

The eigenvalues of $A^TA$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$. Corresponding unit eigenvectors are, respectively,

$$\boldsymbol v_1=\begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix},\quad \boldsymbol v_2=\begin{bmatrix}-2/3\\-1/3\\2/3\end{bmatrix},\quad \boldsymbol v_3=\begin{bmatrix}2/3\\-2/3\\1/3\end{bmatrix}$$

The maximum value of $\left\|A\boldsymbol x\right\|^2$ is $360$, attained when $\boldsymbol x$ is the unit vector $\boldsymbol v_1$, so the maximum value of $\left\|A\boldsymbol x\right\|$ is $\sqrt{360} = 6\sqrt{10}$, attained at

$$A\boldsymbol v_1 = \begin{bmatrix}18\\6\end{bmatrix}$$

$A\boldsymbol v_1$ is a point on the ellipse in Figure 1 farthest from the origin.
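The computation above can be checked numerically. This is a minimal numpy sketch (numpy is not used in the text; it is assumed here for illustration): the maximum of $\left\|A\boldsymbol x\right\|$ over unit vectors is the square root of the greatest eigenvalue of $A^TA$, attained at the corresponding unit eigenvector.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigen-decomposition of the symmetric matrix A^T A.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

lam1 = eigvals[-1]      # greatest eigenvalue (360 in this example)
v1 = eigvecs[:, -1]     # a corresponding unit eigenvector

# Maximum stretching: ||A v1|| = sqrt(lam1) = sqrt(360) = 6*sqrt(10)
max_length = np.linalg.norm(A @ v1)
```

Note that `eigh` may return $-\boldsymbol v_1$ instead of $\boldsymbol v_1$; the maximum length is unaffected by the sign.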
The Singular Values of an $m\times n$ Matrix
Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,...,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,

$$\left\|A\boldsymbol v_i\right\|^2 = (A\boldsymbol v_i)^TA\boldsymbol v_i = \boldsymbol v_i^TA^TA\boldsymbol v_i = \boldsymbol v_i^T(\lambda_i\boldsymbol v_i) = \lambda_i\ \ \ \ (2)$$

So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that

$$\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_n\geq 0$$
The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,...,\sigma_n$, and they are arranged in decreasing order. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$.
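This characterization is easy to verify numerically. A minimal numpy sketch (numpy assumed for illustration): the square roots of the eigenvalues of $A^TA$ coincide with the lengths $\left\|A\boldsymbol v_i\right\|$, and the nonzero ones match numpy's own singular values.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Orthogonally diagonalize A^T A, then sort eigenpairs in decreasing order.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Singular values: square roots of the eigenvalues of A^T A.
# clip guards against tiny negative values from roundoff.
sigmas = np.sqrt(np.clip(eigvals, 0, None))

# Lengths ||A v_1||, ..., ||A v_n|| (column norms of AV).
lengths = np.linalg.norm(A @ V, axis=0)
```

`np.linalg.svd(A, compute_uv=False)` returns only $\min(m,n)$ singular values, so only the first two entries of `sigmas` are compared against it.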
THEOREM 9 Suppose $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1\geq\cdots\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $ColA$, and $rankA = r$.

PROOF
For $i\neq j$, the vectors $\boldsymbol v_i$ and $\boldsymbol v_j$ are orthogonal, so

$$(A\boldsymbol v_i)^T(A\boldsymbol v_j) = \boldsymbol v_i^TA^TA\boldsymbol v_j = \boldsymbol v_i^T(\lambda_j\boldsymbol v_j) = 0$$
Thus $\{A\boldsymbol v_1,...,A\boldsymbol v_n\}$ is an orthogonal set. Furthermore, since the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,...,A\boldsymbol v_r$ are linearly independent vectors, and they are in $ColA$. Finally, for any $\boldsymbol y=A\boldsymbol x$ in $ColA$, we can write $\boldsymbol x = c_1\boldsymbol v_1+\cdots+ c_n\boldsymbol v_n$, and

$$\boldsymbol y = A\boldsymbol x = c_1A\boldsymbol v_1+\cdots+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+\cdots+c_nA\boldsymbol v_n = c_1A\boldsymbol v_1+\cdots+c_rA\boldsymbol v_r+\boldsymbol 0+\cdots+\boldsymbol 0$$

Thus $\boldsymbol y$ is in $Span\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an (orthogonal) basis for $ColA$. Hence $rankA = \dim ColA = r$.
The Singular Value Decomposition (SVD)
The decomposition of $A$ involves an $m\times n$ "diagonal" matrix $\Sigma$ of the form

$$\Sigma = \begin{bmatrix}D& 0\\0& 0\end{bmatrix}\ \ \ \ (3)$$

where $D$ is an $r\times r$ diagonal matrix for some $r$ not exceeding the smaller of $m$ and $n$. (If $r$ equals $m$ or $n$ or both, some or all of the zero matrices do not appear.)

THEOREM 10 (The Singular Value Decomposition) Let $A$ be an $m\times n$ matrix with rank $r$. Then there exists an $m\times n$ matrix $\Sigma$ as in (3) for which the diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that

$$A = U\Sigma V^T$$

Any factorization $A = U\Sigma V^T$, with $U$ and $V$ orthogonal, $\Sigma$ as in (3), and positive diagonal entries in $D$, is called a singular value decomposition of $A$.
The matrices $U$ and $V$ are not uniquely determined by $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
PROOF
Let
λ
i
\lambda_i
λi and
v
i
\boldsymbol v_i
vi be as in Theorem 9, so that
{
A
v
1
,
.
.
.
,
A
v
r
}
\{A\boldsymbol v_1,...,A\boldsymbol v_r\}
{Av1,...,Avr} is an orthogonal basis for
C
o
l
A
ColA
ColA. Normalize each
A
v
i
A\boldsymbol v_i
Avi to obtain an orthonormal basis
{
u
1
,
.
.
.
,
u
r
}
\{\boldsymbol u_1,...,\boldsymbol u_r\}
{u1,...,ur}, where
and
Now extend
{
u
1
,
.
.
.
,
u
r
}
\{\boldsymbol u_1,...,\boldsymbol u_r\}
{u1,...,ur} to an orthonormal basis
{
u
1
,
.
.
.
,
u
m
}
\{\boldsymbol u_1,...,\boldsymbol u_m\}
{u1,...,um} of
R
m
\R^m
Rm, and let
By construction,
U
U
U and
V
V
V are orthogonal matrices. Also, from (4),
Let
D
D
D be the diagonal matrix with diagonal entries
σ
1
,
.
.
.
,
σ
r
\sigma_1,...,\sigma_r
σ1,...,σr , and let
∑
\sum
∑ be as in (3) above. Then
Since
V
V
V is an orthogonal matrix,
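The theorem's conclusion can be demonstrated with numpy's built-in SVD (a sketch, assuming numpy; the text's own construction appears in the next examples): `np.linalg.svd` returns an orthogonal $U$, the singular values, and $V^T$ with $A = U\Sigma V^T$.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
m, n = A.shape

# Full SVD: U is m x m, Vt is n x n, s holds min(m, n) singular values.
U, s, Vt = np.linalg.svd(A)

# Embed the diagonal D in the m x n "diagonal" matrix Sigma of (3).
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

reconstructed = U @ Sigma @ Vt   # should reproduce A
```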
The next two examples focus attention on the internal structure of a singular value decomposition. An efficient and numerically stable algorithm for this decomposition would use a different approach. See the Numerical Note at the end of the section.
EXAMPLE 3
Use the results of Example 1 to construct a singular value decomposition of $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$.
SOLUTION
A construction can be divided into three steps.
Step 1. Find an orthogonal diagonalization of $A^TA$. That is, find the eigenvalues of $A^TA$ and a corresponding orthonormal set of eigenvectors. For the matrix here, this work was done in Example 1.
Step 2. Set up $V$ and $\Sigma$. Arrange the eigenvalues of $A^TA$ in decreasing order. In Example 1, the eigenvalues are already listed in decreasing order: $360$, $90$, and $0$. The corresponding unit eigenvectors, $\boldsymbol v_1$, $\boldsymbol v_2$, and $\boldsymbol v_3$, are the right singular vectors of $A$. These vectors form the columns of $V = \begin{bmatrix}\boldsymbol v_1& \boldsymbol v_2& \boldsymbol v_3\end{bmatrix}$.

The square roots of the eigenvalues are the singular values:

$$\sigma_1 = \sqrt{360} = 6\sqrt{10},\quad \sigma_2 = \sqrt{90} = 3\sqrt{10},\quad \sigma_3 = 0$$

The nonzero singular values are the diagonal entries of $D$, and $\Sigma$ is the same size as $A$, with $D$ in its upper left corner:

$$D = \begin{bmatrix}6\sqrt{10}& 0\\0& 3\sqrt{10}\end{bmatrix},\quad \Sigma = \begin{bmatrix}6\sqrt{10}& 0& 0\\0& 3\sqrt{10}& 0\end{bmatrix}$$
Step 3. Construct $U$. When $A$ has rank $r$, the first $r$ columns of $U$ are the normalized vectors obtained from $A\boldsymbol v_1,...,A\boldsymbol v_r$. In this example, $A$ has two nonzero singular values, so $rankA = 2$. Recall that $\left\|A\boldsymbol v_1\right\|=\sigma_1$ and $\left\|A\boldsymbol v_2\right\|=\sigma_2$. Thus

$$\boldsymbol u_1 = \frac{1}{\sigma_1}A\boldsymbol v_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix}18\\6\end{bmatrix} = \begin{bmatrix}3/\sqrt{10}\\1/\sqrt{10}\end{bmatrix},\quad \boldsymbol u_2 = \frac{1}{\sigma_2}A\boldsymbol v_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix}3\\-9\end{bmatrix} = \begin{bmatrix}1/\sqrt{10}\\-3/\sqrt{10}\end{bmatrix}$$

Note that $\{\boldsymbol u_1,\boldsymbol u_2\}$ is already a basis for $\R^2$. Thus no additional vectors are needed for $U$, and $U = \begin{bmatrix}\boldsymbol u_1& \boldsymbol u_2\end{bmatrix}$. The singular value decomposition of $A$ is

$$A = \begin{bmatrix}3/\sqrt{10}& 1/\sqrt{10}\\1/\sqrt{10}& -3/\sqrt{10}\end{bmatrix}\begin{bmatrix}6\sqrt{10}& 0& 0\\0& 3\sqrt{10}& 0\end{bmatrix}\begin{bmatrix}1/3& 2/3& 2/3\\-2/3& -1/3& 2/3\\2/3& -2/3& 1/3\end{bmatrix}$$
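The three steps above can be carried out numerically. This is a sketch assuming numpy (and, as the text notes, not the numerically stable algorithm used by production SVD routines):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Step 1: orthogonally diagonalize A^T A.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]          # decreasing order
eigvals, V = eigvals[order], V[:, order]

# Step 2: singular values and the 2 x 3 matrix Sigma.
sigmas = np.sqrt(np.clip(eigvals, 0, None))
Sigma = np.zeros(A.shape)
Sigma[0, 0], Sigma[1, 1] = sigmas[0], sigmas[1]

# Step 3: the first r = 2 columns of U are (1/sigma_i) A v_i.
U = np.column_stack([(A @ V[:, i]) / sigmas[i] for i in range(2)])
```

Because `eigh` fixes eigenvector signs arbitrarily, $U$ and $V$ may differ from the hand computation by sign flips, but $U\Sigma V^T$ still equals $A$.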
EXAMPLE 4
Find a singular value decomposition of $A=\begin{bmatrix}1& -1\\-2& 2\\2& -2\end{bmatrix}$.
SOLUTION
First, compute $A^TA=\begin{bmatrix}9& -9\\-9& 9\end{bmatrix}$. The eigenvalues of $A^TA$ are $18$ and $0$, with corresponding unit eigenvectors

$$\boldsymbol v_1 = \begin{bmatrix}1/\sqrt 2\\-1/\sqrt 2\end{bmatrix},\quad \boldsymbol v_2 = \begin{bmatrix}1/\sqrt 2\\1/\sqrt 2\end{bmatrix}$$

These unit vectors form the columns of $V$. The singular values are $\sigma_1 = \sqrt{18} = 3\sqrt 2$ and $\sigma_2 = 0$, so $D = \begin{bmatrix}3\sqrt 2\end{bmatrix}$ and

$$\Sigma = \begin{bmatrix}3\sqrt 2& 0\\0& 0\\0& 0\end{bmatrix}$$

To construct $U$, first construct $A\boldsymbol v_1$ and $A\boldsymbol v_2$:

$$A\boldsymbol v_1 = \begin{bmatrix}2/\sqrt 2\\-4/\sqrt 2\\4/\sqrt 2\end{bmatrix},\quad A\boldsymbol v_2 = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
The only column found for $U$ so far is

$$\boldsymbol u_1 = \frac{1}{3\sqrt 2}A\boldsymbol v_1 = \begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$

The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x= 0$, which is equivalent to the equation $x_1-2x_2+ 2x_3= 0$. A basis for the solution set of this equation is

$$\boldsymbol w_1 = \begin{bmatrix}2\\1\\0\end{bmatrix},\quad \boldsymbol w_2 = \begin{bmatrix}-2\\0\\1\end{bmatrix}$$

Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain

$$\boldsymbol u_2 = \begin{bmatrix}2/\sqrt 5\\1/\sqrt 5\\0\end{bmatrix},\quad \boldsymbol u_3 = \begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$

Finally, set $U = \begin{bmatrix}\boldsymbol u_1& \boldsymbol u_2& \boldsymbol u_3\end{bmatrix}$, take $\Sigma$ and $V^T$ from above, and write

$$A = \begin{bmatrix}1/3& 2/\sqrt 5& -2/\sqrt{45}\\-2/3& 1/\sqrt 5& 4/\sqrt{45}\\2/3& 0& 5/\sqrt{45}\end{bmatrix}\begin{bmatrix}3\sqrt 2& 0\\0& 0\\0& 0\end{bmatrix}\begin{bmatrix}1/\sqrt 2& -1/\sqrt 2\\1/\sqrt 2& 1/\sqrt 2\end{bmatrix}$$
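The basis-extension step can be sketched in numpy (numpy assumed): starting from $\boldsymbol u_1$ and the basis $\{\boldsymbol w_1,\boldsymbol w_2\}$ of the plane $\boldsymbol u_1^T\boldsymbol x = 0$, Gram–Schmidt with normalization produces the remaining orthonormal columns of $U$.

```python
import numpy as np

u1 = np.array([1.0, -2.0, 2.0]) / 3.0

# Both vectors satisfy x1 - 2*x2 + 2*x3 = 0, hence are orthogonal to u1.
w1 = np.array([2.0, 1.0, 0.0])
w2 = np.array([-2.0, 0.0, 1.0])

# Gram-Schmidt with normalization.
u2 = w1 / np.linalg.norm(w1)
w2_perp = w2 - (w2 @ u2) * u2        # remove the component of w2 along u2
u3 = w2_perp / np.linalg.norm(w2_perp)

U = np.column_stack([u1, u2, u3])    # an orthogonal 3 x 3 matrix
```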
Applications of the Singular Value Decomposition
The SVD is often used to estimate the rank of a matrix, as noted above. Several other numerical applications are described briefly below, and an application to image processing is presented in Section 7.5.
EXAMPLE 5 (The Condition Number)
Most numerical calculations involving an equation $A\boldsymbol x =\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x =\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a "condition number" of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x =\boldsymbol b$.)
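A minimal numpy sketch (numpy and the example matrix are assumptions for illustration): the ratio $\sigma_1/\sigma_n$ agrees with numpy's built-in 2-norm condition number.

```python
import numpy as np

# A deliberately simple invertible matrix: singular values 4 and 0.5.
A = np.array([[4.0, 0.0],
              [0.0, 0.5]])

# Singular values in decreasing order.
s = np.linalg.svd(A, compute_uv=False)

# Condition number: largest singular value over smallest.
cond = s[0] / s[-1]   # 4 / 0.5 = 8
```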
EXAMPLE 6 (Bases for Fundamental Subspaces)
Given an SVD for an $m \times n$ matrix $A$, let $\boldsymbol u_1,...,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,...,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,...,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,

$$\{\boldsymbol u_1,...,\boldsymbol u_r\}\ \ \ \ (5)$$

is an orthonormal basis for $ColA$.
Recall that $(ColA)^{\perp}= NulA^T$. Hence

$$\{\boldsymbol u_{r+1},...,\boldsymbol u_m\}\ \ \ \ (6)$$

is an orthonormal basis for $NulA^T$.
Since $\left\|A\boldsymbol v_i\right\| =\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is $0$ if and only if $i > r$, the vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ span a subspace of $NulA$ of dimension $n - r$. By the Rank Theorem, $\dim NulA = n - rankA = n-r$. It follows that

$$\{\boldsymbol v_{r+1},...,\boldsymbol v_n\}\ \ \ \ (7)$$

is an orthonormal basis for $NulA$.

Recall that $(NulA)^\perp= ColA^T = RowA$. Hence, from (7),

$$\{\boldsymbol v_1,...,\boldsymbol v_r\}\ \ \ \ (8)$$

is an orthonormal basis for $RowA$.
Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,...,\sigma_r\boldsymbol u_r\}$ for $ColA$ instead of the normalized basis, to remind you that $A\boldsymbol v_i= \sigma_i \boldsymbol u_i$ for $1\leq i \leq r$.
The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
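The bases (5)–(8) can be read directly off a full SVD. A sketch assuming numpy, using a rank-1 example matrix chosen for illustration:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])

U, s, Vt = np.linalg.svd(A)      # full SVD
r = int(np.sum(s > 1e-10))       # numerical rank

col_A  = U[:, :r]                # orthonormal basis for Col A    (5)
nul_At = U[:, r:]                # orthonormal basis for Nul A^T  (6)
nul_A  = Vt[r:, :].T             # orthonormal basis for Nul A    (7)
row_A  = Vt[:r, :].T             # orthonormal basis for Row A    (8)
```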
EXAMPLE 7 (Reduced SVD and the Pseudoinverse of $A$)
When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r= rankA$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:

$$U = \begin{bmatrix}U_r& U_{m-r}\end{bmatrix},\quad\text{where } U_r = \begin{bmatrix}\boldsymbol u_1& \cdots& \boldsymbol u_r\end{bmatrix}$$

$$V = \begin{bmatrix}V_r& V_{n-r}\end{bmatrix},\quad\text{where } V_r = \begin{bmatrix}\boldsymbol v_1& \cdots& \boldsymbol v_r\end{bmatrix}$$
Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that

$$A = U_r D V_r^T$$
This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:

$$A^+ = V_r D^{-1} U_r^T$$
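Both the reduced SVD and the pseudoinverse are easy to form from a full SVD. A sketch assuming numpy, reusing the rank-1 matrix from Example 4, with the result checked against numpy's own `pinv`:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))           # rank of A

# Reduced SVD: keep only the first r columns/rows.
U_r = U[:, :r]                       # m x r
D   = np.diag(s[:r])                 # r x r, invertible
V_r = Vt[:r, :].T                    # n x r

A_reduced = U_r @ D @ V_r.T          # equals A

# Pseudoinverse: A^+ = V_r D^{-1} U_r^T.
A_pinv = V_r @ np.linalg.inv(D) @ U_r.T
```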
Supplementary Exercises 12–14 at the end of the chapter explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x =\boldsymbol b$, use the pseudoinverse of $A$ to define

$$\hat{\boldsymbol x} = A^+\boldsymbol b = V_r D^{-1} U_r^T\boldsymbol b$$

Then, from the reduced SVD,

$$A\hat{\boldsymbol x} = U_r D V_r^T V_r D^{-1} U_r^T\boldsymbol b = U_r U_r^T\boldsymbol b\quad(\text{because } V_r^TV_r = I_r)$$

$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $ColA$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x =\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
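A sketch of this fact in numpy (numpy and the example data are assumptions): $\hat{\boldsymbol x} = A^+\boldsymbol b$ satisfies $A\hat{\boldsymbol x} = U_rU_r^T\boldsymbol b$, the projection of $\boldsymbol b$ onto $ColA$, and agrees with the minimum-norm solution returned by `np.linalg.lstsq`.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])
b = np.array([1.0, 0.0, 1.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U_r = U[:, :r]

x_hat = np.linalg.pinv(A) @ b    # x_hat = A^+ b
b_hat = U_r @ (U_r.T @ b)        # projection of b onto Col A
```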