These are reading notes on *Linear algebra and its applications*.
As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a factorization $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful matrix factorizations in applied linear algebra.
The singular value decomposition is based on the following property of the ordinary diagonalization that can be imitated for rectangular matrices: the absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks certain vectors (the eigenvectors). If $A\boldsymbol x =\lambda \boldsymbol x$ and $\left\|\boldsymbol x\right\|= 1$, then

$$\left\|A\boldsymbol x\right\| = \left\|\lambda\boldsymbol x\right\| = |\lambda|\left\|\boldsymbol x\right\| = |\lambda|$$

If $\lambda_1$ is the eigenvalue with the greatest magnitude, then a corresponding unit eigenvector $\boldsymbol v_1$ identifies a direction in which the stretching effect of $A$ is greatest. This description of $\boldsymbol v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
EXAMPLE 1
If $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$, then the linear transformation $\boldsymbol x \mapsto A\boldsymbol x$ maps the unit sphere $\{\boldsymbol x:\left\|\boldsymbol x\right\|= 1\}$ in $\R^3$ onto an ellipse in $\R^2$, shown in Figure 1. Find a unit vector $\boldsymbol x$ at which the length $\left\|A\boldsymbol x\right\|$ is maximized, and compute this maximum length.
SOLUTION
Observe that $\left\|A\boldsymbol x\right\|^2 = (A\boldsymbol x)^T(A\boldsymbol x) = \boldsymbol x^TA^TA\boldsymbol x$. Also, $A^TA$ is a symmetric matrix. So the problem now is to maximize the quadratic form $\boldsymbol x^T(A^TA)\boldsymbol x$ subject to the constraint $\left\|\boldsymbol x\right\|= 1$. By Theorem 6 in Section 7.3, the maximum value is the greatest eigenvalue $\lambda_1$ of $A^TA$. Also, the maximum value is attained at a unit eigenvector of $A^TA$ corresponding to $\lambda_1$.

For the matrix $A$ above,

$$A^TA = \begin{bmatrix}80& 100& 40\\100& 170& 140\\40& 140& 200\end{bmatrix}$$

The eigenvalues of $A^TA$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$. Corresponding unit eigenvectors are, respectively,

$$\boldsymbol v_1=\begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix},\quad \boldsymbol v_2=\begin{bmatrix}-2/3\\-1/3\\2/3\end{bmatrix},\quad \boldsymbol v_3=\begin{bmatrix}2/3\\-2/3\\1/3\end{bmatrix}$$

The maximum value of $\left\|A\boldsymbol x\right\|^2$ is $360$, attained when $\boldsymbol x$ is the unit vector $\boldsymbol v_1$, so the maximum value of $\left\|A\boldsymbol x\right\|$ is $\sqrt{360} = 6\sqrt{10}$, attained at

$$A\boldsymbol v_1 = \begin{bmatrix}18\\6\end{bmatrix}$$

$A\boldsymbol v_1$ is a point on the ellipse in Figure 1 farthest from the origin.
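The computation above can be checked numerically. This is a minimal numpy sketch (numpy is not used in the text; it is assumed here for illustration): the maximum of $\left\|A\boldsymbol x\right\|$ over unit vectors is the square root of the greatest eigenvalue of $A^TA$, attained at the corresponding unit eigenvector.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigen-decomposition of the symmetric matrix A^T A.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

lam1 = eigvals[-1]      # greatest eigenvalue (360 in this example)
v1 = eigvecs[:, -1]     # a corresponding unit eigenvector

# Maximum stretching: ||A v1|| = sqrt(lam1) = sqrt(360) = 6*sqrt(10)
max_length = np.linalg.norm(A @ v1)
```

Note that `eigh` may return $-\boldsymbol v_1$ instead of $\boldsymbol v_1$; the maximum length is unaffected by the sign.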
The Singular Values of an $m\times n$ Matrix
Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,...,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,

$$\left\|A\boldsymbol v_i\right\|^2 = (A\boldsymbol v_i)^TA\boldsymbol v_i = \boldsymbol v_i^TA^TA\boldsymbol v_i = \boldsymbol v_i^T(\lambda_i\boldsymbol v_i) = \lambda_i\ \ \ \ (2)$$

So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that

$$\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_n\geq 0$$
The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,...,\sigma_n$, and they are arranged in decreasing order. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$.
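This characterization is easy to verify numerically. A minimal numpy sketch (numpy assumed for illustration): the square roots of the eigenvalues of $A^TA$ coincide with the lengths $\left\|A\boldsymbol v_i\right\|$, and the nonzero ones match numpy's own singular values.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Orthogonally diagonalize A^T A, then sort eigenpairs in decreasing order.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Singular values: square roots of the eigenvalues of A^T A.
# clip guards against tiny negative values from roundoff.
sigmas = np.sqrt(np.clip(eigvals, 0, None))

# Lengths ||A v_1||, ..., ||A v_n|| (column norms of AV).
lengths = np.linalg.norm(A @ V, axis=0)
```

`np.linalg.svd(A, compute_uv=False)` returns only $\min(m,n)$ singular values, so only the first two entries of `sigmas` are compared against it.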
THEOREM 9 Suppose $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1\geq\cdots\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $ColA$, and $rankA = r$.

PROOF
For $i\neq j$, the vectors $\boldsymbol v_i$ and $\boldsymbol v_j$ are orthogonal, so

$$(A\boldsymbol v_i)^T(A\boldsymbol v_j) = \boldsymbol v_i^TA^TA\boldsymbol v_j = \boldsymbol v_i^T(\lambda_j\boldsymbol v_j) = 0$$
Thus $\{A\boldsymbol v_1,...,A\boldsymbol v_n\}$ is an orthogonal set. Furthermore, since the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,...,A\boldsymbol v_r$ are linearly independent vectors, and they are in $ColA$. Finally, for any $\boldsymbol y=A\boldsymbol x$ in $ColA$, we can write $\boldsymbol x = c_1\boldsymbol v_1+\cdots+ c_n\boldsymbol v_n$, and

$$\boldsymbol y = A\boldsymbol x = c_1A\boldsymbol v_1+\cdots+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+\cdots+c_nA\boldsymbol v_n = c_1A\boldsymbol v_1+\cdots+c_rA\boldsymbol v_r+\boldsymbol 0+\cdots+\boldsymbol 0$$

Thus $\boldsymbol y$ is in $Span\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an (orthogonal) basis for $ColA$. Hence $rankA = \dim ColA = r$.
The Singular Value Decomposition (SVD)
The decomposition of $A$ involves an $m\times n$ "diagonal" matrix $\Sigma$ of the form

$$\Sigma = \begin{bmatrix}D& 0\\0& 0\end{bmatrix}\ \ \ \ (3)$$

where $D$ is an $r\times r$ diagonal matrix for some $r$ not exceeding the smaller of $m$ and $n$. (If $r$ equals $m$ or $n$ or both, some or all of the zero matrices do not appear.)

THEOREM 10 (The Singular Value Decomposition) Let $A$ be an $m\times n$ matrix with rank $r$. Then there exists an $m\times n$ matrix $\Sigma$ as in (3) for which the diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that

$$A = U\Sigma V^T$$

Any factorization $A = U\Sigma V^T$, with $U$ and $V$ orthogonal, $\Sigma$ as in (3), and positive diagonal entries in $D$, is called a singular value decomposition of $A$.
The matrices $U$ and $V$ are not uniquely determined by $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
PROOF
Let
λ
i
\lambda_i
λi and
v
i
\boldsymbol v_i
vi be as in Theorem 9, so that
{
A
v
1
,
.
.
.
,
A
v
r
}
\{A\boldsymbol v_1,...,A\boldsymbol v_r\}
{Av1,...,Avr} is an orthogonal basis for
C
o
l
A
ColA
ColA. Normalize each
A
v
i
A\boldsymbol v_i
Avi to obtain an orthonormal basis
{
u
1
,
.
.
.
,
u
r
}
\{\boldsymbol u_1,...,\boldsymbol u_r\}
{u1,...,ur}, where
and
Now extend
{
u
1
,
.
.
.
,
u
r
}
\{\boldsymbol u_1,...,\boldsymbol u_r\}
{u1,...,ur} to an orthonormal basis
{
u
1
,
.
.
.
,
u
m
}
\{\boldsymbol u_1,...,\boldsymbol u_m\}
{u1,...,um} of
R
m
\R^m
Rm, and let
By construction,
U
U
U and
V
V
V are orthogonal matrices. Also, from (4),
Let
D
D
D be the diagonal matrix with diagonal entries
σ
1
,
.
.
.
,
σ
r
\sigma_1,...,\sigma_r
σ1,...,σr , and let
∑
\sum
∑ be as in (3) above. Then
Since
V
V
V is an orthogonal matrix,
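The theorem's conclusion can be demonstrated with numpy's built-in SVD (a sketch, assuming numpy; the text's own construction appears in the next examples): `np.linalg.svd` returns an orthogonal $U$, the singular values, and $V^T$ with $A = U\Sigma V^T$.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
m, n = A.shape

# Full SVD: U is m x m, Vt is n x n, s holds min(m, n) singular values.
U, s, Vt = np.linalg.svd(A)

# Embed the diagonal D in the m x n "diagonal" matrix Sigma of (3).
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

reconstructed = U @ Sigma @ Vt   # should reproduce A
```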
The next two examples focus attention on the internal structure of a singular value decomposition. An efficient and numerically stable algorithm for this decomposition would use a different approach. See the Numerical Note at the end of the section.
EXAMPLE 3
Use the results of Example 1 to construct a singular value decomposition of $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$.
SOLUTION
A construction can be divided into three steps.
Step 1. Find an orthogonal diagonalization of $A^TA$. That is, find the eigenvalues of $A^TA$ and a corresponding orthonormal set of eigenvectors. For the matrix here, this work was done in Example 1.
Step 2. Set up $V$ and $\Sigma$. Arrange the eigenvalues of $A^TA$ in decreasing order. In Example 1, the eigenvalues are already listed in decreasing order: $360$, $90$, and $0$. The corresponding unit eigenvectors, $\boldsymbol v_1$, $\boldsymbol v_2$, and $\boldsymbol v_3$, are the right singular vectors of $A$. These vectors form the columns of $V = \begin{bmatrix}\boldsymbol v_1& \boldsymbol v_2& \boldsymbol v_3\end{bmatrix}$.

The square roots of the eigenvalues are the singular values:

$$\sigma_1 = \sqrt{360} = 6\sqrt{10},\quad \sigma_2 = \sqrt{90} = 3\sqrt{10},\quad \sigma_3 = 0$$

The nonzero singular values are the diagonal entries of $D$, and $\Sigma$ is the same size as $A$, with $D$ in its upper left corner:

$$D = \begin{bmatrix}6\sqrt{10}& 0\\0& 3\sqrt{10}\end{bmatrix},\quad \Sigma = \begin{bmatrix}6\sqrt{10}& 0& 0\\0& 3\sqrt{10}& 0\end{bmatrix}$$
Step 3. Construct $U$. When $A$ has rank $r$, the first $r$ columns of $U$ are the normalized vectors obtained from $A\boldsymbol v_1,...,A\boldsymbol v_r$. In this example, $A$ has two nonzero singular values, so $rankA = 2$. Recall that $\left\|A\boldsymbol v_1\right\|=\sigma_1$ and $\left\|A\boldsymbol v_2\right\|=\sigma_2$. Thus

$$\boldsymbol u_1 = \frac{1}{\sigma_1}A\boldsymbol v_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix}18\\6\end{bmatrix} = \begin{bmatrix}3/\sqrt{10}\\1/\sqrt{10}\end{bmatrix},\quad \boldsymbol u_2 = \frac{1}{\sigma_2}A\boldsymbol v_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix}3\\-9\end{bmatrix} = \begin{bmatrix}1/\sqrt{10}\\-3/\sqrt{10}\end{bmatrix}$$

Note that $\{\boldsymbol u_1,\boldsymbol u_2\}$ is already a basis for $\R^2$. Thus no additional vectors are needed for $U$, and $U = \begin{bmatrix}\boldsymbol u_1& \boldsymbol u_2\end{bmatrix}$. The singular value decomposition of $A$ is

$$A = \begin{bmatrix}3/\sqrt{10}& 1/\sqrt{10}\\1/\sqrt{10}& -3/\sqrt{10}\end{bmatrix}\begin{bmatrix}6\sqrt{10}& 0& 0\\0& 3\sqrt{10}& 0\end{bmatrix}\begin{bmatrix}1/3& 2/3& 2/3\\-2/3& -1/3& 2/3\\2/3& -2/3& 1/3\end{bmatrix}$$
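The three steps above can be carried out numerically. This is a sketch assuming numpy (and, as the text notes, not the numerically stable algorithm used by production SVD routines):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Step 1: orthogonally diagonalize A^T A.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]          # decreasing order
eigvals, V = eigvals[order], V[:, order]

# Step 2: singular values and the 2 x 3 matrix Sigma.
sigmas = np.sqrt(np.clip(eigvals, 0, None))
Sigma = np.zeros(A.shape)
Sigma[0, 0], Sigma[1, 1] = sigmas[0], sigmas[1]

# Step 3: the first r = 2 columns of U are (1/sigma_i) A v_i.
U = np.column_stack([(A @ V[:, i]) / sigmas[i] for i in range(2)])
```

Because `eigh` fixes eigenvector signs arbitrarily, $U$ and $V$ may differ from the hand computation by sign flips, but $U\Sigma V^T$ still equals $A$.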
EXAMPLE 4
Find a singular value decomposition of $A=\begin{bmatrix}1& -1\\-2& 2\\2& -2\end{bmatrix}$.
SOLUTION
First, compute $A^TA=\begin{bmatrix}9& -9\\-9& 9\end{bmatrix}$. The eigenvalues of $A^TA$ are $18$ and $0$, with corresponding unit eigenvectors

$$\boldsymbol v_1 = \begin{bmatrix}1/\sqrt 2\\-1/\sqrt 2\end{bmatrix},\quad \boldsymbol v_2 = \begin{bmatrix}1/\sqrt 2\\1/\sqrt 2\end{bmatrix}$$

These unit vectors form the columns of $V$. The singular values are $\sigma_1 = \sqrt{18} = 3\sqrt 2$ and $\sigma_2 = 0$, so $D = \begin{bmatrix}3\sqrt 2\end{bmatrix}$ and

$$\Sigma = \begin{bmatrix}3\sqrt 2& 0\\0& 0\\0& 0\end{bmatrix}$$

To construct $U$, first construct $A\boldsymbol v_1$ and $A\boldsymbol v_2$:

$$A\boldsymbol v_1 = \begin{bmatrix}2/\sqrt 2\\-4/\sqrt 2\\4/\sqrt 2\end{bmatrix},\quad A\boldsymbol v_2 = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
The only column found for $U$ so far is

$$\boldsymbol u_1 = \frac{1}{3\sqrt 2}A\boldsymbol v_1 = \begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$

The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x= 0$, which is equivalent to the equation $x_1-2x_2+ 2x_3= 0$. A basis for the solution set of this equation is

$$\boldsymbol w_1 = \begin{bmatrix}2\\1\\0\end{bmatrix},\quad \boldsymbol w_2 = \begin{bmatrix}-2\\0\\1\end{bmatrix}$$

Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain

$$\boldsymbol u_2 = \begin{bmatrix}2/\sqrt 5\\1/\sqrt 5\\0\end{bmatrix},\quad \boldsymbol u_3 = \begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$

Finally, set $U = \begin{bmatrix}\boldsymbol u_1& \boldsymbol u_2& \boldsymbol u_3\end{bmatrix}$, take $\Sigma$ and $V^T$ from above, and write

$$A = \begin{bmatrix}1/3& 2/\sqrt 5& -2/\sqrt{45}\\-2/3& 1/\sqrt 5& 4/\sqrt{45}\\2/3& 0& 5/\sqrt{45}\end{bmatrix}\begin{bmatrix}3\sqrt 2& 0\\0& 0\\0& 0\end{bmatrix}\begin{bmatrix}1/\sqrt 2& -1/\sqrt 2\\1/\sqrt 2& 1/\sqrt 2\end{bmatrix}$$
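The basis-extension step can be sketched in numpy (numpy assumed): starting from $\boldsymbol u_1$ and the basis $\{\boldsymbol w_1,\boldsymbol w_2\}$ of the plane $\boldsymbol u_1^T\boldsymbol x = 0$, Gram–Schmidt with normalization produces the remaining orthonormal columns of $U$.

```python
import numpy as np

u1 = np.array([1.0, -2.0, 2.0]) / 3.0

# Both vectors satisfy x1 - 2*x2 + 2*x3 = 0, hence are orthogonal to u1.
w1 = np.array([2.0, 1.0, 0.0])
w2 = np.array([-2.0, 0.0, 1.0])

# Gram-Schmidt with normalization.
u2 = w1 / np.linalg.norm(w1)
w2_perp = w2 - (w2 @ u2) * u2        # remove the component of w2 along u2
u3 = w2_perp / np.linalg.norm(w2_perp)

U = np.column_stack([u1, u2, u3])    # an orthogonal 3 x 3 matrix
```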
Applications of the Singular Value Decomposition
The SVD is often used to estimate the rank of a matrix, as noted above. Several other numerical applications are described briefly below, and an application to image processing is presented in Section 7.5.
EXAMPLE 5 (The Condition Number)
Most numerical calculations involving an equation $A\boldsymbol x =\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x =\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a "condition number" of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x =\boldsymbol b$.)
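A minimal numpy sketch (numpy and the example matrix are assumptions for illustration): the ratio $\sigma_1/\sigma_n$ agrees with numpy's built-in 2-norm condition number.

```python
import numpy as np

# A deliberately simple invertible matrix: singular values 4 and 0.5.
A = np.array([[4.0, 0.0],
              [0.0, 0.5]])

# Singular values in decreasing order.
s = np.linalg.svd(A, compute_uv=False)

# Condition number: largest singular value over smallest.
cond = s[0] / s[-1]   # 4 / 0.5 = 8
```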
EXAMPLE 6 (Bases for Fundamental Subspaces)
Given an SVD for an $m \times n$ matrix $A$, let $\boldsymbol u_1,...,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,...,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,...,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,

$$\{\boldsymbol u_1,...,\boldsymbol u_r\}\ \ \ \ (5)$$

is an orthonormal basis for $ColA$.
Recall that $(ColA)^{\perp}= NulA^T$. Hence

$$\{\boldsymbol u_{r+1},...,\boldsymbol u_m\}\ \ \ \ (6)$$

is an orthonormal basis for $NulA^T$.
Since $\left\|A\boldsymbol v_i\right\| =\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is $0$ if and only if $i > r$, the vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ span a subspace of $NulA$ of dimension $n - r$. By the Rank Theorem, $\dim NulA = n - rankA = n-r$. It follows that

$$\{\boldsymbol v_{r+1},...,\boldsymbol v_n\}\ \ \ \ (7)$$

is an orthonormal basis for $NulA$.

Recall that $(NulA)^\perp= ColA^T = RowA$. Hence, from (7),

$$\{\boldsymbol v_1,...,\boldsymbol v_r\}\ \ \ \ (8)$$

is an orthonormal basis for $RowA$.
Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,...,\sigma_r\boldsymbol u_r\}$ for $ColA$ instead of the normalized basis, to remind you that $A\boldsymbol v_i= \sigma_i \boldsymbol u_i$ for $1\leq i \leq r$.
The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
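The bases (5)–(8) can be read directly off a full SVD. A sketch assuming numpy, using a rank-1 example matrix chosen for illustration:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])

U, s, Vt = np.linalg.svd(A)      # full SVD
r = int(np.sum(s > 1e-10))       # numerical rank

col_A  = U[:, :r]                # orthonormal basis for Col A    (5)
nul_At = U[:, r:]                # orthonormal basis for Nul A^T  (6)
nul_A  = Vt[r:, :].T             # orthonormal basis for Nul A    (7)
row_A  = Vt[:r, :].T             # orthonormal basis for Row A    (8)
```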
EXAMPLE 7 (Reduced SVD and the Pseudoinverse of $A$)
When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r= rankA$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:

$$U = \begin{bmatrix}U_r& U_{m-r}\end{bmatrix},\quad\text{where } U_r = \begin{bmatrix}\boldsymbol u_1& \cdots& \boldsymbol u_r\end{bmatrix}$$

$$V = \begin{bmatrix}V_r& V_{n-r}\end{bmatrix},\quad\text{where } V_r = \begin{bmatrix}\boldsymbol v_1& \cdots& \boldsymbol v_r\end{bmatrix}$$
Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that

$$A = U_r D V_r^T$$
This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:

$$A^+ = V_r D^{-1} U_r^T$$
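Both the reduced SVD and the pseudoinverse are easy to form from a full SVD. A sketch assuming numpy, reusing the rank-1 matrix from Example 4, with the result checked against numpy's own `pinv`:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))           # rank of A

# Reduced SVD: keep only the first r columns/rows.
U_r = U[:, :r]                       # m x r
D   = np.diag(s[:r])                 # r x r, invertible
V_r = Vt[:r, :].T                    # n x r

A_reduced = U_r @ D @ V_r.T          # equals A

# Pseudoinverse: A^+ = V_r D^{-1} U_r^T.
A_pinv = V_r @ np.linalg.inv(D) @ U_r.T
```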
Supplementary Exercises 12–14 at the end of the chapter explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x =\boldsymbol b$, use the pseudoinverse of $A$ to define

$$\hat{\boldsymbol x} = A^+\boldsymbol b = V_r D^{-1} U_r^T\boldsymbol b$$

Then, from the reduced SVD,

$$A\hat{\boldsymbol x} = U_r D V_r^T V_r D^{-1} U_r^T\boldsymbol b = U_r U_r^T\boldsymbol b\quad(\text{because } V_r^TV_r = I_r)$$

$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $ColA$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x =\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
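A sketch of this fact in numpy (numpy and the example data are assumptions): $\hat{\boldsymbol x} = A^+\boldsymbol b$ satisfies $A\hat{\boldsymbol x} = U_rU_r^T\boldsymbol b$, the projection of $\boldsymbol b$ onto $ColA$, and agrees with the minimum-norm solution returned by `np.linalg.lstsq`.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])
b = np.array([1.0, 0.0, 1.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U_r = U[:, :r]

x_hat = np.linalg.pinv(A) @ b    # x_hat = A^+ b
b_hat = U_r @ (U_r.T @ b)        # projection of b onto Col A
```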