The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions. Every matrix, even a nonsquare one, has an SVD. The SVD contains a great deal of information and is very useful as a theoretical and practical tool.

1 Preliminaries

Unless otherwise indicated, all vectors are column vectors:

$$u \in \mathbb{R}^n \iff u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \in \mathbb{R}^{n \times 1}.$$

Definition 1.1 Let $u \in \mathbb{R}^n$, so that $u = (u_1, u_2, \ldots, u_n)^T$. The (Euclidean) norm of $u$ is defined as

$$\|u\|_2 = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2} = \Big( \sum_{j=1}^{n} u_j^2 \Big)^{1/2}.$$

Definition 1.2 A vector $u \in \mathbb{R}^n$ is a unit vector, or normalized, if $\|u\|_2 = 1$.

Definition 1.3 Let $A = (a_{ij}) \in \mathbb{R}^{m \times n}$. The transpose of $A$ is the matrix $A^T = (a_{ji}) \in \mathbb{R}^{n \times m}$.

Example 1.4 [The matrix entries of this example are not recoverable.]

Definition 1.5 (Matrix Multiplication) Let $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$. Then the product $AB$ is defined element-wise as

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

and the matrix $AB \in \mathbb{R}^{m \times p}$.

Definition 1.6 Let $u, v \in \mathbb{R}^n$. Then the inner product of $u$ and $v$, written $\langle u, v \rangle$, is defined as

$$\langle u, v \rangle = \sum_{j=1}^{n} u_j v_j = u^T v.$$

Note that this notation permits us to write matrix multiplication as entry-wise inner products of the rows and columns of the matrices. If we denote the $i$th row of $A$ by ${}_iA$ and the $j$th column of $B$ by $B_j$, we have

$$(AB)_{ij} = \langle ({}_iA)^T, B_j \rangle = {}_iA \, B_j.$$

Example 1.7

$$\begin{pmatrix} 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & -2 \\ 6 & -3 \end{pmatrix} = \begin{pmatrix} 1(2) + 1(0) + 0(6) & 1(3) + 1(-2) + 0(-3) \\ 3(2) + 2(0) + 1(6) & 3(3) + 2(-2) + 1(-3) \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 12 & 2 \end{pmatrix}$$

Definition 1.8 Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if

$$\langle u, v \rangle = u^T v = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = 0.$$

If $u, v$ are orthogonal and both $\|u\|_2 = 1$ and $\|v\|_2 = 1$, then we say $u$ and $v$ are orthonormal.

Recall that the $n$-dimensional identity matrix is

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

We'll write $I$ for the identity matrix when the size is clear from the context.

Definition 1.9 A square matrix $Q \in \mathbb{R}^{n \times n}$ is orthogonal if $Q^T Q = I$.
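Definitions 1.5, 1.6 and 1.9 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides: the helper names (`transpose`, `inner`, `matmul`) and the rotation angle are illustrative assumptions. It forms each entry of a product as a row-column inner product and tests whether a plane rotation satisfies $Q^T Q = I$.

```python
import math

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def inner(u, v):
    """Inner product <u, v> = sum_j u_j v_j (Definition 1.6)."""
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """(AB)_ij is the inner product of row i of A with column j of B."""
    cols = transpose(B)
    return [[inner(row, col) for col in cols] for row in A]

# Hypothetical test matrix: a plane rotation by an arbitrary angle.
theta = 0.3
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# Definition 1.9: Q is orthogonal if Q^T Q equals the identity.
QtQ = matmul(transpose(Q), Q)
identity = [[1.0, 0.0], [0.0, 1.0]]
is_orthogonal = all(
    abs(QtQ[i][j] - identity[i][j]) < 1e-12
    for i in range(2) for j in range(2)
)
print(is_orthogonal)
```

The comparison uses a small tolerance rather than exact equality because the entries of `Q` are floating-point approximations of the cosine and sine.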
This definition means that the columns of an orthogonal matrix $Q$ are mutually orthogonal unit vectors in $\mathbb{R}^n$. Alternatively, the columns of $Q$ are an orthonormal basis for $\mathbb{R}^n$.

Now Definition 1.9 shows that $Q^T$ is the left-inverse of $Q$. But since matrix multiplication is associative, $Q^T$ is also the right-inverse (and hence the inverse) of $Q$. Indeed, let $P$ be a right-inverse of $Q$ (so that $QP = I$); then

$$(Q^T Q) P = Q^T (QP) \implies IP = Q^T I \implies P = Q^T.$$

The SVD is applicable to even nonsquare matrices with complex entries, but for clarity we will restrict our initial treatment to real square matrices.

2 Structure of the SVD

Definition 2.1 Let $A \in \mathbb{R}^{n \times n}$. Then the (full) singular value decomposition of $A$ is

$$A = U \Sigma V^T = \begin{pmatrix} U_1 & U_2 & \cdots & U_n \end{pmatrix} \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{pmatrix} \begin{pmatrix} (V_1)^T \\ (V_2)^T \\ \vdots \\ (V_n)^T \end{pmatrix}$$

where $U, V$ are orthogonal matrices and $\Sigma$ is diagonal. The $\sigma_i$ are the singular values of $A$, by convention arranged in nonincreasing order $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$; the columns of $U$ are termed left singular vectors of $A$; the columns of $V$ are called right singular vectors of $A$.

Since $U$ and $V$ are orthogonal matrices, the columns of each form orthonormal (mutually orthogonal, all of length 1) bases for $\mathbb{R}^n$. We can use these bases to illuminate the fundamental property of the SVD: for the equation $Ax = b$, the SVD makes every matrix diagonal by selecting the right bases for the range and domain.

Let $b, x \in \mathbb{R}^n$ such that $Ax = b$, and expand $b$ in the columns of $U$ and $x$ in the columns of $V$ to get $b' = U^T b$, $x' = V^T x$.
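Then we have

$$b = Ax \implies U^T b = U^T A x = U^T (U \Sigma V^T) x = (U^T U) \Sigma (V^T x) = I \Sigma x' = \Sigma x'$$

or, with $b' = U^T b$ and $x' = V^T x$,

$$b = Ax \iff b' = \Sigma x'.$$

Let $y \in \mathbb{R}^n$; then the action of left multiplication of $y$ by $A$ (computing $z = Ay$) is decomposed by the SVD into three steps:

$$z = Ay = (U \Sigma V^T) y = U \Sigma (V^T y) = U \Sigma c \quad (c := V^T y) \quad = U w \quad (w := \Sigma c)$$

$c = V^T y$ is the analysis step, in which the components of $y$, in the basis of $\mathbb{R}^n$ given by the columns of $V$, are computed.
$w = \Sigma c$ is the scaling step, in which the components $c_i$, $i \in \{1, 2, \ldots, n\}$, are dilated.
$z = U w$ is the synthesis step, in which $z$ is assembled by scaling each of the $\mathbb{R}^n$-basis vectors $u_i$ by $w_i$ and summing.

So how do we find the matrices $U$, $\Sigma$, and $V$ in the SVD of some $A \in \mathbb{R}^{n \times n}$? Since $V^T V = I = U^T U$, $A = U \Sigma V^T$ yields

$$AV = U \Sigma \tag{1}$$

and $U^T A = \Sigma V^T$ or, taking transposes,

$$A^T U = V \Sigma. \tag{2}$$

Or, for each $j \in \{1, 2, \ldots, n\}$,

$$A v_j = \sigma_j u_j \quad \text{from Equation 1} \tag{3}$$

$$A^T u_j = \sigma_j v_j \quad \text{from Equation 2} \tag{4}$$

Now we multiply Equation 3 by $A^T$ to get

$$A^T A v_j = A^T \sigma_j u_j = \sigma_j A^T u_j = \sigma_j^2 v_j.$$

So the $v_j$ are the eigenvectors of $A^T A$ with corresponding eigenvalues $\sigma_j^2$.

Note that, writing $A_j$ for the $j$th column of $A$, $(A^T A)_{ij} = (A_i)^T A_j$, or

$$A^T A = \begin{pmatrix} (A_1)^T A_1 & (A_1)^T A_2 & \cdots & (A_1)^T A_n \\ (A_2)^T A_1 & (A_2)^T A_2 & & \vdots \\ \vdots & & \ddots & \\ (A_n)^T A_1 & \cdots & & (A_n)^T A_n \end{pmatrix} \tag{5}$$

$A^T A$ is a matrix of inner products of columns of $A$, often called the Gram matrix of $A$. We'll see the Gram matrix again when considering applications.

Let's do an example. [The entries of $A$ and $A^T$ are not recoverable from this copy; the computations that follow determine the Gram matrix.]

$$A^T A = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

To find the eigenvectors $v$ and the corresponding eigenvalues $\lambda$ for $B := A^T A$, we solve

$$Bx = \lambda x \iff (B - \lambda I) x = 0$$

for $\lambda$ and $x$.

The standard technique for finding such $\lambda$ and $v$ is to first note that we are looking for the $\lambda$ that make the matrix

$$B - \lambda I = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} - \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix} = \begin{pmatrix} 3 - \lambda & 1 & 0 \\ 1 & 1 - \lambda & 0 \\ 0 & 0 & 2 - \lambda \end{pmatrix}$$

singular. This is most easily done by solving $\det(B - \lambda I) = 0$:

$$\begin{vmatrix} 3 - \lambda & 1 & 0 \\ 1 & 1 - \lambda & 0 \\ 0 & 0 & 2 - \lambda \end{vmatrix} = (3 - \lambda)(1 - \lambda)(2 - \lambda) - 2 + \lambda = -\lambda^3 + 6\lambda^2 - 10\lambda + 4 = 0$$

$$\sigma_1^2 = \lambda_1 = 2 + \sqrt{2}, \qquad \sigma_2^2 = \lambda_2 = 2, \qquad \sigma_3^2 = \lambda_3 = 2 - \sqrt{2}.$$

Now (for a gentle first step) we'll find a vector $v_2$ so that $A^T A v_2 = 2 v_2$. We do this by finding a basis for the nullspace of

$$A^T A - 2I = \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Certainly any vector of the form $(0, 0, t)^T$, $t \in \mathbb{R}$, is mapped to zero by $A^T A - 2I$. So we can set $v_2 = (0, 0, 1)^T$.

To find $v_1$ we find a basis for the nullspace of

$$A^T A - (2 + \sqrt{2}) I = \begin{pmatrix} 1 - \sqrt{2} & 1 & 0 \\ 1 & -1 - \sqrt{2} & 0 \\ 0 & 0 & -\sqrt{2} \end{pmatrix}$$

which row-reduces ($R_2 \leftarrow (1 + \sqrt{2}) R_1 + R_2$, then $R_3 \leftrightarrow R_2$) to

$$\begin{pmatrix} 1 - \sqrt{2} & 1 & 0 \\ 0 & 0 & -\sqrt{2} \\ 0 & 0 & 0 \end{pmatrix}.$$

So any vector of the form $(s, (-1 + \sqrt{2}) s, 0)^T$ is mapped to zero by $A^T A - (2 + \sqrt{2}) I$, so $\tilde{v}_1 = (1, -1 + \sqrt{2}, 0)^T$ spans the nullspace of $A^T A - \lambda_1 I$, but $\|\tilde{v}_1\| = \sqrt{4 - 2\sqrt{2}} \ne 1$. So we set

$$v_1 = \frac{\tilde{v}_1}{\|\tilde{v}_1\|} = \frac{1}{\sqrt{4 - 2\sqrt{2}}} \begin{pmatrix} 1 \\ \sqrt{2} - 1 \\ 0 \end{pmatrix}.$$

We could find $v_3$ in a similar manner, but in this particular case there's a quicker way:

$$v_3 = \begin{pmatrix} -(v_1)_2 \\ (v_1)_1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{4 - 2\sqrt{2}}} \begin{pmatrix} 1 - \sqrt{2} \\ 1 \\ 0 \end{pmatrix}.$$

Certainly $v_3 \perp v_2$ and by construction $v_3 \perp v_1$. Recall the theorem from linear algebra: eigenvectors of a symmetric matrix belonging to distinct eigenvalues are orthogonal.

We've found

$$V = \begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix} = \frac{1}{\sqrt{4 - 2\sqrt{2}}} \begin{pmatrix} 1 & 0 & 1 - \sqrt{2} \\ \sqrt{2} - 1 & 0 & 1 \\ 0 & \sqrt{4 - 2\sqrt{2}} & 0 \end{pmatrix}.$$

And of course

$$\Sigma = \begin{pmatrix} \sqrt{2 + \sqrt{2}} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2 - \sqrt{2}} \end{pmatrix}.$$

Now, how do we find $U$? If $\sigma_n \ne 0$, $\Sigma$ is invertible and

$$U = A V \Sigma^{-1}.$$

Figure 1: The columns of $A$ in the unit sphere
Figure 2: The columns of $U$ in the unit sphere
Figure 3: The columns of $V$ in the unit sphere
Figure 4: The columns of $\Sigma$ in the ellipse formed by $\Sigma$ acting on the unit sphere by left-multiplication
Figure 5: The columns of $AV = U\Sigma$ in the ellipse formed by $A$ acting on the unit sphere by left-multiplication

Note that the columns of $U$ and $V$ are orthogonal (as are, of course, the columns of $\Sigma$).

Note that in practice, the SVD is computed more efficiently than by the direct method we used here; usually by (OK, get ready for the gratuitous mathspeak) reducing $A$ to bidiagonal form $U_1 B V_1^T$ by elementary reflectors or Givens rotations and directly computing the SVD of $B$ ($= U_2 \Sigma V_2^T$). Then the SVD of $A$ is $(U_1 U_2) \Sigma (V_2^T V_1^T)$.

If $\sigma_n = 0$, then $A$ is singular and the entire process above must be modified slightly but carefully.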
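The eigenpairs found in the worked example above can be verified directly. This is a minimal sketch in plain Python, not part of the original slides: the Gram matrix is the one implied by the characteristic polynomial $-\lambda^3 + 6\lambda^2 - 10\lambda + 4$, the sign chosen for $v_3$ is an assumption (either sign gives a valid eigenvector), and the helper names `matvec` and `dot` are illustrative.

```python
import math

# Gram matrix A^T A implied by the characteristic polynomial in the example.
B = [[3.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 2.0]]

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

r2 = math.sqrt(2.0)
norm = math.sqrt(4.0 - 2.0 * r2)       # length of (1, sqrt(2) - 1, 0)
v1 = [1.0 / norm, (r2 - 1.0) / norm, 0.0]
v2 = [0.0, 0.0, 1.0]
v3 = [-v1[1], v1[0], 0.0]              # v1 rotated by 90 degrees in the plane of the first two coordinates

eigenpairs = [(2.0 + r2, v1), (2.0, v2), (2.0 - r2, v3)]

# Largest entry of B v - lambda v for each pair; each should be near zero.
residuals = [
    max(abs(bv - lam * vi) for bv, vi in zip(matvec(B, v), v))
    for lam, v in eigenpairs
]

# Singular values are the square roots of the eigenvalues of A^T A.
singular_values = [math.sqrt(lam) for lam, _ in eigenpairs]

# Matrix of pairwise inner products of v1, v2, v3; should be close to I.
gram_of_V = [[dot(u, v) for _, v in eigenpairs] for _, u in eigenpairs]

print(residuals)
print(singular_values)
```

The residuals are compared against a tolerance rather than zero because the square roots are only approximated in floating point.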
If $r$ is the rank of $A$ (the number of nonzero rows of the row-echelon form of $A$), then $n - r$ singular values of $A$ are zero (equivalently, there are $n - r$ zero rows in the row-echelon form of $A$), so $\Sigma^{-1}$ is not defined, and we define the pseudo-inverse $\Sigma^+$ of $\Sigma$ as

$$\Sigma^+ = \mathrm{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0).$$

Thus we can define the first $r$ columns of $U$ via $A V \Sigma^+$, and to complete $U$ we choose any $n - r$ orthonormal vectors which are also orthogonal to $\mathrm{span}\{u_1, u_2, \ldots, u_r\}$, via, for example, Gram-Schmidt.

Recall that the SVD is defined for even nonsquare matrices. In this case, the above process is modified to permit $U$ and $V$ to have different sizes. If $A \in \mathbb{R}^{m \times n}$, then

$$U \in \mathbb{R}^{m \times m}, \qquad \Sigma \in \mathbb{R}^{m \times n}, \qquad V \in \mathbb{R}^{n \times n}.$$

In the case $m \ge n$:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} u_{11} & \cdots & u_{1n} & \cdots & u_{1m} \\ \vdots & & \vdots & & \vdots \\ u_{m1} & \cdots & u_{mn} & \cdots & u_{mm} \end{pmatrix} \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{n1} & \cdots & v_{nn} \end{pmatrix}^T$$

or, in another incarnation of the SVD (the reduced SVD),

$$A = \begin{pmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mn} \end{pmatrix} \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{pmatrix} \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{n1} & \cdots & v_{nn} \end{pmatrix}^T$$

where the matrix $U$ is no longer square (so it can't be orthogonal) but still has orthonormal columns.

If $m \le n$:

$$A = \begin{pmatrix} u_{11} & \cdots & u_{1m} \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mm} \end{pmatrix} \begin{pmatrix} \sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sigma_m & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} v_{11} & \cdots & v_{1m} & \cdots & v_{1n} \\ \vdots & & \vdots & & \vdots \\ v_{n1} & \cdots & v_{nm} & \cdots & v_{nn} \end{pmatrix}^T$$

in which case the reduced SVD is

$$A = \begin{pmatrix} u_{11} & \cdots & u_{1m} \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mm} \end{pmatrix} \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_m \end{pmatrix} \begin{pmatrix} v_{11} & \cdots & v_{1m} \\ \vdots & & \vdots \\ v_{n1} & \cdots & v_{nm} \end{pmatrix}^T.$$

3 Properties of the SVD

Recall that $r$, the rank of $A$, is the number of nonzero singular values of $A$.

$$\mathrm{range}(A) = \mathrm{span}\{u_1, u_2, \ldots, u_r\}$$
$$\mathrm{range}(A^T) = \mathrm{span}\{v_1, v_2, \ldots, v_r\}$$
$$\mathrm{null}(A) = \mathrm{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$$
$$\mathrm{null}(A^T) = \mathrm{span}\{u_{r+1}, u_{r+2}, \ldots, u_m\}$$

For $A \in \mathbb{R}^{n \times n}$,

$$|\det A| = \prod_{i=1}^{n} \sigma_i.$$

The SVD of an $m \times n$ matrix $A$ leads to an easy proof that the image of the unit sphere $S^{n-1}$ under left-multiplication by $A$ is a hyperellipse with principal semiaxes of length $\sigma_1, \sigma_2, \ldots, \sigma_n$.

The condition number of an $m \times n$ matrix $A$, with $m \ge n$, is

$$\kappa(A) = \frac{\sigma_1}{\sigma_n}.$$

Used in numerics, $\kappa(A)$ is a measure of how close $A$ is to being singular with respect to floating-point computation.

The 2-norm of $A$ is

$$\|A\|_2 := \sup \{ \|Ax\|_2 : \|x\|_2 = 1 \}.$$

The Frobenius norm of $A$ is

$$\|A\|_F := \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \Big)^{1/2}.$$

We have $\|A\|_2 = \sigma_1$ and $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$, since both matrix norms are invariant under orthogonal transformations (multiplication by orthogonal matrices).

Note that although the singular values of $A$ are uniquely determined, the left and right singular vectors are only determined up to a sequence of sign choices for the columns of either $U$ or $V$. So the SVD is not generally unique: there are $2^{\max(m,n)}$ possible SVDs for a given matrix $A$. If we fix signs for, say, column 1 of $V$, then the sign for column 1 of $U$ is determined; recall $AV = U\Sigma$.

4 Symmetric Orthogonalization

For nonsingular $A$, the matrix $L := U V^T$ is called the symmetric orthogonalization of the matrix $A$. $L$ is unique since any sequence of sign choices for the columns of $V$ determines a sequence of signs for the columns of $U$.

$$L_{ij} = U_{i1} (V^T)_{1j} + U_{i2} (V^T)_{2j} + U_{i3} (V^T)_{3j} + \cdots + U_{in} (V^T)_{nj} = U_{i1} V_{j1} + U_{i2} V_{j2} + U_{i3} V_{j3} + \cdots + U_{in} V_{jn}$$

Like Gram-Schmidt orthogonalization, it takes as input a linearly independent set (the columns of $A$) and outputs an orthonormal set. (Classical) Gram-Schmidt is unstable due to repeated subtractions; Modified Gram-Schmidt remedies this. But occasionally we want to disturb the original set of vectors as little as possible.

Theorem 4.1 Over all orthogonal matrices $Q$, $\|A - Q\|_F$ is minimized when $Q = L$.

Figure 6: The columns of $L := U V^T$ and the columns of $A$

5 Applications of the SVD

Symmetric orthogonalization was invented by a Swedish chemist, Per-Olov Löwdin, for the purpose of orthogonalizing hybrid electron orbitals. It also has an application in Orthogonal Frequency-Division Multiplexing (OFDM), used in the 4G wireless communication standard. Nonorthogonal carrier waves with ideal properties (good time-frequency localization), when orthogonalized in this manner, have maximal TF-localization among all orthogonal carriers.

Carrier waves are continuous (complex-valued) functions and not matrices, but there is an inner product defined for pairs of carrier waves via integration. With that inner product, the Gram matrix of the set of carrier waves can be computed. The symmetrically orthogonalized Gram matrix is then used to provide coefficients for linear combinations of the carrier waves. These linear combinations are orthogonal (hence suitable for OFDM) and optimally TF-localized.

The SVD also has a natural application to finding the least squares solution to $Ax = b$ (i.e., a vector $x$ with minimal $\|Ax - b\|_2$) where $Ax = b$ is inconsistent (e.g., $A \in \mathbb{R}^{m \times n}$, $m > n$, $r = n$).

But perhaps the most visually striking property of the SVD comes from an application in image compression. We can rewrite $\Sigma$ as

$$\Sigma = \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{pmatrix} = \underbrace{\begin{pmatrix} \sigma_1 & & \\ & 0 & \\ & & \ddots \end{pmatrix}}_{\Sigma_1} + \underbrace{\begin{pmatrix} 0 & & \\ & \sigma_2 & \\ & & \ddots \end{pmatrix}}_{\Sigma_2} + \cdots + \underbrace{\begin{pmatrix} \ddots & & \\ & 0 & \\ & & \sigma_n \end{pmatrix}}_{\Sigma_n} = \Sigma_1 + \Sigma_2 + \cdots + \Sigma_n$$

where $\Sigma_j$ has $\sigma_j$ in position $(j, j)$ and zeros elsewhere. Now consider the SVD

$$A = \begin{pmatrix} U_1 & U_2 & \cdots & U_n \end{pmatrix} (\Sigma_1 + \Sigma_2 + \cdots + \Sigma_n) \begin{pmatrix} (V_1)^T \\ (V_2)^T \\ \vdots \\ (V_n)^T \end{pmatrix}$$

and focus on, say, the first term:

$$\begin{pmatrix} U_1 & U_2 & \cdots & U_n \end{pmatrix} \Sigma_1 \begin{pmatrix} (V_1)^T \\ (V_2)^T \\ \vdots \\ (V_n)^T \end{pmatrix} = \begin{pmatrix} \sigma_1 U_1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} (V_1)^T \\ (V_2)^T \\ \vdots \\ (V_n)^T \end{pmatrix} = \sigma_1 U_1 (V_1)^T.$$

In general $U \Sigma_k V^T = \sigma_k U_k (V_k)^T$. So

$$A = \sum_{j=1}^{n} \sigma_j U_j (V_j)^T$$

which is an expression of $A$ as a sum of rank-one matrices.

In this representation of $A$, we can consider partial sums. For any $k$ with $1 \le k \le n$, define

$$A^{(k)} = \sum_{j=1}^{k} \sigma_j U_j (V_j)^T.$$

This amounts to discarding the smallest $n - k$ singular values and their corresponding singular vectors, and storing only the $V_j$ and the $\sigma_j U_j$.

Theorem 5.1 Among all rank-$k$ matrices $P$, $\|A - P\|_F$ is minimized for $P = A^{(k)}$.

Theorem 5.1 says that the $k$th partial sum of $A^{(n)} = A$ captures as much of the energy of $A$ as possible.

Example Consider the 320-by-200-pixel image below. This is stored as a matrix of grayscale values, between 0 (black) and 1 (white), denoted by $A_{\text{clown}}$. We can take the SVD of $A_{\text{clown}}$. By Theorem 5.1, $A^{(k)}_{\text{clown}}$ is the best rank-$k$ approximation to $A_{\text{clown}}$, measured by the Frobenius norm.

Storage required for $A^{(k)}_{\text{clown}}$ is a total of $(320 + 200) \cdot k$ bytes for storing $\sigma_1 u_1$ through $\sigma_k u_k$ and $v_1$ through $v_k$, compared with $320 \cdot 200 = 64{,}000$ bytes required to store $A_{\text{clown}}$ explicitly.

Now consider the rank-20 approximation to the original image, and the difference between the images.

Figure 7: Rank-20 approximation $A^{(20)}_{\text{clown}}$ and $A - A^{(20)}_{\text{clown}}$

The original image took 64 kB, while the low-rank approximation required $(320 + 200) \cdot 20 = 10.4$ kB, a compression ratio of 0.1625.

The SVD can also make you rich, but that's a topic for another time...

For further investigation, see
Numerical Linear Algebra by Trefethen
Applied Numerical Linear Algebra by Demmel
Matrix Analysis by Horn and Johnson
Matrix Computations by Golub and Van Loan
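Returning to the rank-one expansion and the storage count in the image-compression example, the following is a minimal sketch in plain Python, not part of the original slides. The 2-by-2 singular vectors and singular values are hypothetical values chosen for illustration, the function names are assumptions, and the storage count assumes one byte per stored number, as the example does.

```python
import math

def outer(u, v, scale=1.0):
    """Scaled outer product: scale * u v^T as a list of rows."""
    return [[scale * ui * vj for vj in v] for ui in u]

def rank_k_sum(sigmas, U_cols, V_cols, k):
    """Partial sum A^(k) = sum over j <= k of sigma_j U_j (V_j)^T."""
    m, n = len(U_cols[0]), len(V_cols[0])
    A = [[0.0] * n for _ in range(m)]
    for s, u, v in list(zip(sigmas, U_cols, V_cols))[:k]:
        term = outer(u, v, s)
        A = [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(A, term)]
    return A

def frobenius(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def storage_bytes(m, n, k):
    """Numbers stored for A^(k): k vectors sigma_j u_j (length m) and k vectors v_j (length n)."""
    return (m + n) * k

# Hypothetical orthonormal singular vectors (stored column by column) and singular values.
U_cols = [[0.6, 0.8], [-0.8, 0.6]]
V_cols = [[1.0, 0.0], [0.0, 1.0]]
sigmas = [3.0, 1.0]

A = rank_k_sum(sigmas, U_cols, V_cols, 2)    # full sum reproduces A
A1 = rank_k_sum(sigmas, U_cols, V_cols, 1)   # best rank-1 approximation
diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, A1)]
error = frobenius(diff)                      # equals the discarded sigma_2

compressed = storage_bytes(320, 200, 20)
ratio = compressed / (320 * 200)
print(error, compressed, ratio)
```

In this small case the Frobenius error of the rank-1 partial sum equals the one discarded singular value; in general it is the square root of the sum of squares of all discarded singular values.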