Unit II: Least Squares, Determinants and Eigenvalues
14 Orthogonal Vectors and Subspaces(正交向量与子空间)
The row space of a matrix’ is orthogonal to its nullspace, and its column space is orthogonal to its left nullspace.
row space column space dimension r dimension r ⊥ ⊥ nullspace left nullspace N ( A T ) dimension n − r dimension m − r \begin{array}{cc}\text{row space}&\text{column space}\\\text{dimension }r&\text{dimension }r\\\bot&\bot\\\text{nullspace}&\text{left nullspace}N(A^T)\\\text{dimension }n-r&\text{dimension }m-r\end{array}
row space dimension r ⊥ nullspace dimension n − r column space dimension r ⊥ left nullspace N ( A T ) dimension m − r
Orthogonal vectors
正交 (orthogonal)是垂直(perpendicular)的另一种说法
两向量正交则点积为0,x T y = 0 x^Ty=0 x T y = 0 (可由勾股定理Pythagorean theorem证明)
所有向量都与零向量正交
Orthogonal subspaces
子空间S S S 与T T T 正交意味着S S S 中的任意向量与T T T 中的任意向量均正交
仅包含零向量的子空间与其他子空间正交
行空间与零空间正交,列空间与左零空间正交 (根据A x = 0 Ax=0 A x = 0 ,y T A = 0 y^TA=0 y T A = 0 显然可得)
零空间包含所有与行空间正交的向量,是行空间的正交补集 ,反之亦然
N ( A T A ) = N ( A ) N(A^TA) = N(A) N ( A T A ) = N ( A )
当A x = b Ax = b A x = b 无解时,转变为通过求解A T A x ^ = A T b A^TA\hat{x}=A^Tb A T A x ^ = A T b ,求解原方程的最优解
A T A A^TA A T A 是一个n n n 阶对称方阵,当A A A 的列向量线性无关时,A T A A^TA A T A 可逆
N ( A T A ) = N ( A ) N(A^TA) = N(A) N ( A T A ) = N ( A ) ,r ( A T A ) = r ( A ) r(A^TA) = r(A) r ( A T A ) = r ( A )
15 Projections onto Subspaces(子空间投影)
As we know, the equation A x = b Ax = b A x = b may have no solution. The vector A x Ax A x is always in the column space of A A A , and b b b is unlikely to be in the column space. So, we project b b b onto a vector p p p in the column space of A A A and solve A x ^ = p A\hat x = p A x ^ = p .
Projection matrix
对于一维的情况 —— 将向量b b b 投影到向量a a a 所在直线上:
由上图可知,投影向量p p p 即为所求,设p = x a p=xa p = x a ,a a a 与b − p b-p b − p 正交,则a T ( b − x a ) = 0 a^T(b-xa)=0 a T ( b − x a ) = 0 ,x a T a = a T b xa^Ta=a^Tb x a T a = a T b ,x = a T b a T a x=\frac{a^Tb}{a^Ta} x = a T a a T b ,所以p = a x = a a T b a T a p=ax=a\frac{a^Tb}{a^Ta} p = a x = a a T a a T b
可以将投影变换表示为矩阵形式 —— p = P b p=Pb p = P b ,由于p = a x = a a T b a T a = a a T a T a b p=ax=a\frac{a^Tb}{a^Ta}=\frac{aa^T}{a^Ta}b p = a x = a a T a a T b = a T a a a T b ,所以投影矩阵 P = a a T a T a P=\frac{aa^T}{a^Ta} P = a T a a a T
投影矩阵P P P 是一个秩1的对称矩阵,其列空间即为a a a 所在直线,两条重要性质 —— P T = P P^T=P P T = P ,P 2 = P P^2=P P 2 = P
Why project?
A x = b Ax=b A x = b 可能无解,而A x Ax A x 一定在A A A 的列空间内,此时我们可以将b b b 投影到A A A 的列空间,进而求解近似解A x ^ = p A\hat{x}=p A x ^ = p
Projection in higher dimensions
在R 3 R^3 R 3 中,将向量b b b 投影到平面,该平面的基为a 1 , a 2 a_1,a_2 a 1 , a 2 ,则该平面为A = [ a 1 a 2 ] A=\begin{bmatrix} a_1&a_2 \end{bmatrix} A = [ a 1 a 2 ] 的列空间
投影向量p = x ^ 1 a 1 + x ^ 2 a 2 = A x ^ p = \hat x_1a_1 + \hat x_2a_2 = A\hat x p = x ^ 1 a 1 + x ^ 2 a 2 = A x ^ ,b − p b-p b − p 与平面的两个基正交,则A T ( b − A x ^ ) = 0 A^T(b-A\hat x)=0 A T ( b − A x ^ ) = 0 (一些联想:b − A x ^ b-A\hat x b − A x ^ 在A A A 的左零空间中,故正交于A A A 的列空间),A T A x ^ = A T b A^TA\hat x=A^Tb A T A x ^ = A T b ,x ^ = ( A T A ) − 1 A T b \hat x=(A^TA)^{-1}A^Tb x ^ = ( A T A ) − 1 A T b
p = A x ^ = A ( A T A ) − 1 A T b p=A\hat x=A(A^TA)^{-1}A^Tb p = A x ^ = A ( A T A ) − 1 A T b ,所以投影矩阵P = A ( A T A ) − 1 A T P=A(A^TA)^{-1}A^T P = A ( A T A ) − 1 A T
两条重要性质P T = P P^T=P P T = P ,P 2 = P P^2=P P 2 = P 保持不变
Least Squares
给定一个点集{ ( t , b ) } \{(t,b)\} {( t , b )} ,拟合一条与点集最接近的直线b = C + D t b=C+Dt b = C + D t —— 最小二乘 (Least Squares)
例如,点集为{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 2 ) } \{(1, 1), (2, 2), (3, 2)\} {( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 2 )} ,则
[ 1 1 1 2 1 3 ] [ C D ] = [ 1 2 2 ] \left[\begin{array}{cc}1&1\\1&2\\1&3\end{array}\right]\quad\left[\begin{array}{c}C\\D\end{array}\right]\quad=\quad\left[\begin{array}{c}1\\2\\2\end{array}\right] 1 1 1 1 2 3 [ C D ] = 1 2 2
A x = b Ax=b A x = b ,拟合直线并不经过所有点,所以方程组无解,为求最小二乘最优解,改为计算A T A x ^ = A T b A^TA\hat x=A^Tb A T A x ^ = A T b
16 Projection Matrices and Least Squares(投影矩阵,最小二乘)
By “closest” line we mean one that minimizes the error represented by the distance from the points to the line. We measure that error by adding up the squares of these distances.
Projections
P = A ( A T A ) − 1 A T P=A(A^TA)^{-1}A^T P = A ( A T A ) − 1 A T 是将b b b 投影到A A A 的列空间的投影矩阵,如果b b b 在A A A 列空间中,则P b = b Pb=b P b = b ;如果b b b 与A A A 列空间正交(在A A A 左零空间N ( A T ) N(A^T) N ( A T ) 中),P b = 0 Pb=0 P b = 0
b b b 在A A A 列空间上的投影矩阵为P P P ,则b b b 在A A A 左零空间N ( A T ) N(A^T) N ( A T ) 上的投影矩阵为I − P I-P I − P ,b = P b + ( I − P ) b b=Pb+(I-P)b b = P b + ( I − P ) b ,b b b 为两个投影向量之和
Least squares
A x = b Ax=b A x = b 无解,求解A x ^ = p A\hat x=p A x ^ = p (A T A x ^ = A T b A^TA\hat x=A^Tb A T A x ^ = A T b ),p p p 为b b b 在A A A 列空间上的投影向量,e = b − p e=b-p e = b − p 为误差向量(同时也是b b b 在N ( A T ) N(A^T) N ( A T ) 上的投影向量),故e e e 与p p p 正交,同时e e e 也与A A A 的列空间正交
最小二乘的优化目标是最小化∣ ∣ e ∣ ∣ 2 = ∣ ∣ A x − b ∣ ∣ 2 ||e||^2=||Ax-b||^2 ∣∣ e ∣ ∣ 2 = ∣∣ A x − b ∣ ∣ 2
b b b 中各元素对应于点集中各点的纵坐标,p p p 中各元素对应于拟合直线上各点的纵坐标,e e e 中各元素对应各点的残差
The matrix A T A A^TA A T A
"A T A A^TA A T A 是一个n n n 阶对称方阵,当A A A 的列向量线性无关时,A T A A^TA A T A 可逆"的证明:只需证明当A T A x = 0 A^TAx=0 A T A x = 0 ,一定有x = 0 x=0 x = 0 —— 当A T A x = 0 A^TAx=0 A T A x = 0 ,x T A T A x = 0 x^TA^TAx=0 x T A T A x = 0 ,即( A x ) T ( A x ) = 0 (Ax)^T(Ax)=0 ( A x ) T ( A x ) = 0 ,则一定有A x = 0 Ax=0 A x = 0 ,由于A A A 列满秩,则一定有x = 0 x=0 x = 0 ,证毕
只要A A A 的列向量无关 ,就可以使用线性回归(最小二乘)求解A x = b Ax=b A x = b 的近似解,所以重点在于确保A A A 的列向量无关,一种方式是使A A A 的列向量正交
17 Orthogonal Matrices and Gram-Schmidt(正交矩阵,施密特正交化)
Using an orthonormal basis or a matrix with orthonormal columns makes calculations much easier. The Gram-Schmidt process starts with any basis and produces an orthonormal basis that spans the same space as the original basis.
Orthonormal matrix
标准正交向量 (orthonormal vectors)q 1 , q 2 , . . . , q n q_1,q_2,...,q_n q 1 , q 2 , ... , q n —— 单位化+相互正交,可以表示为:
q i T q j = { 0 if i ≠ j 1 if i = j q_i^Tq_j=\left\{\begin{array}{ll}0&\text{if}\ i\neq j\\1&\text{if}\ i=j\end{array}\right. q i T q j = { 0 1 if i = j if i = j
标准正交向量必线性无关
正交矩阵 (需要是方阵)Q = [ q 1 . . . q n ] Q=\left[\begin{array}{cccc}{q_{1}}&{...}&{q_{n}}\\\end{array}\right] Q = [ q 1 ... q n ] ,列向量为标准正交基,Q T Q = I Q^TQ=I Q T Q = I ,Q T = Q − 1 Q^T=Q^{-1} Q T = Q − 1
对于P = A ( A T A ) − 1 A T P=A(A^TA)^{-1}A^T P = A ( A T A ) − 1 A T ,如果A A A 为列向量是标准正交基的矩阵Q Q Q ,则P = Q Q T P=QQ^T P = Q Q T ,如果Q Q Q 为方阵,则P = I P=I P = I
对于A T A x ^ = A T b A^TA\hat x=A^Tb A T A x ^ = A T b ,如果A A A 为正交矩阵Q Q Q ,则Q T Q x ^ = Q T b Q^TQ\hat x=Q^Tb Q T Q x ^ = Q T b ,由于Q T Q = I Q^TQ=I Q T Q = I ,所以x ^ = Q T b \hat x=Q^Tb x ^ = Q T b ,q i T b q_i^Tb q i T b 即为x x x 在正交基q i q_i q i 方向上的投影,正交矩阵大大简化了计算
Gram-Schmidt
如有3个不相关的变量a , b , c a,b,c a , b , c ,设求得的正交向量为A , B , C A,B,C A , B , C ,标准正交基为q 1 = A ∣ ∣ A ∣ ∣ , q 2 = B ∣ ∣ B ∣ ∣ , q 3 = C ∣ ∣ C ∣ ∣ q_1=\frac{A}{||A||},q_2=\frac{B}{||B||},q_3=\frac{C}{||C||} q 1 = ∣∣ A ∣∣ A , q 2 = ∣∣ B ∣∣ B , q 3 = ∣∣ C ∣∣ C ,计算方式如下:
A = a A=a A = a
B = b − A T b A T A A B=b-\frac{A^Tb}{A^TA}A B = b − A T A A T b A ,b b b 减掉在A A A 方向上的投影,得到与A A A 正交的向量
C = c − A T c A T A A − B T c B T B B C=c-\frac{A^Tc}{A^TA}A-\frac{B^Tc}{B^TB}B C = c − A T A A T c A − B T B B T c B ,c c c 减掉在A , B A,B A , B 方向上的投影,得到与A , B A,B A , B 正交的向量
消元法以矩阵运算的形式表示为A = L U A=LU A = LU ,施密特正交化也可以用矩阵运算的形式表示出来 —— A = Q R A=QR A = QR ,R R R 是一个上三角矩阵,R = Q T A R=Q^TA R = Q T A
18 Properties of Determinants(行列式及其性质)
The determinant encodes a lot of information about the matrix; the matrix is invertible exactly when the determinant is non-zero.
Determinants
每个方阵都有一个度量 —— 行列式(determinant)det A , ∣ A ∣ \det A,|A| det A , ∣ A ∣ ,行列式中包含了很多该方阵的信息,包括可逆、特征值等
Properties
二阶行列式的计算 —— 对角线法则∣ a b c d ∣ = a d − b c \left|\begin{array}{cc}a&b\\c&d\end{array}\right|=ad-bc a c b d = a d − b c
行列式的一系列性质:
det I = 1 \det I=1 det I = 1
交换矩阵的两行/两列,行列式取其相反数(置换矩阵的行列式为± 1 \pm1 ± 1 )
∣ t a t b c d ∣ = t ∣ a b c d ∣ \left|\begin{array}{cc}ta&tb\\c&d\end{array}\right|=t\left|\begin{array}{cc}a&b\\c&d\end{array}\right| t a c t b d = t a c b d ,∣ a + a ′ b + b ′ c d ∣ = ∣ a b c d ∣ + ∣ a ′ b ′ c d ∣ \left|\begin{array}{cc}a+a'&b+b'\\c&d\end{array}\right|=\left|\begin{array}{cc}a&b\\c&d\end{array}\right|+\left|\begin{array}{cc}a'&b'\\c&d\end{array}\right| a + a ′ c b + b ′ d = a c b d + a ′ c b ′ d
如果矩阵中有两行/两列相等,行列式为0
消法变换不改变行列式
如果矩阵中有一行/列全为0,行列式为0
上/下三角矩阵的行列式为各行主元(对角元)的乘积(product of pivots)
奇异矩阵的行列式为0(充要 )
det A B = ( det A ) ( det B ) \det AB=(\det A)(\det B) det A B = ( det A ) ( det B ) ,det A − 1 = 1 det A \det A^{-1}=\frac{1}{\det A} det A − 1 = d e t A 1
det A T = det A \det A^T=\det A det A T = det A (A = L U , ∣ A T ∣ = ∣ U T L T ∣ = ∣ L T ∣ ∣ U T ∣ = ∣ L ∣ ∣ U ∣ = ∣ A ∣ A=LU,|A^T|=|U^TL^T|=|L^T||U^T|=|L||U|=|A| A = LU , ∣ A T ∣ = ∣ U T L T ∣ = ∣ L T ∣∣ U T ∣ = ∣ L ∣∣ U ∣ = ∣ A ∣ )
It’s time to learn some (rather messy) formulas for computing it.
利用性质三将行列式分解为n ! n! n ! 个非零的类置换矩阵 (每行每列只保留一个非零元素)之和,从而得到行列式的计算公式 —— det A = ∑ n ! t e r m s ± a 1 α a 2 β a 3 γ . . . a n ω \det A=\sum_{n!~terms}\pm a_{1\alpha}a_{2\beta}a_{3\gamma}...a_{n\omega} det A = ∑ n ! t er m s ± a 1 α a 2 β a 3 γ ... a nω ,( α , β , γ , . . . ω ) (\alpha,\beta,\gamma,...\omega) ( α , β , γ , ... ω ) 是( 1 , 2 , 3 , . . . , n ) (1, 2, 3, ..., n) ( 1 , 2 , 3 , ... , n ) 的置换,正负号由置换的次数(逆序数)决定
代数余子式 (Cofactor)旨在用小的矩阵的行列式来描述(计算)一个大的矩阵的行列式
a i j a_{ij} a ij 的代数余子式C i j C_{ij} C ij 的绝对值等于将矩阵第i i i 行和第j j j 列抹去后得到的n − 1 n-1 n − 1 阶矩阵的行列式,其正负取决于i + j i+j i + j 的奇偶性(奇负偶正)
利用代数余子式求行列式 —— det A = a 11 C 11 + a 12 C 12 + ⋯ + a 1 n C 1 n \det A=a_{11}C_{11}+a_{12}C_{12}+\cdots+a_{1n}C_{1n} det A = a 11 C 11 + a 12 C 12 + ⋯ + a 1 n C 1 n
Tridiagonal matrix
三对角矩阵 (Tridiagonal matrix)指非零元素仅位于对角线上或邻近对角线的矩阵,如A 4 = [ 1 1 0 0 1 1 1 0 0 1 1 1 0 0 1 1 ] A_4=\left[\begin{array}{cccc}1&1&0&0\\1&1&1&0\\0&1&1&1\\0&0&1&1\end{array}\right] A 4 = 1 1 0 0 1 1 1 0 0 1 1 1 0 0 1 1
重要规律 :∣ A n ∣ = ∣ A n − 1 ∣ − ∣ A n − 2 ∣ |A_n|=|A_{n-1}|-|A_{n-2}| ∣ A n ∣ = ∣ A n − 1 ∣ − ∣ A n − 2 ∣ (利用代数余子式展开证明)
20 Cramer’s Rule, Inverse Matrix, and Volume(克莱姆法则,逆矩阵,体积)
If you know the coordinates for the corners of a box, then computing the volume of the box is as easy as calculating a determinant.
A − 1 = 1 det A C T A^{-1}=\frac{1}{\det A}C^T A − 1 = d e t A 1 C T ,C C C 是A A A 的代数余子式按下标顺序构成的矩阵,C T C^T C T 也被称为伴随矩阵 (通过A C T = ( det A ) I AC^{T}=(\det A)I A C T = ( det A ) I 证明)
Cramer’s Rule for x = A − 1 b x = A^{−1}b x = A − 1 b
A x = b Ax=b A x = b ,则x = A − 1 b = 1 det A C T b x=A^{-1}b=\frac{1}{\det A}C^Tb x = A − 1 b = d e t A 1 C T b ,x x x 中的每个元素可以表示为x i = det B i det A x_i=\frac{\det B_i}{\det A} x i = d e t A d e t B i ,B i B_i B i 为A A A 矩阵第i i i 列用b b b 替换掉后得到的矩阵(对于代数余子式按列展开的构造),这种通过计算n + 1 n+1 n + 1 个行列式来解方程组的方式,即为克莱姆法则 (Cramer’s Rule)
消元法的效率更高,但克莱姆法则为解方程组提供了一种新的视角
∣ det A ∣ |\det A| ∣ det A ∣ = volume of box
∣ det A ∣ |\det A| ∣ det A ∣ 是各边为A A A 列向量的长方体盒子的体积(证明该长方体的体积与行列式具有相同的性质,即可证得二者等价),对于二维则行列式表示面积,利用行列式的这一意义,可以简化面积/体积的计算
21 Eigenvalues and Eigenvectors(特征值,特征向量)
Once we’ve found an eigenvalue λ \lambda λ , we can use elimination to find the nullspace of A − λ I A-\lambda I A − λ I . The vectors in that nullspace are eigenvectors of A with eigenvalue λ \lambda λ .
Eigenvectors and eigenvalues
矩阵A A A 可看作是一种变换(函数),作用于非零向量x x x 后得到A x Ax A x ,如果A x Ax A x 平行于x x x ,则x x x 为A A A 的特征向量 ,A x = λ x Ax=\lambda x A x = λ x ,λ \lambda λ 为该特征向量对应的特征值
当λ = 0 \lambda=0 λ = 0 ,A x = 0 Ax=0 A x = 0 ,特征值0 0 0 (如果存在)对应的特征向量构成A A A 的零空间;如果A A A 是奇异矩阵,则A x = 0 Ax=0 A x = 0 有非零解,则λ = 0 \lambda=0 λ = 0 为A A A 的特征值
对于一个投影矩阵P P P ,一定存在对应特征值为1的特征向量,如果P P P 列空间不满秩,在P P P 的左零空间中,也一定存在对应特征值为0的特征向量,与P P P 的列空间正交
det ( A − λ I ) = 0 \det(A-\lambda I)=0 det ( A − λ I ) = 0
A x = λ x Ax=\lambda x A x = λ x ,( A − λ I ) x = 0 (A-\lambda I)x=0 ( A − λ I ) x = 0 ,若x ≠ 0 x\neq0 x = 0 ,( A − λ I ) (A-\lambda I) ( A − λ I ) 为奇异矩阵,则∣ A − λ I ∣ = 0 |A-\lambda I|=0 ∣ A − λ I ∣ = 0 ,这就是A A A 的特征方程 ,由此可以求出A A A 的n n n 个特征值(可能存在重根),进一步求解( A − λ I ) x = 0 (A-\lambda I)x=0 ( A − λ I ) x = 0 得到对应的特征向量x x x
A A A 的特征值为λ \lambda λ ,特征向量为x x x ,则A + k I A+kI A + k I 的特征值为λ + k \lambda+k λ + k ,特征向量仍为x x x (( A + k I ) x = A x + k x = ( λ + k ) x (A+kI)x=Ax+kx=(\lambda+k)x ( A + k I ) x = A x + k x = ( λ + k ) x )
矩阵的行列式等于特征值之积,迹 (trace)等于特征值之和
22 Diagonalization and Powers of A A A (对角化,A A A 的幂)
When a sequence evolves over time according to the rules of a first order system, the eigenvalues of the matrix of that system determine the long term behavior of the series. To get an exact formula for the series we find the eigenvectors of the matrix and then solve for the coefficients c 1 c_1 c 1 , c 2 c_2 c 2 , …
Diagonalizing a matrix S − 1 A S = Λ S^{-1}AS=\Lambda S − 1 A S = Λ
如果A A A 有n n n 个线性无关的特征向量x i x_i x i ,S = [ x 1 x 2 ⋯ x n ] S=[\begin{array}{ccccc}x_{1}&x_{2}&\cdots&x_{n}\end{array}] S = [ x 1 x 2 ⋯ x n ] ,则S S S 为可逆的方阵,所以:
A S = A [ x 1 x 2 ⋯ x n ] = [ λ 1 x 1 λ 2 x 2 ⋯ λ n x n ] = S [ λ 1 0 ⋯ 0 0 λ 2 0 ⋮ ⋱ ⋮ 0 ⋯ 0 λ n ] = S Λ \begin{aligned}
AS=A\left[\begin{array}{ccccc}x_{1}&x_{2}&\cdots&x_{n}\\\end{array}\right]=\left[\begin{array}{ccccc}\lambda_{1}x_{1}&\lambda_{2}x_{2}&\cdots&\lambda_{n}x_{n}\end{array}\right]=S\left[\begin{array}{cccc}\lambda_1&0&\cdots&0\\0&\lambda_2&&0\\\vdots&&\ddots&\vdots\\0&\cdots&0&\lambda_n\end{array}\right]=S\Lambda
\end{aligned} A S = A [ x 1 x 2 ⋯ x n ] = [ λ 1 x 1 λ 2 x 2 ⋯ λ n x n ] = S λ 1 0 ⋮ 0 0 λ 2 ⋯ ⋯ ⋱ 0 0 0 ⋮ λ n = S Λ 故S − 1 A S = Λ S^{-1}AS=\Lambda S − 1 A S = Λ ,A = S Λ S − 1 A=S\Lambda S^{-1} A = S Λ S − 1 ,所以A n x = λ n x A^nx=\lambda^nx A n x = λ n x ,A n = S Λ n S − 1 A^n=S\Lambda^nS^{-1} A n = S Λ n S − 1 ,当A A A 有n n n 个线性无关的特征向量时(不同的特征值对应的特征向量一定无关,重特征值的情况需要验证),相似对角化 提供了一种很好的计算矩阵幂的方式
Difference equations u k + 1 = A u k u_{k+1} = Au_k u k + 1 = A u k
一阶差分方程 u k + 1 = A u k u_{k+1} = Au_k u k + 1 = A u k 的解为u k = A k u 0 u_k = A^ku_0 u k = A k u 0 ,如果u 0 = c 1 x 1 + c 2 x 2 + . . . + c n x n = S c u_0=c_1x_1+c_2x_2+...+c_nx_n=Sc u 0 = c 1 x 1 + c 2 x 2 + ... + c n x n = S c (x i x_i x i 为A A A 的特征向量),则u k = A k u 0 = c 1 λ 1 k x 1 + c 2 λ 2 k x 2 + . . . + c n λ n k x n = Λ k S c u_k=A^ku_0=c_1\lambda_1^kx_1+c_2\lambda_2^kx_2+...+c_n\lambda_n^kx_n=\Lambda^kSc u k = A k u 0 = c 1 λ 1 k x 1 + c 2 λ 2 k x 2 + ... + c n λ n k x n = Λ k S c ,A A A 的特征值决定了该序列的增长速度
以斐波那契数列 为例,F k + 2 = F k + 1 + F k F_{k+2}=F_{k+1}+F_{k} F k + 2 = F k + 1 + F k ,二阶差分,可以转化为一阶差分:u k = [ F k + 1 F k ] u_{k}=\left[\begin{array}{c}{F_{k+1}}\\{F_{k}}\end{array}\right] u k = [ F k + 1 F k ] ,F k + 2 = F k + 1 + F k F k + 1 = F k + 1 \begin{array}{rcl}F_{k+2}&=&F_{k+1}+F_k\\F_{k+1}&=&F_{k+1}\end{array} F k + 2 F k + 1 = = F k + 1 + F k F k + 1 ,则u k + 1 = [ 1 1 1 0 ] u k u_{k+1}=\left[\begin{array}{cc}{1}&{1}\\{1}&{0}\end{array}\right]u_{k} u k + 1 = [ 1 1 1 0 ] u k (太妙了!!!),所以A = [ 1 1 1 0 ] A=\left[\begin{array}{cc}{1}&{1}\\{1}&{0}\end{array}\right] A = [ 1 1 1 0 ] ,进一步计算u k = A k u 0 = Λ k S c u_k=A^ku_0=\Lambda^kSc u k = A k u 0 = Λ k S c 得到斐波那契数列的通项公式 —— F k = 1 5 ( 1 + 5 2 ) k − 1 5 ( 1 − 5 2 ) k F_k=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^k-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^k F k = 5 1 ( 2 1 + 5 ) k − 5 1 ( 2 1 − 5 ) k
23 Differential Equations and e A t e^{At} e A t (微分方程,exp(At))
The two “pure” terms e λ 1 t x 1 e^{\lambda_1t}x_1 e λ 1 t x 1 and e λ 2 t x 2 e^{\lambda_2t}x_2 e λ 2 t x 2 are analogous to the terms λ i k x i \lambda_i^k x_i λ i k x i we saw in the solution c 1 λ 1 k x 2 + c 2 λ 2 k x 2 + . . . + c n λ n k x n c_1\lambda_1^k x_2+c_2\lambda_2^k x_2+...+c_n\lambda_n^k x_n c 1 λ 1 k x 2 + c 2 λ 2 k x 2 + ... + c n λ n k x n to the difference equation u k + 1 = A u k u_{k+1} = Au_k u k + 1 = A u k .
Differential equations d u d t = A u \frac{du}{dt}=Au d t d u = A u
对于微分方程组 d u 1 d t = − u 1 + 2 u 2 d u 2 d t = u 1 − 2 u 2 \begin{aligned}\frac{du_1}{dt}&=-u_1+2u_2\\\frac{du_2}{dt}&=u_1-2u_2\end{aligned} d t d u 1 d t d u 2 = − u 1 + 2 u 2 = u 1 − 2 u 2 ,初始条件为u ( 0 ) = [ 1 0 ] u(0)=\left[\begin{array}{c}1\\0\end{array}\right] u ( 0 ) = [ 1 0 ] ,该微分方程可以表示为d u d t = A u \frac{du}{dt}=Au d t d u = A u ,A = [ − 1 2 1 − 2 ] A=\left[\begin{array}{rr}-1&2\\1&-2\end{array}\right] A = [ − 1 1 2 − 2 ] ,通过求解A A A 的特征值λ 1 , λ 2 \lambda_1,\lambda_2 λ 1 , λ 2 ,以及对应的特征向量x 1 , x 2 x_1,x_2 x 1 , x 2 ,代入解的结构u ( t ) = c 1 e λ 1 t x 1 + c 2 e λ 2 t x 2 u(t)=c_1e^{\lambda_1t}x_1+c_2e^{\lambda_2t}x_2 u ( t ) = c 1 e λ 1 t x 1 + c 2 e λ 2 t x 2 后,根据初始条件求得c 1 , c 2 c_1,c_2 c 1 , c 2 ,得到微分方程的解u ( t ) = 1 3 [ 2 1 ] + 1 3 e − 3 t [ 1 − 1 ] u(t)=\frac13\left[\begin{array}{r}2\\1\end{array}\right]+\frac13e^{-3t}\left[\begin{array}{r}1\\-1\end{array}\right] u ( t ) = 3 1 [ 2 1 ] + 3 1 e − 3 t [ 1 − 1 ] ,故该微分方程描述的系统的稳定状态为u ( ∞ ) = [ 2 / 3 1 / 3 ] u(\infty)=\left[\begin{array}{c}2/3\\1/3\end{array}\right] u ( ∞ ) = [ 2/3 1/3 ] (u = e λ 1 t x 1 u=e^{\lambda_1t}x_1 u = e λ 1 t x 1 是d u d t = A u \frac{du}{dt}=Au d t d u = A u 的解的证明:A u = e λ 1 t A x 1 = λ 1 e λ 1 t x 1 = d u d t Au=e^{\lambda_1t}Ax_1=\lambda_1e^{\lambda_1t}x_1=\frac{du}{dt} A u = e λ 1 t A x 1 = λ 1 e λ 1 t x 1 = d t d u )
微分方程所描述的系统的一些性质:
稳定性 (Stability):当所有R e ( λ ) < 0 Re(\lambda)<0 R e ( λ ) < 0 时,u ( t ) → 0 u(t)\to0 u ( t ) → 0
稳态 (Steady state):一个特征值为0,其他特征值实数部分均为负
发散 (Blow up):存在R e ( λ ) > 0 Re(\lambda)>0 R e ( λ ) > 0
将初始条件u ( 0 ) = [ 1 0 ] u(0)=\left[\begin{array}{c}1\\0\end{array}\right] u ( 0 ) = [ 1 0 ] 代入求解c 1 , c 2 c_1,c_2 c 1 , c 2 时,可以表示为S c = u ( 0 ) Sc=u(0) S c = u ( 0 ) ,S S S 是特征向量组成的矩阵,由此,设u = S v u=Sv u = S v ,则代入d u d t = A u \frac{du}{dt}=Au d t d u = A u 得到S d v d t = A S v S\frac{dv}{dt}=ASv S d t d v = A S v ,则d v d t = S − 1 A S v = Λ v \frac{dv}{dt}=S^{-1}ASv=\Lambda v d t d v = S − 1 A S v = Λ v ,即d v i d t = λ i v i \frac{dv_i}{dt}=\lambda_iv_i d t d v i = λ i v i ,通过特征值和特征向量实现了u i u_i u i 间关系的解耦 (在原微分方程组中它们相互耦合),通解为v ( t ) = e Λ t v ( 0 ) , u ( t ) = S e Λ t S − 1 u ( 0 ) = e A t u ( 0 ) v(t)=e^{\Lambda t}v(0),u(t)=Se^{\Lambda t}S^{-1}u(0)=e^{At}u(0) v ( t ) = e Λ t v ( 0 ) , u ( t ) = S e Λ t S − 1 u ( 0 ) = e A t u ( 0 )
Matrix exponential e A t e^{At} e A t
e A t = I + A t + ( A t ) 2 2 + ( A t ) 3 6 + ⋯ e^{At}=I+At+\frac{(At)^2}2+\frac{(At)^3}6+\cdots e A t = I + A t + 2 ( A t ) 2 + 6 ( A t ) 3 + ⋯ ,以泰勒级数的方式展开,类似地,( I − A t ) − 1 = I + A t + ( A t ) 2 + ( A t ) 3 + ⋯ (I-At)^{-1}=I+At+(At)^2+(At)^3+\cdots ( I − A t ) − 1 = I + A t + ( A t ) 2 + ( A t ) 3 + ⋯
e A t = I + A t + ( A t ) 2 2 + ( A t ) 3 6 + ⋯ = S S − 1 + S Λ S − 1 t + S Λ 2 S − 1 2 t 2 + S Λ 3 S − 1 6 t 3 + ⋯ = S e Λ t S − 1 e^{At}=I+At+\frac{(At)^2}2+\frac{(At)^3}6+\cdots=SS^{-1}+S\Lambda S^{-1}t+\frac{S\Lambda^{2}S^{-1}}2t^{2}+\frac{S\Lambda^{3}S^{-1}}6t^{3}+\cdots=Se^{\Lambda t}S^{-1} e A t = I + A t + 2 ( A t ) 2 + 6 ( A t ) 3 + ⋯ = S S − 1 + S Λ S − 1 t + 2 S Λ 2 S − 1 t 2 + 6 S Λ 3 S − 1 t 3 + ⋯ = S e Λ t S − 1 ,当A A A 可以相似对角化时(存在n n n 个线性无关的特征向量)成立
Second order
对于二阶线性微分方程y ′ ′ + b y ′ + k y = 0 \begin{aligned}y''+by'+ky=0\end{aligned} y ′′ + b y ′ + k y = 0 ,可以采用类似一阶差分处理斐波那契数列的方式,来构造一阶线性微分方程组来求解:设u = [ y ′ y ] u=\left[\begin{array}{c}y'\\y\end{array}\right] u = [ y ′ y ] ,则u ′ = [ y ′ ′ y ′ ] = [ − b − k 1 0 ] [ y ′ y ] = A u u'=\left[\begin{array}{c}y''\\y'\end{array}\right]=\left[\begin{array}{cc}-b&-k\\1&0\end{array}\right]\left[\begin{array}{c}y'\\y\end{array}\right]=Au u ′ = [ y ′′ y ′ ] = [ − b 1 − k 0 ] [ y ′ y ] = A u
24 Markov Matrices; Fourier Series(马尔科可夫矩阵,傅里叶级数)
When dealing with systems of differential equations, eigenvectors with the eigenvalue λ = 0 \lambda=0 λ = 0 represented steady states. Here we’re dealing with powers of matrices and get a steady state when λ = 1 \lambda=1 λ = 1 is an eigenvalue.
Markov matrices
前景回顾:A T A^T A T 的特征值与A A A 的特征值一致(∣ A − λ x ∣ = 0 , ∣ A T − λ x ∣ = ∣ ( A − λ x ) T ∣ = 0 |A-\lambda x|=0,|A^T-\lambda x|=|(A-\lambda x)^T|=0 ∣ A − λ x ∣ = 0 , ∣ A T − λ x ∣ = ∣ ( A − λ x ) T ∣ = 0 )
形如A = [ . 1 . 01 . 3 . 2 . 99 . 3 . 7 0 . 4 ] A=\left[\begin{array}{rrr}.1&.01&.3\\.2&.99&.3\\.7&0&.4\end{array}\right] A = .1 .2 .7 .01 .99 0 .3 .3 .4 ,所有的元素非负,且每列之和均为1的矩阵,为马尔可夫矩阵 (Markov matrices),它在概率论中有重要作用,同时对幂运算封闭
对于一个微分方程组系统,特征值为0代表着稳态;而对于矩阵的幂,特征值为1代表着稳态,马尔可夫矩阵保证了一个特征值是1(A − I A-I A − I 是奇异矩阵,∣ A − I ∣ = 0 |A-I|=0 ∣ A − I ∣ = 0 ),其他特征值的绝对值均小于1(u k = A k u 0 = c 1 λ 1 k x 1 + c 2 λ 2 k x 2 + ⋅ ⋅ ⋅ + c n λ n k x n u_k=A^ku_0=c_1\lambda_1^kx_1+c_2\lambda_2^kx_2+\cdot\cdot\cdot+c_n\lambda_n^kx_n u k = A k u 0 = c 1 λ 1 k x 1 + c 2 λ 2 k x 2 + ⋅ ⋅ ⋅ + c n λ n k x n ,如果λ 1 = 1 \lambda_1=1 λ 1 = 1 且其他特征值的绝对值均小于1,则稳态是c 1 x 1 c_1x_1 c 1 x 1 )
一个实际的例子,对于u k + 1 = A u k u_{k+1}=Au_k u k + 1 = A u k ,A A A 是一个马尔可夫矩阵,u u u 表示各城市的人口数量,A A A 中的元素则表示人口的迁移率,a i j a_{ij} a ij 表示每年有多少(百分比)人从i i i 城市迁移到j j j 城市,这种情况下通过求解A A A 的特征值1对应的特征向量即可得知稳态时城市间人口的比例
在实际的应用场景中,马尔可夫矩阵可能是要求每行之和均为1(上面的定义的转置)
Fourier series and projections
正交基q 1 , q 2 , . . . , q n q_1,q_2,...,q_n q 1 , q 2 , ... , q n ,向量v v v 可以表示为v = x 1 q 1 + x 2 q 2 + . . . + x n q n v=x_1q_1+x_2q_2+...+x_nq_n v = x 1 q 1 + x 2 q 2 + ... + x n q n ,x i = q i T v x_i=q_i^Tv x i = q i T v (利用正交的性质求解 —— q i T v = x i q i T q i = x i q_i^Tv=x_iq_i^Tq_i=x_i q i T v = x i q i T q i = x i ),利用矩阵表示 —— v = Q x v=Qx v = Q x ,x = Q − 1 v = Q T v x=Q^{-1}v=Q^Tv x = Q − 1 v = Q T v
傅里叶级数 (Fourier series)就是基于一组正交基,如上例所示,f ( x ) = a 0 + a 1 cos x + b 1 sin x + a 2 cos 2 x + b 2 sin 2 x + ⋯ f(x)=a_0+a_1\cos x+b_1\sin x+a_2\cos2x+b_2\sin2x+\cdots f ( x ) = a 0 + a 1 cos x + b 1 sin x + a 2 cos 2 x + b 2 sin 2 x + ⋯ ,1 , cos x , sin x , cos 2 x , sin 2 x , ⋯ 1,\cos x,\sin x,\cos 2x,\sin 2x,\cdots 1 , cos x , sin x , cos 2 x , sin 2 x , ⋯ 正交(函数正交 —— f T g = ∫ 0 2 π f ( x ) g ( x ) d x = 0 f^Tg=\int_0^{2\pi}f(x)g(x)dx=0 f T g = ∫ 0 2 π f ( x ) g ( x ) d x = 0 ,2 π 2\pi 2 π 为函数周期),a i = 1 π ∫ 0 2 π f ( x ) cos i x d x a_i=\frac{1}{\pi}\int_{0}^{2\pi}f(x)\cos ixdx a i = π 1 ∫ 0 2 π f ( x ) cos i x d x (利用正交的性质求解 —— ∫ 0 2 π f ( x ) cos i x d x = ∫ 0 2 π ( a 0 + a 1 cos x + b 1 sin x + a 2 cos 2 x + ⋯ ) cos i x d x = 0 + ∫ 0 2 π a i cos 2 i x d x + 0 + 0 + ⋯ = a i π \int_{0}^{2\pi}f(x)\cos ixdx=\int_0^{2\pi}(a_0+a_1\cos x+b_1\sin x+a_2\cos2x+\cdots)\cos ixdx=0+\int_0^{2\pi}a_i\cos^2ixdx+0+0+\cdots=a_{i}\pi
∫ 0 2 π f ( x ) cos i x d x = ∫ 0 2 π ( a 0 + a 1 cos x + b 1 sin x + a 2 cos 2 x + ⋯ ) cos i x d x = 0 + ∫ 0 2 π a i cos 2 i x d x + 0 + 0 + ⋯ = a i π )
25 Quiz 2 Review