
The Rayleigh Quotient and a Proof of the Rayleigh-Ritz Theorem

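Note: this mathematics series consists of my own study notes. My knowledge is limited and mistakes are inevitable, so corrections from readers are welcome.

The main part of the proof follows https://www.planetmath.org/RayleighRitzTheorem

We begin with a few basic concepts.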

Complex Plane

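Consider a complex number of the form $a+bi$. It represents a point in the complex plane, where the $x$ axis carries the real part and the $y$ axis carries the imaginary part, so $a+bi$ corresponds to the point with coordinates $(a,b)$. The complex number $a+bi$ can also be viewed as the vector that starts at the origin $(0,0)$ and ends at $(a,b)$. Adding or subtracting complex numbers then amounts to adding or subtracting the corresponding vectors in the complex plane.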

Complex conjugate: the conjugate $z^*$ of a complex number $z = a+bi$ is defined as $z^* = a - bi$.

Two useful identities:

$$z^*_1 \times z^*_2 = (z_1 \times z_2)^* \tag{1}$$

$$z^*_1 + z^*_2 = (z_1 + z_2)^* \tag{2}$$

For example, with $z_1 = 3 + 2i$ and $z_2 = 1 - i$:

$$\begin{aligned} z^*_1 \times z^*_2 & = (3-2i) \times (1+i) = 5 + i \\ z_1 \times z_2 & = (3+2i) \times (1-i) = 5 - i \\ z^*_1 + z^*_2 & = (3-2i) + (1+i) = 4 - i \\ z_1 + z_2 & = (3+2i) + (1-i) = 4 + i \end{aligned}$$
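The two conjugation identities can be checked mechanically. The following is a minimal sketch using Python's built-in `complex` type with the same sample values $z_1 = 3+2i$ and $z_2 = 1-i$; the variable names are chosen here for illustration only.

```python
# Check the product and sum identities for complex conjugation.
z1 = 3 + 2j
z2 = 1 - 1j

prod_of_conj = z1.conjugate() * z2.conjugate()   # z1* x z2*
conj_of_prod = (z1 * z2).conjugate()             # (z1 x z2)*
sum_of_conj = z1.conjugate() + z2.conjugate()    # z1* + z2*
conj_of_sum = (z1 + z2).conjugate()              # (z1 + z2)*

print(prod_of_conj == conj_of_prod)   # identity (1)
print(sum_of_conj == conj_of_sum)     # identity (2)
```

Because the sample values are small integers, the comparisons are exact here; for arbitrary floating-point inputs a tolerance would be the safer choice.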

Conjugates of eigenvalues and eigenvectors: if ${\bf A}$ is a real matrix and ${\bf Ax} = \lambda {\bf x}$, then taking the conjugate of both sides (and using ${\bf A}^* = {\bf A}$) gives ${\bf A}{\bf x}^* = \lambda^* {\bf x}^*$.
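As a small worked example (the matrix is an illustrative choice, not one taken from these notes), consider a real rotation matrix with a complex eigenpair:

$${\bf A} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad {\bf x} = \begin{bmatrix} 1 \\ -i \end{bmatrix}, \quad \lambda = i$$

$${\bf Ax} = \begin{bmatrix} i \\ 1 \end{bmatrix} = i \begin{bmatrix} 1 \\ -i \end{bmatrix} = \lambda {\bf x}, \qquad {\bf A}{\bf x}^* = \begin{bmatrix} -i \\ 1 \end{bmatrix} = -i \begin{bmatrix} 1 \\ i \end{bmatrix} = \lambda^* {\bf x}^*$$

So the conjugate vector ${\bf x}^*$ is again an eigenvector, now for the conjugate eigenvalue $\lambda^* = -i$.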

The sum and the product of a complex number and its conjugate are real:

$$z + z^* \in {\Bbb R}, \qquad z \times z^* \in {\Bbb R}$$

Some useful formulas:

$$\begin{aligned} |a+bi|^2 & = a^2 + b^2 \\[2ex] (a+bi)(a-bi) & = a^2 + b^2 \\[2ex] \frac{1}{a+bi} & = \frac{1}{a+bi} \frac{a - bi}{a - bi} = \frac{a-bi}{a^2 + b^2} \end{aligned}$$

On the unit circle, where $a^2+b^2 = 1$, we have $(a+bi)^{-1} = a - bi$, that is, $1/z = z^*$.

Absolute value of a complex number:

$$|z| = |a+bi| = \sqrt{a^2 + b^2}$$

$|z|$ is also commonly written $r$. When $a^2+b^2 = 1$, $r$ is the radius of the unit circle. Let $\theta$ denote the angle between $z$ and the $x$ axis. Squaring $z$ doubles this angle to $2\theta$.

Exponential form of a complex number:

$$\begin{aligned} z & = r\cos\theta + ir\sin\theta = re^{i\theta} \\ z^n & = r^n\cos n\theta + ir^n\sin n\theta = r^ne^{in\theta} \end{aligned}$$

Let $z' = r'\cos\theta' + ir'\sin\theta'$. Then

$$\begin{aligned} z \times z' & = (r\cos\theta + ir\sin\theta) \times (r'\cos\theta' + ir'\sin\theta') \\ & = rr'(\cos(\theta + \theta')+i\sin(\theta + \theta')) \end{aligned}$$
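The polar rules (moduli multiply, angles add) can be checked numerically. This is a minimal sketch using the standard-library `cmath.rect`; the moduli and angles are hypothetical sample values picked only for illustration.

```python
import cmath
import math

# Hypothetical sample values (not from the text).
r, theta = 2.0, math.pi / 6
r2, theta2 = 0.5, math.pi / 4

z = cmath.rect(r, theta)      # r*cos(theta) + i*r*sin(theta)
w = cmath.rect(r2, theta2)    # r'*cos(theta') + i*r'*sin(theta')

# Product rule: modulus r*r', angle theta + theta'.
product = z * w
expected_product = cmath.rect(r * r2, theta + theta2)

# Power rule: z**n has modulus r**n and angle n*theta.
n = 3
power = z ** n
expected_power = cmath.rect(r ** n, n * theta)

print(abs(product - expected_product) < 1e-12)
print(abs(power - expected_power) < 1e-12)
```

The comparisons use a small tolerance because the trigonometric values are only approximated in floating point.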

Hermitian Matrix

For a real vector ${\bf x}$, the squared length is $x_1^2 + x_2^2 + \cdots + x_n^2$. For a complex vector ${\bf z}$, however, the squared length is not $z^2_1 + z^2_2 + \cdots + z_n^2$. Take the vector $(1, i)$: the real-vector formula would give $1^2 + i^2 = 0$. Under that definition a nonzero vector could have squared length $0$, and the squared length could even be a complex number, so it is not a good definition. For a complex vector ${\bf z}$ we therefore define ${\bf z}^{*T}{\bf z} = ||{\bf z}||^2$, and we write ${\bf z}^{*T} = {\bf z}^H$. For example, if

$${\bf A} = \begin{bmatrix} 1 & i \\ 0 & 1 + i \end{bmatrix}$$

then

$${\bf A}^H = \begin{bmatrix} 1 & 0 \\ -i & 1 - i \end{bmatrix}$$

that is, ${\bf A}^H$ is obtained by transposing ${\bf A}$ and then taking the complex conjugate of every entry.

For a real vector, ${\bf x}^T{\bf x} = ||{\bf x}||^2$; for a complex vector, ${\bf z}^H{\bf z} = ||{\bf z}||^2$. Since ${\bf x}^T{\bf x}$ is the inner product of ${\bf x}$ with itself, we define the inner product of two complex vectors ${\bf u}$ and ${\bf v}$ as ${\bf u}^H{\bf v}$:

$${\bf u}^H{\bf v} = [u^*_1, u^*_2, \cdots, u^*_n] \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u^*_1v_1 + u^*_2v_2 + \cdots + u^*_nv_n$$

Note that for complex vectors ${\bf u}^H{\bf v}$ and ${\bf v}^H{\bf u}$ are in general not equal. In fact, ${\bf v}^H{\bf u}$ is the complex conjugate of ${\bf u}^H{\bf v}$.
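The definitions above can be sketched in a few lines of plain Python. The helper name `inner` and the vectors `u` and `v` are hypothetical, introduced here only for illustration; the vector $(1, i)$ is the one used in the text.

```python
def inner(u, v):
    """Hermitian inner product u^H v of two equal-length sequences."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

z = [1 + 0j, 1j]                     # the vector (1, i)
naive = sum(zi * zi for zi in z)     # real-vector formula: 1^2 + i^2
norm_sq = inner(z, z)                # z^H z, the proper squared length

u = [1 + 2j, 3 - 1j]                 # hypothetical sample vectors
v = [2 - 1j, 1j]
uv = inner(u, v)                     # u^H v
vu = inner(v, u)                     # v^H u

print(naive, norm_sq)
print(uv, vu)
```

The naive formula returns zero for a nonzero vector, while `inner(z, z)` returns a positive real number, and `vu` comes out as the conjugate of `uv`.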

Diagonalizing a square matrix: suppose the $n \times n$ matrix ${\bf A}$ has $n$ linearly independent eigenvectors ${\bf x}_1, {\bf x}_2, \cdots, {\bf x}_n$. Take these eigenvectors as the columns of the eigenvector matrix ${\bf X}$. Then ${\bf X}^{-1}{\bf AX}$ is the eigenvalue matrix ${\bf \Lambda}$:

$${\bf X}^{-1}{\bf AX} = {\bf \Lambda} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}$$
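A small worked instance may help; the $2 \times 2$ matrix below is an illustrative choice, not one from these notes. Its eigenvalues are $3$ and $1$, with eigenvectors $(1,1)$ and $(1,-1)$:

$${\bf A} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad {\bf X} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad {\bf X}^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

$${\bf X}^{-1}{\bf AX} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 3 & -1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} = {\bf \Lambda}$$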

Orthonormal basis: we say the column vectors $q_1, q_2, \ldots, q_n$ are orthonormal if

$$q_i^Tq_j = \begin{cases} 0, & \text{for $i \neq j$} \\ 1, & \text{for $i = j$} \end{cases}$$

The matrix ${\bf Q}$ whose columns are $q_1, q_2, \ldots, q_n$ has the following property:

$${\bf Q}^T{\bf Q} = {\bf I}, \quad \text{which means} \quad {\bf Q}^T = {\bf Q}^{-1}$$
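The property ${\bf Q}^T{\bf Q} = {\bf I}$ can be illustrated with a $2 \times 2$ rotation matrix, whose columns are orthonormal. This is a minimal sketch in plain Python; the angle and the helper names `transpose` and `matmul` are assumptions made for the example.

```python
import math

theta = 0.7   # hypothetical angle; any value gives orthonormal columns
q1 = [math.cos(theta), math.sin(theta)]
q2 = [-math.sin(theta), math.cos(theta)]

# Q has q1 and q2 as its columns.
Q = [[q1[0], q2[0]],
     [q1[1], q2[1]]]

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]

QtQ = matmul(transpose(Q), Q)   # should be close to the identity
print(QtQ)
```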

Hermitian matrices: a real symmetric matrix ${\bf S}$ satisfies ${\bf S}^T = {\bf S}$ and can be written in the form ${\bf S}={\bf Q\Lambda Q}^{-1}$. For complex matrices, the counterpart of symmetry is the condition ${\bf S}^H = {\bf S}$. When ${\bf S}^H = {\bf S}$, we call ${\bf S}$ a Hermitian matrix.

If ${\bf S} = {\bf S}^H$ and ${\bf z}$ is a real or complex column vector, then ${\bf z}^H{\bf Sz}$ is real.

Every eigenvalue of a Hermitian matrix is real.

Eigenvectors of a Hermitian matrix that belong to distinct eigenvalues are orthogonal:

$$\left. \begin{array}{l} {\bf Sz} = \lambda{\bf z} \\ {\bf Sy} = \beta{\bf y} \\ \lambda \neq \beta \end{array} \right\} \implies {\bf y}^H{\bf z} = 0$$
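These three properties follow from short standard arguments, sketched here for completeness (the notes state them without proof). A scalar equals its own transpose, so its conjugate equals its conjugate transpose:

$$({\bf z}^H{\bf Sz})^* = ({\bf z}^H{\bf Sz})^H = {\bf z}^H{\bf S}^H{\bf z} = {\bf z}^H{\bf Sz}$$

so ${\bf z}^H{\bf Sz}$ is real. If ${\bf Sz} = \lambda{\bf z}$ with ${\bf z} \neq {\bf 0}$, then

$${\bf z}^H{\bf Sz} = \lambda\,{\bf z}^H{\bf z} \implies \lambda = \frac{{\bf z}^H{\bf Sz}}{{\bf z}^H{\bf z}} \in {\Bbb R}$$

Finally, if ${\bf Sy} = \beta{\bf y}$ with $\beta$ real and $\beta \neq \lambda$, evaluating ${\bf y}^H{\bf Sz}$ in two ways gives

$$\lambda\,{\bf y}^H{\bf z} = {\bf y}^H{\bf Sz} = ({\bf Sy})^H{\bf z} = \beta\,{\bf y}^H{\bf z} \implies (\lambda - \beta)\,{\bf y}^H{\bf z} = 0 \implies {\bf y}^H{\bf z} = 0$$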

Rayleigh Theorem

The following is based on https://www.planetmath.org/RayleighRitzTheorem . Define the Rayleigh quotient as

$$R({\bf A,x}) = \frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}}$$

where ${\bf x}$ is a nonzero vector and ${\bf A}$ is an $n \times n$ Hermitian matrix. The eigenvectors of ${\bf A}$ are the stationary points (critical points) of the function $R({\bf A,x})$, and the eigenvalue belonging to an eigenvector is the value of the function at that stationary point. Because $R({\bf A,x})$ does not change when ${\bf x}$ is rescaled, it attains its maximum and minimum on the unit sphere, and both are stationary points. Hence the maximum of $R({\bf A,x})$ equals the largest eigenvalue of ${\bf A}$ and the minimum equals the smallest eigenvalue of ${\bf A}$:

$$\lambda_{min} \leq \frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}} \leq \lambda_{max}$$

When ${\bf x}$ is a unit vector, that is, when ${\bf x}^H{\bf x}=1$, the Rayleigh quotient reduces to

$$R({\bf A,x}) = {\bf x}^H{\bf Ax}$$
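The bound $\lambda_{min} \leq R({\bf A,x}) \leq \lambda_{max}$ can be probed numerically. The sketch below is not part of the original notes: it uses a hypothetical $2 \times 2$ Hermitian matrix whose eigenvalues work out to $1$ and $4$, the closed-form eigenvalue formula for $2 \times 2$ Hermitian matrices, and random complex test vectors.

```python
import math
import random

def rayleigh(A, x):
    """Rayleigh quotient x^H A x / x^H x for a 2x2 matrix A."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1]]
    num = sum(xi.conjugate() * yi for xi, yi in zip(x, Ax))
    den = sum(abs(xi) ** 2 for xi in x)
    return num / den

# Hypothetical Hermitian example: A equals its conjugate transpose.
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]

# Eigenvalues of [[a, b], [b*, d]] with real a, d:
# (a + d)/2 +/- sqrt(((a - d)/2)^2 + |b|^2)
a, b, d = A[0][0].real, A[0][1], A[1][1].real
half_gap = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
lam_min = (a + d) / 2 - half_gap
lam_max = (a + d) / 2 + half_gap

random.seed(0)
values = []
for _ in range(1000):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    values.append(rayleigh(A, x))

print(lam_min, lam_max)
print(min(v.real for v in values), max(v.real for v in values))
```

Every sampled quotient has a negligible imaginary part and a real part between the two eigenvalues. Random sampling only illustrates the bound; it does not prove it.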

Proof

First, by the property of Hermitian matrices, ${\bf x}^H{\bf Ax}$ is real, and ${\bf x}^H{\bf x}$ is clearly real, so $R({\bf A,x})$ is real.

We now look for the stationary points $\overline{\bf x}$ of $R({\bf A,x})$. Abbreviating the Rayleigh quotient as $R({\bf x})$, we need to solve

$$\frac{dR(\overline{\bf x})}{d{\bf x}} = {\bf 0}^T$$

Write ${\bf x} = {\bf x}^{R} + i{\bf x}^{I}$, where ${\bf x}^R$ and ${\bf x}^I$ are the real and imaginary parts of ${\bf x}$. Then

$$\frac{dR({\bf x})}{d{\bf x}} = \frac{dR({\bf x})}{d{\bf x}^R} + i\frac{dR({\bf x})}{d{\bf x}^I}$$

Since $R$ is real-valued, both derivatives on the right-hand side are real, so the sum vanishes only if each of them vanishes:

$$\frac{dR(\overline{\bf x})}{d{\bf x}^R} = \frac{dR(\overline{\bf x})}{d{\bf x}^I} = {\bf 0}^T \tag{0}$$

By the quotient rule,

$$\begin{aligned} \frac{dR({\bf x})}{d{\bf x}^R} & = \frac{d}{d{\bf x}^R}\left(\frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}}\right) \\[2ex] & = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^R}({\bf x}^H{\bf x}) - {\bf x}^H{\bf Ax} \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^R}}{({\bf x}^H{\bf x})^2} \\[2ex] & = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^R} - R({\bf x}) \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^R} }{{\bf x}^H{\bf x}} \end{aligned} \tag{1}$$

By the rules of matrix differentiation,

$$\begin{aligned} \frac{d({\bf x}^H{\bf Ax})}{d{\bf x}^R} & = {\bf x}^H{\bf A} \frac{d{\bf x}}{d{\bf x}^R} + {\bf x}^T{\bf A}^T \frac{d{\bf x}^*}{d{\bf x}^R} \\ & = {\bf x}^H{\bf A} + {\bf x}^T{\bf A}^T \\ & = {\bf x}^H{\bf A} + ({\bf x}^H{\bf A}^H)^* \end{aligned}$$

Since ${\bf A} = {\bf A}^H$, this becomes

$${\bf x}^H{\bf A} + ({\bf x}^H{\bf A})^* = 2({\bf x}^H{\bf A})^R \tag{2}$$

(Note: for matrix calculus, see the reference manual at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html ) Similarly, we obtain

$$\frac{d({\bf x}^H{\bf x})}{d{\bf x}^R} = 2({\bf x}^H)^R \tag{3}$$

Substituting $(2)$ and $(3)$ into $(1)$ gives

$$\frac{d R({\bf x})}{d{\bf x}^R} = 2 \frac{({\bf x}^H{\bf A})^R - R({\bf x})({\bf x}^H)^R}{{\bf x}^H{\bf x}}$$

By $(0)$, at a stationary point we have

$${\bf 0}^T = (\overline{\bf x}^H{\bf A})^R - R(\overline{\bf x})(\overline{\bf x}^H)^R$$

that is,

$$\begin{aligned} {\bf 0} & = ((\overline{\bf x}^H{\bf A})^R - R(\overline{\bf x})(\overline{\bf x}^H)^R)^T \\ & = ({\bf A}^T\overline{\bf x}^*)^R - R(\overline{\bf x})(\overline{\bf x}^*)^R \\ & = (({\bf A}^H\overline{\bf x})^*)^R - R(\overline{\bf x})(\overline{\bf x}^*)^R \\ & = (({\bf A}\overline{\bf x})^*)^R - R(\overline{\bf x})(\overline{\bf x}^*)^R \\ & = ({\bf A}\overline{\bf x})^R - R(\overline{\bf x})(\overline{\bf x})^R \end{aligned}$$

The last step uses the fact that conjugation does not change the real part. Since $R(\overline{\bf x})$ is real, it follows that

$${\bf 0} = ({\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x})^R \tag{I}$$

Next we turn to the derivative with respect to the imaginary part.

For $dR({\bf x})/d{\bf x}^I$, the quotient rule gives

$$\begin{aligned} \frac{dR({\bf x})}{d{\bf x}^I} & = \frac{d}{d{\bf x}^I}\left(\frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}}\right) \\[2ex] & = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I}({\bf x}^H{\bf x}) - {\bf x}^H{\bf Ax} \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^I}}{({\bf x}^H{\bf x})^2} \\[2ex] & = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I} - R({\bf x}) \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^I} }{{\bf x}^H{\bf x}} \end{aligned} \tag{4}$$

By the rules of matrix differentiation,

$$\begin{aligned} \frac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I} & = {\bf x}^H{\bf A} \frac{d{\bf x}}{d{\bf x}^I} + {\bf x}^T{\bf A}^T \frac{d{\bf x}^*}{d{\bf x}^I} \\ & = i{\bf x}^H{\bf A} - i{\bf x}^T{\bf A}^T \\ & = i{\bf x}^H{\bf A} - i({\bf x}^H{\bf A}^H)^* \end{aligned}$$

Since ${\bf A} = {\bf A}^H$, we have

$$\frac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I} = i({\bf x}^H{\bf A} - ({\bf x}^H{\bf A})^*) = i(2i({\bf x}^H{\bf A})^I) = -2({\bf x}^H{\bf A})^I \tag{5}$$

Similarly, we have

$$\frac{d({\bf x}^H{\bf x})}{d{\bf x}^I} = i{\bf x}^H - i{\bf x}^T = i({\bf x}^H - ({\bf x}^H)^*) = i(2i({\bf x}^H)^I) = -2({\bf x}^H)^I \tag{6}$$

Substituting $(5)$ and $(6)$ into $(4)$ gives

$$\frac{dR({\bf x})}{d{\bf x}^I} = -2 \frac{({\bf x}^H{\bf A})^I - R({\bf x})({\bf x}^H)^I}{{\bf x}^H{\bf x}}$$

By $(0)$, at a stationary point we have

$${\bf 0}^T = (\overline{\bf x}^H{\bf A})^I - R(\overline{\bf x})(\overline{\bf x}^H)^I$$

that is,

$$\begin{aligned} {\bf 0} & = ((\overline{\bf x}^H{\bf A})^I - R(\overline{\bf x})(\overline{\bf x}^H)^I)^T \\ & = ({\bf A}^T\overline{\bf x}^*)^I - R(\overline{\bf x})(\overline{\bf x}^*)^I \\ & = (({\bf A}^H\overline{\bf x})^*)^I - R(\overline{\bf x})(\overline{\bf x}^*)^I \\ & = (({\bf A}\overline{\bf x})^*)^I - R(\overline{\bf x})(\overline{\bf x}^*)^I \\ & = -({\bf A}\overline{\bf x})^I + R(\overline{\bf x})(\overline{\bf x})^I \end{aligned}$$

The last step uses the fact that conjugation flips the sign of the imaginary part. Since $R(\overline{\bf x})$ is real, it follows that

$${\bf 0} = ({\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x})^I \tag{II}$$

Combining $(\text{I})$ and $(\text{II})$, both the real and the imaginary part of ${\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x}$ vanish, so

$${\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x} = {\bf 0}$$

In other words, every stationary point $\overline{\bf x}$ is an eigenvector of ${\bf A}$ with eigenvalue $R(\overline{\bf x})$, which is exactly what we set out to prove.
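The conclusion can be sanity-checked numerically. The sketch below is not part of the proof: it takes a hypothetical $2 \times 2$ Hermitian matrix with a known eigenvector for eigenvalue $4$, approximates the derivatives of $R$ with respect to the real and imaginary parts by central differences, and confirms that they vanish at the eigenvector but not at an arbitrary vector. The function names and the step size `h` are assumptions made for the example.

```python
def rayleigh(A, x):
    """Return (R(x), A x) for a square matrix A and vector x."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    num = sum(x[i].conjugate() * Ax[i] for i in range(n))
    den = sum(abs(xi) ** 2 for xi in x)
    return (num / den).real, Ax

def numeric_gradient(A, x, h=1e-6):
    """Central differences of R w.r.t. Re(x_k) and Im(x_k), in that order."""
    grads = []
    for k in range(len(x)):
        for step in (h, 1j * h):      # perturb Re(x_k), then Im(x_k)
            xp, xm = list(x), list(x)
            xp[k] += step
            xm[k] -= step
            grads.append((rayleigh(A, xp)[0] - rayleigh(A, xm)[0]) / (2 * h))
    return grads

# Hypothetical Hermitian matrix with eigenvalues 1 and 4.
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
x_bar = [1 - 1j, 2 + 0j]              # eigenvector for eigenvalue 4

R_bar, Ax = rayleigh(A, x_bar)
residual = [Ax[i] - R_bar * x_bar[i] for i in range(2)]   # A x - R(x) x
grad_at_eig = numeric_gradient(A, x_bar)
grad_elsewhere = numeric_gradient(A, [1 + 0j, 0j])        # not an eigenvector

print(R_bar, residual)
print(grad_at_eig)
print(grad_elsewhere)
```

At the eigenvector the finite-difference derivatives are zero up to rounding, in line with equation $(0)$, while at the non-eigenvector $(1, 0)$ some components are clearly nonzero.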

References

https://www.planetmath.org/RayleighRitzTheorem

Gilbert Strang, Introduction to Linear Algebra, Fifth Edition, Tsinghua University Press