Reply to 張盛東: On the Hessian Matrix and the Taylor Expansion of Multivariable Functions

$\displaystyle \nabla f=\begin{bmatrix} \frac{\partial f}{\partial x_1}\\[0.5em] \frac{\partial f}{\partial x_2}\\[0.5em] \vdots\\[0.5em] \frac{\partial f}{\partial x_n} \end{bmatrix}$

$\displaystyle F(\mathbf{x})=\begin{bmatrix} f_1(x_1,\ldots,x_n)\\ f_2(x_1,\ldots,x_n)\\ \vdots\\ f_m(x_1,\ldots,x_n) \end{bmatrix}$

$\displaystyle J(F)=\begin{bmatrix} (\nabla f_1)^T\\ (\nabla f_2)^T\\ \vdots\\ (\nabla f_m)^T \end{bmatrix}=\begin{bmatrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\[0.3em] \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\[0.3em] \vdots\\[0.3em] \frac{\partial f_m}{\partial x_1}&\frac{\partial f_m}{\partial x_2}&\cdots&\frac{\partial f_m}{\partial x_n} \end{bmatrix}$
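A minimal numerical sketch of this definition in Python, using only the standard library. Each row of the Jacobian is built as the transposed gradient of one component $f_i$, and each gradient is approximated by central differences. The component functions, the helper names `gradient` and `jacobian`, and the step size `h` are hypothetical choices made only for illustration.

```python
import math

def gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar f at x (a list)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def jacobian(fs, x):
    """Jacobian of F = (f_1, ..., f_m): row i is the (transposed) gradient of f_i."""
    return [gradient(fi, x) for fi in fs]

# Hypothetical components of F: R^2 -> R^3
F = [
    lambda x: x[0] * x[1],
    lambda x: x[0] + x[1] ** 2,
    lambda x: math.exp(x[0]),
]

J = jacobian(F, [1.0, 2.0])
# Exact Jacobian at (1, 2) for comparison: [[2, 1], [1, 4], [e, 0]]
print(J)
```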

$\displaystyle\begin{aligned} H(f)&=J(\nabla f)=\begin{bmatrix} \left(\nabla\frac{\partial f}{\partial x_1}\right)^T\\[0.3em] \left(\nabla\frac{\partial f}{\partial x_2}\right)^T\\[0.3em] \vdots\\[0.3em] \left(\nabla\frac{\partial f}{\partial x_n}\right)^T \end{bmatrix}=\begin{bmatrix} \frac{\partial}{\partial x_1}\left(\frac{\partial f}{\partial x_1}\right)&\frac{\partial}{\partial x_2}\left(\frac{\partial f}{\partial x_1}\right)&\cdots&\frac{\partial}{\partial x_n}\left(\frac{\partial f}{\partial x_1}\right)\\[1em] \frac{\partial}{\partial x_1}\left(\frac{\partial f}{\partial x_2}\right)&\frac{\partial}{\partial x_2}\left(\frac{\partial f}{\partial x_2}\right)&\cdots&\frac{\partial}{\partial x_n}\left(\frac{\partial f}{\partial x_2}\right)\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial}{\partial x_1}\left(\frac{\partial f}{\partial x_n}\right)&\frac{\partial}{\partial x_2}\left(\frac{\partial f}{\partial x_n}\right)&\cdots&\frac{\partial}{\partial x_n}\left(\frac{\partial f}{\partial x_n}\right) \end{bmatrix}\\ &=\begin{bmatrix} \frac{\partial^2f}{\partial x_1\partial x_1}&\frac{\partial^2f}{\partial x_1\partial x_2}&\cdots&\frac{\partial^2f}{\partial x_1\partial x_n}\\[1em] \frac{\partial^2 f}{\partial x_2\partial x_1}&\frac{\partial^2 f}{\partial x_2\partial x_2}&\cdots&\frac{\partial^2 f}{\partial x_2\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^2 f}{\partial x_n\partial x_1}&\frac{\partial^2f}{\partial x_n\partial x_2}&\cdots&\frac{\partial^2 f}{\partial x_n\partial x_n} \end{bmatrix}.\end{aligned}$
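The identity $H(f)=J(\nabla f)$ can be checked concretely with the hypothetical function $f(x_1,x_2)=x_1^2x_2+\sin x_2$. The sketch below writes the gradient of $f$ by hand and then differentiates that gradient numerically. The result should match the hand-computed Hessian $\begin{bmatrix}2x_2&2x_1\\2x_1&-\sin x_2\end{bmatrix}$. The function `jacobian` and its step size are assumptions of this illustration and are not part of the original text.

```python
import math

def grad_f(x):
    # Hand-computed gradient of the illustrative f(x1, x2) = x1^2 x2 + sin(x2)
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def jacobian(F, x, h=1e-6):
    """Central-difference Jacobian of a vector function F: entry [i][j] ~ dF_i/dx_j."""
    n, m = len(x), len(F(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

x0 = [1.0, 2.0]
H = jacobian(grad_f, x0)  # H(f) = J(grad f)
H_exact = [[2 * x0[1], 2 * x0[0]], [2 * x0[0], -math.sin(x0[1])]]
print(H)
print(H_exact)
```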

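When the second-order partial derivatives of $f$ are continuous (Schwarz's theorem, also known as Clairaut's theorem), the order of differentiation can be interchanged, so $H(f)$ is symmetric:

$\displaystyle \frac{\partial^2f}{\partial x_i\partial x_j}=\frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)=\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)=\frac{\partial^2f}{\partial x_j\partial x_i}$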
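The symmetry of mixed partials really does depend on continuity. The Python sketch below approximates both orders of mixed partial derivatives with nested central differences. It does this first for the smooth function $x^2y+\sin y$, whose mixed partials both equal $2x$. It then does the same for the classic counterexample $g(x,y)=xy(x^2-y^2)/(x^2+y^2)$ with $g(0,0)=0$, whose two mixed partials at the origin are $-1$ and $1$. The helpers `d_x`, `d_y`, `mixed` and the step sizes are illustrative assumptions. The inner step is taken much smaller than the outer step so that the inner derivative is resolved first.

```python
import math

def d_x(func, h):
    """Return a function approximating d(func)/dx by a central difference with step h."""
    return lambda x, y: (func(x + h, y) - func(x - h, y)) / (2 * h)

def d_y(func, h):
    """Return a function approximating d(func)/dy by a central difference with step h."""
    return lambda x, y: (func(x, y + h) - func(x, y - h)) / (2 * h)

def smooth(x, y):
    # Smooth example: both mixed partials equal 2x
    return x * x * y + math.sin(y)

def g(x, y):
    # Classic counterexample: second partials are not continuous at the origin
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

H_IN, H_OUT = 1e-5, 1e-3  # inner step much smaller than outer step

def mixed(func, x, y):
    """Return (d/dy of dfunc/dx, d/dx of dfunc/dy) at (x, y) via nested central differences."""
    return (d_y(d_x(func, H_IN), H_OUT)(x, y),
            d_x(d_y(func, H_IN), H_OUT)(x, y))

print(mixed(smooth, 1.0, 2.0))  # both entries close to 2
print(mixed(g, 0.0, 0.0))       # close to (-1, 1): here the order matters
```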

$\displaystyle \frac{\partial^2f}{\partial x^2}=\frac{\partial}{\partial x}\frac{\partial f}{\partial x}$

$\displaystyle H(f)=\begin{bmatrix} \nabla\frac{\partial f}{\partial x_1}&\nabla\frac{\partial f}{\partial x_2}&\cdots&\nabla\frac{\partial f}{\partial x_n} \end{bmatrix}$

$\displaystyle H(f)=\begin{bmatrix} \frac{\partial}{\partial x_1}\\[0.5em] \frac{\partial}{\partial x_2}\\[0.5em] \vdots\\[0.5em] \frac{\partial}{\partial x_n} \end{bmatrix}\begin{bmatrix} \frac{\partial f}{\partial x_1}&\frac{\partial f}{\partial x_2}&\cdots&\frac{\partial f}{\partial x_n} \end{bmatrix}=\nabla\nabla^T f$
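Here is a concrete case where the outer-product form is visible. For the hypothetical choice $f(\mathbf{x})=(\mathbf{c}^T\mathbf{x})^2$ one has $\nabla f=2(\mathbf{c}^T\mathbf{x})\mathbf{c}$, and hence $H(f)=2\mathbf{c}\mathbf{c}^T$. The Python sketch below compares a finite-difference Hessian with that outer product. The vector $\mathbf{c}$, the evaluation point, and the helper names `outer` and `hessian_fd` are assumptions made for illustration.

```python
C = [1.0, 2.0, 3.0]  # hypothetical constant vector c

def f(x):
    # f(x) = (c^T x)^2, whose Hessian is the outer product 2 c c^T
    s = sum(ci * xi for ci, xi in zip(C, x))
    return s * s

def outer(u, v):
    """Outer product u v^T as a nested list: entry [i][j] = u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]

def hessian_fd(func, x, h=1e-3):
    """Second-order central differences: entry [i][j] ~ d^2 func / dx_i dx_j."""
    n = len(x)

    def shifted(si, i, sj, j):
        # Evaluate func at x + si*h*e_i + sj*h*e_j
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return func(y)

    return [[(shifted(1, i, 1, j) - shifted(1, i, -1, j)
              - shifted(-1, i, 1, j) + shifted(-1, i, -1, j)) / (4 * h * h)
             for j in range(n)] for i in range(n)]

x0 = [0.5, -1.0, 2.0]
H_num = hessian_fd(f, x0)
H_outer = [[2 * e for e in row] for row in outer(C, C)]
print(H_num)
print(H_outer)
```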

$\displaystyle f(y)=f(x)+f'(x)(y-x)+\frac{f''(x)}{2!}(y-x)^2+\frac{f'''(x)}{3!}(y-x)^3+\cdots$

Let $a=y-x$. Substituting $y=x+a$ into the expansion above gives an equivalent form:

$\displaystyle f(x+a)=f(x)+f'(x)a+\frac{f''(x)}{2!}a^2+\frac{f'''(x)}{3!}a^3+\cdots$
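A quick check of the one-variable form: the sketch below sums the first few terms of $f(x+a)$ for the assumed example $f=\sin$, whose derivatives cycle through $\sin,\cos,-\sin,-\cos$. It prints the error against $\sin(x+a)$ as more terms are added. The helper name `sin_taylor` and the values of $x$ and $a$ are illustrative choices.

```python
import math

def sin_taylor(x, a, n_terms):
    """Partial sum sum_{k < n_terms} f^(k)(x) a^k / k! of f(x + a) for f = sin."""
    # Derivatives of sin cycle with period 4: sin, cos, -sin, -cos
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return sum(derivs[k % 4] * a ** k / math.factorial(k) for k in range(n_terms))

x, a = 0.3, 0.5
for n in (2, 4, 8):
    # The error shrinks rapidly as more terms are kept
    print(n, abs(sin_taylor(x, a, n) - math.sin(x + a)))
```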

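Fix $\mathbf{x}$ and $\mathbf{a}$, let $\mathbf{y}=\mathbf{x}+t\mathbf{a}$, and define the single-variable function $\phi(t)=f(\mathbf{y})=f(\mathbf{x}+t\mathbf{a})$. By the one-variable expansion, its Maclaurin series is

$\displaystyle \phi(t)=\sum_{k=0}^\infty \frac{\phi^{(k)}(0)}{k!}t^k$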

$\displaystyle \phi'=\sum_i\frac{\partial f}{\partial y_i}\frac{\partial y_i}{\partial t}=\sum_i\frac{\partial f}{\partial y_i}a_i =\left(\sum_ia_i\frac{\partial}{\partial y_i}\right)f=\left(\mathbf{a}^T\nabla_\mathbf{y}\right)f$

$\displaystyle\begin{aligned} \phi''&=\sum_{j}\frac{\partial\phi'}{\partial y_j}\frac{\partial y_j}{\partial t}=\sum_{j}\frac{\partial}{\partial y_j}\left(\sum_{i}\frac{\partial f}{\partial y_i}a_i\right)a_j\\ &=\sum_i\sum_j\frac{\partial^2f}{\partial y_i\partial y_j}a_ia_j=\left(\sum_i\sum_ja_ia_j\frac{\partial^2}{\partial y_i\partial y_j}\right)f\\ &=\left(\sum_ia_i\frac{\partial}{\partial y_i}\right)\left(\sum_ja_j\frac{\partial}{\partial y_j}\right)f=\left(\mathbf{a}^T\nabla_{\mathbf{y}}\right)^2f .\end{aligned}$
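The results $\phi'(0)=\mathbf{a}^T\nabla f(\mathbf{x})$ and $\phi''(0)=(\mathbf{a}^T\nabla)^2f(\mathbf{x})=\mathbf{a}^TH(f)\mathbf{a}$ can be spot-checked numerically. This sketch assumes the illustrative function $f(x_1,x_2)=x_1^2x_2+\sin x_2$ with a hand-computed gradient and Hessian. It approximates the derivatives of $\phi(t)=f(\mathbf{x}+t\mathbf{a})$ by central differences in $t$. The point $\mathbf{x}$, the direction $\mathbf{a}$, and the step `h` are arbitrary choices.

```python
import math

# Illustrative f(x1, x2) = x1^2 x2 + sin(x2), with gradient and Hessian worked out by hand
def f(x):
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad(x):
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def hess(x):
    return [[2 * x[1], 2 * x[0]], [2 * x[0], -math.sin(x[1])]]

x = [1.0, 2.0]   # expansion point (arbitrary)
a = [0.3, -0.7]  # displacement direction (arbitrary)

def phi(t):
    """phi(t) = f(y) with y = x + t a."""
    return f([xi + t * ai for xi, ai in zip(x, a)])

h = 1e-4
phi1 = (phi(h) - phi(-h)) / (2 * h)              # central difference for phi'(0)
phi2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2  # central difference for phi''(0)

g, H = grad(x), hess(x)
aT_grad = sum(a[i] * g[i] for i in range(2))                             # a^T grad f
aT_H_a = sum(a[i] * H[i][j] * a[j] for i in range(2) for j in range(2))  # a^T H a
print(phi1, aT_grad)
print(phi2, aT_H_a)
```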

$\displaystyle f(\mathbf{x}+\mathbf{a})=\phi(1)=\sum_{k=0}^\infty\frac{1}{k!}\phi^{(k)}(0)=\sum_{k=0}^{\infty}\frac{1}{k!}\left(\mathbf{a}^T\nabla\right)^kf(\mathbf{x})$

$\displaystyle f(\mathbf{x}+\mathbf{a})=f(\mathbf{x})+\sum_ia_i\frac{\partial f}{\partial x_i}+\frac{1}{2!}\sum_{i,j}a_{i}a_{j}\frac{\partial^2f}{\partial x_i\partial x_j}+\frac{1}{3!}\sum_{i,j,k}a_ia_ja_k\frac{\partial^3f}{\partial x_i\partial x_j\partial x_k}+\cdots$
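For a polynomial the series terminates, which gives an exact test of the index-sum form. The sketch below uses the hypothetical cubic $f(x_1,x_2)=x_1^3+x_1x_2^2$. Its partial derivatives are tabulated by hand, and the arithmetic is exact rational (`fractions.Fraction`). Summing through $k=3$ should therefore reproduce $f(\mathbf{x}+\mathbf{a})$ exactly, while stopping at $k=2$ should not. The names `partial` and `taylor` are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Hypothetical cubic f(x1, x2) = x1^3 + x1 x2^2; code index 0 is x1, index 1 is x2
def f(x):
    return x[0] ** 3 + x[0] * x[1] ** 2

def partial(idx, x):
    """Exact partial derivative of f with respect to the variables listed in idx.

    Mixed partials of this polynomial commute, so only the sorted index tuple matters.
    Derivatives of order 4 and higher vanish and are never requested below."""
    x1, x2 = x
    table = {
        (): f(x),
        (0,): 3 * x1 ** 2 + x2 ** 2,
        (1,): 2 * x1 * x2,
        (0, 0): 6 * x1,
        (0, 1): 2 * x2,
        (1, 1): 2 * x1,
        (0, 0, 0): 6,
        (0, 0, 1): 0,
        (0, 1, 1): 2,
        (1, 1, 1): 0,
    }
    return table[tuple(sorted(idx))]

def taylor(x, a, order):
    """sum_{k=0}^{order} (1/k!) sum_{i_1..i_k} a_{i_1}...a_{i_k} d^k f / dx_{i_1}...dx_{i_k}."""
    total = Fraction(0)
    for k in range(order + 1):
        s = Fraction(0)
        for idx in product(range(len(x)), repeat=k):
            term = Fraction(partial(idx, x))
            for i in idx:
                term *= a[i]
            s += term
        total += s / factorial(k)
    return total

x = (Fraction(1), Fraction(2))
a = (Fraction(1, 2), Fraction(-1, 3))
exact = f((x[0] + a[0], x[1] + a[1]))
print(taylor(x, a, 2) == exact, taylor(x, a, 3) == exact)  # False True
```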

$\displaystyle f(\mathbf{x}+\mathbf{a})=f(\mathbf{x})+\mathbf{a}^T\nabla f+\frac{1}{2!}\mathbf{a}^TH(f)\mathbf{a}+\cdots$
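One way to see what the omitted terms cost: for a smooth $f$, the error of the second-order approximation should shrink like $\|\mathbf{a}\|^3$. The sketch below assumes the illustrative function $f(x_1,x_2)=x_1^2x_2+\sin x_2$ with a hand-computed gradient and Hessian. It compares the error at the step sizes $\mathbf{a}=(0.1,0.1)$ and $\mathbf{a}=(0.01,0.01)$, so the ratio of the two errors should come out near $10^3$. The helpers `taylor2` and `error` are hypothetical names.

```python
import math

# Illustrative smooth f(x1, x2) = x1^2 x2 + sin(x2) with hand-computed derivatives
def f(x):
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad(x):
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def hess(x):
    return [[2 * x[1], 2 * x[0]], [2 * x[0], -math.sin(x[1])]]

def taylor2(x, a):
    """Second-order approximation f(x) + a^T grad f(x) + (1/2!) a^T H(f)(x) a."""
    n = len(x)
    g, H = grad(x), hess(x)
    linear = sum(a[i] * g[i] for i in range(n))
    quadratic = sum(a[i] * H[i][j] * a[j] for i in range(n) for j in range(n))
    return f(x) + linear + quadratic / 2

x0 = [1.0, 2.0]

def error(s):
    """|f(x0 + a) - taylor2(x0, a)| for the displacement a = (s, s)."""
    a = [s, s]
    return abs(f([x0[0] + s, x0[1] + s]) - taylor2(x0, a))

for s in (0.1, 0.01):
    print(s, error(s))
print(error(0.1) / error(0.01))  # roughly 10^3, consistent with an O(|a|^3) remainder
```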

$\displaystyle \begin{bmatrix} a_1 &\cdots&a_n \end{bmatrix}\begin{bmatrix} \frac{\partial^2f}{\partial x_1\partial x_1}&\cdots&\frac{\partial^2f}{\partial x_1\partial x_n}\\[0.5em] \vdots&\ddots&\vdots\\[0.5em] \frac{\partial^2f}{\partial x_n\partial x_1}&\cdots&\frac{\partial^2f}{\partial x_n\partial x_n} \end{bmatrix}\begin{bmatrix} a_1\\ \vdots\\ a_n \end{bmatrix}=\sum_{i=1}^n\sum_{j=1}^na_ia_j\frac{\partial^2f}{\partial x_i\partial x_j}=\left(\sum_{i=1}^na_i\frac{\partial}{\partial x_i}\right)^2f$

$\displaystyle\begin{aligned} \mathbf{a}^TH(f)\mathbf{a}&=\mathbf{a}^T\begin{bmatrix} \nabla\frac{\partial f}{\partial x_1}&\cdots&\nabla\frac{\partial f}{\partial x_n} \end{bmatrix}\mathbf{a}\\ &=\begin{bmatrix} \mathbf{a}^T\nabla\frac{\partial f}{\partial x_1}&\cdots&\mathbf{a}^T\nabla\frac{\partial f}{\partial x_n} \end{bmatrix}\begin{bmatrix} a_1\\ \vdots\\ a_n \end{bmatrix}\\ &=\sum_{i=1}^n\mathbf{a}^T\nabla\frac{\partial f}{\partial x_i}a_i=\mathbf{a}^T\nabla\sum_{i=1}^na_i\frac{\partial f}{\partial x_i}\\ &=(\mathbf{a}^T\nabla)(\mathbf{a}^T\nabla)f=(\mathbf{a}^T\nabla)^2f.\end{aligned}$
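The different ways of writing the quadratic form $\mathbf{a}^TH(f)\mathbf{a}$ can be compared exactly using rational arithmetic. There are three: the plain matrix product, the column-by-column form that treats the $i$-th column of $H$ as $\nabla\frac{\partial f}{\partial x_i}$, and the double sum. The symmetric matrix `H` and the vector `a` below are arbitrary stand-ins for a Hessian and a displacement, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical symmetric "Hessian" H and displacement a (illustrative values only)
H = [[Fraction(4), Fraction(2), Fraction(-1)],
     [Fraction(2), Fraction(3), Fraction(1, 2)],
     [Fraction(-1), Fraction(1, 2), Fraction(5)]]
a = [Fraction(1, 2), Fraction(-3), Fraction(2)]
n = len(a)

# (1) a^T (H a): matrix-vector product, then a dot product
Ha = [sum(H[i][j] * a[j] for j in range(n)) for i in range(n)]
form1 = sum(a[i] * Ha[i] for i in range(n))

# (2) Row vector [a^T h_1, ..., a^T h_n] times a, where h_i is the i-th column of H
row = [sum(a[k] * H[k][i] for k in range(n)) for i in range(n)]
form2 = sum(row[i] * a[i] for i in range(n))

# (3) Double sum  sum_i sum_j a_i a_j H_ij
form3 = sum(a[i] * a[j] * H[i][j] for i in range(n) for j in range(n))

print(form1, form2, form3)
```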

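Since $\mathbf{a}$ is a constant vector, the same quantity can be regrouped around the outer product $\nabla\nabla^T$:

$\displaystyle (\mathbf{a}^T\nabla)(\mathbf{a}^T\nabla)f=(\mathbf{a}^T\nabla)(\nabla^T\mathbf{a})f=\mathbf{a}^T\nabla\nabla^T\mathbf{a}f=\mathbf{a}^T\left(\nabla\nabla^Tf\right)\mathbf{a}=\mathbf{a}^TH(f)\mathbf{a}$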

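[1] In this article, 外積 refers to the outer product, not to the cross product of two three-dimensional vectors (which in Chinese is also called 外積, or the vector product). For $n$-dimensional column vectors $\mathbf{x}$ and $\mathbf{y}$, the outer product is defined as the $n\times n$ matrix $\mathbf{x}\mathbf{y}^T$.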
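The difference can be made concrete with a small Python sketch using plain lists; the helper names `outer` and `cross` are hypothetical. The outer product of two $n$-vectors is an $n\times n$ matrix, whereas the cross product is defined only for three-dimensional vectors and returns another vector.

```python
def outer(x, y):
    """Outer product x y^T: an n x n matrix (nested list) with entries x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]

def cross(u, v):
    """Cross product of two 3-D vectors: another 3-D vector."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

x, y = [1, 2, 3], [4, 5, 6]
print(outer(x, y))  # [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
print(cross(x, y))  # [-3, 6, -3]
```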

This entry was posted in 特別主題 (Special Topics), 答讀者問 (Answers to Readers' Questions).

1. 張盛東 says:

Thank you for the guidance, Professor.