## Matrix Derivatives

Let $f(x_1,\ldots,x_n)$ be a differentiable function of several variables, also written $f(\mathbf{x})$ with $\mathbf{x}=(x_1,\ldots,x_n)^T$. We define the gradient of $f$ as the following $n$-dimensional vector:

$\displaystyle \nabla f=\frac{\partial f}{\partial \mathbf{x}}=\begin{bmatrix} \displaystyle\frac{\partial f}{\partial x_1}\\[0.8em] \displaystyle\frac{\partial f}{\partial x_2}\\[0.8em] \vdots\\[0.8em] \displaystyle\frac{\partial f}{\partial x_n} \end{bmatrix}$
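
The definition can be checked numerically. The following is a minimal sketch, not part of the original post: the helper `numerical_gradient` and the sample function `f` are names chosen here for illustration, and the partial derivatives are approximated by central differences.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Illustrative function: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


# The analytic gradient is (2*x1 + 3*x2, 3*x1), which is (8, 3) at (1, 2).
g = numerical_gradient(f, [1.0, 2.0])
print(g)
```

Each entry of `g` approximates one component $\partial f/\partial x_i$ of the column vector above.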

For $m$ differentiable functions $f_1,\ldots,f_m$ of $\mathbf{x}$, the Jacobian matrix is the $m\times n$ matrix whose $i$th row is $(\nabla f_i)^T$:

$\displaystyle J=\begin{bmatrix} \displaystyle\frac{\partial f_1}{\partial\mathbf{x}}&\displaystyle\frac{\partial f_2}{\partial\mathbf{x}}&\cdots&\displaystyle\frac{\partial f_m}{\partial\mathbf{x}} \end{bmatrix}^T=\begin{bmatrix} \displaystyle\frac{\partial f_1}{\partial x_1}&\displaystyle\frac{\partial f_1}{\partial x_2}&\cdots&\displaystyle\frac{\partial f_1}{\partial x_n}\\[0.8em] \displaystyle\frac{\partial f_2}{\partial x_1}&\displaystyle\frac{\partial f_2}{\partial x_2}&\cdots&\displaystyle\frac{\partial f_2}{\partial x_n}\\[0.8em] \vdots&\vdots&\ddots&\vdots\\[0.8em] \displaystyle\frac{\partial f_m}{\partial x_1}&\displaystyle\frac{\partial f_m}{\partial x_2}&\cdots&\displaystyle\frac{\partial f_m}{\partial x_n} \end{bmatrix}$
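
As a small worked illustration (the two functions are chosen here and do not come from the original post), take $n=m=2$ with $f_1(\mathbf{x})=x_1x_2$ and $f_2(\mathbf{x})=x_1+x_2^2$. Row $i$ of $J$ collects the partial derivatives of $f_i$:

$\displaystyle J=\begin{bmatrix} \displaystyle\frac{\partial f_1}{\partial x_1}&\displaystyle\frac{\partial f_1}{\partial x_2}\\[0.8em] \displaystyle\frac{\partial f_2}{\partial x_1}&\displaystyle\frac{\partial f_2}{\partial x_2} \end{bmatrix}=\begin{bmatrix} x_2&x_1\\ 1&2x_2 \end{bmatrix}$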

The Hessian of a twice-differentiable $f$ is the $n\times n$ matrix of its second partial derivatives:

$\displaystyle H=\begin{bmatrix} \displaystyle\frac{\partial}{\partial\mathbf{x}}\frac{\partial f}{\partial x_1}&\displaystyle\frac{\partial}{\partial\mathbf{x}}\frac{\partial f}{\partial x_2}&\cdots&\displaystyle\frac{\partial}{\partial\mathbf{x}}\frac{\partial f}{\partial x_n} \end{bmatrix}^T=\begin{bmatrix} \displaystyle\frac{\partial^2f}{\partial x_1\partial x_1}&\displaystyle\frac{\partial^2f}{\partial x_2\partial x_1}&\cdots&\displaystyle\frac{\partial^2f}{\partial x_n\partial x_1}\\[1em] \displaystyle\frac{\partial^2 f}{\partial x_1\partial x_2}&\displaystyle\frac{\partial^2 f}{\partial x_2\partial x_2}&\cdots&\displaystyle\frac{\partial^2 f}{\partial x_n\partial x_2}\\[1em] \vdots&\vdots&\ddots&\vdots\\[1em] \displaystyle\frac{\partial^2 f}{\partial x_1\partial x_n}&\displaystyle\frac{\partial^2f}{\partial x_2\partial x_n}&\cdots&\displaystyle\frac{\partial^2 f}{\partial x_n\partial x_n} \end{bmatrix}$
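
For a concrete case (an example chosen here for illustration, not taken from the original post), let $f(\mathbf{x})=x_1^2x_2+x_2^3$ with $n=2$. Differentiating once gives the gradient, and differentiating each gradient entry again gives the Hessian:

$\displaystyle \nabla f=\begin{bmatrix} 2x_1x_2\\ x_1^2+3x_2^2 \end{bmatrix},\qquad H=\begin{bmatrix} 2x_2&2x_1\\ 2x_1&6x_2 \end{bmatrix}$

The mixed partial derivatives agree, so $H$ is symmetric here, as expected when the second partial derivatives are continuous.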

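More generally, the dependent variable and the independent variable can each be a scalar, a vector, or a matrix. The table below lists the six combinations considered here; each row corresponds to the type of the independent variable and each column to the type of the dependent variable.

$\begin{array}{ccccc} \hbox{type}&\vline&\hbox{scalar}&\hbox{vector}&\hbox{matrix}\\ \hline \hbox{scalar}&\vline&\displaystyle\frac{\partial y}{\partial x}&\displaystyle\frac{\partial\mathbf{y}}{\partial x}&\displaystyle\frac{\partial Y}{\partial x}\\[0.8em] \hbox{vector}&\vline&\displaystyle\frac{\partial y}{\partial\mathbf{x}}&\displaystyle\frac{\partial\mathbf{y}}{\partial\mathbf{x}}&\\[0.8em] \hbox{matrix}&\vline&\displaystyle\frac{\partial y}{\partial X}& & \\ [0.8em]\hline \end{array}$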

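Two conventions are in use for the derivative of an $m$-dimensional vector $\mathbf{y}$ with respect to an $n$-dimensional vector $\mathbf{x}$. The numerator layout (Jacobian formulation) gives an $m\times n$ matrix:

$\displaystyle \left(\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\right)_{ij}\equiv\frac{\partial y_i}{\partial x_j},~~i=1,\ldots,m,~~j=1,\ldots,n$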

$\displaystyle \frac{\partial\mathbf{y}}{\partial\mathbf{x}}=\begin{bmatrix} \displaystyle\frac{\partial\mathbf{y}}{\partial x_1}&\displaystyle\frac{\partial\mathbf{y}}{\partial x_2}&\cdots&\displaystyle\frac{\partial\mathbf{y}}{\partial x_n} \end{bmatrix}=\begin{bmatrix} \displaystyle\frac{\partial y_1}{\partial\mathbf{x}}\\[0.8em] \displaystyle\frac{\partial y_2}{\partial\mathbf{x}}\\[0.8em] \vdots\\[0.8em] \displaystyle\frac{\partial y_m}{\partial\mathbf{x}} \end{bmatrix}=\begin{bmatrix} \displaystyle\frac{\partial y_1}{\partial x_1}&\displaystyle\frac{\partial y_1}{\partial x_2}&\cdots&\displaystyle\frac{\partial y_1}{\partial x_n}\\[0.8em] \displaystyle\frac{\partial y_2}{\partial x_1}&\displaystyle\frac{\partial y_2}{\partial x_2}&\cdots&\displaystyle\frac{\partial y_2}{\partial x_n}\\[0.8em] \vdots&\vdots&\ddots&\vdots\\[0.8em] \displaystyle\frac{\partial y_m}{\partial x_1}&\displaystyle\frac{\partial y_m}{\partial x_2}&\cdots&\displaystyle\frac{\partial y_m}{\partial x_n} \end{bmatrix}$

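The denominator layout (gradient formulation) is the transpose of the above, an $n\times m$ matrix. The identities that follow use this layout, as can be seen from (VV-3):

$\displaystyle \left(\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\right)_{ij}\equiv\frac{\partial y_j}{\partial x_i},~~i=1,\ldots,n,~~j=1,\ldots,m$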

$\displaystyle \frac{\partial\mathbf{y}}{\partial\mathbf{x}}=\begin{bmatrix} \displaystyle\frac{\partial y_1}{\partial\mathbf{x}}&\displaystyle\frac{\partial y_2}{\partial\mathbf{x}}&\cdots&\displaystyle\frac{\partial y_m}{\partial\mathbf{x}} \end{bmatrix}=\begin{bmatrix} \displaystyle\frac{\partial\mathbf{y}}{\partial x_1}\\[0.8em] \displaystyle\frac{\partial\mathbf{y}}{\partial x_2}\\[0.8em] \vdots\\[0.8em] \displaystyle\frac{\partial\mathbf{y}}{\partial x_n} \end{bmatrix}=\begin{bmatrix} \displaystyle\frac{\partial y_1}{\partial x_1}&\displaystyle\frac{\partial y_2}{\partial x_1}&\cdots&\displaystyle\frac{\partial y_m}{\partial x_1}\\[0.8em] \displaystyle\frac{\partial y_1}{\partial x_2}&\displaystyle\frac{\partial y_2}{\partial x_2}&\cdots&\displaystyle\frac{\partial y_m}{\partial x_2}\\[0.8em] \vdots&\vdots&\ddots&\vdots\\[0.8em] \displaystyle\frac{\partial y_1}{\partial x_n}&\displaystyle\frac{\partial y_2}{\partial x_n}&\cdots&\displaystyle\frac{\partial y_m}{\partial x_n} \end{bmatrix}$
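
To see the two conventions side by side, consider an example chosen here for illustration: $n=2$, $m=3$, and $\mathbf{y}=(x_1x_2,~x_1+x_2,~x_2^2)^T$. Under the first convention, $(\cdot)_{ij}=\partial y_i/\partial x_j$, the derivative is $3\times 2$; under the second, $(\cdot)_{ij}=\partial y_j/\partial x_i$, it is $2\times 3$:

$\displaystyle \begin{bmatrix} x_2&x_1\\ 1&1\\ 0&2x_2 \end{bmatrix}\quad\hbox{versus}\quad\begin{bmatrix} x_2&1&0\\ x_1&1&2x_2 \end{bmatrix}$

The two matrices are transposes of each other, so any identity stated in one convention can be converted to the other by transposing both sides.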

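For a scalar $y$ and an $m\times n$ matrix $X$, the derivative has the same shape as $X$:

$\displaystyle \left(\frac{\partial y}{\partial X}\right)_{ij}\equiv\frac{\partial y}{\partial x_{ij}},~~i=1,\ldots,m,~~j=1,\ldots,n$

For an $m\times n$ matrix $Y$ and a scalar $x$, the derivative has the same shape as $Y$:

$\displaystyle \left(\frac{\partial Y}{\partial x}\right)_{ij}\equiv\frac{\partial y_{ij}}{\partial x},~~i=1,\ldots,m,~~j=1,\ldots,n$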

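Vector-by-vector identities. Throughout, $a$, $\mathbf{a}$, $\mathbf{b}$, and $A$ denote constants that do not depend on the variable of differentiation, while $u$, $v$, $\mathbf{u}$, $\mathbf{v}$, $U$, and $V$ are functions of it.

(VV-1) $\displaystyle\frac{\partial\mathbf{a}}{\partial\mathbf{x}}=0$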

(VV-2) $\displaystyle\frac{\partial\mathbf{x}}{\partial\mathbf{x}}=I$

(VV-3) $\displaystyle\frac{\partial A\mathbf{x}}{\partial\mathbf{x}}=A^T$

$\displaystyle \left(\frac{\partial A\mathbf{x}}{\partial\mathbf{x}}\right)_{ij}=\frac{\partial \sum_{k}a_{jk}x_k}{\partial x_i}=\sum_ka_{jk}\frac{\partial x_k}{\partial x_i}=\sum_{k}a_{jk}\delta_{ik}=a_{ji}=(A^T)_{ij}$
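
Identity (VV-3) can be sanity-checked with finite differences. This is only a sketch under the convention $(\cdot)_{ij}=\partial y_j/\partial x_i$ used in the proof above; the helpers `matvec` and `denominator_layout_derivative` and the sample values of `A` and `x` are assumptions made for this illustration.

```python
def matvec(A, x):
    """Return the matrix-vector product A x (A is a list of rows)."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def denominator_layout_derivative(func, x, h=1e-6):
    """Return D with D[i][j] approximating d y_j / d x_i (central differences)."""
    D = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        y_f, y_b = func(forward), func(backward)
        D.append([(yf - yb) / (2 * h) for yf, yb in zip(y_f, y_b)])
    return D


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]            # 2 x 3, so y = A x maps R^3 to R^2
x = [0.5, -1.0, 2.0]

D = denominator_layout_derivative(lambda z: matvec(A, z), x)
A_T = [list(col) for col in zip(*A)]   # transpose of A, 3 x 2

print(D)
print(A_T)
```

The numerical derivative `D` is $3\times 2$ and matches $A^T$ up to rounding error.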

(VV-4) $\displaystyle\frac{\partial\mathbf{x}^TA}{\partial\mathbf{x}}=A$

$\displaystyle \left(\frac{\partial \mathbf{x}^TA}{\partial\mathbf{x}}\right)_{ij}=\frac{\partial \sum_{k}x_ka_{kj}}{\partial x_i}=\sum_ka_{kj}\frac{\partial x_k}{\partial x_i}=\sum_{k}a_{kj}\delta_{ik}=a_{ij}=(A)_{ij}$

(VV-5) $\displaystyle\frac{\partial a\mathbf{u}}{\partial\mathbf{x}}=a\frac{\partial\mathbf{u}}{\partial\mathbf{x}}$

$\displaystyle \left(\frac{\partial a\mathbf{u}}{\partial\mathbf{x}}\right)_{ij}=\frac{\partial au_j}{\partial x_i}=a\frac{\partial u_j}{\partial{x}_i}=a\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\right)_{ij}=\left(a\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\right)_{ij}$

(VV-6) $\displaystyle\frac{\partial A\mathbf{u}}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}A^T$

$\displaystyle\begin{aligned} \left(\frac{\partial A\mathbf{u}}{\partial\mathbf{x}}\right)_{ij}&=\frac{\partial \sum_{k}a_{jk}u_k}{\partial x_i}=\sum_ka_{jk}\frac{\partial u_k}{\partial x_i}\\ &=\sum_k\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\right)_{ik}(A^T)_{kj}=\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}A^T\right)_{ij}.\end{aligned}$

(VV-7) $\displaystyle\frac{\partial(\mathbf{u}+\mathbf{v})}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}+\frac{\partial\mathbf{v}}{\partial\mathbf{x}}$

$\displaystyle\begin{aligned} \left(\frac{\partial (\mathbf{u}+\mathbf{v})}{\partial\mathbf{x}}\right)_{ij}&=\frac{\partial(u_j+v_j)}{\partial x_i}=\frac{\partial u_j}{\partial x_i}+\frac{\partial v_j}{\partial x_i}\\ &=\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\right)_{ij}+\left(\frac{\partial\mathbf{v}}{\partial\mathbf{x}}\right)_{ij}=\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}+\frac{\partial\mathbf{v}}{\partial\mathbf{x}}\right)_{ij}.\end{aligned}$

(VV-8) $\displaystyle\frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{u}}$

$\displaystyle\begin{aligned} \left(\frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{x}}\right)_{ij} &=\frac{\partial f_j(\mathbf{u})}{\partial x_i}=\frac{\partial\mathbf{u}}{\partial x_i}\frac{\partial f_j}{\partial\mathbf{u}}=\sum_k\frac{\partial u_k}{\partial x_i}\frac{\partial f_j}{\partial u_k}\\ &=\sum_k\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\right)_{ik}\left(\frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{u}}\right)_{kj}=\left(\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{u}}\right)_{ij}. \end{aligned}$
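
A quick consistency check of the chain rule (VV-8), using a linear example chosen here for illustration: let $\mathbf{u}=B\mathbf{x}$ and $\mathbf{f}(\mathbf{u})=A\mathbf{u}$ with constant matrices $A$ and $B$ of compatible sizes. By (VV-3), $\partial\mathbf{u}/\partial\mathbf{x}=B^T$ and $\partial\mathbf{f}/\partial\mathbf{u}=A^T$, so

$\displaystyle \frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\frac{\partial\mathbf{f}(\mathbf{u})}{\partial\mathbf{u}}=B^TA^T=(AB)^T$

which agrees with applying (VV-3) directly to $\mathbf{f}=(AB)\mathbf{x}$. Note the order of the factors: in this layout the derivative of the inner function comes first.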

Scalar-by-vector identities:

(SV-1) $\displaystyle\frac{\partial{a}}{\partial\mathbf{x}}=\mathbf{0}$

(SV-2) $\displaystyle\frac{\partial au}{\partial\mathbf{x}}=a\frac{\partial{u}}{\partial\mathbf{x}}$

(SV-3) $\displaystyle\frac{\partial(u+v)}{\partial\mathbf{x}}=\frac{\partial u}{\partial\mathbf{x}}+\frac{\partial v}{\partial\mathbf{x}}$

(SV-4) $\displaystyle\frac{\partial uv}{\partial\mathbf{x}}=u\frac{\partial v}{\partial\mathbf{x}}+v\frac{\partial u}{\partial\mathbf{x}}$

$\displaystyle\begin{aligned} \left(\frac{\partial uv}{\partial\mathbf{x}}\right)_i&=\frac{\partial uv}{\partial x_i}=u\frac{\partial v}{\partial x_i}+v\frac{\partial u}{\partial x_i}=u\left(\frac{\partial v}{\partial\mathbf{x}}\right)_i+v\left(\frac{\partial u}{\partial\mathbf{x}}\right)_i\\ &=\left(u\frac{\partial v}{\partial\mathbf{x}}+v\frac{\partial u}{\partial\mathbf{x}}\right)_i.\end{aligned}$

(SV-5) $\displaystyle\frac{\partial f(u)}{\partial\mathbf{x}}=\frac{\partial f(u)}{\partial u}\frac{\partial u}{\partial\mathbf{x}}$

(SV-6) $\displaystyle\frac{\partial\mathbf{u}^T\mathbf{v}}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\mathbf{v}+\frac{\partial\mathbf{v}}{\partial\mathbf{x}}\mathbf{u}$

$\displaystyle\begin{aligned} \frac{\partial\mathbf{u}^T\mathbf{v}}{\partial\mathbf{x}}&=\frac{\partial\sum_ku_kv_k}{\partial \mathbf{x}}=\sum_k\frac{\partial u_kv_k}{\partial\mathbf{x}}=\sum_k\left(u_k\frac{\partial v_k}{\partial\mathbf{x}}+v_k\frac{\partial u_k}{\partial\mathbf{x}}\right)\\ &=\sum_k\frac{\partial v_k}{\partial\mathbf{x}}u_k+\sum_k\frac{\partial u_k}{\partial\mathbf{x}}v_k=\frac{\partial\mathbf{v}}{\partial\mathbf{x}}\mathbf{u}+\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\mathbf{v}.\end{aligned}$

(SV-7) $\displaystyle\frac{\partial\mathbf{u}^TA\mathbf{v}}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}A\mathbf{v}+\frac{\partial\mathbf{v}}{\partial\mathbf{x}}A^T\mathbf{u}$

$\displaystyle \frac{\partial\mathbf{u}^TA\mathbf{v}}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}^T(A\mathbf{v})}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}(A\mathbf{v})+\frac{\partial(A\mathbf{v})}{\partial\mathbf{x}}\mathbf{u}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}A\mathbf{v}+\frac{\partial\mathbf{v}}{\partial\mathbf{x}}A^T\mathbf{u}$

(SV-8) $\displaystyle\frac{\partial\mathbf{a}^T\mathbf{x}}{\partial\mathbf{x}}=\frac{\partial\mathbf{x}^T\mathbf{a}}{\partial\mathbf{x}}=\mathbf{a}$

$\displaystyle \frac{\partial\mathbf{a}^T\mathbf{x}}{\partial\mathbf{x}}=\frac{\partial\mathbf{a}}{\partial\mathbf{x}}\mathbf{x}+\frac{\partial\mathbf{x}}{\partial\mathbf{x}}\mathbf{a}=0\mathbf{x}+I\mathbf{a}=\mathbf{a}$

(SV-9) $\displaystyle\frac{\partial\mathbf{b}^TA\mathbf{x}}{\partial\mathbf{x}}=A^T\mathbf{b}$

(SV-10) $\displaystyle\frac{\partial\mathbf{x}^TA\mathbf{x}}{\partial\mathbf{x}}=(A+A^T)\mathbf{x}$

$\displaystyle \frac{\partial\mathbf{x}^TA\mathbf{x}}{\partial\mathbf{x}}=\frac{\partial\mathbf{x}}{\partial\mathbf{x}}A\mathbf{x}+\frac{\partial\mathbf{x}}{\partial\mathbf{x}}A^T\mathbf{x}=IA\mathbf{x}+IA^T\mathbf{x}=(A+A^T)\mathbf{x}$
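
The quadratic-form identity (SV-10) is easy to get wrong when $A$ is not symmetric, so a numerical check may help. The sketch below is an illustration only; `quad_form`, `numerical_gradient`, and the sample `A` and `x` are names and values assumed here.

```python
def quad_form(A, x):
    """Return x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


A = [[1.0, 2.0],
     [0.0, 3.0]]                 # deliberately not symmetric
x = [1.0, -1.0]

numeric = numerical_gradient(lambda z: quad_form(A, z), x)

# Analytic gradient (A + A^T) x from (SV-10); here it equals (0, -4).
analytic = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]

print(numeric, analytic)
```

Using $2A\mathbf{x}$ instead would give $(-2,-6)$ for this `A`, which does not match the numerical gradient; the two formulas coincide only when $A$ is symmetric.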

(SV-11) $\displaystyle\frac{\partial\mathbf{x}^T\mathbf{x}}{\partial\mathbf{x}}=2\mathbf{x}$

(SV-12) $\displaystyle\frac{\partial\mathbf{a}^T\mathbf{x}\mathbf{x}^T\mathbf{b}}{\partial\mathbf{x}}=(\mathbf{a}\mathbf{b}^T+\mathbf{b}\mathbf{a}^T)\mathbf{x}$

$\displaystyle\begin{aligned} \frac{\partial\mathbf{a}^T\mathbf{x}\mathbf{x}^T\mathbf{b}}{\partial\mathbf{x}}&=\frac{\partial(\mathbf{a}^T\mathbf{x})(\mathbf{x}^T\mathbf{b})}{\partial\mathbf{x}}=\mathbf{a}^T\mathbf{x}\frac{\partial\mathbf{x}^T\mathbf{b}}{\partial\mathbf{x}}+\mathbf{x}^T\mathbf{b}\frac{\partial\mathbf{a}^T\mathbf{x}}{\partial\mathbf{x}}\\ &=(\mathbf{a}^T\mathbf{x})\mathbf{b}+(\mathbf{x}^T\mathbf{b})\mathbf{a}=\mathbf{b}(\mathbf{a}^T\mathbf{x})+\mathbf{a}(\mathbf{b}^T\mathbf{x})\\ &=(\mathbf{b}\mathbf{a}^T+\mathbf{a}\mathbf{b}^T)\mathbf{x}.\end{aligned}$

(SV-13) $\displaystyle\frac{\partial^2\mathbf{x}^TA\mathbf{x}}{\partial\mathbf{x}^2}=A+A^T$

$\displaystyle\frac{\partial^2\mathbf{x}^TA\mathbf{x}}{\partial\mathbf{x}^2}=\frac{\partial}{\partial\mathbf{x}}\left(\frac{\partial\mathbf{x}^TA\mathbf{x}}{\partial\mathbf{x}}\right)=\frac{\partial(A+A^T)\mathbf{x}}{\partial\mathbf{x}}=(A+A^T)^T=A^T+A$

As an application, consider the least-squares problem:

$\displaystyle \min_{\mathbf{x}}\Vert A\mathbf{x}-\mathbf{b}\Vert^2$

$\displaystyle\begin{aligned} \Vert A\mathbf{x}-\mathbf{b}\Vert^2&=(A\mathbf{x}-\mathbf{b})^T(A\mathbf{x}-\mathbf{b})\\ &=\mathbf{x}^TA^TA\mathbf{x}-\mathbf{x}^TA^T\mathbf{b}-\mathbf{b}^TA\mathbf{x}+\mathbf{b}^T\mathbf{b}. \end{aligned}$

$\displaystyle\begin{aligned} \frac{\partial \Vert A\mathbf{x}-\mathbf{b}\Vert^2}{\partial \mathbf{x}}&=\frac{\partial\mathbf{x}^TA^TA\mathbf{x}}{\partial\mathbf{x}}-\frac{\partial\mathbf{x}^TA^T\mathbf{b}}{\partial\mathbf{x}}-\frac{\partial\mathbf{b}^TA\mathbf{x}}{\partial\mathbf{x}}+\frac{\partial\mathbf{b}^T\mathbf{b}}{\partial\mathbf{x}}\\ &=(A^TA+(A^TA)^T)\mathbf{x}-A^T\mathbf{b}-A^T\mathbf{b}\\ &=2A^TA\mathbf{x}-2A^T\mathbf{b}. \end{aligned}$

Setting this gradient to zero gives the normal equations $A^TA\mathbf{x}=A^T\mathbf{b}$.
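
The gradient formula $2A^TA\mathbf{x}-2A^T\mathbf{b}$ can be put to work on a tiny line-fitting problem. The sketch below is illustrative only: the data, the helper functions, and the use of Cramer's rule for the $2\times 2$ system $A^TA\mathbf{x}=A^T\mathbf{b}$ are all choices made here, not something the original post prescribes.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matvec(M, v):
    return [sum(m * vk for m, vk in zip(row, v)) for row in M]


# Fit b ~ c0 + c1 * t at t = 0, 1, 2 (made-up data).
A = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]
b = [1.0, 3.0, 4.0]

AtA = matmul(transpose(A), A)    # [[3, 3], [3, 5]]
Atb = matvec(transpose(A), b)    # [8, 11]

# Solve the 2 x 2 system AtA x = Atb by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]

# The gradient 2 A^T A x - 2 A^T b should vanish at the solution.
gradient = [2 * g - 2 * c for g, c in zip(matvec(AtA, x), Atb)]

print(x, gradient)
```

Here the fitted coefficients are $c_0=7/6$ and $c_1=3/2$. For larger or ill-conditioned problems a QR- or SVD-based solver is usually preferred over forming $A^TA$ explicitly.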

Vector-by-scalar identities (in this layout $\partial\mathbf{u}/\partial x$ is a row vector):

(VS-1) $\displaystyle\frac{\partial\mathbf{a}}{\partial x}=\mathbf{0}^T$

(VS-2) $\displaystyle\frac{\partial a\mathbf{u}}{\partial x}=a\frac{\partial\mathbf{u}}{\partial x}$

(VS-3) $\displaystyle\frac{\partial A\mathbf{u}}{\partial x}=\frac{\partial\mathbf{u}}{\partial x}A^T$

(VS-4) $\displaystyle\frac{\partial(\mathbf{u}+\mathbf{v})}{\partial x}=\frac{\partial\mathbf{u}}{\partial x}+\frac{\partial\mathbf{v}}{\partial x}$

(VS-5) $\displaystyle\frac{\partial\mathbf{u}^T}{\partial x}=\left(\frac{\partial\mathbf{u}}{\partial x}\right)^T$

(VS-6) $\displaystyle\frac{\partial f(\mathbf{u})}{\partial x}=\frac{\partial\mathbf{u}}{\partial x}\frac{\partial f(\mathbf{u})}{\partial\mathbf{u}}$

Scalar-by-matrix identities, where $X$ is a matrix variable:

(SM-1) $\displaystyle\frac{\partial a}{\partial X}=0$

(SM-2) $\displaystyle\frac{\partial au}{\partial X}=a\frac{\partial u}{\partial X}$

(SM-3) $\displaystyle\frac{\partial (u+v)}{\partial X}=\frac{\partial u}{\partial X}+\frac{\partial v}{\partial X}$

(SM-4) $\displaystyle\frac{\partial uv}{\partial X}=u\frac{\partial v}{\partial X}+v\frac{\partial u}{\partial X}$

(SM-5) $\displaystyle\frac{\partial f(u)}{\partial X}=\frac{\partial f(u)}{\partial u}\frac{\partial u}{\partial X}$

(SM-6) $\displaystyle\frac{\partial \mathbf{a}^TX\mathbf{b}}{\partial X}=\mathbf{a}\mathbf{b}^T$

$\displaystyle\begin{aligned} \left(\frac{\partial \mathbf{a}^TX\mathbf{b}}{\partial X}\right)_{ij}&=\frac{\partial\sum_k\sum_la_{k}x_{kl}b_{l}}{\partial x_{ij}}=\sum_{k}\sum_la_kb_l\frac{\partial x_{kl}}{\partial x_{ij}}\\ &=\sum_{k}\sum_la_kb_l\delta_{ik}\delta_{jl}=a_ib_j=\left(\mathbf{a}\mathbf{b}^T\right)_{ij}. \end{aligned}$
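
Identity (SM-6) can be checked entry by entry with finite differences. This is a minimal sketch; `bilinear`, `derivative_wrt_matrix`, and the sample values are assumptions introduced for the illustration.

```python
def bilinear(a, X, b):
    """Return the scalar a^T X b."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def derivative_wrt_matrix(f, X, h=1e-6):
    """Return G with G[i][j] approximating d f / d x_ij (central differences)."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_f = [r[:] for r in X]
            X_b = [r[:] for r in X]
            X_f[i][j] += h
            X_b[i][j] -= h
            row.append((f(X_f) - f(X_b)) / (2 * h))
        G.append(row)
    return G


a = [1.0, 2.0]
b = [3.0, -1.0, 0.5]
X = [[0.2, 1.0, -0.7],
     [1.5, 0.0, 2.0]]            # 2 x 3, matching a and b

numeric = derivative_wrt_matrix(lambda M: bilinear(a, M, b), X)
analytic = [[ai * bj for bj in b] for ai in a]   # outer product a b^T

print(numeric)
print(analytic)
```

Because $\mathbf{a}^TX\mathbf{b}$ is linear in $X$, the derivative does not depend on the particular `X` chosen.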

(SM-7) $\displaystyle\frac{\partial \mathbf{a}^TX^T\mathbf{b}}{\partial X}=\mathbf{b}\mathbf{a}^T$

$\displaystyle \frac{\partial \mathbf{a}^TX^T\mathbf{b}}{\partial X}=\frac{\partial(\mathbf{a}^TX^T\mathbf{b})^T}{\partial X}=\frac{\partial \mathbf{b}^TX\mathbf{a}}{\partial X}=\mathbf{b}\mathbf{a}^T$

(SM-8) $\displaystyle\frac{\partial \mathbf{a}^TX\mathbf{a}}{\partial X}=\frac{\partial \mathbf{a}^TX^T\mathbf{a}}{\partial X}=\mathbf{a}\mathbf{a}^T$

(SM-9) $\displaystyle\frac{\partial \mathbf{a}^TX^TX\mathbf{b}}{\partial X}=X(\mathbf{a}\mathbf{b}^T+\mathbf{b}\mathbf{a}^T)$

$\displaystyle\begin{aligned} \left(\frac{\partial \mathbf{a}^TX^TX\mathbf{b}}{\partial X}\right)_{ij}&=\frac{\partial\sum_k\sum_l\sum_pa_{k}x_{pk}x_{pl}b_{l}}{\partial x_{ij}}\\ &=\sum_{k}\sum_l\sum_pa_kb_l\frac{\partial x_{pk}x_{pl}}{\partial x_{ij}}\\ &=\sum_{k}\sum_l\sum_pa_kb_l\left(x_{pk}\frac{\partial x_{pl}}{\partial x_{ij}}+x_{pl}\frac{\partial x_{pk}}{\partial x_{ij}}\right)\\ &=\sum_{k}\sum_l\sum_pa_kb_l(x_{pk}\delta_{ip}\delta_{jl}+x_{pl}\delta_{ip}\delta_{jk})\\ &=\sum_kx_{ik}a_kb_j+\sum_lx_{il}b_la_j=\left(X\mathbf{a}\mathbf{b}^T\right)_{ij}+\left(X\mathbf{b}\mathbf{a}^T\right)_{ij}\\ &=(X(\mathbf{a}\mathbf{b}^T+\mathbf{b}\mathbf{a}^T))_{ij}. \end{aligned}$
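
The index bookkeeping in (SM-9) is heavier, so a numerical check is reassuring. The sketch below uses $\mathbf{a}^TX^TX\mathbf{b}=(X\mathbf{a})^T(X\mathbf{b})$ and the entrywise form $(X\mathbf{a})_ib_j+(X\mathbf{b})_ia_j$ from the second-to-last line of the derivation; the function names and sample values are assumptions made here.

```python
def f(X, a, b):
    """Return a^T X^T X b, computed as the dot product of X a and X b."""
    Xa = [sum(x * ak for x, ak in zip(row, a)) for row in X]
    Xb = [sum(x * bk for x, bk in zip(row, b)) for row in X]
    return sum(p * q for p, q in zip(Xa, Xb))


def derivative_wrt_matrix(func, X, h=1e-6):
    """Return G with G[i][j] approximating d func / d x_ij (central differences)."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_f = [r[:] for r in X]
            X_b = [r[:] for r in X]
            X_f[i][j] += h
            X_b[i][j] -= h
            row.append((func(X_f) - func(X_b)) / (2 * h))
        G.append(row)
    return G


a = [1.0, -2.0]
b = [0.5, 3.0]
X = [[1.0, 2.0],
     [0.0, -1.0],
     [4.0, 0.5]]                 # 3 x 2

numeric = derivative_wrt_matrix(lambda M: f(M, a, b), X)

# Entry (i, j) of X (a b^T + b a^T) is (X a)_i b_j + (X b)_i a_j.
Xa = [sum(x * ak for x, ak in zip(row, a)) for row in X]
Xb = [sum(x * bk for x, bk in zip(row, b)) for row in X]
analytic = [[Xa[i] * b[j] + Xb[i] * a[j] for j in range(2)] for i in range(3)]

print(numeric)
print(analytic)
```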

Matrix-by-scalar identities, where $U$ and $V$ are matrix functions of the scalar $x$:

(MS-1) $\displaystyle\frac{\partial A}{\partial x}=0$

(MS-2) $\displaystyle\frac{\partial aU}{\partial x}=a\frac{\partial U}{\partial x}$

(MS-3) $\displaystyle\frac{\partial (U+V)}{\partial x}=\frac{\partial U}{\partial x}+\frac{\partial V}{\partial x}$

(MS-4) $\displaystyle\frac{\partial (UV)}{\partial x}=U\frac{\partial V}{\partial x}+\frac{\partial U}{\partial x}V$

$\displaystyle\begin{aligned} \left(\frac{\partial UV}{\partial x}\right)_{ij}&=\frac{\partial\sum_ku_{ik}v_{kj}}{\partial x}=\sum_k\frac{\partial u_{ik}v_{kj}}{\partial x}\\ &=\sum_ku_{ik}\frac{\partial v_{kj}}{\partial x}+\sum_k\frac{\partial u_{ik}}{\partial x}v_{kj}\\ &=\sum_k(U)_{ik}\left(\frac{\partial V}{\partial x}\right)_{kj}+\sum_k\left(\frac{\partial U}{\partial x}\right)_{ik}(V)_{kj}\\ &=\left(U\frac{\partial V}{\partial x}+\frac{\partial U}{\partial x}V\right)_{ij}. \end{aligned}$

(MS-5) $\displaystyle\frac{\partial AUB}{\partial x}=A\frac{\partial U}{\partial x}B$

$\displaystyle\begin{aligned} \left(\frac{\partial AUB}{\partial x}\right)_{ij}&=\frac{\partial\sum_k\sum_la_{ik}u_{kl}b_{lj}}{\partial x}=\sum_k\sum_la_{ik}\frac{\partial u_{kl}}{\partial x}b_{lj}\\ &=\sum_k\sum_l\left(A\right)_{ik}\left(\frac{\partial U}{\partial x}\right)_{kl}\left(B\right)_{lj}=\left(A\frac{\partial U}{\partial x}B\right)_{ij}. \end{aligned}$

(MS-6) $\displaystyle\frac{\partial U^{-1}}{\partial x}=-U^{-1}\frac{\partial U}{\partial x}U^{-1}$

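$\displaystyle\begin{aligned} \frac{\partial UU^{-1}}{\partial x}&=\frac{\partial I}{\partial x}=0\\ &=U\frac{\partial U^{-1}}{\partial x}+\frac{\partial U}{\partial x}U^{-1}. \end{aligned}$

Left-multiplying by $U^{-1}$ and rearranging gives $\displaystyle\frac{\partial U^{-1}}{\partial x}=-U^{-1}\frac{\partial U}{\partial x}U^{-1}$.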
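
The inverse rule (MS-6) can be checked on a small example. The sketch below assumes a hypothetical $2\times 2$ matrix function `U(x)` with a hand-written derivative `dU(x)`, and compares a central-difference derivative of $U^{-1}$ against $-U^{-1}\frac{\partial U}{\partial x}U^{-1}$.

```python
def U(x):
    # Hypothetical matrix function chosen for illustration.
    return [[1.0 + x, x],
            [x * x, 2.0]]


def dU(x):
    # Entrywise derivative of U with respect to x.
    return [[1.0, 1.0],
            [2.0 * x, 0.0]]


def inv2(M):
    """Invert a 2 x 2 matrix using the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


x, h = 0.5, 1e-5

# Central-difference derivative of the inverse, entry by entry.
inv_f, inv_b = inv2(U(x + h)), inv2(U(x - h))
numeric = [[(inv_f[i][j] - inv_b[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

# Right-hand side of (MS-6): -U^{-1} (dU/dx) U^{-1}.
U_inv = inv2(U(x))
analytic = [[-v for v in row] for row in matmul(matmul(U_inv, dU(x)), U_inv)]

print(numeric)
print(analytic)
```

The order of the factors matters because $U^{-1}$ and $\partial U/\partial x$ do not commute in general.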

(MS-7) $\displaystyle\frac{\partial e^{xA}}{\partial x}=Ae^{xA}=e^{xA}A$

$\displaystyle e^{xA}=I+xA+\frac{1}{2}(xA)^2+\cdots$

$\displaystyle\begin{aligned} \frac{\partial e^{xA}}{\partial x}&=\frac{\partial\sum_{k=0}^\infty(k!)^{-1}(xA)^k}{\partial x}=\sum_{k=0}^\infty\frac{1}{k!}\frac{\partial x^kA^k}{\partial x}\\ &=\sum_{k=1}^\infty\frac{1}{(k-1)!}x^{k-1}A^k=Ae^{xA}=e^{xA}A.\end{aligned}$
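
A familiar special case, chosen here for illustration, makes (MS-7) concrete. For the rotation generator below, summing the power series gives a rotation matrix:

$\displaystyle A=\begin{bmatrix} 0&-1\\ 1&0 \end{bmatrix},\qquad e^{xA}=\begin{bmatrix} \cos x&-\sin x\\ \sin x&\cos x \end{bmatrix}$

Differentiating entry by entry and comparing with the product $Ae^{xA}$:

$\displaystyle \frac{\partial e^{xA}}{\partial x}=\begin{bmatrix} -\sin x&-\cos x\\ \cos x&-\sin x \end{bmatrix}=\begin{bmatrix} 0&-1\\ 1&0 \end{bmatrix}\begin{bmatrix} \cos x&-\sin x\\ \sin x&\cos x \end{bmatrix}=Ae^{xA}$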

[1] Wikipedia: Matrix Calculus
[2] The identities selected in this post come mainly from The Matrix Cookbook and The Matrix Reference Manual.

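### 13 Responses to Matrix Derivatives

1. Watt Lin says:

A question for the professor:
(1) Is the material discussed here related to manifolds?
(2) Are the divergence and curl of a differentiable multivariable function f(x1, x2, x3, …, xn) defined?

• ccjou says:

This post is mainly about using matrices (a vector can be viewed as an n×1 matrix) to express the derivatives of linear or quadratic multivariable functions (or to collect many single-variable functions into a matrix); it has little to do with manifolds. Divergence and curl belong to vector calculus (vector analysis), a field of its own that differs considerably from the matrix operations of linear algebra. See the Wikipedia introduction:
http://en.wikipedia.org/wiki/Vector_calculus

• Watt Lin says:

Thank you for the explanation.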

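2. 陳威丞 says:

http://zh.wikipedia.org/zh-tw/%E6%AC%A7%E5%87%A0%E9%87%8C%E5%BE%97%E7%A9%BA%E9%97%B4
Professor,
There is a formula in the standard-basis part of the real coordinate space section of this page that I can't quite follow.
It is roughly the formula on the tenth line, which says X equals the sum of each component times a standard basis vector. What puzzles me is: isn't X supposed to be a coordinate?
Why can it also be written as a sum of scalars? Do the two differ by definition?
Or have I misread or mixed something up?

• 陳威丞 says:

Ha, sorry Professor, I get it now! I just hadn't read it carefully. XD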

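3. 張盛東 says:

Professor, I have seen several definitions online of the derivative of a matrix function with respect to a matrix (dF(X)/dX), where X is a matrix and F(X) is a matrix-valued function. Have you studied derivatives of matrix functions with respect to matrices?

• ccjou says:

I have not studied the derivative of a matrix function $Y=F(X)$ with respect to $X$ much, because I do not use this tool myself (multivariate statistics may need it). I once gave a brief introduction to the Kronecker product (https://ccjou.wordpress.com/2011/02/16/kronecker-%E7%A9%8D/) but did not go on to discuss the tensor product. The topic may be somewhat obscure for linear algebra readers, but it can still go into my future writing plans.

• 張盛東 says:

Thank you for the answer, Professor.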

4. abc says:

How do I compute (∂xAx^T)/∂x and ∂xA/∂x?

• ccjou says:

First determine whether x is 1×n or n×1; you can then find the answer in the post above.

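5. 陈宇 says:

May I ask for your contact information, Professor? I have a few questions I would like to ask.

• ccjou says:

Please use this site's message board.

• 陈宇 says:

I went back and reread the post, and my problem is solved. Many thanks.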