## Covariance Matrix and the Normal Distribution

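The univariate normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$ has the density

$\displaystyle \mathcal{N}(x\vert\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$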

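For an $n$-dimensional random vector $\mathbf{x}$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$, the multivariate normal density is

$\displaystyle \mathcal{N}(\mathbf{x}\vert\boldsymbol{\mu},\Sigma)=\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}$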

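Here $\vert\Sigma\vert$ denotes the determinant of $\Sigma$. The quadratic form in the exponent,

$\Delta^2=(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})$,

is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$.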
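The two density formulas above can be evaluated numerically. The sketch below assumes NumPy is installed. The helper names `mahalanobis_sq`, `mvn_pdf`, and `normal_pdf` are hypothetical illustrations and do not come from the original text. The sketch computes $\Delta^2$ with a linear solve instead of forming $\Sigma^{-1}$ explicitly. It also checks that for $n=1$ the multivariate formula reduces to the univariate one.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Delta^2 = (x - mu)^T Sigma^{-1} (x - mu), via a linear solve instead of an explicit inverse."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.solve(np.asarray(Sigma, dtype=float), d))

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma), written straight from the formula."""
    n = len(mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(np.asarray(Sigma, dtype=float)))
    return float(np.exp(-0.5 * mahalanobis_sq(x, mu, Sigma)) / norm)

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x | mu, sigma^2)."""
    return float(np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma))

# With n = 1 and Sigma = [[sigma^2]], the two formulas agree.
print(mvn_pdf([1.3], [0.5], [[4.0]]), normal_pdf(1.3, 0.5, 2.0))

# With Sigma = I, Delta^2 is the ordinary squared Euclidean distance.
print(mahalanobis_sq([3.0, 4.0], [0.0, 0.0], np.eye(2)))  # 25.0
```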

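Since $\Sigma$ is a real symmetric (and, for the density to exist, positive definite) matrix, the spectral theorem gives:

• The eigenvalues $\lambda_1,\ldots,\lambda_n$ are real (and positive, because $\Sigma$ is positive definite);
• The unit eigenvectors $\mathbf{q}_1,\ldots,\mathbf{q}_n$ form an orthonormal set, that is, $\mathbf{q}_i^T\mathbf{q}_j=1$ if $i=j$ and $\mathbf{q}_i^T\mathbf{q}_j=0$ if $i\neq j$.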

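Let $Q=\begin{bmatrix} \mathbf{q}_1&\cdots&\mathbf{q}_n \end{bmatrix}$ and $\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_n)$. It is easy to verify that $Q$ is a real orthogonal matrix, satisfying $Q^TQ=I$. The covariance matrix $\Sigma$ can therefore be orthogonally diagonalized as follows: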

$\Sigma=Q\Lambda Q^T=\begin{bmatrix} \mathbf{q}_1&\cdots&\mathbf{q}_n \end{bmatrix}\begin{bmatrix} \lambda_1&&\\ &\ddots&\\ &&\lambda_n \end{bmatrix}\begin{bmatrix} \mathbf{q}_1^T\\ \vdots\\ \mathbf{q}_n^T \end{bmatrix}=\displaystyle\sum_{i=1}^n\lambda_i\mathbf{q}_i\mathbf{q}_i^T$

Because $Q^{-1}=Q^T$, the inverse covariance matrix is

$\Sigma^{-1}=(Q\Lambda Q^T)^{-1}=Q\Lambda^{-1}Q^T=\displaystyle\sum_{i=1}^n\frac{1}{\lambda_i}\mathbf{q}_i\mathbf{q}_i^T$
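The following minimal sketch checks the orthogonal diagonalization and its inverse numerically. It assumes NumPy is installed, and the 3×3 matrix is an arbitrary symmetric positive definite example. `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices. It returns the eigenvalues in ascending order and the unit eigenvectors as the columns of `Q`.

```python
import numpy as np

# An arbitrary symmetric positive definite matrix, chosen only for illustration.
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

lam, Q = np.linalg.eigh(Sigma)   # eigenvalues (ascending) and orthonormal eigenvectors (columns)

print(np.allclose(Q.T @ Q, np.eye(3)))             # Q^T Q = I
print(np.allclose(Q @ np.diag(lam) @ Q.T, Sigma))  # Sigma = Q Lambda Q^T

# Sigma as a sum of rank-one terms lambda_i q_i q_i^T (rows of Q.T are the columns q_i).
outer_sum = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
# Sigma^{-1} as the sum of q_i q_i^T / lambda_i.
Sigma_inv = sum(np.outer(q, q) / l for l, q in zip(lam, Q.T))

print(np.allclose(outer_sum, Sigma))
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))
```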

Substituting this into $\Delta^2$ and setting $\mathbf{y}=Q^T(\mathbf{x}-\boldsymbol{\mu})$ gives

$\Delta^2=(\mathbf{x}-\boldsymbol{\mu})^TQ\Lambda^{-1}Q^T(\mathbf{x}-\boldsymbol{\mu})=\mathbf{y}^T\Lambda^{-1}\mathbf{y}=\displaystyle\sum_{i=1}^n\frac{y_i^2}{\lambda_i}$

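Equivalently, since $Q$ is orthogonal, $\mathbf{y}$ holds the coordinates of $\mathbf{x}-\boldsymbol{\mu}$ in the eigenvector basis:

$\mathbf{x}-\boldsymbol{\mu}=Q\mathbf{y}=\begin{bmatrix} \mathbf{q}_1&\cdots&\mathbf{q}_n \end{bmatrix}\begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix}=y_1\mathbf{q}_1+\cdots+y_n\mathbf{q}_n$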
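The short check below computes $\Delta^2$ in two ways: once directly from $\Sigma^{-1}$ and once as $\sum_i y_i^2/\lambda_i$ in the eigenvector coordinates. It assumes NumPy is installed, and the numbers are illustrative only.

```python
import numpy as np

Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD matrix
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.5])

lam, Q = np.linalg.eigh(Sigma)
y = Q.T @ (x - mu)                           # coordinates of x - mu in the eigenvector basis

delta2_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
delta2_eigen = np.sum(y ** 2 / lam)          # sum_i y_i^2 / lambda_i

print(delta2_direct, delta2_eigen)           # both 1.35 up to rounding
print(np.allclose(Q @ y, x - mu))            # x - mu = y_1 q_1 + ... + y_n q_n
```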

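For $n=2$, the contour $\Delta^2=1$ is

$\displaystyle \left(\frac{y_1}{\sqrt{\lambda_1}}\right)^2+\left(\frac{y_2}{\sqrt{\lambda_2}}\right)^2=1$,

an ellipse centered at $\boldsymbol{\mu}$ whose principal axes point along $\mathbf{q}_1$ and $\mathbf{q}_2$, with semi-axis lengths $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$.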
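The sketch below traces this ellipse by parametrizing it in $\mathbf{y}$ coordinates and mapping each point back with $\mathbf{x}=Q\mathbf{y}+\boldsymbol{\mu}$. It assumes NumPy is installed and uses illustrative values for $\Sigma$ and $\boldsymbol{\mu}$. Every point should satisfy $\Delta^2=1$, and the farthest point from $\boldsymbol{\mu}$ should lie at distance about $\sqrt{\lambda_{\max}}$.

```python
import numpy as np

Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative 2x2 covariance
mu = np.array([0.5, -0.5])
lam, Q = np.linalg.eigh(Sigma)

# y_1 = sqrt(lambda_1) cos t, y_2 = sqrt(lambda_2) sin t, then x = mu + Q y.
t = np.linspace(0.0, 2.0 * np.pi, 200)
Y = np.vstack([np.sqrt(lam[0]) * np.cos(t), np.sqrt(lam[1]) * np.sin(t)])  # shape (2, 200)
X = mu[:, None] + Q @ Y                                                      # points on the contour

# Column-wise Delta^2 = (x - mu)^T Sigma^{-1} (x - mu).
D = X - mu[:, None]
delta2 = np.einsum("ij,ij->j", D, np.linalg.solve(Sigma, D))
print(np.allclose(delta2, 1.0))

# Longest distance from the centre is (approximately) the largest semi-axis sqrt(lambda_max).
print(np.max(np.linalg.norm(D, axis=0)), np.sqrt(lam.max()))
```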

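In the $\mathbf{y}$ coordinates, and using $\vert\Sigma\vert=\det(Q\Lambda Q^T)=\det(\Lambda)\det(Q^TQ)=\prod_{i=1}^n\lambda_i$, the density becomes

$\begin{aligned} p(\mathbf{y})&=\frac{1}{(2\pi)^{n/2}\prod_{i=1}^n\lambda_i^{1/2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^n\frac{y_i^2}{\lambda_i}\right\}\\ &=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left\{-\frac{y_i^2}{2\lambda_i}\right\}, \end{aligned}$

a product of univariate normal densities with mean $0$ and variances $\lambda_1,\ldots,\lambda_n$.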

Each factor is a normalized univariate density (see note [2]), so

$\displaystyle \int p(\mathbf{y})d\mathbf{y}=\prod_{i=1}^n\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left\{-\frac{y_i^2}{2\lambda_i}\right\}dy_i=1$

To carry this back to $\mathbf{x}$, note that $\mathbf{x}=Q\mathbf{y}+\boldsymbol{\mu}$, so the Jacobian matrix of the change of variables is

$J=\begin{bmatrix} \displaystyle\frac{\partial x_1}{\partial y_1}&\displaystyle\frac{\partial x_1}{\partial y_2}&\cdots&\displaystyle\frac{\partial x_1}{\partial y_n}\\[1em] \displaystyle\frac{\partial x_2}{\partial y_1}&\displaystyle\frac{\partial x_2}{\partial y_2}&\cdots&\displaystyle\frac{\partial x_2}{\partial y_n}\\ \vdots&\vdots&\ddots&\vdots\\ \displaystyle\frac{\partial x_n}{\partial y_1}&\displaystyle\frac{\partial x_n}{\partial y_2}&\cdots&\displaystyle\frac{\partial x_n}{\partial y_n} \end{bmatrix}=\begin{bmatrix} q_{11}&q_{12}&\cdots&q_{1n}\\ q_{21}&q_{22}&\cdots&q_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ q_{n1}&q_{n2}&\cdots&q_{nn} \end{bmatrix}=Q$

where $q_{ij}$ denotes the $(i,j)$ entry of $Q$.

Since

$(\det J)^2=(\det Q)(\det Q)=(\det Q)(\det Q^T)=\det(QQ^T)=\det I=1$,

we have $\vert\det J\vert=1$, and the multivariate normal density integrates to one:

$\displaystyle \int \mathcal{N}(\mathbf{x}\vert\boldsymbol{\mu},\Sigma)d\mathbf{x}=\int p(\mathbf{y})\vert\det J\vert d\mathbf{y}=\int p(\mathbf{y})d\mathbf{y}=1$
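The sketch below checks the change of variables numerically, assuming NumPy is installed and using illustrative parameters. At corresponding points $\mathbf{x}=Q\mathbf{y}+\boldsymbol{\mu}$, the density $\mathcal{N}(\mathbf{x}\vert\boldsymbol{\mu},\Sigma)$ should equal the product density $p(\mathbf{y})$, because $\vert\det Q\vert=1$. A plain Riemann sum over a wide box then confirms that the total probability is $1$.

```python
import numpy as np

# Illustrative parameters (assumed, not from the text).
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.5]])
lam, Q = np.linalg.eigh(Sigma)
Sigma_inv = np.linalg.inv(Sigma)
norm = (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma))

def mvn_pdf(X):
    """N(x | mu, Sigma) for points X of shape (..., n)."""
    D = X - mu
    quad = np.einsum("...i,ij,...j->...", D, Sigma_inv, D)
    return np.exp(-0.5 * quad) / norm

def p_y(y):
    """Product of independent N(y_i | 0, lambda_i) densities."""
    return np.prod(np.exp(-y ** 2 / (2 * lam)) / np.sqrt(2 * np.pi * lam))

# The densities coincide at x = Q y + mu, since |det J| = |det Q| = 1.
y = np.array([0.7, -1.1])
print(mvn_pdf(Q @ y + mu), p_y(y), abs(np.linalg.det(Q)))

# Riemann-sum check that N(x | mu, Sigma) integrates to 1 over a wide box around mu.
h = 0.05
g1 = np.arange(mu[0] - 10, mu[0] + 10, h)
g2 = np.arange(mu[1] - 10, mu[1] + 10, h)
X = np.stack(np.meshgrid(g1, g2, indexing="ij"), axis=-1)   # shape (len(g1), len(g2), 2)
total = mvn_pdf(X).sum() * h * h
print(total)
```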

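Next, consider the mean. In the univariate case, substitute $w=x-\mu$. The term $\int\exp\left\{-\frac{w^2}{2\sigma^2}\right\}w\,dw$ vanishes because its integrand is an odd function of $w$:

$\displaystyle \begin{aligned} E[x]&=\frac{1}{\sqrt{2\pi}\sigma}\int\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}xdx\\ &=\frac{1}{\sqrt{2\pi}\sigma}\int\exp\left\{-\frac{w^2}{2\sigma^2}\right\}(w+\mu)dw\\ &=\mu\frac{1}{\sqrt{2\pi}\sigma}\int\exp\left\{-\frac{w^2}{2\sigma^2}\right\}dw=\mu. \end{aligned}$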

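To obtain the variance, start from the normalization identity (see note [2])

$\displaystyle \int\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}dx=\sqrt{2\pi}\sigma$

Differentiating both sides with respect to $\sigma$ gives

$\displaystyle \int\frac{(x-\mu)^2}{\sigma^3}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}dx=\sqrt{2\pi}$

Multiplying both sides by $\sigma^2/\sqrt{2\pi}$ yields

$\displaystyle \hbox{var}[x]=E\left[(x-\mu)^2\right]=\frac{1}{\sqrt{2\pi}\sigma}\int(x-\mu)^2\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}dx=\sigma^2$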
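The univariate results can be checked with ordinary numerical integration. The sketch uses only the standard library and a composite trapezoidal rule; the helper `trapezoid` and the parameter values are illustrative assumptions. It checks total mass $1$, mean $\mu$, variance $\sigma^2$, and the differentiated identity whose right-hand side is $\sqrt{2\pi}$.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def trapezoid(f, a, b, m=20_000):
    """Composite trapezoidal rule for f on [a, b] with m subintervals."""
    h = (b - a) / m
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, m)))

mu, sigma = 1.5, 0.8                      # illustrative parameters
a, b = mu - 12 * sigma, mu + 12 * sigma   # mass outside +/- 12 sigma is negligible

mass = trapezoid(lambda x: normal_pdf(x, mu, sigma), a, b)
mean = trapezoid(lambda x: x * normal_pdf(x, mu, sigma), a, b)
var = trapezoid(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), a, b)
# Left-hand side of the identity obtained by differentiating with respect to sigma.
lhs = trapezoid(lambda x: (x - mu) ** 2 / sigma ** 3 * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)), a, b)

print(mass, mean, var, sigma ** 2)
print(lhs, math.sqrt(2 * math.pi))
```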

The multivariate case is analogous. Substituting $\mathbf{w}=\mathbf{x}-\boldsymbol{\mu}$,

$\displaystyle \begin{aligned} E[\mathbf{x}]&=\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\int\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}\mathbf{x}d\mathbf{x}\\ &=\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\int\exp\left\{-\frac{1}{2}\mathbf{w}^T\Sigma^{-1}\mathbf{w}\right\}(\mathbf{w}+\boldsymbol{\mu})d\mathbf{w}\\ &=\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\left(\int\exp\left\{-\frac{1}{2}\mathbf{w}^T\Sigma^{-1}\mathbf{w}\right\}\mathbf{w}d\mathbf{w}+\boldsymbol{\mu}\int\exp\left\{-\frac{1}{2}\mathbf{w}^T\Sigma^{-1}\mathbf{w}\right\}d\mathbf{w}\right). \end{aligned}$

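The first integral vanishes because its integrand is odd in $\mathbf{w}$ (it changes sign under $\mathbf{w}\to-\mathbf{w}$). Therefore

$\displaystyle E[\mathbf{x}]=\boldsymbol{\mu}\int\mathcal{N}(\mathbf{w}\vert\mathbf{0},\Sigma)d\mathbf{w}=\boldsymbol{\mu}$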

Similarly, for the second-order moment,

$\displaystyle \begin{aligned} E\left[\mathbf{x}\mathbf{x}^T\right]&=\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\int\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}\mathbf{x}\mathbf{x}^Td\mathbf{x}\\ &=\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\int\exp\left\{-\frac{1}{2}\mathbf{w}^T\Sigma^{-1}\mathbf{w}\right\}(\mathbf{w}+\boldsymbol{\mu})(\mathbf{w}+\boldsymbol{\mu})^Td\mathbf{w}. \end{aligned}$

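Expand $(\mathbf{w}+\boldsymbol{\mu})(\mathbf{w}+\boldsymbol{\mu})^T$. The cross terms $\mathbf{w}\boldsymbol{\mu}^T$ and $\boldsymbol{\mu}\mathbf{w}^T$ integrate to zero by the same symmetry, and the constant term contributes $\boldsymbol{\mu}\boldsymbol{\mu}^T$. For the term $\mathbf{w}\mathbf{w}^T$, substitute $\mathbf{w}=Q\mathbf{v}=\sum_{i=1}^nv_i\mathbf{q}_i$, whose Jacobian determinant has absolute value $1$; this gives $\mathbf{w}^T\Sigma^{-1}\mathbf{w}=\sum_{k=1}^nv_k^2/\lambda_k$. The terms with $i\neq j$ vanish because their integrands are odd in $v_i$. In each remaining term, every factor with $k\neq i$ equals $1$, and the normalized integral of $v_i^2$ equals $\lambda_i$ by the univariate variance computation:

$\displaystyle \begin{aligned} ~~&\frac{1}{(2\pi)^{n/2}\vert\Sigma\vert^{1/2}}\int\exp\left\{-\frac{1}{2}\mathbf{w}^T\Sigma^{-1}\mathbf{w}\right\}\mathbf{w}\mathbf{w}^Td\mathbf{w}\\ &=\sum_{i=1}^n\sum_{j=1}^n\mathbf{q}_i\mathbf{q}_j^T\frac{1}{(2\pi)^{n/2}(\lambda_1\cdots\lambda_n)^{1/2}}\int\exp\left\{-\sum_{k=1}^n\frac{v_k^2}{2\lambda_k}\right\}v_iv_jd\mathbf{v}\\ &=\sum_{i=1}^n\mathbf{q}_i\mathbf{q}_i^T\left(\prod_{k=1\atop k\neq i}^n\frac{1}{(2\pi\lambda_k)^{1/2}}\int\exp\left\{-\frac{v_k^2}{2\lambda_k}\right\}dv_k\cdot\frac{1}{(2\pi\lambda_i)^{1/2}}\int\exp\left\{-\frac{v_i^2}{2\lambda_i}\right\}v_i^2dv_i\right)\\ &=\sum_{i=1}^n\mathbf{q}_i\mathbf{q}_i^T\lambda_i=\Sigma. \end{aligned}$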

$\displaystyle E\left[\mathbf{x}\mathbf{x}^T\right]=\boldsymbol{\mu}\boldsymbol{\mu}^T+\Sigma$

By definition, the covariance matrix of $\mathbf{x}$ is

$\displaystyle \hbox{cov}\left[\mathbf{x}\right]=E\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right]$

Expanding the product and using $E[\mathbf{x}]=\boldsymbol{\mu}$ gives

$\begin{aligned} \hbox{cov}[\mathbf{x}]&= E\left[\mathbf{x}\mathbf{x}^T-\mathbf{x}\boldsymbol{\mu}^T-\boldsymbol{\mu}\mathbf{x}^T+\boldsymbol{\mu}\boldsymbol{\mu}^T\right]\\ &=E\left[\mathbf{x}\mathbf{x}^T\right]-E[\mathbf{x}]\boldsymbol{\mu}^T-\boldsymbol{\mu}E\left[\mathbf{x}\right]^T+\boldsymbol{\mu}\boldsymbol{\mu}^T\\ &=E\left[\mathbf{x}\mathbf{x}^T\right]-\boldsymbol{\mu}\boldsymbol{\mu}^T=\Sigma.\end{aligned}$

Hence the parameter $\Sigma$ in the density is indeed the covariance matrix of $\mathbf{x}$.
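The following Monte Carlo sketch checks the moment results $E[\mathbf{x}]=\boldsymbol{\mu}$, $E[\mathbf{x}\mathbf{x}^T]=\boldsymbol{\mu}\boldsymbol{\mu}^T+\Sigma$, and $\hbox{cov}[\mathbf{x}]=\Sigma$. It assumes NumPy's `Generator.multivariate_normal` as the sampler, and the parameters, seed, and sample size are arbitrary. The estimates match only up to sampling error, which shrinks like $1/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
N = 200_000
X = rng.multivariate_normal(mu, Sigma, size=N)     # rows are independent samples

mean_hat = X.mean(axis=0)                           # estimate of E[x]
second_moment_hat = X.T @ X / N                     # estimate of E[x x^T]
cov_hat = second_moment_hat - np.outer(mean_hat, mean_hat)

print(np.round(mean_hat - mu, 2))                                      # near zero
print(np.round(second_moment_hat - (np.outer(mu, mu) + Sigma), 2))    # near zero
print(np.round(cov_hat - Sigma, 2))                                    # near zero
```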

### Cholesky Decomposition and Polar Decomposition

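For any factorization $\Sigma=BB^T$ with $B$ invertible, such as the Cholesky factorization $\Sigma=LL^T$, we have $\Sigma^{-1}=(B^{-1})^TB^{-1}$. Take $B=Q\Lambda^{1/2}$, where $\Lambda^{1/2}=\mathrm{diag}(\sqrt{\lambda_1},\ldots,\sqrt{\lambda_n})$, so that $B^{-1}=\Lambda^{-1/2}Q^T$. Then

$\Delta^2=(\mathbf{x}-\boldsymbol{\mu})^T(B^{-1})^TB^{-1}(\mathbf{x}-\boldsymbol{\mu})=\Vert B^{-1}(\mathbf{x}-\boldsymbol{\mu})\Vert^2=\Vert\Lambda^{-1/2}Q^T(\mathbf{x}-\boldsymbol{\mu})\Vert^2$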
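The NumPy sketch below, with illustrative numbers, compares three factors $F$ satisfying $FF^T=\Sigma$:
- $B=Q\Lambda^{1/2}$;
- the lower-triangular Cholesky factor $L$ from `np.linalg.cholesky`;
- the symmetric square root $\Sigma^{1/2}=Q\Lambda^{1/2}Q^T$.

The third factor is the positive definite factor $P$ in the polar decomposition $B=PU$ of any such $B$, because $BB^T=PUU^TP=P^2$. Each factor should give the same $\Delta^2=\Vert F^{-1}(\mathbf{x}-\boldsymbol{\mu})\Vert^2$.

```python
import numpy as np

# Illustrative SPD covariance, mean, and query point (not from the original text).
Sigma = np.array([[4.0, 1.2], [1.2, 2.0]])
mu = np.array([0.0, 1.0])
d = np.array([1.5, -0.5]) - mu

lam, Q = np.linalg.eigh(Sigma)
B = Q @ np.diag(np.sqrt(lam))            # B = Q Lambda^{1/2}
L = np.linalg.cholesky(Sigma)            # lower-triangular Cholesky factor
S = Q @ np.diag(np.sqrt(lam)) @ Q.T      # symmetric square root Sigma^{1/2}

delta2_direct = d @ np.linalg.solve(Sigma, d)
# ||F^{-1}(x - mu)||^2 for each factor F with F F^T = Sigma
delta2 = {name: np.sum(np.linalg.solve(F, d) ** 2) for name, F in [("B", B), ("L", L), ("S", S)]}
print(delta2_direct, delta2)
```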

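Let $\mathbf{z}=\Lambda^{-1/2}Q^T(\mathbf{x}-\boldsymbol{\mu})$, so that $\Delta^2=\mathbf{z}^T\mathbf{z}$. Since the Jacobian factor $\vert\det(Q\Lambda^{1/2})\vert=\vert\Sigma\vert^{1/2}$ cancels the normalizing constant, the probability density function of the random vector $\mathbf{z}$ becomes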

$\displaystyle p(\mathbf{z})=\frac{1}{(2\pi)^{n/2}}\exp\left\{-\frac{1}{2}\mathbf{z}^T\mathbf{z}\right\}=\prod_{i=1}^n\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{z_i^2}{2}\right\}$

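In other words, the components of $\mathbf{z}$ are independent standard normal variables. Conversely, if $\mathbf{z}$ is such a standard normal vector, then

$\mathbf{x}=Q\Lambda^{1/2}\mathbf{z}+\boldsymbol{\mu}$

is normally distributed with mean $\boldsymbol{\mu}$ and covariance matrix $(Q\Lambda^{1/2})(Q\Lambda^{1/2})^T=\Sigma$.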
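This transformation gives a standard recipe for sampling. The NumPy sketch below draws standard normal vectors $\mathbf{z}$ and maps them through $\mathbf{x}=Q\Lambda^{1/2}\mathbf{z}+\boldsymbol{\mu}$; the parameter values and seed are illustrative. It then checks the sample mean and covariance, and confirms that applying $\Lambda^{-1/2}Q^T(\mathbf{x}-\boldsymbol{\mu})$ recovers $\mathbf{z}$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, -1.0])
Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD matrix
lam, Q = np.linalg.eigh(Sigma)
A = Q @ np.diag(np.sqrt(lam))                # Q Lambda^{1/2}

N = 100_000
Z = rng.standard_normal((2, N))              # columns are independent N(0, I) vectors
X = A @ Z + mu[:, None]                      # x = Q Lambda^{1/2} z + mu

print(np.round(X.mean(axis=1), 2))           # close to mu
print(np.round(np.cov(X), 2))                # close to Sigma (rows of X are the variables)

# Whitening: z = Lambda^{-1/2} Q^T (x - mu) undoes the map.
Z_back = np.diag(1 / np.sqrt(lam)) @ Q.T @ (X - mu[:, None])
print(np.allclose(Z_back, Z))
```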

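[1] Central limit theorem: if random samples of size $N$ are drawn from a population with mean $\mu$ and variance $\sigma^2$, then the sample means approximately follow a normal distribution with mean $\mu$ and variance $\sigma^2/N$. The larger the sample size $N$, the closer this approximation is to a normal distribution.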
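The following standard-library simulation illustrates the statement above. The population is uniform on $[0,1)$, an illustrative choice with $\mu=1/2$ and $\sigma^2=1/12$. The simulation repeatedly draws samples of size $N$ and looks at their means. Those means should center on $\mu$ and have variance close to $\sigma^2/N$. About 68% of them should lie within one standard error, as a normal distribution predicts.

```python
import random
import statistics

random.seed(0)

mu, var = 0.5, 1 / 12   # mean and variance of the uniform distribution on [0, 1)
N = 30                  # sample size
M = 20_000              # number of repeated samples

means = [statistics.fmean([random.random() for _ in range(N)]) for _ in range(M)]

print(statistics.fmean(means), mu)          # sample means centre on mu
print(statistics.variance(means), var / N)  # their variance is about sigma^2 / N

# Rough normality check: share of means within one standard error (normal: about 0.683).
se = (var / N) ** 0.5
share = sum(abs(m - mu) < se for m in means) / M
print(share)
```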

[2] Consider the univariate Gaussian integral

$\displaystyle I=\int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{2\lambda}\right)dx$

Squaring it and switching to polar coordinates $x=r\cos\theta$, $y=r\sin\theta$ (so that $dx\,dy=r\,dr\,d\theta$), then substituting $u=r^2/2$ (so that $du=r\,dr$), we get

$\displaystyle \begin{aligned} I^2&=\int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{2\lambda}\right)dx\cdot\int_{-\infty}^{\infty}\exp\left(-\frac{y^2}{2\lambda}\right)dy\\ &=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left(-\frac{x^2+y^2}{2\lambda}\right)dxdy\\ &=\int_0^{\infty}\exp\left(-\frac{r^2}{2\lambda}\right)rdr\int_0^{2\pi}d\theta\\ &=\int_0^{\infty}\exp\left(-\frac{u}{\lambda}\right)du\cdot 2\pi\\ &=2\pi\lambda.\end{aligned}$

Since $I>0$, taking the square root gives

$\displaystyle \int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{2\lambda}\right)dx=\sqrt{2\pi\lambda}$

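Dividing by $\sqrt{2\pi\lambda}$ shows that the normal density with mean $0$ and variance $\lambda$ integrates to one:

$\displaystyle \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\lambda}}\exp\left(-\frac{x^2}{2\lambda}\right)dx=1$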
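The Gaussian integral can also be checked numerically with a midpoint rule, using only the standard library. The helper name `gaussian_integral` and the truncation to $\pm20$ standard deviations are illustrative assumptions; the tails beyond that range contribute nothing measurable in double precision. The estimate should match $\sqrt{2\pi\lambda}$ for each $\lambda$.

```python
import math

def gaussian_integral(lam, m=100_000):
    """Midpoint-rule estimate of the integral of exp(-x^2 / (2*lam)) over the real line.

    The range is truncated to [-W, W] with W = 20 standard deviations (an assumption;
    the omitted tails are negligible).
    """
    W = 20 * math.sqrt(lam)
    h = 2 * W / m
    return h * sum(math.exp(-((-W + (k + 0.5) * h) ** 2) / (2 * lam)) for k in range(m))

for lam in (0.5, 1.0, 3.0):
    print(lam, gaussian_integral(lam), math.sqrt(2 * math.pi * lam))
```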

This entry was posted in Probability and Statistics (機率統計).

### 5 responses to Covariance Matrix and the Normal Distribution

1. 張盛東 says:

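Professor Chou, today I received a notice that our school has a Dissertation Proposal Defense this Thursday on the central matrix method in dimension reduction regression. I googled it but found nothing relevant. Does this method go by another name?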

2. Ou Yang says:

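Professor, may I ask: if two variables are independent, is the covariance matrix one whose only nonzero entries lie on the diagonal, with the upper and lower triangular parts all zero?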

• ccjou says: