## Principal Component Analysis and Low-Rank Matrix Approximation

$X=\begin{bmatrix} \mathbf{x}_1^T\\ \vdots\\ \mathbf{x}_n^T \end{bmatrix}=\begin{bmatrix} x_{11}&\cdots&x_{1p}\\ \vdots&\ddots&\vdots\\ x_{n1}&\cdots&x_{np} \end{bmatrix}$

$\displaystyle S=[s_{ij}]=\frac{1}{n-1}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T=\frac{1}{n-1}X^TX$
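As a numerical sketch of this definition (using numpy; the random data and variable names are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)          # center each column, as the covariance formula assumes

# Sample covariance matrix S = X^T X / (n - 1)
S = X.T @ X / (n - 1)

# numpy's np.cov with rowvar=False uses the same (n - 1) normalization
assert np.allclose(S, np.cov(X, rowvar=False))
```

Note that `np.cov` centers internally, so the two computations agree whether or not `X` was pre-centered; the explicit centering above matters for the formulas that follow.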

$\displaystyle S=Q\Lambda Q^T$

$\displaystyle\begin{aligned} \sum_{j=1}^ps_{jj}&=\hbox{trace}S=\hbox{trace}(Q\Lambda Q^T)=\hbox{trace}(\Lambda Q^TQ)\\ &=\hbox{trace}\Lambda=\lambda_1+\cdots+\lambda_p,\end{aligned}$
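A quick check of this trace identity, assuming numpy (`eigh` computes the symmetric eigendecomposition $S=Q\Lambda Q^T$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
X = X - X.mean(axis=0)
S = X.T @ X / (X.shape[0] - 1)

# Orthogonal diagonalization S = Q Λ Q^T (S is symmetric, so eigh applies)
lam, Q = np.linalg.eigh(S)

# trace S = λ_1 + ... + λ_p : the total variance equals the sum of eigenvalues
assert np.isclose(np.trace(S), lam.sum())
```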

$\displaystyle \begin{array}{ll} \hbox{maximize}&\displaystyle\sum_{j=1}^k\mathbf{w}_j^TS\mathbf{w}_j\\ \hbox{subject to}&\mathbf{w}_i^T\mathbf{w}_j=\delta_{ij}.\end{array}$

Here $z_j$ is a new variable, called a principal component; $z_{ij}=\mathbf{w}_j^T\mathbf{x}_i$, the $i$th measurement of $z_j$, is called a principal component score; and $\mathbf{w}_j^TS\mathbf{w}_j$ is the sample variance of $z_j$ (shown below). The principal axes produced by principal component analysis are the orthonormal eigenvectors of the sample covariance matrix $S$, i.e. $\mathbf{w}_j=\mathbf{q}_j$, and the sample variance of the new variable $z_j$ equals the eigenvalue $\lambda_j$, $j=1,\ldots,k$.

$\displaystyle \begin{array}{ll} \hbox{minimize}&\Vert X-\hat{X}\Vert_F^2\\ \hbox{subject to}&\hbox{rank}\hat{X}=k,\end{array}$

$\displaystyle P=W(W^TW)^{-1}W^T=WW^T$

$\hat{\mathbf{x}}_i=W\mathbf{z}_i=\begin{bmatrix} \mathbf{w}_1&\cdots&\mathbf{w}_k \end{bmatrix}\begin{bmatrix} z_{i1}\\ \vdots\\ z_{ik} \end{bmatrix}=z_{i1}\mathbf{w}_1+\cdots+z_{ik}\mathbf{w}_k$

$Z=\begin{bmatrix} \mathbf{z}_1^T\\ \vdots\\ \mathbf{z}_n^T \end{bmatrix}=\begin{bmatrix} \mathbf{x}_1^TW\\ \vdots\\ \mathbf{x}_n^TW \end{bmatrix}=XW$

$\hat{X}=\begin{bmatrix} \hat{\mathbf{x}}_1^T\\ \vdots\\ \hat{\mathbf{x}}_n^T \end{bmatrix}=\begin{bmatrix} \mathbf{z}_1^TW^T\\ \vdots\\ \mathbf{z}_n^TW^T\end{bmatrix}=XWW^T$

$\displaystyle \sum_{i=1}^nz_{ij}=\sum_{i=1}^n\mathbf{w}_j^T\mathbf{x}_i=\mathbf{w}_j^T\left(\sum_{i=1}^n\mathbf{x}_i\right)=\mathbf{w}_j^T\mathbf{0}=0,~~j=1,\ldots,k,$
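This zero-mean property of the scores can be verified numerically (a sketch assuming numpy and centered data; the choice $k=2$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))
X = X - X.mean(axis=0)              # centered data: each column of X sums to zero

_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:2].T                        # k = 2 leading principal axes
Z = X @ W                           # principal component scores

# each new variable z_j has zero mean, because sum_i x_i = 0
assert np.allclose(Z.sum(axis=0), 0.0)
```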

$\displaystyle \begin{array}{ll} \hbox{minimize}&\displaystyle\sum_{i=1}^n\left\| \mathbf{x}_i-WW^T\mathbf{x}_i\right\|^2=\left\| X-XWW^T\right\|^2_F\\ \hbox{subject to}&W^TW=I_k. \end{array}$

$\displaystyle \begin{aligned} \left\| X-XWW^T\right\|^2_F&=\hbox{trace}\left(\left(X-XWW^T\right)\left(X-XWW^T\right)^T\right)\\ &=\hbox{trace}\left(XX^T\right)-2\,\hbox{trace}\left(XWW^TX^T\right)+\hbox{trace}\left(XWW^TWW^TX^T\right)\\ &=\hbox{trace}\left(XX^T\right)-\hbox{trace}\left(XWW^TX^T\right)\\ &=\Vert X\Vert_F^2-\Vert XW\Vert_F^2.\end{aligned}$
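The identity $\Vert X-XWW^T\Vert_F^2=\Vert X\Vert_F^2-\Vert XW\Vert_F^2$ can be spot-checked for any $W$ with orthonormal columns (a numpy sketch; `qr` is used only to manufacture such a $W$):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 5))
# any matrix with orthonormal columns works; take W from a QR factorization
W, _ = np.linalg.qr(rng.standard_normal((5, 2)))

lhs = np.linalg.norm(X - X @ W @ W.T, 'fro') ** 2
rhs = np.linalg.norm(X, 'fro') ** 2 - np.linalg.norm(X @ W, 'fro') ** 2
assert np.isclose(lhs, rhs)
```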

$\displaystyle \begin{array}{ll} \hbox{maximize}&\Vert XW\Vert_F^2\\ \hbox{subject to}&W^TW=I_k. \end{array}$

$\displaystyle \begin{aligned} \mathbf{w}_j^TS\mathbf{w}_j&=\frac{1}{n-1}\mathbf{w}_j^TX^TX\mathbf{w}_j=\frac{1}{n-1}\left\|X\mathbf{w}_j\right\|^2\\ &=\frac{1}{n-1}\left\|\begin{bmatrix} \mathbf{x}_1^T\mathbf{w}_j\\ \vdots\\ \mathbf{x}_n^T\mathbf{w}_j \end{bmatrix}\right\|^2=\frac{1}{n-1}\sum_{i=1}^nz_{ij}^2,\end{aligned}$

$\displaystyle \Vert XW\Vert_F^2=\Vert Z\Vert_F^2=\sum_{j=1}^k\left(\sum_{i=1}^nz_{ij}^2\right)=(n-1)\sum_{j=1}^k\mathbf{w}_j^TS\mathbf{w}_j$
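A numerical sanity check of this relation between $\Vert XW\Vert_F^2$ and the quadratic forms $\mathbf{w}_j^TS\mathbf{w}_j$ (numpy sketch; dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 40, 6, 3
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)
S = X.T @ X / (n - 1)

W, _ = np.linalg.qr(rng.standard_normal((p, k)))   # orthonormal columns
Z = X @ W

lhs = np.linalg.norm(Z, 'fro') ** 2
rhs = (n - 1) * sum(W[:, j] @ S @ W[:, j] for j in range(k))
assert np.isclose(lhs, rhs)
```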

$\displaystyle \begin{aligned} \left\|XW\right\|_F^2&=\left\|U\Sigma V^TW\right\|_F^2=\hbox{trace}\left((U\Sigma V^TW)^T(U\Sigma V^TW)\right)\\ &=\hbox{trace}\left(W^TV\Sigma^TU^TU\Sigma V^TW\right)=\hbox{trace}\left(P^T\Sigma^T\Sigma P\right)\\ &=\Vert \Sigma P\Vert_F^2,\end{aligned}$

where $P=V^TW$ satisfies $P^TP=W^TVV^TW=W^TW=I_k$.

$\displaystyle \begin{array}{ll} \hbox{maximize}&\Vert \Sigma P\Vert_F^2\\ \hbox{subject to}&P^TP=I_k.\end{array}$

$\displaystyle \Vert \Sigma P\Vert_F^2=\sum_{i=1}^r\sigma_i^2\sum_{j=1}^kp_{ij}^2$

$\displaystyle \begin{aligned} \hat{X}&=XWW^T=U\Sigma V^TV_1V_1^T=U\Sigma \begin{bmatrix} V_1^T\\ V_2^T \end{bmatrix}V_1V_1^T\\ &=U\Sigma\begin{bmatrix} I_k\\ 0 \end{bmatrix}V_1^T=U\begin{bmatrix} D&0\\ 0&0 \end{bmatrix}\begin{bmatrix} I_k&0\\ 0&0 \end{bmatrix}V^T\\ &=U\begin{bmatrix} D'&0\\ 0&0 \end{bmatrix} V^T,\end{aligned}$
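The conclusion that $\hat{X}=XWW^T$ with $W=V_1$ equals the rank-$k$ truncated SVD can be confirmed numerically (a numpy sketch; the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 30, 5, 2
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V1 = Vt[:k].T                                  # leading k right singular vectors, W = V1

Xhat = X @ V1 @ V1.T                           # projection form XWW^T
Xtrunc = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # rank-k truncated SVD
assert np.allclose(Xhat, Xtrunc)
```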

1. The orthogonal diagonalization $S=Q\Lambda Q^T$ of the sample covariance matrix can be obtained from the singular value decomposition $X=U\Sigma V^T$ of the data matrix (see "主成分分析與奇異值分解"). Using the definition,

$\displaystyle S=\frac{1}{n-1}X^TX=\frac{1}{n-1}V\Sigma^T U^TU\Sigma V^T=V\left(\frac{1}{n-1}\Sigma^T\Sigma\right)V^T$

Comparing the two orthogonal diagonalizations of $S$, we immediately obtain $Q=V$ and $\Lambda=\frac{1}{n-1}\hbox{diag}(\sigma_1^2,\ldots,\sigma_p^2)$.
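The relation $\lambda_j=\sigma_j^2/(n-1)$ between eigenvalues of $S$ and singular values of $X$ is easy to verify (numpy sketch; `eigvalsh` returns eigenvalues in increasing order, so they are reversed to match the SVD ordering):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)
S = X.T @ X / (n - 1)

lam = np.linalg.eigvalsh(S)[::-1]   # eigenvalues of S, decreasing
_, s, _ = np.linalg.svd(X)          # singular values of X, decreasing

# eigenvalues of S are the squared singular values of X over n - 1
assert np.allclose(lam, s ** 2 / (n - 1))
```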

2. Define the $k\times k$ sample covariance matrix of the new variables

$\displaystyle S_z=\frac{1}{n-1}\sum_{i=1}^n\mathbf{z}_i\mathbf{z}_i^T=\frac{1}{n-1}Z^TZ$

Using $Z=XW=XV_1$ and $S=Q\Lambda Q^T=V\Lambda V^T$, we obtain

$\displaystyle \begin{aligned} S_z&=\frac{1}{n-1}V_1^TX^TXV_1=V_1^TSV_1=V_1^TV\Lambda V^TV_1\\ &=V_1^T\begin{bmatrix} V_1&V_2 \end{bmatrix}\Lambda \begin{bmatrix} V_1^T\\ V_2^T \end{bmatrix}V_1=\begin{bmatrix} I_k&0 \end{bmatrix}\Lambda\begin{bmatrix} I_k\\ 0 \end{bmatrix}\\ &=\begin{bmatrix} \lambda_1&&\\ &\ddots&\\ &&\lambda_k \end{bmatrix},\end{aligned}$

which proves that the correlation coefficient between any two new variables equals zero (see "相關係數"); the sample variance of the variable $z_j$ is $\lambda_j=\frac{1}{n-1}\sigma_j^2$, $j=1,\ldots,k$.
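The diagonality of $S_z$ can be checked directly (numpy sketch; random data and $k=2$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, k = 60, 5, 2
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V1 = Vt[:k].T
Z = X @ V1

Sz = Z.T @ Z / (n - 1)
# S_z is diagonal: the new variables are uncorrelated,
# with variances λ_j = σ_j^2 / (n - 1)
assert np.allclose(Sz, np.diag(s[:k] ** 2) / (n - 1))
```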

3. The sample covariance matrix of the best approximating data matrix $\hat{X}$ is

$\displaystyle \begin{aligned} \hat{S}&=\frac{1}{n-1}\hat{X}^T\hat{X}=\frac{1}{n-1}V\begin{bmatrix} D'&0\\ 0&0 \end{bmatrix}U^TU\begin{bmatrix} D'&0\\ 0&0 \end{bmatrix}V^T\\ &=\frac{1}{n-1}V\begin{bmatrix} \begin{matrix} \sigma_1^2&&\\ &\ddots& \\&&\sigma_k^2 \end{matrix}&0\\ 0&0 \end{bmatrix}V^T=V\begin{bmatrix} \begin{matrix} \lambda_1&&\\ &\ddots&\\ &&\lambda_k \end{matrix}&0\\ 0&0 \end{bmatrix}V^T. \end{aligned}$

The proportion of the total variance retained by principal component analysis is

$\displaystyle \frac{\hbox{trace}{S}_z}{\hbox{trace} S}=\frac{\hbox{trace}\hat{S}}{\hbox{trace} S}=\frac{\sum_{j=1}^k\lambda_j}{\sum_{j=1}^p\lambda_j}$
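The retained-variance ratio can be computed either from the eigenvalues of $S$ or from $\hbox{trace}\,S_z/\hbox{trace}\,S$; both routes agree (numpy sketch, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, k = 80, 6, 2
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)
S = X.T @ X / (n - 1)

lam = np.linalg.eigvalsh(S)[::-1]   # eigenvalues in decreasing order
ratio = lam[:k].sum() / lam.sum()   # proportion of total variance retained

# equivalently, trace(S_z) / trace(S) with S_z from the top-k scores
_, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T
Sz = Z.T @ Z / (n - 1)
assert np.isclose(ratio, np.trace(Sz) / np.trace(S))
```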

[1] If the condition number of the sample covariance matrix $S$ is large, say greater than $1{,}000$, the collinearity among the variables is severe (see "條件數").
[2] The Kronecker delta $\delta_{ij}$ is the $(i,j)$ entry of the identity matrix $I$.
