## Fisher's Discriminant Analysis and Linear Discriminant Analysis

• What is linear discriminant analysis (LDA)? How is it related to Fisher's discriminant analysis?
• How is linear discriminant analysis connected to the least-squares method?
• How does Fisher's discriminant analysis generalize to multiclass (more than two classes) discriminant analysis?

$\displaystyle y=\frac{\mathbf{w}^T\mathbf{x}}{\mathbf{w}^T\mathbf{w}}=\mathbf{w}^T\mathbf{x}$

$\displaystyle \mathbf{m}_j=\frac{1}{n_j}\sum_{i\in\mathcal{C}_j}\mathbf{x}_i,~~~j=1,2$

$\displaystyle m_j=\frac{1}{n_j}\sum_{i\in\mathcal{C}_j}y_i=\frac{1}{n_j}\sum_{i\in\mathcal{C}_j}\mathbf{w}^T\mathbf{x}_i=\mathbf{w}^T\mathbf{m}_j,~~~j=1,2$

$\displaystyle \vert m_2-m_1\vert=\left|\mathbf{w}^T(\mathbf{m}_2-\mathbf{m}_1)\right|$

$\displaystyle\begin{aligned} L(\lambda,\mathbf{w})&=\left(\mathbf{w}^T(\mathbf{m}_2-\mathbf{m}_1)\right)^2-\lambda(\mathbf{w}^T\mathbf{w}-1)\\ &=\mathbf{w}^T(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T\mathbf{w}-\lambda(\mathbf{w}^T\mathbf{w}-1), \end{aligned}$

$\displaystyle \frac{\partial L}{\partial\mathbf{w}}=2(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T\mathbf{w}-2\lambda\mathbf{w}$

$\displaystyle s_j^2=\sum_{i\in\mathcal{C}_j}(y_i-m_j)^2,~~j=1,2$

$\displaystyle J(\mathbf{w})=\frac{(m_2-m_1)^2}{s_1^2+s_2^2}$

$\displaystyle S_j=\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^T,~~j=1,2$

$\displaystyle\begin{aligned} s_j^2&=\sum_{i\in\mathcal{C}_j}\left(\mathbf{w}^T\mathbf{x}_i-\mathbf{w}^T\mathbf{m}_j\right)^2\\ &=\sum_{i\in\mathcal{C}_j}\mathbf{w}^T(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^T\mathbf{w}\\ &=\mathbf{w}^T\left(\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^T\right)\mathbf{w}\\ &=\mathbf{w}^TS_j\mathbf{w}.\end{aligned}$

$\displaystyle s_1^2+s_2^2=\mathbf{w}^TS_1\mathbf{w}+\mathbf{w}^TS_2\mathbf{w}=\mathbf{w}^TS_W\mathbf{w}$

$\displaystyle S_B=(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T$

$\displaystyle\begin{aligned} (m_2-m_1)^2&=\left(\mathbf{w}^T\mathbf{m}_2-\mathbf{w}^T\mathbf{m}_1\right)^2\\ &=\mathbf{w}^T(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T\mathbf{w}\\ &=\mathbf{w}^TS_B\mathbf{w}. \end{aligned}$

$\displaystyle J(\mathbf{w})=\frac{\mathbf{w}^TS_B\mathbf{w}}{\mathbf{w}^TS_W\mathbf{w}}$

$\displaystyle \max_{\mathbf{w}^TS_W\mathbf{w}=1}\mathbf{w}^TS_B\mathbf{w}$

$\displaystyle S_B\mathbf{w}=\lambda S_W\mathbf{w}$

$\displaystyle \mathbf{w}\propto S_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$
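As a numerical check, the closed-form direction $\mathbf{w}\propto S_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$ can be verified to maximize the Fisher criterion $J(\mathbf{w})$. The sketch below uses synthetic two-class data; all sample values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes in the plane (hypothetical data, for illustration only)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(60, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter S_W = S_1 + S_2 (sums of outer products, not covariances)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Between-class scatter S_B = (m2 - m1)(m2 - m1)^T
S_B = np.outer(m2 - m1, m2 - m1)

# Fisher direction: w ∝ S_W^{-1} (m2 - m1)
w = np.linalg.solve(S_W, m2 - m1)
w = w / np.linalg.norm(w)

def J(v):
    # Fisher criterion J(v) = (v^T S_B v) / (v^T S_W v)
    return (v @ S_B @ v) / (v @ S_W @ v)
```

Because $J$ is invariant to rescaling of $\mathbf{w}$, only the direction of the solution matters.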

$\displaystyle \mathcal{N}(\mathbf{x}\vert\boldsymbol{\mu}_j,\Sigma)=\frac{1}{(2\pi)^{d/2}\vert\Sigma\vert^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)\right\}$

$\displaystyle P(\mathcal{C}_j\vert\mathbf{x})=\frac{p(\mathbf{x}\vert\mathcal{C}_j)P(\mathcal{C}_j)}{p(\mathbf{x})}$

1. $P(\mathcal{C}_j)$ is the probability that class $\mathcal{C}_j$ occurs, called the prior probability;
2. $p(\mathbf{x}\vert\mathcal{C}_j)$ is the class-conditional density, i.e., the probability density of a data point $\mathbf{x}$ given class $\mathcal{C}_j$, also called the likelihood;
3. $p(\mathbf{x})$ is the probability density of the data point $\mathbf{x}$, called the evidence, given by

$\displaystyle p(\mathbf{x})=p(\mathbf{x}\vert\mathcal{C}_1)P(\mathcal{C}_1)+p(\mathbf{x}\vert\mathcal{C}_2)P(\mathcal{C}_2)$

4. $P(\mathcal{C}_j\vert\mathbf{x})$ is the probability that a given data point $\mathbf{x}$ belongs to $\mathcal{C}_j$, called the posterior probability.
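To make Bayes' rule concrete, here is a minimal univariate sketch with two equal-variance Gaussian class-conditional densities; the means, variance, and priors are illustrative assumptions, not values from the text:

```python
import numpy as np

# Two univariate Gaussian class-conditionals with a shared variance (illustrative)
mu1, mu2, sigma = 0.0, 3.0, 1.0
P1, P2 = 0.4, 0.6  # prior probabilities, P1 + P2 = 1

def gauss(x, mu, sigma):
    # Gaussian density N(x | mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior_C1(x):
    # Bayes' rule: P(C1|x) = p(x|C1) P(C1) / p(x),
    # with the evidence p(x) = p(x|C1) P(C1) + p(x|C2) P(C2)
    evidence = gauss(x, mu1, sigma) * P1 + gauss(x, mu2, sigma) * P2
    return gauss(x, mu1, sigma) * P1 / evidence
```

At the midpoint $x=1.5$ the two likelihoods coincide, so the posterior reduces to the prior ratio $P_1/(P_1+P_2)$.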

$\displaystyle -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)+\ln P(\mathcal{C}_1)\ge -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)+\ln P(\mathcal{C}_2)$

$\displaystyle (\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)^T\Sigma^{-1}\mathbf{x}\le\frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2-\frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1-\ln P(\mathcal{C}_2)+\ln P(\mathcal{C}_1)$

$\displaystyle\begin{aligned} \mathbf{w}&=\Sigma^{-1}(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)\\ w_0&=-\frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2+\frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1+\ln P(\mathcal{C}_2)-\ln P(\mathcal{C}_1). \end{aligned}$

$\displaystyle\begin{aligned} S&=\left(\frac{n_1}{n}\right)\frac{1}{n_1}\sum_{i\in\mathcal{C}_1}(\mathbf{x}_i-\mathbf{m}_1)(\mathbf{x}_i-\mathbf{m}_1)^T+\left(\frac{n_2}{n}\right)\frac{1}{n_2}\sum_{i\in\mathcal{C}_2}(\mathbf{x}_i-\mathbf{m}_2)(\mathbf{x}_i-\mathbf{m}_2)^T\\ &=\frac{1}{n}(S_1+S_2)=\frac{1}{n}S_W.\end{aligned}$

$\displaystyle \mathbf{w}=nS_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$

$\displaystyle w_0=-\frac{n}{2}\mathbf{m}_2^TS_W^{-1}\mathbf{m}_2+\frac{n}{2}\mathbf{m}_1^TS_W^{-1}\mathbf{m}_1+\ln \frac{n_2}{n}-\ln \frac{n_1}{n}$
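Putting the pieces together, the plug-in rule assigns $\mathbf{x}$ to $\mathcal{C}_1$ when $g(\mathbf{x})=\mathbf{w}^T\mathbf{x}+w_0\le 0$, with $\mathbf{w}=nS_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$ and $w_0$ as above. A sketch on synthetic equal-covariance Gaussian data (all parameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-class Gaussian data with a shared covariance
X1 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))   # class C1
X2 = rng.normal([4.0, 4.0], 1.0, size=(100, 2))   # class C2
n1, n2 = len(X1), len(X2)
n = n1 + n2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Plug-in estimates: Sigma ≈ S_W / n, hence Sigma^{-1} ≈ n S_W^{-1}
SW_inv = np.linalg.inv(S_W)
w = n * SW_inv @ (m2 - m1)
w0 = (-n / 2 * m2 @ SW_inv @ m2
      + n / 2 * m1 @ SW_inv @ m1
      + np.log(n2 / n) - np.log(n1 / n))

def predict(x):
    """Return 1 for C1, 2 for C2: g(x) = w^T x + w0 <= 0 picks C1."""
    return 1 if w @ x + w0 <= 0 else 2
```

With well-separated classes the linear rule classifies almost all of the training sample correctly.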

$\displaystyle E=\frac{1}{2}\sum_{i=1}^n\left(g(\mathbf{x}_i)-r_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(\mathbf{w}^T\mathbf{x}_i+w_0-r_i\right)^2$

$\displaystyle\begin{aligned} \frac{\partial E}{\partial w_0}&=\sum_{i=1}^n(\mathbf{w}^T\mathbf{x}_i+w_0-r_i)\\ \frac{\partial E}{\partial \mathbf{w}}&=\sum_{i=1}^n(\mathbf{w}^T\mathbf{x}_i+w_0-r_i)\mathbf{x}_i. \end{aligned}$

$\displaystyle\begin{aligned} w_0&=-\frac{1}{n}\sum_{i=1}^n\mathbf{w}^T\mathbf{x}_i+\frac{1}{n}\sum_{i=1}^nr_i\\ &=-\mathbf{w}^T\left(\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\right)+\frac{1}{n}\left(n_1\frac{n}{n_1}-n_2\frac{n}{n_2}\right)\\ &=-\mathbf{w}^T\mathbf{m}. \end{aligned}$

$\displaystyle \left(S_W+\frac{n_1n_2}{n}S_B\right)\mathbf{w}=n(\mathbf{m}_1-\mathbf{m}_2)$

$\displaystyle\begin{aligned} S_W\mathbf{w}&=-\frac{n_1n_2}{n}(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T\mathbf{w}+n(\mathbf{m}_1-\mathbf{m}_2)\\ &=\left(-\frac{n_1n_2}{n}(\mathbf{m}_2-\mathbf{m}_1)^T\mathbf{w}-n\right)(\mathbf{m}_2-\mathbf{m}_1). \end{aligned}$

$\displaystyle \mathbf{w}\propto S_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$

$\displaystyle\begin{aligned} \mathbf{w}^T(\mathbf{m}_1-\mathbf{m})&=(\mathbf{m}_2-\mathbf{m}_1)^TS_W^{-1}\left(\mathbf{m}_1-\frac{n_1}{n}\mathbf{m}_1-\frac{n_2}{n}\mathbf{m}_2\right)\\ &=-\frac{n_2}{n}(\mathbf{m}_2-\mathbf{m}_1)^TS_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)<0, \end{aligned}$
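The least-squares connection can be checked numerically: fitting $\mathbf{w},w_0$ by least squares with the coded targets $r_i=n/n_1$ for $\mathcal{C}_1$ and $r_i=-n/n_2$ for $\mathcal{C}_2$ yields a weight vector parallel to $S_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$. A sketch with synthetic (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(30, 2))
X2 = rng.normal([2.0, 1.0], 1.0, size=(50, 2))
n1, n2 = len(X1), len(X2)
n = n1 + n2
X = np.vstack([X1, X2])
# Coded targets: r_i = n/n1 for C1, r_i = -n/n2 for C2
r = np.concatenate([np.full(n1, n / n1), np.full(n2, -n / n2)])

# Least squares over [w, w0]: minimize sum_i (w^T x_i + w0 - r_i)^2
A = np.hstack([X, np.ones((n, 1))])
sol, *_ = np.linalg.lstsq(A, r, rcond=None)
w_ls, w0_ls = sol[:2], sol[2]

# Fisher direction for comparison
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m2 - m1)
```

Up to scale and sign, the two directions coincide, which is exactly the claim proved in the derivation.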

$\displaystyle S_W=\sum_{j=1}^kS_j$

$\displaystyle S_j=\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^T,~~~j=1,\ldots,k,$

$\displaystyle \mathbf{m}_j=\frac{1}{n_j}\sum_{i\in\mathcal{C}_j}\mathbf{x}_i,~~~j=1,\ldots,k$

$\displaystyle \mathbf{m}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i=\frac{1}{n}\sum_{j=1}^k\sum_{i\in\mathcal{C}_j}\mathbf{x}_i=\frac{1}{n}\sum_{j=1}^kn_j\mathbf{m}_j$

$\displaystyle S_T=\sum_{i=1}^n(\mathbf{x}_i-\mathbf{m})(\mathbf{x}_i-\mathbf{m})^T$

$\displaystyle\begin{aligned} S_T&=\sum_{j=1}^k\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j+\mathbf{m}_j-\mathbf{m})(\mathbf{x}_i-\mathbf{m}_j+\mathbf{m}_j-\mathbf{m})^T\\ &=\sum_{j=1}^k\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^T+\sum_{j=1}^k\sum_{i\in\mathcal{C}_j}(\mathbf{m}_j-\mathbf{m})(\mathbf{m}_j-\mathbf{m})^T\\ &~~+\sum_{j=1}^k\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{m}_j-\mathbf{m})^T+\sum_{j=1}^k(\mathbf{m}_j-\mathbf{m})\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)^T\\ &=S_W+\sum_{j=1}^k n_j(\mathbf{m}_j-\mathbf{m})(\mathbf{m}_j-\mathbf{m})^T. \end{aligned}$

$\displaystyle S_B=\sum_{j=1}^k n_j(\mathbf{m}_j-\mathbf{m})(\mathbf{m}_j-\mathbf{m})^T$
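The decomposition $S_T=S_W+S_B$ derived above is easy to confirm numerically; the sketch below uses three synthetic classes whose sizes and means are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three hypothetical classes in R^3
classes = [rng.normal(mu, 1.0, size=(sz, 3))
           for mu, sz in [((0, 0, 0), 20), ((3, 0, 1), 30), ((0, 4, 2), 25)]]
X = np.vstack(classes)
m = X.mean(axis=0)  # overall mean

# Within-class, between-class, and total scatter matrices
S_W = sum((Xj - Xj.mean(0)).T @ (Xj - Xj.mean(0)) for Xj in classes)
S_B = sum(len(Xj) * np.outer(Xj.mean(0) - m, Xj.mean(0) - m) for Xj in classes)
S_T = (X - m).T @ (X - m)
```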

$\displaystyle y_l=\mathbf{w}_l^T\mathbf{x},~~l=1,\ldots,q$

$\displaystyle \mathbf{y}=\begin{bmatrix} y_1\\ \vdots\\ y_q \end{bmatrix}=\begin{bmatrix} \mathbf{w}_1^T\mathbf{x}\\ \vdots\\ \mathbf{w}_q^T\mathbf{x} \end{bmatrix}=\begin{bmatrix} \mathbf{w}_1&\cdots&\mathbf{w}_q \end{bmatrix}^T\mathbf{x}=W^T\mathbf{x}$

$\displaystyle\begin{aligned} \tilde{S}_W&=\sum_{j=1}^k \sum_{i\in\mathcal{C}_j}(\mathbf{y}_i-\tilde{\mathbf{m}}_j)(\mathbf{y}_i-\tilde{\mathbf{m}}_j)^T\\ \tilde{S}_B&=\sum_{j=1}^k n_j(\tilde{\mathbf{m}}_j-\tilde{\mathbf{m}})(\tilde{\mathbf{m}}_j-\tilde{\mathbf{m}})^T, \end{aligned}$

$\displaystyle \tilde{\mathbf{m}}_j=\frac{1}{n_j}\sum_{i\in\mathcal{C}_j}\mathbf{y}_i=\frac{1}{n_j}\sum_{i\in\mathcal{C}_j}W^T\mathbf{x}_i=W^T\mathbf{m}_j,~~j=1,\ldots,k$

$\displaystyle \tilde{\mathbf{m}}=\frac{1}{n}\sum_{j=1}^k n_j\tilde{\mathbf{m}}_j=\frac{1}{n}\sum_{j=1}^k n_j W^T\mathbf{m}_j=W^T\mathbf{m}.$

$\displaystyle\begin{aligned} \tilde{S}_W&=\sum_{j=1}^k \sum_{i\in\mathcal{C}_j}W^T(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^TW=W^TS_WW\\ \tilde{S}_B&=\sum_{j=1}^k n_jW^T(\mathbf{m}_j-\mathbf{m})(\mathbf{m}_j-\mathbf{m})^TW=W^TS_BW. \end{aligned}$

$\displaystyle J_1(W)=\text{trace}\left(\tilde{S}_W^{-1}\tilde{S}_B\right)=\text{trace}\left((W^TS_WW)^{-1}(W^TS_BW)\right)$

$\displaystyle J_2(W)=\frac{\det \tilde{S}_B}{\det \tilde{S}_W}=\frac{\det(W^TS_BW)}{\det(W^TS_WW)}$

$\displaystyle S_B\mathbf{w}_l=\lambda_lS_W\mathbf{w}_l,~~l=1,\ldots,q$

$\displaystyle J_1(W)=\lambda_1+\cdots+\lambda_q$

$\displaystyle \text{rank}(S_W^{-1}S_B)=\text{rank}\,S_B=\dim\text{span}\{\mathbf{m}_1-\mathbf{m},\ldots,\mathbf{m}_k-\mathbf{m}\}\le k-1$

so $q\le k-1$. Given a sample containing $k\ge 2$ classes, multiclass discriminant analysis can produce at most $k-1$ useful linear features.
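In practice the generalized eigenvalue problem $S_B\mathbf{w}_l=\lambda_lS_W\mathbf{w}_l$ can be solved by whitening with a Cholesky factor of $S_W$ (assuming $S_W$ is positive definite). The sketch below, on synthetic three-class data, also confirms that at most $k-1$ eigenvalues are nonzero:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # number of classes
# Hypothetical classes in R^4
classes = [rng.normal(mu, 1.0, size=(sz, 4))
           for mu, sz in [((0, 0, 0, 0), 30), ((4, 0, 1, 0), 40), ((0, 5, 2, 0), 35)]]
X = np.vstack(classes)
m = X.mean(axis=0)
S_W = sum((Xj - Xj.mean(0)).T @ (Xj - Xj.mean(0)) for Xj in classes)
S_B = sum(len(Xj) * np.outer(Xj.mean(0) - m, Xj.mean(0) - m) for Xj in classes)

# Solve S_B w = lambda S_W w by whitening: S_W = L L^T, then a symmetric
# eigenproblem for L^{-1} S_B L^{-T}
L = np.linalg.cholesky(S_W)
Linv = np.linalg.inv(L)
evals, V = np.linalg.eigh(Linv @ S_B @ Linv.T)
W = Linv.T @ V                      # columns satisfy S_B w_l = evals[l] S_W w_l
order = np.argsort(evals)[::-1]     # largest eigenvalues first
evals, W = evals[order], W[:, order]
```

Keeping the $q\le k-1$ leading columns of `W` gives the multiclass discriminant projection.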

[1] First prove the following identity:

$\displaystyle S_W=\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T-n_1\mathbf{m}_1\mathbf{m}_1^T-n_2\mathbf{m}_2\mathbf{m}_2^T$

$\displaystyle\begin{aligned} S_j&=\sum_{i\in\mathcal{C}_j}(\mathbf{x}_i-\mathbf{m}_j)(\mathbf{x}_i-\mathbf{m}_j)^T\\ &=\sum_{i\in\mathcal{C}_j}\mathbf{x}_i\mathbf{x}_i^T-\mathbf{m}_j\sum_{i\in\mathcal{C}_j}\mathbf{x}_i^T-\sum_{i\in\mathcal{C}_j}\mathbf{x}_i\mathbf{m}_j^T+n_j\mathbf{m}_j\mathbf{m}_j^T\\ &=\sum_{i\in\mathcal{C}_j}\mathbf{x}_i\mathbf{x}_i^T-\mathbf{m}_j(n_j\mathbf{m}_j^T)-(n_j\mathbf{m}_j)\mathbf{m}_j^T+n_j\mathbf{m}_j\mathbf{m}_j^T\\ &=\sum_{i\in\mathcal{C}_j}\mathbf{x}_i\mathbf{x}_i^T-n_j\mathbf{m}_j\mathbf{m}_j^T, \end{aligned}$

so $S_W=S_1+S_2=\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T-n_1\mathbf{m}_1\mathbf{m}_1^T-n_2\mathbf{m}_2\mathbf{m}_2^T$. Write out

$\displaystyle \frac{\partial E}{\partial \mathbf{w}}=\sum_{i=1}^n(\mathbf{w}^T\mathbf{x}_i+w_0-r_i)\mathbf{x}_i=0$

$\displaystyle \sum_{i=1}^n\mathbf{x}_i(\mathbf{x}_i^T-\mathbf{m}^T)\mathbf{w}=\sum_{i=1}^nr_i\mathbf{x}_i$

$\displaystyle\begin{aligned} \sum_{i=1}^n\mathbf{x}_i(\mathbf{x}_i^T-\mathbf{m}^T)\mathbf{w}&=\left(\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T-n\mathbf{m}\mathbf{m}^T\right)\mathbf{w}\\ &=\left(\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T-\frac{1}{n}(n_1\mathbf{m}_1+n_2\mathbf{m}_2)(n_1\mathbf{m}_1+n_2\mathbf{m}_2)^T\right)\mathbf{w}\\ &=\left(\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T-n_1\mathbf{m}_1\mathbf{m}_1^T-n_2\mathbf{m}_2\mathbf{m}_2^T+\frac{n_1n_2}{n}(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T\right)\mathbf{w}\\ &=\left(S_W+\frac{n_1n_2}{n}S_B\right)\mathbf{w}. \end{aligned}$

$\displaystyle \sum_{i=1}^nr_i\mathbf{x}_i=\frac{n}{n_1}\sum_{i\in\mathcal{C}_1}\mathbf{x}_i-\frac{n}{n_2}\sum_{i\in\mathcal{C}_2}\mathbf{x}_i=n(\mathbf{m}_1-\mathbf{m}_2)$

[2] For the two-class case (i.e., $k=2$), the multiclass between-class scatter matrix is

$\displaystyle S_B=n_1(\mathbf{m}_1-\mathbf{m})(\mathbf{m}_1-\mathbf{m})^T+n_2(\mathbf{m}_2-\mathbf{m})(\mathbf{m}_2-\mathbf{m})^T$

$\displaystyle\begin{aligned} \mathbf{m}_1-\mathbf{m}&=\mathbf{m}_1-\frac{1}{n}(n_1\mathbf{m}_1+n_2\mathbf{m}_2)=\frac{n_2}{n}(\mathbf{m}_1-\mathbf{m}_2)\\ \mathbf{m}_2-\mathbf{m}&=\mathbf{m}_2-\frac{1}{n}(n_1\mathbf{m}_1+n_2\mathbf{m}_2)=\frac{n_1}{n}(\mathbf{m}_2-\mathbf{m}_1). \end{aligned}$

$\displaystyle S_B=\frac{n_1n_2}{n}(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T$
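The two-class identity $S_B=\frac{n_1n_2}{n}(\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^T$ can be confirmed numerically (synthetic data, arbitrary class sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(7, 3))
X2 = rng.normal(2.0, 1.0, size=(11, 3))
n1, n2 = len(X1), len(X2)
n = n1 + n2
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
m = (n1 * m1 + n2 * m2) / n  # overall mean

# Multiclass definition of S_B specialized to k = 2
S_B_multi = n1 * np.outer(m1 - m, m1 - m) + n2 * np.outer(m2 - m, m2 - m)
# Claimed closed form
S_B_pair = (n1 * n2 / n) * np.outer(m2 - m1, m2 - m1)
```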

[3] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, p. 192.

[4] Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, 2nd ed., John Wiley & Sons, 2001, p. 123.

[5] For the objective function

$\displaystyle J_1(W)=\text{trace}\left((W^TS_WW)^{-1}(W^TS_BW)\right)$

$\displaystyle\begin{aligned} \frac{\partial J_1}{\partial W}&=\frac{\partial\text{trace}\left((W^TS_WW)^{-1}(W^TS_BW)\right)}{\partial W}\\ &=-2S_WW(W^TS_WW)^{-1}W^TS_BW(W^TS_WW)^{-1}+2S_BW(W^TS_WW)^{-1}. \end{aligned}$

Setting $\frac{\partial J_1}{\partial W}=0$ and right-multiplying by $W^TS_WW$ gives the optimality condition

$\displaystyle S_BW=S_WW(W^TS_WW)^{-1}W^TS_BW$

$\displaystyle S_BW=S_WW\tilde{S}^{-1}_W\tilde{S}_B$

$\displaystyle P^T\tilde{S}_WP=Q^T(M^{-1})^TM^TMM^{-1}Q=Q^TQ=I$

$\displaystyle P^T\tilde{S}_BP=Q^T(M^{-1})^T\tilde{S}_BM^{-1}Q=Q^TQDQ^TQ=D$

$\displaystyle \tilde{S}_W^{-1}\tilde{S}_B=PDP^{-1}$

$\displaystyle S_BWP=S_WWPD$

Define the $d\times q$ matrix $U=WP$. Evaluating the objective function with $U$ in place of $W$ and using the cyclic invariance of the trace gives

$\displaystyle\begin{aligned} J_1(U)&=\text{trace}\left((U^TS_WU)^{-1}(U^TS_BU)\right)\\ &=\text{trace}\left((P^TW^TS_WWP)^{-1}(P^TW^TS_BWP)\right)\\ &=\text{trace}\left(P^{-1}(W^TS_WW)^{-1}(P^T)^{-1}P^T(W^TS_BW)P\right)\\ &=\text{trace}\left((W^TS_WW)^{-1}(W^TS_BW)PP^{-1}\right)\\ &=\text{trace}\left((W^TS_WW)^{-1}(W^TS_BW)\right)\\ &=J_1(W). \end{aligned}$

$\displaystyle S_BU=S_WUD$

$\displaystyle S_B\mathbf{u}_l=\lambda_lS_W\mathbf{u}_l,~~l=1,\ldots,q$
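The invariance exploited in note [5], namely $J_1(WP)=J_1(W)$ for any invertible $P$, is simple to check numerically with random positive definite stand-ins for $S_W$ and $S_B$ (all matrices below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 5, 2
# Random symmetric positive (semi)definite stand-ins for the scatter matrices
A = rng.normal(size=(d, d))
S_W = A @ A.T + d * np.eye(d)   # positive definite
B = rng.normal(size=(d, q))
S_B = B @ B.T                   # positive semidefinite

W = rng.normal(size=(d, q))
P = rng.normal(size=(q, q))     # invertible with probability 1

def J1(W):
    # trace((W^T S_W W)^{-1} (W^T S_B W)) via a linear solve
    return np.trace(np.linalg.solve(W.T @ S_W @ W, W.T @ S_B @ W))
```

The same argument with determinants in place of the trace gives the analogous invariance of $J_2$.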

[6] For the objective function

$\displaystyle J_2(W)=\frac{\det(W^TS_BW)}{\det(W^TS_WW)}$

$\displaystyle \ln J_2(W)=\ln\det(W^TS_BW)-\ln\det(W^TS_WW)$

$\displaystyle \frac{\partial \ln J_2}{\partial W}=2S_BW(W^TS_BW)^{-1}-2S_WW(W^TS_WW)^{-1}$

Setting $\frac{\partial \ln J_2}{\partial W}=0$ and right-multiplying by $W^TS_BW$ gives

$\displaystyle S_BW=S_WW(W^TS_WW)^{-1}W^TS_BW$

$\displaystyle\begin{aligned} J_2(U)&=\frac{\det(U^TS_BU)}{\det(U^TS_WU)}\\ &=\frac{\det(P^TW^TS_BWP)}{\det(P^TW^TS_WWP)}\\ &=\frac{(\det P^T)\det(W^TS_BW)(\det P)}{(\det P^T)\det(W^TS_WW)(\det P)}\\ &=\frac{\det(W^TS_BW)}{\det(W^TS_WW)}\\ &=J_2(W).\end{aligned}$

$\displaystyle S_BU=S_WUD$

This entry was posted in 機器學習.

### 6 Responses to Fisher's Discriminant Analysis and Linear Discriminant Analysis

1. ccjou says:

This article is rather long; I have proofread it twice. If readers still find errors, please do not hesitate to point them out. Thank you.

2. 張盛東 says:

Professor Chou, I have a question. The article says, "Put plainly, the optimal projection line points along the vector connecting the two class sample means after the within-class scatter effect has been removed (as shown in the left panel of the figure above)."

Should this be "as shown in the right panel of the figure above"?

• ccjou says:

Thank you; you have sharp eyes! I later found this error myself and have corrected it. Are there any other mistakes? The notes in particular use very heavy notation.

• 張盛東 says:

Professor, I have just gone through the notes and believe there are no problems.

Thank you for the article; it has given me a deeper understanding of the theory behind LDA.

3. student says:

Hello Professor Chou,
You mentioned that when solving for $\mathbf{w}$ in the two-class case, $S_W$ is usually positive definite if $n>d$. In the exceptional case, even when $n>d$ the data matrix may still be singular, and the generalized eigenvalue problem then cannot be transformed because the inverses of $S_W$ and $S_B$ do not exist. Is there a way to carry out LDA without deleting the linearly dependent variables?

• ccjou says:

If $S_W$ is singular, you can add a small perturbation $\epsilon>0$ to its main diagonal, giving $S_W+\epsilon I$. Note, however, that the solved $\mathbf{w}$ will change with the size of $\epsilon$.
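A sketch of this regularization: when $n<d$ (or the variables are linearly dependent) $S_W$ is singular, and adding a small ridge $\epsilon I$ makes the system solvable. The data and the value of $\epsilon$ below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Fewer samples than dimensions (n < d), so S_W is necessarily singular
d = 10
X1 = rng.normal(0.0, 1.0, size=(4, d))
X2 = rng.normal(1.0, 1.0, size=(4, d))
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# rank(S_W) <= (n1 - 1) + (n2 - 1) = 6 < d, so S_W^{-1} does not exist;
# a small ridge term eps * I restores invertibility (the solution varies with eps)
eps = 1e-3
w = np.linalg.solve(S_W + eps * np.eye(d), m2 - m1)
```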