凸優化──凸函數的最小化

本文的閱讀等級：中級

凸優化 (convex optimization) 是最佳化理論的一個分支。顧名思義，凸優化探討定義於一個凸集的凸函數最小化問題^[1]。令 $K\subset\mathbb{R}^n$ 為凸集且 $f:K\to\mathbb{R}$ 為凸函數。我們的目標要找出一個點 $\mathbf{x}^\star\in K$ ，使得每一 $\mathbf{x}\in K$ 滿足 $f(\mathbf{x}^\star)\le f(\mathbf{x})$ 。在最佳化理論中， $K$ 稱為可行域 (feasible set)， $f$ 稱為目標函數， $\mathbf{x}^\star$ 稱為全域最佳解^[2]。任一凸優化問題皆可表示為下列標準型：

$\displaystyle \begin{array}{lll} \min&f(\mathbf{x})&\\ \hbox{subject to}&g_i(\mathbf{x})\le 0,&i=1,\ldots,m\\ &h_j(\mathbf{x})=0,&j=1,\ldots,p, \end{array}$

其中 $f, g_1,\ldots,g_m:\mathbb{R}^n\to\mathbb{R}$ 是凸函數， $h_1,\ldots,h_p$ 是仿射函數，即 $h_j(\mathbf{x})=\mathbf{a}_j^T\mathbf{x}+b_j$ ， $\mathbf{a}_j\in\mathbb{R}^n$ ， $b_j\in\mathbb{R}$ 。最小平方法與線性規劃是兩種常見的凸優化問題。令 $A$ 為一個 $m\times n$ 階實矩陣， $\mathbf{b}$ 為 $m$ 維實向量， $\mathbf{c}$ 為 $n$ 維實向量。最小平方法是一個無約束 (unconstrained) 最佳化問題：

$\displaystyle \min_{\mathbf{x}\in\mathbb{R}^n}\Vert \mathbf{b}-A\mathbf{x}\Vert^2$ 。

線性規劃標準型問題具有下列形式 (見“線性規劃 (一)：標準型問題”)：

$\begin{array}{lll} \min& \mathbf{c}^T\mathbf{x} \\ \mathrm{subject~to}& A\mathbf{x}=\mathbf{b}, ~~\mathbf{x}\ge\mathbf{0}. \end{array}$

本文介紹凸優化的一個重要性質：任一局部最佳解 (亦稱相對最佳解) 即為全域最佳解。所謂局部最佳解 $\mathbf{x}^\star\in K$ 是指存在 $\delta>0$ 使得集合 $\{\mathbf{x}\vert\Vert\mathbf{x}-\mathbf{x}^\star\Vert<\delta,\mathbf{x}\in K\}$ 中每一點 $\mathbf{x}$ 滿足 $f(\mathbf{x}^\star)\le f(\mathbf{x})$ 。

定理一：若 $f$ 是定義於凸集 $K$ 的凸函數，則最佳解所形成的集合為一個凸集，且任一局部最佳解皆為全域最佳解。

如果目標函數 $f$ 不存在局部最佳解，譬如 $f(x)=x$ ，我們默認定理成立。下面考慮最佳解存在的情況，設 $c=\min_{\mathbf{x}\in K}f(\mathbf{x})$ 。所有最佳解組成的集合可表示為 $L_c^-(f)=\{\mathbf{x}\vert f(\mathbf{x})\le c,\mathbf{x}\in K\}$ ，稱為下水平集 (sublevel set)。因為 $f$ 是凸集合， $L_c^-(f)$ 必為凸集 (見“凸函數”，定理四)。令 $\mathbf{x}^\star\in K$ 是 $f$ 的一個局部最佳解。假設存在另一點 $\mathbf{y}\in K$ 滿足 $f(\mathbf{y})<f(\mathbf{x}^\star)$ 。令 $\mathbf{z}=\lambda\mathbf{y}+(1-\lambda)\mathbf{x}^\star$ ，其中 $0<\lambda<1$ 。使用凸函數定義，可得

$\displaystyle f(\mathbf{z})=f\left(\lambda\mathbf{y}+(1-\lambda)\mathbf{x}^\star\right)\le\lambda f(\mathbf{y})+(1-\lambda)f(\mathbf{x}^\star)< f(\mathbf{x}^\star)$ 。

當 $\lambda\to 0$ ， $\mathbf{z}=\mathbf{x}^\star+\lambda(\mathbf{y}-\mathbf{x}^\star)$ 非常靠近 $\mathbf{x}^\star$ ，上式與 $\mathbf{x}^\star$ 是一個局部最佳解的假設相矛盾，故證明局部最佳解 $\mathbf{x}^\star$ 滿足 $f(\mathbf{x}^\star)=c$ 。

定理一說明兩件事：如果存在一個局部最佳解，則該解為全域最佳解；所有的 (全域) 最佳解構成一個凸集。如果目標函數 $f$ 連續可導，我們還可以得到全域最佳解的充分條件，見下面定理。

定理二：令 $K\subset\mathbb{R}^n$ 為凸集， $f:K\to\mathbb{R}$ 為凸函數，且 $f\in C^1$ 。若存在一點 $\mathbf{x}^\star\in K$ 使得所有 $\mathbf{y}\in K$ ， $\nabla f(\mathbf{x}^\star)^T(\mathbf{y}-\mathbf{x}^\star)\ge 0$ ，則 $\mathbf{x}^\star$ 是 $f$ 在 $K$ 的全域最佳解。

證明使用此性質 (見“凸函數”，定理五)：若 $f$ 是凸函數，則對於任意 $\mathbf{x},\mathbf{y}\in K$ ，

$\displaystyle f(\mathbf{y})\ge f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{y}-\mathbf{x})$ 。

以 $\mathbf{x}^\star$ 取代 $\mathbf{x}$ ，可得

$\displaystyle f(\mathbf{y})\ge f(\mathbf{x}^\star)+\nabla f(\mathbf{x}^\star)^T(\mathbf{y}-\mathbf{x}^\star)\ge f(\mathbf{x}^\star)$ 。

近代常用的凸優化算法包括次梯度法 (subgradient method) 和內點法 (interior-point method)。有興趣深入了解的讀者請參閱維基百科介紹^[3]或查詢非線性規劃 (nonlinear programming) 專門書籍。

註解：
[1] 一個向量集 $K\subset\mathbb{R}^n$ 稱為凸集 (見“凸組合、凸包與凸集”)，若給定任兩點 $\mathbf{x},\mathbf{y}\in K$ 和 $0<\lambda<1$ ，點 $\lambda\mathbf{x}+(1-\lambda)\mathbf{y}$ 屬於 $K$ 。一個函數 $f:K\to\mathbb{R}$ 稱為凸函數 (見“凸函數”)，若給定任兩點 $\mathbf{x},\mathbf{y}\in K$ 和 $0<\lambda<1$ ，

$\displaystyle f\left(\lambda\mathbf{x}+(1-\lambda)\mathbf{y}\right)\le \lambda f(\mathbf{x})+(1-\lambda)f(\mathbf{y})$ 。

傳統上，凸優化設定為凸函數的最小化問題。若凸函數 $f$ 定義於一個有界閉凸集 $K$ ，從直觀可推斷使 $f$ 最大化的最佳解發生於 $K$ 的端點。
[2] 這裡 $\mathbf{x}^\star$ 代表最佳解，請勿與共軛轉置 $\mathbf{x}^\ast=\overline{\mathbf{x}}^T$ 相混淆。
[3] 維基百科：次梯度法，內點法。

4 Responses to 凸優化──凸函數的最小化

levinc417 says:

09/27/2013 at 5:26 pm

若待解問題不凸或不夠凸要怎麼凸化？

- ccjou says:
  
  09/27/2013 at 11:08 pm
  
  如果可能的話，使用變數變換將問題轉成凸函數。
  
強健控制實驗室 says:

04/16/2017 at 2:10 pm

請問老師, 那線性矩陣不等式(LMI)也算是凸優化的問題嗎?

- ccjou says:
  
  04/17/2017 at 3:17 pm
  
  應該說linear matrix inequality是經常見於控制凸優化的約束條件，
  
  Click to access lmibook.pdf

	xmj on 內積的定義
	Ning ChingSan on 線性代數的第一堂課──矩陣乘法的定義
	momo on 兩岸線性代數用詞參照
	訪客 on 克拉瑪公式的簡易幾何證明
	悟 on 條件機率與貝氏定理
	jeremy on 內積與外積是怎麼來的？