## Math behind Soft-Margin SVM Classifier - Sequential Minimal Optimization

The decision function for an SVM classifier is given by:

\begin{equation*}\begin{aligned}\hat{f}(\bar x) = \text{sgn}(\bar w.\bar x - b) \hspace{5 mm} \bar x \in \mathbb{R}^n \hspace{5 mm} \text{and} \hspace{2 mm} \bar w \in \mathbb{R}^n\end{aligned}\end{equation*}

$\bar w$ is the normal vector and $b$ is the offset term for the decision surface $\bar w.\bar x = b$.
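As a quick sketch, the decision function above can be written directly in code. The parameter values below are hypothetical, standing in for a trained classifier:

```python
import numpy as np

def svm_predict(w, b, x):
    """SVM decision function f(x) = sgn(w . x - b).

    w : normal vector of the decision surface
    b : offset term
    x : input point in R^n
    """
    return np.sign(np.dot(w, x) - b)

# Hypothetical trained parameters for a 2-D problem.
w = np.array([1.0, -2.0])
b = 0.5
print(svm_predict(w, b, np.array([3.0, 1.0])))  # w.x - b = 0.5 -> +1.0
```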

The corresponding supporting hyperplanes are as follows:

\begin{equation*}\begin{aligned}\bar w.\bar x = b + 1 - \xi \hspace{5 mm} \forall \hspace{2 mm} (\bar x, y) \mid \bar x \in \mathbb{R}^n, \hspace{2 mm} \hspace{2 mm} y = +1, \hspace{2 mm} \xi \geq 0 \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned}\bar w.\bar x = b - 1 + \xi \hspace{5 mm} \forall \hspace{2 mm} (\bar x, y) \mid \bar x \in \mathbb{R}^n, \hspace{2 mm} \hspace{2 mm} y = -1, \hspace{2 mm} \xi \geq 0 \end{aligned}\end{equation*}

In both of the above supporting hyperplanes, $\xi \geq 0$ is known as the slack variable or error term; it measures how far a particular point lies on the wrong side of its respective supporting hyperplane.
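Combining the two hyperplane equations with the label $y$, the slack implied by a given $(\bar w, b)$ is $\xi = \max(0,\, 1 - y(\bar w.\bar x - b))$, i.e. the hinge loss. A minimal sketch (toy values, assumed for illustration):

```python
import numpy as np

def slack(w, b, x, y):
    """Slack xi = max(0, 1 - y * (w . x - b)).

    Zero when the point is on or beyond its supporting hyperplane;
    positive when the point violates its margin.
    """
    return max(0.0, 1.0 - y * (np.dot(w, x) - b))

w, b = np.array([1.0, 0.0]), 0.0
print(slack(w, b, np.array([2.0, 0.0]), +1))   # margin satisfied: 0.0
print(slack(w, b, np.array([0.5, 0.0]), +1))   # inside the margin: 0.5
print(slack(w, b, np.array([-1.0, 0.0]), +1))  # misclassified: 2.0
```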

The optimization problem to compute a soft-margin decision surface $\bar w^*.\bar x = b^*$ is expressed as follows:

\begin{equation*}\begin{aligned} \underset{\bar w, b, \xi}{\text{min}} & \left( \frac{1}{2} \bar w.\bar w \hspace{2 mm} + \hspace{2 mm} C \sum\limits_{i=1}^m \xi_i \right) \end{aligned}\end{equation*}

\begin{aligned}\text{subject to: }\end{aligned}
\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w.\bar x_i \geq b + 1 - \xi_i \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_i = +1, \hspace{2 mm} \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w.\bar x_j \leq b - 1 + \xi_j \hspace{5 mm} \forall \hspace{2 mm} (\bar x_j, y_j) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_j = -1, \hspace{2 mm} \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \xi_i \geq 0, \hspace{2 mm} i = 1, \ldots, m \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \text{where} \hspace{2 mm} \mathbb{D} = \left\{ (\bar x_i, y_i) : \bar x_i \in \mathbb{R}^n, y_i \in \{+1, -1\} \right\} \hspace{2 mm} \text{ is Training Set} \end{aligned}\end{equation*}
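To make the objective concrete, the sketch below evaluates $\frac{1}{2}\bar w.\bar w + C\sum_i \xi_i$ for a candidate $(\bar w, b)$, with each $\xi_i$ taken at its smallest feasible value. The training set here is a made-up toy example:

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Soft-margin primal objective 0.5 * w.w + C * sum(xi_i),
    with each xi_i set to its smallest feasible value
    max(0, 1 - y_i * (w . x_i - b))."""
    margins = y * (X @ w - b)
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * xi.sum()

# Toy training set (assumed for illustration).
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([+1.0, -1.0, +1.0])
print(primal_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0))  # 1.0
```

Larger $C$ penalizes margin violations more heavily, trading margin width for fewer training errors.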

### Rewriting the Supporting Hyperplane Constraints in Compact Form

The supporting hyperplane constraints above can be rewritten in a single compact form as follows.

For the supporting hyperplane constraint covering all training points labeled $+1$, we can rewrite the dot product and the offset term as follows:

\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w. (+1)\bar x_i \geq 1 + (+1)b - \xi_i \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_i = +1, \hspace{2 mm} \end{aligned}\end{equation*}

Since $y_i = +1$ for these points, we can substitute $y_i$ in place of $+1$ and the inequality still holds:

\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w. y_i\bar x_i \geq 1 + y_ib - \xi_i \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_i = +1, \hspace{2 mm} \end{aligned}\end{equation*}

Rearranging the terms, the constraint becomes:

\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w. y_i\bar x_i - y_ib + \xi_i - 1 \geq 0 \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_i = +1, \hspace{2 mm} \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned}\hspace{0.7 in} \Rightarrow y_i(\bar w.\bar x_i - b) + \xi_i - 1 \geq 0 \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_i = +1, \hspace{2 mm} \end{aligned}\end{equation*}

For the supporting hyperplane constraint covering all training points labeled $-1$, multiplying both sides by $-1$ and flipping the inequality from $\leq$ to $\geq$, we get:

\begin{equation*}\begin{aligned} \hspace{0.7 in} (-1)( \bar w.\bar x_j) \geq (-1)(b - 1 + \xi_j) \hspace{5 mm} \forall \hspace{2 mm} (\bar x_j, y_j) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_j = -1, \hspace{2 mm} \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \Rightarrow \bar w.(-1)\bar x_j \geq 1 - \xi_j + (-1)b \hspace{5 mm} \forall \hspace{2 mm} (\bar x_j, y_j) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_j = -1, \hspace{2 mm} \end{aligned}\end{equation*}

Since $y_j = -1$ for these points, we can substitute $y_j$ in place of $-1$ and the inequality still holds:

\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w.y_j\bar x_j \geq 1 - \xi_j + y_jb \hspace{5 mm} \forall \hspace{2 mm} (\bar x_j, y_j) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_j = -1, \hspace{2 mm} \end{aligned}\end{equation*}

Rearranging the terms, the constraint becomes:

\begin{equation*}\begin{aligned} \hspace{0.7 in} \bar w.y_j\bar x_j - y_jb + \xi_j -1 \geq 0 \hspace{5 mm} \forall \hspace{2 mm} (\bar x_j, y_j) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_j = -1, \hspace{2 mm} \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \Rightarrow y_j(\bar w.\bar x_j - b) + \xi_j -1 \geq 0 \hspace{5 mm} \forall \hspace{2 mm} (\bar x_j, y_j) \in \mathbb{D}, \hspace{2 mm} \hspace{2 mm} y_j = -1, \hspace{2 mm} \end{aligned}\end{equation*}

As is evident, the compact constraints for both supporting hyperplanes have the same form. Hence they can be expressed as a single constraint over all points in the training set $\mathbb{D}$ as follows:

\begin{equation*}\begin{aligned}\hspace{0.7 in} y_i(\bar w.\bar x_i - b) + \xi_i - 1 \geq 0 \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} y_i \in \{+1,-1\} \end{aligned}\end{equation*}
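The equivalence can be checked numerically: for any point, label, and slack value, the compact constraint agrees with the original class-specific one. A small sketch with made-up values:

```python
import numpy as np

def original_constraint(w, b, x, y, xi):
    # Class-specific supporting-hyperplane constraints.
    if y == +1:
        return np.dot(w, x) >= b + 1 - xi
    return np.dot(w, x) <= b - 1 + xi

def compact_constraint(w, b, x, y, xi):
    # Single unified constraint: y * (w . x - b) + xi - 1 >= 0.
    return y * (np.dot(w, x) - b) + xi - 1 >= 0

w, b = np.array([1.0, -1.0]), 0.5
for x, y, xi in [(np.array([2.0, 0.0]), +1, 0.0),
                 (np.array([-2.0, 0.0]), -1, 0.0),
                 (np.array([0.2, 0.0]), +1, 1.5)]:
    assert original_constraint(w, b, x, y, xi) == compact_constraint(w, b, x, y, xi)
print("compact form agrees with class-specific constraints")
```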

### Optimization Problem with Constraints in Compact Form

\begin{equation*}\begin{aligned} \underset{\bar w, b, \xi}{\text{min}} & \left( \frac{1}{2} \bar w.\bar w \hspace{2 mm} + \hspace{2 mm} C \sum\limits_{i=1}^m \xi_i \right) \end{aligned}\end{equation*}

\begin{aligned}\text{subject to: }\end{aligned}
\begin{equation*}\begin{aligned}\hspace{0.7 in} y_i(\bar w.\bar x_i - b) + \xi_i - 1 \geq 0 \hspace{5 mm} \forall \hspace{2 mm} (\bar x_i, y_i) \in \mathbb{D}, \hspace{2 mm} y_i \in \{+1,-1\} \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \xi_i \geq 0, \hspace{2 mm} i = 1, \ldots, m \end{aligned}\end{equation*}

\begin{equation*}\begin{aligned} \hspace{0.7 in} \text{where} \hspace{2 mm} \mathbb{D} = \left\{ (\bar x_i, y_i) : \bar x_i \in \mathbb{R}^n, y_i \in \{+1, -1\} \right\} \hspace{2 mm} \text{ is Training Set} \end{aligned}\end{equation*}
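Before moving to the Lagrange form, the compact-form primal can be handed to a general-purpose constrained solver as a sanity check. The sketch below uses SciPy's SLSQP method on a tiny made-up training set; the variable vector stacks $\bar w$, $b$, and the slacks $\xi_i$. This is an illustration of the problem statement, not how SVMs are trained in practice (SMO works on the dual instead):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (assumed for illustration).
X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
m, n = X.shape
C = 1.0

def objective(z):
    # z = [w (n entries), b, xi (m entries)]
    w, xi = z[:n], z[n + 1:]
    return 0.5 * np.dot(w, w) + C * xi.sum()

def margin_constraints(z):
    # y_i * (w . x_i - b) + xi_i - 1 >= 0 for every training point.
    w, b, xi = z[:n], z[n], z[n + 1:]
    return y * (X @ w - b) + xi - 1.0

z0 = np.zeros(n + 1 + m)
bounds = [(None, None)] * (n + 1) + [(0.0, None)] * m  # enforces xi_i >= 0
res = minimize(objective, z0, method="SLSQP", bounds=bounds,
               constraints=[{"type": "ineq", "fun": margin_constraints}])
w_star, b_star = res.x[:n], res.x[n]
print("w* =", w_star, " b* =", b_star)
print("all points classified correctly:",
      bool(np.all(np.sign(X @ w_star - b_star) == y)))
```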

### Optimization Problem in Lagrange Form

\begin{equation*}\begin{aligned} \underset{\bar \alpha, \bar \beta}{\text{max}} \hspace{1 mm} \underset{\bar w, b, \xi}{\text{min}} & \left( \frac{1}{2} \bar w.\bar w \hspace{2 mm} + \hspace{2 mm} C \sum\limits_{i=1}^m \xi_i - \sum\limits_{i=1}^m \alpha_i (y_i(\bar w.\bar x_i - b) + \xi_i - 1) - \sum\limits_{i=1}^m \beta_i \xi_i \right) \end{aligned}\end{equation*}

\begin{aligned}\text{subject to: }\end{aligned}
\begin{equation*}\begin{aligned} \hspace{0.7 in} \alpha_i \geq 0, \hspace{2 mm} \beta_i \geq 0, \hspace{2 mm} i = 1, \ldots, m. \end{aligned}\end{equation*}
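A minimal sketch of the Lagrangian as a function, with toy values assumed for illustration. Note that when all multipliers $\alpha_i, \beta_i$ are zero, the Lagrangian reduces to the primal objective:

```python
import numpy as np

def lagrangian(w, b, xi, alpha, beta, X, y, C):
    """Soft-margin Lagrangian
    L = 0.5 w.w + C sum(xi)
        - sum(alpha_i * (y_i * (w . x_i - b) + xi_i - 1))
        - sum(beta_i * xi_i)."""
    margin_terms = y * (X @ w - b) + xi - 1.0
    return (0.5 * np.dot(w, w) + C * xi.sum()
            - np.dot(alpha, margin_terms) - np.dot(beta, xi))

# Toy values: with alpha = beta = 0 the multiplier terms vanish
# and L equals the primal objective 0.5 * w.w + C * sum(xi).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([+1.0, -1.0])
w, b, xi = np.array([1.0, 0.0]), 0.0, np.zeros(2)
print(lagrangian(w, b, xi, np.zeros(2), np.zeros(2), X, y, C=1.0))  # 0.5
```

Setting the gradient of $L$ with respect to the primal variables to zero and substituting back is what yields the dual problem that SMO optimizes.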
