Many constrained optimization problems (both maximization and minimization) are solved by first converting the problem into something called Lagrange form and then evaluating the optimal solution on that form.

The following is the general representation of a *Constrained minimization* problem:

$$\begin{aligned} \min_{x \in \mathbb{R}^n} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \ge 0, \qquad i = 1, \dots, m \end{aligned}$$

$f(x)$ is the *Objective function* and $g_1(x), g_2(x), \dots, g_m(x)$ are the *Constraint functions*.
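As a concrete instance of this form, consider minimizing $f(x) = x_1^2 + x_2^2$ subject to the single constraint $g_1(x) = x_1 + x_2 - 1 \ge 0$. A minimal numerical sketch (assuming SciPy is available; this toy problem and its function names are illustrative, not from the original text):

```python
from scipy.optimize import minimize

# Objective function f(x) = x1^2 + x2^2
def f(x):
    return x[0]**2 + x[1]**2

# Constraint function g1(x) = x1 + x2 - 1; SciPy's "ineq" type enforces g1(x) >= 0
g1 = {"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}

# Solve starting from an arbitrary feasible guess
result = minimize(f, x0=[1.0, 1.0], constraints=[g1], method="SLSQP")
print(result.x)  # approximately [0.5, 0.5]
```

Note that SciPy's `ineq` constraint type matches the $g_i(x) \ge 0$ convention used here.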

Note that the constraint functions are assumed to be linearly independent - meaning there is no linear relationship among them, and hence no constraint function can be expressed as a combination of the others scaled by constants.

The above optimization problem is then written in *Lagrange optimization* form as follows:

$$L(x, \lambda) = f(x) - \sum_{i=1}^{m} \lambda_i \, g_i(x)$$

The values $\lambda_1, \lambda_2, \dots, \lambda_m$ in the above representation are called *Lagrange Multipliers (a.k.a. KKT Multipliers)*.
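For a small two-variable example, the Lagrangian can be built symbolically. This is a sketch assuming SymPy is available; the objective $f = x_1^2 + x_2^2$, constraint $g_1 = x_1 + x_2 - 1 \ge 0$, and symbol names are my own illustration:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lambda1")

f = x1**2 + x2**2      # objective function
g1 = x1 + x2 - 1       # constraint function, declared as g1(x) >= 0

# Lagrangian: L(x, lambda) = f(x) - sum_i lambda_i * g_i(x)
L = f - lam * g1
print(sp.expand(L))
```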

There seem to be two intuitions behind the above formulation of Lagrangian optimization - one simple enough to explain in a top-down manner, and a second that is related to the first but involves deeper understanding. We will look at both intuitions.

**Intuition # 1:**

If an optimal solution $x^*$ exists for the constrained optimization problem, then at that optimal point, the gradient vector of the objective function is parallel to the gradient vectors of the constraint functions $g_i$. This means that the objective function's gradient vector at the optimal point can be expressed as some linear combination of the constraint function gradient vectors that are parallel to it:

$$\nabla f(x^*) = \sum_{i=1}^{m} \lambda_i \, \nabla g_i(x^*)$$
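As a quick check of this claim, take the toy problem of minimizing $f(x) = x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 \ge 0$, whose constrained minimum sits on the boundary at $x^* = (0.5, 0.5)$ by symmetry (this toy problem is my own illustration):

```python
import numpy as np

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to g(x) = x1 + x2 - 1 >= 0.
# The constrained minimum lies on the boundary at x* = (0.5, 0.5).
x_star = np.array([0.5, 0.5])

grad_f = 2 * x_star              # gradient of the objective at x*
grad_g = np.array([1.0, 1.0])    # gradient of the constraint at x*

# The two gradients are parallel: grad_f = lambda * grad_g with lambda = 1
lam = (grad_f / grad_g)[0]
print(lam)  # -> 1.0
```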

The above equation, with the terms $\nabla f(x^*)$ and $\sum_i \lambda_i \, \nabla g_i(x^*)$, seems to suggest that there exists a function $L$ whose gradient is 0 at $x^*$.

The function $L$ and its gradient can be represented more generally as follows:

$$L(x, \lambda) = f(x) - \sum_{i=1}^{m} \lambda_i \, g_i(x)$$

$$\nabla_x L(x, \lambda) = \nabla f(x) - \sum_{i=1}^{m} \lambda_i \, \nabla g_i(x)$$

Note that the values $\lambda_1, \dots, \lambda_m$ in the above equations are represented as a vector $\lambda$ for convenience, such that $\lambda = [\lambda_1, \lambda_2, \dots, \lambda_m]^T$.

Also note that the gradient $\nabla_x L$ is a vector and, since $x \in \mathbb{R}^n$, $\nabla_x L$ will have $n$ vector components $\frac{\partial L}{\partial x_1}, \frac{\partial L}{\partial x_2}, \dots, \frac{\partial L}{\partial x_n}$.

If $\nabla_x L = 0$ at $x = x^*$, then each of the $n$ vector components $\frac{\partial L}{\partial x_j}$ will be equal to 0.
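The components of $\nabla_x L$ can be computed symbolically. A sketch assuming SymPy, using an illustrative toy Lagrangian with $n = 2$ and a single constraint (my own example, not from the original text):

```python
import sympy as sp

# Toy Lagrangian for f = x1^2 + x2^2 and one constraint g1 = x1 + x2 - 1 >= 0
x1, x2, lam = sp.symbols("x1 x2 lambda1")
L = x1**2 + x2**2 - lam * (x1 + x2 - 1)

# Gradient of L with respect to x only: one component per x_j (here n = 2)
grad_L = [sp.diff(L, v) for v in (x1, x2)]
print(grad_L)
```

Setting each of these components to 0 gives the first batch of equations used below.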

Since $x^*$ is unknown, and since finding the optimal $x^*$ is our goal, the generalized function $L(x, \lambda)$ seems to give some confidence that there is a possibility of finding the optimal $x^*$ by solving for points where $\nabla_x L = 0$. However, for the generalized function $L$, there can be many points where $\nabla_x L = 0$. All such points are called *saddle points* or *stationary points*, and the challenge then would be which one of them is the optimal point that yields the optimal solution to the primary objective function $f(x)$ subject to the constraints $g_i(x) \ge 0$.

The intuition postulates that under certain special circumstances (which we will quantify mathematically in later paragraphs), one of the saddle points of $L(x, \lambda)$ can potentially be the optimal point for the main objective function $f(x)$ subject to the constraints $g_i(x) \ge 0$.

Note that the values of the multipliers $\lambda_1, \dots, \lambda_m$ in the function $L$ are unknown. Essentially, if we have to go about using the function $L$ to find the optimal point, then the challenge is twofold: we have to not only search for $x^*$ but also simultaneously find the optimal values $\lambda^*$ such that the desired saddle point can be found.

This implies that we have to find an optimal combination of $x^*$ and $\lambda^*$ with $\nabla_x L(x^*, \lambda^*) = 0$ such that $x^*$ can potentially be the optimal point for the main objective function $f(x)$ subject to the constraints $g_i(x) \ge 0$.

Note that $x = [x_1, x_2, \dots, x_n]^T$ and $\lambda = [\lambda_1, \lambda_2, \dots, \lambda_m]^T$.

If we go with the above intuition of solving for $x^*$ using $\nabla_x L = 0$ then, analytically speaking, the real challenge is that we have to find *n* variables in $x$ and *m* values in $\lambda$. Essentially there are *(n+m)* unknowns that we need to solve for. So far, going with the notion of $\nabla_x L = 0$, we have arrived at the following *n* potential equations in *(n+m)* unknowns:

$$\frac{\partial L}{\partial x_j} = \frac{\partial f}{\partial x_j} - \sum_{i=1}^{m} \lambda_i \frac{\partial g_i}{\partial x_j} = 0, \qquad j = 1, \dots, n$$

Missing are the *m* additional equations in the *(n+m)* unknowns.

This is where the primary intuition can be revisited. The primary intuition suggested that under certain special conditions, one of the saddle points of $L(x, \lambda)$ can potentially be the optimal point for the main objective function $f(x)$ subject to the constraints $g_i(x) \ge 0$. Essentially this means that when such special conditions are met, the optimal value of the objective function $f$ at $x^*$ subject to the constraints is nothing but the value of the function $L$ at $(x^*, \lambda^*)$, i.e. $f(x^*) = L(x^*, \lambda^*)$.

However, by the formulation of the generalized function $L$, we know that:

$$L(x, \lambda) = f(x) - \sum_{i=1}^{m} \lambda_i \, g_i(x)$$

So the value of the function $L$ at $(x^*, \lambda^*)$ is then given by:

$$L(x^*, \lambda^*) = f(x^*) - \sum_{i=1}^{m} \lambda_i^* \, g_i(x^*)$$

This can be re-written as:

$$L(x^*, \lambda^*) - f(x^*) = -\sum_{i=1}^{m} \lambda_i^* \, g_i(x^*)$$

If $L(x^*, \lambda^*) = f(x^*)$ then the above equation results in:

$$\sum_{i=1}^{m} \lambda_i^* \, g_i(x^*) = 0$$

Since we know that $\lambda_i^* \ge 0$ and that every feasible point satisfies $g_i(x^*) \ge 0$, each term $\lambda_i^* \, g_i(x^*)$ in the sum is non-negative, so the above equation will hold true only when each of the terms $\lambda_i^* \, g_i(x^*)$ is equal to 0.

And Voila....!! We have the special conditions and the missing *m* equations that must be satisfied at $(x^*, \lambda^*)$:

$$\lambda_i^* \, g_i(x^*) = 0, \qquad i = 1, \dots, m$$

Now, using the following *(n+m)* equations in *(n+m)* unknowns, we can find an optimal combination of $x^*$ and $\lambda^*$ with $\nabla_x L(x^*, \lambda^*) = 0$ such that $x^*$ can potentially be the optimal point for the main objective function $f(x)$ subject to the constraints $g_i(x) \ge 0$:

$$\frac{\partial L}{\partial x_j} = 0, \qquad j = 1, \dots, n$$

$$\lambda_i^* \, g_i(x^*) = 0, \qquad i = 1, \dots, m$$
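On a toy problem with $n = 2$ and $m = 1$ (minimize $f = x_1^2 + x_2^2$ subject to $g_1 = x_1 + x_2 - 1 \ge 0$, an illustrative example of my own), the resulting three equations in three unknowns can be solved directly. A sketch assuming SymPy; note that infeasible stationary points must still be filtered out by hand:

```python
import sympy as sp

# Toy problem: minimize f = x1^2 + x2^2 subject to g1 = x1 + x2 - 1 >= 0,
# so n = 2 and m = 1: three equations in the three unknowns x1, x2, lambda1.
x1, x2, lam = sp.symbols("x1 x2 lambda1", real=True)
f = x1**2 + x2**2
g1 = x1 + x2 - 1
L = f - lam * g1

# n equations dL/dx_j = 0, plus m equations lambda_i * g_i = 0
equations = [sp.diff(L, x1), sp.diff(L, x2), lam * g1]
solutions = sp.solve(equations, [x1, x2, lam], dict=True)

# Discard stationary points that violate the constraint g1 >= 0
feasible = [s for s in solutions if g1.subs(s) >= 0]
print(feasible)
```

The solver returns two stationary points; only $x^* = (1/2, 1/2)$ with $\lambda^* = 1$ is feasible, and it is indeed the constrained minimum.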

Hence this Intuition #1 forms the basis for the formulation of the Lagrangian optimization problem from the constrained optimization problem.

**Notes About Inequality Constraints:**

- In case of Minimization problems, the standard convention is to declare constraints as $g_i(x) \ge 0$. In this case, the gradient vector of the objective function and the gradient vectors of the constraint functions at the optimal point will point in the same direction, and the $\lambda_i$ in $L(x, \lambda) = f(x) - \sum_i \lambda_i \, g_i(x)$ will be $\ge 0$. If constraints are declared as $g_i(x) \le 0$, the gradient vector of the objective function and the gradient vectors of the constraint functions at the optimal point will point in opposite directions, and the $\lambda_i$ in $L$ will be $\le 0$ to compensate for the opposite directions.

- Alternatively, in case of Minimization problems, if the constraints are declared as $g_i(x) \le 0$, the gradient vectors' linear combination can be expressed as $\nabla f(x^*) = -\sum_i \lambda_i \, \nabla g_i(x^*)$ (equivalently, $L(x, \lambda) = f(x) + \sum_i \lambda_i \, g_i(x)$) to account for the opposite directions, and the $\lambda_i$ will be $\ge 0$.
- In case of Maximization problems, the standard convention is to declare constraints as $g_i(x) \le 0$. In this case, the gradient vector of the objective function and the gradient vectors of the constraint functions at the optimal point will point in the same direction, and the $\lambda_i$ in $L(x, \lambda) = f(x) - \sum_i \lambda_i \, g_i(x)$ will be $\ge 0$. If constraints are declared as $g_i(x) \ge 0$, the gradient vector of the objective function and the gradient vectors of the constraint functions at the optimal point will point in opposite directions, and the $\lambda_i$ in $L$ will be $\le 0$ to compensate for the opposite directions.

- Alternatively, in case of Maximization problems, if the constraints are declared as $g_i(x) \ge 0$, the gradient vectors' linear combination can be expressed as $\nabla f(x^*) = -\sum_i \lambda_i \, \nabla g_i(x^*)$ (equivalently, $L(x, \lambda) = f(x) + \sum_i \lambda_i \, g_i(x)$) to account for the opposite directions, and the $\lambda_i$ will be $\ge 0$.
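The sign behavior described above can be checked symbolically: declaring the same feasible region with the opposite inequality flips the sign of the multiplier at the active optimum. A sketch assuming SymPy, on an illustrative minimization problem of my own ($f = x_1^2 + x_2^2$, feasible region $x_1 + x_2 \ge 1$):

```python
import sympy as sp

# Same minimization problem, feasible region declared two ways:
#   convention 1: g(x) = x1 + x2 - 1 >= 0, with L = f - lambda*g
#   convention 2: h(x) = -(x1 + x2 - 1) <= 0, with L = f - lambda*h
x1, x2, lam = sp.symbols("x1 x2 lambda1", real=True)
f = x1**2 + x2**2
g = x1 + x2 - 1
h = -g

sol1 = sp.solve([sp.diff(f - lam * g, x1), sp.diff(f - lam * g, x2), lam * g],
                [x1, x2, lam], dict=True)
sol2 = sp.solve([sp.diff(f - lam * h, x1), sp.diff(f - lam * h, x2), lam * h],
                [x1, x2, lam], dict=True)

# At the active optimum x* = (1/2, 1/2) the multiplier flips sign
lam_ge = [s[lam] for s in sol1 if s[x1] == sp.Rational(1, 2)][0]
lam_le = [s[lam] for s in sol2 if s[x1] == sp.Rational(1, 2)][0]
print(lam_ge, lam_le)  # -> 1 -1
```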