ST790 Advanced Computing Lab: Introduction to CVX
This file demonstrates how to use the modeling tool CVX to solve optimization problems, and also includes some examples of using MATLAB's and Gurobi's solvers directly. It was originally created by Brian Gaines for Eric Chi's ST790 Advanced Computing course at NC State University. The companion slide deck provides additional information on CVX. The code that generated this file is available (cvxDemo.m). Any comments or suggestions are welcome.
This file requires:
- MATLAB's Optimization Toolbox (if you are an NCSU student, you have this)
- The CVX toolbox for disciplined convex optimization (preferably with the professional license)
- cvxDemo.mat data file in the working directory (or the load path will need to be changed)
- A stand-alone installation of the Gurobi solver is optional. If you do not have a stand-alone installation of Gurobi, please change the gurobiInstalled string to equal 'no' so the Gurobi-related code will be skipped.
Contents
Initial Setup
% clean up workspace
clear;

%######## Does this machine have a stand-alone Gurobi installation? ########%
% Answer 'yes' or 'no'
gurobiInstalled = 'yes';

% load data
load('cvxDemo')
Linear Programming Example: $\ell_1$ regression (least absolute deviations)
The first example is $\ell_1$ regression. As the name suggests, this is similar to the typical regression setup except that the $\ell_1$ norm is used for the loss function, so the objective function becomes

minimize $\|y - X\beta\|_1$.
To solve this using a linear programming solver, we need to put it in the solver's standard form (which may differ slightly from the standard form Dr. Chi presented in class).
LP Example: Linprog
For MATLAB's linprog function, we see that (from the online documentation or by the command "help linprog" in MATLAB) the problem's form is given by

minimize $f^T x$

subject to $Ax \le b$, $A_{eq} x = b_{eq}$, and $lb \le x \le ub$.
Once we massage the problem into this form (I will do this on the board), then the corresponding function call is:
x = linprog(f, A, b, Aeq, beq, lb, ub, x0, options);
Not all of the inputs are required. For example, if we only have linear inequality constraints, the function call becomes
x = linprog(f, A, b);
If some of the intermediate function inputs are not needed, they need to be replaced by two square brackets, []. So if we had linear equality constraints but no linear inequality constraints, the code is
x = linprog(f, [], [], Aeq, beq);
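To sketch the massaging step (my own restatement; the quantities below are exactly what the upcoming code constructs): introduce auxiliary variables $t \in \mathbb{R}^n$ that bound the absolute residuals, stack the optimization variable as $x = (\beta, t)$, and rewrite the problem as

\[
\begin{aligned}
\text{minimize}\quad & \mathbf{1}^T t \\
\text{subject to}\quad & X\beta - t \le y,\\
& -X\beta - t \le -y,
\end{aligned}
\]

so that $f = (0_p, \mathbf{1}_n)$, $A = \begin{bmatrix} X & -I \\ -X & -I \end{bmatrix}$, $b = (y, -y)$, there are no equality constraints, and the lower bound is $-\infty$ for $\beta$ and $0$ for $t$.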
Anyway, after massaging the $\ell_1$ regression problem into this form, we can translate it to the code:
f = [zeros(p1, 1); ones(n1, 1)];
A = [X1 -speye(n1); -X1 -speye(n1)];
b = [y1; -y1];
lb = [-inf(p1, 1); zeros(n1, 1)];
Now that all of the inputs are set, we can call linprog to estimate the parameters, then extract the first $p_1$ parameters, as those correspond to $\beta$.
tic;                                        % start timer
thetaHatLP = linprog(f, A, b, [], [], lb);  % estimate parameters
toc;                                        % run time
betaHatLP = thetaHatLP(1:p1);               % extract estimates for beta
Optimization terminated.
Elapsed time is 0.522778 seconds.
LP Example: Gurobi
Please note that this requires a standalone installation of Gurobi, which is free for academic users but was not required for this lab. However, I do encourage becoming familiar with a commercial solver such as Gurobi or MOSEK (both of which are free for academic users; follow the links), as those are state-of-the-art solvers and are more efficient than MATLAB's built-in solvers and the R equivalents that I'm aware of (both Gurobi and MOSEK are available in R).
As before, the first thing we need to do is become familiar with what the standard form looks like according to the solver. A quick view of this is available through Gurobi's online documentation, while the details are available in the actual reference manual (p. 433). The relevant parts for our purposes are
minimize $c^T x$

subject to $Ax = b$ and $lb \le x \le ub$.
and the function call is
gurobi(model, params)
For Gurobi, the model argument is pieced together in a struct variable (which is essentially like a list in R) that contains different fields corresponding to the various parts of the optimization problem. The field of a struct is referenced with a period, so the syntax is struct.field.
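As a toy illustration of this syntax (the field names follow the Gurobi model fields described below, but the numbers are made up and unrelated to the demo data):

% toy model, built field by field
toyModel.obj   = [1; 2];          % linear objective vector c
toyModel.A     = sparse([1 1]);   % constraint matrix (must be sparse)
toyModel.sense = '<';             % constraint type: <=
toyModel.rhs   = 3;               % right-hand side, i.e. x(1) + x(2) <= 3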
Here are a few notes on constructing the model, as the implementation is not as straightforward as linprog:
- The linear objective $c$ is required (so a vector of zeros is used if it is not needed), and obj is its corresponding field (model.obj)
- The constraint matrix $A$ is also required, and it must be a sparse matrix (created with the sparse function). Even if you do not have any constraints in your model, $A$ is still required and has to be sparse (so simply using [] doesn't work). As we will see in the lasso example, I usually use sparse(zeros(0, p));, where $p$ is the dimension of the parameter vector/optimization variable, to create a sparse empty matrix.
- Linear inequality constraints are also possible, even though the standard form above only shows linear equality constraints. The sense field specifies the type of each constraint, using either '=', '<', or '>'. A single value can be used when all of the constraints are of the same type; otherwise, for mixed constraints there needs to be one value per row of $A$ (see the short sketch after this list). Since A is required, sense is also required.
- The field for $b$ is rhs. This is also required.
- The field for the lower bound on $x$ is lb. This is optional, but please note that the default lower bound is zero, not $-\infty$; an unbounded-below variable can be specified by using -inf(p, 1).
- params is optional and can be used to specify options. A list of options is available on page 487 of the reference manual. I typically only set the field OutputFlag to 0 to suppress the output.
- x is the field in the output struct that contains the solution. The output struct includes several other fields, detailed on p. 436 of the reference manual.
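For mixed constraint types, the sense field needs one character per row of $A$. A minimal hypothetical sketch (not part of this demo):

% hypothetical mixed constraints: x(1) <= 5 and x(2) = 2
toyModel2.obj   = [1; 1];
toyModel2.A     = sparse([1 0; 0 1]);
toyModel2.rhs   = [5; 2];
toyModel2.sense = ['<'; '='];   % one entry per row of A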
Now that we have some background on gurobi, we can see it in action:
%# Model Setup #%
% linear term
gmodel.obj = [zeros(p1, 1); ones(n1, 1)];
% constraint matrix
gmodel.A = [X1 -speye(n1); -X1 -speye(n1)];
% constraint type; only one value needed since all constraints are inequalities
gmodel.sense = '<';
% b vector for constraints
gmodel.rhs = [y1; -y1];
% lower bound (needed because default is zero)
gmodel.lb = [-inf(p1, 1); zeros(n1, 1)];
% suppress output
gparam.OutputFlag = 0;

%# Model Estimation #%
if strcmpi(gurobiInstalled, 'yes')
    tic;                                  % start timer
    gresult = gurobi(gmodel, gparam);     % estimate model
    toc;                                  % run time
    betaHatGurobiLP = gresult.x(1:p1);    % extract estimates for beta
end
Elapsed time is 0.007671 seconds.
LP Example: CVX
While Gurobi and MOSEK are solvers, CVX is a modeling tool that provides a much more user-friendly interface for solving optimization problems with a variety of solvers, saving the hassle of massaging the problem into standard form and of learning each solver's syntax.
I will get into the details of CVX later, but for now, let's look at how we would solve the $\ell_1$ regression problem using CVX.
%# Model Estimation #%
tic;                                             % start timer
cvx_begin quiet                                  % initiate CVX, hiding output
    variable betaHatCVX_LP(p1);                  % declare optimization variable
    minimize( norm(y1 - X1*betaHatCVX_LP, 1) );  % define objective function
cvx_end                                          % declare end of CVX statements
toc;                                             % run time
As you can see, the syntax is much more straightforward and succinct, and essentially translates the original form of the problem directly to code, without first having to massage it into an intermediate form for the sake of estimation. Here we compare the estimates from the different approaches:
%# Compare estimates #%
% reduce the number of decimals displayed
format short
% display estimates
if strcmpi(gurobiInstalled, 'yes')
    display([betaHatLP betaHatCVX_LP betaHatGurobiLP])
elseif strcmpi(gurobiInstalled, 'no')
    display([betaHatLP betaHatCVX_LP])
end
Elapsed time is 0.564291 seconds.

ans =

   -0.5722   -0.5722   -0.5722
    0.5249    0.5249    0.5249
    0.6711    0.6711    0.6711
   -0.0283   -0.0283   -0.0283
    0.1598    0.1598    0.1598
    0.8119    0.8119    0.8119
   -0.1091   -0.1091   -0.1091
    0.2011    0.2011    0.2011
    0.0029    0.0029    0.0029
Quadratic Programming Example: Lasso Regression
Our second example involves the popular lasso (Tibshirani, 1996), which introduces an $\ell_1$ penalty term on the regression coefficients in the objective function

minimize $\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$,

where $\lambda \ge 0$ is a tuning parameter.
We will first optimize this directly using both MATLAB's solver (quadprog) and Gurobi, which will require us to first transform the problem into each solver's standard form.
QP Example: Quadprog
MATLAB's quadprog function naturally extends the linprog function to include a quadratic term (documentation):

minimize $\tfrac{1}{2}x^T H x + f^T x$

subject to $Ax \le b$, $A_{eq} x = b_{eq}$, and $lb \le x \le ub$,
and the function call is also similar
x = quadprog(H, f, A, b, Aeq, beq, lb, ub, x0, options);
The key trick to reformulating our problem is to represent the parameter vector using its positive and negative parts, $\beta = \beta^+ - \beta^-$ with $\beta^+, \beta^- \ge 0$, as this allows us to deal with the absolute value in the penalty term, since $|\beta_j| = \beta_j^+ + \beta_j^-$. After expanding and rearranging the objective function to match quadprog's standard form, it should look like

minimize $\tfrac{1}{2}\theta^T H \theta + f^T \theta$ subject to $\theta \ge 0$,

where $\theta = (\beta^+, \beta^-) \in \mathbb{R}^{2p}$.
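For reference, here is a sketch of the expansion (my own write-up; the code below constructs exactly these pieces). Dropping the constant $\tfrac{1}{2} y^T y$, which does not affect the minimizer,

\[
H = \begin{bmatrix} X^T X & -X^T X \\ -X^T X & X^T X \end{bmatrix},
\qquad
f = \lambda \mathbf{1}_{2p} - \begin{bmatrix} X^T y \\ -X^T y \end{bmatrix},
\]

with $\theta \ge 0$ as the only constraint.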
So we can now translate this to code:
%# Model Setup #%
% Quadratic matrix H
gram = X2'*X2;
H = [gram -gram; -gram gram];
% constraints
lb2 = zeros(2*p2, 1);
% constant part of linear coefficient f
Xy2 = X2'*y2;
f2p2 = [Xy2; -Xy2];
% suppress quadprog output
options = optimset('Display', 'off');

%# Model Estimation #%
% define grid of tuning parameters
lambdas = (0:2:64);
nLambda = length(lambdas);
% matrix to store parameter estimates
betaHatPathQP = NaN(p2, nLambda);
% vector to store estimation time
qpTimerPath = NaN(1, nLambda);
% estimate model using quadprog, looping over lambdas
for k = 1:nLambda
    % finish constructing f
    f2(1:2*p2, 1) = lambdas(k);
    f2 = f2 - f2p2;
    % estimate model using quadprog
    tic;
    thetaHatQP = quadprog(H, f2, [], [], [], [], lb2, [], [], options);
    qpTimer = toc;
    % back out estimates of beta
    betaHatPathQP(:, k) = thetaHatQP(1:p2) - thetaHatQP(p2+1:end);
    % store estimation time
    qpTimerPath(k) = qpTimer;
end
QP Example: Gurobi
Gurobi's standard formulation already includes a quadratic term,

minimize $x^T Q x + c^T x$

subject to $Ax = b$ and $lb \le x \le ub$,
so we can proceed as before while also including the quadratic matrix, which goes in the field Q and is required to be sparse. Note that Gurobi's objective does not include the factor of $\tfrac{1}{2}$, which is why the code below divides the matrix $H$ from above by 2.
%# Model Setup #%
% quadratic matrix
gram = X2'*X2;
gmodel2.Q = sparse([gram -gram; -gram gram])/2;
% constant part of linear term f
Xy2 = X2'*y2;
Recall from above that $A$ and $b$ are required, even if they are not in our model, and that $A$ has to be a sparse matrix (pp. 433-4 of the reference manual). So we will create $A$ as a sparse, empty matrix and $b$ as an empty vector. Note that $A$'s dimensions still need to be conformable, but $b$ can be a 0-by-1 vector.
% Constraint matrix. This is an empty matrix since it's not in our problem
gmodel2.A = sparse(zeros(0, 2*p2));
% constraint type
gmodel2.sense = '=';
% right hand side of constraints (corresponds to b), also empty
gmodel2.rhs = zeros(0, 1);
% suppress gurobi output
gparam.OutputFlag = 0;

%# Model Estimation #%
if strcmpi(gurobiInstalled, 'yes')
    % matrix to store parameter estimates
    betaHatPathGurobi = NaN(p2, nLambda);
    % vector to store estimation time
    gurobiTimerPath = NaN(1, nLambda);
    % solve model, looping over lambdas
    for k = 1:nLambda
        % update linear coefficient
        gmodel2.obj = lambdas(k)*ones(2*p2, 1) - [Xy2; -Xy2];
        % estimate model using Gurobi
        tic;
        gresult = gurobi(gmodel2, gparam);
        gurobiTimer = toc;
        % store estimated coefficients
        betaHatPathGurobi(:, k) = gresult.x(1:p2) - gresult.x(p2+1:end);
        % store estimation time
        gurobiTimerPath(k) = gurobiTimer;
    end
end
QP Example: CVX
To solve the lasso in CVX, the implementation is again more straightforward, as we can work directly with the original formulation. The core part of the estimation code is given by
cvx_begin quiet
    variable betaHatCVX_QP(p);
    minimize( 0.5*sum_square(y - X*betaHatCVX_QP) + ...
        lambda*norm(betaHatCVX_QP, 1) );
cvx_end
So to solve the problem on a grid of $\lambda$ values, we can loop over this as
%# Model Estimation #%
% matrix to store parameter estimates
betaHatPathCVX = NaN(p2, nLambda);
% vector to store estimation time
cvxTimerPath = NaN(1, nLambda);
% solve model, looping over lambdas
for k = 1:nLambda
    % estimate model using CVX
    tic;
    cvx_begin quiet
        variable betaHatCVX_QP(p2);
        minimize( 0.5*sum_square(y2 - X2*betaHatCVX_QP) + ...
            lambdas(k)*norm(betaHatCVX_QP, 1) );
    cvx_end
    cvxTimer = toc;
    % store estimated coefficients
    betaHatPathCVX(:, k) = betaHatCVX_QP;
    % store estimation time
    cvxTimerPath(k) = cvxTimer;
end
%# Plot solutions #%
% Quadprog
figure;
plot(lambdas, betaHatPathQP);
title({'Lasso Solution Path: Using Quadprog', ...
    ['Timing: ' num2str(sum(qpTimerPath)) ' sec.'] })
xlabel('lambda');
ylabel('betaHat(lambda)');
xlim([min(lambdas), max(lambdas)]);

% Gurobi
if strcmpi(gurobiInstalled, 'yes')
    figure;
    plot(lambdas, betaHatPathGurobi);
    title({'Lasso Solution Path: Using Gurobi', ...
        ['Timing: ' num2str(sum(gurobiTimerPath)) ' sec.'] })
    xlabel('lambda');
    ylabel('betaHat(lambda)');
    xlim([min(lambdas), max(lambdas)]);
end

% CVX
figure;
plot(lambdas, betaHatPathCVX);
title({'Lasso Solution Path: Using CVX', ...
    ['Timing: ' num2str(sum(cvxTimerPath)) ' sec.'] })
xlabel('lambda');
ylabel('betaHat(lambda)');
xlim([min(lambdas), max(lambdas)]);
Additional Example: 1-norm SVM
Here is an additional example of using CVX, this time for solving a 1-norm SVM (Zhu et al., 2004), which is given by

minimize $\sum_{i=1}^{n} \left[ 1 - y_i \left( \beta_0 + x_i^T \beta \right) \right]_+$

subject to $\|\beta\|_1 \le t$,

where $t$ is a tuning parameter and the intercept $\beta_0$ is excluded from the constraint. This example involves a function, the hinge loss function, that is not already in the CVX function library. However, we can quickly define a new function for this purpose, and since it follows the DCP ruleset, CVX can identify it as being convex.
% define function handle for hinge loss
hingeLoss = @(x) sum(max(0, 1 - x));
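As a quick sanity check on ordinary numeric input (a toy example, not part of the demo data), margins of 2, 0.5, and -1 incur losses of 0, 0.5, and 2:

% toy check of the hinge loss handle
hingeLoss([2; 0.5; -1])   % returns 0 + 0.5 + 2 = 2.5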
With this in hand, the core part of the estimation code is
cvx_begin quiet
    variable betaHatSVM(p3);
    minimize( hingeLoss(y3.*(X3*betaHatSVM)) );
    subject to
        norm(betaHatSVM(2:end), 1) <= tValues(k);
cvx_end
%# Model Estimation #%
% create tuning parameter grid
tValues = (0:0.1:2.5);
% number of tuning parameters to consider
m = length(tValues);
% matrix to store parameter estimates
betaHatPath3 = NaN(p3, m);
% vector to store estimation time
svmTimerPath = NaN(1, m);
% solve model, looping over tuning parameters
for k = 1:m
    % estimate model using CVX
    tic;
    cvx_begin quiet
        variable betaHatSVM(p3);
        minimize( hingeLoss(y3.*(X3*betaHatSVM)) );
        subject to
            norm(betaHatSVM(2:end), 1) <= tValues(k);
    cvx_end
    % store estimation time
    svmTimerPath(k) = toc;
    % store estimated coefficients
    betaHatPath3(:, k) = betaHatSVM;
end

%# Plot solution path #%
figure;
plot(tValues, betaHatPath3(2:end, :));
title({'1-norm SVM Solution Path (without intercept)', ...
    ['Timing: ' num2str(sum(svmTimerPath)) ' sec.'] })
xlabel('t');
ylabel('betaHat(t)');
xlim([min(tValues), max(tValues)]);