
EE364b Prof. S. Boyd

EE364b Homework 7

1. Kelley’s cutting-plane algorithm with proximal regularization. The Kelley cutting-plane method was described in an earlier exercise. In this exercise we explore a simple modification of the basic Kelley cutting-plane method that can dramatically speed up convergence.

We consider the problem of minimizing a convex function f : Rn → R over a convex set C, using a subgradient oracle for f . Suppose we have evaluated the function and a subgradient at the iterates x(1), . . . , x(k). We define the associated piecewise-linear approximation (and under-estimator) of f as

f̂(k)(x) = max_{i=1,...,k} ( f(x(i)) + g(i)ᵀ(x − x(i)) ).
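The piecewise-linear model is just a pointwise maximum of the stored cuts. A minimal numpy sketch (the function name is my own, not from the assignment):

```python
import numpy as np

def pwl_underestimator(x, xs, fs, gs):
    """Evaluate the piecewise-linear model f-hat^(k) at x.

    xs: past iterates x^(i); fs: values f(x^(i)); gs: subgradients g^(i).
    Each cut f(x^(i)) + g^(i)T (x - x^(i)) lies below f, so the max does too.
    """
    return max(f_i + g_i @ (x - x_i) for x_i, f_i, g_i in zip(xs, fs, gs))

# Example with f(x) = |x| on R: cuts taken at x = -1 and x = 2.
xs = [np.array([-1.0]), np.array([2.0])]
fs = [1.0, 2.0]
gs = [np.array([-1.0]), np.array([1.0])]
model_at_zero = pwl_underestimator(np.array([0.0]), xs, fs, gs)
```

With these two cuts the model is max(−x, x) = |x|, so it happens to agree with f everywhere.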

Kelley’s cutting-plane algorithm is the iteration

x(k+1) = argmin_{x∈C} f̂(k)(x).

Here we mean any minimizer; the argmin need not be unique. This requires minimizing a piecewise-linear function over C in each iteration.
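When C is a box, the master problem is a small LP in epigraph form: minimize t over (x, t) subject to each cut lying below t. A sketch using scipy (the function name and box bounds are illustrative, not prescribed by the assignment):

```python
import numpy as np
from scipy.optimize import linprog

def kelley_step(xs, fs, gs, lo=-1.0, hi=1.0):
    """One Kelley master problem over the box C = [lo, hi]^n.

    Epigraph LP in (x, t): minimize t subject to
    f(x^(i)) + g^(i)T (x - x^(i)) <= t, i.e. g^(i)T x - t <= g^(i)T x^(i) - f(x^(i)).
    Returns the next iterate and the lower bound L^(k).
    """
    n, k = xs[0].size, len(xs)
    c = np.r_[np.zeros(n), 1.0]                       # objective: t
    A_ub = np.hstack([np.array(gs), -np.ones((k, 1))])
    b_ub = np.array([g @ x - f for x, f, g in zip(xs, fs, gs)])
    bounds = [(lo, hi)] * n + [(None, None)]          # x in box, t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.fun
```

The optimal value of this LP is exactly the lower bound L(k) discussed below.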

Kelley’s cutting-plane algorithm with proximal regularization is the iteration

x(k+1) = argmin_{x∈C} ( f̂(k)(x) + (ρ/2)‖x − x(k)‖₂² ),

where ρ > 0 is a parameter. (Note that ρ = 0 gives the standard Kelley cutting-plane algorithm.) With proximal regularization, the argmin is unique. Kelley’s cutting-plane algorithm with proximal regularization requires minimizing a quadratic function (with diagonal Hessian) plus a piecewise-linear function over C in each iteration.
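In epigraph form the regularized master problem is a small QP: smooth quadratic objective, linear cut constraints, box bounds. A sketch using a generic scipy solver (function name and defaults are my own; a dedicated QP solver would also work):

```python
import numpy as np
from scipy.optimize import minimize

def prox_kelley_step(x_k, xs, fs, gs, rho=1.0, lo=-1.0, hi=1.0):
    """One proximally regularized master problem over the box C = [lo, hi]^n:
    minimize t + (rho/2)||x - x_k||_2^2 over z = (x, t),
    subject to f(x^(i)) + g^(i)T (x - x^(i)) <= t and lo <= x <= hi.
    """
    n = x_k.size
    def obj(z):
        return z[n] + 0.5 * rho * np.sum((z[:n] - x_k) ** 2)
    # one linear inequality per stored cut: t - cut_i(x) >= 0
    cons = [{"type": "ineq",
             "fun": (lambda z, f=f, g=g, xi=xi: z[n] - f - g @ (z[:n] - xi))}
            for xi, f, g in zip(xs, fs, gs)]
    t0 = max(f + g @ (x_k - xi) for xi, f, g in zip(xs, fs, gs))
    bounds = [(lo, hi)] * n + [(None, None)]
    res = minimize(obj, np.r_[x_k, t0], constraints=cons, bounds=bounds,
                   method="SLSQP")
    return res.x[:n]
```

Since the nonsmooth max moved into the constraints, the problem handed to the solver is smooth.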

The lower and upper bounds on p⋆ = infx∈C f(x) (used in Kelley’s cutting-plane method)

L(k) = inf_{x∈C} f̂(k)(x) ≤ p⋆,    U(k) = min_{i=1,...,k} f(x(i)) ≥ p⋆,

are of course still valid, although evaluating L(k) now involves solving an additional problem each iteration. (For this reason, the method is often used without calculating L(k).)

Use Kelley’s cutting-plane algorithm with proximal regularization to solve the lasso problem over the unit cube,

minimize (1/2)‖Ax − b‖₂² + ‖x‖₁
subject to ‖x‖∞ ≤ 1,

with variable x ∈ Rn, and parameters A ∈ Rm×n, b ∈ Rm. We recommend choosing a problem with dimensions n = 10 and m = 50. One way to generate data is the following:
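The original data-generation snippet is missing from this copy of the sheet. A plausible stand-in with the recommended dimensions, together with the subgradient oracle the algorithm needs (the seed and distributions are assumptions, not the assignment's):

```python
import numpy as np

# hypothetical data generation; the assignment's own snippet is not in this scan
np.random.seed(0)
n, m = 10, 50
A = np.random.randn(m, n)
b = np.random.randn(m)

def f_and_subgrad(x):
    """Objective (1/2)||Ax - b||_2^2 + ||x||_1 and one subgradient."""
    r = A @ x - b
    f = 0.5 * r @ r + np.abs(x).sum()
    g = A.T @ r + np.sign(x)   # sign(0) = 0 is a valid subgradient choice
    return f, g
```

Each iteration calls this oracle at x(k+1), appends the new cut, and re-solves the regularized master problem.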


By Cauchy–Schwarz, we have ‖g_x − g_y‖₂ ‖x − y‖₂ ≥ λ‖x − y‖₂², or ‖x − y‖₂ ≤ (1/λ)‖g_x − g_y‖₂. As we have x ∈ ∂f*(g_x) and y ∈ ∂f*(g_y), this gives the Lipschitz result; differentiability follows from the continuity of ∂f* in g.

(b) This follows by the calculus rules for subdifferentials, namely,

∂f*(s) = Co{x : sᵀx − f(x) = f*(s)} = argmin_{x∈C} {−sᵀx + f(x)},

the second equality a consequence of part (a).

(c) By Taylor’s theorem, we know that

f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2)‖x − y‖₂²,

which follows because

f(y) = f(x) + ∇f(x)ᵀ(y − x) + ∫₀¹ (∇f((1 − t)x + ty) − ∇f(x))ᵀ(y − x) dt
     ≤ f(x) + ∇f(x)ᵀ(y − x) + ∫₀¹ ‖∇f((1 − t)x + ty) − ∇f(x)‖₂ ‖y − x‖₂ dt
     ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2)‖x − y‖₂².

Now, let g_x = ∇f(x) and g_y = ∇f(y), so that g_x, g_y ∈ dom f* by Q. ??. Then

f(y) ≤ f(x) + g_xᵀ(y − x) + (L/2)‖x − y‖₂²
     = inf_z {f(z) − g_xᵀz} + g_xᵀy + (L/2)‖x − y‖₂²
     = −f*(g_x) + g_xᵀy + (L/2)‖x − y‖₂².

Rearranging, we have for all y ∈ dom f that

f(y) − g_yᵀy + (g_y − g_x)ᵀy − (L/2)‖y − x‖₂² = f(y) − g_xᵀy − (L/2)‖y − x‖₂² ≤ −f*(g_x),

or

−f*(g_y) + (g_y − g_x)ᵀy − (L/2)‖x − y‖₂² ≤ −f*(g_x).

For the fixed vector s = g_y − g_x, we have

sup_y ( sᵀy − (L/2)‖x − y‖₂² ) = (1/(2L))‖s‖₂² + sᵀx,

so we obtain

−f*(g_x) ≥ −f*(g_y) + xᵀ(g_y − g_x) + (1/(2L))‖g_y − g_x‖₂²,

which is equivalent to strong convexity of f*.
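The closed-form maximization used in the last step, sup_y ( sᵀy − (L/2)‖x − y‖₂² ) = sᵀx + ‖s‖₂²/(2L), attained at y = x + s/L, is easy to sanity-check numerically; a quick sketch with arbitrary test data:

```python
import numpy as np

np.random.seed(1)
L = 2.0
x = np.random.randn(5)
s = np.random.randn(5)

def phi(y):
    # the concave function being maximized over y
    return s @ y - 0.5 * L * np.sum((x - y) ** 2)

y_star = x + s / L                       # stationarity: s - L(y - x) = 0
closed_form = s @ x + (s @ s) / (2 * L)  # claimed supremum value
```

phi(y_star) should match closed_form, and perturbing y_star should only decrease phi.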
