Description
Theoretical Problems (13 points)
1. (3 points) Consider x ∈ R+ and the function

   h(x) = −ln x.

   • Find the Bregman divergence D_h(x, y).
   • Consider the function

     f(x) = 0.5x − ln(x + 1).

     What are the smoothness and strong-convexity parameters of f with respect to h?
   • Consider the h-mirror descent method for minimizing f(x). What is the formula for obtaining x_t from x_{t−1} with learning rate η_t?
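As background, the generic mirror descent recipe maps the current iterate to the dual space via ∇h, takes a gradient step there, and maps back via (∇h)⁻¹. Below is a minimal sketch of that recipe — checked on the Euclidean mirror map h(x) = x²/2, where it reduces to plain gradient descent, not on this problem's h; the toy objective and step size are illustrative choices, not part of the assignment:

```python
def mirror_descent(grad_f, grad_h, grad_h_inv, x0, eta, steps):
    """Generic mirror descent: move to the dual space via grad_h,
    take a gradient step there, and map back via its inverse."""
    x = x0
    for _ in range(steps):
        x = grad_h_inv(grad_h(x) - eta * grad_f(x))
    return x

# Sanity check with the Euclidean mirror map h(x) = x^2/2, where
# grad_h(x) = x and mirror descent reduces to plain gradient descent.
# Toy objective (not the homework's f): f(x) = (x - 2)^2.
x_star = mirror_descent(grad_f=lambda x: 2 * (x - 2),
                        grad_h=lambda x: x,
                        grad_h_inv=lambda g: g,
                        x0=10.0, eta=0.1, steps=200)
print(x_star)  # converges toward the minimizer 2
```

For this problem the same recipe is applied with h'(x) = −1/x in place of the Euclidean map.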
2. (3 points) Consider a composite optimization problem

   f(x) + g(x),

   with non-convex regularizer g(x) given by

   g(x) = λ Σ_{j=1}^{d} ln(1 + |x_j|),

   for some λ > 0. Since the logarithm grows slowly as |x_j| increases, this regularizer induces less bias than L1 regularization. Assume we want to apply the proximal gradient method to this problem, with update

   x ← prox_{ηg}[x − η∇f(x)],

   where the proximal optimization problem is

   prox_{ηg}(z) = argmin_x [ (1/2)‖x − z‖² + ηg(x) ].   (1)

   • Show that when η > 0 is sufficiently small, the proximal optimization problem (1) is strongly convex for all z. What is the range of such η?
   • Find the closed-form solution for prox_{ηg}(z) when η is small enough that (1) is convex.
   • If f(x) is L-smooth and 2λ-strongly convex, does the proximal gradient algorithm converge? If so, how many iterations are needed to obtain an ε-suboptimal primal solution?
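For reference, the proximal gradient loop itself is just x_t = prox_{ηg}(x_{t−1} − η∇f(x_{t−1})). The sketch below uses the familiar L1 prox (soft-thresholding) rather than the log-regularizer prox, which is the exercise; the toy quadratic f and its data are hypothetical:

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||x||_1: shrink each coordinate toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(grad_f, prox, x0, eta, steps):
    """x_t = prox(x_{t-1} - eta * grad_f(x_{t-1}), eta)."""
    x = x0
    for _ in range(steps):
        x = prox(x - eta * grad_f(x), eta)
    return x

# Toy problem (not the homework's): min 0.5*||x - c||^2 + lam*||x||_1,
# whose exact solution is soft_threshold(c, lam) = [2, 0, 0].
c, lam = np.array([3.0, -0.5, 0.2]), 1.0
x = proximal_gradient(lambda x: x - c,
                      lambda z, eta: soft_threshold(z, eta * lam),
                      x0=np.zeros(3), eta=0.5, steps=100)
print(x)  # approaches [2.0, 0.0, 0.0]
```

Solving this problem's prox amounts to replacing `soft_threshold` with the closed form derived in the second bullet.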
3. (3 points) Consider the optimization problem

   Σ_{i=1}^{n} f_i(A_i x) + ‖A_0 x‖²,

   where the A_i are matrices. We rewrite it as

   φ(x, z) = Σ_{i=1}^{n} f_i(z_i) + ‖z_0‖²

   subject to A_1 x − z_1 = 0, ..., A_n x − z_n = 0, A_0 x − z_0 = 0.

   • Write down the Lagrangian function L(x, z, α) with multipliers α_1, ..., α_n and α_0 corresponding to the n + 1 linear constraints.
   • Find the dual formulation φ_D(α) = min_{x,z} L(x, z, α) in terms of the conjugates f*_i.
   • If n = 1 and A_1 = I, write down the dual objective as a function of α_0 by eliminating α_1. Assuming f*_1(·) is smooth, how do we recover the primal variable x from the dual variable α_0?
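Since the dual is expressed through the conjugates f*_i(α) = sup_z (αz − f_i(z)), it can help to sanity-check a conjugate numerically. A small grid-search sketch, using f(z) = 0.5 z² (whose conjugate 0.5 α² is standard) as a hypothetical stand-in for an f_i:

```python
import numpy as np

def conjugate_on_grid(f, zs, alpha):
    """Approximate f*(alpha) = sup_z (alpha*z - f(z)) by maximizing
    over a finite grid of z values (a lower bound on the true sup)."""
    return np.max(alpha * zs - f(zs))

zs = np.linspace(-10.0, 10.0, 100001)
# For f(z) = 0.5*z^2 the conjugate is f*(a) = 0.5*a^2.
errs = [abs(conjugate_on_grid(lambda z: 0.5 * z**2, zs, a) - 0.5 * a**2)
        for a in (-2.0, 0.0, 1.5)]
print(max(errs))  # tiny grid-discretization error
```

The same check works for any f_i whose conjugate you derive by hand, as long as the maximizer falls inside the grid.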
4. (4 points) Consider a symmetric positive definite matrix A, and let

   f(x) = (1/2) xᵀAx − bᵀx,   g(x) = (λ/2)‖x‖₂² + µ‖x‖₁.

   • Find the Fenchel dual of f(x) + g(x).
   • Find ∇f*(α) and ∇g*(α).
   • Write down a closed-form formula for the dual ascent method.
   • Find the smoothness parameters of f* and g*, and explain how to set the learning rate for dual ascent.
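Dual ascent on a Fenchel dual is plain gradient ascent on D(α) = −f*(α) − g*(−α), with the primal recovered through ∇f*. A 1-D sketch on a hypothetical instance (f(x) = 0.5(x−b)², g(x) = 0.5λx² — not the assignment's f and g) where both conjugates are known in closed form:

```python
# Hypothetical instance: f(x) = 0.5*(x - b)^2 with f*(a) = a*b + 0.5*a^2,
# and g(x) = 0.5*lam*x^2 with g*(a) = a^2 / (2*lam).
b, lam = 4.0, 1.0

def dual_grad(a):
    """Gradient of the dual D(a) = -f*(a) - g*(-a)."""
    return -(b + a) - a / lam

a = 0.0
# D is (1 + 1/lam)-smooth here, so any eta < 2/(1 + 1/lam) = 1 converges.
eta = 0.4
for _ in range(200):
    a += eta * dual_grad(a)

x = b + a  # recover the primal via x = grad f*(a)
print(a, x)  # a -> -2.0, x -> 2.0, matching argmin of f + g = b/(1 + lam)
```

The assignment's version follows the same pattern, with ∇f*(α) involving A⁻¹ and ∇g* coming from the elastic-net-style g.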
Programming Problem (7 points)
• Download the data in the mnist sub-directory (which contains class 1 (positive) versus class 7 (negative) data).
• Use the Python template “prog-template.py”, and implement the functions marked with '# implement'.
• (4 points) Complete the code and produce the plots.
• (1 point) Discuss the behavior of the proximal and dual averaging algorithms in the experiments.
• (1 point) Discuss the impact of different values of µ in the experiments.
• (1 point) Moreover, can you reduce the degree of oscillation for GD by selecting other learning rates when the objective function is highly non-smooth?
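On the last bullet: a toy sketch (separate from the assignment template) of subgradient descent on the non-smooth f(x) = |x|, comparing a constant step with a decaying one η_t = η₀/√t; the step sizes and horizon are illustrative:

```python
import math

def tail_oscillation(eta_fn, x0=1.0, steps=500, tail=100):
    """Run subgradient descent on f(x) = |x| and report the largest
    |x_t| over the final `tail` iterates (the oscillation amplitude)."""
    x, worst = x0, 0.0
    for t in range(1, steps + 1):
        g = 1.0 if x > 0 else -1.0  # a subgradient of |x|
        x -= eta_fn(t) * g
        if t > steps - tail:
            worst = max(worst, abs(x))
    return worst

const = tail_oscillation(lambda t: 0.1)                 # keeps a ~0.1-wide bounce
decay = tail_oscillation(lambda t: 0.1 / math.sqrt(t))  # amplitude shrinks with t
print(const, decay)
```

With a constant step the iterate keeps overshooting zero by a fixed amount, while the decaying schedule damps the oscillation; the same qualitative effect is what the experiment asks you to look for.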