IE684 Lab 07

In this lab, we will continue with the ordinary least squares linear regression (OLSLR) problem introduced in the previous lab. Recall that we considered two versions of optimization problems, as follows:

  • Direct OLSLR:

    min_x f(x) = (1/2)∥Ax − y∥^2.   (1)

    Let x∗_f = argmin_x f(x).

  • Regularized OLSLR:

    min_x f_λ(x) = (λ/2) x⊤x + (1/2)∥Ax − y∥^2.   (2)

    Let x∗_{f_λ} = argmin_x f_λ(x).
However, the need for the regularized version was possibly not clear. In the first exercise, we will try to motivate the need for the regularized version in problem (2).

Exercise 1: The need for regularization

Load the digits dataset from scikit-learn package using the following code. Create the data matrix A and the label vector y as described in the code:

import numpy as np
from sklearn.datasets import load_digits
digits = load_digits()
#check the shape of digits data
print(digits.data.shape)
#check the shape of digits target
print(digits.target.shape)
#let us use the linear regression used in the previous lab
#N = digits.data.shape[0] #Number of data points
#d = digits.data.shape[1] #Dimension of data points
A = digits.data
#In the following line, we create the Nx1 vector of target labels
y = digits.target.reshape(-1, 1).astype(float)

1. [R] Now use your Newton method to solve problem (1), the direct OLSLR. Use the starting point x = 0. Indicate the difficulties you encounter. Check whether you face similar difficulties when you use the Newton method to solve problem (2), the regularized OLSLR, with λ = 0.001 and starting point x = 0. Explain the reason for your observation. Report the values of x∗_f and x∗_{f_λ}.
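To see where the difficulty comes from, note that for these quadratic objectives the Newton step from x = 0 requires solving a linear system with the Hessian: A⊤A for problem (1) and λI + A⊤A for problem (2). A minimal sketch (using numpy directly rather than your lab solver) that inspects both Hessians on the digits data:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
A = digits.data                           # 1797 x 64 data matrix
y = digits.target.reshape(-1, 1).astype(float)

lam = 0.001
H_f = A.T @ A                             # Hessian of f in problem (1)
H_flam = lam * np.eye(A.shape[1]) + H_f   # Hessian of f_lambda in problem (2)

# If A has linearly dependent columns (e.g. pixels that are zero in every
# image), H_f is singular and the Newton system H_f x = A^T y is not
# uniquely solvable.
print("rank of H_f:", np.linalg.matrix_rank(H_f), "out of", H_f.shape[0])

# Adding lam*I makes the Hessian positive definite; and since f_lambda is
# quadratic, this single Newton step from x = 0 is already the minimizer.
x_reg = np.linalg.solve(H_flam, A.T @ y)
print("||x*_{f_lambda}|| =", np.linalg.norm(x_reg))
```

Comparing the rank of H_f against its size explains the observation the question asks for.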


IE684, IEOR Lab
Lab 07 02-March-2022

2. [R] Use the BFGS method with starting point x = 0 to solve problem (1) and describe any difficulty you observe. Check whether solving the regularized problem (2) helps (use λ = 0.001 and starting point x = 0). Explain your observations. Report the values of x∗_f and x∗_{f_λ}.
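If you want to sanity-check your own BFGS implementation, `scipy.optimize.minimize` with `method='BFGS'` can serve as a reference (a stand-in, not the implementation the lab asks for). A sketch on the regularized problem, assuming A and y are built as above:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import load_digits

digits = load_digits()
A = digits.data
y = digits.target.astype(float)      # flat vector, as scipy expects 1-D x

lam = 0.001

def f_reg(x):
    # f_lambda(x) = (lam/2) x^T x + (1/2) ||Ax - y||^2
    r = A @ x - y
    return 0.5 * lam * (x @ x) + 0.5 * (r @ r)

def grad_reg(x):
    return lam * x + A.T @ (A @ x - y)

res = minimize(f_reg, np.zeros(A.shape[1]), jac=grad_reg, method='BFGS')
print(res.status, np.linalg.norm(grad_reg(res.x)))
```

Comparing `res.x` against your own BFGS output is a quick way to spot bugs before the scalability experiments.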


Exercise 2: Scalability

Having understood the need for regularization, we will focus on the regularized problem (2) from now on.
The experiments you performed in the last lab may have given the impression that the Newton and BFGS methods work extremely well. However, recall that in the last lab you performed experiments on a simple 10-dimensional data set. In this exercise, we will check the behavior of the Newton and BFGS methods on large data sets.

Caution: The scalability experiments might impose excessive memory demands. Please carefully watch the memory usage of Google Colab.
Use the following code for the scalability experiments.

#Code for Newton method
import numpy as np
import timeit
np.random.seed(1000) #for repeatability
N = 200
ds = [1000, 5000, 10000, 20000, 25000, 50000, 100000, 200000, 500000, 1000000]
lambda_reg = 0.001
eps = np.random.randn(N,1) #random noise
#For each value of dimension in the ds array, we will check the behavior of Newton method
for i in range(np.size(ds)):
    d=ds[i]
    A = np.random.randn(N,d)
    #Normalize the columns
    for j in range(A.shape[1]):
        A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
    xorig = np.ones((d,1))
    y = np.dot(A,xorig) + eps
    start = timeit.default_timer()
    #call Newton method with A,y,lambda and obtain the optimal solution x_opt
    #x_opt = Newton(A,y,lambda_reg)
    newtontime = timeit.default_timer() - start #time is in seconds
    #print the total time, ||A x_opt - y||^2 and the L2 norm difference ||x_opt - xorig||^2 for Newton method
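The `Newton` routine itself is expected from the previous lab; a minimal closed-form sketch (its interface is an assumption, matching the commented call above) that exploits the fact that f_λ is quadratic, so a single Newton step from any point reaches the minimizer:

```python
import numpy as np

def Newton(A, y, lambda_reg):
    # For the quadratic f_lambda, the Newton step from x = 0 is exact:
    #   x* = (lambda*I + A^T A)^(-1) A^T y.
    # Forming and factoring the d x d Hessian costs O(d^2) memory and
    # O(d^3) time, which is exactly what breaks for large d.
    d = A.shape[1]
    H = lambda_reg * np.eye(d) + A.T @ A
    return np.linalg.solve(H, A.T @ y)

# small usage example
rng = np.random.default_rng(0)
A_small = rng.standard_normal((20, 5))
y_small = rng.standard_normal((20, 1))
x_opt = Newton(A_small, y_small, 0.001)
# the gradient of f_lambda should vanish at x_opt
print(np.linalg.norm(0.001 * x_opt + A_small.T @ (A_small @ x_opt - y_small)))
```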
#Code for BFGS method
import numpy as np
import timeit
np.random.seed(1000) #for repeatability
N = 200
ds = [1000, 5000, 10000, 20000, 25000, 50000, 100000, 200000, 500000, 1000000]
lambda_reg = 0.001
eps = np.random.randn(N,1) #random noise
#For each value of dimension in the ds array, we will check the behavior of BFGS method
for i in range(np.size(ds)):
    d=ds[i]
    A = np.random.randn(N,d)
    #Normalize the columns
    for j in range(A.shape[1]):
        A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
    xorig = np.ones((d,1))
    y = np.dot(A,xorig) + eps
    start = timeit.default_timer()
    #call BFGS method with A,y,lambda and obtain the optimal solution x_opt_bfgs
    #x_opt_bfgs = BFGS(A,y,lambda_reg)
    bfgstime = timeit.default_timer() - start #time is in seconds
    #print the total time, ||A x_opt_bfgs - y||^2 and the L2 norm difference
    #||x_opt_bfgs - xorig||^2 for BFGS method
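As with Newton, the `BFGS` routine comes from the previous lab; a minimal sketch (interface assumed from the commented call) with a backtracking line search. The dense d x d inverse-Hessian approximation it maintains is why BFGS runs into the same memory wall as Newton:

```python
import numpy as np

def BFGS(A, y, lambda_reg, tol=1e-6, max_iter=500):
    d = A.shape[1]
    f = lambda x: 0.5 * lambda_reg * float(x.T @ x) + 0.5 * float(np.sum((A @ x - y)**2))
    grad = lambda x: lambda_reg * x + A.T @ (A @ x - y)
    x = np.zeros((d, 1))
    g = grad(x)
    Hinv = np.eye(d)                       # dense d x d matrix: O(d^2) memory
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -Hinv @ g                      # quasi-Newton search direction
        alpha, fx, gTp = 1.0, f(x), float(g.T @ p)
        for _ in range(50):                # backtracking (Armijo) line search
            if f(x + alpha * p) <= fx + 1e-4 * alpha * gTp:
                break
            alpha *= 0.5
        s = alpha * p
        x_new = x + s
        g_new = grad(x_new)
        yv = g_new - g
        sy = float(yv.T @ s)
        if sy > 1e-12:                     # curvature safeguard keeps Hinv positive definite
            rho = 1.0 / sy
            I = np.eye(d)
            Hinv = (I - rho * s @ yv.T) @ Hinv @ (I - rho * yv @ s.T) + rho * s @ s.T
        x, g = x_new, g_new
    return x

# small usage example
rng = np.random.default_rng(0)
A_small = rng.standard_normal((20, 5))
y_small = rng.standard_normal((20, 1))
x_bfgs = BFGS(A_small, y_small, 0.001)
print(np.linalg.norm(0.001 * x_bfgs + A_small.T @ (A_small @ x_bfgs - y_small)))
```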

Note that in the code fragments, we experiment with different values of the data dimension d.
[R] Prepare a table in which you report the following quantities for each dimension, for both the Newton and BFGS methods:

1. The total CPU time taken by the respective method.

2. The value ∥Ax∗ − y∥^2, where x∗ is the respective optimizer obtained by the Newton and BFGS methods.

3. The l2 norm difference ∥x∗ − xorig∥^2, where x∗ is the respective optimizer obtained by the Newton and BFGS methods and xorig is the vector used in the code.

Run the experiments until you face either of the following situations (which we flag as failure):

  • You sense that the code is taking much longer to execute for a particular dimension (say, more than 20 minutes).
  • Your machine faces memory issues.
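The memory failure is easy to predict in advance: both Newton (the d x d Hessian) and dense BFGS (the d x d inverse-Hessian approximation) need d^2 float64 entries. A quick back-of-the-envelope check:

```python
# memory needed for a dense d x d float64 matrix, in GiB
for d in [1000, 25000, 100000, 1000000]:
    gib = d * d * 8 / 2**30
    print(f"d = {d:>7}: {gib:10.2f} GiB")
```

Already at d = 100000 the matrix needs roughly 75 GiB, far beyond a Colab session, even though the 200 x d data matrix A itself is still modest.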

[R] Report the dimension at which failure occurs for the Newton and BFGS methods, respectively.


Exercise 3: A possible approach to handle scalability

In this exercise, we will discuss one possible approach to handle the scalability issue faced in the previous exercise.

1. First note that the regularized problem (2) can be written as

   min_x f_λ(x) = min_x ∑_{i=1}^{N} f_i(x).   (3)

   [R] Find an appropriate choice of f_i(x).

2. [R] Write an expression to compute the gradient of f_i(x). Let us denote it by g_i(x) = ∇_x f_i(x).

3. Consider the dimension where you observed failure in the previous exercise. Implement the following algorithm (ALG-LAB7) to solve (3):

     #Code for ALG-LAB7
     import numpy as np
     import timeit
     np.random.seed(1000) #for repeatability
     N = 200
     d = ??? #Consider the dimension which caused failure in the previous experiment
     lambda_reg = 0.001
     eps = np.random.randn(N,1) #random noise
     #Create data matrix, label vector
     A = np.random.randn(N,d)
     #Normalize the columns
     for j in range(A.shape[1]):
        A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
     xorig = np.ones((d,1))
     y = np.dot(A,xorig) + eps
     #initialize the optimization variable to be used in the new algo ALG-LAB7
     x = np.zeros((d,1))
     epochs = int(1e4) #number of epochs (full passes over the data)
     t = 1 #initialize the step-size counter
     arr = np.arange(N) #index array
     start = timeit.default_timer() #start the timer
     for epoch in range(epochs):
       np.random.shuffle(arr) #shuffle every epoch
       for i in np.nditer(arr): #Pass through the data points
         #Update x using x <- x - (1/t)*g_i(x)
         t = t+1
         if t > 1e4:
           t = 1 #reset the step-size counter
     alglab7time = timeit.default_timer() - start #time is in seconds
     x_alglab7 = x
     #print the time taken, ||A x_alglab7 - y||^2 and ||x_alglab7 - xorig||^2

[R] Report the time taken by ALG-LAB7, together with ∥∇f_λ(x∗)∥, ∥Ax∗ − y∥^2 and ∥x∗ − xorig∥^2, where x∗ is the optimizer obtained by ALG-LAB7.
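Before attacking the failure dimension, it helps to check the mechanics of ALG-LAB7 at a small d. The sketch below assumes one particular decomposition, f_i(x) = (λ/(2N)) x⊤x + (1/2)(a_i⊤x − y_i)^2 with a_i the i-th row of A (finding your own choice is part of the exercise above), which gives g_i(x) = (λ/N) x + a_i (a_i⊤x − y_i):

```python
import numpy as np
import timeit

np.random.seed(1000)  # for repeatability
N, d = 200, 50        # small d for a quick check; the lab uses the failure dimension
lambda_reg = 0.001
A = np.random.randn(N, d)
A /= np.linalg.norm(A, axis=0)          # normalize the columns
xorig = np.ones((d, 1))
y = A @ xorig + np.random.randn(N, 1)   # targets with random noise

x = np.zeros((d, 1))
t = 1
arr = np.arange(N)
start = timeit.default_timer()
for epoch in range(30):                  # a short run for illustration
    np.random.shuffle(arr)               # shuffle every epoch
    for i in arr:
        a_i = A[i:i+1, :].T              # i-th row of A as a d x 1 column
        r_i = (a_i.T @ x).item() - y[i].item()   # scalar residual a_i^T x - y_i
        # assumed per-sample gradient; each update touches only O(d) numbers,
        # so no d x d matrix is ever formed
        g_i = (lambda_reg / N) * x + r_i * a_i
        x = x - (1.0 / t) * g_i
        t = 1 if t > 1e4 else t + 1
elapsed = timeit.default_timer() - start
print(elapsed, float(np.sum((A @ x - y)**2)), float(np.sum((x - xorig)**2)))
```

Because the update never materializes a d x d matrix, the memory cost stays O(Nd), which is the point of the algorithm.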


4. [R] Fix λ = 0.001 and repeat the experiment for 10^6, 10^8 and 10^10 epochs; report the time taken by ALG-LAB7 and ∥∇f_λ(x∗)∥, ∥Ax∗ − y∥^2 and ∥x∗ − xorig∥^2. Explain your observations.

5. [R] Fix 10^9 epochs and try λ ∈ {1000, 100, 10, 1, 0.1, 10^-2, 10^-3}; report the time taken by ALG-LAB7 and ∥∇f_λ(x∗)∥, ∥Ax∗ − y∥^2 and ∥x∗ − xorig∥^2. Explain your observations.

6. [R] Does ALG-LAB7 work for the failure dimension?

7. [R] Explain your understanding of ALG-LAB7.
