Description
In this lab, we will continue with the ordinary least squares linear regression (OLSLR) problem introduced in the previous lab. Recall that we considered two versions of optimization problems, as follows:
- Direct OLSLRLet x∗f = argminx f(x).
- Regularized OLSLR
minf(x)= 1∥Ax−y∥2. (1) x2
minfλ(x)= λx⊤x+ 1∥Ax−y∥2. (2) x22
Let x∗fλ = argminx fλ(x).
However, the need for regularized version was possibly not clear. In the first exercise, we will try
to motivate the need for the regularized version in problem (2). Exercise 1: The need for regularization
Load the digits dataset from scikit-learn package using the following code. Create the data matrix A and the label vector y as described in the code:
import numpy as np
from sklearn.datasets import load_digits
digits = load_digits()
#check the shape of digits data
print(digits.data.shape) #check the shape of digits target
print(digits.target.shape) #let us use the linear regression used in the previous lab
#N = digits.data.shape[0] #Number of data points #d = digits.data.shape[1] #Dimension of data points
A = digits.data
#In the following code, we create a Nx1 vector of target labels
y = 1.0*np.ones([A.shape[0],1]) for i in range(digits.target.shape[0]):
y[i] = digits.target[i]
1. [R] Now use your Newton method to solve problem (1), which is the direct OLSLR. Use the starting point x = 0. Indicate the difficulties you encounter. Check if you face similar difficulties when you use Newton method to solve problem (2), the regularized OLSLR with λ = 0.001 and starting point x = 0. Explain the reason for your observation. Report the values of x∗f and x∗fλ .
2
IE684, IEOR Lab
Lab 07 02-March-2022
2. [R] Use BFGS method with starting point x = 0, to solve problem (1) and describe if you observe any difficulty. Check if solving the regularized problem (2) helps (use λ = 0.001 and starting point x = 0). Explain your observations. Report the values of x∗f and x∗fλ .
3
IE684, IEOR Lab
Lab 07 02-March-2022
Exercise 2: Scalability
Having understood the need for regularization, we will focus on the regularized problem (2) from now on.
The experiments you performed in the last lab would have given an impression that Newton and BFGS methods work extremely well. However, recall that in the last lab, you performed experiments on a simple 10-dimensional data set. We will check the behavior of Newton and BFGS methods on large data sets in this exercise.
Caution: The scalability experiments might impose excessive memory demands. Please carefully watch the memory usage of Google colab.
Use the following code for the scalability experiments.
#Code for Newton method
import numpy as np import timeit np.random.seed(1000) #for repeatability
N = 200 ds = [1000, 5000, 10000, 20000, 25000, 50000, 100000, 200000, 500000, 1000000] lambda_reg = 0.001 eps = np.random.randn(N,1) #random noise
#For each value of dimension in the ds array, we will check the behavior of Newton method
for i in range(np.size(ds)):
d=ds[i]
A = np.random.randn(N,d)
#Normalize the columns
for j in range(A.shape[1]):
A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
xorig = np.ones((d,1))
y = np.dot(A,xorig) + eps
start = timeit.default_timer()
#call Newton method with A,y,lambda and obtain the optimal solution x_opt
#x_opt = Newton(A,y,lambda_reg)
newtontime = timeit.default_timer() - start #time is in seconds
#print the total time and the L2 norm difference || x_opt - xorig|| for Newton method
#Code for BFGS method
import numpy as np import timeit np.random.seed(1000) #for repeatability
N = 200 ds = [1000, 5000, 10000, 20000, 25000, 50000, 100000, 200000, 500000, 1000000] lambda_reg = 0.001 eps = np.random.randn(N,1) #random noise
#For each value of dimension in the ds array, we will check the behavior of BFGS method
for i in range(np.size(ds)):
d=ds[i]
A = np.random.randn(N,d)
#Normalize the columns
4
IE684, IEOR Lab Lab 07
for j in range(A.shape[1]):
A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
xorig = np.ones((d,1))
y = np.dot(A,xorig) + eps
start = timeit.default_timer()
02-March-2022
#call Newton method with A,y,lambda and obtain the optimal solution x_opt
#x_opt = Newton(A,y,lambda_reg)
newtontime = timeit.default_timer() - start #time is in seconds
#print the total time, ||Ax_opt-y||^2 and the L2 norm difference || x_opt - xorig||^2
for Newton method
start = timeit.default_timer()
#call BFGS method with A,y,lambda and obtain the optimal solution x_opt_bfgs
#x_opt_bfgs = BFGS(A,y,lambda_reg)
bfgstime = timeit.default_timer() - start #time is in seconds
#print the total time, ||Ax_opt_bfgs-y||^2 and the L2 norm difference || x_opt_bfgs -
xorig||^2 for BFGS method
Note that in the code fragments, we experiment with different sizes of data set dimension.
[R] Prepare a tabulation where you report the following quantities for each dimension for Newton and BFGS methods:
1. The total CPU time taken to solve the respective method
2. The value ∥Ax∗ − y∥2, where x∗ is the respective optimizer obtained by Newton and BFGS methods.
3. l2 norm difference values ∥x∗ − xorig∥2 where x∗ is the respective optimizer obtained by Newton and BFGS methods and xorig is used in the code.
Run the experiments until you face either of the following situations (which we flag as failure):
You sense that the code is taking much longer to execute for a particular dimension (say more
than 20 minutes)
Your machine faces memory issues.
[R] Report the dimension where failure occurs respectively for Newton and BFGS method.
5
IE684, IEOR Lab
Lab 07 02-March-2022
Exercise 3: A possible approach to handle scalability
In this exercise, we will discuss one possible approach to handle the scalability issue faced in the previous exercise.
1. First note that the regularized problem (2) can be written as N
min fλ(x) = min fi(x). (3)
xx
i=1
[R] Find an appropriate choice of fi(x).
2. [R] Write an expression to compute the gradient of fi(x). Let us denote it by gi(x) = ∇xfi(x).
3. Consider the dimension where you observed failure in the previous exercise. Implement the following algorithm (ALG-LAB7) to solve (3):
#Code for ALG-LAB7
import numpy as np
import timeit
np.random.seed(1000) #for repeatability
N = 200
d = ??? #Consider the dimension which caused failure in the previous experiment
lambda_reg = 0.001
eps = np.random.randn(N,1) #random noise
#Create data matrix, label vector
A = np.random.randn(N,d)
#Normalize the columns
for j in range(A.shape[1]):
A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
xorig = np.ones( (d,1) )
y = np.dot(A,xorig) + eps
#initialize the optimization variable to be used in the new algo ALG-LAB8
x = np.zeros((d,1))
epochs = 1e4 #initialize the number of rounds needed to process t=1
arr = np.arange(N) #index array
start = timeit.default_timer() #start the timer
for epoch in range(epochs):
np.random.shuffle(arr) #shuffle every epoch
for i in np.nditer(arr): #Pass through the data points
# Update x using x <- x - 1/t * g_i (x)
t = t+1
if t>1e4:
t=1
alglab7time = timeit.default_timer() – start #time is in seconds x_alglab7 = x
#print the time taken, ||Ax_alglab8 – y||^2, ||x_alglab8 – xorig||^2
[R] Report the time taken for ALG-LAB7, ∥∇fλ(x∗)∥, ∥Ax∗ − y∥2 and ∥x∗ − xorig∥2 where x∗ is the optimizer obtained by ALG-LAB7.
6
IE684, IEOR Lab
Lab 07 02-March-2022
4. [R] Fix λ = 0.001 and repeat the experiment for 106, 108 and 1010 epochs and report the time taken for ALG-LAB7 and ∥∇fλ(x∗)∥, ∥Ax∗ − y∥2 and ∥x∗ − xorig∥2. Explain your observations.
5. [R] Fix 109 epochs and try λ ∈ {1000, 100, 10, 1, 0.1, 10−2, 10−3} and report the time taken for ALG-LAB7 and ∥∇fλ(x∗)∥, ∥Ax∗ − y∥2 and ∥x∗ − xorig∥2. Explain your observations.
6. [R] Does ALG-LAB7 work for the failure dimension? 7. [R] Explain your understanding of ALG-LAB7.
7





