Machine Learning 2 - Week 12 (DL4)

Exercise Sheet 12

Exercise 1: Deep SVDD

Consider a dataset x1, . . . , xN ∈ R^d, and a simple linear feature map φ(x) = w⊤x + b with trainable parameters w and b. For this simple scenario, we can formulate the deep SVDD problem as:

min_{w,b} (1/N) ∑_{i=1}^{N} ∥w⊤xi + b − 1∥²

where we have hardcoded the center parameter of deep SVDD to 1. We then classify new points x as anomalous if ∥w⊤x + b − 1∥² > τ.

  1. (a)  Give a choice of parameters (w, b) that minimizes the objective above for any dataset (x1, . . . , xN).
  2. (b)  We now consider a regularizer for our feature map φ which simply consists of forcing the bias term to b = 0. Show that under this regularizer, the solution of deep SVDD is given by:

          w = Σ⁻¹ x̄

      where x̄ and Σ are the empirical mean and uncentered covariance.
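The closed-form solution in (b) can be checked numerically: with b = 0, the objective is an ordinary least-squares problem with target 1 for every point, whose minimizer is exactly Σ⁻¹x̄. A small sketch on toy data (all names and the dataset are hypothetical, for verification only):

```python
import numpy as np

# Toy dataset x_1, ..., x_N in R^d (hypothetical, for checking the formula)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

xbar = X.mean(axis=0)                  # empirical mean
Sigma = X.T @ X / len(X)               # uncentered covariance (1/N) sum_i x_i x_i^T
w_closed = np.linalg.solve(Sigma, xbar)

# Direct minimization of (1/N) sum_i (w^T x_i - 1)^2:
# least squares with target vector of ones
w_lstsq, *_ = np.linalg.lstsq(X, np.ones(len(X)), rcond=None)

assert np.allclose(w_closed, w_lstsq)
```

The agreement follows from the normal equations: X⊤Xw = X⊤1 is the same system as Σw = x̄ after dividing both sides by N.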

    Exercise 2: Restricted Boltzmann Machine (30 P)

    The restricted Boltzmann machine is a system of binary variables comprising inputs x ∈ {0, 1}^d and hidden units h ∈ {0, 1}^K. It associates to each configuration of these binary variables the energy:

    E(x, h) = −x⊤W h − b⊤h

    and the probability associated to each configuration is then given as:

    p(x, h) = (1/Z) exp(−E(x, h))

    where Z is a normalization constant that makes probabilities sum to one. Let sigm(t) = exp(t)/(1 + exp(t)) be the sigmoid function.

  1. (a)  Show that p(hk = 1 | x) = sigm(x⊤W:,k + bk).
  2. (b)  Show that p(xj = 1 | h) = sigm(Wj,: h).

(c) Show that

p(x) = (1/Z) exp(−F(x))

where

F(x) = − ∑_{k=1}^{K} log(1 + exp(x⊤W:,k + bk))

is the free energy and where Z is again a normalization constant.
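The identity in (a) can be sanity-checked numerically on a tiny RBM by brute-force enumeration of the hidden configurations. A sketch with toy parameters (W, b, and x below are arbitrary, not part of the exercise):

```python
import itertools
import numpy as np

# Tiny RBM with toy parameters, small enough to enumerate all hidden states
rng = np.random.default_rng(1)
d, K = 3, 2
W = rng.normal(size=(d, K))
b = rng.normal(size=K)

def energy(x, h):
    # E(x, h) = -x^T W h - b^T h
    return -x @ W @ h - b @ h

x = np.array([1.0, 0.0, 1.0])
hs = [np.array(h, dtype=float) for h in itertools.product([0, 1], repeat=K)]

# p(h | x) by normalizing exp(-E(x, h)) over all h for the fixed x
p = np.array([np.exp(-energy(x, h)) for h in hs])
p /= p.sum()

sigm = lambda t: 1 / (1 + np.exp(-t))
for k in range(K):
    p_hk = sum(pi for pi, h in zip(p, hs) if h[k] == 1)
    assert np.isclose(p_hk, sigm(x @ W[:, k] + b[k]))
```

The check works because p(h | x) factorizes over the hidden units, which is the key step in the derivation.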

Exercise 3: Programming (50 P)

Download the programming files on ISIS and follow the instructions.

Exercise sheet 12 (programming) [SoSe 2021] Machine Learning 2

KDE and RBM for Anomaly Detection

In this programming exercise, we compare in the context of anomaly detection two energy-based models: kernel density estimation (KDE) and the restricted Boltzmann machine (RBM).

In [1]:

import utils
import numpy
import scipy,scipy.special,scipy.spatial
import sklearn,sklearn.metrics
%matplotlib inline
import matplotlib
from matplotlib import pyplot as plt

We consider the MNIST dataset and define the class "0" to be normal (inlier) and the remaining classes (1-9) to be anomalous (outlier). We consider that we have a training set Xr composed of 100 normal data points. The variables Xi and Xo denote normal and anomalous test data.

In [2]:

Xr,Xi,Xo = utils.getdata()

The 100 training points are visualized below:

In [3]:

plt.figure(figsize=(16,4))
plt.imshow(Xr.reshape(5,20,28,28).transpose(0,2,1,3).reshape(140,560))
plt.show()

Kernel Density Estimation (15 P)

We first consider kernel density estimation, which is a shallow model for anomaly detection. The code below implements kernel density estimation.

Task:

Implement the function energy that returns the energy of the points X given as input as computed by the KDE energy function (cf. slide Kernel Density Estimation as an EBM).
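The slide's exact definition takes precedence, but as a hedged sketch: assuming the Gaussian-kernel formulation E(x) = −log((1/N) ∑ᵢ exp(−γ‖x − xᵢ‖²)), the energy can be computed stably with a log-sum-exp over the training points:

```python
import numpy
from scipy.special import logsumexp
from scipy.spatial.distance import cdist

# Sketch of the KDE energy, assuming the Gaussian-kernel EBM
#   E(x) = -log( (1/N) * sum_i exp(-gamma * ||x - x_i||^2) )
# The model is expected to carry the training set in model.X and the
# scale parameter in model.gamma, matching the KDE class below.
def kde_energy(model, X):
    D2 = cdist(X, model.X, 'sqeuclidean')            # pairwise squared distances
    # logsumexp avoids underflow when all kernel values are tiny
    return -logsumexp(-model.gamma * D2, axis=1) + numpy.log(len(model.X))
```

Low energy then corresponds to high estimated density, so thresholding the energy directly yields an anomaly score.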

In [4]:

class AnomalyModel:

    def auroc(self):
        Ei = self.energy(Xi)
        Eo = self.energy(Xo)
        return sklearn.metrics.roc_auc_score(
            numpy.concatenate([Ei*0+0,Eo*0+1]),
            numpy.concatenate([Ei,Eo])
        )

class KDE(AnomalyModel):

    def __init__(self,gamma):
        self.gamma = gamma

    def fit(self,X):
        self.X = X

    def energy(self,X):
        # ------------------------------------------------
        # TODO: Replace by your code
        # ------------------------------------------------
        import solution
        E = solution.kde_energy(self,X)
        # ------------------------------------------------
        return E
The following code applies KDE with different scale parameters gamma and returns the performance of the resulting anomaly detection model measured in terms of area under the ROC.

In [5]:

for gamma in numpy.logspace(-2,0,10):
    kde = KDE(gamma)
    kde.fit(Xr)
    print('gamma = %5.3f  AUROC = %5.3f'%(gamma,kde.auroc()))

              gamma = 0.010  AUROC = 0.957
              gamma = 0.017  AUROC = 0.962
              gamma = 0.028  AUROC = 0.969
              gamma = 0.046  AUROC = 0.976
              gamma = 0.077  AUROC = 0.981
              gamma = 0.129  AUROC = 0.983
              gamma = 0.215  AUROC = 0.983
              gamma = 0.359  AUROC = 0.982
              gamma = 0.599  AUROC = 0.982
              gamma = 1.000  AUROC = 0.981

We observe that the best performance is obtained for some intermediate value of the parameter gamma.

Restricted Boltzmann Machine (35 P)

We now consider a restricted Boltzmann machine composed of 100 binary hidden units (h ∈ {0, 1}^100). The joint energy function of our RBM is given by:

E(x, h) = −x⊤a − x⊤W h − h⊤b

The model can be marginalized over its hidden units, and the energy function that depends only on the input x is then given as:

E(x) = −x⊤a − ∑_{k=1}^{100} log(1 + exp(x⊤W:,k + bk))

The RBM training algorithm is already implemented for you.

Tasks:

Implement the energy function E(x)
Augment the function fit with code that prints the AUROC every 100 iterations.
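For the first task, the marginal energy above translates almost directly into code. A minimal sketch, assuming the attribute names A, W, B used by the RBM class in the cell below (the slide's definition is authoritative):

```python
import numpy

# Sketch of the marginal RBM energy
#   E(x) = -x^T a - sum_k log(1 + exp(x^T W_{:,k} + b_k))
# model.A, model.W, model.B are the visible bias, weights, and hidden bias.
def rbm_energy(model, X):
    # numpy.logaddexp(0, t) evaluates log(1 + exp(t)) without overflow
    return -X.dot(model.A) - numpy.logaddexp(0, X.dot(model.W) + model.B).sum(axis=1)
```

For the second task, printing the AUROC every 100 iterations inside fit reduces to a check like `if i % 100 == 0: print(self.auroc())` in the verbose branch.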

In [6]:

def sigm(t): return numpy.tanh(0.5*t)*0.5+0.5
def realize(t): return 1.0*(t>numpy.random.uniform(0,1,t.shape))

class RBM(AnomalyModel):

    def __init__(self,X,h):
        self.mb = X.shape[0]
        self.d = X.shape[1]
        self.h = h
        self.lr = 0.1

        # Model parameters
        self.A = numpy.zeros([self.d])
        self.W = numpy.random.normal(0,self.d**-.25 * self.h**-.25,[self.d,self.h])
        self.B = numpy.zeros([self.h])

    def fit(self,X,verbose=False):

        Xm = numpy.zeros([self.mb,self.d])

        for i in numpy.arange(1001):

            # Gibbs sampling (PCD)
            Xd = X*1.0
            Zd = realize(sigm(Xd.dot(self.W)+self.B))
            Zm = realize(sigm(Xm.dot(self.W)+self.B))
            Xm = realize(sigm(Zm.dot(self.W.T)+self.A))

            # Update parameters
            self.W += self.lr*((Xd.T.dot(Zd) - Xm.T.dot(Zm)) / self.mb - 0.01*self.W)
            self.B += self.lr*(Zd.mean(axis=0)-Zm.mean(axis=0))
            self.A += self.lr*(Xd.mean(axis=0)-Xm.mean(axis=0))

            if verbose:
                # ------------------------------------------------
                # TODO: Replace by your code
                # ------------------------------------------------
                import solution
                solution.track_auroc(self,i)
                # ------------------------------------------------

    def energy(self,X):
        # ------------------------------------------------
        # TODO: Replace by your code
        # ------------------------------------------------
        import solution
        E = solution.rbm_energy(self,X)
        # ------------------------------------------------
        return E
We now train our RBM on the same data as the KDE model for approximately 1000 iterations.

In [7]:

rbm = RBM(Xr,100)
rbm.fit(Xr,verbose=True)

it =    0  AUROC = 0.962
it =  100  AUROC = 0.943
it =  200  AUROC = 0.985
it =  300  AUROC = 0.987
it =  400  AUROC = 0.988
it =  500  AUROC = 0.986
it =  600  AUROC = 0.987
it =  700  AUROC = 0.987
it =  800  AUROC = 0.989
it =  900  AUROC = 0.986
it = 1000  AUROC = 0.990

We observe that the RBM reaches superior levels of AUROC performance compared to the simple KDE model. An advantage of the RBM model is that it learns a set of parameters that represent variations at multiple scales and with specific orientations in input space. We would like to visualize these parameters:

Task:

Render as a mosaic the weight parameters ( W ) of the model. Each tile of the mosaic should correspond to the receptive field connecting the input image to a particular hidden unit.

In [8]:

# ------------------------------------------------
# TODO: Replace by your code
# ------------------------------------------------
import solution
solution.plot_weights(rbm)
# ------------------------------------------------