Exercise Sheet 12
Exercise 1: Deep SVDD

Consider a dataset x1, . . . , xN ∈ Rd, and a simple linear feature map φ(x) = w⊤x + b with trainable parameters w and b. For this simple scenario, we can formulate the deep SVDD problem as:

min_{w,b} (1/N) ∑_{i=1}^{N} ∥w⊤xi + b − 1∥²
where we have hardcoded the center parameter of deep SVDD to 1. We then classify new points x to be
anomalous if ∥w⊤x + b − 1∥2 > τ.
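As an illustration of this decision rule, the energy and the resulting classification can be sketched in NumPy (the parameter values and the threshold τ below are arbitrary placeholders, not part of the exercise):

```python
import numpy

def svdd_energy(X, w, b):
    # Squared distance of the 1-d feature w^T x + b to the hardcoded center 1
    return (X.dot(w) + b - 1.0) ** 2

# Hypothetical parameters and threshold, for illustration only
rng = numpy.random.RandomState(0)
X = rng.normal(0, 1, (5, 3))        # five points in R^3
w = numpy.array([0.2, -0.1, 0.3])
b = 0.5
tau = 0.25

print(svdd_energy(X, w, b) > tau)   # boolean anomaly flags, one per point
```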
- (a) Give a choice of parameters (w, b) that minimizes the objective above for any dataset (x1, . . . , xN ).
- (b) We now consider a regularizer for our feature map φ which simply consists of forcing the bias term to b = 0. Show that under this regularizer, the solution of deep SVDD is given by:

w = Σ⁻¹ x̄

where x̄ and Σ are the empirical mean and uncentered covariance of the data.

Exercise 2: Restricted Boltzmann Machine (30 P)
The restricted Boltzmann machine is a system of binary variables comprising inputs x ∈ {0, 1}d and hidden units h ∈ {0, 1}K . It associates to each configuration of these binary variables the energy:
E(x, h) = −x⊤W h − b⊤h

and the probability associated to each configuration is then given as:

p(x, h) = (1/Z) exp(−E(x, h))

where Z is a normalization constant that makes probabilities sum to one. Let sigm(t) = exp(t)/(1 + exp(t)) be the sigmoid function.
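As a side note, the notebook part of this sheet evaluates the sigmoid through the identity sigm(t) = 0.5·tanh(0.5t) + 0.5, which avoids overflow for large t; the equivalence is easy to check numerically:

```python
import numpy

t = numpy.linspace(-10, 10, 101)
direct = numpy.exp(t) / (1 + numpy.exp(t))     # sigm(t) as defined above
via_tanh = numpy.tanh(0.5 * t) * 0.5 + 0.5     # equivalent tanh-based form
print(numpy.allclose(direct, via_tanh))        # True
```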
- (a) Show that p(hk = 1 | x) = sigm(x⊤W:,k + bk).
- (b) Show that p(xj = 1 | h) = sigm(Wj,: h).
- (c) Show that

p(x) = (1/Z) exp(−F(x))

where

F(x) = − ∑_{k=1}^{K} log(1 + exp(x⊤W:,k + bk))

is the free energy and where Z is again a normalization constant.
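The identity in (c) can be checked numerically for a small model by summing exp(−E(x, h)) over all 2^K hidden configurations (a sketch with random parameters):

```python
import itertools
import numpy

rng = numpy.random.RandomState(0)
d, K = 4, 3
W = rng.normal(0, 1, (d, K))
b = rng.normal(0, 1, K)
x = rng.randint(0, 2, d).astype(float)

# Brute-force marginal: sum_h exp(-E(x,h)) with E(x,h) = -x^T W h - b^T h
total = sum(numpy.exp(x.dot(W).dot(h) + b.dot(h))
            for h in map(numpy.array, itertools.product([0., 1.], repeat=K)))

# Free energy: F(x) = -sum_k log(1 + exp(x^T W_{:,k} + b_k))
F = -numpy.log(1 + numpy.exp(x.dot(W) + b)).sum()

print(numpy.isclose(total, numpy.exp(-F)))  # True
```

The factorization works because the sum over h splits into a product of K independent sums, one per hidden unit.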
Exercise 3: Programming (50 P)
Download the programming files on ISIS and follow the instructions.
Exercise sheet 12 (programming) [SoSe 2021] Machine Learning 2
KDE and RBM for Anomaly Detection
In this programming exercise, we compare two energy-based models in the context of anomaly detection: kernel density estimation (KDE) and the restricted Boltzmann machine (RBM).
In [1]:

import utils
import numpy
import scipy, scipy.special, scipy.spatial
import sklearn, sklearn.metrics
%matplotlib inline
import matplotlib
from matplotlib import pyplot as plt
We consider the MNIST dataset and define the class “0” to be normal (inlier) and the remaining classes (1–9) to be anomalous (outlier). We consider that we have a training set Xr composed of 100 normal data points. The variables Xi and Xo denote normal and anomalous test data.

In [2]:

Xr,Xi,Xo = utils.getdata()
The 100 training points are visualized below:
In [3]:

plt.figure(figsize=(16,4))
plt.imshow(Xr.reshape(5,20,28,28).transpose(0,2,1,3).reshape(140,560))
plt.show()
Kernel Density Estimation (15 P)
We first consider kernel density estimation, a shallow model for anomaly detection. The code below implements kernel density estimation.
Task:
Implement the function energy that returns the energy of the points X given as input as computed by the KDE energy function (cf. slide Kernel Density Estimation as an EBM).
In [4]:

class AnomalyModel:

    def auroc(self):
        Ei = self.energy(Xi)
        Eo = self.energy(Xo)
        return sklearn.metrics.roc_auc_score(
            numpy.concatenate([Ei*0+0, Eo*0+1]),
            numpy.concatenate([Ei, Eo])
        )

class KDE(AnomalyModel):

    def __init__(self, gamma):
        self.gamma = gamma

    def fit(self, X):
        self.X = X

    def energy(self, X):
        # ------------------------------------------------
        # TODO: Replace by your code
        # ------------------------------------------------
        import solution
        E = solution.kde_energy(self, X)
        # ------------------------------------------------
        return E
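For reference, a possible shape of the kde_energy computation, assuming the Gaussian-kernel energy E(x) = −log((1/N) ∑i exp(−γ∥x − xi∥²)) from the lecture slide; treat it as a sketch to compare against your own implementation:

```python
import numpy
import scipy.spatial, scipy.special

def kde_energy(model, X):
    # Squared Euclidean distances between test points and training points
    D = scipy.spatial.distance.cdist(X, model.X, 'sqeuclidean')
    # E(x) = -log( (1/N) sum_i exp(-gamma * ||x - x_i||^2) ),
    # evaluated stably via logsumexp
    N = len(model.X)
    return -scipy.special.logsumexp(-model.gamma * D, axis=1) + numpy.log(N)
```

Here model is expected to expose the attributes X (training data) and gamma set by the KDE class above.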
The following code applies KDE with different scale parameters gamma and returns the performance of the resulting anomaly detection model measured in terms of area under the ROC.

In [5]:

for gamma in numpy.logspace(-2,0,10):
    kde = KDE(gamma)
    kde.fit(Xr)
    print('gamma = %5.3f  AUROC = %5.3f' % (gamma, kde.auroc()))
gamma = 0.010 AUROC = 0.957
gamma = 0.017 AUROC = 0.962
gamma = 0.028 AUROC = 0.969
gamma = 0.046 AUROC = 0.976
gamma = 0.077 AUROC = 0.981
gamma = 0.129 AUROC = 0.983
gamma = 0.215 AUROC = 0.983
gamma = 0.359 AUROC = 0.982
gamma = 0.599 AUROC = 0.982
gamma = 1.000 AUROC = 0.981
We observe that the best performance is obtained for some intermediate value of the parameter gamma.
Restricted Boltzmann Machine (35 P)
We now consider a restricted Boltzmann machine composed of 100 binary hidden units (h ∈ {0, 1}100 ). The joint energy function of our RBM is given by:
E(x, h) = −x⊤ a − x⊤ W h − h⊤ b
The model can be marginalized over its hidden units, and the energy function that depends only on the input x is then given as:

E(x) = −x⊤a − ∑_{k=1}^{100} log(1 + exp(x⊤W:,k + bk))

The RBM training algorithm is already implemented for you.
Tasks:
- Implement the energy function E(x).
- Augment the function fit with code that prints the AUROC every 100 iterations.
In [6]:

def sigm(t): return numpy.tanh(0.5*t)*0.5+0.5
def realize(t): return 1.0*(t>numpy.random.uniform(0,1,t.shape))

class RBM(AnomalyModel):

    def __init__(self,X,h):
        self.mb = X.shape[0]
        self.d = X.shape[1]
        self.h = h
        self.lr = 0.1

        # Model parameters
        self.A = numpy.zeros([self.d])
        self.W = numpy.random.normal(0,self.d**-.25 * self.h**-.25,[self.d,self.h])
        self.B = numpy.zeros([self.h])

    def fit(self,X,verbose=False):
        Xm = numpy.zeros([self.mb,self.d])
        for i in numpy.arange(1001):

            # Gibbs sampling (PCD)
            Xd = X*1.0
            Zd = realize(sigm(Xd.dot(self.W)+self.B))
            Zm = realize(sigm(Xm.dot(self.W)+self.B))
            Xm = realize(sigm(Zm.dot(self.W.T)+self.A))

            # Update parameters
            self.W += self.lr*((Xd.T.dot(Zd) - Xm.T.dot(Zm)) / self.mb - 0.01*self.W)
            self.B += self.lr*(Zd.mean(axis=0)-Zm.mean(axis=0))
            self.A += self.lr*(Xd.mean(axis=0)-Xm.mean(axis=0))

            if verbose:
                # ------------------------------------------------
                # TODO: Replace by your code
                # ------------------------------------------------
                import solution
                solution.track_auroc(self,i)
                # ------------------------------------------------

    def energy(self,X):
        # ------------------------------------------------
        # TODO: Replace by your code
        # ------------------------------------------------
        import solution
        E = solution.rbm_energy(self,X)
        # ------------------------------------------------
        return E
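For reference, a possible shape of the rbm_energy computation (a sketch; numpy.logaddexp(0, t) evaluates log(1 + exp(t)) in a numerically stable way):

```python
import numpy

def rbm_energy(model, X):
    # E(x) = -x^T a - sum_k log(1 + exp(x^T W_{:,k} + b_k))
    return -X.dot(model.A) \
           - numpy.logaddexp(0, X.dot(model.W) + model.B).sum(axis=1)
```

Here model is expected to expose the attributes A, W and B defined in the RBM class above.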
We now train our RBM on the same data as the KDE model for approximately 1000 iterations.
In [7]:

rbm = RBM(Xr,100)
rbm.fit(Xr,verbose=True)

it =    0  AUROC = 0.962
it =  100  AUROC = 0.943
it =  200  AUROC = 0.985
it =  300  AUROC = 0.987
it =  400  AUROC = 0.988
it =  500  AUROC = 0.986
it =  600  AUROC = 0.987
it =  700  AUROC = 0.987
it =  800  AUROC = 0.989
it =  900  AUROC = 0.986
it = 1000  AUROC = 0.990
We observe that the RBM reaches superior levels of AUROC performance compared to the simple KDE model. An advantage of the RBM model is that it learns a set of parameters that represent variations at multiple scales and with specific orientations in input space. We would like to visualize these parameters:
Task:
Render as a mosaic the weight parameters ( W ) of the model. Each tile of the mosaic should correspond to the receptive field connecting the input image to a particular hidden unit.
In [8]:

# ------------------------------------------------
# TODO: Replace by your code
# ------------------------------------------------
import solution
solution.plot_weights(rbm)
# ------------------------------------------------
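One possible way to build such a mosaic, assuming 100 hidden units and 28×28 inputs (a sketch; the 10×10 grid layout is an arbitrary choice):

```python
import numpy
from matplotlib import pyplot as plt

def weight_mosaic(W, rows=10, cols=10, side=28):
    # Each column of W (shape d x h) is the receptive field of one hidden unit
    tiles = W.T.reshape(rows, cols, side, side)
    # Interleave grid and pixel axes to obtain one large image
    return tiles.transpose(0, 2, 1, 3).reshape(rows * side, cols * side)

# Example with random weights standing in for rbm.W:
W = numpy.random.normal(0, 1, (784, 100))
plt.figure(figsize=(8, 8))
plt.imshow(weight_mosaic(W), cmap='seismic')
plt.axis('off')
plt.show()
```

A diverging colormap such as 'seismic' is convenient here because it distinguishes positive from negative weights.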