<div class="alert alert-block alert-success">
    <b>ARTIFICIAL INTELLIGENCE (E016350A)</b> <br>
ALEKSANDRA PIZURICA <br>
GHENT UNIVERSITY <br>
AY 2024/2025 <br>
Assistant: Nicolas Vercheval
</div>

# Regularization - part I

In the theory class, you have seen the role of regularization in regression and classification problems and some common regularization techniques, such as $\ell_1$ and $\ell_2$ regularization. We explained linear least squares regression with $\ell_2$ regularization (Ridge regression or Tikhonov regularization) and linear least squares regression with $\ell_1$ regularization (LASSO regression). Now you will experiment with these regularization approaches and with a combined $\ell_1$-$\ell_2$ regularization (ElasticNet regression).

The following examples illustrate the standard regularization techniques used with regression models. Other libraries expose a `penalty` parameter instead, which selects the regularization technique to apply.

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt 
from sklearn import linear_model, model_selection, metrics, datasets, preprocessing

In [None]:
np.random.seed(7)

We use the California housing dataset to predict real estate prices.

In [None]:
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = data.target
df

This set has 8 attributes.

#### Exercise: Split the dataset and perform feature normalization

Split the dataset using a train-test split of 2 : 1. Set the random state to 42.

Perform feature normalization.

In [None]:
X_train, X_test, y_train, y_test = # Your code here...

In [None]:
scaler = # Your code here...
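
As a point of comparison, the following minimal sketch shows a 2 : 1 split followed by feature normalization on synthetic stand-in data rather than on the housing data. The names `X_demo`, `y_demo`, `demo_scaler` and the `Xd_*` variables are assumptions made for this illustration, and `StandardScaler` is only one possible choice of normalization. The scaler is fitted on the training part only, so that no information from the test part leaks into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 300 samples, 4 features on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[0.0, 10.0, -5.0, 100.0],
                    scale=[1.0, 5.0, 0.5, 50.0],
                    size=(300, 4))
y_demo = X_demo @ np.array([1.0, 0.2, -3.0, 0.01]) + rng.normal(scale=0.1, size=300)

# A 2 : 1 train-test split puts one third of the samples in the test set.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=1 / 3, random_state=42
)

# Fit the scaler on the training set only, then apply the same transform to both sets.
demo_scaler = StandardScaler()
Xd_train_scaled = demo_scaler.fit_transform(Xd_train)
Xd_test_scaled = demo_scaler.transform(Xd_test)

print(Xd_train_scaled.shape, Xd_test_scaled.shape)
print(Xd_train_scaled.mean(axis=0).round(6), Xd_train_scaled.std(axis=0).round(6))
```

After scaling, every training feature has mean 0 and standard deviation 1, which matters here because the $\ell_1$ and $\ell_2$ penalties treat all coefficients alike and are therefore sensitive to the scale of the features.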

### 1. Linear regression

We use ordinary linear regression (without regularization) as the baseline model.

#### Exercise: Train a linear regression model on the data

In [None]:
linear = # Your code here...

The model coefficients can be obtained via the `coef_` attribute.

In [None]:
linear.coef_

We will monitor the performance of the model on the training set and on the test set. We will use the coefficient of determination ($R^2$) as the metric.

In [None]:
linear_train_score = linear.score(X_train, y_train) 

In [None]:
linear_test_score = linear.score(X_test, y_test) 

In [None]:
print('Training: ', linear_train_score, '\nTesting: ', linear_test_score)
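
The coefficient of determination returned by `score` is $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the targets from their mean. The sketch below checks this on synthetic data; `X_demo`, `y_demo` and `demo_model` are names introduced only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (assumption): a noisy linear relation with 3 features.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

demo_model = LinearRegression().fit(X_demo, y_demo)
y_pred = demo_model.predict(X_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_demo - y_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The manual value, the estimator's score and metrics.r2_score should agree.
print(r2_manual, demo_model.score(X_demo, y_demo), r2_score(y_demo, y_pred))
```

A value of 1 means a perfect fit, while 0 corresponds to a model that always predicts the mean of the targets. On unseen data the value can even become negative.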

### 2. Ridge regression (Tikhonov regularization)

Recall from the theory class that linear regression with the squared loss function and $\ell_2$ regularization is called Ridge regression or Tikhonov regularization. In this case, the weights $w_i$ are determined by minimizing the following cost function: $$\|\textbf{y}-\textbf{Xw}\|^2_2+\lambda\|\textbf{w}\|^2_2.$$ The parameter $\lambda$ is a hyperparameter that controls the strength of the regularization. For large values of $\lambda$, the model is encouraged to have small coefficients. The coefficients obtained in this way can be close to zero, but they are rarely exactly zero: as a coefficient shrinks, its squared value, and with it the penalty it incurs, becomes negligible before the coefficient reaches zero.

Linear regression with $\ell_2$ regularization is available in the `scikit-learn` library through the `Ridge` class. The `alpha` parameter plays the role of the regularization hyperparameter $\lambda$. Its value must be a positive number.
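
The Ridge cost function has the closed-form minimizer $\textbf{w} = (\textbf{X}^\top\textbf{X} + \lambda \textbf{I})^{-1}\textbf{X}^\top\textbf{y}$. The sketch below compares this expression with the coefficients found by `Ridge` on synthetic data. It assumes a model without intercept (`fit_intercept=False`), so that the formula applies directly; all variable names are introduced only for this illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 50 samples, 4 features, no intercept term.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.3, size=50)

lam = 10.0  # regularization strength (lambda in the formula, alpha in scikit-learn)

# Closed-form Ridge solution: w = (X^T X + lam * I)^{-1} X^T y
w_closed = np.linalg.solve(X_demo.T @ X_demo + lam * np.eye(4), X_demo.T @ y_demo)

# The same problem solved by scikit-learn, without an intercept.
ridge_demo = Ridge(alpha=lam, fit_intercept=False).fit(X_demo, y_demo)

# Ordinary least squares solution, for comparison of the coefficient norms.
w_ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

print('Closed form: ', w_closed)
print('Ridge class: ', ridge_demo.coef_)
print('Squared norms (OLS, Ridge):', np.sum(w_ols ** 2), np.sum(w_closed ** 2))
```

The two coefficient vectors should agree up to numerical precision, and the squared norm of the Ridge solution is smaller than that of the unregularized solution.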

#### Exercise: Train the `Ridge` regression model on the training data

Use `lambda_ridge` as the $\lambda$ parameter.

In [None]:
# alpha is a different notation for the \lambda parameter (lambda is already a keyword!)
lambda_ridge = 100
ridge = # Your code here...

In [None]:
ridge.coef_

In [None]:
print('Sum of squared coefficients without regularization:', (linear.coef_ ** 2).sum())
print('Sum of squared coefficients with Ridge regularization:', (ridge.coef_ ** 2).sum())

In [None]:
ridge_train_score = ridge.score(X_train, y_train) 

In [None]:
ridge_test_score = ridge.score(X_test, y_test) 

In [None]:
print('Training: ', ridge_train_score, '\nTesting: ', ridge_test_score)

### 3. Lasso regression (linear regression with $\ell_1$ regularization)

In contrast to Tikhonov regularization, LASSO (Least Absolute Shrinkage and Selection Operator) regularization adds the term $\lambda\|\textbf{w}\|_1= \lambda\sum\limits_{i = 1}^{N}{|w_{i}|}$ to the squared error term of the regression model. As before, $\lambda$ is a hyperparameter that controls the strength of the regularization. Unlike Ridge regression, such models can yield coefficients that are exactly zero.

Linear regression with Lasso regularization is available in the `scikit-learn` library through the `Lasso` class. The `alpha` parameter plays the role of the regularization hyperparameter $\lambda$. Its value must be a positive number.
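
The difference in sparsity between the two penalties can be illustrated on synthetic data in which only the first three of ten features influence the target. This is a minimal sketch; the data, the value `alpha=0.5` and all variable names are assumptions made for the illustration. Note also that `Lasso` in `scikit-learn` scales the squared error term by $\frac{1}{2n}$, where $n$ is the number of samples, so its `alpha` is not numerically interchangeable with the $\lambda$ of an unscaled cost function.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only the first 3 of 10 features are informative.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(200, 10))
w_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y_demo = X_demo @ w_true + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=0.5).fit(X_demo, y_demo)

# Count the coefficients that are exactly zero under each penalty.
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))

print('Lasso coefficients:', lasso_demo.coef_.round(3))
print('Ridge coefficients:', ridge_demo.coef_.round(3))
print('Exact zeros (Lasso, Ridge):', n_zero_lasso, n_zero_ridge)
```

Lasso sets uninformative coefficients to exactly zero and shrinks the remaining ones, whereas Ridge only makes the uninformative coefficients small. This is why Lasso is also used for feature selection.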

#### Exercise: Train the `Lasso` regression model on the training data

Use `lambda_lasso` as the $\lambda$ parameter.

In [None]:
lambda_lasso = 0.01
lasso = # Your code here

In [None]:
lasso.coef_

In [None]:
print('Sum of absolute coefficients without regularization:', abs(linear.coef_).sum())
print('Sum of absolute coefficients with Lasso regularization:', abs(lasso.coef_).sum())

In [None]:
lasso_train_score = lasso.score(X_train, y_train) 

In [None]:
lasso_test_score = lasso.score(X_test, y_test) 

In [None]:
print('Training: ', lasso_train_score, '\nTesting: ', lasso_test_score)

### 4. ElasticNet regression (linear regression with $\ell_1$ and $\ell_2$ regularization)

ElasticNet is a type of regularization that combines $\ell_1$ and $\ell_2$ regularization. The regularization term added to the model is $a\|\textbf{w}\|_1 + 0.5\, b\,\|\textbf{w}\|^2_2$. For $a=0$, the term corresponds to Ridge regularization, while for $b=0$, it corresponds to Lasso regularization. This type of regularization is available in the `scikit-learn` library through the `ElasticNet` class. Its parameters `alpha` and `l1_ratio` are defined such that $\alpha=a+b$ and $\ell_1\_ratio = \frac{a}{a+b}$.
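
The relation between the weights $(a, b)$ and the parameters `alpha` and `l1_ratio` can be sketched as follows. The helper `to_sklearn_params` is hypothetical and introduced only for this illustration, as are the synthetic data and the chosen weights. The sketch also checks the special case $b=0$, in which `ElasticNet` should give the same coefficients as `Lasso`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso


def to_sklearn_params(a, b):
    """Hypothetical helper: map the penalty weights (a, b) to (alpha, l1_ratio)."""
    alpha = a + b
    if alpha <= 0:
        raise ValueError('a + b must be positive')
    return alpha, a / alpha


# Equal weights a = b = 0.0025 give alpha = 0.005 and l1_ratio = 0.5.
alpha_demo, l1_ratio_demo = to_sklearn_params(0.0025, 0.0025)
print(alpha_demo, l1_ratio_demo)

# Synthetic data (assumption) to check the special case b = 0.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.2, size=100)

# With b = 0 the penalty is purely l1, so l1_ratio = 1.
alpha_l1, ratio_l1 = to_sklearn_params(0.1, 0.0)
enet_l1 = ElasticNet(alpha=alpha_l1, l1_ratio=ratio_l1).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=alpha_l1).fit(X_demo, y_demo)

print('ElasticNet (l1_ratio = 1):', enet_l1.coef_.round(4))
print('Lasso:                    ', lasso_demo.coef_.round(4))
```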

#### Exercise: Train the `ElasticNet` regression model on the training data

Use `lambda_elastic` as the $\lambda$ parameter, and `l1_ratio` as $\ell_1\_ratio$.

In [None]:
lambda_elastic = 0.005
l1_ratio = 0.5
elastic = # Your code here

In [None]:
elastic.coef_

In [None]:
elastic_train_score = elastic.score(X_train, y_train) 

In [None]:
elastic_test_score = elastic.score(X_test, y_test) 

In [None]:
print('Sum of squared coefficients without regularization:', (linear.coef_ ** 2).sum())
print('Sum of squared coefficients with ElasticNet regularization:', (elastic.coef_ ** 2).sum())
print('Sum of absolute coefficients without regularization:', abs(linear.coef_).sum())
print('Sum of absolute coefficients with ElasticNet regularization:', abs(elastic.coef_).sum())

In [None]:
print('Training: ', elastic_train_score, '\nTesting: ', elastic_test_score)

### Visualization of the model coefficients

In [None]:
number_of_features = len(data.feature_names)
plt.figure(figsize=(10, 5))
plt.xticks(np.arange(0, number_of_features), data.feature_names, rotation='horizontal')
plt.plot(linear.coef_, '^', label='Without regularization' )
plt.plot(ridge.coef_, 'o', label=f'Ridge regression (alpha = {lambda_ridge})')
plt.plot(lasso.coef_, 'v', label=f'Lasso regression (alpha = {lambda_lasso})')
plt.plot(elastic.coef_, 'x', label=f'ElasticNet regression (alpha = {lambda_elastic}, l1_ratio = {l1_ratio})')
plt.plot(np.arange(0, number_of_features), np.zeros(number_of_features), color='gray', linestyle='--')
plt.legend(loc='best')
plt.show()

The hyperparameter values of regularized models, such as `alpha` and `l1_ratio`, are chosen in the same way as the hyperparameters of the models seen so far.
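
One common way to choose such a hyperparameter is a cross-validated grid search. Whether this matches the procedure used earlier in the course is an assumption. The sketch below selects `alpha` for a `Ridge` model on synthetic data; the grid of candidate values and all variable names are chosen only for this illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption): 150 samples, 5 features, two of them uninformative.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(150, 5))
y_demo = X_demo @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=150)

# Candidate regularization strengths (an arbitrary logarithmic grid).
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# 5-fold cross-validation, scored with the coefficient of determination.
search = GridSearchCV(Ridge(), param_grid={'alpha': alphas}, cv=5, scoring='r2')
search.fit(X_demo, y_demo)

print('Best alpha:', search.best_params_['alpha'])
print('Mean cross-validated R^2:', search.best_score_)
```

The selection uses the training data only. The test set stays untouched until the final evaluation of the chosen model.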