
Need for Regularisation in Logistic Regression



When we hear about Data Science or Artificial Intelligence, the first thought that comes to mind is "how fascinating this is". But as we dig deeper, we learn that AI is a mixture of mathematics and other areas of science: statistics, graph theory, programming, and so on.


The main focus of this blog is to give you a taste of a Machine learning concept along with its mathematical formulation.


Okay, let's first look at our roadmap:


  • Introduction

  • Optimisation equation

  • Problem definition

  • Solution strategy

  • L2 regularisation

  • L1 regularisation

  • Elastic net


 


Introduction

Logistic Regression is one of the most important and simplest models in Machine Learning. It is easy to apply (check the sklearn documentation for LogisticRegression). It is a statistical model and is generally used for binary classification.


NOTE:- Don't go by its name; despite the word "regression", it is a classification technique, not regression.
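Since the post mentions how easy sklearn makes this, here is a minimal sketch of binary classification with LogisticRegression, using a synthetic dataset (the dataset and variable names are illustrative, not from the original post):

```python
# Minimal binary classification with scikit-learn's LogisticRegression.
# The dataset is synthetic; in practice you would load your own data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()  # uses L2 regularisation by default
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correct predictions
```

Note that sklearn applies regularisation by default, which is exactly the topic of this post.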


Optimisation equation


Optimisation equation of logistic regression (in text form):

w* = argmin over w of Σi log(1 + exp(-zi)), where zi = yi wT xi

Don't be afraid of this seemingly complex mathematical equation. Once you go through the concepts of Logistic Regression, it will feel like a piece of cake.
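To make the objective concrete, here is a short sketch of it in code, with labels yi in {-1, +1}. The toy data and function name are illustrative:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum over i of log(1 + exp(-z_i)), with z_i = y_i * w^T x_i."""
    z = y * (X @ w)
    return np.sum(np.log1p(np.exp(-z)))  # log1p is a stable log(1 + x)

# Toy training set: 3 points, 2 features, labels in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, -1, -1])
w = np.array([0.5, -0.5])
loss_value = logistic_loss(w, X, y)
```

At w = 0 every zi is 0, so each term is log(2); that is a handy sanity check.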


Okay, let's move on to the "Need for Regularisation" problem definition in Logistic Regression.



Problem Definition


Okay, let's take the exp(-zi) part from the optimisation equation. If we plot exp(-zi) against zi, we get a decaying exponential curve.


From the graph we can note some important observations:

1. The function is always positive.

2. As zi moves towards infinity, exp(-zi) tends to 0.
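Both observations are easy to check numerically (the sample points below are arbitrary):

```python
import numpy as np

# exp(-z) at a few sample points: always positive, strictly decreasing,
# and effectively 0 once z is large.
z = np.array([-2.0, 0.0, 2.0, 10.0, 50.0])
vals = np.exp(-z)
```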

Okay, now the big question is: when does zi become large and positive?


As we know, zi = yi wT xi. Here yi and xi are points from the training data, so they are fixed; we are left only with w. Hence, if we modify w so that every zi becomes large and positive, our classifier fits the training data perfectly. So the "ideal/optimal" w would be one that classifies all training points correctly and drives every zi towards infinity to reach the minimum.


But there are two problems:


1. If every training point is classified perfectly, we run into the overfitting problem.

2. To make zi tend to infinity, we have to push each wj towards +infinity or -infinity depending on yi. Hence w becomes very, very large.
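The second problem can be demonstrated on a tiny separable dataset: once w classifies both points correctly, simply scaling w up keeps pushing the loss towards 0, so an unregularised optimiser has every incentive to grow the weights without bound. The data below is illustrative:

```python
import numpy as np

def loss(w, X, y):
    # sum_i log(1 + exp(-y_i * w^T x_i))
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

# Two perfectly separable points.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
w = np.array([1.0, 1.0])  # already separates both points

# Scaling w by 1, 10, 100 drives the loss towards 0.
losses = [loss(scale * w, X, y) for scale in (1, 10, 100)]
```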


Solution


To overcome these problems, we bring regularisation into the picture.


Regularisation comes in two main types:

1. L2 regularisation

2. L1 regularisation



L2 regularisation

How does it work?

We add a regularisation term, lambda times the squared L2 norm of w, to the loss. This term ensures that w does not blow up to +/- infinity.


Hence when wj tends to +/- infinity, zi tends to infinity and the loss term becomes zero (we reach its minimum), but at the same time the regularisation term goes to infinity. So we cannot minimise the loss term alone; we have to minimise both terms together.


We can see this as a TUG OF WAR: when one term decreases, the other increases, so we have to balance the two to get our solution.


Now, what about lambda?

Lambda is a hyper-parameter.


When lambda = 0, the regularisation term vanishes and we get back our old OVER-FITTING problem. When lambda is very large, the influence of the loss term shrinks, meaning we are barely using the training data any more; this leads to UNDER-FITTING.
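You can see the tug of war directly in sklearn, where the parameter C is the inverse of lambda: a small C (large lambda) shrinks the weights, while a large C (small lambda) lets them grow. The synthetic data below is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C = 1 / lambda: small C means strong regularisation.
strong = LogisticRegression(C=0.01).fit(X, y)   # large lambda
weak = LogisticRegression(C=100.0).fit(X, y)    # small lambda

strong_norm = np.linalg.norm(strong.coef_)  # small weight vector
weak_norm = np.linalg.norm(weak.coef_)      # much larger weight vector
```

Tuning C (equivalently, lambda) by cross-validation is how the overfitting/underfitting balance is found in practice.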


L1 regularisation



Here we just replace the L2 norm of our weight vector with the L1 norm. Now you may be thinking: is it still able to solve our problem? The answer is yes. Just look at the equation; the key property has not changed. Okay, I will explain in short:


If w tends to infinity, the L1 regularisation term also tends to infinity, because it is just the sum of the absolute values of the components of w.

So in both cases the regularisation term tends to infinity as w tends to infinity, and the same tug of war keeps w finite.


Then what is the difference between the L1 norm and the L2 norm?

-> The L1 norm creates sparsity in our solution (w*), unlike the L2 norm. (Trust me, I will write a blog on it very soon.)
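The sparsity claim is easy to observe in sklearn: with a strong L1 penalty many coefficients become exactly zero, which essentially never happens with L2. The synthetic dataset (many features, few informative) is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 3 of which carry signal.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=3, random_state=0)

# liblinear supports the L1 penalty; C = 0.1 means fairly strong regularisation.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

l1_zeros = int(np.sum(l1.coef_ == 0))  # many exact zeros
l2_zeros = int(np.sum(l2.coef_ == 0))  # typically none
```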


Elastic net


I know you have a question: can I use both regularisation ideas at the same time?

Yes, you can use both, and that is known as Elastic-Net.


After using Elastic-Net, our objective looks like this (in words): the loss term, plus lambda1 times the L1 norm of w, plus lambda2 times the squared L2 norm of w.


 
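In sklearn, Elastic-Net for logistic regression requires the 'saga' solver; l1_ratio mixes the two penalties (1.0 is pure L1, 0.0 is pure L2). The synthetic data below is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# l1_ratio=0.5 weights the L1 and L2 penalties equally.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000)
enet.fit(X, y)
train_accuracy = enet.score(X, y)
```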






© AbridgedUp 2021 
