WEBVTT

00:01.540 --> 00:06.760
In this session, we will discuss the mathematics behind logistic regression.

00:07.330 --> 00:09.640
So first of all, let us look at a few terms.

00:10.120 --> 00:12.310
So the first term is odds.

00:14.330 --> 00:22.220
The odds are the chances that something will happen. Now, if you flip a coin, the odds are 50/50

00:22.760 --> 00:24.360
that you'll get heads.

00:24.980 --> 00:31.190
Now, if something strange happens, people will often say, "What were the odds of that?", which means

00:31.400 --> 00:33.480
"I can't believe that happened.

00:33.680 --> 00:35.340
The odds were against it."

00:35.780 --> 00:37.500
So how would we find that?

00:37.670 --> 00:48.230
So, for example, the odds in favor of rolling a two on a six-sided die are one is to five, or one

00:48.380 --> 00:49.460
divided by five.

00:49.760 --> 00:58.460
The odds against an event equal the number of unfavorable outcomes divided by the

00:58.460 --> 00:59.960
number of favorable outcomes.

01:00.230 --> 01:10.850
So here, let us say the number of times our team has won is one, and the team has lost the

01:10.850 --> 01:12.370
game four times.

01:12.770 --> 01:19.820
So the odds in favor of the team winning are one is to four, with five being

01:19.820 --> 01:21.400
the total number of games.

01:21.650 --> 01:24.360
This is also written as one divided by four.

01:25.130 --> 01:26.810
Now, what is probability?

01:27.170 --> 01:31.190
Probability is the ratio of something happening over

01:31.190 --> 01:34.700
all the possible cases.

01:34.880 --> 01:43.430
So the probability in this case will be one by five, where one is the number of times this team won, divided

01:43.430 --> 01:45.830
by the total number of cases, which is five.

01:46.100 --> 01:53.330
The odds value is not equal to the probability itself; it is the ratio of something happening over something

01:53.600 --> 01:54.700
not happening.

01:55.070 --> 02:00.900
Probability is the ratio of something happening over all the possible cases.

02:00.920 --> 02:03.710
So this is the difference between odds and probability.

02:04.640 --> 02:10.790
OK, now let us say we have odds of winning as five to three.

02:11.210 --> 02:17.150
That is, five times the team has won and three times the team has lost the game.

02:17.690 --> 02:19.490
So what will be the odds value?

02:19.490 --> 02:21.780
The value will be five divided by three.

02:21.800 --> 02:23.430
That is approximately one point six seven.

02:24.230 --> 02:26.480
Now, what will be the probability of winning?

02:26.690 --> 02:34.010
The probability of winning will be five, that is, the number of times the team won, divided

02:34.010 --> 02:39.140
by the total number of games, that is five plus three, which comes out to be eight.

02:40.420 --> 02:44.830
So the probability of winning will be five by eight here.

02:46.110 --> 02:53.880
Which is zero point six two five. Now, the probability of winning is the complement of the probability

02:53.880 --> 02:57.210
of losing, that is one minus probability of losing.

02:57.600 --> 03:00.930
So the probability of losing will be three by eight.

03:01.860 --> 03:03.760
That is zero point three seven five.

03:04.080 --> 03:09.140
We can also write the probability of losing as one minus the probability of winning.

03:09.840 --> 03:16.440
So that will be equal to one minus zero point six two five, which is zero point three seven five.

03:17.390 --> 03:25.550
Right, so the probability of winning is zero point six two five, that is five by eight; the probability of losing

03:25.550 --> 03:33.620
is three by eight, and probability of losing can also be written as one minus probability of winning.

03:34.920 --> 03:42.870
Now, probability of winning divided by probability of losing can be written as probability of winning,

03:42.870 --> 03:45.120
divided by one minus probability of winning.

03:46.020 --> 03:50.640
We have just replaced probability of losing by one minus probability of winning.

03:51.630 --> 03:59.700
So this gives us five by eight, divided by three by eight, which comes out to be five by three, which is

03:59.700 --> 04:02.760
exactly the odds of winning.

04:03.740 --> 04:10.990
So you can say that probability of winning divided by probability of losing is equal to the odds of winning,

04:11.720 --> 04:12.070
right?

04:12.320 --> 04:22.280
So if we call the probability of winning p, then the odds can also be written as p divided by one minus p.
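
NOTE
The arithmetic from the five-wins, three-losses example can be checked with a
minimal Python sketch. The function names are illustrative and not from the
session; exact fractions are used so that 5/8 and 5/3 print as stated.
```python
from fractions import Fraction
def odds_in_favor(wins: int, losses: int) -> Fraction:
    """Odds in favor: favorable outcomes over unfavorable outcomes."""
    return Fraction(wins, losses)
def probability(wins: int, losses: int) -> Fraction:
    """Probability: favorable outcomes over all outcomes."""
    return Fraction(wins, wins + losses)
def odds_from_probability(p: Fraction) -> Fraction:
    """Odds recovered from a probability: p / (1 - p)."""
    return p / (1 - p)
wins, losses = 5, 3
p_win = probability(wins, losses)
print(odds_in_favor(wins, losses))   # 5/3
print(p_win, 1 - p_win)              # 5/8 3/8
print(odds_from_probability(p_win))  # 5/3
```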

04:24.340 --> 04:32.860
You can take a moment with the slide and try to understand what I just said, that is, what is

04:32.860 --> 04:33.960
the odds of winning?

04:33.970 --> 04:35.830
What is the probability of winning?

04:36.100 --> 04:42.200
How probability of winning is related to the probability of losing and all of this mathematics.

04:42.970 --> 04:47.470
Now, consider the odds value, which is five is to three.

04:47.710 --> 04:51.400
An odds value will range from zero to one

04:52.480 --> 04:58.450
for the against scenario, and from one to infinity for the favorable scenario.

04:59.320 --> 05:01.840
So let's see.

05:02.960 --> 05:09.140
Say the odds value, the odds of winning,

05:10.640 --> 05:16.050
is the larger value, right; the odds of winning is the larger value here.

05:16.280 --> 05:24.680
So in this particular case, the favorable case will be a greater value divided by a smaller value.

05:25.500 --> 05:31.590
A favorable case will always be a greater value divided by a smaller value; only then will it be

05:31.590 --> 05:33.550
in favor of the team winning, right?

05:33.840 --> 05:40.950
So it will always be a greater number divided by a smaller number, which means that it will always

05:40.950 --> 05:42.030
be greater than one.

05:43.630 --> 05:47.890
So a value greater than one can range from one to infinity.

05:48.670 --> 05:56.740
Now, when we are discussing the against scenario, it will always be a smaller

05:56.740 --> 05:58.390
number divided by a larger one.

06:01.280 --> 06:09.020
So when this happens, the value will be between zero and one; a smaller number divided

06:09.020 --> 06:15.020
by a larger number will always give us a value ranging from zero to one.

06:15.950 --> 06:23.540
So this is what we get from odds: a value ranging from zero to one will always be present

06:23.540 --> 06:27.200
for the against scenario, and for the favorable scenario

06:27.200 --> 06:31.520
the odds will always range from one to infinity.

06:33.350 --> 06:33.860
Now.

06:39.670 --> 06:41.560
Now, when you see this.

06:42.440 --> 06:43.910
For the odds of winning,

06:44.840 --> 06:46.430
the odds of winning

06:47.500 --> 06:50.920
will be a favorable scenario, right? Now,

06:51.060 --> 06:51.930
let's go further.

06:54.310 --> 07:03.310
Now, let us say we have these equations which we have formulated: the probability that Y is equal

07:03.310 --> 07:10.810
to one given X is p, which is what we just called the probability of winning.

07:13.500 --> 07:24.000
If the probability that Y is equal to one given X is equal to p, then log of p

07:24.900 --> 07:31.770
divided by one minus p will be beta naught plus beta one x one plus beta two x two, and so on.

07:33.300 --> 07:37.200
So here what we are doing is we have this equation.

07:40.150 --> 07:46.960
Where we were saying that the probability is equal to beta naught plus beta one x one plus beta

07:46.960 --> 07:49.970
two x two plus beta three x three and so on.

07:50.350 --> 07:56.950
So we are saying that this is the probability value. Now, this probability value ranges from

07:56.950 --> 07:57.780
zero to one.

07:58.060 --> 08:02.660
But this equation has a range from minus infinity to infinity.

08:02.950 --> 08:12.430
So what we can do is bring both sides to the same range by taking the log of the odds.

08:13.550 --> 08:19.060
So if we take the log of the odds, then the range mismatch will be resolved.
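
NOTE
A small Python sketch of why the log of the odds resolves the range mismatch:
odds live between zero and infinity, while their log spans minus infinity to
infinity and is symmetric around p = 0.5. The name logit and the sample
probabilities are illustrative choices, not taken from the session.
```python
import math
def logit(p: float) -> float:
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))
# Odds stay positive; log-odds go negative below p = 0.5 and positive above it.
for p in (0.001, 0.375, 0.5, 0.625, 0.999):
    print(f"p={p:<6} odds={p / (1 - p):9.3f} log-odds={logit(p):7.3f}")
```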

08:20.100 --> 08:28.020
OK, so what we do here is take the log, and on taking the log and solving

08:28.020 --> 08:31.090
the equation for p, we get the equation:

08:31.150 --> 08:37.910
p is equal to one upon one plus e to the power minus of beta naught plus beta one x one, and

08:38.280 --> 08:38.880
so on.
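
NOTE
The step of solving for p can be written out as follows. This is a sketch of
the standard algebra, with z used as shorthand (an assumed symbol, not from the
session) for beta naught plus beta one x one plus beta two x two and so on.
```latex
\begin{align*}
\log\frac{p}{1-p} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots = z \\
\frac{p}{1-p} &= e^{z} \\
p &= (1-p)\,e^{z} \\
p\,(1 + e^{z}) &= e^{z} \\
p &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}
\end{align*}
```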

08:39.390 --> 08:44.700
And this is actually the logistic function which we have discussed.

08:45.210 --> 08:51.810
This is the logistic function we have formed from this equation, which will convert this straight line into

08:51.810 --> 08:55.530
this curved line, which will be able to map to the

08:56.710 --> 08:58.210
two classes which we have.

09:00.430 --> 09:04.600
So why do we do this with this equation? Now,

09:04.810 --> 09:12.070
the objective here is not to correctly estimate the value of log of p by one minus p; this is not

09:12.070 --> 09:14.290
the objective. We want to

09:14.290 --> 09:22.540
either maximize or minimize the value of p such that we get either a low probability or a high probability,

09:22.540 --> 09:27.100
that is, either something which is close to zero or something which is close to one.

09:27.910 --> 09:36.130
So here we will have the probability value as some value such as zero point one, zero point three, zero

09:36.130 --> 09:37.370
point five, zero point seven.

09:37.390 --> 09:42.010
So these would be different probability values, ranging from zero to one.

09:42.550 --> 09:44.930
Now we have these probability values.

09:45.160 --> 09:53.010
Now we want to tune the parameters such that when we take this log, the function will actually

09:53.040 --> 09:56.350
push the values closer to zero and one.

09:56.740 --> 10:04.950
So if the value is closer to zero, then it is more prominent that the answer is zero,

10:05.080 --> 10:06.610
that is, the class is zero.

10:06.610 --> 10:13.420
Whereas if the value of the probability is closer to one, that means that the answer is one,

10:13.420 --> 10:17.350
that is, the probability points to the first class.

10:19.110 --> 10:27.720
Now we have this likelihood function, which is created so that we can formulate the same thing.

10:29.370 --> 10:38.220
So here, the objective is not to correctly estimate the value of log of p divided by one minus p.

10:38.740 --> 10:48.210
OK, so we want the parameters to take values which result in a score or a probability which enables

10:48.210 --> 10:50.490
us to have a good cutoff value.

10:50.940 --> 10:59.610
So if the values are closer to zero or closer to one, then we can have a cutoff value which will

10:59.610 --> 11:03.090
be more prominent, which will be more helpful.

11:03.240 --> 11:09.740
If the cutoff value is not chosen properly, then some points might be misclassified.

11:09.930 --> 11:12.330
And that is something which we don't want to happen.

11:14.100 --> 11:20.140
So, meaning this score should be high for one class and low for the other class.

11:20.160 --> 11:26.010
We want to have a good distance between both of these score values.

11:26.920 --> 11:29.500
Now, let us consider this particular equation.

11:32.040 --> 11:41.520
So here, when Y is equal to zero, at that point the value of the likelihood

11:41.520 --> 11:43.590
(this function is called the likelihood function),

11:43.800 --> 11:48.600
that is, when Y is zero, the equation will reduce to one minus p.

11:49.910 --> 11:58.040
Because this exponent will be zero, this term will turn into one; the other exponent will be one minus zero.

11:58.050 --> 11:59.320
So this will turn into

12:01.210 --> 12:07.590
one minus p. The exponent is one minus zero, so it will be one, and we will get one minus p as the equation.

12:08.080 --> 12:12.130
Now, when the value of Y is one,

12:12.280 --> 12:16.090
what will happen? The first term will be the probability p.

12:16.390 --> 12:19.190
Now, this exponent will be one minus one, that is zero.

12:19.510 --> 12:22.690
So this term will give us one here.

12:23.470 --> 12:30.340
So that is why the entire likelihood function will reduce to p.
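
NOTE
The slide's equation is not captured in the transcript. The Python sketch
below assumes the standard single-observation form, p to the power y times
one minus p to the power one minus y, which reproduces both cases just
described. The value 0.8 is a hypothetical predicted probability.
```python
def likelihood(y: int, p: float) -> float:
    """Likelihood of one observation: p**y * (1 - p)**(1 - y)."""
    return (p ** y) * ((1.0 - p) ** (1 - y))
p = 0.8  # hypothetical predicted probability that y = 1
print(likelihood(1, p))  # the (1 - p) factor has exponent 0, leaving p
print(likelihood(0, p))  # the p factor has exponent 0, leaving 1 - p
```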

12:35.950 --> 12:44.650
So in other words, we want to maximize this likelihood function. We want to achieve this value because

12:44.650 --> 12:51.760
if we maximize the likelihood function, then the value of the likelihood will turn into either p or

12:51.760 --> 12:52.540
one minus p,

12:52.540 --> 12:57.250
which gives a good fit. That is, when Y is equal to one,

12:57.250 --> 13:02.410
the likelihood equals the probability of Y being one; when Y is equal to zero,

13:02.410 --> 13:07.220
the likelihood equals the probability that Y will be zero.

13:07.600 --> 13:11.160
So we want the probabilities to match the real outcome.

13:11.440 --> 13:14.990
Hence we want to maximize the value of L.

13:15.250 --> 13:17.410
So that is the value.

13:17.410 --> 13:21.700
When Y is equal to one, the likelihood will be the probability p.

13:21.910 --> 13:29.920
That is, p will go toward one, its maximum value, and then the one-minus-p term will diminish.

13:30.130 --> 13:31.510
That term will diminish.

13:31.750 --> 13:39.700
And when the value of Y is equal to zero, then this probability p will be reduced to almost zero.

13:40.300 --> 13:42.760
So this is what the likelihood will do for us.

13:43.030 --> 13:47.650
Now, this function which we have created is for only one observation.

13:48.760 --> 13:54.560
So we will have to convert it into a combination of all the observations.

13:55.570 --> 14:05.110
So we can do this by basically multiplying all of these terms together, because in terms

14:05.110 --> 14:09.520
of probability, when we want to combine different independent probabilities,

14:09.520 --> 14:11.860
we multiply these probabilities.

14:12.190 --> 14:19.300
In the earlier scenario, when we were combining the different terms for different values of beta,

14:19.600 --> 14:20.860
we used a summation.

14:21.130 --> 14:23.710
But in this case, we have probabilities,

14:23.710 --> 14:27.070
so we will be taking a product of the probabilities.

14:28.850 --> 14:32.330
So we will have to maximize this collective term.

14:32.510 --> 14:34.880
We will have to maximize this entire thing.

14:35.660 --> 14:40.010
So for maximizing this entire thing, we will be

14:41.380 --> 14:47.860
converting this into a cost function: we will be taking the log of this entire thing, and this becomes

14:47.860 --> 14:50.410
the cost function of this entire

14:51.290 --> 14:57.470
likelihood function for logistic regression. This is the cost function of logistic regression.
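
NOTE
A minimal Python sketch of the product over observations and its log. The
labels and probabilities are made-up illustrations. Flipping the sign and
averaging to get a cost to minimize is a common convention assumed here; the
session only says that the log is taken.
```python
import math
def likelihood(ys, ps):
    """Product of p**y * (1 - p)**(1 - y) over all observations."""
    result = 1.0
    for y, p in zip(ys, ps):
        result *= (p ** y) * ((1.0 - p) ** (1 - y))
    return result
def log_likelihood(ys, ps):
    """Log of the product, computed as a sum of per-observation logs."""
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y, p in zip(ys, ps))
def cost(ys, ps):
    """Average negative log-likelihood (assumed convention, often called log loss)."""
    return -log_likelihood(ys, ps) / len(ys)
ys = [1, 0, 1, 0]                   # hypothetical true labels
confident = [0.9, 0.1, 0.8, 0.2]    # probabilities that match the labels well
hesitant = [0.6, 0.4, 0.55, 0.45]   # probabilities bunched near 0.5
print(likelihood(ys, confident), cost(ys, confident))
print(likelihood(ys, hesitant), cost(ys, hesitant))
```
Under these assumptions the confident probabilities give the larger likelihood
and the smaller cost, which is the behavior the maximization is after.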

14:58.400 --> 15:04.340
So next are the different performance metrics, which we will be learning about later.

15:04.960 --> 15:12.200
But for now, what you need to understand is that even if you don't understand the finer details of

15:12.200 --> 15:19.790
the entire concept here, which we have just discussed, you just need to remember that our main purpose

15:19.820 --> 15:24.500
is to tune the values so that we get probability values

15:25.620 --> 15:29.700
which are closer to zero and one. That is because

15:30.970 --> 15:33.220
When we have this kind of plot.

15:34.160 --> 15:42.890
When we have this kind of plot, our main target is to set up a value which is a cutoff value.

15:44.740 --> 15:50.470
Now, the cutoff value will supposedly be in the middle of this line.

15:51.310 --> 15:57.940
So when the value is in the middle of this line, what we want to achieve is this: if the probability values

15:57.940 --> 16:04.570
are very low and very high, then wherever we put the cutoff value, these

16:04.570 --> 16:07.060
points will not get misclassified.

16:08.530 --> 16:17.260
If the distance is more, if the distance between these

16:17.260 --> 16:20.350
is more, then the line would look something like

16:22.330 --> 16:22.870
this.

16:24.530 --> 16:29.750
So this will be able to give a better decision boundary.

16:31.280 --> 16:38.410
So that is what we want to achieve: instead of having the values of the probabilities very close,

16:38.630 --> 16:40.980
we want to have the values of the probabilities

16:41.000 --> 16:47.210
far apart. When the values of the probabilities are far apart, the cutoff can range from anywhere

16:47.210 --> 16:48.590
to anywhere like this.

16:48.860 --> 16:55.300
And we will still be able to classify the data properly because now the data is far from each other,

16:55.520 --> 16:57.950
the probability values are far from each other.

16:57.980 --> 17:01.200
So any cutoff will still make that classification.

17:02.510 --> 17:08.720
But when those values are closer, like this, when the values are closer like this,

17:09.680 --> 17:14.750
In that case, if the values are closer like this, somewhat like this.

17:16.210 --> 17:22.200
Then the cutoff will have to be a very strict one; then it will be very important to find the perfect

17:22.210 --> 17:23.030
cutoff value.

17:23.620 --> 17:31.030
But in this scenario, because the values are already far apart, we can have any cutoff value between this

17:31.030 --> 17:35.170
and this, and any cutoff value will give a good result.
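
NOTE
The cutoff argument can be illustrated with a short Python sketch. The two
lists of probabilities are hypothetical, chosen only to contrast well
separated scores with scores bunched near the middle; the helper names are
also illustrative.
```python
def classify(probs, cutoff):
    """Label each probability 1 if it is at or above the cutoff, else 0."""
    return [1 if p >= cutoff else 0 for p in probs]
def errors(probs, labels, cutoff):
    """Count how many points the cutoff misclassifies."""
    return sum(pred != y for pred, y in zip(classify(probs, cutoff), labels))
labels = [0, 0, 0, 1, 1, 1]
far_apart = [0.05, 0.10, 0.15, 0.85, 0.90, 0.95]  # hypothetical, well separated
close = [0.42, 0.46, 0.49, 0.51, 0.54, 0.58]      # hypothetical, bunched together
# Each row prints: cutoff, errors when far apart, errors when close.
for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, errors(far_apart, labels, cutoff), errors(close, labels, cutoff))
```
With these numbers the separated scores are classified correctly at all three
cutoffs, while the bunched scores are only classified correctly at 0.5.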

17:35.890 --> 17:37.860
So that is what we want to obtain.

17:38.080 --> 17:45.310
And the only formula that you need to remember from this entire thing is the formula for the logistic function,

17:45.670 --> 17:47.860
that is, p equal to

17:48.980 --> 17:55.760
one upon one plus e to the power minus of beta naught plus beta one x one plus beta two x two and so on,

17:56.120 --> 18:02.780
where beta naught, beta one and so on are the coefficients which you will be estimating, and x one,

18:02.780 --> 18:09.920
x two, x three are the different values of the features which we have, the feature values, the attributes

18:09.920 --> 18:10.660
which we had.

18:10.910 --> 18:13.550
So it is similar to the linear regression.

18:13.550 --> 18:19.430
But the only thing is we have applied this logistic function on top of the normal linear regression

18:19.700 --> 18:23.120
so that we can get this kind of transformation here.

18:23.450 --> 18:27.950
This is the only thing which is important for you to understand here.
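
NOTE
A minimal Python sketch of the formula to remember: a linear combination of
the features passed through the logistic function. The coefficient values are
hypothetical; in practice they would be estimated from data.
```python
import math
def predict_probability(betas, xs):
    """Logistic function: 1 / (1 + e^-(b0 + b1*x1 + b2*x2 + ...))."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return 1.0 / (1.0 + math.exp(-z))
betas = [-1.0, 2.0, 0.5]  # hypothetical beta0, beta1, beta2
print(predict_probability(betas, [0.0, 0.0]))  # z = -1, below 0.5
print(predict_probability(betas, [0.5, 0.0]))  # z = 0, exactly 0.5
print(predict_probability(betas, [3.0, 2.0]))  # z = 6, close to 1
```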

18:29.380 --> 18:36.400
Now, in the next session, we will learn about different types of metrics which are available, and

18:36.400 --> 18:43.120
then we will go ahead with the implementation of this so that you will get a better picture of what

18:43.180 --> 18:44.430
we are trying to do here.
