1
00:00:11,670 --> 00:00:17,010
In this lecture we are going to look at an alternative and better way of building a logistic regression

2
00:00:17,010 --> 00:00:19,710
model in PyTorch. As usual,

3
00:00:19,710 --> 00:00:25,050
you can look at the title of the notebook to determine what notebook we are currently looking at. The

4
00:00:25,050 --> 00:00:32,270
motivation behind doing this is that when we perform the sigmoid we have to do an exponential operation.

5
00:00:32,280 --> 00:00:37,500
The problem with that is that exponentials are numerically unstable due to the fact that they grow

6
00:00:37,500 --> 00:00:41,880
very fast with respect to the input.

7
00:00:41,890 --> 00:00:48,340
In addition, the log term in the binary cross entropy is also unstable for similar reasons.

8
00:00:48,340 --> 00:00:54,610
Well, it turns out that because the binary cross entropy loss has a log in it, it almost cancels out the

9
00:00:54,610 --> 00:00:59,550
exponential in some sense, although we won't go through any derivations in this course.

10
00:00:59,650 --> 00:01:05,140
The end result is that if you know you have to do both the sigmoid and then the binary cross entropy

11
00:01:05,140 --> 00:01:11,640
loss right after, there is a way to combine them into a single function so that you avoid numerical instability.

12
00:01:12,660 --> 00:01:13,430
In particular,

13
00:01:13,440 --> 00:01:18,410
you can express the loss only in terms of the activation, which is also called the logit

14
00:01:18,420 --> 00:01:23,560
in this case. If you've ever studied statistics, then you may have heard of this term before.

15
00:01:23,580 --> 00:01:25,400
Otherwise don't worry.

16
00:01:25,410 --> 00:01:31,140
Basically, the logit is just the input into the logistic function, and the logistic function is just another

17
00:01:31,140 --> 00:01:35,930
way of saying the sigmoid function.

18
00:01:35,950 --> 00:01:42,040
So in this script, pretty much everything is exactly the same except in two places. The first place is

19
00:01:42,040 --> 00:01:43,150
when we create the model.

20
00:01:52,640 --> 00:01:57,200
So before, we used Sequential and combined the linear layer with a sigmoid layer.

21
00:01:57,800 --> 00:02:04,190
But in this script, we no longer need the sigmoid, since that's included in the cost calculation, and therefore

22
00:02:04,190 --> 00:02:07,390
our model goes back to being just a linear model.

23
00:02:07,610 --> 00:02:10,380
The output of this model is the logit.

24
00:02:10,400 --> 00:02:18,030
Then, when we create the loss function, we use BCEWithLogitsLoss instead of BCELoss like we were earlier.

25
00:02:18,380 --> 00:02:23,740
As its name suggests, this function calculates the binary cross entropy loss directly from the

26
00:02:23,740 --> 00:02:24,230
logits.

27
00:02:27,170 --> 00:02:30,110
As you can see, everything after this is the same as well.

28
00:02:31,130 --> 00:02:34,100
So our training loop is the same and we get the same results.

29
00:02:43,650 --> 00:02:48,060
One more difference we have to pay attention to is when we make predictions.

30
00:02:48,060 --> 00:02:54,990
If you recall, the sigmoid always outputs a number between 0 and 1, and we interpret that as a probability.

31
00:02:54,990 --> 00:02:59,280
So when we want to make a prediction we simply round those probabilities.

32
00:02:59,280 --> 00:03:02,460
Anything greater than 0.5 is a prediction of 1.

33
00:03:02,610 --> 00:03:05,930
And anything less than 0.5 is a prediction of 0.

34
00:03:06,060 --> 00:03:11,040
But what do our predictions look like now, since the output can again be any number?

35
00:03:11,400 --> 00:03:15,120
In this case we just go back to our geometrical picture.

36
00:03:15,150 --> 00:03:17,540
You can also think of what the sigmoid actually does.

37
00:03:17,610 --> 00:03:20,560
And if you follow that logic this should make sense.

38
00:03:22,040 --> 00:03:28,160
If you recall, anything on one side of the hyperplane defined by our model is positive, and anything on

39
00:03:28,160 --> 00:03:30,140
the other side is negative.

40
00:03:30,140 --> 00:03:35,040
The sigmoid simply maps all the positive numbers to probabilities greater than 0.5.

41
00:03:35,210 --> 00:03:40,150
And similarly, it maps all the negative numbers to probabilities less than 0.5.

42
00:03:40,160 --> 00:03:42,540
Thus it's very easy to make predictions.

43
00:03:42,590 --> 00:03:46,960
We simply check if the model output is greater than zero or less than zero.

44
00:03:47,060 --> 00:03:53,370
Anything greater than zero is considered a one prediction and anything less than zero is a zero prediction.

45
00:03:53,390 --> 00:03:58,340
Note that we can use the greater than sign directly, even though the greater than sign returns Trues

46
00:03:58,340 --> 00:04:04,310
and Falses. As we discussed earlier, in Python, True is 1 and False is 0.

47
00:04:04,310 --> 00:04:07,310
Therefore we don't need to do any further processing.

48
00:04:07,310 --> 00:04:12,200
Once we have our predictions, we can check the model accuracy, and again we can verify that the result

49
00:04:12,230 --> 00:04:12,890
is the same.
