1
00:00:00,810 --> 00:00:07,140
Hi, and welcome to an introduction to another regularization technique, called early stopping.

2
00:00:07,710 --> 00:00:11,460
This is a very easy one to implement and understand, as I said previously.

3
00:00:11,910 --> 00:00:17,100
So, the overfitting problem: generally, at some point during the training process, let's say we set

4
00:00:17,100 --> 00:00:21,900
the epochs at 100, at some point, maybe at 40, 50, or 60 epochs,

5
00:00:22,440 --> 00:00:28,380
our validation loss is going to stagnate or stop decreasing, or even sometimes start to increase,

6
00:00:28,380 --> 00:00:29,390
which is not a good thing.

7
00:00:29,400 --> 00:00:32,700
It means that the model is overfitting on the training dataset at that point.

8
00:00:33,450 --> 00:00:39,930
So at that point, you really should stop training. You're wasting time, wasting computational resources,

9
00:00:39,930 --> 00:00:43,020
wasting electricity, contributing to climate change at that point.

10
00:00:43,020 --> 00:00:45,810
So we need to find a way to stop at that point.

11
00:00:45,840 --> 00:00:46,920
So how do we do that?

12
00:00:46,980 --> 00:00:53,520
Well, basically, this technique where we stop training early is called early

13
00:00:53,520 --> 00:00:53,920
stopping.

14
00:00:53,940 --> 00:00:54,690
It's quite simple.

15
00:00:55,140 --> 00:00:56,430
So here's how it's implemented.

16
00:00:56,580 --> 00:01:02,090
So the diagram here shows typical validation error and training error curves.

17
00:01:02,100 --> 00:01:09,870
You can see training error continuously goes down, because if we give it so many epochs to train, it's

18
00:01:09,870 --> 00:01:12,120
going to keep getting better and better on the training dataset.

19
00:01:12,240 --> 00:01:19,380
However, at some point it's going to stop getting better on the validation set. This point

20
00:01:19,380 --> 00:01:22,320
here, k minus p, is where to stop.

21
00:01:22,740 --> 00:01:29,700
Let's say it stops here initially, and p is where basically it's stopped improving and stopped reducing validation

22
00:01:29,700 --> 00:01:30,000
error.

23
00:01:30,570 --> 00:01:32,010
This is where we want to stop.

24
00:01:33,060 --> 00:01:38,130
Basically, this is simply the area where we want to stop, and it's a simple algorithm for

25
00:01:38,280 --> 00:01:40,710
telling your model when to stop training.

26
00:01:41,460 --> 00:01:47,010
How it works is basically we just have a simple conditional loop if you want to manually implement it

27
00:01:47,010 --> 00:01:48,030
in your training process.

28
00:01:48,480 --> 00:01:55,650
That says: if our validation loss has stopped decreasing after X number of epochs, let's just stop

29
00:01:55,650 --> 00:01:59,550
the model and make sure we've saved the best weights, where the loss was lowest.
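
The check described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's own code: the `EarlyStopper` class, its `patience` and `min_delta` parameters, and the made-up loss values are all hypothetical, and the `weights` dictionary is only a placeholder for whatever a real training loop would save.

```python
import copy


class EarlyStopper:
    """Hypothetical helper: tracks validation loss and signals when to stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_weights = None
        self.bad_epochs = 0           # consecutive epochs without improvement

    def step(self, epoch, val_loss, weights):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember the loss and keep a copy of the weights.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_weights = copy.deepcopy(weights)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for a real training loop.
fake_val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60, 0.65]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(fake_val_losses):
    # Placeholder; in PyTorch this could be model.state_dict().
    weights = {"epoch": epoch}
    if stopper.step(epoch, val_loss, weights):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch {stopper.best_epoch}")
```

With these made-up losses and a patience of 3, the loop stops at epoch 6 and keeps the weights from epoch 3, where the validation loss was lowest.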

30
00:02:00,030 --> 00:02:06,570
That's all it is. Early stopping is basically a very simple way to reduce overfitting, as I

31
00:02:06,570 --> 00:02:12,420
said, because at that point it stops the model from overfitting on the training dataset, and it can

32
00:02:12,420 --> 00:02:18,650
be manually implemented in PyTorch because, as you saw in the PyTorch code, we manually

33
00:02:18,680 --> 00:02:20,280
train it on every epoch.

34
00:02:20,280 --> 00:02:24,570
You can actually see the code broken down as we go: forward pass, backpropagation, all of those things.

35
00:02:24,570 --> 00:02:30,630
So we have easy access to the losses, including the validation losses, so we can easily add some

36
00:02:30,630 --> 00:02:34,590
code that checks whether the validation loss has stopped decreasing, and then stop training.

37
00:02:35,190 --> 00:02:42,510
Or, and this is not as commonly known, PyTorch does have some

38
00:02:42,520 --> 00:02:48,330
high-level libraries, Lightning being one of them, which we'll be learning in this course, that allow us to

39
00:02:48,330 --> 00:02:52,950
implement these features quite easily, because early stopping is actually very easy to implement there.

40
00:02:53,430 --> 00:02:54,390
It doesn't take much work.

41
00:02:54,840 --> 00:03:00,300
It uses something called callbacks, and we can do a number of things with callbacks such as saving

42
00:03:00,300 --> 00:03:03,600
the best model, early stopping and a bunch of other things.

43
00:03:04,140 --> 00:03:05,520
So it's easy to implement.

44
00:03:05,730 --> 00:03:08,640
It's not a big concern you should have right now.

45
00:03:09,420 --> 00:03:11,130
So that's it for this lesson.

46
00:03:12,000 --> 00:03:17,400
Just to reiterate, early stopping is something that keeps the model from overfitting on the training data;

47
00:03:17,550 --> 00:03:22,260
overfitting means that you're learning patterns of noise in your training data.

48
00:03:23,040 --> 00:03:26,730
Next, we'll take a look at batch normalization.

49
00:03:27,330 --> 00:03:29,370
So thank you, and I'll see you in the next lesson.
