1
00:00:00,300 --> 00:00:06,780
Hi, and welcome to the lesson on understanding overfitting and generalization, two very important

2
00:00:06,780 --> 00:00:12,030
topics in deep learning because overfitting is a constant problem with deep learning.

3
00:00:12,750 --> 00:00:15,930
So how do we make sure our model is good?

4
00:00:16,200 --> 00:00:18,870
What metric or metrics do we look at?

5
00:00:19,350 --> 00:00:24,540
So we've covered several different metrics to assess model performance, and in reality, there's no

6
00:00:24,540 --> 00:00:26,100
one metric you should be looking at.

7
00:00:26,310 --> 00:00:28,530
It all depends on what you're using your model for.

8
00:00:28,920 --> 00:00:32,490
As I showed you, in disease prediction false positives could be fine.

9
00:00:32,820 --> 00:00:37,980
However, for credit card fraud, a false positive might be annoying for a customer.

10
00:00:38,910 --> 00:00:44,550
With things like spam, you don't want to miss important emails by having them classified as spam, so

11
00:00:44,550 --> 00:00:50,010
your spam filters are always set to allow some of the spam to go through just so you don't miss important

12
00:00:50,010 --> 00:00:50,460
emails.

13
00:00:51,180 --> 00:00:54,600
So, as you can see, it depends quite a lot on what your model is doing.

14
00:00:55,330 --> 00:01:01,800
So generally, how do we know when we obtain good performance and how do we obtain good performance?

15
00:01:02,400 --> 00:01:03,750
Well, do we use a lot of data?

16
00:01:04,470 --> 00:01:06,470
Do we make deeper, more complicated models?

17
00:01:06,480 --> 00:01:08,280
Do we train for more epochs?

18
00:01:08,880 --> 00:01:11,110
Well, we can do all of those things.

19
00:01:11,130 --> 00:01:16,320
However, there are quite a few other techniques that reduce overfitting, which we'll discuss in the

20
00:01:16,320 --> 00:01:17,310
upcoming chapters.

21
00:01:17,700 --> 00:01:22,410
However, for now, let's define what overfitting and generalization are.

22
00:01:23,160 --> 00:01:27,420
So overfitting is a very common issue that plagues model development.

23
00:01:27,940 --> 00:01:30,720
It's the bane of all deep learning.

24
00:01:30,900 --> 00:01:33,720
You don't want to overfit to your training data, but what does that mean?

25
00:01:34,530 --> 00:01:40,230
It basically means that your model has been overtrained on the training data and ends up performing

26
00:01:40,230 --> 00:01:43,830
poorly on the test or validation dataset, that is, on unseen data.

27
00:01:45,270 --> 00:01:50,250
This often happens when we don't have enough data or we have used too many features or developed an

28
00:01:50,250 --> 00:01:51,720
overly complex model.

29
00:01:52,140 --> 00:01:56,400
So our model is very, very good at understanding the training data.

30
00:01:56,790 --> 00:02:02,640
However, it has basically learned the wrong things for predicting on new data.

31
00:02:02,820 --> 00:02:03,900
And that's the problem.

32
00:02:04,390 --> 00:02:09,690
It's either a problem of not having enough data or maybe having an overly complex model, or a number of other

33
00:02:09,690 --> 00:02:10,000
issues.

34
00:02:10,020 --> 00:02:11,610
But there are a lot of ways we can fix this.

35
00:02:11,610 --> 00:02:12,450
So don't worry.

36
00:02:13,500 --> 00:02:15,030
Let's talk about generalization, though.

37
00:02:15,630 --> 00:02:22,080
Basically, this is a measure of how well our model performs on unseen data. Models that overfit to the

38
00:02:22,080 --> 00:02:24,330
training data perform poorly on new data.

39
00:02:24,600 --> 00:02:25,590
We just said that.

40
00:02:25,980 --> 00:02:32,040
So that would mean they have poor generalization. A model that stores a lot of information has

41
00:02:32,040 --> 00:02:35,510
the potential to be more accurate by leveraging features.

42
00:02:35,520 --> 00:02:40,650
However, this puts it at a risk of storing a lot of irrelevant features.

43
00:02:41,040 --> 00:02:42,540
And I'll give you an example of that.

44
00:02:43,020 --> 00:02:44,190
Let's take a look at this.

45
00:02:44,220 --> 00:02:50,040
Imagine you were given three images of cars and three images of fire trucks, and you had to build, basically,

46
00:02:50,040 --> 00:02:51,510
a classifier between them.

47
00:02:52,740 --> 00:02:54,720
What if you get this input here?

48
00:02:54,890 --> 00:02:55,590
Red car?

49
00:02:56,670 --> 00:03:01,770
If your model learned to use the color red (as you can see, all these fire engines are red and all these

50
00:03:01,770 --> 00:03:08,010
cars have very little red) to identify fire trucks, then it might have classified this car as a fire

51
00:03:08,010 --> 00:03:08,370
truck.

52
00:03:08,820 --> 00:03:14,970
This is a very good example of having overfit your model to the training dataset.

53
00:03:15,510 --> 00:03:20,310
So it didn't learn the cars' different features; the model learned that color was the differentiating factor,

54
00:03:20,820 --> 00:03:26,190
which happens when we don't have enough data or we have an overly complex model that has been learning

55
00:03:26,190 --> 00:03:27,150
the wrong things.

56
00:03:27,860 --> 00:03:32,190
So let's look at a simple example.

57
00:03:32,790 --> 00:03:37,350
We have three models here, and the models are trained to distinguish between the green and the blue

58
00:03:37,350 --> 00:03:37,860
class.

59
00:03:38,340 --> 00:03:44,700
The model basically would put a plane or a decision boundary that separates these two classes.

60
00:03:45,270 --> 00:03:48,870
So let's look at model A. Model A is quite squiggly.

61
00:03:48,870 --> 00:03:51,660
You can see it's very good at this here.

62
00:03:51,750 --> 00:03:57,750
Model B is a little smoother and is still very good, but it misses this class here, and model C

63
00:03:57,750 --> 00:04:00,450
is a straight line here and misses a bunch of these classes here.

64
00:04:01,020 --> 00:04:02,880
So which model would you say is best?

65
00:04:03,360 --> 00:04:06,630
Well, think about it, and I'll reveal the answer on the next slide.

66
00:04:07,410 --> 00:04:12,270
So this model, the middle model, is the best-performing model, and I'll tell you why.

67
00:04:12,780 --> 00:04:19,980
Firstly, this model is overfit because, as you can see, a green dot that moved close to here would

68
00:04:19,980 --> 00:04:24,870
immediately be classed as blue when, in reality, it probably should be green, because you can see there

69
00:04:24,870 --> 00:04:27,630
is like a natural decision boundary right here.

70
00:04:28,200 --> 00:04:30,120
This blue class can be an anomaly.

71
00:04:30,330 --> 00:04:33,110
We don't want to be fitting our predictions to anomalies.

72
00:04:33,120 --> 00:04:38,550
We don't want to consider these anomalies as having a big impact on our model because that affects our

73
00:04:38,550 --> 00:04:39,490
generalization.

74
00:04:39,510 --> 00:04:46,050
So we basically want our model to pull back a bit on the squiggliness, or basically the sensitivity,

75
00:04:46,290 --> 00:04:48,170
of this decision boundary.

76
00:04:48,870 --> 00:04:50,820
And we want this to be nice and smooth.

77
00:04:50,940 --> 00:04:53,460
This gives us good real-world performance.

78
00:04:54,090 --> 00:04:58,890
An underfitted model basically is a straight line here, and it doesn't take a lot of things into

79
00:04:58,890 --> 00:04:59,580
consideration.

80
00:05:00,040 --> 00:05:03,910
It happens if your model is too shallow or you're using too simple a model

81
00:05:04,680 --> 00:05:11,470
in that case. And this one here basically shows high variance, meaning it's high in squiggliness

82
00:05:11,470 --> 00:05:18,130
or variability, and it fits the noise, as I described. This one has low variance, and this

83
00:05:18,130 --> 00:05:19,810
one's a good balance between them.

84
00:05:21,100 --> 00:05:23,350
So let's look at this chart.

85
00:05:24,340 --> 00:05:25,630
What is this telling us?

86
00:05:25,720 --> 00:05:27,490
Let's take a look at the legend here.

87
00:05:28,030 --> 00:05:36,520
You have test loss, which seems a bit unbalanced and is going up as the epochs go on, and then you have test

88
00:05:36,520 --> 00:05:41,200
accuracy, which again seems to be hovering below 50 percent, which isn't great.

89
00:05:41,650 --> 00:05:48,390
But look at the training lines. The training accuracy is close to 100 percent.

90
00:05:48,390 --> 00:05:53,640
In fact, it seems like it reaches a hundred percent here, and training loss is quite low.

91
00:05:53,660 --> 00:05:55,210
It goes almost all the way down to zero.

92
00:05:55,870 --> 00:06:00,340
This, I mean, this probably wouldn't happen in the real world with most datasets and most models,

93
00:06:00,760 --> 00:06:04,720
but it's basically an embellishment of what can happen in training.

94
00:06:04,960 --> 00:06:10,540
When you're overfitting a model, you can see we have excellent, excellent performance on the training

95
00:06:10,540 --> 00:06:17,950
dataset, but pretty poor performance on the test dataset. Our loss is going up as the epochs go on, and our

96
00:06:17,950 --> 00:06:20,650
test accuracy is below 50 percent, which isn't good.

97
00:06:21,010 --> 00:06:22,960
Basically, it was higher when we started.

98
00:06:23,770 --> 00:06:30,700
So you can see that these basically are the good values here and these are the bad values here.

99
00:06:31,000 --> 00:06:33,070
We don't want this. We want the opposite.

100
00:06:33,340 --> 00:06:39,430
We want our test loss here to be going down, just like the training loss.

101
00:06:39,760 --> 00:06:43,780
And we want our test accuracy to be going up just like our training accuracy.

102
00:06:44,080 --> 00:06:44,950
When I say going up,

103
00:06:44,950 --> 00:06:49,210
I mean going up over the epochs, as we train for a larger number of epochs.

104
00:06:49,600 --> 00:06:52,780
We want all of these metrics to go either up or down accordingly.

105
00:06:53,500 --> 00:06:55,810
So this is an example of overfitting.

106
00:06:56,590 --> 00:06:58,720
So how do we avoid overfitting?

107
00:06:59,320 --> 00:07:03,550
Well, generally, avoiding overfitting is best done by a combination of methods.

108
00:07:04,000 --> 00:07:07,030
Using larger datasets and reducing model complexity is one way.

109
00:07:07,330 --> 00:07:13,300
However, there's also regularization, which is a broad category of techniques where we control model complexity

110
00:07:13,660 --> 00:07:16,960
and ensure the model is using the right features to classify the objects.

111
00:07:17,500 --> 00:07:19,600
So let's take a look at this example here.

112
00:07:20,680 --> 00:07:22,390
It's a dog classifier model.

113
00:07:22,960 --> 00:07:28,720
So we'd like the classifier to look at the overall shape of the dog, which means its tail, nose, and mouth,

114
00:07:29,170 --> 00:07:36,190
and important features like the overall body type, to identify dogs, and not associate them with things

115
00:07:36,190 --> 00:07:42,010
that are commonly found in dog images, such as grass or trees. An overfit model may predict that

116
00:07:42,010 --> 00:07:48,550
this grass here has a dog, because it saw grass in all these dog images before.

117
00:07:48,940 --> 00:07:50,890
That's basically a bad model.

118
00:07:52,060 --> 00:07:54,730
And this is a funny example of how to confuse machine learning.

119
00:07:55,060 --> 00:07:57,010
Posted by Dr. Julia Show on Twitter.

120
00:07:57,940 --> 00:08:04,480
This image shows you that this classifier had a lot of trouble predicting what

121
00:08:04,480 --> 00:08:08,920
was a labradoodle and what was fried chicken with some eyes on it.

122
00:08:09,730 --> 00:08:12,340
So this is pretty funny, in my opinion.

123
00:08:13,080 --> 00:08:16,000
Basically, again, this fools machine learning computer vision algorithms,

124
00:08:16,000 --> 00:08:21,310
these very deep models, by just putting some little eyes on fried chicken and making them think that it's a dog.

125
00:08:21,730 --> 00:08:26,110
So, some things to note about computer vision models.

126
00:08:26,110 --> 00:08:27,320
They're not infallible.

127
00:08:27,730 --> 00:08:30,700
They do have a lot of issues still, and they can be defeated.

128
00:08:31,120 --> 00:08:36,310
But generally, they're quite good at most image recognition tasks.

129
00:08:36,910 --> 00:08:42,910
So next, we'll take a look at the regularization methods that help the generalization of our models.

130
00:08:43,330 --> 00:08:45,880
There are quite a few methods, so stay tuned for this lesson.

131
00:08:46,030 --> 00:08:46,480
Thank you.