1
00:00:00,080 --> 00:00:01,970
In the section on corrective measures.

2
00:00:01,970 --> 00:00:09,920
We shall start by commenting on our loss and validation plots.

3
00:00:09,920 --> 00:00:18,830
So we saw already that our model was underperforming, and looking at the loss versus epoch plot, we

4
00:00:18,830 --> 00:00:25,850
could see clearly that our loss values were much higher than what we could potentially get.

5
00:00:26,090 --> 00:00:31,310
And so this kind of scenario where we have both the training and the validation.

6
00:00:31,310 --> 00:00:35,180
So here we have train and then here we have validation.

7
00:00:35,180 --> 00:00:42,290
So when we have these training and validation plots, or training and validation loss values, which are

8
00:00:42,290 --> 00:00:45,410
way higher than what we would expect.

9
00:00:45,410 --> 00:00:49,970
In that case, we call this underfitting.

10
00:00:49,970 --> 00:00:58,940
And this is because in the worst case scenario, our model at least should be able to learn from the

11
00:00:58,940 --> 00:00:59,990
training data.

12
00:00:59,990 --> 00:01:01,490
Well, here we could interchange this.

13
00:01:01,490 --> 00:01:03,950
We could have the validation and the training below.

14
00:01:03,950 --> 00:01:10,700
But the point here is that the model underperforms in both the training and validation sets.

15
00:01:10,700 --> 00:01:17,600
Now generally when we have this kind of scenario where the model underperforms even on the training

16
00:01:17,600 --> 00:01:25,220
data, this means that the model is not suited for the problem, or the model may be too simple.

17
00:01:25,220 --> 00:01:34,730
And by simple here, we mean that these nine parameters, m1 to m8 plus c, aren't enough

18
00:01:34,730 --> 00:01:40,250
to learn from our 1,000-sample data set.

19
00:01:40,250 --> 00:01:48,530
That is, to capture all the patterns necessary to take a single sample like this

20
00:01:48,530 --> 00:01:53,930
one and then predict a reasonable price, which is close to what's expected.

21
00:01:53,930 --> 00:02:02,900
So when our model is underfitting and not able to learn these patterns from our data set, what we do

22
00:02:02,900 --> 00:02:05,270
is we build a more complex model.

23
00:02:05,270 --> 00:02:12,530
And now this more complex model will be made of many more layers, hence rendering our deep learning

24
00:02:12,530 --> 00:02:14,630
model much deeper.

25
00:02:14,630 --> 00:02:18,710
So let's take this here and just pull this up here.

26
00:02:18,710 --> 00:02:21,560
You see we have all these inputs.

27
00:02:21,560 --> 00:02:24,380
We have these weights and then we have this output.

28
00:02:24,380 --> 00:02:29,870
Then, instead of just taking this as our output, we're going to add two other outputs.

29
00:02:29,870 --> 00:02:32,510
So now we have three outputs.

30
00:02:32,510 --> 00:02:38,540
Let's pull this up here. We have this one, see, and we have this one.

31
00:02:38,540 --> 00:02:44,990
And notice how each input is linked to every output.

32
00:02:44,990 --> 00:02:48,200
So take this here and put it out here, see.

33
00:02:48,200 --> 00:02:53,360
Again, each and every input here is linked to all three outputs.

34
00:02:53,360 --> 00:02:57,590
And so what this means is we are going from eight weights here.

35
00:02:57,620 --> 00:02:59,780
That's keeping aside the bias.

36
00:02:59,780 --> 00:03:04,940
We're going from eight, because we had m1 to m8, now to 24.

37
00:03:04,940 --> 00:03:08,360
So instead of eight, we now have 24.

38
00:03:08,360 --> 00:03:12,470
And then on this other side, what we'll do is we'll take this again.

39
00:03:12,470 --> 00:03:15,110
So let's drag this here.

40
00:03:15,230 --> 00:03:16,160
There we go.

41
00:03:16,160 --> 00:03:17,480
We have this one.

42
00:03:17,480 --> 00:03:19,940
We have this one.

43
00:03:20,810 --> 00:03:22,040
There we go.

44
00:03:22,040 --> 00:03:24,500
And then we have this other one.

45
00:03:25,650 --> 00:03:27,000
Well, we also have this other one.

46
00:03:27,000 --> 00:03:28,230
So let's just pull this up here.

47
00:03:28,230 --> 00:03:29,010
And that's fine.

48
00:03:29,010 --> 00:03:32,910
Again, we have all of these linked to one another.

49
00:03:32,910 --> 00:03:33,960
So you see that.

50
00:03:33,960 --> 00:03:38,250
Now here, in this intermediate layer,

51
00:03:38,250 --> 00:03:40,560
we have 24 parameters.

52
00:03:40,560 --> 00:03:43,320
And in this one we have three times four.

53
00:03:43,320 --> 00:03:45,240
That's 12 parameters.

54
00:03:45,240 --> 00:03:46,440
Here we have 12.

55
00:03:46,470 --> 00:03:47,880
Here we have 24.

56
00:03:47,880 --> 00:03:53,940
And then for this last layer we have four times one, that's four.
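The weight counts for the network drawn here (8 inputs, then layers of 3, 4, and 1 units) can be checked with a few lines of Python. This is only a sketch: the names `layer_sizes` and `weights_per_layer` are chosen here for illustration, and biases are left out, as in the count above.

```python
# Layer widths as drawn: 8 inputs, layers of 3 and 4 units, then 1 output.
layer_sizes = [8, 3, 4, 1]

# In a fully connected layer every input is linked to every output,
# so the number of weights is inputs * outputs (biases excluded).
weights_per_layer = [
    n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(weights_per_layer)       # [24, 12, 4]
print(sum(weights_per_layer))  # 40
```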

57
00:03:53,940 --> 00:03:55,170
So let's get back.

58
00:03:55,170 --> 00:03:58,110
Let's drag this up here and there we go.

59
00:03:58,110 --> 00:04:05,220
So you see that now we have a much deeper model with several hidden layers.

60
00:04:05,460 --> 00:04:07,170
We now have hidden layers.

61
00:04:07,170 --> 00:04:11,040
And then we have our inputs and the output.

62
00:04:11,040 --> 00:04:19,230
And one great thing about TensorFlow is this: before, we had a single Dense layer.

63
00:04:19,230 --> 00:04:22,710
And it had a single output.

64
00:04:22,710 --> 00:04:24,630
And this was in a list.

65
00:04:24,840 --> 00:04:29,640
Now we're going to have that same list with a Dense.

66
00:04:29,730 --> 00:04:33,270
But now we have three outputs.

67
00:04:33,270 --> 00:04:35,010
So here we specify three.

68
00:04:35,580 --> 00:04:39,000
And then for the next one, we go on from three.

69
00:04:39,000 --> 00:04:45,270
That's this layer here, with three outputs. For the next Dense, we're going to have four

70
00:04:45,270 --> 00:04:46,110
outputs.

71
00:04:46,110 --> 00:04:50,370
And then we'll end up with a Dense with one output.

72
00:04:50,370 --> 00:04:51,840
So we have four.

73
00:04:51,840 --> 00:04:56,160
And then we have a dense with just a single output.

74
00:04:56,160 --> 00:04:59,010
So here we have dense and we have one output.
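The list of Dense layers described here might look like the following in Keras. This is a hedged sketch rather than the exact code on screen: the input width of 8 is an assumption based on the weights m1 to m8, and the normalizer mentioned elsewhere in the talk is left out to keep the example short.

```python
import tensorflow as tf

# Assumption: 8 input features, matching the weights m1 to m8.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(3),  # first layer: 3 outputs
    tf.keras.layers.Dense(4),  # second layer: 4 outputs
    tf.keras.layers.Dense(1),  # output layer: a single predicted value
])

model.summary()
```

Going from one Dense layer to three only adds two entries to the list, which is why the code barely changes while the model gets deeper.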

75
00:04:59,010 --> 00:05:04,890
So we're going from this line of code to this other one, which doesn't change much at the level of

76
00:05:04,890 --> 00:05:06,930
the coding because we just have to add this.

77
00:05:06,930 --> 00:05:13,890
But actually our model has become much deeper. By deeper, we mean increasing the number of layers: as we

78
00:05:13,890 --> 00:05:20,880
go in this direction, to the right, we go deeper and our model becomes much more complex.

79
00:05:20,880 --> 00:05:28,320
And so it's able to learn the patterns in our data set much more easily now, compared to this

80
00:05:28,320 --> 00:05:33,660
very simple model, which wasn't capable of learning those complex patterns in our data set.

81
00:05:33,660 --> 00:05:36,180
So far we've been dealing with just linear functions.

82
00:05:36,180 --> 00:05:42,270
So essentially all we've been doing is mx + c, which is a linear function.

83
00:05:42,270 --> 00:05:50,010
But to increase the complexity of our model, we could add some non-linearity, which we'll call big

84
00:05:50,010 --> 00:05:50,370
N.

85
00:05:50,370 --> 00:05:58,770
So if we add a nonlinear function or if after getting our outputs here, that's for each and every neuron

86
00:05:58,770 --> 00:06:03,840
we have here, we pass this into a nonlinear function.

87
00:06:03,840 --> 00:06:06,300
Because obviously this is a linear function here.

88
00:06:06,300 --> 00:06:12,390
If we pass it into a nonlinear function, we shall make our model overall much more complex and hence

89
00:06:12,390 --> 00:06:15,930
able to learn more complex patterns in our data set.

90
00:06:15,930 --> 00:06:19,740
Some of the commonly used activation functions,

91
00:06:19,740 --> 00:06:29,910
or nonlinear functions, are the sigmoid, the tanh, and the ReLU. The ReLU, for example, is just

92
00:06:29,910 --> 00:06:39,510
y = x if x is positive and y = 0 if x is negative. The tanh is (e^x - e^-x) divided by

93
00:06:39,510 --> 00:06:44,010
(e^x + e^-x), going from -1 to 1. For the sigmoid,

94
00:06:44,010 --> 00:06:45,180
we saw this already.

95
00:06:45,390 --> 00:06:49,650
It's 1 / (1 + e^-x), and it goes from 0 to 1.
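The three activation functions named here can be written directly from their formulas. This is a small sketch in plain Python; the function names and the sample values in `zs` are chosen here for illustration, and each `z` stands for the result of mx + c that gets passed into the activation.

```python
import math

def relu(z: float) -> float:
    # y = z if z is positive, y = 0 otherwise.
    return z if z > 0 else 0.0

def tanh(z: float) -> float:
    # (e^z - e^-z) / (e^z + e^-z), ranging from -1 to 1.
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def sigmoid(z: float) -> float:
    # 1 / (1 + e^-z), ranging from 0 to 1.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values of mx + c, used only to show the shape of each function.
zs = [-2.0, 0.0, 2.0]
for z in zs:
    print(z, relu(z), tanh(z), sigmoid(z))
```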

96
00:06:49,650 --> 00:06:53,730
So when we compute m x plus c.

97
00:06:55,290 --> 00:06:57,720
This here becomes our x.

98
00:06:57,720 --> 00:07:00,210
And then we pass this in here.

99
00:07:00,210 --> 00:07:02,040
So we pass this in here.

100
00:07:02,040 --> 00:07:04,740
And then we obtain the sigmoid output.

101
00:07:04,740 --> 00:07:05,970
Same with the tanh.

102
00:07:06,000 --> 00:07:08,370
We take this x, we pass it in here.

103
00:07:08,370 --> 00:07:11,940
We pass it in here, we pass it in here, pass it in here.

104
00:07:11,940 --> 00:07:13,680
And we obtain the tanh output.

105
00:07:13,680 --> 00:07:18,360
And then with the ReLU, it's going to remain the same

106
00:07:18,810 --> 00:07:25,470
if all of this is positive, and if all of this is negative, then it's going to be zero.

107
00:07:25,500 --> 00:07:26,880
Let's dive into the code.

108
00:07:26,880 --> 00:07:32,880
We're just going to change this and say 128 for example, we could have 128.

109
00:07:32,880 --> 00:07:34,650
We're going to stack three of these.

110
00:07:34,650 --> 00:07:36,180
And then with one output.

111
00:07:36,180 --> 00:07:38,460
So we have 128, 128, 128.

112
00:07:38,460 --> 00:07:42,360
And then our output must be one because that's the nature of our data.

113
00:07:42,360 --> 00:07:49,230
Because our data expects an input of a certain shape, and the output must also

114
00:07:49,230 --> 00:07:51,570
be of this shape.

115
00:07:51,570 --> 00:07:55,560
So if you don't have this kind of shape, then you would get an error.

116
00:07:55,560 --> 00:07:56,400
So that's it.

117
00:07:56,400 --> 00:07:58,230
We just stack this up like this.

118
00:07:58,260 --> 00:07:59,070
This is very easy.

119
00:07:59,070 --> 00:08:00,960
You could stack as many as you want.

120
00:08:00,960 --> 00:08:01,920
That's all

121
00:08:01,920 --> 00:08:02,940
you need to do.

122
00:08:02,940 --> 00:08:04,710
So there we go.

123
00:08:04,710 --> 00:08:08,310
We have three of these, and then we run that.

124
00:08:08,310 --> 00:08:18,000
And you could see now that we have 34,305 trainable parameters. We still have our 17 non-trainable

125
00:08:18,000 --> 00:08:19,770
parameters from our normalizer.
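The trainable parameter count can be reproduced by hand. The sketch below assumes 8 input features (consistent with m1 to m8) and counts one bias per unit; `dense_params` and `layer_sizes` are names introduced here for illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    # One weight per input-output link, plus one bias per output unit.
    return n_in * n_out + n_out

# Assumed layout: 8 inputs, three layers of 128 units, then 1 output.
layer_sizes = [8, 128, 128, 128, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [1152, 16512, 16512, 129]
print(sum(per_layer))  # 34305
```

The 17 non-trainable parameters would be consistent with a normalizer that stores a mean and a variance for each of the 8 features plus one counter (2 * 8 + 1), though that breakdown is an assumption about how the normalizer is set up.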

126
00:08:19,770 --> 00:08:21,300
We could reduce this.

127
00:08:21,300 --> 00:08:23,460
Let's say we take 28.

128
00:08:24,420 --> 00:08:26,400
Or we could also have different values.

129
00:08:26,400 --> 00:08:28,470
So we could take 53 for example.

130
00:08:28,470 --> 00:08:30,900
And you see we have fewer parameters.

131
00:08:30,900 --> 00:08:35,250
Anyways let's get back to 128 128 and 128.

132
00:08:35,250 --> 00:08:36,660
And we run that.

133
00:08:36,660 --> 00:08:38,010
We have our summary.

134
00:08:38,010 --> 00:08:39,210
We check out our plot.

135
00:08:39,210 --> 00:08:44,340
You see now that it's going to be different from what we had previously, because we've made

136
00:08:44,340 --> 00:08:45,240
some changes.

137
00:08:45,240 --> 00:08:51,150
Then we go ahead and compile the model and we start with the training.

138
00:08:51,150 --> 00:08:54,360
Let's train for about 200 epochs.

139
00:08:54,360 --> 00:09:01,050
After the training is complete, you could now see that we have a much better performance as our loss

140
00:09:01,050 --> 00:09:04,350
values have dropped substantially.

141
00:09:04,350 --> 00:09:06,330
You could also check out the root mean square error.

142
00:09:06,330 --> 00:09:09,810
You see, there's this great drop compared to what we had previously.

143
00:09:09,810 --> 00:09:14,730
We evaluate, and then we check out these plots.

144
00:09:14,730 --> 00:09:21,420
So you see now that the actual values and what the model predicts are not too far from each other, as compared

145
00:09:21,420 --> 00:09:23,250
to the previous model.

146
00:09:23,250 --> 00:09:28,590
Now let's go ahead and add the non-linearities and then retrain again this time around just for 100

147
00:09:28,590 --> 00:09:34,920
epochs, as we see that even after about 20 epochs, the model doesn't change much.

148
00:09:34,920 --> 00:09:38,670
So let's go ahead and train for 100 epochs, now with

149
00:09:38,670 --> 00:09:43,950
the non-linearities added. To add the non-linearities, we simply use activations.

150
00:09:43,950 --> 00:09:46,410
So you have activation and then you specify ReLU.

151
00:09:46,410 --> 00:09:47,070
That's all.

152
00:09:47,070 --> 00:09:51,840
So here we have activation and there we go.

153
00:09:51,840 --> 00:09:59,970
We have activation and ReLU, activation and ReLU.

154
00:09:59,970 --> 00:10:06,450
Now for this last one, we don't need to have the activation, as our values should all be positive.

155
00:10:06,450 --> 00:10:09,990
So we just leave that as it is, and that's fine.
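Adding the activations as described might look like this in Keras. It is a sketch under assumptions, not the exact notebook code: the input width of 8, the Adam optimizer, and the mean absolute error loss are choices made here for illustration, and the normalizer is omitted.

```python
import tensorflow as tf

# Assumption: 8 input features; three hidden layers of 128 units with ReLU.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),  # output layer left without an activation
])

# Optimizer and loss are assumptions; RMSE is the metric mentioned in the talk.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.MeanAbsoluteError(),
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)

model.summary()
```

Training would then be a call to `model.fit` with the training data and `epochs=100`.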

156
00:10:09,990 --> 00:10:14,160
Let's run again and recompile and start with the training.

157
00:10:14,160 --> 00:10:16,350
Let's change this to 100.

158
00:10:16,620 --> 00:10:18,030
There we go.

159
00:10:18,030 --> 00:10:22,290
And then we run all this so we could check those out after.

160
00:10:22,290 --> 00:10:30,450
Now, we actually forgot to note the y test values for the root mean square error.

161
00:10:30,450 --> 00:10:34,260
As you could see, we have better results.

162
00:10:34,530 --> 00:10:38,520
We could go ahead and check out those charts.

163
00:10:38,520 --> 00:10:39,870
Let's run this again.

164
00:10:39,870 --> 00:10:41,340
And there we go.

165
00:10:41,340 --> 00:10:49,170
We have our charts indicating that our model now performs much better as compared to our initial model.