1
00:00:00,050 --> 00:00:06,680
Let's consider this example: a sample SAT prep exam.

2
00:00:06,680 --> 00:00:14,990
Suppose you walk into the exam hall and realize that, out of these eight questions

3
00:00:14,990 --> 00:00:18,380
that have been asked, four were already covered in class.

4
00:00:18,380 --> 00:00:27,710
This means the exam paper will not do a great job of evaluating

5
00:00:27,710 --> 00:00:35,450
your performance, since you have already worked through those four questions in class.

6
00:00:35,510 --> 00:00:39,860
Your performance should really be evaluated on the

7
00:00:39,860 --> 00:00:41,330
four other questions,

8
00:00:41,330 --> 00:00:47,960
since you have not come across them before, in class or elsewhere.

9
00:00:47,960 --> 00:00:56,180
So after the exam, all the students would get a score of

10
00:00:56,180 --> 00:01:00,290
at least four out of eight,

11
00:01:00,560 --> 00:01:04,220
or, equivalently, at least 50%.

12
00:01:04,220 --> 00:01:10,970
So everybody in the class scores at least 50%, because those questions have already

13
00:01:10,970 --> 00:01:14,180
been covered and everyone has revised them.

14
00:01:14,180 --> 00:01:21,080
At the end of the evaluation, the instructor might conclude that all the students have grasped

15
00:01:21,080 --> 00:01:24,170
the calculus concepts, which is not the case.

16
00:01:24,170 --> 00:01:27,890
They have just reproduced what they were already trained on.

17
00:01:27,890 --> 00:01:31,160
That is, what they had already revised in class.

18
00:01:31,160 --> 00:01:40,220
We want to avoid this kind of scenario in model training, where we evaluate our model

19
00:01:40,220 --> 00:01:48,260
on data it has already seen. Say, for example, we train on this data

20
00:01:48,260 --> 00:01:52,310
and then evaluate on this same data,

21
00:01:52,310 --> 00:01:55,790
getting, say, a root mean square error (RMSE) of ten.

22
00:01:55,820 --> 00:02:03,260
That would not be a correct way of evaluating the model, because it has already seen this data.

23
00:02:03,260 --> 00:02:08,390
So what we generally do is split our data into a training set

24
00:02:08,390 --> 00:02:13,100
and a validation set or, let's say, a training set and a test set.

25
00:02:13,100 --> 00:02:15,050
Let's keep the validation set aside for now.

26
00:02:15,350 --> 00:02:17,720
So we have our training set.

27
00:02:17,720 --> 00:02:22,190
Let's say we take this first part, the first five samples.

28
00:02:22,190 --> 00:02:27,560
We use these for training, and the other three for testing.

29
00:02:27,560 --> 00:02:28,820
So this is training.

30
00:02:28,820 --> 00:02:30,800
And then here we have testing.

31
00:02:30,800 --> 00:02:39,110
Now the model, after being trained on this part, is evaluated on data

32
00:02:39,110 --> 00:02:40,640
it has not seen.

33
00:02:40,640 --> 00:02:48,170
The whole point of doing this is to make sure the model can predict the right

34
00:02:48,170 --> 00:02:50,780
price when given the inputs.

35
00:02:50,780 --> 00:02:58,520
If we instead test or evaluate the model on data it has already

36
00:02:58,520 --> 00:03:03,410
been trained on, then we may overestimate our model's performance.
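
To make the overestimation concrete, here is a minimal sketch in plain Python. It is not the course's model: it swaps in a deliberately memorizing 1-nearest-neighbour regressor on hypothetical noisy data, which gets a perfect RMSE on the data it was fit on but a clearly worse one on held-out samples.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nearest_neighbour_predict(train_x, train_y, x):
    """Predict by copying the label of the closest training input (pure memorization)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + random.gauss(0, 2.0) for x in xs]  # hypothetical: linear trend plus noise

train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

train_rmse = rmse(train_y, [nearest_neighbour_predict(train_x, train_y, x) for x in train_x])
test_rmse = rmse(test_y, [nearest_neighbour_predict(train_x, train_y, x) for x in test_x])
print(f"RMSE on training data: {train_rmse:.2f}")  # 0.00: each point is its own nearest neighbour
print(f"RMSE on unseen data:   {test_rmse:.2f}")
```

Scoring only on the training rows would suggest a perfect model; the held-out rows reveal the real error.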

37
00:03:03,800 --> 00:03:09,800
Splitting the data like this makes sense, because we are now testing on data the model has never

38
00:03:09,800 --> 00:03:10,340
seen.

39
00:03:10,610 --> 00:03:18,560
Apart from splitting the data into these two parts, we might also decide to split it

40
00:03:18,560 --> 00:03:19,820
into three parts.

41
00:03:19,820 --> 00:03:22,610
That is, training, validation, and testing.

42
00:03:22,610 --> 00:03:26,630
Now we could say the first four samples will be for training.

43
00:03:26,630 --> 00:03:35,240
So the first four for training, the next two for validation, and the last two for testing.

44
00:03:35,330 --> 00:03:41,180
We do this because, during the training process,

45
00:03:41,180 --> 00:03:46,310
we want an idea of how the model performs on data

46
00:03:46,310 --> 00:03:53,150
it has not seen. If we know how the model performs on data it has yet to see,

47
00:03:53,150 --> 00:04:00,470
then we can make modifications at the level of the model, or even at the level of the optimizer,

48
00:04:00,530 --> 00:04:09,470
to see how we could improve our model's performance before we go ahead and test our model

49
00:04:09,470 --> 00:04:10,970
on the test data set.

50
00:04:11,000 --> 00:04:18,590
Now, in our case we have 1000 samples, and we want to split

51
00:04:18,590 --> 00:04:23,720
this up such that we have a considerable number of validation samples.

52
00:04:23,720 --> 00:04:26,570
The number of validation samples matters too:

53
00:04:26,570 --> 00:04:34,580
if we have a very small number of validation samples, we may not be able to properly

54
00:04:34,580 --> 00:04:38,090
evaluate how our model performs on data it's yet to see.

55
00:04:38,090 --> 00:04:45,170
So in this kind of scenario, we could take 60% for training,

56
00:04:45,170 --> 00:04:50,540
20% for validation, and 20% for testing.

57
00:04:50,540 --> 00:04:53,030
We could also take 70%

58
00:04:53,060 --> 00:04:56,870
for training, 20% for validation,

59
00:04:57,110 --> 00:04:59,750
and 10% for testing.

60
00:05:00,330 --> 00:05:07,710
The point is, we want a fairly accurate idea of how the model performs on data

61
00:05:07,710 --> 00:05:08,970
it has not yet seen.

62
00:05:09,000 --> 00:05:16,680
Now suppose that, instead of 1000, we have, say, 10 million samples.

63
00:05:16,680 --> 00:05:20,610
So instead of 1000 we have 10 million samples.

64
00:05:20,610 --> 00:05:30,690
In that case, we could decide to use 90% for training, 5%

65
00:05:30,690 --> 00:05:34,800
for validation, and another 5% for testing.

66
00:05:34,800 --> 00:05:42,300
The reason is that, with so many samples, 5% of those 10 million

67
00:05:42,300 --> 00:05:46,980
samples (500,000) is enough for us to evaluate how well the model performs on data

68
00:05:47,070 --> 00:05:53,850
it has not yet seen. The whole idea is to make sure our model is evaluated on data

69
00:05:53,850 --> 00:05:55,170
it has not yet seen.
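
The split sizes discussed above can be checked with a small helper. `split_sizes` is a hypothetical name, and giving the remainder to the test set is an assumption that mirrors slicing the test set "to the end" of the data.

```python
def split_sizes(n_samples, train_ratio, val_ratio, test_ratio):
    """Return (train, val, test) sample counts for the given ratios."""
    assert abs(train_ratio + val_ratio + test_ratio - 1.0) < 1e-9, "ratios must sum to 1"
    n_train = int(n_samples * train_ratio)
    n_val = int(n_samples * val_ratio)
    n_test = n_samples - n_train - n_val  # whatever is left goes to the test set
    return n_train, n_val, n_test


print(split_sizes(1_000, 0.6, 0.2, 0.2))         # (600, 200, 200)
print(split_sizes(10_000_000, 0.9, 0.05, 0.05))  # (9000000, 500000, 500000)
```

With 10 million samples, even a 5% slice leaves half a million examples for validation and for testing.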

70
00:05:55,350 --> 00:06:02,160
Operations like normalization shouldn't involve the validation and test sets, as that would be

71
00:06:02,160 --> 00:06:03,360
data leakage.

72
00:06:03,360 --> 00:06:10,410
For example, say you normalize your data by computing x minus the mean, divided by the

73
00:06:10,410 --> 00:06:11,580
standard deviation.

74
00:06:11,580 --> 00:06:18,630
If this mean is the average over all the data, including the validation and test sets,

75
00:06:18,630 --> 00:06:25,530
then you would have leaked information from the validation and test sets into the training set.

76
00:06:25,530 --> 00:06:33,150
To avoid this, the statistics for operations like normalization are computed only on our training

77
00:06:33,150 --> 00:06:37,470
set, so that the validation and test sets play no part in them.

78
00:06:37,470 --> 00:06:39,030
The validation and test sets stay completely separate.

79
00:06:39,030 --> 00:06:43,260
From here, we train our model and obtain a trained model,

80
00:06:43,260 --> 00:06:46,290
and then evaluate it on the validation and test sets.
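
A minimal NumPy sketch of leakage-free normalization, assuming random stand-in data and an 800/100/100 split: the mean and standard deviation come from the training rows only, and those same statistics are then applied to the validation and test rows, i.e. (x - mean_train) / std_train.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 1000 samples with 3 features (a stand-in for the real inputs).
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
X_train, X_val, X_test = X[:800], X[800:900], X[900:]

# Fit the normalization statistics on the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)


def normalize(x):
    """Apply (x - mean) / std using the training-set statistics."""
    return (x - mean) / std


X_train_n, X_val_n, X_test_n = normalize(X_train), normalize(X_val), normalize(X_test)

# Leaky version for comparison: statistics computed over ALL samples, including val/test.
leaky_mean = X.mean(axis=0)
print("train-only mean:", mean)
print("leaky mean:     ", leaky_mean)
```

The validation and test rows end up only approximately standardized, which is expected: they are treated exactly like future, unseen inputs.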

81
00:06:46,290 --> 00:06:48,420
With that said, let's get back to the code

82
00:06:48,420 --> 00:06:51,030
and dive into data preparation.

83
00:06:51,030 --> 00:06:57,870
Here we have the data we prepared earlier: X, which is the full data

84
00:06:57,870 --> 00:06:58,410
set.

85
00:06:58,410 --> 00:07:03,330
Before normalizing, let's create a new cell below.

86
00:07:03,330 --> 00:07:06,480
Here we define our train ratio,

87
00:07:06,480 --> 00:07:11,490
which we set to 0.8.

88
00:07:11,940 --> 00:07:12,690
Take that off.

89
00:07:12,690 --> 00:07:17,250
We have the validation ratio, which we set to 0.1,

90
00:07:17,250 --> 00:07:22,140
and the test ratio, which we also set to 0.1.

91
00:07:22,140 --> 00:07:24,750
So we have training, validation, and test ratios.

92
00:07:24,750 --> 00:07:27,570
Then we have our dataset size,

93
00:07:27,570 --> 00:07:32,610
which is simply the length of X, 1000 in our case.

94
00:07:32,610 --> 00:07:38,010
You could print it out to verify that it's 1000.

95
00:07:38,010 --> 00:07:39,120
So that's it.

96
00:07:39,120 --> 00:07:43,620
We have train ratio, validation ratio and test ratio.

97
00:07:43,620 --> 00:07:46,530
We are going to dive into splitting up our data.

98
00:07:46,530 --> 00:07:52,650
So we have X_train.

99
00:07:53,310 --> 00:07:54,210
There we go.

100
00:07:54,210 --> 00:07:55,200
We have X,

101
00:07:55,200 --> 00:08:02,220
and we take all the values from index zero up to 0.8 times the length of the dataset.

102
00:08:02,220 --> 00:08:10,020
With a split of 0.8, 0.1, and 0.1, this means that here

103
00:08:10,020 --> 00:08:12,960
we have 800, here we have 100.

104
00:08:12,960 --> 00:08:15,120
And then here we would have 100 samples.

105
00:08:15,120 --> 00:08:18,960
So we go from zero to 800.

106
00:08:18,960 --> 00:08:21,000
Since the start is zero, we can leave it out of the slice.

107
00:08:21,000 --> 00:08:30,090
Then the end index is int of the dataset size, which is 1000,

108
00:08:30,090 --> 00:08:32,610
times the train ratio.

109
00:08:32,820 --> 00:08:35,280
So 0.8 times 1000, which is 800.

110
00:08:35,280 --> 00:08:36,000
And that's it.

111
00:08:36,000 --> 00:08:38,730
Now we have the y.

112
00:08:38,730 --> 00:08:42,930
We repeat the same slicing for y,

113
00:08:42,930 --> 00:08:45,600
replacing X with y.

114
00:08:45,600 --> 00:08:46,290
So that's it.

115
00:08:46,290 --> 00:08:48,360
So we now have

116
00:08:48,360 --> 00:08:50,910
X_train and y_train.

117
00:08:51,060 --> 00:09:00,390
Now we can print out X_train.shape and y_train.shape.

118
00:09:00,570 --> 00:09:01,440
There we go.

119
00:09:01,440 --> 00:09:02,550
So we run that.

120
00:09:02,550 --> 00:09:04,230
We create a new cell below.

121
00:09:04,740 --> 00:09:07,290
And then we do the same for the validation.

122
00:09:07,290 --> 00:09:08,490
So let's copy this.

123
00:09:08,490 --> 00:09:09,840
And there we go.

124
00:09:09,840 --> 00:09:12,120
Now note that for the validation we are not starting from zero.

125
00:09:12,120 --> 00:09:14,160
We're starting from 800.

126
00:09:14,160 --> 00:09:16,170
We go from 800 to 900.

127
00:09:16,170 --> 00:09:18,630
So in this case the start is 800,

128
00:09:18,720 --> 00:09:21,990
and for the end, let's copy all this again.

129
00:09:21,990 --> 00:09:24,420
So this is 800.

130
00:09:24,720 --> 00:09:26,790
This is 800.

131
00:09:26,790 --> 00:09:30,150
But we add the validation ratio,

132
00:09:30,150 --> 00:09:34,230
val_ratio, and close the bracket.

133
00:09:34,230 --> 00:09:39,090
So this is now 0.8 plus 0.1, which is 0.9,

134
00:09:39,090 --> 00:09:42,150
so we go from 0.8 to 0.9 of the dataset, that is, from sample 800 to sample 900.

135
00:09:42,150 --> 00:09:43,740
And then we repeat the same.

136
00:09:43,740 --> 00:09:45,090
So let's just copy all this.

137
00:09:45,090 --> 00:09:48,600
And we repeat that for y.

138
00:09:48,600 --> 00:09:50,370
So let's copy this.

139
00:09:51,000 --> 00:09:53,940
Take this off, and there we go.

140
00:09:53,940 --> 00:09:57,690
Take that off, and now we have the same for y.

141
00:09:57,690 --> 00:09:59,130
So let's print out X_train.shape,

142
00:09:59,130 --> 00:09:59,610
or rather,

143
00:10:00,020 --> 00:10:02,330
X_val.shape.

144
00:10:02,330 --> 00:10:03,350
Let's get back here.

145
00:10:03,350 --> 00:10:17,000
We have X_val and y_val, and we print X_val.shape and y_val.shape.

146
00:10:17,270 --> 00:10:20,210
Then we do the same again for the test set.

147
00:10:20,210 --> 00:10:23,870
So let's copy this and then print that out.

148
00:10:23,870 --> 00:10:25,970
So you notice how we have 800 here.

149
00:10:25,970 --> 00:10:28,070
And then here we have 100 samples.

150
00:10:28,070 --> 00:10:30,560
So let's go ahead and do the same for the test set.

151
00:10:30,560 --> 00:10:31,430
Take this off.

152
00:10:31,430 --> 00:10:32,480
We have X_test

153
00:10:32,720 --> 00:10:36,320
and y_test.

154
00:10:36,320 --> 00:10:38,630
Now we're going from 900 to the end.

155
00:10:38,630 --> 00:10:41,150
So let's just take all this off.

156
00:10:42,400 --> 00:10:43,330
That's fine.

157
00:10:43,330 --> 00:10:47,710
Take all this off and then we go right up to the end.

158
00:10:47,740 --> 00:10:49,210
So run that again.

159
00:10:49,210 --> 00:10:51,490
And you see we have 100 samples.

160
00:10:51,490 --> 00:10:58,660
That's how we split the data. Now the normalizer is adapted on the training set only.
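
The three slices described above fit in one NumPy sketch. The arrays are hypothetical stand-ins with the shapes mentioned in the lesson (1000 samples, 8 input features, 1 target column); with the real data, the X and y from the earlier preparation step would take their place.

```python
import numpy as np

# Hypothetical stand-ins for the prepared inputs (1000 x 8) and prices (1000 x 1).
X = np.arange(1000 * 8, dtype=np.float32).reshape(1000, 8)
y = np.arange(1000, dtype=np.float32).reshape(1000, 1)

train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1
dataset_size = len(X)

train_end = int(dataset_size * train_ratio)              # 800
val_end = int(dataset_size * (train_ratio + val_ratio))  # 900

X_train, y_train = X[:train_end], y[:train_end]              # samples 0..799
X_val, y_val = X[train_end:val_end], y[train_end:val_end]    # samples 800..899
X_test, y_test = X[val_end:], y[val_end:]                    # samples 900..999

print(X_train.shape, y_train.shape)  # (800, 8) (800, 1)
print(X_val.shape, y_val.shape)      # (100, 8) (100, 1)
print(X_test.shape, y_test.shape)    # (100, 8) (100, 1)
```

`int()` truncates, so with other ratios floating-point rounding could move a boundary by one sample; `round()` avoids that if it matters.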

161
00:10:58,660 --> 00:11:00,040
So we adapt it to the training set,

162
00:11:00,040 --> 00:11:05,860
so the mean and the standard deviation are computed from the training data and not from the full data

163
00:11:05,860 --> 00:11:06,490
set.

164
00:11:06,550 --> 00:11:09,190
Here we pass the training set to the normalizer,

165
00:11:09,280 --> 00:11:11,830
X_train, and there we go.

166
00:11:11,830 --> 00:11:14,650
So now we have our new normalizer.

167
00:11:14,830 --> 00:11:16,990
We get back to the model.

168
00:11:17,470 --> 00:11:19,300
It's going to be the same;

169
00:11:19,300 --> 00:11:20,170
nothing changes.

170
00:11:20,170 --> 00:11:27,010
But when we train our model, we'll pass X_train and

171
00:11:27,010 --> 00:11:31,060
y_train, and then we'll specify validation data.

172
00:11:31,060 --> 00:11:33,190
We pass it as validation_data,

173
00:11:33,190 --> 00:11:38,200
giving it X_val and y_val.

174
00:11:38,200 --> 00:11:39,340
So that's it.

175
00:11:39,340 --> 00:11:43,900
Let's spread this over several lines so it's clearer.

176
00:11:43,900 --> 00:11:48,220
We add this comma, run it, and you see it training.

177
00:11:48,220 --> 00:11:50,500
You can see we have the loss,

178
00:11:50,500 --> 00:11:52,240
the root mean square error,

179
00:11:52,240 --> 00:11:54,790
the validation loss,

180
00:11:54,790 --> 00:11:58,150
and the validation root mean square error.
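
What passing validation data to `fit` adds can be sketched without TensorFlow: after every epoch, the loss is also computed on the validation set, which is never used for the weight updates. The sketch below is a hypothetical plain-Python stand-in (a one-feature linear model trained with full-batch gradient descent on made-up data), not Keras itself; only the `loss` and `val_loss` keys mirror the names Keras uses in `History.history`.

```python
import random


def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the points (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(train, val, epochs=20, lr=0.01):
    """Full-batch gradient descent that also reports validation loss each epoch."""
    (xs, ys), (val_xs, val_ys) = train, val
    w, b = 0.0, 0.0
    n = len(xs)
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        # Gradients of the training MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # The validation set is only evaluated; it never drives an update.
        history["loss"].append(mse(w, b, xs, ys))
        history["val_loss"].append(mse(w, b, val_xs, val_ys))
    return w, b, history


random.seed(1)
xs = [random.uniform(0, 5) for _ in range(120)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]  # hypothetical data
w, b, history = fit((xs[:100], ys[:100]), (xs[100:], ys[100:]))
for epoch, (loss, val_loss) in enumerate(zip(history["loss"], history["val_loss"]), start=1):
    print(f"epoch {epoch}: loss={loss:.3f}  val_loss={val_loss:.3f}")
```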

181
00:11:58,150 --> 00:12:02,950
So while you are training, you have an idea of what's going on.

182
00:12:02,950 --> 00:12:06,160
You don't need to finish training before testing.

183
00:12:06,160 --> 00:12:09,190
While you're training, the model is being validated.

184
00:12:09,190 --> 00:12:14,260
So you could even stop training after, say, five epochs, depending on how the model is

185
00:12:14,260 --> 00:12:14,830
performing.

186
00:12:14,830 --> 00:12:20,860
Then you could go and modify some parameters or change the model altogether.
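
Stopping early based on validation performance can be automated. Keras provides `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)` for this; the function below is only a hypothetical plain-Python sketch of the patience idea, applied to a made-up list of validation losses.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never stops.

    Training stops once val_loss has failed to improve by more than min_delta
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_losses = [10.0, 8.0, 7.5, 7.6, 7.7, 7.9, 6.0]  # hypothetical values
print(early_stopping_epoch(val_losses, patience=3))  # 5
```

Here the loss stops improving after epoch 2, so with a patience of 3 training halts at epoch 5 and never sees the later dip to 6.0; a larger patience trades extra epochs for robustness to such noise.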

187
00:12:20,860 --> 00:12:22,000
So that's it.

188
00:12:22,000 --> 00:12:26,620
That's how easily we can carry out validation in TensorFlow.

189
00:12:26,620 --> 00:12:28,840
Now let's go back to the documentation.

190
00:12:28,900 --> 00:12:31,090
We come to the Keras models page here.

191
00:12:31,090 --> 00:12:34,060
And then we'll check out the fit method.

192
00:12:34,270 --> 00:12:36,010
We have this fit method.

193
00:12:36,010 --> 00:12:41,230
And you see here that, apart from specifying validation_data,

194
00:12:41,230 --> 00:12:43,390
you could also specify validation_split.

195
00:12:43,390 --> 00:12:49,330
So you could pass your whole dataset and set the validation split to, say,

196
00:12:49,330 --> 00:12:50,620
0.2.

197
00:12:50,620 --> 00:12:57,670
So if you train on all of X and y here, then instead

198
00:12:57,670 --> 00:13:00,730
of passing validation_data, you could just set validation_split.

199
00:13:00,730 --> 00:13:04,420
That way you don't have to split the data manually.

200
00:13:04,420 --> 00:13:07,030
So you could just do this and that will be fine.

201
00:13:07,030 --> 00:13:11,200
Then you specify, say, 0.2, and that's it.

202
00:13:11,200 --> 00:13:15,280
So you run that and you see that it still works just fine.
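
A sketch of which samples `validation_split` holds out. The Keras `Model.fit` documentation states that the validation data is taken from the last samples of the arrays passed in, before shuffling. The helper below is hypothetical, and its rounding of the boundary may differ from Keras' by one sample.

```python
def validation_split_indices(n_samples, validation_split):
    """Sketch of fit(validation_split=...): the LAST fraction of the samples,
    taken before any shuffling, becomes the validation set."""
    n_val = int(round(n_samples * validation_split))
    split_at = n_samples - n_val
    return range(0, split_at), range(split_at, n_samples)


train_idx, val_idx = validation_split_indices(1000, 0.2)
print(train_idx, val_idx)  # range(0, 800) range(800, 1000)
```

With 1000 samples and `validation_split=0.2`, the held-out samples are indices 800 to 999, which are exactly the manual validation and test sets from before. In that setup the test set is being used for validation, so it no longer gives an untouched final estimate.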

203
00:13:15,280 --> 00:13:17,500
Now let's go ahead and modify the plot.

204
00:13:17,500 --> 00:13:21,070
so that we have both the training and validation curves.

205
00:13:21,070 --> 00:13:23,380
So here we have this plot.

206
00:13:23,950 --> 00:13:25,480
We'll have our second plot.

207
00:13:25,480 --> 00:13:26,290
Take that off.

208
00:13:26,290 --> 00:13:28,600
And now we'll make this one the validation curve.

209
00:13:28,600 --> 00:13:32,980
So we plot val_loss, with the y label loss and the x label epochs.

210
00:13:32,980 --> 00:13:37,060
Then in the legend we add val_loss.

211
00:13:37,060 --> 00:13:40,000
We then do the same for the root mean square error.

212
00:13:40,000 --> 00:13:42,670
So we just add this up here.

213
00:13:43,440 --> 00:13:44,370
Paste it in.

214
00:13:44,370 --> 00:13:52,380
We plot the validation root mean squared error, and in the legend we also add

215
00:13:53,160 --> 00:13:57,990
the validation root mean squared error.

216
00:13:57,990 --> 00:13:59,730
So run that again.

217
00:13:59,940 --> 00:14:08,430
And you see that the training performance is better than the validation performance,

218
00:14:08,430 --> 00:14:14,700
since the training loss is lower than the validation loss.

219
00:14:14,970 --> 00:14:17,700
The same holds for the root mean square error.

220
00:14:17,700 --> 00:14:24,810
Now, if you want to evaluate your model, instead of passing all of X, you could pass just

221
00:14:24,810 --> 00:14:25,620
the validation data.

222
00:14:25,620 --> 00:14:29,580
So you now evaluate the model only on the validation data.

223
00:14:29,580 --> 00:14:33,750
And then you could also decide to evaluate the model only on the test data.

224
00:14:33,750 --> 00:14:35,250
So let's take this off.

225
00:14:35,250 --> 00:14:41,220
We pass in the test data, run that, and there we go.

226
00:14:41,220 --> 00:14:46,560
So you see it performs differently on the validation and test splits.

227
00:14:46,560 --> 00:14:48,570
After evaluating the model,

228
00:14:48,570 --> 00:14:53,760
the next step is to use this model to make predictions.

229
00:14:53,760 --> 00:14:56,790
Say we have, for example, X_test.

230
00:14:56,790 --> 00:15:01,410
Let's take a single sample, X_test[0].

231
00:15:01,410 --> 00:15:06,300
You see it has eight elements, so it's of shape (8,).

232
00:15:06,300 --> 00:15:11,610
So now we take our model and call model.predict.

233
00:15:11,610 --> 00:15:16,050
That's the predict method we'll be using, and we pass in this X_test sample.

234
00:15:16,050 --> 00:15:20,160
We run that and see what we get.
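
`X_test[0]` drops the batch axis, while Keras models expect inputs with a leading batch dimension. A minimal NumPy sketch of adding it back, assuming a zero-filled stand-in for `X_test`; the `model.predict` call is left as a comment because no model is built here.

```python
import numpy as np

X_test = np.zeros((100, 8), dtype=np.float32)  # hypothetical stand-in for the real test inputs

sample = X_test[0]
print(sample.shape)        # (8,)

# Add a leading batch axis so the input looks like a batch of one sample.
batch_of_one = np.expand_dims(sample, axis=0)
print(batch_of_one.shape)  # (1, 8)

# Slicing with a range keeps the batch axis as well.
print(X_test[0:1].shape)   # (1, 8)

# With a trained Keras model (not built here), the call would look like:
# y_output = model.predict(batch_of_one)  # one prediction per row of the batch
```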

235
00:15:20,160 --> 00:15:21,870
Let's call this y_output.

236
00:15:22,110 --> 00:15:24,030
Here we have y_output.

237
00:15:24,030 --> 00:15:27,450
You see we have this output right here.

238
00:15:27,450 --> 00:15:32,910
If we print out y_output,

239
00:15:33,030 --> 00:15:36,480
we get something like 2748.

240
00:15:36,480 --> 00:15:42,960
This value comes from a model I retrained for over 1000 epochs,

241
00:15:42,960 --> 00:15:45,360
so the model got much better.

242
00:15:45,750 --> 00:15:48,030
Anyway, that's what we have.

243
00:15:48,030 --> 00:15:50,520
We now move on to the next part.

244
00:15:50,520 --> 00:15:58,860
In this next part, we'll place the model's predictions and the actual prices, or the

245
00:15:58,860 --> 00:16:01,470
actual expected prices side by side.

246
00:16:01,470 --> 00:16:07,260
So for each sample, we'll plot what the

247
00:16:07,260 --> 00:16:09,570
model predicts and then what was expected.

248
00:16:09,570 --> 00:16:13,380
and then repeat that for all the rest.

249
00:16:13,380 --> 00:16:15,870
Again, we'll use matplotlib.

250
00:16:15,870 --> 00:16:20,340
So we'll define our figure and then specify the figure size.

251
00:16:20,340 --> 00:16:24,360
We have 40 by 20.

252
00:16:24,930 --> 00:16:29,280
Then we specify the width of each bar.

253
00:16:29,280 --> 00:16:32,430
Let's take 0.1.

254
00:16:32,430 --> 00:16:34,590
Then we call plt.bar.

255
00:16:34,590 --> 00:16:37,560
And then here we have a range of values.

256
00:16:37,560 --> 00:16:46,020
Let's just use tf.range(100).numpy() here.

257
00:16:46,470 --> 00:16:51,030
Then we have what the model predicts, y_pred.

258
00:16:51,030 --> 00:16:52,620
That's what the model predicts.

259
00:16:52,620 --> 00:16:55,440
We specify the width and the label.

260
00:16:55,440 --> 00:16:59,820
For our label here we have model prediction.

261
00:16:59,820 --> 00:17:01,380
Model prediction.

262
00:17:02,010 --> 00:17:05,040
Let's take this off, then copy and paste that.

263
00:17:05,040 --> 00:17:06,990
And then now again we repeat the same.

264
00:17:06,990 --> 00:17:11,250
This 100 was picked because we have 100 test elements,

265
00:17:11,250 --> 00:17:13,470
so we could just use the length of the test set.

266
00:17:13,470 --> 00:17:14,280
So that's it.

267
00:17:14,280 --> 00:17:15,510
We have what the model predicts.

268
00:17:15,510 --> 00:17:18,930
And we also have the actual values.

269
00:17:18,930 --> 00:17:21,270
We'll call this y_true,

270
00:17:21,420 --> 00:17:22,650
or y_actual.

271
00:17:22,650 --> 00:17:25,050
Anyway, let's go with y_true

272
00:17:25,050 --> 00:17:26,010
to keep it short.

273
00:17:26,280 --> 00:17:27,810
We use the same width.

274
00:17:27,810 --> 00:17:36,750
Then we have our label. Now, for this range of positions we have here,

275
00:17:36,750 --> 00:17:40,140
let's copy this and paste it here.

276
00:17:40,770 --> 00:17:48,090
As we said, for this range of values, each time we plot y_pred,

277
00:17:48,090 --> 00:17:54,150
we want to plot the actual values just to its right, or maybe to its left.

278
00:17:54,150 --> 00:17:55,800
But anyway, we want that shift.

279
00:17:55,800 --> 00:17:58,860
So what we'll do is add the width

280
00:17:58,860 --> 00:18:01,380
to these x positions.

281
00:18:01,380 --> 00:18:08,070
You see that if you add the width, then instead of 0 and 1 we have 0.1 and

282
00:18:08,070 --> 00:18:08,400
1.1.

283
00:18:08,400 --> 00:18:09,660
Let's call .numpy()

284
00:18:09,960 --> 00:18:13,500
on that.

285
00:18:13,500 --> 00:18:14,490
That should be fine.

286
00:18:14,490 --> 00:18:18,210
Here we get an error. Okay,

287
00:18:18,210 --> 00:18:19,050
width is not defined.

288
00:18:19,050 --> 00:18:21,210
Let's define width as 0.1.

289
00:18:21,210 --> 00:18:23,370
We run that again and that should be fine.

290
00:18:23,370 --> 00:18:29,070
While that's running, we add plt.xlabel,

291
00:18:29,250 --> 00:18:37,260
with the label actual versus predicted price.

292
00:18:37,470 --> 00:18:40,110
Then we have plt.ylabel,

293
00:18:40,110 --> 00:18:42,030
set to

294
00:18:42,550 --> 00:18:44,980
car price.

295
00:18:44,980 --> 00:18:49,690
Okay, then we have plt.show().

296
00:18:49,720 --> 00:18:50,680
There we go.

297
00:18:50,680 --> 00:18:51,670
plt.show().

298
00:18:51,910 --> 00:18:52,600
That's fine.
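
The side-by-side bar chart can be sketched as a reusable function. It uses Matplotlib's object-oriented API (`plt.subplots`, `ax.bar`) rather than the `plt.bar` calls typed in the lesson, and the function name and bar labels are assumptions; the key step is shifting the second series right by one bar width so each actual price sits next to its prediction.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(y_true, y_pred, width=0.1):
    """Draw predicted and actual values as side-by-side bars, one pair per sample."""
    positions = np.arange(len(y_true))
    fig, ax = plt.subplots(figsize=(40, 20))
    ax.bar(positions, y_pred, width, label="Model Prediction")
    # Shift the second series right by one bar width so the bars sit side by side.
    ax.bar(positions + width, y_true, width, label="Actual Price")
    ax.set_xlabel("Actual vs. Predicted Price")
    ax.set_ylabel("Car Price")
    ax.legend()
    return fig, ax


# Usage with hypothetical prices (in the lesson these come from y_test and the model):
fig, ax = plot_actual_vs_predicted(np.array([21000.0, 15500.0, 30000.0]),
                                   np.array([19000.0, 14000.0, 26000.0]))
```

In an interactive notebook, `plt.show()` would display the figure; with the Agg backend it can be saved with `fig.savefig(...)` instead.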

299
00:18:52,600 --> 00:18:58,930
The next thing we'll do is actually compute y_pred and y_true. To get y_true,

300
00:18:58,930 --> 00:19:01,420
that is, the actual prices,

301
00:19:01,420 --> 00:19:04,330
we take y_test.

302
00:19:04,330 --> 00:19:08,110
So y_true is equal to y_test.

303
00:19:08,110 --> 00:19:16,660
But since y_test has shape (100, 1), we'll

304
00:19:16,660 --> 00:19:22,450
pick all the rows and just the first column.
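
Dropping the trailing axis of a `(100, 1)` array, as done here for `y_test` (and in the same way for the model's `(100, 1)` output), looks like this in NumPy; the three prices are made up.

```python
import numpy as np

y_test = np.array([[12000.0], [8500.0], [23000.0]])  # hypothetical prices, shape (3, 1)
y_true = y_test[:, 0]  # all rows, first column -> a flat array of shape (3,)
print(y_test.shape, y_true.shape)  # (3, 1) (3,)
```

A flat array is what a bar chart expects for bar heights, one value per test sample.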

305
00:19:22,450 --> 00:19:23,800
So let's take that off.

306
00:19:23,800 --> 00:19:25,090
That's our y test.

307
00:19:25,090 --> 00:19:27,670
We print out y_true.

308
00:19:28,330 --> 00:19:30,190
And there we go.

309
00:19:30,190 --> 00:19:31,870
So you see we have this value.

310
00:19:31,900 --> 00:19:35,200
We could call .numpy() on it.

311
00:19:35,200 --> 00:19:36,460
And that's fine.

312
00:19:36,460 --> 00:19:38,830
So we now have this simple array.

313
00:19:38,830 --> 00:19:46,930
These are the actual car prices for the test set.

314
00:19:46,930 --> 00:19:50,020
And then we'll do the same for the model's prediction.

315
00:19:50,020 --> 00:19:53,560
So let's copy this and paste it.

316
00:19:53,650 --> 00:19:58,450
Instead of y_test, we now use the model, which takes in X.

317
00:19:58,450 --> 00:20:02,320
So on one side we have the test labels, which is what we know.

318
00:20:02,320 --> 00:20:03,430
This is what we know.

319
00:20:03,430 --> 00:20:06,400
And now we're trying to compare it with what the model predicts.

320
00:20:06,400 --> 00:20:10,030
So instead of y_test, we're passing in X_test, with a capital X.

321
00:20:10,030 --> 00:20:11,560
So we have capital X.

322
00:20:11,890 --> 00:20:18,010
The model takes all of this in and outputs a tensor of shape (100, 1),

323
00:20:18,010 --> 00:20:20,590
so again we take all the rows and column zero.

324
00:20:20,590 --> 00:20:28,600
Let's print out y_pred, what the model predicts,

325
00:20:28,870 --> 00:20:32,140
along with its shape.

326
00:20:32,140 --> 00:20:32,980
There we go.

327
00:20:32,980 --> 00:20:35,380
Let's run that and see what we get.

328
00:20:35,380 --> 00:20:36,640
You see we have 100.

329
00:20:36,640 --> 00:20:40,900
Let's print all of this out, adding .numpy()

330
00:20:41,440 --> 00:20:42,430
here

331
00:20:42,430 --> 00:20:44,830
and here.

332
00:20:44,890 --> 00:20:46,900
So we could now take this off.

333
00:20:46,900 --> 00:20:49,900
So we now have our y_true and our y_pred.

334
00:20:49,900 --> 00:20:52,690
This is what the model predicts,

335
00:20:52,690 --> 00:20:57,970
and just below we have what was supposed to be predicted.

336
00:20:57,970 --> 00:20:59,470
So now we have all this.

337
00:20:59,470 --> 00:21:01,210
Let's run this and then see what we get.

338
00:21:01,420 --> 00:21:04,990
Let's take this off and we can now look at our bar chart.

339
00:21:04,990 --> 00:21:09,370
You notice that we have one series in blue and one in orange.

340
00:21:09,370 --> 00:21:11,800
Now, just looking at the values,

341
00:21:11,800 --> 00:21:13,780
you see that the y_true

342
00:21:13,780 --> 00:21:13,990
values,

343
00:21:14,020 --> 00:21:20,350
the actual prices, seem to be much larger than what the model predicts.

344
00:21:20,350 --> 00:21:24,070
So our model is clearly underperforming.

345
00:21:24,070 --> 00:21:26,020
You can see that from this chart.

346
00:21:26,050 --> 00:21:27,460
See here in blue.

347
00:21:27,460 --> 00:21:30,430
Let's zoom in so you can see that more clearly.

348
00:21:31,210 --> 00:21:34,690
Let's take that up. Okay.

349
00:21:34,690 --> 00:21:38,950
So just here in blue you have what the model predicts.

350
00:21:38,950 --> 00:21:43,120
And in orange you have what was expected of the model.

351
00:21:43,600 --> 00:21:44,770
Let's get back.

352
00:21:44,950 --> 00:21:46,840
And there we go.

353
00:21:46,840 --> 00:21:47,740
So that's it.

354
00:21:47,740 --> 00:21:51,460
In this section, we have evaluated and tested our model.

355
00:21:51,460 --> 00:21:56,500
In the next section, we'll look at corrective measures to make our model perform much better.
