1
00:00:00,110 --> 00:00:05,870
Hello everyone, and welcome to the session in which we are going to treat hyper parameter tuning with

2
00:00:05,870 --> 00:00:07,070
Weights and Biases.

3
00:00:07,100 --> 00:00:12,910
There are three methods available for us to implement hyper parameter tuning in Weights and Biases.

4
00:00:12,920 --> 00:00:17,830
These are the random search, the grid search and the Bayesian search method.

5
00:00:17,840 --> 00:00:24,050
At the end of the session, you'll be able to search for the most suitable parameter values which optimize

6
00:00:24,050 --> 00:00:24,920
the accuracy.

7
00:00:24,950 --> 00:00:30,380
Previously we saw how to implement hyper parameter tuning with TensorBoard.

8
00:00:30,410 --> 00:00:38,510
We created this model tuning method which takes as argument hparams, the hyper parameters which are actually

9
00:00:38,510 --> 00:00:40,840
going to be tuned as you could see here.

10
00:00:40,850 --> 00:00:47,180
And then we have an output of this model tuning method, the accuracy.

11
00:00:47,180 --> 00:00:53,960
So what we're trying to do here is we're trying to tune or we're trying to obtain the optimal values

12
00:00:53,960 --> 00:00:59,970
for these different hyper parameters which maximize the accuracy of the model.

13
00:01:00,000 --> 00:01:05,070
The exact method we used was a grid search. With the grid search method,

14
00:01:05,070 --> 00:01:12,690
we actually go through each and every option or each and every possibility, and for each possibility

15
00:01:12,690 --> 00:01:17,550
we are going to log its accuracy as we did right here.

16
00:01:17,580 --> 00:01:24,480
Now, with Weights and Biases, we are going to see how to redo this, but in an easier and more

17
00:01:24,480 --> 00:01:25,830
reliable manner.

18
00:01:25,830 --> 00:01:28,740
So let's check out the documentation right here.

19
00:01:28,770 --> 00:01:31,410
You see, we have this documentation here.

20
00:01:31,410 --> 00:01:33,090
We have hyper parameter tuning.

21
00:01:33,090 --> 00:01:40,020
We can click on the sweeps quickstart, which shows us globally what hyper parameter tuning with Weights

22
00:01:40,020 --> 00:01:41,190
and Biases entails.

23
00:01:41,190 --> 00:01:43,160
We set up Weights and Biases.

24
00:01:43,170 --> 00:01:50,730
We configure the sweep, we initialize a sweep, we launch agents, we visualize the results and then

25
00:01:50,730 --> 00:01:52,440
finally stop the agent.

26
00:01:52,440 --> 00:01:56,730
From here we will see how to run these sweeps in Jupyter.

27
00:01:56,730 --> 00:02:03,270
So the Weights and Biases sweeps allow you to easily try out a large number of hyper parameters while

28
00:02:03,270 --> 00:02:09,450
tracking model performance and logging all the information you need to reproduce experiments.

29
00:02:09,450 --> 00:02:17,970
Now we're going to focus on this pure Python method where the sweep configurations are in the form of

30
00:02:17,970 --> 00:02:19,560
a dictionary like this.

31
00:02:20,190 --> 00:02:27,630
The way these sweeps work, or the way Weights and Biases sweeps work, is that we have a central sweep server,

32
00:02:27,660 --> 00:02:31,140
that is this one here, the central sweep server.

33
00:02:31,260 --> 00:02:36,180
And then we have these different agents right here.

34
00:02:36,180 --> 00:02:38,130
We can have many more agents.

35
00:02:38,130 --> 00:02:45,810
Then once we set the configurations, the sweep configurations in this central sweep server right here,

36
00:02:45,840 --> 00:02:53,760
the agents now take over and do the actual hyper parameter tuning that is the search for the best hyper

37
00:02:53,760 --> 00:03:01,620
parameters which help in optimizing the model. And this parallel method of implementing sweeps in Weights

38
00:03:01,620 --> 00:03:06,630
and Biases helps make the hyper parameter tuning process even more efficient.

39
00:03:06,630 --> 00:03:10,620
As we could see here, this could be done in just two steps.

40
00:03:10,620 --> 00:03:18,120
We initialize the sweep and put all necessary information or give all necessary information for this

41
00:03:18,120 --> 00:03:21,750
central or to this central sweep server.

42
00:03:21,750 --> 00:03:25,800
And then we run these different agents right here.

43
00:03:25,830 --> 00:03:32,520
Now that said, we have this sweep configuration which looks similar to what we had when we were

44
00:03:32,520 --> 00:03:34,210
working with TensorBoard.

45
00:03:34,230 --> 00:03:45,240
So here we had this configuration right here where we specified for each hyper parameter the values

46
00:03:45,480 --> 00:03:46,200
it can take.

47
00:03:46,200 --> 00:03:52,320
Given that we were doing a grid search, making use of the grid search algorithm, we were specifying

48
00:03:52,320 --> 00:03:53,850
simply all those different values.

49
00:03:53,850 --> 00:03:54,810
And that was it.

50
00:03:54,810 --> 00:04:01,860
And now coming back to Weights and Biases, you see here, you just need to specify, for example, the

51
00:04:01,860 --> 00:04:07,590
hyper parameter: for the number of epochs you give the values; for the learning rate, for example, you give the minimum

52
00:04:07,590 --> 00:04:11,700
and maximum value and so on and so forth.

53
00:04:11,700 --> 00:04:17,730
But then note that here, in this example given in the documentation, the method you're using

54
00:04:17,730 --> 00:04:23,430
is not the grid search, that is, going through each and every possibility in the list of values.

55
00:04:23,430 --> 00:04:28,950
What you're doing here is actually a random search, which is often more efficient than the grid

56
00:04:28,950 --> 00:04:37,260
search algorithm, since it's possible in a shorter period of time to get near-optimal values of the

57
00:04:37,260 --> 00:04:38,760
hyper parameters.

58
00:04:39,860 --> 00:04:44,600
As compared to the grid search algorithm where you need to go through each and every value.
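
The difference between the two strategies can be sketched in plain Python. This is only an illustration of the idea, not how Weights and Biases implements its search; the search space and the names `grid_search` and `random_search` are made up for this example.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
search_space = {
    "epochs": [10, 20, 50],
    "dropout_rate": [0.1, 0.2, 0.3],
    "num_units": [16, 32, 64],
}


def grid_search(space):
    """Yield every combination of the listed values (exhaustive)."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_search(space, num_trials, seed=0):
    """Yield num_trials configurations, each value drawn independently at random."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        yield {name: rng.choice(values) for name, values in space.items()}


grid_trials = list(grid_search(search_space))
random_trials = list(random_search(search_space, num_trials=5))

print(len(grid_trials))    # 3 * 3 * 3 = 27 combinations
print(len(random_trials))  # exactly the budget we asked for: 5
```

Grid search has to train one model per combination, so its cost grows multiplicatively with every added hyper parameter, while random search lets you fix the number of trials up front.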

59
00:04:45,050 --> 00:04:46,140
Now let's get back.

60
00:04:46,160 --> 00:04:52,520
Here we have the name, we have the method, and we have the parameters to understand where all this

61
00:04:52,520 --> 00:04:53,180
comes from.

62
00:04:53,210 --> 00:04:56,120
Check out this sweep configuration right here.

63
00:04:56,120 --> 00:05:01,160
So clicking on the sweep configuration, we have the structure of a sweep configuration.

64
00:05:01,160 --> 00:05:05,750
Then here we have the different keys and the descriptions.

65
00:05:05,780 --> 00:05:09,200
You will notice this method which we had seen here.

66
00:05:09,230 --> 00:05:12,950
Let's open this in another tab, open a new tab and have that.

67
00:05:12,950 --> 00:05:15,230
So let's have it this way.

68
00:05:15,260 --> 00:05:16,040
There we go.

69
00:05:16,040 --> 00:05:21,410
We have this name, method and parameters which we could see here.

70
00:05:21,440 --> 00:05:24,920
We have name method parameters.

71
00:05:25,130 --> 00:05:31,580
And then for each of these configuration keys, like this method, for example, there is a more

72
00:05:31,580 --> 00:05:33,320
explicit description.

73
00:05:33,320 --> 00:05:36,140
So here we have the method we're going to use.

74
00:05:36,170 --> 00:05:41,700
If it's grid, grid search iterates over all possible combinations of parameter values.

75
00:05:41,700 --> 00:05:47,460
The random search chooses a random set of values on each iteration, and Weights and Biases also comes

76
00:05:47,460 --> 00:05:51,060
with this other method, which is the Bayesian method.

77
00:05:51,090 --> 00:05:57,000
This Bayesian hyper parameter search method uses a Gaussian process to model the relationship between the

78
00:05:57,000 --> 00:06:03,090
parameters and the model metric and chooses parameters to optimize the probability of improvement.

79
00:06:03,090 --> 00:06:06,180
This strategy requires the metric key to be specified.

80
00:06:06,750 --> 00:06:13,350
So right here we see now we have these three different methods, and that is why here, when we wanted to

81
00:06:13,350 --> 00:06:17,010
work with random search, it suffices to just put this in here.

82
00:06:17,040 --> 00:06:20,390
Either you put random, or you put grid, or you put bayes.

83
00:06:20,400 --> 00:06:23,760
Now we move to the next key: we have the parameters.

84
00:06:23,940 --> 00:06:25,830
Oh yeah, we have the parameters.

85
00:06:25,830 --> 00:06:29,130
Anyway, for the name, you could just put a name for the sweep.

86
00:06:29,130 --> 00:06:30,570
That depends on you.

87
00:06:30,570 --> 00:06:37,230
Like here it's my sweep. Then with the parameters here, it gets a little bit more tricky because here you

88
00:06:37,230 --> 00:06:45,060
have this here, this key, and then the value is a dictionary, which itself has its own keys

89
00:06:45,060 --> 00:06:46,410
and values.

90
00:06:46,410 --> 00:06:48,570
So we have these parameters.

91
00:06:48,570 --> 00:06:51,120
Let's get back to parameters here.

92
00:06:51,120 --> 00:06:58,440
We have the parameters, and then we have these different possible values and

93
00:06:58,440 --> 00:06:59,550
the descriptions.

94
00:06:59,550 --> 00:07:06,090
So getting back here, you'll notice that this hyper parameter epochs takes values in this list that

95
00:07:06,090 --> 00:07:17,370
is either 10, 20 or 50, while the learning rate can be chosen between the minimum, 0.0001, and the maximum,

96
00:07:17,370 --> 00:07:18,330
0.1.

97
00:07:18,330 --> 00:07:25,200
So each hyper parameter has its own distinct way of describing the values it can take.

98
00:07:25,200 --> 00:07:31,770
Now, getting back here, we have these parameters.

99
00:07:31,770 --> 00:07:37,350
And then for those values, you see, you can say values, or you can say value.

100
00:07:37,350 --> 00:07:43,620
So sometimes, like here, we may decide and say, okay, we want this epochs to take just one value,

101
00:07:43,620 --> 00:07:45,210
then we just have value.

102
00:07:45,210 --> 00:07:51,390
And then in the case where we want several values, you see, values specifies all valid values for this hyper parameter,

103
00:07:51,390 --> 00:07:53,550
compatible with grid and all of that.

104
00:07:53,550 --> 00:08:01,110
Now here in some cases you have a distribution, and this says to select a distribution from the distribution

105
00:08:01,110 --> 00:08:02,100
table below.

106
00:08:02,100 --> 00:08:07,800
So you could see here it specifies how values will be distributed if they are selected randomly, e.g.

107
00:08:07,800 --> 00:08:09,270
with the random or bayes method.

108
00:08:09,270 --> 00:08:18,810
So when working with the random or bayes methods, you may want to select your values based on

109
00:08:18,810 --> 00:08:21,750
a given distribution, and here is the list of distributions you can use.

110
00:08:21,750 --> 00:08:30,540
You have constant, categorical, int_uniform, uniform, q_uniform, log_uniform, q_log_uniform,

111
00:08:30,540 --> 00:08:34,530
normal, q_normal, log_normal and q_log_normal.

112
00:08:34,800 --> 00:08:38,130
Then from this distribution we now move to min and max.

113
00:08:38,130 --> 00:08:42,350
Min and max is what you actually saw here.

114
00:08:42,380 --> 00:08:43,640
You had this min max.

115
00:08:43,640 --> 00:08:50,180
So here you're simply saying you want your learning rate to fall or to have a minimum value of this

116
00:08:50,180 --> 00:08:51,350
and the maximum value of that.

117
00:08:51,350 --> 00:08:56,270
So we randomly pick values within this range and that's it.

118
00:08:56,270 --> 00:09:02,450
You have mu, the mean parameter for normal or lognormal distributed hyper parameters.

119
00:09:02,450 --> 00:09:08,660
So here you're having normally distributed hyper parameters and you're specifying the mean.

120
00:09:08,660 --> 00:09:11,840
While here, with sigma, you're specifying the standard deviation.

121
00:09:11,870 --> 00:09:16,790
Then for q you have the quantization step size for quantized hyper parameters.
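
To make the keys just read out concrete, here is a small sketch of a `parameters` dictionary that uses each form once. The parameter names and numeric values are hypothetical, chosen only for illustration; the inner keys (`value`, `values`, `min`, `max`, `distribution`, `mu`, `sigma`, `q`) are the ones described above.

```python
# Each entry shows one way a sweep parameter can be described.
# Parameter names and numbers are hypothetical placeholders.
parameters = {
    # A single fixed value.
    "epochs": {"value": 10},
    # An explicit list of valid values (the form grid search needs).
    "batch_size": {"values": [16, 32, 64]},
    # A range given by a minimum and a maximum.
    "dropout_rate": {"min": 0.1, "max": 0.4},
    # A range with an explicitly named distribution.
    "regularization_rate": {"distribution": "uniform", "min": 0.001, "max": 0.1},
    # A normal distribution described by its mean and standard deviation.
    "weight_scale": {"distribution": "normal", "mu": 0.0, "sigma": 1.0},
    # A quantized uniform distribution with step size q.
    "hidden_units": {"distribution": "q_uniform", "min": 16, "max": 128, "q": 16},
}

for name, spec in parameters.items():
    print(f"{name}: {sorted(spec)}")
```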

122
00:09:17,450 --> 00:09:19,370
Our next key is the metric.

123
00:09:19,370 --> 00:09:26,360
And with the metric we have to define the name of the metric, the goal and the target.

124
00:09:26,390 --> 00:09:33,230
Now here you could have, for example, like your validation loss, you have this metric validation

125
00:09:33,230 --> 00:09:33,500
loss.

126
00:09:33,500 --> 00:09:39,230
You could say you want to optimize your parameters or you want to choose the hyper parameters.

127
00:09:39,690 --> 00:09:42,600
Which minimize the validation loss.

128
00:09:42,690 --> 00:09:47,970
Now, you could also change this into an accuracy, let's say, validation accuracy or just the accuracy

129
00:09:47,970 --> 00:09:49,500
or just the train accuracy.

130
00:09:49,500 --> 00:09:56,780
In that case, you want to choose the hyper parameters which maximize the accuracy and hence here at

131
00:09:56,790 --> 00:09:59,790
the level of the goal, you either minimize or maximize.

132
00:09:59,790 --> 00:10:01,530
The default is minimize.

133
00:10:01,530 --> 00:10:07,170
So if you have a validation loss, there is no need to specify the goal because by default it's minimize.

134
00:10:07,170 --> 00:10:15,390
And for now, according to this documentation, there is really no automatic way of deciding whether it's minimize

135
00:10:15,390 --> 00:10:17,490
or maximize.

136
00:10:17,490 --> 00:10:21,230
Anyway, you can always do that manually by specifying it.

137
00:10:21,240 --> 00:10:24,120
Now the next we have is the target.

138
00:10:24,120 --> 00:10:31,140
So here with the target, as you can see here, for example, 0.95, it means that if you happen to

139
00:10:31,140 --> 00:10:40,180
get a set of hyper parameters which permit you to get this validation accuracy up to this target value, then

140
00:10:40,180 --> 00:10:44,890
at that point you will stop the process of searching for the.

141
00:10:45,940 --> 00:10:47,980
Optimal hyper parameters.

142
00:10:48,100 --> 00:10:56,380
And what happens exactly is all agents with active runs will finish their jobs, but no new runs will

143
00:10:56,380 --> 00:11:01,420
be launched in the sweep since we've already attained the objective.

144
00:11:01,450 --> 00:11:07,900
From here we have early_terminate, which is an optional feature that speeds up hyper parameter search

145
00:11:07,900 --> 00:11:13,450
by stopping poorly performing runs. We get back here and copy out this code.

146
00:11:13,740 --> 00:11:14,350
Just copy it.

147
00:11:14,350 --> 00:11:15,400
Okay, copy it.

148
00:11:15,580 --> 00:11:19,390
We get back now to our hyper parameter tuning.

149
00:11:19,660 --> 00:11:25,450
We paste out the code here, and then we're just going to try to replicate what we have done already

150
00:11:25,450 --> 00:11:26,820
with TensorBoard.

151
00:11:26,830 --> 00:11:36,130
So here we paste this out here and then we have this num units one, we have this num units one

152
00:11:36,130 --> 00:11:39,220
here, and we have the num units one values.

153
00:11:39,460 --> 00:11:40,600
There we go.

154
00:11:40,600 --> 00:11:43,360
Let's copy this out and paste here.

155
00:11:43,510 --> 00:11:49,460
So we replace these values which we had already and then the next will be num units two.

156
00:11:49,490 --> 00:11:50,450
It's kind of similar.

157
00:11:50,450 --> 00:11:52,880
So we have num units two.

158
00:11:52,880 --> 00:11:53,420
Okay.

159
00:11:53,420 --> 00:11:58,040
So here we should just copy this and paste right here.

160
00:11:58,070 --> 00:12:00,800
Num units, two values.

161
00:12:00,800 --> 00:12:02,180
Yeah, it's kind of similar.

162
00:12:02,360 --> 00:12:05,120
Okay, we have that and then there we go.

163
00:12:05,120 --> 00:12:07,910
So we have num units one, num units two.

164
00:12:07,910 --> 00:12:11,840
And then next is the dropout rate, and after that we're going to have the learning rate.

165
00:12:11,840 --> 00:12:14,480
So let's just have the dropout rate here.

166
00:12:14,480 --> 00:12:23,060
So we have the dropout rate, and this dropout rate took values of 0.1, 0.2, 0.3.

167
00:12:23,090 --> 00:12:25,970
Now here, what we're going to use is like this min max.

168
00:12:25,970 --> 00:12:33,470
So let's just copy this out here and then let's copy this out and then paste it out here.

169
00:12:33,830 --> 00:12:41,300
Okay, so we have dropout rate, but here we're going to go from 0.1 to, let's say, 0.4, and

170
00:12:41,300 --> 00:12:42,920
this is random, actually, not grid.

171
00:12:42,920 --> 00:12:44,000
So we have that.

172
00:12:44,000 --> 00:12:51,320
And then here, okay, name, let's say name Malaria Prediction Sweep.

173
00:12:52,310 --> 00:12:54,980
Okay, method random, parameters is fine.

174
00:12:54,980 --> 00:12:56,810
So we're getting each and every parameter.

175
00:12:56,810 --> 00:12:59,690
Now we have the dropout rate set.

176
00:12:59,720 --> 00:13:02,840
We now move to regularization rate.

177
00:13:02,870 --> 00:13:07,970
Regularization rate values between 0.001 and 0.1.

178
00:13:08,300 --> 00:13:09,020
Okay.

179
00:13:09,020 --> 00:13:11,840
So we have 0.001, 0.1.

180
00:13:11,840 --> 00:13:18,140
And then here, what we could do now is we make use of, let's say, distribution and then we have here

181
00:13:18,140 --> 00:13:18,830
uniform.

182
00:13:18,830 --> 00:13:23,990
So we specify that we want to make use of a uniform distribution and then here.

183
00:13:23,990 --> 00:13:29,780
So we're going to use the same again here for the learning rate distribution.

184
00:13:30,620 --> 00:13:31,880
Um, uniform.

185
00:13:32,420 --> 00:13:33,350
There we go.

186
00:13:33,350 --> 00:13:35,690
We have that uniform and that's fine.

187
00:13:36,710 --> 00:13:46,190
But here we're going to go from, let's say, 1e-4 for min and then max 1e-2,

188
00:13:46,220 --> 00:13:48,520
1e-2.

189
00:13:48,520 --> 00:13:51,040
Okay, so looks good.

190
00:13:51,040 --> 00:13:52,780
Let's take this off now.

191
00:13:52,780 --> 00:13:56,320
Let's take off our set here and then we have that.
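
Put together, the configuration dictated above might look roughly like the sketch below. The ranges for the dropout, regularization and learning rates and the sweep name are the ones spoken in this session, and the metric entry reflects the accuracy-maximizing goal used here. The exact key spellings (`N_DENSE_1` and so on) and the dense-layer unit values are assumptions, since they are only visible on screen.

```python
# Sketch of the sweep configuration built in this session.
# Key spellings and the N_DENSE_* value lists are assumptions (placeholders).
sweep_configuration = {
    "name": "Malaria-Prediction-Sweep",
    "method": "random",  # "grid" and "bayes" are the other two options
    "metric": {"name": "accuracy", "goal": "maximize"},
    "parameters": {
        "N_DENSE_1": {"values": [16, 32, 64, 128]},  # placeholder values
        "N_DENSE_2": {"values": [16, 32, 64, 128]},  # placeholder values
        "DROPOUT_RATE": {"min": 0.1, "max": 0.4},
        "REGULARIZATION_RATE": {
            "distribution": "uniform",
            "min": 0.001,
            "max": 0.1,
        },
        "LEARNING_RATE": {
            "distribution": "uniform",
            "min": 1e-4,
            "max": 1e-2,
        },
    },
}

# List the hyper parameters that the sweep will vary.
print(sorted(sweep_configuration["parameters"]))
```

Only the hyper parameters listed under `parameters` are varied by the sweep; everything else keeps whatever fixed value the training code uses.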

192
00:13:56,320 --> 00:14:03,070
So from this we'll be able to create the sweep ID. We'll get our sweep ID by running this wandb.sweep

193
00:14:03,340 --> 00:14:07,600
and passing in the sweep configuration. To run an agent,

194
00:14:07,600 --> 00:14:13,270
the first step is we're going to define a function to run the training based on those hyper parameters.

195
00:14:13,270 --> 00:14:21,970
And then we're going to pass that function with the sweep ID here in this wandb.agent method.

196
00:14:22,120 --> 00:14:29,560
So now let's copy this code out and then paste it out here, paste this out here, and then you'll also

197
00:14:29,560 --> 00:14:35,830
notice that this is kind of similar to what we had done here, because here we had defined this

198
00:14:35,830 --> 00:14:44,320
method which takes in the hyper parameters which we're trying to tune, and then we go through this

199
00:14:44,920 --> 00:14:48,380
method here for each and every sweep.

200
00:14:48,770 --> 00:14:50,330
Now let's get back here.

201
00:14:50,330 --> 00:14:51,830
We have the model.

202
00:14:51,860 --> 00:14:54,170
We could simply make use of this model.

203
00:14:54,170 --> 00:14:56,280
So let's copy out some part.

204
00:14:56,330 --> 00:14:58,250
Let's, let's just copy out this here.

205
00:14:58,850 --> 00:15:08,960
Let's copy that out and paste this here, paste it out and we have model tune, um, net model which

206
00:15:08,960 --> 00:15:12,080
we create here, hyper parameters and all of that.

207
00:15:12,080 --> 00:15:20,120
And then instead of what we had here, where we had the compile and the fit method, what

208
00:15:20,120 --> 00:15:22,910
we'll return here will be just this model.

209
00:15:22,910 --> 00:15:26,030
So we'll return the net model right here.

210
00:15:26,030 --> 00:15:33,800
So let's have this net model, let's take out this part of the code and that's fine.

211
00:15:34,010 --> 00:15:36,860
Okay, so we have this model tune here.

212
00:15:37,460 --> 00:15:44,540
Um, instead of make model, we have model tune and we'll pass the configuration in here.

213
00:15:44,570 --> 00:15:45,140
We add.

214
00:15:45,270 --> 00:15:48,540
The project and the entity in our init method.

215
00:15:48,900 --> 00:15:56,580
Then we've modified these keys right here to match those of our wandb configuration, which we have

216
00:15:56,580 --> 00:15:57,410
seen already.

217
00:15:57,420 --> 00:15:58,890
Let's get back here.

218
00:15:58,890 --> 00:16:02,520
Let's just copy this and put right here so you could see that.

219
00:16:02,820 --> 00:16:08,250
So as you could see, instead of number of units one, we have number of dense one, and here we have

220
00:16:08,250 --> 00:16:09,480
number of dense two.

221
00:16:09,480 --> 00:16:10,950
So that's it.

222
00:16:10,980 --> 00:16:13,320
We have the different values you could take.

223
00:16:13,350 --> 00:16:19,740
We have the dropout, the regularization rate and the learning rate.

224
00:16:19,770 --> 00:16:26,250
Now, right here in this model, this whole model which we call model tune, we shall make use of this

225
00:16:26,250 --> 00:16:26,610
wandb

226
00:16:26,610 --> 00:16:30,420
configuration, which we've already passed in here.

227
00:16:30,420 --> 00:16:41,130
And so here we have config and config and we're going to do the same for all these other different hyper

228
00:16:41,130 --> 00:16:41,880
parameters.

229
00:16:41,880 --> 00:16:49,330
Like here we'll have number of filters as, here, config number of filters.

230
00:16:49,480 --> 00:16:55,960
And then you should note that in this example we are not going to tune all the hyper parameters.

231
00:16:55,960 --> 00:17:04,240
So the hyper parameters which we want to tune should be in this parameters dictionary

232
00:17:04,240 --> 00:17:07,330
in our sweep configuration.

233
00:17:07,480 --> 00:17:14,770
The method is random and the metric is the accuracy with the goal of maximizing it.

234
00:17:14,770 --> 00:17:19,090
So if you want to tune any parameter or hyper parameter, you put that in here.

235
00:17:19,090 --> 00:17:23,800
For the rest, we're just going to use these configurations which we've set already.

236
00:17:23,800 --> 00:17:26,470
So you just have some fixed values.

237
00:17:26,470 --> 00:17:31,840
So that said, we would modify all this and there we go.

238
00:17:31,840 --> 00:17:35,170
So we now have all these different modifications.

239
00:17:35,470 --> 00:17:45,490
We run this, run this, and then back here on this train, we could now replace all this with the compilation

240
00:17:45,490 --> 00:17:49,060
and the fit method.

241
00:17:50,110 --> 00:17:52,600
Now again, here we would have this learning rate.

242
00:17:52,600 --> 00:18:04,810
So we have learning rate and then, um, we could modify this number of epochs to say three or actually

243
00:18:04,810 --> 00:18:05,530
the config.

244
00:18:05,530 --> 00:18:08,800
So we have config number of epochs.

245
00:18:09,670 --> 00:18:13,150
And you should note that we're carrying this out only on the validation set.

246
00:18:13,150 --> 00:18:18,550
So you could try this on the full training dataset, but that will take much more time.

247
00:18:18,550 --> 00:18:27,100
And also you could feel free to carry out this hyper parameter tuning on more parameters.

248
00:18:27,100 --> 00:18:34,330
So you could take up, say, the image size like this parameter here, you could add that up and then

249
00:18:34,840 --> 00:18:40,090
see how this image size affects the model performance.

250
00:18:40,090 --> 00:18:43,750
So that said, we have all this already.

251
00:18:44,290 --> 00:18:48,150
We have our agent, which takes the sweep ID, which we've defined already.

252
00:18:48,180 --> 00:18:53,250
We have the function, which is the train function right here, and then we have this count, which is the number

253
00:18:53,250 --> 00:18:55,620
of runs to execute before we proceed.

254
00:18:55,620 --> 00:19:00,900
Note that we specify this configuration, which is essentially this configuration we have here.

255
00:19:00,900 --> 00:19:04,380
So we could just take this off and say, this is our configuration.

256
00:19:04,620 --> 00:19:07,230
Okay, so that's our configuration.

257
00:19:07,230 --> 00:19:13,500
And now we're ready to let our agents do their job.
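
The two steps, creating the sweep and then running an agent, can be sketched as follows. This is a minimal outline rather than the notebook's exact code: the project name, the trimmed-down configuration and the body of `train` are placeholders, and `wandb` is imported inside the functions so that nothing contacts the server until `launch_sweep()` is actually called.

```python
# Trimmed-down sweep configuration; the key name is an assumption.
sweep_configuration = {
    "name": "Malaria-Prediction-Sweep",
    "method": "random",
    "metric": {"name": "accuracy", "goal": "maximize"},
    "parameters": {
        "LEARNING_RATE": {"distribution": "uniform", "min": 1e-4, "max": 1e-2},
    },
}


def train():
    """One run of the sweep: read the sampled values, train, log the metric."""
    import wandb  # requires the wandb package and a logged-in account

    with wandb.init() as run:
        learning_rate = run.config["LEARNING_RATE"]
        # Placeholder: build, compile and fit the model with learning_rate here,
        # then replace this dummy number with the real evaluation result.
        accuracy = 0.0
        run.log({"accuracy": accuracy})


def launch_sweep(count=20):
    """Register the sweep with the central server, then run one agent."""
    import wandb

    # "Malaria-Detection" is a hypothetical project name.
    sweep_id = wandb.sweep(sweep_configuration, project="Malaria-Detection")
    # The agent asks the server for hyper parameters and calls train() count times.
    wandb.agent(sweep_id, function=train, count=count)
    return sweep_id
```

Calling `launch_sweep()` in a notebook cell would start the runs; the metric name logged in `train` has to match the `name` under `metric` so the sweep can rank the runs.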

258
00:19:13,500 --> 00:19:18,450
So let's run this. Now, after 20 runs, here's what we get.

259
00:19:18,540 --> 00:19:19,380
Start from here.

260
00:19:19,410 --> 00:19:25,500
You see that it starts by picking the dropout rate, the learning rate, the number of dense one, number

261
00:19:25,500 --> 00:19:33,750
of dense two, and then the regularization rate, while obviously keeping the other parameters constant.

262
00:19:33,960 --> 00:19:36,870
Here we have this loss and its corresponding accuracy.

263
00:19:37,320 --> 00:19:40,470
And you could check out for the other sweeps down here.

264
00:19:40,500 --> 00:19:48,280
You see we have 20 different runs and we could get right here, you see.

265
00:19:48,280 --> 00:19:51,250
And we click on this to view the sweep.

266
00:19:52,120 --> 00:19:58,840
And now on this page you could see the different runs we have here from one up to 20, the epochs or

267
00:19:58,840 --> 00:20:02,740
rather the run, the accuracy and its corresponding loss.

268
00:20:02,740 --> 00:20:06,250
And then, uh, we'll skip this tool for now.

269
00:20:06,250 --> 00:20:07,450
Let's look at this.

270
00:20:08,350 --> 00:20:13,690
We'll go ahead and check out the hyper parameters which produce the highest accuracy, which is this

271
00:20:13,690 --> 00:20:14,380
here.

272
00:20:14,500 --> 00:20:17,260
Let's highlight that we have this.

273
00:20:17,260 --> 00:20:19,750
Well, let's, let's pick this out from here.

274
00:20:20,200 --> 00:20:21,970
Um, there we go.

275
00:20:21,970 --> 00:20:23,140
We have this.

276
00:20:23,170 --> 00:20:24,520
We have this.

277
00:20:24,550 --> 00:20:32,050
We have this that comes down here, we have this and then we have this.

278
00:20:33,070 --> 00:20:38,890
And so here we have the hyper parameter values which give us the highest accuracy score.

279
00:20:39,040 --> 00:20:42,340
Then coming into this, let's take this off.

280
00:20:43,030 --> 00:20:44,890
We could check out these different parameters

281
00:20:45,270 --> 00:20:48,750
and their parameter importance with respect to the accuracy.

282
00:20:48,750 --> 00:20:55,380
So you see that the most important is the learning rate, followed by the runtime, number of dense two,

283
00:20:55,380 --> 00:20:58,890
dropout rate, regularization rate, number of dense one.

284
00:20:58,890 --> 00:21:05,940
And here we have these other hyper parameters which we did not modify.

285
00:21:06,720 --> 00:21:09,400
Now let's click on this here and see what we have.

286
00:21:09,420 --> 00:21:14,700
We're told it automatically shows the most useful parameters.

287
00:21:14,700 --> 00:21:17,040
So let's check this out.

288
00:21:17,700 --> 00:21:23,880
You see that after clicking on that, the other fixed hyper parameters and even the runtime are taken

289
00:21:23,880 --> 00:21:24,180
off.

290
00:21:24,180 --> 00:21:29,700
So we are left with only these hyper parameters which we had fixed, or rather, which we had put

291
00:21:29,700 --> 00:21:32,130
in this sweep configuration.

292
00:21:32,910 --> 00:21:37,050
Then one other point you could note from here is this correlation.

293
00:21:37,050 --> 00:21:40,020
So apart from importance, we could check out the correlation.

294
00:21:40,020 --> 00:21:46,060
And here we're told that this learning rate has a negative correlation, because you could see with the red,

295
00:21:46,090 --> 00:21:51,070
the red indicates negative correlation, while the green indicates positive correlation.

296
00:21:51,070 --> 00:21:52,150
So you could check out the values.

297
00:21:52,150 --> 00:21:56,320
See, this is 0.051 while this is -3.53.

298
00:21:56,350 --> 00:22:01,630
Now what this simply means is the lower the learning rate, then the higher the accuracy.

299
00:22:01,630 --> 00:22:07,990
And then in this plot to the left, we have the accuracies for each and every run.

300
00:22:08,740 --> 00:22:10,420
So that's it for the section.

301
00:22:10,420 --> 00:22:13,300
Thank you for getting up to this point and see you next time.