1
00:00:01,260 --> 00:00:10,480
Now in this lecture we are going to build our model using Keras. Before that, we will do one small thing.

2
00:00:10,510 --> 00:00:18,660
We will take out a small set of training examples as a validation set. A validation set is used

3
00:00:18,660 --> 00:00:22,000
to tune the model hyperparameters.

4
00:00:22,170 --> 00:00:30,720
So we'll just take our first 5000 examples as the validation set, and the remaining ones we will save in another

5
00:00:30,720 --> 00:00:39,630
variable called the partial training set. So here I am setting the indices of the first 5000 in a variable called

6
00:00:39,630 --> 00:00:44,090
val_indices.

7
00:00:44,210 --> 00:00:50,600
Now I use this variable to take all the information of the first 5000 and store it in the validation

8
00:00:50,600 --> 00:00:51,070
images.

9
00:00:54,050 --> 00:01:00,660
So the first 5000 train images are now stored in val_images.

10
00:01:00,660 --> 00:01:05,030
The remaining part of the images is stored in part_train_images.

11
00:01:08,780 --> 00:01:11,280
We do the same thing with labels.

12
00:01:11,600 --> 00:01:19,670
The first 5000 labels are stored in val_labels and the rest of the labels are stored in part_train_labels.

13
00:01:23,000 --> 00:01:25,050
So we have created two parts

14
00:01:25,080 --> 00:01:32,060
out of these training examples: one is the validation set and one is the partial training set. We will use the partial

15
00:01:32,060 --> 00:01:33,340
training set to train the model.

16
00:01:33,980 --> 00:01:39,920
And the validation set will be used to tune the hyperparameters. You will see the use of the validation

17
00:01:39,930 --> 00:01:44,700
set in the coming lectures.
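
The split described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the arrays are small stand-ins filled with zeros and random labels rather than the lecture's real data, and `n_val` is set to 50 instead of 5000 to keep it light. The variable names follow the ones mentioned in the narration.

```r
# Stand-in data (assumption): 600 blank 28x28 "images" with random labels 0-9.
# The lecture holds out the first 5000 rows of the real training set instead.
set.seed(42)
n_val <- 50
train_images <- array(0, dim = c(600, 28, 28))
train_labels <- sample(0:9, 600, replace = TRUE)

val_indices <- 1:n_val

# Validation part: the first n_val examples
val_images <- train_images[val_indices, , ]
val_labels <- train_labels[val_indices]

# Partial training part: everything except the validation rows
part_train_images <- train_images[-val_indices, , ]
part_train_labels <- train_labels[-val_indices]

dim(val_images)         # 50 28 28
dim(part_train_images)  # 550 28 28
```

Negative indices in R drop the listed rows, which is why `-val_indices` gives the remaining examples.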

18
00:01:44,740 --> 00:01:50,210
Now there are two ways in which we can define our model using Keras.

19
00:01:50,560 --> 00:01:55,240
One is using the sequential API and the other is using the functional API.

20
00:01:57,890 --> 00:01:58,460
The sequential

21
00:01:58,460 --> 00:02:05,090
API is used when we want to make a normal neural network with a linear stack of layers.

22
00:02:05,090 --> 00:02:12,290
The functional API is used for a complex network structure where you have multiple usages of several

23
00:02:12,290 --> 00:02:21,020
small networks. We will see an example of the functional API when we build a regression model. For this classification

24
00:02:21,020 --> 00:02:25,980
example we will use the sequential API.

25
00:02:26,570 --> 00:02:30,750
‫Now we'll be taking three steps in defining our model.

26
00:02:30,770 --> 00:02:34,250
‫The first step is defining the network structure.

27
00:02:34,250 --> 00:02:42,710
This includes setting up the number of layers, the number of neurons in each layer, and the activation function

28
00:02:42,800 --> 00:02:45,150
to be used in each layer.

29
00:02:45,170 --> 00:02:52,100
This is captured in this part of the code.

30
00:02:52,300 --> 00:03:00,510
The second part is configuring the learning process. This includes selecting the loss function, an optimizer,

31
00:03:01,200 --> 00:03:05,050
and some metrics to be monitored.

32
00:03:05,070 --> 00:03:12,050
‫This is this part of the code.

33
00:03:12,130 --> 00:03:17,340
The third is specifying the operations and training the model.

34
00:03:17,350 --> 00:03:21,010
‫This is done in this last part of this code.

35
00:03:23,710 --> 00:03:28,330
‫So let's start discussing each line of code one by one.

36
00:03:28,670 --> 00:03:37,490
‫Here we start by creating a new variable called model and this variable will contain the information

37
00:03:37,610 --> 00:03:41,020
‫of the structure of our network.

38
00:03:41,030 --> 00:03:50,030
This is the function we use for the sequential API: keras_model_sequential. And then,

39
00:03:50,030 --> 00:03:51,960
to start defining the structure,

40
00:03:52,250 --> 00:03:55,330
we use this pipe symbol.

41
00:03:55,730 --> 00:04:01,890
This pipe operator comes with the magrittr package, which is auto-installed when we install the keras

42
00:04:01,940 --> 00:04:03,920
package.

43
00:04:03,950 --> 00:04:08,790
This is used for passing values as arguments to functions.

44
00:04:09,050 --> 00:04:17,090
We can do away with this symbol also, but the use of this symbol makes the code more readable and compact.

45
00:04:17,090 --> 00:04:20,950
So as a good practice we will use this operator too.

46
00:04:20,990 --> 00:04:23,490
This is a pipe operator.
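
As a small illustration of what the pipe does, the sketch below compares a nested call with the same computation written using `%>%`. The numbers are arbitrary; it assumes the magrittr package is installed (the keras package re-exports the same operator).

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9)

# Nested calls read inside-out
nested <- sum(sqrt(x))

# The pipe passes the left-hand value as the first argument of the next call
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)  # TRUE
```

Both forms compute the same value; the piped form simply reads left to right, which is why it suits a stack of layers.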

47
00:04:24,860 --> 00:04:34,340
And if you remember, even this operator that we used while assigning train images and train labels, this

48
00:04:34,340 --> 00:04:40,370
multiple assignment operator, comes from the zeallot package, which also came with the keras

49
00:04:40,400 --> 00:04:42,540
package.

50
00:04:42,740 --> 00:04:53,090
We're using this also because it makes the code compact. So first we flatten the input layer. By flattening

51
00:04:53,450 --> 00:05:00,860
I mean that we have a 28 by 28 image, and we can turn it into one dimension by putting all these pixels

52
00:05:01,070 --> 00:05:02,650
in one line.

53
00:05:02,660 --> 00:05:12,800
For example, if you have this 3 by 3 two-dimensional array, you can flatten it by putting these multiple

54
00:05:12,800 --> 00:05:14,440
rows in front of each other.

55
00:05:14,780 --> 00:05:17,060
So this will become a one-dimensional array.
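
The 3 by 3 example can be reproduced in base R. This is a minimal sketch with made-up values 1 to 9; it is not the Keras flatten layer itself, only the same idea of laying the rows end to end.

```r
# A 3 by 3 two-dimensional array, filled row by row
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# R stores matrices column by column, so transpose first to read row after row
flat <- as.vector(t(m))

flat  # 1 2 3 4 5 6 7 8 9
```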

56
00:05:20,270 --> 00:05:26,630
This step is important as we have to give a straight shape of input values to the input layer.

57
00:05:28,460 --> 00:05:31,790
You can convert it to one dimension using the array_reshape

58
00:05:31,790 --> 00:05:36,830
function also, but when we have Keras, why should we bother?

59
00:05:36,920 --> 00:05:47,190
Just specify layer_flatten here like this and specify the input shape,

60
00:05:47,750 --> 00:05:52,380
that is, what kind of input this layer is receiving.

61
00:05:52,460 --> 00:06:00,380
It will automatically convert this 28 by 28 input into 784 pixel values for the

62
00:06:00,380 --> 00:06:03,660
next layer.

63
00:06:04,050 --> 00:06:09,660
Next we specify the details of our dense hidden layer.

64
00:06:09,660 --> 00:06:17,730
That is, we are telling it that this layer is dense, meaning each neuron is connected to all neurons of the

65
00:06:17,730 --> 00:06:18,290
next layer.

66
00:06:19,290 --> 00:06:26,760
And in this layer we want 128 neurons, and the activation function for all these neurons

67
00:06:27,090 --> 00:06:29,010
will be ReLU,

68
00:06:29,010 --> 00:06:33,270
that is, rectified linear unit. So in this way

69
00:06:33,330 --> 00:06:35,730
we have defined one hidden layer.

70
00:06:35,850 --> 00:06:37,320
It's a dense layer.

71
00:06:37,320 --> 00:06:43,020
It has 128 neurons and the ReLU activation function.

72
00:06:43,190 --> 00:06:45,560
‫Next we specify the output layer.

73
00:06:45,740 --> 00:06:55,790
You can add more layers also, but here I am using only one hidden layer and one output layer. In this last layer

74
00:06:55,890 --> 00:07:03,750
we have 10 neurons. Why 10? Because we have 10 classes to be predicted.

75
00:07:03,750 --> 00:07:09,840
Each of these 10 neurons will be predicting the probability of one class, such as whether it is a shirt

76
00:07:10,020 --> 00:07:11,410
or a boot.

77
00:07:11,700 --> 00:07:19,410
And if you remember from the theory lecture, this softmax activation just makes sure that the sum of

78
00:07:19,440 --> 00:07:22,770
all the probabilities comes out to 1.

79
00:07:22,860 --> 00:07:28,330
So this last layer has 10 neurons with softmax activation.
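
To make the "probabilities sum to 1" point concrete, here is a small base R sketch of softmax applied to ten made-up scores. The `softmax` helper and the score values are illustrative assumptions, not code from the lecture.

```r
# Softmax: exponentiate the scores and divide by their total
softmax <- function(z) {
  e <- exp(z - max(z))  # subtracting the maximum avoids overflow
  e / sum(e)
}

# Hypothetical raw scores from the 10 output neurons
scores <- c(1.2, 0.3, -0.5, 2.0, 0.0, 0.7, -1.1, 0.4, 1.5, -0.2)
probs <- softmax(scores)

sum(probs)            # 1, up to floating-point rounding
which.max(probs) - 1  # predicted class id, counted from 0
```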

80
00:07:28,980 --> 00:07:31,210
‫That's all for the structure.

81
00:07:31,230 --> 00:07:39,590
In short, it is a 784-128-10 neural network.

82
00:07:39,810 --> 00:07:46,020
Once we have run the entire code, I would suggest that you come back to this point and experiment a little

83
00:07:46,020 --> 00:07:52,560
bit here. Try to see what is the effect of having more layers and what is the effect of increasing or

84
00:07:52,560 --> 00:07:56,810
decreasing the number of neurons in different layers.

85
00:07:56,940 --> 00:07:59,650
Let's run this code.

86
00:08:01,350 --> 00:08:08,740
‫You can see that a new variable called model is created and it has the structure stored in it.
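
The code shown on screen is not part of this transcript, so the following is a reconstruction from the narration rather than the lecturer's exact code. It assumes the keras R package with a working backend is installed, and it uses the layer sizes and activations named above.

```r
library(keras)

# Step 1: define the network structure (784-128-10)
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%        # 28 by 28 image -> 784 values
  layer_dense(units = 128, activation = "relu") %>% # one dense hidden layer
  layer_dense(units = 10, activation = "softmax")   # one output neuron per class

summary(model)
```

The summary should list 784 * 128 + 128 weights for the hidden layer and 128 * 10 + 10 for the output layer, which is a quick way to check that the structure matches what was intended.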

87
00:08:08,760 --> 00:08:13,400
Now let's look at the second step. At this step

88
00:08:13,460 --> 00:08:18,880
we configure the learning process. In this,

89
00:08:18,970 --> 00:08:23,650
the first thing is specifying the optimizer.

90
00:08:23,760 --> 00:08:28,080
‫We have discussed the concept behind stochastic gradient descent.

91
00:08:28,140 --> 00:08:29,280
This is SGD.

92
00:08:29,340 --> 00:08:38,850
It is not the only one; there are other optimizers also, with small differences. Other optimizers include Adam, RMSprop,

93
00:08:39,510 --> 00:08:41,720
and a few others.

94
00:08:42,030 --> 00:08:45,710
In fact, in the coming time we may see a few more added to this list.

95
00:08:46,980 --> 00:08:54,420
But to answer the question of which should be used: ideally, the optimizer depends on the shape of the

96
00:08:54,420 --> 00:08:56,300
error function.

97
00:08:56,880 --> 00:08:58,780
But we do not know that shape.

98
00:08:59,100 --> 00:09:08,360
So we do not know the ideal optimizer. But practically, in most scenarios, all of these work

99
00:09:08,400 --> 00:09:09,040
very well.

100
00:09:09,930 --> 00:09:17,160
It's just that for some scenarios SGD converges faster and for some situations RMSprop converges

101
00:09:17,160 --> 00:09:19,160
faster.

102
00:09:19,170 --> 00:09:24,540
So my suggestion would be to train the model once with SGD.

103
00:09:24,840 --> 00:09:30,540
If you think it is taking too long to converge and you are also not seeing much improvement in training,

104
00:09:30,930 --> 00:09:36,720
it is always worth a shot to try the RMSprop optimizer also.

105
00:09:37,400 --> 00:09:41,870
Let's move on to the second parameter, which is the loss function.

106
00:09:42,090 --> 00:09:49,500
We have discussed this in the theory part: for classification models we use cross-entropy, and for regression

107
00:09:49,500 --> 00:09:57,990
models we use the mean squared error. But within cross-entropy you will find three options.

108
00:09:58,230 --> 00:10:03,930
‫Which of these three should you use depends on the type of problem you have.

109
00:10:03,930 --> 00:10:10,830
These are the three names: sparse categorical cross-entropy, binary cross-entropy, and categorical cross-

110
00:10:10,880 --> 00:10:21,430
entropy. If your problem has two classes to be predicted, like whether an email is spam or not spam, we use

111
00:10:21,430 --> 00:10:24,190
the binary cross-entropy.

112
00:10:25,400 --> 00:10:32,210
If you have multiple classes, such as this problem where we have ten fashion objects, and each example

113
00:10:32,390 --> 00:10:42,260
is exclusive, meaning each image contains only one object to be predicted, then we use sparse categorical

114
00:10:42,260 --> 00:10:43,630
cross-entropy.
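
As a rough illustration of what this loss measures for a single image, the base R sketch below takes a made-up probability vector and an integer class label and computes minus the log of the probability given to the true class. The numbers are hypothetical; Keras averages this quantity over a batch.

```r
# Hypothetical softmax output for one image over 10 classes (sums to 1)
probs <- c(0.05, 0.05, 0.70, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02)
label <- 2L  # integer class id, counted from 0

# Loss for this example: minus log of the probability given to the true class
loss <- -log(probs[label + 1])  # R vectors are 1-based, hence the +1

# If the true class were one the model thinks unlikely, the loss would be larger
loss_if_wrong <- -log(probs[5L + 1])

loss < loss_if_wrong  # TRUE
```

The label here is a single integer per image, which is the form this loss expects.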

115
00:10:44,700 --> 00:10:52,160
So that is why I am writing sparse categorical cross-entropy here. If we have multiple

116
00:10:52,160 --> 00:11:00,380
classes and one observation can belong to many classes, for example, if we are labeling whether an email

117
00:11:00,380 --> 00:11:06,410
is from someone you know or not, and we are also labeling whether the email is important or not.

118
00:11:07,700 --> 00:11:10,730
Here one email can be both.

119
00:11:10,730 --> 00:11:17,780
It can be from someone you know and it can be important, so it may belong to two classes at the same

120
00:11:17,780 --> 00:11:18,970
time.

121
00:11:18,980 --> 00:11:23,670
In this scenario we use categorical cross-entropy.

122
00:11:24,230 --> 00:11:26,420
‫I hope you understood this.

123
00:11:26,420 --> 00:11:29,210
‫Here's the gist of what I just said.

124
00:11:29,600 --> 00:11:36,980
You can look at this commented part here to understand the three cross-entropies.

125
00:11:37,400 --> 00:11:39,920
‫The third parameter is metrics.

126
00:11:39,920 --> 00:11:45,650
This is not mandatory, but we specify this to monitor the performance of the model on the training set.

127
00:11:48,350 --> 00:11:54,960
Basically we would like to see the improvement in accuracy of a classification model, or the mean squared

128
00:11:55,000 --> 00:11:59,760
error of a regression model, over each epoch.

129
00:12:00,440 --> 00:12:09,350
As I told you, we go over the entire training dataset several times. Each time we will calculate the accuracy

130
00:12:09,590 --> 00:12:10,700
of our model

131
00:12:10,790 --> 00:12:17,510
at that instant and store it, so that we can see if the learning process is having any improvement in

132
00:12:17,510 --> 00:12:18,490
accuracy or not.

133
00:12:20,990 --> 00:12:27,980
So with these three parameters set, we can run this part of the code.

134
00:12:28,090 --> 00:12:30,350
‫Now we have configured the learning process also.

135
00:12:30,700 --> 00:12:38,170
This brings us to the third part, where we actually train our model. Training is done using the fit

136
00:12:38,290 --> 00:12:41,660
function. Within the fit function,

137
00:12:41,680 --> 00:12:45,460
we have to specify the input variable first.

138
00:12:46,120 --> 00:12:54,980
That is the partial training dataset that we will input. Then comes the actual output corresponding to

139
00:12:54,980 --> 00:12:59,980
those inputs. The actual output is stored in part_train_labels.

140
00:13:00,500 --> 00:13:02,200
‫So that is the second parameter.

141
00:13:04,690 --> 00:13:07,600
‫Next we specify the epoch number.

142
00:13:08,140 --> 00:13:13,970
‫This is the number of times our entire training data will be put into the model.

143
00:13:14,140 --> 00:13:21,030
We set this to 30 for this example. Then we have batch size.

144
00:13:21,040 --> 00:13:27,460
This is the number of observations which will be used during each forward and backward propagation step.

145
00:13:27,980 --> 00:13:30,380
So we take a batch size of a hundred.
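
To see how epochs and batch size combine, the arithmetic below counts the weight updates implied by these settings. It assumes the standard 60000-example Fashion MNIST training set with 5000 examples held out for validation; the total of 60000 is an assumption, since the dataset size is not stated in this part of the lecture.

```r
n_train    <- 60000 - 5000  # partial training set size (60000 is an assumption)
batch_size <- 100
epochs     <- 30

# One forward and backward pass per batch
steps_per_epoch <- ceiling(n_train / batch_size)  # 550
total_updates   <- steps_per_epoch * epochs       # 16500
```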

146
00:13:33,730 --> 00:13:42,030
Lastly, we tell it that we have separate validation data also, which is a list of val_images and

147
00:13:42,040 --> 00:13:48,140
val_labels, and we would like to see the accuracy score on this validation data.

148
00:13:48,200 --> 00:13:57,280
Also keep in mind that only part_train_images and part_train_labels will be used to train the model.

149
00:13:58,460 --> 00:14:01,680
This validation data is like test data

150
00:14:01,700 --> 00:14:09,260
in this scenario. That is, our model will not have seen the validation data when it calculates the accuracy

151
00:14:09,260 --> 00:14:11,420
on this validation data.
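
Putting the configuration and training steps together, a hedged sketch of the calls described above could look like this. It is reconstructed from the narration, not copied from the screen, and it uses small random stand-in arrays so that it runs without the real images; it assumes the keras R package and a backend are installed. With random data the accuracy values carry no meaning.

```r
library(keras)

# Stand-in data (assumption): random pixels and random labels 0-9
set.seed(1)
make_images <- function(n) array(runif(n * 28 * 28), dim = c(n, 28, 28))
part_train_images <- make_images(600)
part_train_labels <- sample(0:9, 600, replace = TRUE)
val_images <- make_images(100)
val_labels <- sample(0:9, 100, replace = TRUE)

# Step 1: structure, as described earlier in the lecture
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

# Step 2: optimizer, loss function and metrics
model %>% compile(
  optimizer = "sgd",
  loss = "sparse_categorical_crossentropy",
  metrics = c("accuracy")
)

# Step 3: train on the partial training set, monitor the validation set
history <- model %>% fit(
  part_train_images, part_train_labels,
  epochs = 30,
  batch_size = 100,
  validation_data = list(val_images, val_labels)
)
```

The returned `history` object keeps the loss and metric values recorded after each epoch, for both the training and the validation data.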

152
00:14:11,680 --> 00:14:15,960
‫Now let's run this line of code as well and this will be number one

153
00:14:24,960 --> 00:14:31,880
Well, you can see that the neural network model is getting trained and the accuracy and loss value are being

154
00:14:32,180 --> 00:14:40,090
recorded for each epoch. In the next video we will see the performance of this trained model.