1
00:00:02,910 --> 00:00:03,990
‫Hello everyone.

2
00:00:03,990 --> 00:00:05,680
‫Welcome back.

3
00:00:05,700 --> 00:00:09,300
‫In this lecture we are going to learn about two concepts.

4
00:00:09,300 --> 00:00:14,340
‫One is how to build a neural network for regression problems.

5
00:00:14,340 --> 00:00:19,550
And the second is how to do it using the functional API.

6
00:00:19,630 --> 00:00:25,470
Till now we have been using the sequential API, and in this lecture we will see how to use the functional API.

7
00:00:26,520 --> 00:00:34,830
The functional API is basically used for defining complex models such as multi-input or multi-output

8
00:00:34,830 --> 00:00:39,260
models, or models which have shared layers.

9
00:00:39,440 --> 00:00:47,310
So first we will create a normal model using the functional API, which could also be done using the sequential API.

10
00:00:48,250 --> 00:00:55,380
‫But then we will create a complex neural network structure which can only be handled by a functional

11
00:00:55,380 --> 00:00:57,200
API.

12
00:00:57,510 --> 00:01:04,650
‫Also we'll be solving a regression problem which means our output variable is a continuous variable.

13
00:01:04,830 --> 00:01:09,240
That is, it is not of the 0-or-1 type; it can take any value without any boundaries.

14
00:01:11,090 --> 00:01:19,200
For this problem we will be using the Boston housing dataset. It is a very standard dataset in which we have

15
00:01:19,350 --> 00:01:20,310
‫14 variables

16
00:01:23,570 --> 00:01:26,110
13 of them are predictor variables, and the 14th

17
00:01:26,110 --> 00:01:33,540
one is the value of the house. Basically, using the values of the 13 predictor variables,

18
00:01:33,590 --> 00:01:39,180
we want to predict the value of the house.

19
00:01:39,240 --> 00:01:46,740
This is also an inbuilt dataset in the Keras library, so we can import it using this line.

20
00:01:50,040 --> 00:01:56,340
If you want to know more about the Boston housing dataset, you can visit this link. It has details of only

21
00:01:56,370 --> 00:02:02,940
13 predictor variables. The predictor variables include variables like crime rate, number of rooms,

22
00:02:02,940 --> 00:02:14,130
etc. You can see that the Boston housing dataset is now imported. You can look at it by clicking on it.

23
00:02:14,580 --> 00:02:19,800
The Boston housing dataset has two parts: the train part and the test part.

24
00:02:20,380 --> 00:02:28,230
Within train we have 404 observations of the 13 predictor variables, that is in x,

25
00:02:29,160 --> 00:02:37,880
and we have the labels, that is the value of the house to be predicted, in y. In test,

26
00:02:37,920 --> 00:02:40,600
we have a set of 102 observations,

27
00:02:40,680 --> 00:02:43,890
again with 13 predictor variables in x, and in y

28
00:02:43,890 --> 00:02:45,440
we have the output variable.

29
00:02:49,250 --> 00:02:56,610
Now, as we did earlier, we'll be importing the training part into the train data and train labels variables,

30
00:02:57,450 --> 00:03:04,590
and the testing part of this dataset into the test data and test labels variables.

31
00:03:04,980 --> 00:03:06,440
‫Next run these two lines of code

32
00:03:10,040 --> 00:03:15,640
and now we have these new variables: test data and train data.

33
00:03:15,770 --> 00:03:22,790
These are the predictor part of the data. Then there are test labels and train labels.

34
00:03:22,820 --> 00:03:24,560
These are the output part of the data.

35
00:03:27,500 --> 00:03:30,350
Next comes preparing the data.

36
00:03:30,350 --> 00:03:37,600
One of the important steps that we saw earlier was normalizing the data. In the previous problem,

37
00:03:37,700 --> 00:03:41,240
we had only pixel data, which was homogeneous.

38
00:03:41,240 --> 00:03:47,090
So we simply divided it by 255 to get the scaled version of that data.

39
00:03:47,310 --> 00:03:56,180
But now we have heterogeneous data, that is, all these 13 variables represent 13 different things.

40
00:03:56,180 --> 00:03:57,950
‫It is not easy to scale.

41
00:03:57,950 --> 00:03:58,750
‫So it's kind of meta

42
00:04:02,110 --> 00:04:03,820
To normalize this data,

43
00:04:03,820 --> 00:04:12,030
we use this function, scale. This scale function automatically finds out the mean of every variable

44
00:04:12,360 --> 00:04:14,970
‫and the standard deviation of every variable.

45
00:04:15,450 --> 00:04:18,230
‫And it uses that formula that I showed you earlier.

46
00:04:18,960 --> 00:04:25,080
It subtracts the mean from each value and divides the result by the standard deviation.

47
00:04:25,290 --> 00:04:34,220
So simply using the scale function, we can normalize our training data. To normalize the test data,

48
00:04:34,650 --> 00:04:40,920
we do not use the mean and standard deviation of the test data; we use the mean and standard deviation

49
00:04:40,920 --> 00:04:43,580
‫of training data.

50
00:04:43,660 --> 00:04:48,240
‫The concept is we know only the training part of the data.

51
00:04:48,290 --> 00:04:51,440
Our model does not know any other detail of the world.

52
00:04:52,130 --> 00:04:54,010
So we have only the training part.

53
00:04:54,020 --> 00:04:59,420
From that we find out the mean and standard deviation of each variable.

54
00:04:59,640 --> 00:05:06,540
‫We assume that this standard deviation and mean of each variable applies to the entire dataset of the

55
00:05:06,540 --> 00:05:06,880
‫world.

56
00:05:08,300 --> 00:05:14,620
So using those means and standard deviations, we'll be scaling our test data

57
00:05:14,620 --> 00:05:14,980
also.

58
00:05:18,100 --> 00:05:27,460
So in this line we will scale our training data using this scale function, and in this line we will find

59
00:05:27,460 --> 00:05:27,660
out

60
00:05:27,660 --> 00:05:35,790
the column means of the training data, and we'll be storing that information in this variable called col means

61
00:05:35,790 --> 00:05:36,220
train.

62
00:05:39,240 --> 00:05:40,110
In this line,

63
00:05:40,110 --> 00:05:46,530
we are finding out the standard deviations of these variables in the training data and storing them in the

64
00:05:46,530 --> 00:05:51,050
variable called col stddevs.

65
00:05:51,290 --> 00:05:58,740
Now, using the means and standard deviations of the training data, we use this scale function.

66
00:05:58,750 --> 00:06:05,610
‫It is the same scale function but here we are specifying the mean and the standard deviation to be used

67
00:06:06,090 --> 00:06:10,400
‫for scaling this test data.

68
00:06:10,410 --> 00:06:12,240
‫Now our data is ready.

69
00:06:12,360 --> 00:06:13,990
‫Our training data is normalized.

70
00:06:14,040 --> 00:06:18,390
‫Our test data is also normalized.

71
00:06:18,480 --> 00:06:23,780
For any new data on which you want to predict the outcome of the model,

72
00:06:24,120 --> 00:06:27,780
you have to scale it again using this scale function.

73
00:06:31,330 --> 00:06:35,890
Now comes the part where we define the neural network.

74
00:06:35,890 --> 00:06:42,590
This time we'll be using the functional API. The functional API has two different parts.

75
00:06:42,640 --> 00:06:47,110
One is the inputs and one is the outputs. Inputs

76
00:06:47,310 --> 00:06:52,240
tell the model about all the variables that we are inputting into the model.

77
00:06:53,560 --> 00:07:01,840
So basically, in the input layer we tell the model that we have an input layer of shape equal to the number

78
00:07:01,840 --> 00:07:03,820
‫of variables.

79
00:07:03,820 --> 00:07:10,110
I could have written 13 here because I know that the number of variables is 13 in this particular dataset,

80
00:07:11,530 --> 00:07:16,890
‫but even if you change your train data you need not update your model.

81
00:07:17,050 --> 00:07:23,710
if you write it like this. If you write it this way, this means that you want to get the second dimension

82
00:07:23,830 --> 00:07:25,780
of the training data.

83
00:07:25,930 --> 00:07:30,620
‫So basically training data has these two dimensions.

84
00:07:31,060 --> 00:07:35,860
It has 404 rows and 13 columns.

85
00:07:35,860 --> 00:07:41,740
We want this dimension because this represents the number of variables in this training data.

86
00:07:41,740 --> 00:07:42,630
So that is why

87
00:07:43,120 --> 00:07:44,320
we have written 2 here.

88
00:07:45,250 --> 00:07:52,130
So using this, even if you change your train data to any other dataset, you need not update the shape

89
00:07:52,150 --> 00:07:57,500
for this neural network; it will automatically get updated.

90
00:07:57,580 --> 00:08:01,320
‫The second part is the output layer.

91
00:08:01,390 --> 00:08:09,520
In this layer we first include the input layer in this output layer, which is the same as the input layer

92
00:08:09,550 --> 00:08:12,290
‫that we created earlier.

93
00:08:13,030 --> 00:08:21,160
This is important because this creates the connection between the input and output layer. If we do not

94
00:08:21,160 --> 00:08:25,460
specify that this output layer has this input layer,

95
00:08:25,540 --> 00:08:29,460
then there will be no connection between these two.

96
00:08:29,680 --> 00:08:36,080
‫So in the output layer the first thing is always the input layer that it will take.

97
00:08:36,190 --> 00:08:44,640
Then come the hidden layers, which are specified in a way similar to the sequential API.

98
00:08:44,860 --> 00:08:53,560
In this scenario we are using two hidden layers, both with 64 neurons. The activation function for

99
00:08:53,560 --> 00:08:57,440
both of these is ReLU.

100
00:08:57,470 --> 00:09:03,730
The last layer, that is the output layer, has only one neuron, and it has no activation function because it is

101
00:09:03,730 --> 00:09:06,250
‫a regression problem.

102
00:09:06,460 --> 00:09:10,600
‫So let's run these two lines of code.

103
00:09:11,110 --> 00:09:16,610
This creates one input tensor.

104
00:09:16,610 --> 00:09:20,800
Now this will create an output tensor, which is predictions.

105
00:09:23,730 --> 00:09:31,270
Now, finally, we create the model using the keras_model function.

106
00:09:31,270 --> 00:09:33,370
‫It takes in two parameters.

107
00:09:33,370 --> 00:09:37,730
One is the inputs and one is the outputs.

108
00:09:37,750 --> 00:09:43,780
The input we have named as input layer, and the output has been named as predictions.

109
00:09:43,780 --> 00:09:47,420
So inputs is equal to input layer, and outputs is equal to predictions.

110
00:09:47,620 --> 00:09:50,750
And this defines the model's architecture.

111
00:09:52,250 --> 00:10:00,210
So our model's architecture is: we have 13 variables which are coming in as input. In the first hidden layer,

112
00:10:00,370 --> 00:10:03,230
we have 64 neurons. In the second layer,

113
00:10:03,250 --> 00:10:10,120
we have another 64 neurons, and in the output layer we have one output neuron.

114
00:10:10,480 --> 00:10:18,390
So when I run this, the model is created and its architecture is specified.

115
00:10:18,400 --> 00:10:23,580
Now we configure this model. In the configuration we specify the optimizer.

116
00:10:23,710 --> 00:10:32,740
We can use RMSprop or whichever optimizer you like. The loss function for regression problems is MSE,

117
00:10:33,400 --> 00:10:40,720
mean squared error. The metric is not a necessity; however, we have used mean absolute error.

118
00:10:43,890 --> 00:10:50,750
We run this, and the model is configured.

119
00:10:51,330 --> 00:10:54,910
Now we train our model using the fit function.

120
00:10:55,200 --> 00:11:02,050
Again we input the training data, training labels, epochs and the batch size.

121
00:11:13,750 --> 00:11:22,510
You can see that the model is running for 30 epochs, and the loss, that is the MSE, is steadily decreasing.

122
00:11:23,140 --> 00:11:31,890
The mean absolute error is also decreasing. The model has run for 30 epochs.

123
00:11:32,320 --> 00:11:39,810
We can check the performance of this model on our test data. This is similar to what we did when

124
00:11:39,820 --> 00:11:49,520
we were using the sequential API: we use the evaluate function and input the test data and test labels, and

125
00:11:49,520 --> 00:11:54,140
it will list out the test loss and test mean absolute error.

126
00:11:54,140 --> 00:12:02,320
We can run these two commands, and we can see that the test loss is 32.56 and the

127
00:12:02,440 --> 00:12:04,390
mean absolute error is 4.48.

128
00:12:06,670 --> 00:12:14,620
So in this video we saw how to use the functional API to build a normal model. This model could have been

129
00:12:14,620 --> 00:12:17,640
built using the sequential API as well.

130
00:12:18,700 --> 00:12:24,850
And in fact it would have been easier to use the sequential API here. But in the next lecture we will

131
00:12:24,850 --> 00:12:32,950
see, if we have a complex neural network architecture, how the functional API helps us in building that. See you

132
00:12:32,950 --> 00:12:33,640
‫in the next one.