1
00:00:00,830 --> 00:00:05,260
‫Here is a summary table of a classification neural network architecture.

2
00:00:06,420 --> 00:00:12,030
‫In the second table, you can see I have put four columns. The first column in both the first table and

3
00:00:12,030 --> 00:00:14,110
‫the second table is for hyperparameters.

4
00:00:15,090 --> 00:00:19,370
‫These are the values that we have to set prior to training our model.

5
00:00:20,730 --> 00:00:28,220
‫For example, how many layers our neural network will have is something we have to decide beforehand.

6
00:00:29,520 --> 00:00:35,520
‫So the common classification neural network hyperparameters are listed in the first column of tables

7
00:00:35,520 --> 00:00:36,300
‫one and two.

8
00:00:39,130 --> 00:00:46,960
‫The second, third and fourth columns in the second table are for three classification scenarios. The first is

9
00:00:47,020 --> 00:00:56,090
‫binary classification, which is classifying into two classes, like marking an email as spam or not spam.

10
00:00:57,820 --> 00:01:03,880
‫Second is multilabel binary classification, which means there are multiple binary variables.

11
00:01:04,330 --> 00:01:08,590
‫For example, the first variable is whether an email is spam or not.

12
00:01:08,980 --> 00:01:13,180
‫And the second variable could be whether an email is important or not.

13
00:01:15,550 --> 00:01:20,500
‫The third one is multiclass classification, which we also discussed in the last lecture.

14
00:01:20,530 --> 00:01:26,720
‫If we have four classes, such as trousers, shirts, socks and ties,

15
00:01:27,700 --> 00:01:31,330
‫this scenario falls under multiclass classification.
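
A minimal sketch of how the targets for these three scenarios might be encoded, assuming a one-hot encoding for the multiclass case. The variable names, the label order and the values are hypothetical, made up only for illustration:

```python
# Hypothetical targets for the three classification scenarios.

# Binary: one 0/1 value per email (1 = spam, 0 = not spam).
binary_target = 1

# Multilabel binary: one 0/1 value per label; several can be 1 at once.
# Label order assumed here: [spam, important].
multilabel_target = [1, 0]

# Multiclass: exactly one class is correct, encoded one-hot.
classes = ["trousers", "shirts", "socks", "ties"]

def one_hot(label, classes):
    """Return a list with a 1 at the position of `label` and 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]

multiclass_target = one_hot("socks", classes)
print(multiclass_target)  # [0, 0, 1, 0]
```

The difference that matters later: multilabel targets can contain several 1s, while a one-hot multiclass target always contains exactly one.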

16
00:01:34,570 --> 00:01:41,320
‫Now, let's see what values of hyperparameters we usually use for these three types of classification

17
00:01:41,320 --> 00:01:41,950
‫scenarios.

18
00:01:45,110 --> 00:01:48,400
‫The first hyperparameter is the number of input neurons.

19
00:01:49,540 --> 00:01:55,450
‫The number of input neurons is always equal to the number of input features.

20
00:01:56,500 --> 00:02:01,750
‫So if you have 16 input variables, you will have 16 input neurons.

21
00:02:02,980 --> 00:02:05,760
‫So you'll always have one neuron

22
00:02:05,860 --> 00:02:06,970
‫per input feature.
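
As a rough illustration, assuming a made-up dataset with 16 features and a hypothetical first hidden layer of 8 neurons, the number of input neurons fixes the shape of the first weight matrix:

```python
import random

random.seed(0)

# Made-up dataset: 3 examples, each with 16 input features.
X = [[random.random() for _ in range(16)] for _ in range(3)]

n_inputs = len(X[0])   # one input neuron per input feature
n_hidden = 8           # hypothetical size of the first hidden layer

# Every hidden neuron receives one weight from each input neuron, so the
# first weight matrix has n_inputs rows and n_hidden columns.
W1 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_inputs)]

print(n_inputs, len(W1), len(W1[0]))  # 16 16 8
```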

23
00:02:09,040 --> 00:02:13,280
‫The second hyperparameter is how many hidden layers we want in our network.

24
00:02:14,270 --> 00:02:16,120
‫Ideally, this depends on the problem.

25
00:02:16,660 --> 00:02:24,340
‫But typically we keep the number of hidden layers between one and five. Keeping more than five hidden layers

26
00:02:24,340 --> 00:02:28,780
‫mostly just increases the computational effort for our system.
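
To see why extra hidden layers mainly add computation, here is a sketch that counts the weights and biases of a fully connected network. The sizes (16 inputs, 32 neurons per hidden layer, 1 output) are assumptions chosen only for illustration:

```python
def count_parameters(layer_sizes):
    """Count weights plus biases in a fully connected network.

    layer_sizes lists the neurons per layer: [inputs, hidden..., outputs].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + one bias per neuron
    return total

# Hypothetical network: 16 inputs, 32 neurons per hidden layer, 1 output.
for n_hidden_layers in (1, 3, 5, 8):
    sizes = [16] + [32] * n_hidden_layers + [1]
    print(n_hidden_layers, count_parameters(sizes))
```

Every parameter has to be used in each forward pass and updated during training, so the work grows with each layer added.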

27
00:02:31,530 --> 00:02:33,810
‫The third hyperparameter is the hidden activation.

28
00:02:34,030 --> 00:02:39,300
‫That is the activation function that we put on the neurons in the hidden layers.

29
00:02:40,870 --> 00:02:42,420
‫This is usually ReLU.

30
00:02:43,210 --> 00:02:44,890
‫We discussed the rectified linear unit earlier.

31
00:02:45,700 --> 00:02:50,140
‫It is a very common function used for hidden layer activation.

32
00:02:51,130 --> 00:02:57,490
‫As I told you earlier, we use ReLU because it is very fast to compute.
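
A minimal sketch of ReLU. It needs only a single comparison per neuron and no exponentials, which is one reason it is cheap to compute:

```python
def relu(z):
    """Rectified linear unit: passes positive inputs through, zeroes the rest."""
    return max(0.0, z)

print([relu(z) for z in (-2.0, -0.5, 0.0, 1.5, 3.0)])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```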

33
00:02:59,650 --> 00:03:04,600
‫The other three hyperparameters vary with the type of classification that we are doing.

34
00:03:06,280 --> 00:03:10,880
‫So the number of output neurons in binary classification is one.

35
00:03:11,600 --> 00:03:15,820
‫But in multilabel binary classification, it is one per label.

36
00:03:17,080 --> 00:03:24,730
‫For example, if we are classifying an email as spam or not spam and the other label is important or

37
00:03:24,730 --> 00:03:30,490
‫not important, we will need two neurons in the output layer.

38
00:03:31,060 --> 00:03:33,860
‫One neuron would tell us whether it is spam or not spam.

39
00:03:34,360 --> 00:03:42,160
‫And the other neuron will tell us whether it is important or not important. For multiclass classification,

40
00:03:42,640 --> 00:03:45,640
‫we have one output neuron per class.

41
00:03:47,500 --> 00:03:55,510
‫So for example, if we are classifying images into shirts, trousers, socks and ties, we will have

42
00:03:55,720 --> 00:03:58,820
‫four different output neurons, one for each of these classes.
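
The output-layer sizes described above can be summarized as a small helper. The function name and its parameters are hypothetical, written only to restate the rule for each scenario:

```python
def n_output_neurons(task, n_labels=1, n_classes=2):
    """Typical output-layer size for each classification scenario (hypothetical helper)."""
    if task == "binary":
        return 1          # a single probability, e.g. P(spam)
    if task == "multilabel":
        return n_labels   # one neuron per binary label
    if task == "multiclass":
        return n_classes  # one neuron per class
    raise ValueError(f"unknown task: {task}")

print(n_output_neurons("binary"))                    # 1
print(n_output_neurons("multilabel", n_labels=2))    # 2  (spam, important)
print(n_output_neurons("multiclass", n_classes=4))   # 4  (shirts, trousers, socks, ties)
```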

43
00:04:00,070 --> 00:04:06,850
‫And we will put a softmax activation on top of it to get the probability of each class.
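
A minimal sketch of softmax, which turns the raw scores of the four output neurons into probabilities that sum to 1. The scores are made up for illustration; subtracting the maximum score first is a standard trick to avoid overflow and does not change the result:

```python
import math

def softmax(scores):
    """Turn raw output-neuron scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["shirts", "trousers", "socks", "ties"]
scores = [2.0, 1.0, 0.1, -1.0]               # made-up raw outputs of the 4 neurons
probs = softmax(scores)
print(dict(zip(classes, (round(p, 3) for p in probs))))
```

The class with the highest score always gets the highest probability, so the predicted class is simply the position of the largest value.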

44
00:04:08,440 --> 00:04:15,850
‫The next hyperparameter is the output layer activation. In binary classification, the logistic or sigmoid function

45
00:04:15,850 --> 00:04:16,570
‫works very well.

46
00:04:17,680 --> 00:04:20,230
‫You can also use a step function.

47
00:04:21,290 --> 00:04:25,990
‫But as we have discussed, the logistic function performs much better than a step function.

48
00:04:26,860 --> 00:04:32,650
‫So for binary classification and multilabel binary classification, we use the sigmoid function.
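
A minimal sketch comparing the sigmoid with a step function. One common explanation for why the sigmoid works better is that it is smooth, so gradient-based training gets a useful signal, while the step function's gradient is zero almost everywhere. The raw scores and the 0.5 threshold below are assumptions for illustration:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for very negative z
    return ez / (1.0 + ez)

def step(z):
    """Step function: a hard 0/1 output whose gradient is zero almost everywhere."""
    return 1.0 if z >= 0 else 0.0

# Multilabel case: each output neuron gets its own independent sigmoid.
raw_outputs = [1.2, -0.7]                 # hypothetical scores for [spam, important]
probs = [sigmoid(z) for z in raw_outputs]
labels = [int(p >= 0.5) for p in probs]   # threshold at 0.5 -> [1, 0]
```

Unlike softmax, the sigmoid outputs are independent, so in the multilabel case an email can be both spam and important.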

49
00:04:33,730 --> 00:04:39,400
‫But in multiclass classification, instead of the sigmoid function, we apply softmax activation to

50
00:04:39,670 --> 00:04:43,750
‫the output layer. The last

51
00:04:43,750 --> 00:04:45,730
‫hyperparameter is the loss function.

52
00:04:47,220 --> 00:04:52,090
‫We will be using cross-entropy as the loss function for all types of classification.
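
In practice, cross-entropy comes in two common forms: binary cross-entropy for sigmoid outputs (binary and multilabel) and categorical cross-entropy for softmax outputs (multiclass). A minimal sketch of both; the small `eps` clamp that avoids `log(0)` and the choice to average over labels in the multilabel case are common conventions, assumed here rather than taken from the lecture:

```python
import math

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one sigmoid output: y_true is 0 or 1, p is the predicted P(y = 1)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for a softmax output: one_hot marks the true class."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(one_hot, probs))

# Multilabel: average the binary loss over the labels ([spam, important]).
y = [1, 0]
p = [0.8, 0.3]
multilabel_loss = sum(binary_cross_entropy(t, q) for t, q in zip(y, p)) / len(y)

# Multiclass: only the probability assigned to the true class matters.
multiclass_loss = categorical_cross_entropy([0, 1, 0, 0], [0.2, 0.7, 0.05, 0.05])
```

Both forms punish confident wrong predictions heavily: predicting 0.01 for a true label of 1 costs far more than predicting 0.6.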

53
00:04:53,770 --> 00:05:00,220
‫So these are the hyperparameters that you have to specify when you are running a neural network model

54
00:05:00,610 --> 00:05:01,420
‫in software.

55
00:05:02,620 --> 00:05:05,230
‫The values that are given here are typical values.

56
00:05:05,590 --> 00:05:11,770
‫That is, these are commonly used values, but it is not a hard and fast rule to use these values

57
00:05:11,830 --> 00:05:18,370
‫only. You can customize your neural network by using other hyperparameter values.

58
00:05:23,070 --> 00:05:30,330
‫Next is the summary table of the regression neural network architecture. Here, on the left

59
00:05:30,360 --> 00:05:35,670
‫we have the hyperparameters, and on the right we have the typical values that we use for these hyperparameters.

60
00:05:37,970 --> 00:05:40,350
‫The first one is the number of input neurons.

61
00:05:40,770 --> 00:05:48,720
‫Again, it is one per input feature. Next is the number of hidden layers, which depends on the problem.

62
00:05:48,870 --> 00:05:51,910
‫But usually we keep one to five hidden layers.

63
00:05:54,120 --> 00:05:58,630
‫Then comes the number of neurons per hidden layer. Again, this depends on the problem.

64
00:05:59,100 --> 00:06:07,710
‫But typically we use 10 to 100 neurons per hidden layer. Then come the output neurons.

65
00:06:09,540 --> 00:06:12,960
‫If we are predicting only one thing, we need only one output neuron.

66
00:06:13,620 --> 00:06:17,960
‫If we are predicting multiple things, we need one output neuron

67
00:06:18,390 --> 00:06:21,570
‫for each thing that we want to predict.

68
00:06:22,770 --> 00:06:28,240
‫For example, if you are predicting house price, that requires only one output neuron.

69
00:06:29,550 --> 00:06:36,120
‫On the other hand, if you are predicting the length and width of a flower's petal from the image

70
00:06:36,120 --> 00:06:43,710
‫of the flower, that requires two output neurons: one for the length of the petal and one for its width.
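
A sketch that builds the layer sizes for the two regression examples. The helper name, the default hidden sizes (64, 32) and the feature counts (8 house features, 64 image features) are hypothetical; the helper only prints a note, not an error, when the hidden layers fall outside the typical ranges, since those ranges are guidelines:

```python
def regression_layer_sizes(n_features, n_targets, hidden=(64, 32)):
    """Build [inputs, hidden..., outputs] for a regression network (hypothetical helper)."""
    if not 1 <= len(hidden) <= 5:
        print("note: outside the typical 1 to 5 hidden layers")
    if any(not 10 <= n <= 100 for n in hidden):
        print("note: outside the typical 10 to 100 neurons per hidden layer")
    return [n_features, *hidden, n_targets]

house = regression_layer_sizes(n_features=8, n_targets=1)    # price only
petal = regression_layer_sizes(n_features=64, n_targets=2)   # length and width
print(house)  # [8, 64, 32, 1]
print(petal)  # [64, 64, 32, 2]
```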

71
00:06:46,170 --> 00:06:52,160
‫Next is the hidden activation hyperparameter, which is the activation function used in the hidden layers.

72
00:06:52,200 --> 00:07:01,410
‫The most commonly used activation function in the hidden layers is ReLU. ReLU is usually not used as the activation

73
00:07:01,410 --> 00:07:02,940
‫function in the output layer.

74
00:07:04,020 --> 00:07:10,920
‫So for a regression neural network, in the output layer we do not really need any activation function

75
00:07:11,010 --> 00:07:11,580
‫as such.

76
00:07:12,360 --> 00:07:18,870
‫If you want to impose any particular boundary condition on the output, for example, if you want

77
00:07:18,870 --> 00:07:20,760
‫the output never to be negative,

78
00:07:21,390 --> 00:07:25,000
‫then you can apply a ReLU-like function on top of it.

79
00:07:25,770 --> 00:07:31,510
‫Otherwise, there is no requirement of an activation function on the output layer.
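
A minimal sketch of a regression output layer: by default the weighted sum is used as-is (no activation), and a ReLU can optionally be applied when outputs must not be negative. The activations, weights and biases are made-up numbers for illustration:

```python
def identity(z):
    """No output activation: the raw weighted sum is the prediction."""
    return z

def relu(z):
    """Optional output activation when predictions must not be negative."""
    return max(0.0, z)

def output_layer(hidden_activations, weights, biases, activation=identity):
    """Fully connected output layer; weights[i][j] links hidden neuron i to output j."""
    outputs = []
    for j, b in enumerate(biases):
        z = sum(h * w_row[j] for h, w_row in zip(hidden_activations, weights)) + b
        outputs.append(activation(z))
    return outputs

h = [1.0, 2.0]                  # made-up hidden-layer activations
W = [[0.5, -1.0], [0.25, 0.5]]  # 2 hidden neurons x 2 output neurons
b = [0.0, -1.0]
print(output_layer(h, W, b))                   # [1.0, -1.0]
print(output_layer(h, W, b, activation=relu))  # [1.0, 0.0]
```

The second output shows the trade-off: with ReLU the negative prediction is clipped to 0, which is only appropriate when the target itself can never be negative.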

80
00:07:33,630 --> 00:07:37,950
‫The last hyperparameter is the loss function for a regression neural network.

81
00:07:38,460 --> 00:07:43,500
‫Squared error works very well as a loss function.

82
00:07:44,790 --> 00:07:46,620
‫Cross-entropy is not suitable here, because the targets are continuous values rather than classes.

83
00:07:47,190 --> 00:07:55,230
‫So we often use mean squared error, which is the mean of the squared errors that we calculate for individual

84
00:07:55,230 --> 00:07:56,160
‫training examples.
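
A minimal sketch of mean squared error: square the error for each training example, then average. The house prices (in thousands) and predictions are made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences over all training examples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [200.0, 310.0, 150.0]     # made-up house prices (thousands)
predicted = [210.0, 300.0, 150.0]  # made-up predictions
mse = mean_squared_error(actual, predicted)
print(mse)  # (100 + 100 + 0) / 3
```

Squaring makes large errors count much more than small ones and keeps every term positive, so errors in opposite directions cannot cancel out.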

85
00:07:58,020 --> 00:08:01,840
‫So these are all the hyperparameters that you need to specify

86
00:08:02,010 --> 00:08:08,640
‫while running a regression neural network in software. On the right, you'll see the typical values.

87
00:08:08,730 --> 00:08:10,290
‫These are not fixed.

88
00:08:10,920 --> 00:08:15,750
‫You can still customize your neural network by changing these hyperparameter values.

89
00:08:16,620 --> 00:08:17,160
‫That's all.

90
00:08:17,220 --> 00:08:18,780
‫See you in the practical lectures.

