1
00:00:00,630 --> 00:00:06,750
In the last lecture, we created the structure for our multilayer perceptron model.

2
00:00:08,190 --> 00:00:12,930
Now, before training this model, we need to set up the learning process.

3
00:00:15,300 --> 00:00:18,930
‫And to do that, we will use the compile method.

4
00:00:20,810 --> 00:00:22,860
We will first give the loss function.

5
00:00:23,310 --> 00:00:25,290
‫Then we will give the optimizer.

6
00:00:26,130 --> 00:00:34,530
And then the metrics we want to calculate to judge the performance of our model. We are using

7
00:00:34,590 --> 00:00:38,270
sparse categorical cross-entropy as the loss function.

8
00:00:39,870 --> 00:00:47,640
We are using this because our y data is available in the form of labels. In our data,

9
00:00:47,760 --> 00:00:55,830
we have specific labels for ten different items, and that's why we are using sparse categorical

10
00:00:55,830 --> 00:00:56,910
cross-entropy.

11
00:00:58,570 --> 00:01:06,600
If instead we had probabilities per class in our y variable, then we would have to use categorical

12
00:01:06,660 --> 00:01:07,580
cross-entropy.

13
00:01:08,670 --> 00:01:13,620
But since we have labels, we are using sparse categorical cross-entropy.

14
00:01:16,090 --> 00:01:22,620
And suppose we had binary labels, such as yes or no, or true or false.

15
00:01:23,110 --> 00:01:27,040
In that case, we would have to use binary cross-entropy.
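
To make the difference between these three losses concrete, here is a minimal plain-Python sketch of the per-sample formulas. It is only an illustration, not how Keras implements them internally; the function names and the three-class prediction are made up for this example.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample when the target is an integer class label."""
    return -math.log(probs[label])

def categorical_crossentropy(target, probs):
    """Loss for one sample when the target is a per-class probability vector."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def binary_crossentropy(label, prob_positive):
    """Loss for one sample when the target is 0 or 1 and the model outputs P(class 1)."""
    return -(label * math.log(prob_positive)
             + (1 - label) * math.log(1 - prob_positive))

# Hypothetical softmax output of a 3-class model for a single sample.
predicted = [0.7, 0.2, 0.1]

# Same sample, target given as a label (0) or as a one-hot vector.
loss_from_label = sparse_categorical_crossentropy(0, predicted)
loss_from_one_hot = categorical_crossentropy([1.0, 0.0, 0.0], predicted)

# Two-class case: target is 1 and the model predicts P(class 1) = 0.7.
loss_binary = binary_crossentropy(1, 0.7)

print(loss_from_label, loss_from_one_hot, loss_binary)
```

The label form and the one-hot form give the same loss value here; the only difference is how the target is encoded, which is what decides between the sparse and the non-sparse loss.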

16
00:01:29,300 --> 00:01:35,300
You can get details of all these loss functions in the official Keras documentation.

17
00:01:36,530 --> 00:01:39,150
I have provided the link to that documentation.

18
00:01:39,740 --> 00:01:47,960
‫So if you open it, you will get details of all the parameters that this compile method can take.

19
00:01:49,370 --> 00:01:56,840
You can look at all the other optimizers, loss functions, and metrics in that documentation.

20
00:02:00,250 --> 00:02:09,040
Then, for the optimizer, we are using sgd. SGD simply stands for stochastic gradient descent.

21
00:02:10,960 --> 00:02:20,020
In other words, we are just telling Keras to train with the backpropagation algorithm. And for metrics, we are

22
00:02:20,020 --> 00:02:24,790
using accuracy. Since we are building a classifier,

23
00:02:25,300 --> 00:02:26,740
accuracy is a natural metric to use.

24
00:02:27,010 --> 00:02:33,670
If you are building a regression model, you can use mean squared error and so on.

25
00:02:35,720 --> 00:02:41,090
‫So basically, we have to provide this information before fitting our training data.

26
00:02:42,320 --> 00:02:46,130
So just run this command. We are giving three parameters.

27
00:02:46,940 --> 00:02:47,630
‫Compiling.
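
As a rough sketch of the compile step described above: the model below is only a hypothetical stand-in for the one built in the previous lecture (the 28x28 input shape and the layer sizes are assumptions), while the three arguments to compile are the ones named in this lecture.

```python
from tensorflow import keras

# Hypothetical stand-in for the model from the previous lecture:
# 28x28 inputs, two hidden layers, and a 10-way softmax output.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    loss="sparse_categorical_crossentropy",  # targets are integer class labels
    optimizer="sgd",                         # stochastic gradient descent
    metrics=["accuracy"],                    # reported during training and evaluation
)
```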

28
00:02:51,920 --> 00:03:00,770
So now we have compiled our model. The next step is to fit this model on the x train and y train data.

29
00:03:02,770 --> 00:03:09,020
This is the syntax for fitting the model. We are calling the .fit method.

30
00:03:09,350 --> 00:03:14,590
And then we are providing x train, y train, and the number of epochs.

31
00:03:15,410 --> 00:03:22,770
I hope you remember what epochs are; we discussed them in our theory lectures. By default, the epochs value

32
00:03:22,850 --> 00:03:23,840
is set to one.

33
00:03:25,550 --> 00:03:30,080
So if you don't specify epochs, the model trains for just one epoch.

34
00:03:32,660 --> 00:03:35,180
And then, since we have validation data as well,

35
00:03:35,540 --> 00:03:43,690
we are providing the X valid and y valid datasets that we created in our previous lectures.

36
00:03:44,540 --> 00:03:49,700
We are storing the object returned by fit in a variable, which we are calling model history.

37
00:03:51,200 --> 00:03:53,210
‫So let's run this.
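
A minimal sketch of the fit call, assuming variable names like X_train and model_history. The arrays here are random placeholders rather than the course dataset, the model is a small hypothetical stand-in, and only 3 epochs are used so that it runs quickly.

```python
import numpy as np
from tensorflow import keras

# Random placeholder arrays (not the course dataset), shaped like 28x28 images
# with integer labels 0..9, so that the call below can run on its own.
rng = np.random.default_rng(0)
X_train = rng.random((200, 28, 28)).astype("float32")
y_train = rng.integers(0, 10, 200)
X_valid = rng.random((50, 28, 28)).astype("float32")
y_valid = rng.integers(0, 10, 50)

# Hypothetical small model standing in for the one built earlier.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

# fit() returns a History object; epochs defaults to 1 when omitted.
model_history = model.fit(
    X_train, y_train,
    epochs=3,
    validation_data=(X_valid, y_valid),
)
```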

38
00:04:02,560 --> 00:04:10,350
You can see that at each epoch during training, Keras displays the number of instances processed

39
00:04:10,720 --> 00:04:11,380
so far.

40
00:04:15,730 --> 00:04:21,730
You can see there is a progress bar, and we are getting information for each epoch.

41
00:04:23,420 --> 00:04:32,710
And then we are also getting the loss, accuracy, validation loss, and validation accuracy for

42
00:04:32,800 --> 00:04:33,790
each epoch.

43
00:04:38,870 --> 00:04:42,890
So it will take some time, depending on your system configuration.

44
00:04:44,540 --> 00:04:46,670
‫So I'm just fast forwarding this.

45
00:04:59,130 --> 00:05:00,760
‫Now the training is complete.

46
00:05:01,300 --> 00:05:08,560
You can see that the loss on our training data is 0.08 and the accuracy is 0.97.

47
00:05:08,560 --> 00:05:17,920
For our validation set, the loss is 0.39 and the accuracy is 0.88.

48
00:05:19,570 --> 00:05:23,200
So if you just compare this with the first epoch

49
00:05:23,200 --> 00:05:29,700
value, the accuracy on our validation set during our first epoch was 0.89.

50
00:05:33,820 --> 00:05:41,290
Now you can see that our validation accuracy is oscillating, but our training accuracy keeps improving.

51
00:05:42,670 --> 00:05:47,740
‫So during the first epoch, the accuracy score was four nine five two.

52
00:05:48,790 --> 00:05:54,730
And after the last epoch, the accuracy score is 0.97.

53
00:05:55,960 --> 00:06:01,400
‫So in each epoch, the training accuracy is increasing little by little.

54
00:06:04,430 --> 00:06:06,000
So now we have trained our model.

55
00:06:08,210 --> 00:06:12,770
There are a few more parameters available with the fit method.

56
00:06:15,080 --> 00:06:18,050
One important parameter is class weight.

57
00:06:19,460 --> 00:06:24,110
It is useful if you have an uneven distribution of classes in your y variable.

58
00:06:25,010 --> 00:06:33,200
So suppose that, out of 60,000 records, 50,000 were shirts and the remaining

59
00:06:33,200 --> 00:06:37,820
nine categories were spread across the other 10,000 records.

60
00:06:38,570 --> 00:06:49,880
Then we have to use class weights to give larger weights to underrepresented classes and lower weights

61
00:06:50,060 --> 00:06:51,530
to overrepresented classes.
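
One way to build such weights is the inverse-frequency heuristic sketched below. This is an assumption for illustration, not necessarily the scheme you would settle on; the helper name and the even split of the remaining 10,000 records across nine classes are both hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical skewed labels matching the example in the lecture:
# class 0 ("shirts") has 50,000 records; the other 10,000 records are
# split roughly evenly across classes 1..9 (an assumption for simplicity).
labels = [0] * 50_000 + [c for c in range(1, 10) for _ in range(1_111)] + [9]

class_weight = inverse_frequency_weights(labels)
print(class_weight)
```

The resulting dictionary maps class indices to weights, which is the form the fit method accepts through its class_weight argument, for example model.fit(X_train, y_train, class_weight=class_weight).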

62
00:06:51,560 --> 00:07:00,470
In our dataset, the categories are uniformly spread and there is no uneven distribution

63
00:07:00,680 --> 00:07:01,700
of categories.

64
00:07:01,880 --> 00:07:04,440
‫That's why we are not using class weights.

65
00:07:06,140 --> 00:07:13,550
But if, in your example, some specific classes are underrepresented, then you have

66
00:07:13,550 --> 00:07:22,750
to use class weights. After fitting your model, you can access different attributes of the model

67
00:07:22,760 --> 00:07:24,020
history object.

68
00:07:24,980 --> 00:07:27,840
So you can access .params.

69
00:07:28,550 --> 00:07:33,490
This will give you the parameters that were used in training this model.

70
00:07:36,770 --> 00:07:45,570
We have another attribute, .epoch, that will give you the list of epochs. And the most important

71
00:07:45,600 --> 00:07:47,460
attribute is .history.

72
00:07:47,820 --> 00:07:50,970
So if you write your object name

73
00:07:51,390 --> 00:07:52,980
and then write .history,

74
00:07:54,990 --> 00:08:02,030
this will give you all the loss, accuracy, validation loss, and validation accuracy values in the form of a

75
00:08:02,040 --> 00:08:02,700
dictionary.

76
00:08:04,380 --> 00:08:09,960
So these are the loss values on our training set for the thirty epochs.

77
00:08:10,260 --> 00:08:15,930
Then we have the accuracy values on our training set for the thirty epochs.

78
00:08:17,460 --> 00:08:20,770
Then we have the validation loss for our thirty epochs.

79
00:08:21,330 --> 00:08:26,940
And lastly, the validation accuracy for the thirty epochs.
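
The three attributes just described can be inspected as in the sketch below. The data is random placeholder data, the tiny model is a hypothetical stand-in, and only 2 epochs are run, so each list in the dictionary has 2 entries instead of 30.

```python
import numpy as np
from tensorflow import keras

# Random placeholder data (not the course dataset): 20 features, 10 classes.
rng = np.random.default_rng(0)
X = rng.random((120, 20)).astype("float32")
y = rng.integers(0, 10, 120)

# Hypothetical small classifier, compiled as in the lecture.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

model_history = model.fit(X[:100], y[:100], epochs=2,
                          validation_data=(X[100:], y[100:]), verbose=0)

print(model_history.params)   # training settings, including the number of epochs
print(model_history.epoch)    # list of epoch indices, starting at 0

# .history is a dictionary: one list per metric, one entry per epoch.
for name, values in model_history.history.items():
    print(name, values)
```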

80
00:08:28,850 --> 00:08:33,380
So all the information that you were getting while training your model,

81
00:08:33,560 --> 00:08:38,170
you can also access by using the history attribute.

82
00:08:39,760 --> 00:08:49,970
‫You can also plot this information to visualize how our accuracy scores are changing with each epoch.

83
00:08:51,050 --> 00:08:55,820
So here I am just plotting model history dot history,

84
00:08:55,910 --> 00:08:57,690
the information that we have here.

85
00:08:58,910 --> 00:09:02,360
And then we want a grid on our plot.

86
00:09:03,080 --> 00:09:09,770
And then we want our y-axis to be between 0 and 1. If you plot this,

87
00:09:09,830 --> 00:09:14,240
you will get a graph of this kind. On top,

88
00:09:14,990 --> 00:09:18,170
we have an orange line of training accuracy.

89
00:09:18,440 --> 00:09:21,590
‫Then we have a red line of validation accuracy.

90
00:09:22,160 --> 00:09:29,450
‫Then we have a green line of validation loss and then a blue line of training loss.
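
A plot like the one described above can be produced roughly as follows. This is a sketch that assumes pandas and matplotlib are installed; the helper name plot_history and the figure size are made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_history(history_dict):
    """Plot every per-epoch metric in a Keras-style history dictionary."""
    # One column per metric, one row per epoch.
    ax = pd.DataFrame(history_dict).plot(figsize=(8, 5))
    ax.grid(True)          # show grid lines
    ax.set_ylim(0, 1)      # keep the y-axis between 0 and 1
    ax.set_xlabel("epoch")
    return ax
```

You would call it as plot_history(model_history.history) and then plt.show(). Note that with the y-axis limited to the range 0 to 1, any loss value above 1 falls outside the visible area.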

91
00:09:31,120 --> 00:09:39,470
You can see that with each epoch, the training accuracy and the validation accuracy are increasing and

92
00:09:39,770 --> 00:09:41,360
the loss is decreasing.

93
00:09:42,560 --> 00:09:51,140
You can also tell that the model has not converged yet, as the validation accuracy is still going up and

94
00:09:51,140 --> 00:09:53,660
the validation loss is still going down.

95
00:09:55,640 --> 00:10:00,380
‫So for our next try, we should run it for some more epochs.

96
00:10:02,690 --> 00:10:04,880
And if you call the fit method again,

97
00:10:06,410 --> 00:10:10,040
Keras will continue to train this model from where you left off.

98
00:10:11,120 --> 00:10:15,160
So if you just run this code again,

99
00:10:15,950 --> 00:10:21,740
Keras will train this model for 30 more epochs, and the graph will continue from here.
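
Here is a small sketch of what calling fit twice looks like. It uses random placeholder data, a hypothetical tiny model, and 2 epochs per call instead of 30. The optimizer's step counter is used only as a simple way to see that the second call carries on from the first rather than starting over.

```python
import numpy as np
from tensorflow import keras

# Random placeholder data (not the course dataset).
rng = np.random.default_rng(0)
X = rng.random((100, 20)).astype("float32")
y = rng.integers(0, 10, 100)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

# First round of training.
first = model.fit(X, y, epochs=2, verbose=0)
steps_after_first = int(model.optimizer.iterations.numpy())

# Calling fit again keeps the current weights and keeps updating them.
second = model.fit(X, y, epochs=2, verbose=0)
steps_after_second = int(model.optimizer.iterations.numpy())

print(steps_after_first, steps_after_second)
```

Each call returns its own History object covering only that call, so to draw one continuous curve you would need to join the lists from both objects yourself.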

100
00:10:23,240 --> 00:10:28,260
So try running it for 30 more epochs. In the next video,

101
00:10:28,670 --> 00:10:33,110
we will learn how to predict values using this model.

102
00:10:33,710 --> 00:10:34,130
‫Thank you.