﻿1
00:00:00,390 --> 00:00:08,370
Now we have a fairly good idea about our X and Y variables. Our X variable is present in the form

2
00:00:08,370 --> 00:00:18,600
of a 2D array of 28 by 28 pixel intensities, where each individual pixel intensity lies between zero

3
00:00:18,750 --> 00:00:20,130
‫and 255.

4
00:00:21,420 --> 00:00:29,730
And since we are going to use gradient descent to train our model, we need to normalize these pixel

5
00:00:29,790 --> 00:00:30,720
intensities.

6
00:00:32,010 --> 00:00:39,390
By normalizing, I mean we have to rescale these pixel intensities to lie between zero and one.

7
00:00:41,230 --> 00:00:48,240
‫A very simple way to do this is by dividing all the pixel intensities by 255.

8
00:00:49,660 --> 00:00:57,880
So zero will remain zero, and 255, which stands for a completely white pixel, becomes one.

9
00:00:58,880 --> 00:00:59,880
‫And so on.

10
00:01:01,640 --> 00:01:07,940
So to normalize, we can just divide our X_train_full dataset by 255.

11
00:01:08,210 --> 00:01:12,050
‫And similarly, we have to normalize our test data set as well.

12
00:01:12,590 --> 00:01:13,790
So for the test set also,

13
00:01:13,970 --> 00:01:18,160
we are dividing all the pixel intensities by 255.
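
The division described here can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the arrays are small random stand-ins for the real image data, and the names X_train_full, X_test, X_train_n and X_test_n follow the names spoken in the lecture.

```python
import numpy as np

# Synthetic stand-ins for the image arrays (assumed shapes, random values):
# integer pixel intensities from 0 to 255, 28 x 28 pixels per image.
rng = np.random.default_rng(0)
X_train_full = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
X_test = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)

# Dividing by 255.0 rescales every pixel to a float between 0 and 1.
X_train_n = X_train_full / 255.0
X_test_n = X_test / 255.0

# Smallest and largest values after scaling
print(X_train_n.min(), X_train_n.max())
```

The same divisor is applied to the training and the test arrays, so both end up on the same scale.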

14
00:01:20,380 --> 00:01:27,700
‫This normalization is different from the normalization we generally do for machine learning algorithms.

15
00:01:28,920 --> 00:01:35,880
Since here we know that all these values are on an absolute scale of zero to 255,

16
00:01:36,480 --> 00:01:38,690
we can directly divide them by 255.

17
00:01:39,690 --> 00:01:44,310
But for general machine learning datasets, we don't know the absolute scale.

18
00:01:44,970 --> 00:01:53,040
So we generally subtract the mean from these numbers and divide them by their standard deviation.

19
00:01:54,240 --> 00:02:01,170
But that process is not needed here, since we know that the pixel intensities lie between zero and

20
00:02:01,170 --> 00:02:01,860
255.

21
00:02:02,640 --> 00:02:05,940
So here we can directly divide them by 255.
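
To make the contrast concrete, here is a minimal sketch of both approaches. The feature values are made up for illustration, and the variable names are hypothetical; only the two formulas come from the lecture.

```python
import numpy as np

# Hypothetical tabular feature with no known absolute scale (made-up values).
feature = np.array([12.0, 15.5, 9.0, 30.0, 22.5])

# Standardization: subtract the mean, then divide by the standard deviation.
standardized = (feature - feature.mean()) / feature.std()

# Pixel intensities have a known scale of 0 to 255, so one division is enough.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
scaled = pixels / 255.0

print(standardized.mean(), standardized.std())  # approximately 0 and 1
print(scaled[0], scaled[-1])                    # 0.0 and 1.0
```

Standardization needs statistics computed from the data, while the pixel scaling only needs the known maximum of 255.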

22
00:02:07,050 --> 00:02:11,490
And one thing you can notice is that we are not dividing by 255.

23
00:02:11,610 --> 00:02:14,910
We are dividing by 255.0.

24
00:02:15,720 --> 00:02:22,950
That is because we want the final output as floating-point numbers between zero and one. If we divided

25
00:02:22,950 --> 00:02:25,830
by just the integer value 255,

26
00:02:26,490 --> 00:02:33,630
then, since the intensities are integer values, there might be some cases with older Python versions where

27
00:02:33,630 --> 00:02:39,890
we get the output as integers. Since we want the whole grayscale range between zero and one,

28
00:02:40,950 --> 00:02:46,230
we have to use 255.0. With recent Python versions,

29
00:02:46,350 --> 00:02:52,800
you don't have to do this, but to make sure that the code is compatible with all other Python versions,

30
00:02:54,080 --> 00:03:00,680
it's better to divide by a floating-point number so that the final output is a floating-point number

31
00:03:00,680 --> 00:03:01,790
between zero and one.
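
The difference between the two kinds of division can be seen on a tiny array. This is a sketch, not the lecture's code: the three pixel values are made up, and the `//` operator is used here to imitate explicitly the integer division that older Python versions could apply to integer inputs.

```python
import numpy as np

# Three made-up integer pixel intensities.
pixels = np.array([0, 100, 255], dtype=np.uint8)

# Dividing by the float 255.0 gives floating-point results between 0 and 1.
float_result = pixels / 255.0

# Explicit integer (floor) division: every value below 255 collapses to 0.
int_result = pixels // 255

print(float_result.dtype.kind)  # 'f' for floating point
print(int_result.tolist())      # [0, 0, 1]
```

With integer division, all the intermediate gray levels are lost, which is why the float divisor is the safer choice.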

32
00:03:05,610 --> 00:03:08,130
‫Just create this.

33
00:03:10,470 --> 00:03:16,970
We're calling our normalized datasets X_train_n and

34
00:03:16,980 --> 00:03:18,190
X_test_n.

35
00:03:21,620 --> 00:03:25,570
As I told you earlier, our train dataset has

36
00:03:25,580 --> 00:03:27,590
sixty thousand observations,

37
00:03:27,700 --> 00:03:32,330
and our test dataset has another 10000 observations.

38
00:03:33,530 --> 00:03:39,410
‫We will further divide our train dataset into training and validation sets.

39
00:03:40,460 --> 00:03:45,410
We will use the first 5000 observations as our validation set

40
00:03:46,100 --> 00:03:50,210
and the next 55000 as our training dataset.

41
00:03:51,810 --> 00:03:57,510
So to do that, we can just divide it using these simple operations.

42
00:03:58,430 --> 00:04:04,410
We are saving the first 5000 observations into X validation,

43
00:04:05,340 --> 00:04:09,420
and observations 5001 to 60000

44
00:04:10,380 --> 00:04:16,730
into X train. Similarly, we have to do this for our y dataset.

45
00:04:16,740 --> 00:04:22,020
There too, we are saving the first 5000 observations into y validation

46
00:04:22,860 --> 00:04:25,470
and the next 55000 observations

47
00:04:25,570 --> 00:04:26,980
into y train.

48
00:04:28,170 --> 00:04:30,720
‫And our X test will remain the same.

49
00:04:30,900 --> 00:04:35,610
So we are just saving our normalized test data into X test.
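
The split described above can be sketched with plain array slicing. The arrays below are assumptions made for illustration: they have the observation counts mentioned in the lecture but use tiny 2 x 2 placeholder images and made-up labels to keep memory use small. The names X_valid, y_valid and y_train_full are hypothetical spellings of the spoken names.

```python
import numpy as np

# Placeholder arrays with the lecture's observation counts (not real images).
X_train_n = np.zeros((60000, 2, 2), dtype=np.float32)
y_train_full = np.arange(60000) % 10   # made-up labels from 0 to 9
X_test_n = np.zeros((10000, 2, 2), dtype=np.float32)

# First 5000 observations go to validation, the remaining 55000 to training.
X_valid, X_train = X_train_n[:5000], X_train_n[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# The test set is not split; it is just the normalized test data.
X_test = X_test_n

print(len(X_valid), len(X_train), len(X_test))  # 5000 55000 10000
```

Slicing X and y with the same boundaries keeps each image aligned with its label.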

50
00:04:38,310 --> 00:04:40,390
‫So just run this.

51
00:04:42,100 --> 00:04:43,550
‫Now we have three datasets.

52
00:04:44,830 --> 00:04:46,980
‫First is the validation set of 5000.

53
00:04:47,920 --> 00:04:51,000
‫Then the training set of 55000.

54
00:04:51,820 --> 00:04:57,280
And then another 10000 observations in our test dataset.

55
00:05:00,810 --> 00:05:03,810
We will be using the training dataset to train our model.

56
00:05:04,770 --> 00:05:09,100
We will be using the validation set to optimize the performance of our model.

57
00:05:09,900 --> 00:05:15,950
And then, after tuning all the hyperparameters, we will be using the test dataset

58
00:05:17,170 --> 00:05:19,540
to evaluate the performance of our model.

59
00:05:20,500 --> 00:05:26,460
‫To view the values of this dataset, you can just call the dataset.

60
00:05:29,720 --> 00:05:33,710
‫You can see now the values are between zero and one.

61
00:05:34,550 --> 00:05:37,250
‫Just look at the first value.

62
00:05:41,880 --> 00:05:45,570
‫Here you can see there are some values which are between zero and one.

63
00:05:46,440 --> 00:05:51,360
And now our data is normalized. In the next lecture,

64
00:05:52,140 --> 00:05:58,530
we'll look at the different methods that are available to create a neural network using Keras.

65
00:05:58,890 --> 00:05:59,280
‫Thank you.

