1
00:00:00,400 --> 00:00:08,550
Now we have a fairly good idea about our X and y variables. Our X variable is present in the form of

2
00:00:08,580 --> 00:00:20,900
2D arrays of 28 by 28 pixel intensities, where each individual pixel intensity lies between 0 and 255.

3
00:00:21,420 --> 00:00:29,730
And since we are going to use gradient descent to train our model, we need to normalize these pixel

4
00:00:29,840 --> 00:00:41,650
intensities. By normalizing, I mean we have to restrict these pixel intensities to between 0 and 1. A very

5
00:00:41,650 --> 00:00:49,660
‫simple way to do this is by dividing all the pixel intensities by 255.

6
00:00:49,660 --> 00:00:58,630
So zero will remain 0, and 255, which stands for a completely white pixel, becomes 1.

7
00:00:58,870 --> 00:00:59,890
‫And so on.

8
00:01:01,640 --> 00:01:08,240
So to normalize, we can just divide our X train full dataset by 255.

9
00:01:08,240 --> 00:01:12,590
‫And similarly we have to normalize our test data set as well.

10
00:01:12,590 --> 00:01:22,960
So for the test set also we are dividing all the pixel intensities by 255. This normalization is different from

11
00:01:23,350 --> 00:01:31,800
‫the normalization we generally do for machine learning algorithms since here we know that all these

12
00:01:31,800 --> 00:01:36,430
‫values are on an absolute scale of 0 to 255.

13
00:01:36,480 --> 00:01:43,970
We can directly divide by 255, but for general machine learning datasets we don't know the absolute

14
00:01:43,980 --> 00:01:44,370
‫scale.

15
00:01:44,970 --> 00:01:54,870
So we generally subtract the mean from these numbers and divide by their standard deviation. But that

16
00:01:54,870 --> 00:01:56,820
‫process is not needed here.

17
00:01:56,910 --> 00:02:05,190
Since we know that the pixel intensities lie between 0 and 255, here we can directly divide by

18
00:02:05,250 --> 00:02:06,620
‫255.
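
The contrast between the two kinds of normalization can be sketched as follows. This is only an illustration on synthetic data: the array `images` and its shape are assumptions standing in for the real pixel data, not the lecture's notebook.

```python
import numpy as np

# Synthetic stand-in for image data (an assumption, not the lecture's dataset):
# 4 "images" of 28 by 28 integer intensities in the range 0 to 255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Known absolute scale (0 to 255): divide by the largest possible value.
scaled = images / 255.0

# Unknown scale: standardize using the mean and standard deviation of the data.
standardized = (images - images.mean()) / images.std()

print(scaled.min(), scaled.max())
print(standardized.mean(), standardized.std())
```

Dividing by 255 keeps every value between 0 and 1, while standardizing gives values centered on 0 with unit standard deviation and no fixed bounds.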

19
00:02:07,050 --> 00:02:11,580
‫And one thing you can notice is that we are not dividing it by 255.

20
00:02:11,610 --> 00:02:15,370
We are dividing it by 255.0.

21
00:02:15,720 --> 00:02:22,110
That's because we want the final output in the form of floating-point numbers between 0 and 1.

22
00:02:22,180 --> 00:02:26,440
If we divide by just the integer value 255,

23
00:02:26,490 --> 00:02:33,630
since the intensities are integer values, there might be some cases with some Python versions where

24
00:02:33,630 --> 00:02:40,850
we get the output as an integer. Since we want the whole grayscale range between 0 and 1,

25
00:02:40,950 --> 00:02:46,350
we have to use 255.0. With recent Python versions

26
00:02:46,350 --> 00:02:52,770
you don't have to do this, but to make sure that the code is compatible with all other Python versions,

27
00:02:54,150 --> 00:03:00,680
it's better to do it with a floating-point number so that the final output is in the form of floating-point numbers

28
00:03:00,680 --> 00:03:05,470
‫between 0 and 1.

29
00:03:05,610 --> 00:03:16,930
Just run this. We're calling our normalized datasets X_train_n and

30
00:03:16,980 --> 00:03:18,180
X_test_n.
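
A minimal sketch of this normalization step, assuming the raw data are NumPy arrays of unsigned 8-bit integers. The names `X_train_full` and `X_test`, the array sizes, and the random contents are assumptions made for illustration; only `X_train_n` and `X_test_n` are named in the lecture.

```python
import numpy as np

# Hypothetical stand-ins for the loaded data (names, sizes and contents are assumptions).
rng = np.random.default_rng(42)
X_train_full = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
X_test = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)

# Dividing by the float 255.0 produces floating-point values between 0 and 1.
X_train_n = X_train_full / 255.0
X_test_n = X_test / 255.0

# For contrast: integer (floor) division maps every value below 255 to 0,
# which is the loss of grayscale detail that the float divisor avoids.
floored = X_train_full // 255

print(X_train_n.dtype, X_train_n.min(), X_train_n.max())
print(np.unique(floored))
```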

31
00:03:21,620 --> 00:03:30,830
As I told you earlier, our training dataset has 60000 observations and our test dataset has another

32
00:03:30,830 --> 00:03:33,530
‫10000 observations.

33
00:03:33,530 --> 00:03:41,810
We will further divide our training dataset into training and validation sets. We will use the first

34
00:03:41,810 --> 00:03:45,790
5000 observations as our validation set.

35
00:03:46,100 --> 00:03:50,230
And the next 55000 as our training dataset.

36
00:03:51,840 --> 00:03:55,680
So to do that, we can just do this.

37
00:03:55,800 --> 00:04:04,890
Using these simple operations, we are saving observations 0 to 5000 into X validation.

38
00:04:05,340 --> 00:04:12,230
And from 5001 to 60000 into X train.

39
00:04:13,260 --> 00:04:20,820
Similarly, we have to do this for our y dataset also. We are saving the first 5000 observations into

40
00:04:20,820 --> 00:04:22,540
y validation.

41
00:04:22,860 --> 00:04:27,040
And the next 55000 observations into y train.

42
00:04:28,170 --> 00:04:30,860
And our X test will remain the same.

43
00:04:30,900 --> 00:04:42,050
So we are just saving our normalized data into X test. So just run this.
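
The split described above can be sketched with plain array slicing. This is a scaled-down illustration under stated assumptions: the arrays are random stand-ins, `n_valid` is 5 instead of the lecture's 5000, and the names `X_valid`, `y_valid`, `y_train_full` and `n_valid` are hypothetical.

```python
import numpy as np

# Small hypothetical stand-ins; the lecture's training data has 60000 rows.
rng = np.random.default_rng(0)
X_train_n = rng.random((60, 28, 28))          # already normalized features
y_train_full = rng.integers(0, 10, size=60)   # matching labels
X_test_n = rng.random((10, 28, 28))

n_valid = 5  # the lecture uses the first 5000 observations

# The first n_valid observations form the validation set, the rest the training set.
X_valid, X_train = X_train_n[:n_valid], X_train_n[n_valid:]
y_valid, y_train = y_train_full[:n_valid], y_train_full[n_valid:]

# The test set is not split; it is only the normalized data under a shorter name.
X_test = X_test_n

print(X_valid.shape, X_train.shape, X_test.shape)
```

Slicing features and labels with the same index keeps each image paired with its label.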

44
00:04:42,080 --> 00:04:44,790
‫Now we have three datasets.

45
00:04:44,810 --> 00:04:47,720
‫First is the validation set of 5000.

46
00:04:47,900 --> 00:04:51,140
‫Then the training set of fifty five thousand.

47
00:04:51,830 --> 00:04:57,320
And then another dataset of 10000 observations in our test dataset.

48
00:05:00,810 --> 00:05:04,630
We will be using the training dataset to train our model.

49
00:05:04,770 --> 00:05:09,770
‫We will be using validation set to optimize the performance of our model.

50
00:05:09,900 --> 00:05:18,940
And then, after tuning all the hyperparameters, we will be using the test dataset to evaluate the performance

51
00:05:18,940 --> 00:05:23,370
of our model. To view the values of this dataset,

52
00:05:23,410 --> 00:05:26,380
you can just call the data.

53
00:05:29,720 --> 00:05:34,470
‫You can see now the values are between 0 and 1.

54
00:05:34,550 --> 00:05:37,240
Just look at the first one.

55
00:05:41,910 --> 00:05:50,580
Here you can see there are some values which are between 0 and 1, and now our data is normalized. In the

56
00:05:50,580 --> 00:05:58,840
next lecture we'll look at different methods that are available to create a neural network using Keras.

57
00:05:58,890 --> 00:05:59,250
‫Thank you.