1
00:00:00,440 --> 00:00:05,430
‫Now we are going to start with a complete end-to-end project. In this project,

2
00:00:06,240 --> 00:00:11,310
‫we will try to classify colored images of cats and dogs.

3
00:00:14,070 --> 00:00:21,690
‫So we take a dataset from Kaggle. Kaggle is a website where a lot of data science competitions are

4
00:00:21,690 --> 00:00:22,200
‫being held.

5
00:00:23,040 --> 00:00:29,340
‫There was a competition held in 2013 in which thousands of images of cats and dogs were given

6
00:00:30,060 --> 00:00:35,280
‫and a model was to be built to classify those images into cats and dogs.

7
00:00:37,650 --> 00:00:42,030
‫The best accuracy achieved in that competition was nearly 98 percent.

8
00:00:43,650 --> 00:00:51,480
‫We are going to use a subset of that data to build our model, and we will try to achieve over

9
00:00:51,480 --> 00:00:53,580
‫90 percent accuracy with our model.

10
00:00:57,260 --> 00:00:59,480
‫Here are some of the details of this project.

11
00:01:01,310 --> 00:01:08,720
‫This is a binary classification problem, unlike Fashion MNIST, in which there were 10 categories to

12
00:01:08,720 --> 00:01:09,350
‫be predicted.

13
00:01:10,370 --> 00:01:11,630
‫Here we have only two.

14
00:01:12,410 --> 00:01:15,980
‫Either that image is of a cat or it is of a dog.

15
00:01:18,050 --> 00:01:19,220
‫So only two classes.

16
00:01:19,400 --> 00:01:21,710
‫That is why it is a binary classification problem.
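
As a rough sketch of what "only two classes" means in practice: a binary classifier can get by with a single output score, squashed by a sigmoid into a probability, instead of the ten outputs needed for ten categories. The label encoding (0 for cat, 1 for dog) and the 0.5 threshold below are assumptions made for illustration; the lecture does not fix them.

```python
import math

# Assumed label encoding for illustration: 0 = cat, 1 = dog.
CLASS_NAMES = ["cat", "dog"]


def sigmoid(score: float) -> float:
    """Squash a raw model score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_label(score: float, threshold: float = 0.5) -> str:
    """Map a single raw score to one of the two classes."""
    return CLASS_NAMES[1] if sigmoid(score) >= threshold else CLASS_NAMES[0]


def binary_cross_entropy(y_true: int, p_dog: float) -> float:
    """Loss for one example; p_dog is the predicted probability of 'dog'."""
    eps = 1e-12  # keep the probability away from 0 and 1 to avoid log(0)
    p = min(max(p_dog, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


print(predict_label(2.0))   # a positive score leans towards "dog"
print(predict_label(-2.0))  # a negative score leans towards "cat"
```

The loss is small when the predicted probability agrees with the true label and grows as the prediction moves towards the wrong class.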

17
00:01:23,720 --> 00:01:26,300
‫Then this is a data set of coloured images.

18
00:01:27,440 --> 00:01:32,960
‫That is, we will have three channels, R, G and B, instead of only one channel,

19
00:01:33,290 --> 00:01:36,740
‫as we had in the MNIST dataset.

20
00:01:37,490 --> 00:01:40,880
‫We do not have a standard dimension for all these images.

21
00:01:42,680 --> 00:01:48,350
‫As you saw in the previous project, we were using 28 by 28 pixel images.

22
00:01:49,340 --> 00:01:53,350
‫But here our dataset does not have one standard dimension.

23
00:01:54,680 --> 00:02:01,810
‫So when we are feeding the data to our model, we will have to convert the images to one standard dimension.

24
00:02:02,630 --> 00:02:04,040
‫So that is one additional step.
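
To make that additional step concrete, here is a minimal sketch of bringing images of different sizes to one standard dimension. It uses plain Python lists and nearest-neighbour sampling so that it runs without any library; in a real project an image library would normally do the resizing. The 4 by 4 target size and the toy images are hypothetical and chosen only to keep the example small.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one (R, G, B) pixel
Image = List[List[Pixel]]      # rows of pixels


def resize_nearest(image: Image, new_height: int, new_width: int) -> Image:
    """Resize an RGB image to a fixed size with nearest-neighbour sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[row * old_height // new_height][col * old_width // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


# Two toy "images" with different dimensions.
small = [[(255, 0, 0)] * 3 for _ in range(2)]   # 2 rows x 3 columns
large = [[(0, 0, 255)] * 7 for _ in range(5)]   # 5 rows x 7 columns

TARGET = (4, 4)  # hypothetical standard size (height, width)
batch = [resize_nearest(img, *TARGET) for img in (small, large)]
for img in batch:
    print(len(img), len(img[0]), len(img[0][0]))  # height, width, channels
```

After this step every image in the batch has the same height and width and keeps its three colour channels, which is what the model input needs.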

25
00:02:07,010 --> 00:02:08,580
‫Then we are using a Kaggle dataset.

26
00:02:09,320 --> 00:02:16,070
‫If you are interested, you can go to the Kaggle website and see this cats versus dogs competition.

27
00:02:18,050 --> 00:02:19,790
‫You can also see the leaderboard there

28
00:02:20,390 --> 00:02:22,100
‫and how much accuracy people have achieved.

29
00:02:23,270 --> 00:02:26,630
‫And you can compare your model with other people's models.

30
00:02:28,550 --> 00:02:32,180
‫And the last point is we are going to use a subset of the total data.

31
00:02:32,780 --> 00:02:37,520
‫The total data had over 50000 images. In our model,

32
00:02:37,670 --> 00:02:48,080
‫we are going to use only 4000 images: 2000 for training, 1000 for the validation dataset and 1000 for

33
00:02:48,080 --> 00:02:48,500
‫testing.

34
00:02:51,290 --> 00:02:58,490
‫So using only this small part of the data, we are still going to achieve accuracies which are comparable

35
00:02:58,850 --> 00:03:01,760
‫to the other models built by people in the competition.

36
00:03:05,700 --> 00:03:07,990
‫So here is how we have structured the data.

37
00:03:10,200 --> 00:03:19,890
‫The zip file that you download from the link that we have provided has 4000 images and those images

38
00:03:20,490 --> 00:03:22,050
‫are structured in this format.

39
00:03:23,820 --> 00:03:27,060
‫So the first folder will have three folders inside of it.

40
00:03:28,920 --> 00:03:33,180
‫These three folders will be train, validation and test.

41
00:03:34,800 --> 00:03:38,760
‫The train folder will further have two folders.

42
00:03:39,420 --> 00:03:41,730
‫These folders will be cats and dogs.

43
00:03:42,480 --> 00:03:46,890
‫So class A here is cats and class B is dogs.

44
00:03:48,330 --> 00:03:51,420
‫In this folder, we will have 1000 images of cats.

45
00:03:52,380 --> 00:03:55,020
‫And in this folder, we will have 1000 images of dogs.

46
00:03:56,730 --> 00:04:03,780
‫Similarly, in the validation dataset, there'll be two folders, one containing 500 images of cats, the

47
00:04:03,780 --> 00:04:08,470
‫other containing 500 images of dogs. In the testing dataset,

48
00:04:08,820 --> 00:04:10,890
‫we'll have 1000 images.

49
00:04:11,490 --> 00:04:16,530
‫So in total, there are 4000 images. 2000 will be used for training

50
00:04:16,530 --> 00:04:21,280
‫the model, 1000 will be used for the validation set,

51
00:04:22,110 --> 00:04:27,460
‫and the last 1000 images will be used for testing the accuracy on previously unseen data.
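
A quick way to check that the unzipped data matches this structure is to count the image files under each split folder. The sketch below assumes the folder names train, validation and test and the common image extensions listed in the code; the actual names inside the zip file may differ, so treat them as placeholders.

```python
from pathlib import Path
from typing import Dict

# Assumed layout (folder names are illustrative, not confirmed by the lecture):
#   root/train/cats, root/train/dogs,
#   root/validation/cats, root/validation/dogs,
#   root/test/...
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# Counts described in the lecture: 2000 train, 1000 validation, 1000 test.
EXPECTED = {"train": 2000, "validation": 1000, "test": 1000}


def count_images(root: Path) -> Dict[str, int]:
    """Count image files under each top-level split folder of root."""
    counts = {}
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[split_dir.name] = sum(
            1
            for f in split_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_expected(counts: Dict[str, int], expected: Dict[str, int]) -> bool:
    """Return True if every expected split has exactly the expected count."""
    return all(counts.get(split) == n for split, n in expected.items())
```

Calling `count_images(Path("path/to/unzipped/folder"))` and passing the result to `check_expected(counts, EXPECTED)` would confirm the 2000 / 1000 / 1000 split, provided the folder names match the assumed ones.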

52
00:04:31,870 --> 00:04:38,320
‫So the process we are going to follow while building this project is this: first, we will be creating

53
00:04:38,440 --> 00:04:43,480
‫a CNN model with four convolutional layers.

54
00:04:44,140 --> 00:04:48,580
‫So it will have four different convolutional
‫layers paired with pooling layers.

55
00:04:50,140 --> 00:04:57,340
‫And this model will be able to achieve accuracy in the range of 70 to 75 percent.

56
00:04:57,760 --> 00:04:59,440
‫I'm talking about validation accuracy here.

57
00:05:01,300 --> 00:05:04,990
‫So this model will be able to achieve somewhere between 70 and 75 percent.
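
To get a feel for what four convolution and pooling pairs do to an image, the sketch below traces only the spatial size through the stack. The 150 pixel input, 3 by 3 kernels without padding, and 2 by 2 pooling are assumptions picked for illustration; the lecture has not yet stated the actual layer settings.

```python
from typing import List


def conv_output(size: int, kernel: int = 3) -> int:
    """Side length after a stride-1 convolution with no padding."""
    return size - kernel + 1


def pool_output(size: int, pool: int = 2) -> int:
    """Side length after non-overlapping pooling."""
    return size // pool


def trace_shapes(input_size: int, num_blocks: int = 4) -> List[int]:
    """Return the side length after each convolution + pooling block."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size = pool_output(conv_output(size))
        sizes.append(size)
    return sizes


print(trace_shapes(150))  # [74, 36, 17, 7]
```

Under these assumptions each block roughly halves the width and height, so by the fourth block a 150 by 150 image has shrunk to a 7 by 7 feature map.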

58
00:05:07,000 --> 00:05:16,450
‫Then because we have a small dataset, we can improve the performance of our model by doing data augmentation.

59
00:05:17,810 --> 00:05:24,130
‫Data augmentation is the process of creating artificial images using the small dataset that you have.

60
00:05:25,780 --> 00:05:29,020
‫So in the second step, we will augment our data

61
00:05:29,380 --> 00:05:37,360
‫and then run our model again. For example, if you have this image of a cat, you can create a new

62
00:05:37,360 --> 00:05:44,770
‫image by zooming in on a small part of this image, or you can create a new image by rotating this image

63
00:05:44,770 --> 00:05:45,460
‫of a cat.

64
00:05:47,110 --> 00:05:54,010
‫And there are many more transformations that you can do to this image to create a similar image of a

65
00:05:54,010 --> 00:05:56,230
‫cat using an existing image.

66
00:05:57,760 --> 00:06:02,920
‫So using one image, you'll be able to create multiple images just by transforming the image a little

67
00:06:02,920 --> 00:06:03,160
‫bit.

68
00:06:03,920 --> 00:06:10,060
‫Transformations include linear transformations, rotations, zooming in, zooming out, etc.
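
As a small illustration of two of these transformations, the sketch below rotates an image and zooms into its centre. It works on a tiny grayscale grid of numbers so it runs without any library; a real project would usually rely on an image library and random transformation parameters. The function names and the fixed 90 degree rotation and 2x zoom are choices made only for this example.

```python
from typing import List

Image = List[List[int]]  # a tiny grayscale image as rows of pixel values


def rotate_90(image: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def zoom_center(image: Image, factor: int = 2) -> Image:
    """Zoom in: crop the central 1/factor region, then scale it back up
    to the original size with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    crop_h, crop_w = height // factor, width // factor
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return [
        [
            image[top + row * crop_h // height][left + col * crop_w // width]
            for col in range(width)
        ]
        for row in range(height)
    ]


original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# One original image yields several training images.
augmented = [original, rotate_90(original), zoom_center(original)]
for img in augmented:
    print(img)
```

Each transformed copy keeps the same label as the original, which is how a small dataset can be stretched into a larger one.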

69
00:06:11,740 --> 00:06:19,020
‫So after you do this and you run the model again, you'll be able to achieve an accuracy over 80 percent.

70
00:06:22,260 --> 00:06:30,240
‫Lastly, we'll use one of the architectures that we have discussed previously, and we will try to implement

71
00:06:30,270 --> 00:06:39,270
‫that pre-trained architecture to classify this dogs vs. cats dataset. Using that pre-trained architecture,

72
00:06:39,810 --> 00:06:42,720
‫we will be able to achieve an accuracy over 90 percent.

73
00:06:45,600 --> 00:06:55,140
‫So after this project, you'll have an understanding of how to import images, how to run binary or multiclass

74
00:06:55,140 --> 00:07:03,480
‫classification using a CNN and how to use pre-trained architectures to solve the problem that you have with

75
00:07:03,480 --> 00:07:03,630
‫you.