1
00:00:03,760 --> 00:00:10,450
Hi, and welcome to the chapter where we take a look at how the convolution operation

2
00:00:10,780 --> 00:00:12,280
works on color images.

3
00:00:12,970 --> 00:00:14,330
So let's take a look at this.

4
00:00:14,350 --> 00:00:17,680
Let's go back to our grayscale scenario.

5
00:00:17,980 --> 00:00:19,770
Remember, we built the edge detector here.

6
00:00:19,780 --> 00:00:21,490
This is our edge detector kernel.

7
00:00:21,910 --> 00:00:26,530
This is our grayscale two-dimensional image, and this is the feature map output.

8
00:00:26,650 --> 00:00:29,650
And you can see it's a fairly simple, straightforward calculation.

9
00:00:30,190 --> 00:00:35,530
So a lot of times beginners have trouble understanding what happens when there are three dimensions here, as you can

10
00:00:35,530 --> 00:00:42,310
see in the slide now, three color components, and we do have three kernels, one for each color.

11
00:00:42,360 --> 00:00:47,770
I'll tell you why that's important, because people are often confused about why the output is two-dimensional.

12
00:00:48,280 --> 00:00:50,890
And the reason the output is still two-dimensional here,

13
00:00:50,890 --> 00:00:56,500
the feature map, is that we take the colors here and we sum them up.

14
00:00:57,100 --> 00:00:58,590
So you can see this.

15
00:00:58,590 --> 00:01:00,340
This one is not too visible here.

16
00:01:02,020 --> 00:01:08,610
So you can see one plus one plus one gives this three, and we just sum each one.

17
00:01:09,580 --> 00:01:15,940
Think of it like a hole that goes through each of these color components, and you can

18
00:01:15,940 --> 00:01:19,960
just sum through each cell here, since each one corresponds to the same location.

19
00:01:20,360 --> 00:01:21,340
It just keeps summing it.

20
00:01:21,340 --> 00:01:27,580
And that's how you get the output. Notice also that the output is still a three

21
00:01:27,580 --> 00:01:31,730
by three, two-dimensional image, not a three-dimensional image.

22
00:01:31,750 --> 00:01:33,490
That's the important point here.
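As a minimal sketch of that summing step, assuming NumPy and a made-up 5 by 5 color image of all ones (the function name and the kernel values are only illustrative, not taken from the slide):
```python
import numpy as np
def conv_single_filter(image, kernel):
    """Slide an (f, f, c) kernel over an (n, n, c) image with stride 1 and no padding."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the patch by the kernel, then sum over height, width AND color channels.
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out
image = np.ones((5, 5, 3))    # hypothetical RGB image, every value is 1
kernel = np.zeros((3, 3, 3))  # one 3x3 slice per color channel
kernel[1, 1, :] = 1.0         # a single 1 at the center of each color slice
feature_map = conv_single_filter(image, kernel)
print(feature_map.shape)      # (3, 3): two-dimensional, the channel axis is summed away
print(feature_map[0, 0])      # 3.0, that is 1 + 1 + 1 across the three channels
```
The channel axis disappears because np.sum runs over all three axes of the patch at once.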

23
00:01:34,390 --> 00:01:39,180
So what are the advantages of having color components in your filters?

24
00:01:39,730 --> 00:01:43,270
Now, imagine we're looking for a red stop sign.

25
00:01:43,630 --> 00:01:48,940
However, imagine there's a similar sign that's green, a green stop sign.

26
00:01:49,490 --> 00:01:53,950
The way we detect colors is by having these different color-coded kernels here.

27
00:01:54,310 --> 00:01:57,740
So one of these kernels could correspond to that red stop sign.

28
00:01:58,090 --> 00:02:03,490
Another one could be a filter that's meant to detect green signs, or blue signs

29
00:02:03,490 --> 00:02:04,750
and different color combinations.

30
00:02:05,080 --> 00:02:10,480
So that's why it's quite important to have a kernel for each color component.

31
00:02:11,500 --> 00:02:19,060
So this is how convolution on 3D volumes works, and we're going to take a look at how it works when

32
00:02:19,060 --> 00:02:20,530
we have multiple filters.

33
00:02:20,980 --> 00:02:24,250
So you previously saw examples where we just had one filter.

34
00:02:24,250 --> 00:02:30,340
However, in reality, as I said, CNNs tend to have many, many filters. They can have hundreds, even

35
00:02:30,340 --> 00:02:32,470
thousands, although that's a bit excessive.

36
00:02:32,770 --> 00:02:39,610
In most cases, there are between 64 and 256 filters for simple classifiers.

37
00:02:40,420 --> 00:02:41,690
So let's take a look at this.

38
00:02:41,710 --> 00:02:46,930
This is now with two filters, and you can see we have a five by five by three image here.

39
00:02:46,960 --> 00:02:51,250
Remember, these are the color components: each image has red, green and blue, RGB.

40
00:02:51,730 --> 00:02:53,410
And this here is the pixel.

41
00:02:53,680 --> 00:03:00,550
So remember, each pixel, such as this (0, 0) top-left pixel, has three color components associated

42
00:03:00,550 --> 00:03:00,910
with it.

43
00:03:01,300 --> 00:03:06,460
That's why it's a five by five by three grid, or volume, right now.

44
00:03:07,300 --> 00:03:09,100
And here we have.

45
00:03:09,100 --> 00:03:12,520
We're using two filters now instead of one, as in the previous examples.

46
00:03:12,970 --> 00:03:17,770
So it's three by three by three by two, because there are two filters now.

47
00:03:18,400 --> 00:03:22,210
And this allows us now to produce two feature maps instead of one.

48
00:03:22,720 --> 00:03:28,420
So you can see now that the number of feature maps is directly related

49
00:03:28,420 --> 00:03:30,610
to the number of kernels here.
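To make that concrete, here is a rough sketch assuming NumPy, random made-up values, and a hypothetical helper name; it stacks one two-dimensional feature map per filter:
```python
import numpy as np
def conv_multi_filter(image, kernels):
    """image: (n, n, c); kernels: (f, f, c, n_f). Stride 1, no padding."""
    n, f, n_f = image.shape[0], kernels.shape[0], kernels.shape[3]
    out = np.zeros((n - f + 1, n - f + 1, n_f))
    for k in range(n_f):                      # one feature map per filter
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                # Sum over the whole f x f x c patch for filter k.
                out[i, j, k] = np.sum(image[i:i + f, j:j + f, :] * kernels[:, :, :, k])
    return out
rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))       # 5 x 5 x 3 input volume
kernels = rng.random((3, 3, 3, 2))  # two 3 x 3 x 3 filters
maps = conv_multi_filter(image, kernels)
print(maps.shape)                   # (3, 3, 2): two feature maps from two filters
```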

50
00:03:31,780 --> 00:03:34,930
So this is how you actually calculate the feature map volume here.

51
00:03:35,410 --> 00:03:39,520
You can see n by n by c, where c is the input depth.

52
00:03:39,880 --> 00:03:41,380
So it's five by five by three.

53
00:03:41,920 --> 00:03:43,750
This one here is f by f.

54
00:03:43,750 --> 00:03:45,940
That's the size of the filter.

55
00:03:46,570 --> 00:03:50,470
So you have three by three, and we have a depth of three again here.

56
00:03:51,430 --> 00:03:56,020
And this is what the output feature map size is going to be.

57
00:03:56,530 --> 00:04:01,600
It's going to be n minus f plus one, which is three.

58
00:04:02,230 --> 00:04:10,130
Similarly, three here, by n_f, which is two. And what is n_f? n_f is the number of filters.

59
00:04:10,150 --> 00:04:16,600
So that's basically a mathematical way of just showing how these sizes are mapped back to the output

60
00:04:16,600 --> 00:04:17,320
of the feature map.

61
00:04:17,920 --> 00:04:20,280
It's not that complicated.

62
00:04:20,290 --> 00:04:23,110
It's just a way to establish some guidelines.

63
00:04:23,110 --> 00:04:28,000
So when you're actually calculating what your feature map size should be, you have a formula to work with.

64
00:04:28,210 --> 00:04:30,820
It's this formula right here at the end.
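Written as a small helper, assuming stride 1 and no padding (which is what n minus f plus one implies), the formula might look like this; the function name is just an illustration:
```python
def feature_map_shape(n, f, n_f):
    """Output volume for an n x n x c input and n_f filters of size f x f x c (stride 1, no padding)."""
    side = n - f + 1           # output height and width
    return (side, side, n_f)   # depth equals the number of filters
print(feature_map_shape(5, 3, 2))  # (3, 3, 2), matching the example in this chapter
```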

65
00:04:31,360 --> 00:04:39,220
So that concludes this chapter on how you perform convolutions on 3D images, or color images.

66
00:04:39,730 --> 00:04:44,230
Next, we'll take a look at kernel size, which is the filter size, and depth.

67
00:04:44,470 --> 00:04:45,700
So stay tuned for that.

68
00:04:45,880 --> 00:04:46,330
Thank you.