1
00:00:00,480 --> 00:00:04,920
Now, let's take a look at the Inception network. It has a very cool name, doesn't it?

2
00:00:04,980 --> 00:00:07,200
It's also called the GoogLeNet architecture.

3
00:00:07,740 --> 00:00:11,490
But it's more popularly known as the Inception network.

4
00:00:12,120 --> 00:00:14,730
So let's take a look at what problem this solved.

5
00:00:14,820 --> 00:00:19,800
So as you've seen before, there's a lot of parameter tweaking involved in CNNs, isn't there?

6
00:00:20,310 --> 00:00:25,170
We have things like filter sizes, stride, padding, and fully connected layer sizes.

7
00:00:25,560 --> 00:00:29,220
So Inception aimed to solve the first part of this problem.

8
00:00:29,580 --> 00:00:33,930
The filter size problem, because remember, you can use three by three, five by five, seven by seven,

9
00:00:34,380 --> 00:00:36,510
depending on what type of features are in your image.

10
00:00:36,510 --> 00:00:38,160
But the thing is, you don't know.

11
00:00:38,550 --> 00:00:43,290
So Inception basically proposed a solution that looks like this.

12
00:00:43,840 --> 00:00:50,340
So, introduced in 2014 by Google, it achieved state-of-the-art performance on the ImageNet

13
00:00:50,340 --> 00:00:54,360
dataset in the ILSVRC 2014 challenge.

14
00:00:54,990 --> 00:00:56,370
So how did it do it?

15
00:00:56,580 --> 00:00:59,730
Basically, it uses something called the Inception module here.

16
00:01:00,060 --> 00:01:05,610
There's the naive version here, and there's the Inception module with dimension reductions, which is quite important

17
00:01:05,610 --> 00:01:05,970
as well.

18
00:01:06,030 --> 00:01:06,810
We'll discuss that

19
00:01:06,870 --> 00:01:07,290
now.

20
00:01:07,710 --> 00:01:11,310
So how does Inception use multiple filters?

21
00:01:11,340 --> 00:01:12,510
Well, this is what it does.

22
00:01:12,990 --> 00:01:16,710
It applies multiple filters to the input image here,

23
00:01:17,160 --> 00:01:23,790
maintaining a stride of one and same padding, just to maintain consistency, and then it produces this

24
00:01:23,790 --> 00:01:25,950
final output of feature maps here.

25
00:01:25,950 --> 00:01:30,900
It's basically a block of feature maps at this point, so you can have a different number of filters for each size

26
00:01:30,900 --> 00:01:31,470
as well.

27
00:01:31,770 --> 00:01:36,790
So we get a final output of 28 by 28 by 256 in this case here.
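
A minimal sketch of why the parallel branches can be stacked into a single 28 by 28 by 256 block. With a stride of one and same padding, every branch keeps the 28 by 28 spatial size, so the outputs can be concatenated along the channel axis. The per-branch filter counts (64, 128, 32, and 32 for the pooling branch) are assumptions chosen for illustration; the lecture only states the final 28 by 28 by 256 shape.

```python
def same_padding(kernel_size):
    """Padding that preserves spatial size for an odd kernel at stride 1."""
    return (kernel_size - 1) // 2


def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


INPUT_SIZE = 28  # spatial size from the lecture

# Assumed layout for illustration: kernel size -> number of filters.
conv_branches = {1: 64, 3: 128, 5: 32}
POOL_KERNEL = 3             # assumed 3x3 max pooling at stride 1
POOL_BRANCH_CHANNELS = 32   # assumed channel count of the pooling branch

# Every branch must produce the same spatial size to be concatenated.
branch_sizes = [
    conv_output_size(INPUT_SIZE, k, stride=1, padding=same_padding(k))
    for k in list(conv_branches) + [POOL_KERNEL]
]
assert all(size == INPUT_SIZE for size in branch_sizes)

# Concatenation along the channel axis just adds the channel counts.
total_channels = sum(conv_branches.values()) + POOL_BRANCH_CHANNELS
output_shape = (INPUT_SIZE, INPUT_SIZE, total_channels)
print(output_shape)
```

Changing the stride or dropping the padding in any one branch would change its spatial size and make the concatenation impossible, which is why the module keeps them fixed.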

28
00:01:37,710 --> 00:01:44,250
So you can see we're using multiple filter sizes here, but there's a problem with this.

29
00:01:44,850 --> 00:01:46,440
So what's the problem?

30
00:01:47,070 --> 00:01:48,300
The problem is computation.

31
00:01:48,300 --> 00:01:51,720
To do something like this is very computationally expensive.

32
00:01:51,720 --> 00:01:56,040
There are about a hundred and twenty million computations required to compute that, right?

33
00:01:56,250 --> 00:01:57,000
That explains it.

34
00:01:57,900 --> 00:01:59,630
So how do we reduce this?

35
00:01:59,640 --> 00:02:03,630
Well, by using one by one convolutions to reduce the computation cost.

36
00:02:04,200 --> 00:02:07,800
So remember, in MobileNet we used something called pointwise convolutions.

37
00:02:08,250 --> 00:02:10,110
Well, it's a similar concept here, actually.

38
00:02:10,260 --> 00:02:16,560
We use something called a bottleneck layer, implemented a bit differently, so that the one by one convolution

39
00:02:16,560 --> 00:02:22,380
block now sits between the previous layer and the conv block here.

40
00:02:22,860 --> 00:02:24,180
This is the five by five one.

41
00:02:24,600 --> 00:02:28,260
So it's now a pointwise filter in the middle that bottlenecks it.

42
00:02:28,830 --> 00:02:35,670
And this greatly reduces the calculations here because now, at this conv block, we have 2.4 million

43
00:02:35,670 --> 00:02:36,360
computations.

44
00:02:36,870 --> 00:02:42,900
And then in this second conv block, because of the output that it produces, we get ten million calculations

45
00:02:42,930 --> 00:02:44,100
or computations.

46
00:02:44,700 --> 00:02:49,230
So we shrink the representation and then increase the size afterward by doing this.

47
00:02:49,650 --> 00:02:52,320
And this gives us roughly a 10x reduction in computation cost.
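
The figures quoted above (120 million, 2.4 million, and ten million) can be reproduced with a short calculation. This is a sketch under assumptions: the 192 input channels, the 32 five by five filters, and the 16-channel bottleneck are not stated in the lecture; they are values that happen to match the quoted totals on a 28 by 28 feature map.

```python
def conv_multiplications(out_h, out_w, out_channels, kernel_size, in_channels):
    """Multiplications for one convolution layer with same-size output."""
    return out_h * out_w * out_channels * kernel_size * kernel_size * in_channels


H = W = 28          # spatial size from the lecture
IN_CHANNELS = 192   # assumed depth of the previous activation
OUT_CHANNELS = 32   # assumed number of 5x5 filters
BOTTLENECK = 16     # assumed number of 1x1 filters in the bottleneck

# Direct 5x5 convolution on the full-depth input.
direct = conv_multiplications(H, W, OUT_CHANNELS, 5, IN_CHANNELS)

# 1x1 bottleneck first, then the 5x5 convolution on the reduced depth.
reduce_step = conv_multiplications(H, W, BOTTLENECK, 1, IN_CHANNELS)
expand_step = conv_multiplications(H, W, OUT_CHANNELS, 5, BOTTLENECK)
bottlenecked = reduce_step + expand_step

print(f"direct 5x5:        {direct:,}")
print(f"1x1 bottleneck:    {reduce_step:,}")
print(f"5x5 after 1x1:     {expand_step:,}")
print(f"bottlenecked total: {bottlenecked:,}")
print(f"reduction factor:  {direct / bottlenecked:.2f}x")
```

Under these assumptions the saving comes almost entirely from the five by five filters seeing 16 input channels instead of 192; the one by one layer itself is cheap.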

48
00:02:52,860 --> 00:02:54,120
This is what it looks like here.

49
00:02:54,330 --> 00:02:57,480
You can see the input is shown as the previous activation here.

50
00:02:57,570 --> 00:03:00,180
So these are the feature maps of some previous layer.

51
00:03:00,750 --> 00:03:05,690
We have the one by one convolution here going into the five by five and three by three.

52
00:03:05,700 --> 00:03:08,460
We don't have it here because it's already a one by one convolution.

53
00:03:09,000 --> 00:03:11,280
And then we concatenate all the outputs here.

54
00:03:11,940 --> 00:03:16,980
We apply max pooling here as well and pass it to another one by one convolution.

55
00:03:17,460 --> 00:03:22,860
This gives us the full size that we expected before. That size was this size here:

56
00:03:22,980 --> 00:03:24,930
twenty-eight by twenty-eight by 256.

57
00:03:26,280 --> 00:03:27,180
And there we go.

58
00:03:27,390 --> 00:03:31,050
That's what the Inception block actually looks like.

59
00:03:31,560 --> 00:03:37,380
And the Inception network design basically has a bunch of these Inception blocks, as you can see here,

60
00:03:37,800 --> 00:03:38,310
over and over.

61
00:03:38,310 --> 00:03:39,810
This is what I've circled here in red.

62
00:03:40,500 --> 00:03:42,240
And what are these here in pink?

63
00:03:42,250 --> 00:03:49,920
These are side branches. These side branches basically apply a regularization effect to the network,

64
00:03:50,430 --> 00:03:51,950
so it helps Inception as well.

65
00:03:51,960 --> 00:03:53,010
So it's quite good.

66
00:03:53,430 --> 00:03:59,970
And there are many tweaks and variations of this network, basically tracing the evolution

67
00:03:59,970 --> 00:04:01,200
of the Inception network.

68
00:04:01,200 --> 00:04:04,560
It went through a very fast evolution, in my opinion.

69
00:04:04,560 --> 00:04:11,390
But then again, these things happen in the deep learning world: when you release a paper and

70
00:04:11,390 --> 00:04:16,230
then other authors review it and give you feedback, you can make some quick improvements,

71
00:04:16,230 --> 00:04:21,390
which is what most likely happened from Inception version one to two and three. The one we looked at in

72
00:04:21,390 --> 00:04:23,250
this slide was actually version three.

73
00:04:23,820 --> 00:04:28,710
And there's also a version four, which includes a ResNet module inside of it.

74
00:04:29,100 --> 00:04:34,740
So remember we did ResNets before, where we have the shortcut module? That was basically incorporated

75
00:04:34,740 --> 00:04:36,240
in Inception version four.

76
00:04:36,600 --> 00:04:38,640
And you can take a look at the history of it here.

77
00:04:39,360 --> 00:04:42,450
This is an interesting link by this guy in the cool of the league.

78
00:04:42,810 --> 00:04:48,390
He's one of the Google guys who was an author of the Inception publication.

79
00:04:48,930 --> 00:04:51,630
And here's a fun fact: why is it called Inception?

80
00:04:51,660 --> 00:04:58,330
Well, remember Christopher Nolan's movie with Leonardo DiCaprio called Inception, which came out

81
00:04:58,330 --> 00:04:59,160
in 2010?

82
00:05:00,500 --> 00:05:05,780
Remember, there was a fun meme, and it was actually cited in the paper, where they were talking about dreams,

83
00:05:05,780 --> 00:05:07,730
and he was saying, "We need to go deeper."

84
00:05:08,360 --> 00:05:14,090
That's basically analogous to the performance of CNNs back then, where everyone was saying, if you want

85
00:05:14,090 --> 00:05:16,790
to get better performance, you just have to make deeper networks.

86
00:05:17,180 --> 00:05:20,630
So Inception basically took that and made fun of it.

87
00:05:20,960 --> 00:05:24,950
But Inception does solve some of the problems with going deeper, just like ResNets.

88
00:05:25,340 --> 00:05:27,670
So that's why it was such a popular network, too.

89
00:05:28,550 --> 00:05:29,420
So we'll stop there.

90
00:05:29,930 --> 00:05:37,280
And next, we'll take a look at SqueezeNet, which is a high-efficiency, small-size,

91
00:05:37,280 --> 00:05:41,630
low-parameter CNN meant for embedded devices and mobile devices.

92
00:05:42,080 --> 00:05:44,240
So I'll see you in that section shortly.

93
00:05:44,270 --> 00:05:44,690
Thank you.
