1
00:00:00,060 --> 00:00:02,940
Hi, and welcome to the lecture on Efficient Detect.

2
00:00:03,360 --> 00:00:06,150
This is an object detection model from Google.

3
00:00:06,480 --> 00:00:07,590
So let's get started.

4
00:00:08,160 --> 00:00:15,810
So recall that EfficientNet was an effective scaling methodology for coming up with a very good model

5
00:00:15,810 --> 00:00:16,860
for different CNNs.

6
00:00:16,860 --> 00:00:24,270
Depending on the computational requirements, we had width scaling, depth scaling, resolution scaling, as well

7
00:00:24,270 --> 00:00:25,410
as compound scaling.

8
00:00:26,040 --> 00:00:32,760
And a combination of these methods produced a number of EfficientNet networks that provided very good

9
00:00:32,760 --> 00:00:35,640
accuracy, as well as very good performance.

10
00:00:35,790 --> 00:00:39,870
So they were computationally quite fast, quick to train and quick to inference.

11
00:00:40,920 --> 00:00:46,470
So what the researchers at Google, who are the ones behind Efficient Detect, what they wanted

12
00:00:46,470 --> 00:00:53,520
to see was: was it possible to build a scalable object detection architecture that offered both high

13
00:00:53,520 --> 00:00:58,380
accuracy and better efficiency across a wide variety of performance constraints?

14
00:00:59,010 --> 00:01:02,880
So firstly, I know it's EfficientDet, that's the official name.

15
00:01:02,890 --> 00:01:08,360
However, it seems quite morbid, so I'm going to say Efficient Detect for this presentation.

16
00:01:08,370 --> 00:01:09,090
I hope you don't mind.

17
00:01:09,690 --> 00:01:16,830
So Efficient Detect is also a single-shot detector, just like SSD or RetinaNet, which we

18
00:01:16,830 --> 00:01:19,020
haven't discussed much, and YOLO.

19
00:01:19,830 --> 00:01:25,800
It utilizes philosophies from EfficientNet and has made several improvements, especially to model

20
00:01:25,800 --> 00:01:29,400
scaling, multi-scale features and feature fusion.

21
00:01:31,200 --> 00:01:38,250
So Efficient Detect solves two main problems that the researchers at Google sought to solve.

22
00:01:38,250 --> 00:01:42,480
The first one they wanted to solve was efficient multi-scale feature fusion.

23
00:01:43,200 --> 00:01:50,280
So in most typical object detectors, feature pyramid networks have become probably the most popular

24
00:01:50,280 --> 00:01:53,040
method of fusing multi-scale features together.

25
00:01:53,670 --> 00:01:56,610
However, they simply sum them up without any distinction.

26
00:01:56,970 --> 00:02:01,800
And we all know that not all the features contribute equally to the output.

27
00:02:02,310 --> 00:02:03,840
So we needed a better strategy.

28
00:02:04,560 --> 00:02:11,940
Also, the other problem they wanted to solve was model scaling. Most detectors relied

29
00:02:11,940 --> 00:02:16,650
on improving the backbone for improving the accuracy, which in my experience works quite well.

30
00:02:16,710 --> 00:02:22,200
However, the researchers noted that scaling up the feature network and the box/class prediction networks

31
00:02:22,680 --> 00:02:26,850
are also quite important because you need to consider the overall network model.

32
00:02:27,330 --> 00:02:32,370
So it also was quite important for accuracy and efficiency concerns.

33
00:02:33,150 --> 00:02:37,830
So Efficient Detect solved this by using something called a BiFPN.

34
00:02:38,250 --> 00:02:40,260
That's a bidirectional feature pyramid network.

35
00:02:40,890 --> 00:02:48,000
And basically it offered multiscale fusion, which aggregated features at several different resolutions.

36
00:02:48,510 --> 00:02:51,840
And you can see this is the feature pyramid network, the architecture for it.

37
00:02:52,470 --> 00:02:53,830
This is another one, PANet.

38
00:02:54,300 --> 00:03:00,270
This is NAS-FPN, which learns this on the fly, and this is the BiFPN here.

39
00:03:01,920 --> 00:03:02,760
So you can see.

40
00:03:02,760 --> 00:03:04,290
Let's talk a bit more about this.

41
00:03:04,500 --> 00:03:05,790
So why BiFPN?

42
00:03:06,420 --> 00:03:12,980
One of the feature pyramid network's limitations was that it was limited to a top-down flow of information,

43
00:03:12,990 --> 00:03:14,850
and you can see that here. Now,

44
00:03:14,850 --> 00:03:21,870
PANet did it, but did both the top-down and a bottom-up aggregation, a path aggregation network to be precise.

45
00:03:22,410 --> 00:03:28,960
NAS-FPN, which is also the predecessor to Efficient Detect, basically did a network, a grid

46
00:03:28,980 --> 00:03:34,890
search to find an irregular feature network topology and then repeatedly applied the same block like

47
00:03:34,890 --> 00:03:35,430
this here.

48
00:03:36,210 --> 00:03:42,780
However, in Efficient Detect they used the BiFPN network to great success,

49
00:03:42,780 --> 00:03:45,270
which improved accuracy and got better efficiency.

50
00:03:46,110 --> 00:03:52,140
So this is basically an overview of the full Efficient Detect architecture.

51
00:03:52,650 --> 00:03:57,150
And you can see this is basically two different features that have been aggregated here to the BiFPN

52
00:03:57,180 --> 00:03:57,540
layer.

53
00:03:57,540 --> 00:04:01,740
This is one block of it and you have multiple concurrent blocks as well.

54
00:04:02,310 --> 00:04:06,930
And then it goes to the conv nets that do the box prediction as well as the class prediction.

55
00:04:07,680 --> 00:04:10,800
And let's take a look at some of the architecture designs.

56
00:04:10,800 --> 00:04:19,050
You can see the number of layers, number of channels, the backbone used, EfficientNet. They use

57
00:04:19,050 --> 00:04:20,940
D0 to D7 to call it.

58
00:04:21,180 --> 00:04:23,130
So it'd be B0 to B7, to B6.

59
00:04:23,700 --> 00:04:30,120
But in Efficient Detect they have to call it D as well, for detection, so you can see that there.

60
00:04:31,320 --> 00:04:35,940
And now let's take a look at the performance, and you can see they don't have YOLO here, unfortunately,

61
00:04:35,940 --> 00:04:36,600
to compare it.

62
00:04:37,110 --> 00:04:43,830
But Efficient Detect is actually quite good, probably second best to YOLO in my experience, and you

63
00:04:43,830 --> 00:04:51,000
can see it was definitely getting extremely good performance with very low FLOPs, even up to D3,

64
00:04:51,210 --> 00:04:52,590
and D4 is quite good.

65
00:04:52,590 --> 00:04:55,020
You can see how much better. Look at this here.

66
00:04:55,020 --> 00:04:59,780
D4 compares all the way to AmoebaNet, AmoebaNet + NAS-FPN

67
00:04:59,900 --> 00:05:05,420
+ AA, which had so many more FLOPs involved in processing that model.

68
00:05:05,690 --> 00:05:08,330
So you can see how good Efficient Detect really is.

69
00:05:08,630 --> 00:05:10,670
It lives up to its name, efficient.

70
00:05:11,600 --> 00:05:14,680
So this is basically a summary of all the performance here.

71
00:05:14,690 --> 00:05:21,320
You can see the different speed ups when you compare it to similar networks, and you can see basically

72
00:05:22,130 --> 00:05:26,450
how impressive the mAP scores are, especially for Efficient Detect D7.

73
00:05:27,050 --> 00:05:30,680
They got 51 on the mAP score, which is quite good, actually.

74
00:05:31,340 --> 00:05:38,150
So in summary, I'll say Efficient Detect is a very good, very useful model that, in my experience,

75
00:05:38,150 --> 00:05:41,960
works almost as well as, or probably even better than, YOLO in some situations.

76
00:05:42,560 --> 00:05:50,270
Next, we'll take a look at Detectron2, which is basically an entire framework from Facebook and offers

77
00:05:50,270 --> 00:05:52,940
a number of different features, not just object detection.

78
00:05:53,360 --> 00:05:55,940
So we'll take a look at Detectron2 shortly.

79
00:05:55,940 --> 00:05:57,470
So I'll see you in the next section.

80
00:05:57,560 --> 00:05:58,040
Thank you.
