1
00:00:00,550 --> 00:00:06,520
Welcome back to the course. In this lesson, we'll take a look at EfficientNet, and I'll show you how

2
00:00:06,520 --> 00:00:09,460
EfficientNet improved accuracy and efficiency

3
00:00:09,790 --> 00:00:10,100
through

4
00:00:10,120 --> 00:00:13,030
Google's AutoML and model scaling techniques.

5
00:00:13,420 --> 00:00:14,410
So let's get started.

6
00:00:15,220 --> 00:00:16,150
So EfficientNet

7
00:00:16,150 --> 00:00:24,700
was introduced in 2019 by researchers at Google, and the main motivation behind EfficientNet was

8
00:00:24,700 --> 00:00:30,160
that the researchers wanted to figure out how to optimize the architecture of our networks,

9
00:00:30,160 --> 00:00:34,990
because CNNs are typically designed at a fixed resource cost and then scaled up.

10
00:00:35,470 --> 00:00:37,480
This isn't the best way of doing this.

11
00:00:37,480 --> 00:00:40,600
You can actually waste a lot of time and lose efficiency.

12
00:00:41,050 --> 00:00:46,270
So we need to figure out things like how do we increase depth, the number of layers, or width, the number of

13
00:00:46,270 --> 00:00:46,780
filters?

14
00:00:47,140 --> 00:00:48,950
How do these things affect our accuracy?

15
00:00:48,970 --> 00:00:55,810
How do we design an experiment that optimizes the architecture of our CNNs so that we get the best

16
00:00:55,810 --> 00:00:56,470
results?

17
00:00:56,980 --> 00:01:03,190
So they posited that there must be a principled method of scaling up CNNs.

18
00:01:03,730 --> 00:01:06,010
So that's what EfficientNet does.

19
00:01:06,460 --> 00:01:13,600
This method uniformly scales each dimension, depth, width, and resolution, with a fixed set of scaling coefficients.

20
00:01:13,630 --> 00:01:19,000
So imagine it as something like a grid search, searching for the best parameters in optimizing

21
00:01:19,000 --> 00:01:19,690
the architecture.

22
00:01:20,350 --> 00:01:25,090
It utilizes Google's new AutoML, which uses EfficientNet behind the scenes.

23
00:01:25,540 --> 00:01:31,480
It's also a way of optimizing the performance of any existing CNN.

24
00:01:31,630 --> 00:01:39,880
So it's a very, very cool way to get better performance from your custom CNN or another CNN you

25
00:01:39,880 --> 00:01:41,410
wish to use on your dataset.

26
00:01:42,190 --> 00:01:48,520
So they were able to surpass the best state-of-the-art networks and achieve up to 10x better efficiency,

27
00:01:48,520 --> 00:01:49,540
which is remarkable.

28
00:01:50,710 --> 00:01:52,180
So let's talk about scaling.

29
00:01:52,300 --> 00:01:58,020
So researchers systematically studied the effects of scaling up different dimensions and those dimensions

30
00:01:58,030 --> 00:01:59,380
I'll discuss in the next slide.

31
00:01:59,800 --> 00:02:06,010
And it was found that balancing scaling in all dimensions actually resulted in the best overall performance,

32
00:02:06,580 --> 00:02:08,230
which intuitively makes sense.

33
00:02:08,230 --> 00:02:13,660
But you wouldn't have known this unless you did the experiments and figured out a principled

34
00:02:13,660 --> 00:02:16,030
way to scale these networks.

35
00:02:17,800 --> 00:02:23,380
So how does EfficientNet actually determine the best parameters, the best CNN design?

36
00:02:23,920 --> 00:02:26,680
So the methodology is actually quite simple.

37
00:02:27,160 --> 00:02:33,280
What we effectively are doing here is a grid search, and what we do is that we fix the other constraints

38
00:02:33,280 --> 00:02:33,460
here.

39
00:02:33,460 --> 00:02:35,200
So imagine we have these four constraints.

40
00:02:35,200 --> 00:02:40,630
We're looking at the number of channels, which is like the depth of the image, color or grayscale. The

41
00:02:40,630 --> 00:02:41,080
width:

42
00:02:41,080 --> 00:02:42,310
that's the number of filters.

43
00:02:42,640 --> 00:02:47,920
The depth, which is the number of layers, and the resolution, which is the input resolution, because typically

44
00:02:47,920 --> 00:02:50,230
we use quite small input resolution images.

45
00:02:50,530 --> 00:02:53,810
However, we can experiment by scaling that parameter as well.

46
00:02:54,580 --> 00:03:00,640
So by doing this under a fixed resource constraint, and they use FLOPS as the computational

47
00:03:00,790 --> 00:03:06,520
resource constraint, this allows us to determine the best or most appropriate parameter for each of

48
00:03:06,520 --> 00:03:07,180
these four.

49
00:03:07,690 --> 00:03:12,520
And then once we do that, well, once we do the channels and width, we can actually start scaling up

50
00:03:12,520 --> 00:03:21,010
depth and resolution as well to get the most appropriate or best performing CNN for this specific computational

51
00:03:21,010 --> 00:03:21,610
constraint.

52
00:03:23,410 --> 00:03:30,040
So let's talk about the EfficientNet architecture now. Firstly, as you can see, EfficientNet

53
00:03:30,130 --> 00:03:31,240
follows a principle.

54
00:03:31,960 --> 00:03:35,200
It's a principle of optimizing the CNN architecture.

55
00:03:35,770 --> 00:03:40,480
However, it depends heavily on the baseline network, which means it doesn't always work, but it worked

56
00:03:40,480 --> 00:03:43,420
fairly well with MobileNet and ResNet.

57
00:03:43,990 --> 00:03:50,590
And what the Google researchers did was that they used the EfficientNet principles

58
00:03:51,070 --> 00:03:57,100
to develop basically a new CNN architecture that optimizes both accuracy and efficiency.

59
00:03:57,100 --> 00:04:02,430
And this comes out of Google's AutoML MNAS framework, which was a predecessor for

60
00:04:02,440 --> 00:04:03,370
EfficientNet: MNAS.

61
00:04:03,910 --> 00:04:09,280
Many of you may have heard of MNAS if you're in research or have looked into Google's

62
00:04:09,280 --> 00:04:09,940
AutoML.

63
00:04:10,540 --> 00:04:17,140
So this new architecture design approach was actually able to develop new networks.

64
00:04:17,590 --> 00:04:25,060
And that's what led to a network similar to MobileNetV2, which is slightly larger but performs very

65
00:04:25,060 --> 00:04:25,330
well.

66
00:04:26,350 --> 00:04:28,960
So this here is one of the EfficientNet architectures.

67
00:04:28,960 --> 00:04:33,520
And there are actually many EfficientNets, as you'll see in the next slide here.

68
00:04:33,520 --> 00:04:37,750
This is the performance of B0 to B7, and there are actually a lot of other combinations.

69
00:04:37,750 --> 00:04:45,100
But researchers initially started with these eight networks, and you can see they perform quite well.

70
00:04:45,490 --> 00:04:47,980
This is the performance on the y axis here.

71
00:04:48,430 --> 00:04:53,170
You can see the ImageNet top-1 accuracy, and you can see EfficientNet-B7, which

72
00:04:53,170 --> 00:05:01,450
is the deepest backbone, achieves just over 84 percent accuracy with just over 60 million parameters,

73
00:05:01,450 --> 00:05:02,470
which is quite good.

74
00:05:03,100 --> 00:05:10,690
The networks at around 60 million parameters previously, like maybe Inception-ResNet-v2 and this

75
00:05:10,760 --> 00:05:18,910
AmoebaNet, which has a bit more, still got less accuracy than

76
00:05:18,910 --> 00:05:19,180
EfficientNet.

77
00:05:19,210 --> 00:05:23,890
So you can see that at every parameter count, EfficientNet is a better network.

78
00:05:24,490 --> 00:05:25,900
So we'll stop there.

79
00:05:26,320 --> 00:05:28,690
And next we'll take a look at DenseNets.

80
00:05:29,290 --> 00:05:31,240
So I'll see you in the next section.