1
00:00:02,160 --> 00:00:03,390
Hi and welcome back.

2
00:00:04,020 --> 00:00:09,570
We're about to take a look at SqueezeNet, which is another very good small CNN that attains very good

3
00:00:09,570 --> 00:00:11,880
accuracy, similar to MobileNet.

4
00:00:12,090 --> 00:00:19,380
So it was developed in 2016 by researchers at the University of California at Berkeley and Stanford,

5
00:00:19,380 --> 00:00:22,290
as well as a company called DeepScale.

6
00:00:23,190 --> 00:00:27,480
The aim was to make a highly accurate but small CNN.

7
00:00:27,660 --> 00:00:32,340
They listed their reasons in the paper: less communication across servers during the distributed training

8
00:00:32,340 --> 00:00:37,410
process, because back then training on multiple GPUs was quite difficult.

9
00:00:37,440 --> 00:00:40,410
Now, it's not that difficult at all with PyTorch.

10
00:00:41,100 --> 00:00:46,350
Also, the smaller size of the network allowed it to be used on embedded systems like mobile phones,

11
00:00:46,800 --> 00:00:53,370
as well as devices with FPGAs, those custom-architecture CPUs or GPUs, whatever you want to call

12
00:00:53,370 --> 00:00:54,810
them, custom processors.

13
00:00:55,380 --> 00:01:01,590
And it was also faster to update these models via the cloud with less bandwidth, which is good if you

14
00:01:01,590 --> 00:01:03,540
want to update, say, a model

15
00:01:03,550 --> 00:01:09,750
that's on an embedded camera deployed somewhere, where you only have a 2G or 3G connection.

16
00:01:10,620 --> 00:01:17,580
Having a small model is hugely advantageous in that situation, and it had 50 times fewer parameters

17
00:01:17,580 --> 00:01:20,340
than AlexNet while performing three times faster.

18
00:01:20,670 --> 00:01:21,690
So that's quite good.

19
00:01:23,070 --> 00:01:29,130
So two key takeaways of the SqueezeNet architecture: first, in many cases they replaced the three-by-three

20
00:01:29,130 --> 00:01:30,750
filters with one by one filters.

21
00:01:31,170 --> 00:01:33,960
Other filters were left as three-by-three filters.

22
00:01:33,960 --> 00:01:39,810
And just so you know, a one-by-one filter has nine times fewer parameters than a three-by-three

23
00:01:39,810 --> 00:01:40,320
filter.
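
The nine-times figure comes from counting the weights in a convolution. Below is a minimal sketch in Python; the helper name `conv_weights` and the channel counts are made up for illustration and are not taken from the lecture or the paper.

```python
def conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights in a 2D convolution layer, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size


# Hypothetical channel counts, chosen only for illustration.
in_ch, out_ch = 64, 64

w1 = conv_weights(in_ch, out_ch, kernel_size=1)  # one-by-one filters
w3 = conv_weights(in_ch, out_ch, kernel_size=3)  # three-by-three filters
ratio = w3 // w1

print(f"1x1 weights: {w1}, 3x3 weights: {w3}, ratio: {ratio}")
```

The ratio does not depend on the channel counts, since only the kernel area changes (3 x 3 = 9 versus 1 x 1 = 1).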

24
00:01:40,890 --> 00:01:46,200
And then also, they downsample late in the network so that the convolution layers still have large activation

25
00:01:46,200 --> 00:01:47,970
or feature maps to work with.

26
00:01:48,530 --> 00:01:51,000
And that may not explain everything to you.

27
00:01:51,000 --> 00:01:54,720
It may not make sense yet, so let's take a look at this.

28
00:01:54,870 --> 00:02:02,220
This is the Fire module, which is basically a squeeze-and-expand block in the architecture of

29
00:02:02,220 --> 00:02:02,430
SqueezeNet.

30
00:02:02,970 --> 00:02:04,950
So what's going on here?

31
00:02:05,340 --> 00:02:09,060
Well, the Fire module has a squeeze convolutional layer.

32
00:02:09,090 --> 00:02:14,190
That's what these one-by-one convolutional filters are, and they feed into an expand layer that brings

33
00:02:14,190 --> 00:02:19,500
it back up to the size it would have had if it had used three-by-three or larger filters.

34
00:02:20,220 --> 00:02:24,930
And it's basically a combination of those that is used in the SqueezeNet architecture.
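
To make the squeeze-and-expand idea concrete, here is a rough parameter count for one Fire module in Python. It assumes the layout described in the SqueezeNet paper, where the expand layer mixes one-by-one and three-by-three filters and concatenates their outputs. The function names and the channel sizes (96 in, 16 squeeze, 64 + 64 expand) are illustrative assumptions, not something stated in this lecture.

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights plus biases of a k-by-k 2D convolution."""
    return in_ch * out_ch * k * k + out_ch


def fire_params(in_ch: int, squeeze: int, expand1x1: int, expand3x3: int):
    """Return (parameter count, output channels) of one Fire module."""
    s = conv_params(in_ch, squeeze, 1)       # squeeze: 1x1 filters only
    e1 = conv_params(squeeze, expand1x1, 1)  # expand branch with 1x1 filters
    e3 = conv_params(squeeze, expand3x3, 3)  # expand branch with 3x3 filters
    # The two expand branches are concatenated along the channel axis.
    return s + e1 + e3, expand1x1 + expand3x3


# Assumed sizes, for illustration only.
fire_total, fire_out = fire_params(in_ch=96, squeeze=16, expand1x1=64, expand3x3=64)

# A plain 3x3 convolution producing the same number of output channels.
plain_total = conv_params(96, fire_out, 3)

print(f"Fire module: {fire_total} params, {fire_out} output channels")
print(f"Plain 3x3 conv: {plain_total} params")
```

Under these assumptions the Fire module needs roughly a ninth of the parameters of the plain three-by-three layer, because the three-by-three filters only ever see the 16 squeezed channels.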

35
00:02:25,290 --> 00:02:27,900
We use Fire modules.

36
00:02:28,200 --> 00:02:29,040
You can see them here.

37
00:02:29,050 --> 00:02:33,990
We have a convolution at the beginning, just one, and then we have Fire, Fire, Fire all the way down to

38
00:02:33,990 --> 00:02:34,260
fire.

39
00:02:34,260 --> 00:02:36,780
It actually goes up to Fire9, and the numbering starts at Fire2.

40
00:02:38,220 --> 00:02:41,310
Maybe that's because the first convolution is counted as layer one.

41
00:02:41,310 --> 00:02:47,490
Either way, that's beside the point; it's just nomenclature.

42
00:02:48,450 --> 00:02:50,340
So you can see, and I don't know why, this

43
00:02:50,340 --> 00:02:51,350
Labrador is wearing a mask.

44
00:02:51,350 --> 00:02:55,020
He doesn't look too happy about it, but it still works in this case.

45
00:02:55,200 --> 00:02:58,320
This is the example the researchers used in this paper.

46
00:02:59,040 --> 00:03:06,220
So let's take a look at SqueezeNet's performance. SqueezeNet was actually able to outperform AlexNet.

47
00:03:06,270 --> 00:03:12,030
You can see it's fifty-seven point five, whereas AlexNet was fifty-seven point two

48
00:03:12,180 --> 00:03:16,050
on ImageNet, and the top-five accuracy was the same.

49
00:03:16,440 --> 00:03:19,500
However, the reduction in model size was drastic.

50
00:03:19,980 --> 00:03:24,840
You can see we get up to a 510x smaller model.

51
00:03:25,320 --> 00:03:30,640
They applied some compression where, instead of storing the data as

52
00:03:30,640 --> 00:03:36,210
32-bit, they used 8-bit and then 6-bit, which I've actually never really seen that much in reality.

53
00:03:36,210 --> 00:03:41,070
But it's pretty cool that they tested it out and tried it, and the model size went down to zero point

54
00:03:41,070 --> 00:03:42,780
four seven megs, which is tiny.

55
00:03:43,230 --> 00:03:48,480
So to achieve this type of performance on ImageNet with such a small model is actually quite remarkable.

56
00:03:49,050 --> 00:03:54,870
Right now, the best models give you scores in the 80s, percentage-wise, but fifty

57
00:03:54,870 --> 00:03:56,700
seven point five is actually quite good.

58
00:03:57,210 --> 00:03:59,400
And you can compare it to AlexNet.

59
00:03:59,850 --> 00:04:01,980
See how much bigger AlexNet was here.

60
00:04:02,340 --> 00:04:07,170
So you can see there are different things you can do to compress AlexNet; the smallest they were

61
00:04:07,210 --> 00:04:09,660
able to get it down to was six point nine megs.

62
00:04:10,350 --> 00:04:15,300
And even then, that was still not really comparable to this.

63
00:04:15,300 --> 00:04:16,350
This is actually smaller.

64
00:04:17,070 --> 00:04:18,690
So we'll stop there for now.

65
00:04:18,690 --> 00:04:22,850
And then we'll take a look at a very good network, which is Google's EfficientNet.

66
00:04:22,860 --> 00:04:27,840
That solves a lot of problems for researchers and is something I actually use in my day-to-day practice

67
00:04:27,840 --> 00:04:28,350
sometimes.

68
00:04:28,860 --> 00:04:30,630
So I'll see you in the next section.

69
00:04:30,750 --> 00:04:31,200
Thank you.
