1
00:00:00,450 --> 00:00:05,760
Hi and welcome back to the course. In this section, we take a look at VGGNet, which is one

2
00:00:05,760 --> 00:00:07,080
of my favorite CNNs.

3
00:00:07,440 --> 00:00:11,250
I'll explain why in this section. I mean, generally, I'll tell you why

4
00:00:11,280 --> 00:00:18,690
prior to diving into VGG: it's merely because VGG is such a reliable, straightforward, simple

5
00:00:18,690 --> 00:00:21,150
network that can get quite good results.

6
00:00:21,160 --> 00:00:24,660
However, it comes at a cost which we'll see shortly.

7
00:00:25,290 --> 00:00:27,370
So let's take a look at the VGG architecture.

8
00:00:27,390 --> 00:00:29,880
I won't go into this architecture diagram just yet.

9
00:00:30,000 --> 00:00:32,550
Let's talk a bit about the network first.

10
00:00:33,150 --> 00:00:39,630
VGG was introduced by Oxford University researchers Karen Simonyan and Andrew

11
00:00:39,720 --> 00:00:41,550
Zisserman in 2014.

12
00:00:41,550 --> 00:00:48,330
So it's about seven years old right now, and it achieves 92.7 percent top-five

13
00:00:48,330 --> 00:00:50,520
accuracy on the ImageNet dataset.

14
00:00:51,060 --> 00:00:56,700
Now, we haven't discussed what top five accuracy is, but I'll explain it to you a bit now, but we'll

15
00:00:56,700 --> 00:00:57,270
go into this.

16
00:00:57,540 --> 00:01:03,150
We'll go into more detail about top-five accuracy, or top-N accuracy, later on in this course.

17
00:01:03,150 --> 00:01:11,070
But what it is is whether the correct class for an image is in the top five predicted results.

18
00:01:11,210 --> 00:01:14,880
Remember, we get probability outputs out of a network.

19
00:01:15,360 --> 00:01:21,360
So if we take the five classes with the highest probability scores,

20
00:01:22,260 --> 00:01:28,770
and one of those five is the correct class, we consider the prediction correct.

21
00:01:29,010 --> 00:01:31,170
So that's what top five accuracy means.
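
As a rough sketch of the top-five idea just described, here is a small plain-Python version. The function name `top_k_accuracy` and the toy probabilities are made up for illustration, and `k=2` is used only because the toy example has six classes instead of a thousand.

```python
def top_k_accuracy(probabilities, true_labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for probs, label in zip(probabilities, true_labels):
        # Class indices sorted from highest to lowest probability, keep the first k.
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(true_labels)


# Toy softmax outputs for three images over six classes (illustrative values only).
probs = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 2 is the second-highest score
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],  # true class 1 is outside the top two
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],  # true class 5 is the highest score
]
labels = [2, 1, 5]

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
print(top1, top2)
```

Loosening from top-one to top-two counts the first image as correct too, which is why top-five numbers are always at least as high as top-one numbers.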

22
00:01:31,650 --> 00:01:37,320
And you can see, with a thousand classes in the ImageNet dataset, 92.7 percent is actually

23
00:01:37,320 --> 00:01:37,830
quite good.

24
00:01:37,850 --> 00:01:38,760
It's quite remarkable.

25
00:01:39,580 --> 00:01:40,710
That means it's doing fairly well.

26
00:01:41,820 --> 00:01:50,430
And just to elaborate, VGG16 has 13 conv layers and three FC layers to basically give us 16 layers

27
00:01:50,430 --> 00:01:51,030
in total.

28
00:01:51,390 --> 00:01:57,270
However, there's also VGG19, which has 16 conv layers and three fully connected layers.

29
00:01:57,270 --> 00:01:59,880
So let's take a look at the architecture in a bit more detail.

30
00:02:00,360 --> 00:02:04,890
So you can see, basically, let's start at the bottom, because that's where the input is. Consider that we

31
00:02:04,890 --> 00:02:11,370
have these conv layers here with 64 filters, then 128, and there's max pooling between those.

32
00:02:11,820 --> 00:02:15,150
Then there are two more conv layers and then a pooling layer.

33
00:02:15,690 --> 00:02:19,200
Then on VGG19, we have four layers instead of three.

34
00:02:19,590 --> 00:02:25,940
And they all have 512 filters, then there are four more, as opposed to three, when you're comparing VGG

35
00:02:25,950 --> 00:02:26,970
19 to 16.

36
00:02:27,480 --> 00:02:33,060
Then there's the last pooling layer, and then the fully connected layers with 4,096 nodes

37
00:02:33,060 --> 00:02:35,110
in each one, then output.

38
00:02:35,130 --> 00:02:40,530
It outputs to a thousand nodes, the final output classes, and then we have the softmax on top to get

39
00:02:40,530 --> 00:02:41,580
the probability scores.

40
00:02:42,120 --> 00:02:48,120
So, as you can see, VGG follows what we will now be calling the classical CNN approach.

41
00:02:48,120 --> 00:02:54,840
That's this design here, where we have multiple conv layers, then pooling, conv, pool, and so on,

42
00:02:54,840 --> 00:02:56,550
and then the fully connected layers at the end.

43
00:02:57,180 --> 00:03:02,760
Also, you'll notice in these classical CNN architectures that we have the number of feature maps increasing,

44
00:03:02,760 --> 00:03:05,580
that is, the number of filters increasing in size.

45
00:03:05,970 --> 00:03:12,000
So we go 64, 128, 256, 512 at this level, and at the top it's 512 here.

46
00:03:12,630 --> 00:03:14,220
So that's effectively it.

47
00:03:14,310 --> 00:03:20,160
This is another slide, and that image is from the VGGNet paper that was published in 2014.

48
00:03:20,730 --> 00:03:24,090
What I want to show you here, I mean, this describes the architecture, as we can see.

49
00:03:24,450 --> 00:03:30,540
There are different flavors of VGG. There's E, which is VGG19; D, which is VGG

50
00:03:30,540 --> 00:03:35,340
16; there's another, C, which is also 16 layers; then 13 and 11.

51
00:03:35,790 --> 00:03:38,590
And they're just variations of the VGG network.

52
00:03:38,630 --> 00:03:39,630
You can make your own one.

53
00:03:40,080 --> 00:03:42,450
In my previous course, I made one called,

54
00:03:43,090 --> 00:03:50,490
I think MiniVGG was the name of it, and it basically just cut the

55
00:03:50,640 --> 00:03:56,970
top layers out and left the early layers. I left the early layers here, about six layers, and then these FC layers,

56
00:03:56,970 --> 00:03:59,340
so nine layers in total is what I left.

57
00:04:00,240 --> 00:04:05,470
So going back to here, what I wanted to show you was, take a look at this.

58
00:04:05,490 --> 00:04:09,870
This is 144 million parameters in VGG19.
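
To see roughly where that 144 million figure comes from, here is a back-of-the-envelope count in plain Python. It assumes the standard setup from the paper: 3x3 conv filters with biases, a 224x224 RGB input, five 2x2 max-pooling layers, and FC layers of 4,096, 4,096 and 1,000 nodes. The helper name `vgg_param_count` and the config lists are illustrative, not taken from any library.

```python
def vgg_param_count(conv_cfg, in_channels=3, image_size=224,
                    fc_units=(4096, 4096, 1000)):
    """Count weights and biases for a VGG-style network.

    conv_cfg lists the filter count of each 3x3 conv layer, with "M"
    marking a 2x2 max-pooling layer (which has no parameters).
    """
    total = 0
    channels = in_channels
    size = image_size
    for item in conv_cfg:
        if item == "M":
            size //= 2  # pooling halves the spatial resolution
        else:
            total += 3 * 3 * channels * item + item  # weights + biases
            channels = item
    features = channels * size * size  # flattened input to the first FC layer
    for units in fc_units:
        total += features * units + units
        features = units
    return total


VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

print(vgg_param_count(VGG16))  # 138357544
print(vgg_param_count(VGG19))  # 143667240, i.e. roughly 144 million
```

Most of the total sits in the first fully connected layer (25,088 inputs times 4,096 nodes is over 100 million weights), which is a large part of why the model is so big.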

59
00:04:10,440 --> 00:04:18,600
That is quite a bit, which is the issue with VGG: it's often very, very slow to train.

60
00:04:19,200 --> 00:04:22,470
So while you do get generally good results, it's reliable.

61
00:04:23,070 --> 00:04:25,800
You will always get a good network out of VGG.

62
00:04:26,190 --> 00:04:30,810
However, it's just so slow to train, and inference becomes a problem.

63
00:04:31,170 --> 00:04:37,110
The model size is also a problem if you want to have a small model on an embedded system, which is

64
00:04:37,110 --> 00:04:38,040
not going to work.

65
00:04:38,040 --> 00:04:39,960
In that case, it's too big and too slow.

66
00:04:40,740 --> 00:04:44,280
I mean, it will work, it's just too slow at that point to be practical sometimes.

67
00:04:44,820 --> 00:04:51,300
So that's it for VGG. We'll now take a look at ResNets, and ResNets are my current favorite model,

68
00:04:51,750 --> 00:04:56,520
for good reason. They're just so good, and they solve so many problems with deep networks.

69
00:04:56,520 --> 00:04:59,490
So we'll take a look at ResNets in the next section.