1
00:00:01,050 --> 00:00:02,130
Hi and welcome back.

2
00:00:02,670 --> 00:00:08,070
In this section, we'll take a look at some of the underlying math that explains how ResNets work

3
00:00:08,070 --> 00:00:10,980
and how they actually solve the vanishing gradient problem.

4
00:00:11,640 --> 00:00:12,690
So it's quite simple.

5
00:00:12,900 --> 00:00:13,950
So don't be scared.

6
00:00:13,980 --> 00:00:16,050
The math in this slide is actually quite simple.

7
00:00:16,620 --> 00:00:17,430
Let's step through it.

8
00:00:17,550 --> 00:00:24,570
So remember, a conv layer, before it's combined with the ReLU operation, is effectively just a simple linear operation.
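To see why a conv layer without its ReLU is linear, here is a minimal pure-Python sketch of a 1D convolution (computed as cross-correlation, as deep learning libraries do) with no bias. The function name `conv1d` and the kernel and input values are hypothetical, chosen only for illustration. Adding a bias makes the map affine rather than strictly linear, which is the `W a + b` form used on the slide.

```python
def conv1d(x, kernel):
    """Valid 1D cross-correlation: no padding, stride 1, no bias."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

kernel = [1.0, -2.0, 0.5]          # hypothetical learned weights
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical input signal
y = [0.5, -1.0, 2.0, 0.0, 1.5]     # a second hypothetical input
alpha, beta = 2.0, -3.0

# Linearity check: conv(alpha*x + beta*y) should equal alpha*conv(x) + beta*conv(y)
lhs = conv1d([alpha * xi + beta * yi for xi, yi in zip(x, y)], kernel)
rhs = [alpha * cx + beta * cy for cx, cy in zip(conv1d(x, kernel), conv1d(y, kernel))]
print(lhs)
print(rhs)
```

Both lines print the same values, which is exactly what makes the conv step "just a simple linear operation"; the nonlinearity only enters once ReLU is applied.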

9
00:00:24,960 --> 00:00:26,950
That's what this first line says here.

10
00:00:26,970 --> 00:00:28,260
It says the output.

11
00:00:28,620 --> 00:00:36,330
That's $z^{[l+1]}$, and it is equal to the weights times the input $a^{[l]}$ here,

12
00:00:36,690 --> 00:00:45,330
plus the bias. $a^{[l]}$ is the input, $W^{[l+1]}$ are the weights of this layer here, the first layer, plus $b^{[l+1]}$, which

13
00:00:45,330 --> 00:00:47,040
is the bias of the first conv layer.

14
00:00:47,670 --> 00:00:54,720
So that's before the ReLU operation, and then we apply the activation around it here, which gives

15
00:00:55,020 --> 00:00:57,000
$a^{[l+1]} = g(z^{[l+1]})$, the output of this node.

16
00:00:57,870 --> 00:01:03,390
Next, we have $z^{[l+2]}$, which is the output of this node, and this operation

17
00:01:03,390 --> 00:01:11,070
here takes the output of this layer here, times its weights, plus the bias of this conv layer

18
00:01:11,070 --> 00:01:11,340
here.

19
00:01:11,790 --> 00:01:18,210
So again, that's a simple linear operation, and then, without the short circuit, we apply ReLU.

20
00:01:18,360 --> 00:01:19,550
This is what it looks like here.

21
00:01:19,560 --> 00:01:21,900
The output, this is just continuing here.

22
00:01:21,900 --> 00:01:23,580
So it's one, two, three, four.

23
00:01:24,180 --> 00:01:25,610
And then this is the output here.
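For reference, here is the plain two-layer path just described, written in layer-index notation where $l$ is the block's input layer and $g$ is ReLU (the exact notation on the slide is assumed):

```latex
\begin{aligned}
z^{[l+1]} &= W^{[l+1]} a^{[l]} + b^{[l+1]}, & a^{[l+1]} &= g\big(z^{[l+1]}\big),\\
z^{[l+2]} &= W^{[l+2]} a^{[l+1]} + b^{[l+2]}, & a^{[l+2]} &= g\big(z^{[l+2]}\big).
\end{aligned}
```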

24
00:01:25,620 --> 00:01:29,400
With the short circuit, now pay attention to this here.

25
00:01:30,090 --> 00:01:31,500
You can see we have g.

26
00:01:31,980 --> 00:01:40,500
This is the ReLU activation function applied to $z^{[l+2]}$, which is the output here, plus $a^{[l]}$, the initial input: $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$.
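A minimal pure-Python sketch of that equation follows. It assumes dense (fully connected) layers as a stand-in for the conv layers; real ResNet blocks use convolutions and batch normalization, which are omitted here. The helper names (`relu`, `affine`, `plain_block`, `residual_block`) and all weight values are hypothetical. The shortcut addition only works when $a^{[l]}$ and $z^{[l+2]}$ have the same shape, which is why the weight matrices are square.

```python
def relu(v):
    """Element-wise ReLU, i.e. g(.) on the slide."""
    return [max(0.0, vi) for vi in v]

def affine(W, a, b):
    """z = W a + b, with W given as a list of rows."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def plain_block(a_l, W1, b1, W2, b2):
    a_l1 = relu(affine(W1, a_l, b1))   # a[l+1] = g(z[l+1])
    z_l2 = affine(W2, a_l1, b2)        # z[l+2] = W[l+2] a[l+1] + b[l+2]
    return relu(z_l2)                  # a[l+2] = g(z[l+2])

def residual_block(a_l, W1, b1, W2, b2):
    a_l1 = relu(affine(W1, a_l, b1))
    z_l2 = affine(W2, a_l1, b2)
    # Shortcut: add the block's input before the final ReLU
    return relu([z + s for z, s in zip(z_l2, a_l)])  # a[l+2] = g(z[l+2] + a[l])

# Hypothetical example values
a_l = [1.0, 2.0]
W1, b1 = [[0.5, -1.0], [1.0, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, -1.0], [0.0, 0.5]], [0.5, 0.0]

plain_out = plain_block(a_l, W1, b1, W2, b2)
res_out = residual_block(a_l, W1, b1, W2, b2)
print(plain_out)  # [0.0, 1.0]
print(res_out)    # [0.0, 3.0]
```

The only difference between the two functions is the single line that adds `a_l` back in before the final ReLU.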

27
00:01:40,920 --> 00:01:42,060
So what does that mean?

28
00:01:42,450 --> 00:01:43,410
It's actually quite simple.

29
00:01:44,040 --> 00:01:45,060
This is what it means here.

30
00:01:45,150 --> 00:01:50,700
This is when we expand out what $z^{[l+2]}$ is here, so you can actually see the expansion.
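Written out step by step, the expansion reads as follows, assuming the same notation ($g$ is ReLU, superscripts index layers):

```latex
\begin{aligned}
a^{[l+2]} &= g\big(z^{[l+2]} + a^{[l]}\big)\\
          &= g\big(W^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]}\big)\\
          &= g\Big(W^{[l+2]}\, g\big(W^{[l+1]} a^{[l]} + b^{[l+1]}\big) + b^{[l+2]} + a^{[l]}\Big).
\end{aligned}
```

The middle line is the one that matters: if $W^{[l+2]}$ and $b^{[l+2]}$ are close to zero, only the $a^{[l]}$ term survives inside $g$.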

31
00:01:51,150 --> 00:01:53,740
However, I wouldn't focus too much on the details of that.

32
00:01:53,760 --> 00:01:55,470
I want to focus on this one here.

33
00:01:56,280 --> 00:02:03,230
Now, what happens if W, the weights, and the bias are very small, near zero?

34
00:02:04,140 --> 00:02:05,280
Take a look at this equation here.

35
00:02:05,640 --> 00:02:09,900
That's the output of the second conv layer, $a^{[l+2]}$.

36
00:02:10,440 --> 00:02:13,250
So you can see if this is close to zero.

37
00:02:13,260 --> 00:02:19,920
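And if this is close to zero too, then $a^{[l+2]}$ ends up being approximately equal to $g(a^{[l]})$, which is just $a^{[l]}$, since $a^{[l]}$ is already the output of a ReLU and is never negative.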

38
00:02:20,220 --> 00:02:26,790
This is a good thing, because if it weren't for this $a^{[l]}$ input here, this would have gone

39
00:02:26,790 --> 00:02:27,420
to zero.

40
00:02:27,900 --> 00:02:28,530
Do you see that?

41
00:02:29,250 --> 00:02:30,630
So this is what this line says.

42
00:02:30,630 --> 00:02:32,340
This is the important part of the slide.

43
00:02:33,000 --> 00:02:38,310
So you can see that having the input here prevents the output of this node from being zero.
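The near-zero case can be checked numerically. This is a minimal sketch under the same stand-in assumption as before (dense layers in place of conv layers, no batch normalization); `eps`, the input values, and the helper names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, a, b):
    """z = W a + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + bi for row, bi in zip(W, b)]

def block(a_l, W1, b1, W2, b2, shortcut):
    a_l1 = relu(affine(W1, a_l, b1))
    z_l2 = affine(W2, a_l1, b2)
    if shortcut:
        z_l2 = [z + s for z, s in zip(z_l2, a_l)]  # add a[l] before the ReLU
    return relu(z_l2)

eps = 1e-6                       # hypothetical "very small" weights and biases
a_l = relu([0.8, -0.3, 1.5])     # a[l] is itself a ReLU output, so non-negative
W = [[eps] * 3 for _ in range(3)]
b = [eps] * 3

plain_out = block(a_l, W, b, W, b, shortcut=False)
resid_out = block(a_l, W, b, W, b, shortcut=True)
print(plain_out)   # every entry is on the order of 1e-6: the signal has almost vanished
print(resid_out)   # approximately equal to a_l
```

Without the shortcut, the block's output collapses to roughly the size of the tiny weights; with it, the block passes $a^{[l]}$ through almost unchanged.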

44
00:02:38,910 --> 00:02:43,020
This, together with the fact that the shortcut gives gradients a direct path back to earlier layers, allows us to solve the vanishing gradient problem quite effectively.
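A toy scalar model makes the gradient side of this concrete. It is a deliberate simplification, not how gradients in a real ResNet are computed: each "layer" is a scalar linear map (ReLU ignored), and the weight `w` and depths are hypothetical. A plain layer `f(x) = w * x` has local derivative `w`, so the chain rule gives `w ** n` through `n` layers; a residual layer `f(x) = x + w * x` has local derivative `1 + w`, so the product stays at or above 1 for `w >= 0` instead of collapsing toward zero.

```python
def chain_gradient(n_layers, w, shortcut):
    """d(output)/d(input) through n stacked scalar layers, via the chain rule.

    Plain layer:    f(x) = w * x      -> local derivative w
    Residual layer: f(x) = x + w * x  -> local derivative 1 + w
    """
    grad = 1.0
    for _ in range(n_layers):
        local = (1.0 + w) if shortcut else w
        grad *= local
    return grad

w = 0.01  # hypothetical small weight
for n in (5, 20, 50):
    print(n, chain_gradient(n, w, shortcut=False), chain_gradient(n, w, shortcut=True))
```

In the plain stack the gradient shrinks geometrically with depth, while in the residual stack it stays on the order of 1.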

45
00:02:43,920 --> 00:02:50,100
So now you can see that solving the vanishing gradient problem allows ResNets like

46
00:02:50,100 --> 00:02:53,430
ResNet-101, ResNet-50, and, as I said, ResNet-150,

47
00:02:53,760 --> 00:02:56,460
it's actually ResNet-152, as you can see.

48
00:02:56,460 --> 00:03:03,270
It allows them to be very, very deep without being very big, and we don't need as many fully

49
00:03:03,270 --> 00:03:08,930
connected layers at the end here; we can just have some simple fully connected layers, or none at all,

50
00:03:08,940 --> 00:03:09,720
actually sometimes.

51
00:03:10,260 --> 00:03:12,630
And that saves a lot of parameters right there.
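The parameter savings can be estimated with simple arithmetic: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The sketch below compares the standard ImageNet classifier heads of VGG-16 (a flattened 7x7x512 feature map feeding FC-4096, FC-4096, FC-1000) and ResNet-50 (global average pooling to 2048 features, which has no parameters, feeding a single FC-1000). The helper name `dense_params` is hypothetical, and only the heads are counted, not the conv layers.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# VGG-16-style head: flattened 7x7x512 feature map, then FC-4096, FC-4096, FC-1000
vgg_head = (dense_params(7 * 7 * 512, 4096)
            + dense_params(4096, 4096)
            + dense_params(4096, 1000))

# ResNet-50-style head: global average pooling to 2048 features, then one FC-1000
resnet_head = dense_params(2048, 1000)

print(f"VGG-16-style head:    {vgg_head:,} parameters")    # 123,642,856
print(f"ResNet-50-style head: {resnet_head:,} parameters")  # 2,049,000
```

Roughly 60 times fewer parameters sit in the ResNet-style head, which is where much of the saving comes from.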

52
00:03:13,330 --> 00:03:19,090
So VGG, sorry, ResNet has been a very, very good network architecture.

53
00:03:19,170 --> 00:03:27,300
It has achieved remarkable performance, and there's actually a paper, a 2020 or 2021 paper, that says ResNets

54
00:03:27,390 --> 00:03:28,140
are all you need.

55
00:03:28,650 --> 00:03:33,660
Basically, what that paper established was that even though vision transformers, vision transformer

56
00:03:33,660 --> 00:03:41,750
networks, have basically surpassed state-of-the-art ResNets, ResNet networks, in

57
00:03:41,760 --> 00:03:43,530
the ImageNet performance categories,

58
00:03:44,370 --> 00:03:48,060
the researchers showed that with just some minor tweaks,

59
00:03:48,240 --> 00:03:51,840
ResNets are able to achieve similar and even better results.

60
00:03:52,080 --> 00:03:59,010
So I would suggest, if you need to use a network, that you use a ResNet or a pre-trained ResNet,

61
00:03:59,340 --> 00:04:02,130
which we'll all be doing in later lessons in Colab.

62
00:04:02,490 --> 00:04:05,360
So don't worry, we'll actually start using these networks.

63
00:04:05,370 --> 00:04:09,990
and get some hands-on experience with loading these networks, loading pre-trained versions of these networks,

64
00:04:10,320 --> 00:04:12,210
doing transfer learning and fine-tuning.

65
00:04:12,630 --> 00:04:18,060
So it's going to be an exciting time when you start playing with these advanced CNNs.

66
00:04:18,450 --> 00:04:24,940
So I'll stop there for now, and in the next chapter we'll take a look at a different type of CNN, MobileNet,

67
00:04:24,940 --> 00:04:28,950
which is basically not trying to achieve remarkable performance.

68
00:04:28,950 --> 00:04:31,920
It's trying to achieve remarkable efficiency.

69
00:04:32,160 --> 00:04:34,210
So I'll see you in that section.

70
00:04:34,260 --> 00:04:34,770
Thank you.