1
00:00:01,200 --> 00:00:04,560
Hi, and welcome to the section on backpropagation.

2
00:00:05,070 --> 00:00:07,170
This is what makes neural networks trainable.

3
00:00:07,620 --> 00:00:08,760
So let's take a look at this.

4
00:00:09,150 --> 00:00:15,060
So as I just said, this is the most amazing algorithm that allows neural networks or convolutional

5
00:00:15,060 --> 00:00:16,830
neural networks to be trained.

6
00:00:17,250 --> 00:00:23,700
And what it does is use the loss to tell us how much to change or update the weights so that we reduce

7
00:00:23,700 --> 00:00:24,630
the overall loss.

8
00:00:25,560 --> 00:00:31,650
So backpropagation, this sounds like an ad for it, but basically this is what it is: using the loss to

9
00:00:31,650 --> 00:00:34,330
make the weights better. But that's cheesy enough.

10
00:00:35,400 --> 00:00:42,300
So anyway, here's a very basic, simple example of the weights and biases and inputs

11
00:00:42,300 --> 00:00:46,950
and outputs and formulas and how we get them for a very simple neural network.

12
00:00:47,370 --> 00:00:52,710
Here we have one input... sorry, two inputs, two hidden nodes, and two output nodes.

13
00:00:53,280 --> 00:00:58,740
So you can see the weights here, represented by the Ws on the edges of these connections:

14
00:00:58,740 --> 00:01:04,030
we have W1, W2, W3, W5, and so on and so on.

15
00:01:04,030 --> 00:01:09,900
And then we have two biases here and the nodes here, and we also have the inputs and the outputs from those

16
00:01:09,900 --> 00:01:10,140
here.

17
00:01:10,560 --> 00:01:18,720
So you can see output one is given by W5 times H1 out plus W6 times H2 out.

18
00:01:19,770 --> 00:01:25,350
That's this one here and this one here, plus B2, which is the last set of biases.

19
00:01:26,100 --> 00:01:33,390
So using the loss value, backpropagation can tell us, for the next iteration,

20
00:01:33,390 --> 00:01:35,670
how much we should increase or decrease,

21
00:01:35,850 --> 00:01:40,440
I should say, W5... actually, I do have it.

22
00:01:40,440 --> 00:01:43,560
I do say it here: to reduce the overall loss.

23
00:01:43,680 --> 00:01:50,490
So we just want to know: if we have to change W5, do we go for a small increase or a big decrease?

24
00:01:51,150 --> 00:01:51,690
How do we know?

25
00:01:52,200 --> 00:01:54,690
Well, that's what we'll discuss with backpropagation.

26
00:01:55,110 --> 00:02:02,490
So now, moving right to left, backpropagation updates the weights node by node on a regular basis, and

27
00:02:02,490 --> 00:02:04,500
this is done for all nodes throughout the network.

28
00:02:05,040 --> 00:02:11,670
However, this just seems like magic right now, because it hasn't been explained to you yet, and

29
00:02:11,670 --> 00:02:12,570
that's why we keep going.

30
00:02:13,050 --> 00:02:19,740
So you can see how it moves with these arrows, and that's how the backpropagation algorithm works

31
00:02:20,070 --> 00:02:20,820
in theory.

32
00:02:21,090 --> 00:02:23,460
Now let's look at the actual process behind it.

33
00:02:24,210 --> 00:02:30,900
So firstly, remember, we forward propagate the input data, an input image, into the network, and

34
00:02:30,900 --> 00:02:35,040
then we backpropagate to update the weights to lower the loss.

35
00:02:36,690 --> 00:02:40,050
But this simply tunes the weights for that particular input.

36
00:02:41,280 --> 00:02:45,950
We need to improve it to generalize to new data or unseen data.

37
00:02:45,960 --> 00:02:50,940
But to do that, we need to actually input all our training data into the neural network.

38
00:02:51,210 --> 00:02:54,510
So you can imagine how many times we have to update the weights.

39
00:02:55,350 --> 00:03:01,590
This is a continuous process, and basically this is the training process of the neural network.

40
00:03:02,280 --> 00:03:05,760
So let's take a look at how we update our weights.

41
00:03:05,850 --> 00:03:07,770
So what do our gradients look like?

42
00:03:08,220 --> 00:03:12,750
Again, let's go back to our simple neural network here, and let's take a look at this.

43
00:03:12,760 --> 00:03:15,780
The output of hidden node one, that's here.

44
00:03:16,110 --> 00:03:25,520
The output is equal to I1 times W1, that's input one times W1, plus I2 times W2.

45
00:03:25,560 --> 00:03:28,780
You can see the connections coming in, plus B1.

46
00:03:28,800 --> 00:03:33,570
That's the shared bias between these nodes and that's how we get the output of this node.
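The forward formulas described above can be sketched in a few lines of Python. This is only an illustration: every number is made up, the wiring of W3, W4, W7, and W8 is an assumption (the lecture only spells out W1, W2, W5, and W6), and the nodes are kept linear because the spoken formulas do not mention an activation function.

```python
# Hypothetical values; the lecture does not give concrete numbers.
i1, i2 = 0.5, 1.0                      # two inputs
w1, w2, w3, w4 = 0.1, 0.2, 0.3, 0.4    # input -> hidden weights (w3, w4 wiring assumed)
w5, w6, w7, w8 = 0.5, 0.6, 0.7, 0.8    # hidden -> output weights (w7, w8 wiring assumed)
b1, b2 = 0.25, 0.5                     # one shared bias per layer

# Hidden layer: weighted sum of the inputs plus the shared bias b1.
h1_out = i1 * w1 + i2 * w2 + b1
h2_out = i1 * w3 + i2 * w4 + b1

# Output layer: weighted sum of the hidden outputs plus the shared bias b2.
o1 = w5 * h1_out + w6 * h2_out + b2
o2 = w7 * h1_out + w8 * h2_out + b2

print(h1_out, h2_out, o1, o2)
```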

47
00:03:35,130 --> 00:03:41,850
So just so you know, these hidden nodes we're talking about here, in a convolutional neural network

48
00:03:41,850 --> 00:03:43,920
sense, there are two filters here.

49
00:03:44,010 --> 00:03:49,270
However, it's a lot more confusing to understand when I'm using this example as opposed to this example.

50
00:03:49,290 --> 00:03:50,610
So let's stick with this for now.

51
00:03:52,050 --> 00:03:53,190
So how does it work?

52
00:03:54,150 --> 00:03:55,230
Something called the chain rule.

53
00:03:55,650 --> 00:03:57,690
You may have heard this from your high school math.

54
00:03:58,560 --> 00:04:05,490
Basically, if we have two functions, y = f(u) and u = g(x), then the derivative of

55
00:04:05,490 --> 00:04:06,930
y is given by this.

56
00:04:07,620 --> 00:04:11,520
Sorry, it's a bit too fast for you to see it here.

57
00:04:11,520 --> 00:04:12,870
So I'll just slow this down.

58
00:04:13,320 --> 00:04:13,890
You can see it.

59
00:04:13,890 --> 00:04:17,790
Mathematically, this may be a refresher on derivatives from high school math.

60
00:04:18,240 --> 00:04:22,410
dy/dx is equal to dy/du

61
00:04:22,410 --> 00:04:24,840
multiplied by du/dx.
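As a quick sanity check of the chain rule, here is a small Python sketch. The functions f and g are arbitrary choices for illustration and are not from the lecture. It compares dy/du times du/dx against a numerical estimate of dy/dx.

```python
# Illustrative functions only: y = f(u) = u**2 and u = g(x) = 3*x + 1.
def g(x):
    return 3.0 * x + 1.0

def f(u):
    return u ** 2

def dy_dx_chain(x):
    u = g(x)
    dy_du = 2.0 * u      # derivative of u**2 with respect to u
    du_dx = 3.0          # derivative of 3*x + 1 with respect to x
    return dy_du * du_dx

def dy_dx_numeric(x, eps=1e-6):
    # Central finite difference on the composed function f(g(x)).
    return (f(g(x + eps)) - f(g(x - eps))) / (2.0 * eps)

x = 2.0
print(dy_dx_chain(x), dy_dx_numeric(x))
```

The two printed values should agree closely, which is the property backpropagation relies on when it multiplies local derivatives together.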

62
00:04:26,340 --> 00:04:29,640
So what about a simple backpropagation example?

63
00:04:30,300 --> 00:04:30,660
All right.

64
00:04:30,990 --> 00:04:35,100
We want to know how much changing W5 changes the total error.

65
00:04:35,550 --> 00:04:38,490
Well, that's exactly what the partial derivative can give us.

66
00:04:39,060 --> 00:04:44,400
This can tell us the change in error with respect to the change in the weight, in W, sorry.

67
00:04:45,060 --> 00:04:52,950
So we just need to find this, and to find this, we can actually just use the chain rule, going backward

68
00:04:53,370 --> 00:04:53,820
to.

69
00:04:54,080 --> 00:04:55,290
I don't have the formula here.

70
00:04:55,740 --> 00:04:58,890
It's a bit long, but that's how we get the calculation for this.

71
00:04:59,790 --> 00:05:03,120
So you have to use the sum of the errors of the two outputs.

72
00:05:03,120 --> 00:05:05,960
Just in case you didn't know, that's the total error, E total, here.

73
00:05:07,990 --> 00:05:09,310
So let's take a look at this.

74
00:05:09,730 --> 00:05:19,420
The new W5, the new weight for W5 here, is going to be equal to W5 minus lambda multiplied by this change.

75
00:05:19,780 --> 00:05:26,800
Remember, that change here was how much the total error changes when we change W5 here.

76
00:05:27,820 --> 00:05:34,270
So you may have noticed we introduced a new parameter called lambda. Lambda is our learning

77
00:05:34,270 --> 00:05:37,330
rate, so it controls how big a jump we take.

78
00:05:37,780 --> 00:05:44,650
So now that we know the value we need to take to update W5 so that we lower the loss, that's why it's

79
00:05:44,650 --> 00:05:45,580
a negative sign here.
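The update rule for W5 can be sketched in Python as follows. This is a rough illustration under stated assumptions: the numbers, the target value, and the squared-error form are all hypothetical, and the output node is linear as in the spoken formulas. Only output one depends on W5, so the error of output two drops out of this particular derivative.

```python
# Hypothetical values; linear output node, squared error assumed.
h1_out, h2_out = 0.5, 0.8        # hidden-node outputs from the forward pass
w5, w6, b2 = 0.5, 0.6, 0.5       # weights and bias feeding output one
target1 = 1.0                    # assumed target for output one
learning_rate = 0.1              # the "lambda" from the lecture

def error_o1(w5_value):
    # Error contribution of output one: 0.5 * (target - output)^2.
    o1_value = w5_value * h1_out + w6 * h2_out + b2
    return 0.5 * (target1 - o1_value) ** 2

# Chain rule: dE/dW5 = dE/do1 * do1/dW5.
o1 = w5 * h1_out + w6 * h2_out + b2
dE_do1 = o1 - target1            # derivative of the squared error w.r.t. o1
do1_dw5 = h1_out                 # derivative of o1 w.r.t. W5
grad_w5 = dE_do1 * do1_dw5

# Step against the gradient, scaled by the learning rate.
new_w5 = w5 - learning_rate * grad_w5

print(grad_w5, new_w5, error_o1(w5), error_o1(new_w5))
```

With these made-up numbers the error after the update is smaller than before, which is what the minus sign is there to achieve.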

80
00:05:46,480 --> 00:05:47,220
we need to...

81
00:05:47,650 --> 00:05:51,910
We need to, you know, adjust this factor that we're changing by, by the lambda factor.

82
00:05:52,510 --> 00:05:58,660
Because if we change by big jumps, that's going to cause problems, which I'll explain to you in the

83
00:05:58,660 --> 00:05:59,290
next section.

84
00:05:59,680 --> 00:06:03,880
But big jumps basically cause the neural network to oscillate and never converge.
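The oscillation effect can be seen on a toy problem. The sketch below is not a neural network: it minimizes the made-up loss w squared, whose gradient is 2w, with two hypothetical learning rates chosen purely for illustration.

```python
def run_gradient_descent(learning_rate, steps, w_start=1.0):
    # Minimize loss(w) = w**2, whose gradient is 2*w.
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

small_step = run_gradient_descent(learning_rate=0.1, steps=50)
big_step = run_gradient_descent(learning_rate=1.0, steps=50)

# With 0.1 the weight shrinks toward the minimum at 0.
# With 1.0 every update flips w between +1 and -1, so it never settles.
print(small_step, big_step)
```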

85
00:06:04,810 --> 00:06:05,510
Large learning rates

86
00:06:05,530 --> 00:06:08,350
can learn a lot better, a lot faster, I should say.

87
00:06:08,830 --> 00:06:14,950
However, they can get stuck in local minima and sometimes never really converge to the lowest

88
00:06:14,950 --> 00:06:19,000
value, whereas smaller learning rates, while training much more slowly,

89
00:06:19,360 --> 00:06:22,630
tend to converge to a much better minimum, closer to the global minimum.

90
00:06:23,800 --> 00:06:27,190
So now let's stop there and we'll move on to gradient descent.

91
00:06:27,520 --> 00:06:28,000
Thank you.