WEBVTT

00:01.800 --> 00:02.820
Instructor: Hello and welcome back

00:02.820 --> 00:04.170
to the course on Deep Learning.

00:04.170 --> 00:07.320
Today we're going to wrap up with back propagation.

00:07.320 --> 00:09.990
All right, so we already know pretty much everything we need

00:09.990 --> 00:12.090
to know about what happens in the neural network.

00:12.090 --> 00:14.250
We know that there's a process called

00:14.250 --> 00:17.550
forward propagation where information is entered

00:17.550 --> 00:20.490
into the input layer and then it's propagated forward

00:20.490 --> 00:23.670
to get our y hats, our output values,

00:23.670 --> 00:27.750
and then we compare those to the actual values that we have

00:27.750 --> 00:31.980
in our training set and then we calculate the errors.

00:31.980 --> 00:35.550
Then the errors are back propagated through the network

00:35.550 --> 00:36.990
in the opposite direction,

00:36.990 --> 00:40.290
and that allows us to train the network

00:40.290 --> 00:41.640
by adjusting the weights.

00:41.640 --> 00:45.330
So, the one key important thing to remember here is

00:45.330 --> 00:49.680
that back propagation is an advanced algorithm driven

00:49.680 --> 00:53.010
by very interesting

00:53.010 --> 00:57.630
and sophisticated mathematics, which allows us

00:57.630 --> 01:00.270
to adjust the weights, all of them at the same time.

01:00.270 --> 01:02.520
All of the weights are adjusted simultaneously.

01:02.520 --> 01:05.850
So, if we were doing this manually or

01:05.850 --> 01:07.770
if we were coming up with a different type

01:07.770 --> 01:10.440
of algorithm, then even if we calculated the error

01:10.440 --> 01:13.350
and then we were trying to understand what effect each

01:13.350 --> 01:15.360
of the weights has on the error, we would have

01:15.360 --> 01:20.040
to somehow adjust each of the weights independently

01:20.040 --> 01:20.943
or individually.

01:22.020 --> 01:24.390
The huge advantage of back propagation

01:24.390 --> 01:26.850
and this is a key thing to remember, is

01:26.850 --> 01:30.720
that during the process of back propagation simply

01:30.720 --> 01:35.720
because of the way the algorithm is structured, you are able

01:37.920 --> 01:40.590
to adjust all of the weights at the same time.

01:40.590 --> 01:43.980
So, you basically know which part of the error each

01:43.980 --> 01:47.400
of your weights in the neural network is responsible for.

01:47.400 --> 01:52.400
Now, that is the key fundamental underlying principle

01:52.920 --> 01:57.920
of back propagation, and this was why it picked up

01:58.230 --> 02:02.760
so rapidly in the 1980s and this was a major breakthrough,

02:02.760 --> 02:04.740
and if you'd like to learn more about that

02:04.740 --> 02:07.470
and how exactly the mathematics works

02:07.470 --> 02:10.140
in the background, then a good resource,

02:10.140 --> 02:12.787
which we've already mentioned, is

02:12.787 --> 02:15.240
"Neural Networks and Deep Learning," which is actually a book

02:15.240 --> 02:16.530
by Michael Nielsen.

02:16.530 --> 02:19.890
There you'll find the mathematics written out

02:19.890 --> 02:23.670
and it'll help you understand how exactly this is possible.

02:23.670 --> 02:27.810
But for now, for our purposes, from an intuition point

02:27.810 --> 02:31.290
of view, the important part is to remember that

02:31.290 --> 02:33.300
that's what back propagation does.

02:33.300 --> 02:36.930
It adjusts all of the weights at the same time.
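
NOTE
Editor's sketch, not part of the lecture: a minimal pure-Python illustration of the point above. It assumes a hypothetical network with two inputs, two sigmoid hidden neurons, one linear output and a squared-error cost. All names (forward, backward, w_hidden, w_out) and all numbers are illustrative assumptions. One backward pass gives the gradient for every weight, so all weights can be updated in the same step.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def forward(x, w_hidden, w_out):
    # Hidden activations first, then a single linear output (y hat).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y_hat = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, y_hat
def cost(x, y, w_hidden, w_out):
    # Squared-error cost for one observation.
    return 0.5 * (forward(x, w_hidden, w_out)[1] - y) ** 2
def backward(x, y, w_hidden, w_out):
    # One backward pass: chain rule from the output error to every weight.
    hidden, y_hat = forward(x, w_hidden, w_out)
    error = y_hat - y
    grad_out = [error * h for h in hidden]
    grad_hidden = []
    for j, h in enumerate(hidden):
        # Share of the error that flows back to hidden neuron j.
        delta = error * w_out[j] * h * (1.0 - h)
        grad_hidden.append([delta * xi for xi in x])
    return grad_hidden, grad_out
x, y = [0.5, -1.0], 1.0
w_hidden = [[0.1, -0.2], [0.3, 0.05]]
w_out = [0.2, -0.1]
grad_hidden, grad_out = backward(x, y, w_hidden, w_out)
learning_rate = 0.1
# Every weight moves in the same update step, each by its own share of the error.
new_w_out = [w - learning_rate * g for w, g in zip(w_out, grad_out)]
new_w_hidden = [[w - learning_rate * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(w_hidden, grad_hidden)]
```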

02:36.930 --> 02:39.240
And now we're going to just wrap everything up

02:39.240 --> 02:42.240
with a step-by-step walkthrough of what happens

02:42.240 --> 02:45.360
in the training of a neural network.

02:45.360 --> 02:48.360
All right, so step one, we randomly initialize the weights

02:48.360 --> 02:50.970
to small numbers, close to zero, but not zero.

02:50.970 --> 02:53.010
We didn't really focus on the initialization

02:53.010 --> 02:55.230
of weights during the intuition tutorials,

02:55.230 --> 02:58.320
but the weights have to start somewhere

02:58.320 --> 03:02.640
and they're initialized with random values near zero

03:02.640 --> 03:04.830
and from there, through the process

03:04.830 --> 03:06.720
of forward propagation and back propagation,

03:06.720 --> 03:11.720
these weights are adjusted until the error is minimized,

03:11.910 --> 03:13.740
until the cost function is minimized.

03:13.740 --> 03:16.830
Then step two, input the first observation

03:16.830 --> 03:19.470
of your dataset, so the first row, into the input layer.

03:19.470 --> 03:21.510
Each feature is one input node.

03:21.510 --> 03:22.710
So basically, take the columns

03:22.710 --> 03:25.770
and put them into the input nodes.

03:25.770 --> 03:27.060
Step three, forward propagation

03:27.060 --> 03:29.610
from left to right, the neurons are activated in a way

03:29.610 --> 03:32.430
that the impact of each neuron's activation is limited

03:32.430 --> 03:33.494
by the weights.

03:33.494 --> 03:34.327
So the weights, basically,

03:34.327 --> 03:38.160
determine how important each neuron's activation is.

03:38.160 --> 03:39.900
Then propagate the activations

03:39.900 --> 03:43.950
until you get the predicted result, y hat in this case.

03:43.950 --> 03:46.680
So basically, you propagate from left to right,

03:46.680 --> 03:48.840
or you go all the way until you get to the end

03:48.840 --> 03:50.310
and you get your y hat.

03:50.310 --> 03:51.690
Step four, compare the predicted result

03:51.690 --> 03:55.110
to the actual result and measure the generated error.

03:55.110 --> 03:57.480
Step five, you do the back propagation from right to left.

03:57.480 --> 03:59.790
The error is back propagated, and the weights are updated according

03:59.790 --> 04:02.250
to how much they're responsible for the error.

04:02.250 --> 04:05.040
Again, you are able to calculate that because

04:05.040 --> 04:09.480
of the way the back propagation algorithm is structured.

04:09.480 --> 04:10.530
The learning rate decides

04:10.530 --> 04:13.140
by how much we update the weights. The learning rate

04:13.140 --> 04:17.730
is a parameter you can control in your neural network.

04:17.730 --> 04:20.970
Step six, repeat steps one to five and update the weights

04:20.970 --> 04:23.340
after each observation.

04:23.340 --> 04:26.040
That is called online learning, and in our case

04:26.040 --> 04:29.550
that was stochastic gradient descent

04:29.550 --> 04:31.500
or repeat steps one to five

04:31.500 --> 04:33.900
but update weights only after a batch of observations.

04:33.900 --> 04:37.860
So batch learning, it's either full gradient descent

04:37.860 --> 04:40.950
or batch gradient descent or mini batch gradient descent.

04:40.950 --> 04:43.170
And step seven, when the whole training set has passed

04:43.170 --> 04:45.750
through the artificial neural network,

04:45.750 --> 04:49.050
that makes an epoch. Redo more epochs.

04:49.050 --> 04:50.580
So basically you just keep doing that

04:50.580 --> 04:52.590
and doing that and doing that,

04:52.590 --> 04:55.920
allowing your neural network to train better

04:55.920 --> 04:58.623
and better and better, and constantly adjust itself,

05:00.270 --> 05:02.730
as you minimize the cost function.
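
NOTE
Editor's sketch, not part of the lecture: one way the seven steps above could look in pure Python. It assumes a hypothetical network with one sigmoid hidden layer, a linear output, no biases and a squared-error cost. The function name train, its parameters and the toy data are illustrative assumptions. Setting batch_size=1 updates after each observation (stochastic gradient descent), and batch_size=len(data) updates once per pass (batch gradient descent).
```python
import math
import random
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def train(data, n_hidden=3, learning_rate=0.1, batch_size=1, epochs=300, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: random weights close to zero.
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    history = []
    for _ in range(epochs):  # Step 7: one pass over the training set is an epoch.
        total = 0.0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g_hidden = [[0.0] * n_in for _ in range(n_hidden)]
            g_out = [0.0] * n_hidden
            for x, y in batch:  # Step 2: one observation, one feature per input node.
                # Step 3: forward propagation to get y hat.
                h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
                y_hat = sum(w * hj for w, hj in zip(w_out, h))
                # Step 4: compare the prediction with the actual value.
                error = y_hat - y
                total += 0.5 * error ** 2
                # Step 5: back propagate the error to get each weight's gradient.
                for j in range(n_hidden):
                    g_out[j] += error * h[j]
                    for i in range(n_in):
                        g_hidden[j][i] += error * w_out[j] * h[j] * (1.0 - h[j]) * x[i]
            # Steps 5 and 6: update all weights together, after each observation or each batch.
            scale = learning_rate / len(batch)
            for j in range(n_hidden):
                w_out[j] -= scale * g_out[j]
                for i in range(n_in):
                    w_hidden[j][i] -= scale * g_hidden[j][i]
        # Average cost measured during this epoch.
        history.append(total / len(data))
    return history
data = [([0.0, 1.0], 0.2), ([1.0, 0.0], 0.8), ([1.0, 1.0], 0.5), ([0.0, 0.0], 0.5)]
sgd_history = train(data, batch_size=1)
batch_history = train(data, batch_size=len(data))
```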

05:02.730 --> 05:06.420
So there we go, those are the steps you need to take

05:06.420 --> 05:10.020
to build your artificial neural network and train it.

05:10.020 --> 05:13.620
And these are the steps that you will be taking together

05:13.620 --> 05:16.080
with Hadelin in the practical tutorials.

05:16.080 --> 05:17.190
Wish you the best of luck

05:17.190 --> 05:19.530
and I look forward to seeing you next time.

05:19.530 --> 05:21.453
Until then, enjoy deep learning.
