WEBVTT

00:00.840 --> 00:03.270
-: Hello and welcome back to the course on Deep Learning.

00:03.270 --> 00:05.430
Now that we've seen neural networks in action

00:05.430 --> 00:08.430
it's time for us to find out how they learn.

00:08.430 --> 00:10.470
So let's dive straight into it.

00:10.470 --> 00:13.800
There are two fundamentally different approaches

00:13.800 --> 00:16.230
to getting a program to do what you want it to do.

00:16.230 --> 00:19.920
One is hard coding, where you actually

00:19.920 --> 00:24.920
tell the program specific rules and what outcomes you want

00:25.886 --> 00:28.592
and you just guide it throughout the whole way

00:28.592 --> 00:30.510
and you account for all the possible options

00:30.510 --> 00:33.300
that the program has to deal with.

00:33.300 --> 00:34.133
On the other hand

00:34.133 --> 00:38.020
you have neural networks where you create a facility

00:39.450 --> 00:43.170
for the program to be able to understand what it needs to do

00:43.170 --> 00:44.608
on its own.

00:44.608 --> 00:46.590
So you basically create this neural network where

00:46.590 --> 00:48.510
you provide it inputs,

00:48.510 --> 00:50.100
you tell it what you want as outputs

00:50.100 --> 00:53.400
and then you let it figure everything out on its own.

00:53.400 --> 00:55.600
Two fundamentally different approaches

00:57.399 --> 00:58.770
and that is something to keep

00:58.770 --> 01:00.840
in mind as we go through these tutorials.

01:00.840 --> 01:04.200
Our goal is to create this network

01:04.200 --> 01:06.000
which then learns on its own.

01:06.000 --> 01:11.000
We are going to avoid trying to put in the rules.

01:11.010 --> 01:14.490
And a good example that I can give you right now is...

01:14.490 --> 01:15.750
this will come further in the course

01:15.750 --> 01:18.120
but it's just a very visual example.

01:18.120 --> 01:20.340
For instance, how do you distinguish

01:20.340 --> 01:21.780
between a dog and a cat?

01:21.780 --> 01:25.260
On the left side, in the approach that's depicted

01:25.260 --> 01:28.590
on the left, you would program things in like the

01:28.590 --> 01:30.570
cat's ears have to be like this,

01:30.570 --> 01:32.640
Look out for whiskers

01:32.640 --> 01:34.170
look out for this type of nose

01:34.170 --> 01:37.800
look out for this type of shape of face.

01:37.800 --> 01:38.880
Look out for these colors.

01:38.880 --> 01:40.650
You kind of, you'd describe all these things

01:40.650 --> 01:41.670
and you'd have conditions like

01:41.670 --> 01:44.670
if the ears are pointy, then cat,

01:44.670 --> 01:49.560
if the ears are sloping down, then possibly dog, and so on.
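
NOTE
A minimal sketch of the hard coded approach described above, assuming two
hypothetical features (ear_shape, has_whiskers) that are not from the lecture.
It only illustrates that every rule has to be written out by hand.
```python
# Hypothetical hand-written rules; the feature names are illustrative assumptions.
def classify_by_rules(ear_shape: str, has_whiskers: bool) -> str:
    """Return a label using hard coded conditions instead of learning."""
    if ear_shape == "pointy" and has_whiskers:
        return "cat"
    if ear_shape == "sloping":
        return "possibly dog"
    # Every case the rules did not anticipate ends up here.
    return "unknown"
print(classify_by_rules("pointy", True))
```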

01:49.560 --> 01:50.940
On the other hand, for a neural network,

01:50.940 --> 01:53.100
you just code the neural network.

01:53.100 --> 01:54.730
So you code the architecture

01:55.610 --> 01:58.170
and then you point the neural network at a folder

01:58.170 --> 01:59.760
with all these cats and dogs

01:59.760 --> 02:02.670
with images of cats and dogs which are already categorized.

02:02.670 --> 02:04.244
And you tell it,

02:04.244 --> 02:06.870
okay I've got some images of cats and dogs.

02:06.870 --> 02:08.880
Go and learn what a cat is.

02:08.880 --> 02:10.560
Go and learn what a dog is

02:10.560 --> 02:11.850
and the neural network will

02:11.850 --> 02:15.270
on its own understand everything it needs to understand.

02:15.270 --> 02:17.520
And then further down, once it's trained up,

02:17.520 --> 02:19.860
when you give it a new image of a cat or a dog,

02:19.860 --> 02:21.600
it'll be able to understand what it was.

02:21.600 --> 02:23.160
So there they are,

02:23.160 --> 02:25.650
those are the two fundamentally different approaches

02:25.650 --> 02:28.090
and today we're going to slowly start getting

02:29.154 --> 02:30.993
into how that second approach works.

02:31.993 --> 02:33.360
All right, so let's get straight to it.

02:33.360 --> 02:37.200
Here we have a very basic neural network with one layer.

02:37.200 --> 02:41.208
This is called a single layer feed forward neural network

02:41.208 --> 02:42.750
and it is also called a perceptron.

02:42.750 --> 02:44.033
Now, before we proceed, one thing that we do need to

02:44.033 --> 02:47.370
adjust is that output value.

02:47.370 --> 02:48.480
Right now you can see that

02:48.480 --> 02:49.840
it's just a Y. We need to

02:51.085 --> 02:51.918
put a Y hat in there.

02:51.918 --> 02:52.751
And the reason for that is

02:52.751 --> 02:55.470
usually Y stands for the actual value.

02:55.470 --> 02:56.490
And that's what we're going to be using.

02:56.490 --> 02:58.593
So Y is gonna be the actual value,

02:59.984 --> 03:01.184
which we see in reality,

03:02.043 --> 03:03.720
output value is the predicted value

03:03.720 --> 03:05.880
by the algorithm by the neural network.

03:05.880 --> 03:09.240
Y hat is the output value.

03:09.240 --> 03:11.990
Basically that's the denomination for the output value.

03:13.735 --> 03:15.330
And the perceptron was first invented in 1957

03:15.330 --> 03:17.430
by Frank Rosenblatt

03:17.430 --> 03:19.560
and his whole idea was to

03:19.560 --> 03:24.560
create something that can actually learn and adjust itself.

03:25.200 --> 03:27.933
And this is what we're going to be looking at now.

03:29.599 --> 03:30.432
So we've got our perceptron.

03:30.432 --> 03:32.040
Let's see how our perceptron learns.

03:32.040 --> 03:34.980
So let's say we have some input values

03:34.980 --> 03:37.030
that have been supplied to the perceptron

03:39.101 --> 03:40.703
or basically to our neural network.

03:42.286 --> 03:44.220
Then the activation function's applied,

03:44.220 --> 03:45.220
we have an output

03:46.409 --> 03:50.020
and now we're going to plot the output on a chart.

03:50.020 --> 03:51.780
So there it is our output Y hat.

03:51.780 --> 03:54.960
Now what we need to do is in order to be able to learn

03:54.960 --> 03:56.880
we need to compare the output value

03:56.880 --> 03:58.000
to the actual value

03:59.167 --> 04:01.560
that we want the neural network to get, right?

04:01.560 --> 04:04.330
And that is the value Y

04:05.666 --> 04:06.499
And so if we plot it here

04:06.499 --> 04:08.210
you'll see that there's a bit of a difference.

04:09.523 --> 04:10.356
Now we are going to calculate a function

04:10.356 --> 04:11.340
called the cost function

04:11.340 --> 04:13.530
which is calculated as one half

04:13.530 --> 04:14.730
of the squared difference

04:15.963 --> 04:17.483
between the actual value and output value.
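
NOTE
A minimal sketch of the cost function just described, one half of the squared
difference between the output value (Y hat) and the actual value (Y).
The function name and the numbers are assumptions chosen for illustration.
```python
def cost(y_hat: float, y: float) -> float:
    """Half of the squared difference between prediction and actual value."""
    return 0.5 * (y_hat - y) ** 2
# Hypothetical numbers: the network outputs 0.80, the actual value is 0.93.
print(cost(0.80, 0.93))
```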

04:19.148 --> 04:20.520
Now there are many ways you can come up with a cost function.

04:20.520 --> 04:23.220
There are many different cost functions that you can use.

04:23.220 --> 04:25.870
This is probably the most commonly used cost function

04:27.940 --> 04:30.570
and why it is specifically this function we use

04:30.570 --> 04:33.534
We'll find out further down when we're talking

04:33.534 --> 04:36.010
about gradient descent, but for now

04:36.010 --> 04:37.740
we're just going to agree that this is the cost function.

04:37.740 --> 04:40.530
And basically what the cost function is telling us is,

04:40.530 --> 04:44.163
what is the error that you have in your prediction?

04:45.727 --> 04:47.670
And our goal is to minimize the cost function

04:49.175 --> 04:50.093
because the lower the cost function,

04:50.942 --> 04:53.325
the closer the Y hat is to Y. Okay?

04:53.325 --> 04:55.725
So as long as we agree on that, let's proceed.

04:55.725 --> 04:57.240
So basically from here what happens is

04:57.240 --> 04:59.228
there's our cost function.

04:59.228 --> 05:01.830
And from here what happens is now we are going to,

05:01.830 --> 05:03.120
once we've compared,

05:03.120 --> 05:05.730
now we're going to feed this information

05:05.730 --> 05:08.970
back into the neural network.

05:08.970 --> 05:09.803
So there we go.

05:09.803 --> 05:12.730
There's the information going back into the neural network

05:13.993 --> 05:16.449
and it goes to the weights and the weights get updated.

05:16.449 --> 05:18.844
Basically, the only thing that we have control of

05:18.844 --> 05:20.190
in this very simple neural network

05:20.190 --> 05:21.023
are the weights,

05:21.023 --> 05:22.268
W one

05:22.268 --> 05:23.101
W two,

05:23.101 --> 05:23.934
all the way to WM

05:25.406 --> 05:26.760
And our goal is to minimize the cost function.

05:26.760 --> 05:29.460
So all we can do is update the weights.

05:29.460 --> 05:30.813
So we update the weights,

05:32.220 --> 05:33.550
tweak them a little bit

05:35.213 --> 05:36.995
and how exactly we'll find out further down.

05:36.995 --> 05:39.863
But for now, we agree that we update the weights

05:39.863 --> 05:40.696
and then we continue.

05:40.696 --> 05:44.820
So, but here I've put up this screenshot of the data

05:44.820 --> 05:48.594
just to make one point very clear that right now

05:48.594 --> 05:50.460
throughout this whole experiment,

05:50.460 --> 05:51.990
everything we're doing right now,

05:51.990 --> 05:54.000
we're dealing with just the one row.

05:54.000 --> 05:55.170
So we're dealing with,

05:55.170 --> 05:57.150
we have a data set of one row

05:57.150 --> 05:58.980
where we have, for instance

05:58.980 --> 06:02.010
we're dealing with how long you study

06:02.010 --> 06:04.950
like the variable that we're predicting is

06:04.950 --> 06:08.400
what results you're gonna get on an exam.

06:08.400 --> 06:10.770
And the independent variables that we have is,

06:10.770 --> 06:12.720
how many hours did you study for?

06:12.720 --> 06:14.170
How many hours did you sleep?

06:15.550 --> 06:18.095
and what did you get on the quiz in the mid-semester?

06:18.095 --> 06:19.876
So in the middle of the semester, there's a quiz

06:19.876 --> 06:20.950
what percentage did you get there?

06:20.950 --> 06:22.202
So based on those variables

06:22.202 --> 06:24.690
we're trying to predict what score you'll get for the exam.

06:24.690 --> 06:28.020
And in the exam, the 93%, that's the actual value.

06:28.020 --> 06:28.953
So that's Y.

06:30.690 --> 06:32.280
So we feed these three values

06:32.280 --> 06:34.080
into our neural network again

06:34.080 --> 06:35.320
for the second time now

06:37.010 --> 06:39.120
and then we're going to be comparing the result to Y.

06:39.120 --> 06:40.770
So let's see how this works.

06:40.770 --> 06:43.800
We feed these values into the neural network

06:43.800 --> 06:46.740
everything gets adjusted and weights get adjusted.

06:46.740 --> 06:49.873
So as you can see, let's do this again.

06:49.873 --> 06:51.372
We're going to feed the values again.

06:51.372 --> 06:53.220
The point here is that we're feeding in these same values.

06:53.220 --> 06:54.510
So we only have one row

06:54.510 --> 06:56.370
We're trying to, we're training on one row

06:56.370 --> 06:59.610
This is because this is just a very simple basic example.

06:59.610 --> 07:01.770
Then we'll see what happens when there's more rows.

07:01.770 --> 07:03.460
So again, we feed this row in

07:05.152 --> 07:06.995
our cost function gets adjusted.

07:06.995 --> 07:10.530
As you can see, everything happens along those lines again.

07:10.530 --> 07:13.710
So as you can see, every time our Y hat is changing

07:13.710 --> 07:15.030
because we've tweaked the weights,

07:15.030 --> 07:16.470
our Y hat is changing

07:16.470 --> 07:17.580
our cost function's changing

07:17.580 --> 07:18.780
let's have a look again.

07:19.656 --> 07:20.490
So we feed those in.

07:20.490 --> 07:21.480
Y hat is changing

07:21.480 --> 07:22.890
cost function is changing.

07:22.890 --> 07:24.060
We get information back,

07:24.060 --> 07:25.410
feed back to the weights

07:25.410 --> 07:27.060
so that the weights get adjusted again.

07:27.060 --> 07:29.760
We feed in the same values every time.

07:29.760 --> 07:30.960
Everything gets adjusted,

07:30.960 --> 07:31.860
goes back to the weights

07:31.860 --> 07:33.661
and one more time,

07:33.661 --> 07:34.494
we feed in.

07:34.494 --> 07:35.700
Okay?

07:35.700 --> 07:36.690
And another time.

07:36.690 --> 07:38.520
So we've adjusted the weights

07:38.520 --> 07:39.820
we feed in the information

07:40.800 --> 07:42.125
and there we go.

07:42.125 --> 07:44.430
So now this time the Y hat is equal

07:44.430 --> 07:45.990
to Y. Cost function is zero.

07:45.990 --> 07:48.450
Usually you won't get cost function equal to zero.

07:48.450 --> 07:50.223
But this is a very simple example.

07:51.533 --> 07:52.830
So hopefully all that made sense.

07:52.830 --> 07:56.250
Every time we feed in exactly that same row

07:56.250 --> 07:58.170
because just in this case we're just dealing

07:58.170 --> 07:59.790
with that one row,

07:59.790 --> 08:01.360
into our neural network

08:03.497 --> 08:05.610
where then the values get multiplied by the weights,

08:05.610 --> 08:06.990
the activation function's applied.

08:06.990 --> 08:08.130
We get Y hat

08:08.130 --> 08:10.290
Y hat is compared to Y.

08:10.290 --> 08:12.360
Then we see how the cost function has changed.

08:12.360 --> 08:13.860
Feed that information back

08:13.860 --> 08:14.910
into the neural network

08:14.910 --> 08:17.850
and adjust the weights again.

08:17.850 --> 08:19.560
And then we repeat the same process again

08:19.560 --> 08:21.600
with the same exact row.

08:21.600 --> 08:23.783
We're trying to minimize that cost function.
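
NOTE
A minimal sketch of the one-row loop described above: feed the row in, get
Y hat, compare it to Y, adjust the weights, repeat. The lecture defers how the
weights are updated, so this assumes a plain gradient-descent step, an identity
activation, and hypothetical scaled inputs; only the 93% actual value is from
the lecture.
```python
def predict(weights, inputs):
    """Weighted sum of the inputs; the activation is assumed to be the identity."""
    return sum(w * x for w, x in zip(weights, inputs))
def train_on_one_row(inputs, y, weights, learning_rate=0.5, steps=100):
    """Repeatedly feed the same row in and nudge the weights to lower the cost."""
    for _ in range(steps):
        y_hat = predict(weights, inputs)
        error = y_hat - y  # derivative of 0.5 * (y_hat - y) ** 2 with respect to y_hat
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    return weights
# Hypothetical scaled inputs (study hours, sleep hours, quiz score); actual value 0.93.
row, actual = [0.12, 0.06, 0.67], 0.93
trained = train_on_one_row(row, actual, [0.1, 0.1, 0.1])
print(predict(trained, row))
```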

08:24.650 --> 08:27.000
So up until now we've been dealing with just that one row.

08:27.000 --> 08:29.460
Let's see what happens when you have multiple rows.

08:29.460 --> 08:31.350
So here's the full data set.

08:31.350 --> 08:33.810
We have eight rows of

08:33.810 --> 08:35.340
how many hours you slept?

08:35.340 --> 08:37.630
or maybe these are different students

08:38.470 --> 08:40.218
taking the same exam,

08:40.218 --> 08:41.280
how many hours they studied,

08:41.280 --> 08:44.191
how many hours they slept before the exam?

08:44.191 --> 08:47.313
what they got on the quiz and their final result on the test.

08:48.336 --> 08:50.333
And as you can see here on the left

08:50.333 --> 08:51.900
I've got eight of these perceptrons.

08:51.900 --> 08:54.750
Actually they are all the same perceptron.

08:54.750 --> 08:57.478
So this is also important to understand.

08:57.478 --> 08:59.140
I just multiplied it

08:59.140 --> 09:00.240
or like duplicated eight times just

09:00.240 --> 09:01.330
so that we can

09:03.360 --> 09:04.320
conceptually understand.

09:04.320 --> 09:05.430
But the important thing here,

09:05.430 --> 09:06.933
it's the same neural network.

09:08.417 --> 09:09.300
We're going to be feeding these

09:10.414 --> 09:11.257
into one same neural network.

09:11.257 --> 09:12.090
So let's go.

09:12.090 --> 09:12.923
Let's get started.

09:12.923 --> 09:14.880
So one epoch

09:14.880 --> 09:17.250
as you'll hear (indistinct) mentioning,

09:17.250 --> 09:19.290
one epoch is when we go

09:19.290 --> 09:20.610
through a whole data set

09:20.610 --> 09:25.610
and we train our neural network on all of these rows

09:26.310 --> 09:28.415
So let's go, let's get started.

09:28.415 --> 09:30.226
So there's our first row

09:30.226 --> 09:32.550
and there's Y hat for the first row.

09:32.550 --> 09:33.720
There's the second row

09:33.720 --> 09:36.088
there's Y hat for the second row.

09:36.088 --> 09:36.921
So again, it's being fed

09:36.921 --> 09:39.360
into the same neural network every time.

09:39.360 --> 09:41.220
I've just copied them several times

09:41.220 --> 09:44.043
so we can visually see how this is happening.

09:45.906 --> 09:47.880
Then again, it's happening again.

09:47.880 --> 09:50.670
That's third row, fourth row

09:50.670 --> 09:53.040
there's our Y hat for the fourth row and so on.

09:53.040 --> 09:55.110
Basically then we get the same values

09:55.110 --> 09:56.610
for the remaining four rows as well.

09:56.610 --> 09:57.960
So every time we just feed

09:57.960 --> 10:01.528
in a row into our neural network,

10:01.528 --> 10:02.673
we get a value.

10:04.502 --> 10:06.990
Then we compare to the actual values.

10:06.990 --> 10:08.540
So there are the actual values.

10:09.487 --> 10:11.813
So for every single row we have an actual value.

10:12.816 --> 10:14.730
And now based on all of these differences

10:14.730 --> 10:16.200
between Y hat and Y

10:16.200 --> 10:18.300
we can calculate the cost function

10:18.300 --> 10:23.300
which is the sum of all of those squared differences

10:24.210 --> 10:25.510
between Y hat and Y

10:26.460 --> 10:28.200
and all of that is halved.

10:28.200 --> 10:30.330
And there's our cost function.
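
NOTE
A minimal sketch of the multi-row cost function just described: the sum of the
squared differences between Y hat and Y over all rows, halved. The function
name and the three example rows are assumptions chosen for illustration.
```python
def total_cost(predictions, actuals):
    """Half of the sum of squared differences over all rows."""
    return 0.5 * sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, actuals))
# Hypothetical predictions and actual values for three rows.
print(total_cost([0.8, 0.5, 0.9], [0.9, 0.5, 0.7]))
```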

10:30.330 --> 10:31.680
And basically now what we do

10:31.680 --> 10:34.320
after we have the full cost function

10:34.320 --> 10:36.963
we go back and we update the weights.

10:38.011 --> 10:39.510
We update W one, W two, W three.

10:39.510 --> 10:41.857
And the important thing to remember here is

10:41.857 --> 10:44.790
that all of these perceptrons,

10:44.790 --> 10:45.810
all of these neural networks

10:45.810 --> 10:48.223
are actually one neural network.

10:48.223 --> 10:49.793
So there's not eight of them, there's just one.

10:50.899 --> 10:51.732
And when we update the weights,

10:51.732 --> 10:53.220
we're going to update the weights

10:53.220 --> 10:54.480
in that one neural network.

10:54.480 --> 10:56.370
So basically the weights are gonna be the same

10:56.370 --> 10:57.723
for all of the rows.

10:59.184 --> 11:00.510
So it's not the case that every row has its own weights.

11:00.510 --> 11:02.880
No, all the rows share the weights.

11:02.880 --> 11:06.360
And so that's why we looked at the cost function

11:06.360 --> 11:10.203
which is the sum of the square differences.

11:11.368 --> 11:12.201
And then we updated the weights.

11:12.201 --> 11:15.300
And now from here, that was just one iteration.

11:15.300 --> 11:19.020
Next, we're going to run this whole thing again.

11:19.020 --> 11:22.320
We're going to feed every single row

11:22.320 --> 11:23.730
into the neural network,

11:23.730 --> 11:25.050
find out our cost function

11:25.050 --> 11:26.400
and do this whole process again.

11:26.400 --> 11:28.270
So just as we saw previously

11:29.394 --> 11:30.630
where we had just one row

11:30.630 --> 11:31.500
and we were doing everything

11:31.500 --> 11:32.880
again and again, again, again

11:32.880 --> 11:33.713
same thing here.

11:33.713 --> 11:35.534
But now we're gonna be doing it for

11:35.534 --> 11:37.620
eight rows or 800 rows or 8,000 rows,

11:37.620 --> 11:40.800
however many rows you have in your data set.

11:40.800 --> 11:41.940
You do this process

11:41.940 --> 11:44.190
and then you calculate the cost function.

11:44.190 --> 11:48.420
And the goal here is to minimize the cost function

11:48.420 --> 11:49.980
and to get,

11:49.980 --> 11:52.050
as soon as you found the minimum of the cost function,

11:52.050 --> 11:54.360
that is your final neural network.

11:54.360 --> 11:56.470
That means your weights have been adjusted

11:57.740 --> 12:02.483
and you have found the optimal weights

12:03.330 --> 12:07.200
for this data set that you're training on

12:07.200 --> 12:09.102
and you're ready to proceed

12:09.102 --> 12:11.550
to the testing phase or to the application phase.

12:11.550 --> 12:14.970
And this whole process is called backpropagation.
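
NOTE
A minimal sketch of training over several rows with one shared set of weights.
Each epoch feeds every row through the same network, computes the full cost,
then updates the weights once. It assumes a plain batch gradient-descent
update, an identity activation, and hypothetical scaled data; none of these
values come from the lecture.
```python
def predict(weights, inputs):
    """Weighted sum with an identity activation (an assumption for this sketch)."""
    return sum(w * x for w, x in zip(weights, inputs))
def total_cost(weights, rows, actuals):
    """Half of the sum of squared differences over every row."""
    return 0.5 * sum((predict(weights, r) - y) ** 2 for r, y in zip(rows, actuals))
def train(rows, actuals, weights, learning_rate=0.1, epochs=200):
    """Each epoch feeds every row through the same weights, then updates them once."""
    for _ in range(epochs):
        errors = [predict(weights, r) - y for r, y in zip(rows, actuals)]
        # Gradient of the total cost with respect to each shared weight.
        grads = [sum(e * r[i] for e, r in zip(errors, rows)) for i in range(len(weights))]
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights
# Hypothetical scaled data: (study hours, sleep hours, quiz score) per student.
rows = [[0.5, 0.8, 0.6], [0.9, 0.6, 0.8], [0.2, 0.9, 0.4], [0.7, 0.5, 0.9]]
actuals = [0.65, 0.90, 0.40, 0.85]
start = [0.0, 0.0, 0.0]
trained = train(rows, actuals, start)
print(total_cost(start, rows, actuals), total_cost(trained, rows, actuals))
```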

12:14.970 --> 12:18.550
So some additional reading that you might want to do

12:20.450 --> 12:21.559
for the cost function,

12:21.559 --> 12:23.220
and I know we just talked about one

12:24.100 --> 12:24.933
and there are many different ones.

12:24.933 --> 12:28.650
A good article is located on Cross Validated.

12:28.650 --> 12:29.483
It's called,

12:29.483 --> 12:31.500
A list of cost functions used in neural networks

12:31.500 --> 12:32.883
alongside applications.

12:33.914 --> 12:34.823
So the URL's there,

12:35.708 --> 12:38.100
but you can just Google for that exact search term

12:38.100 --> 12:39.100
or search phrase

12:40.951 --> 12:42.120
and you'll see that this one will be the first one that pops up.

12:42.120 --> 12:44.110
It's actually got some good examples

12:45.096 --> 12:48.420
and applications or use cases for different cost functions.

12:48.420 --> 12:50.310
So if you're interested to learn more about cost functions

12:50.310 --> 12:51.603
check out this article.

12:52.917 --> 12:54.390
And on that note, I hope you enjoyed today's tutorial.

12:54.390 --> 12:56.040
I look forward to seeing you next time.

12:56.040 --> 12:58.203
Until then, enjoy Deep Learning.
