WEBVTT

00:00.510 --> 00:02.820
-: Hello and welcome back to the course on deep learning.

00:02.820 --> 00:06.300
Today we're finally at step number four, full connection.

00:06.300 --> 00:08.430
So what is this step all about?

00:08.430 --> 00:10.320
Well, in this step

00:10.320 --> 00:14.520
we're adding a whole artificial neural network

00:14.520 --> 00:17.010
to our convolutional neural network.

00:17.010 --> 00:19.800
So, to all of the things that we've done so far,

00:19.800 --> 00:22.470
which are convolution, pooling, and flattening,

00:22.470 --> 00:27.330
now we're adding a whole new ANN on the back of that.

00:27.330 --> 00:29.010
How intense is that?

00:29.010 --> 00:30.090
That is just,

00:30.090 --> 00:32.580
that is something, that is definitely something.

00:32.580 --> 00:35.580
And, so here we've got the input layer,

00:35.580 --> 00:37.200
we've got a fully connected layer and an output layer,

00:37.200 --> 00:38.100
and by the way,

00:38.100 --> 00:42.120
the fully connected layer in the artificial neural networks

00:42.120 --> 00:44.070
we used to call them hidden layers,

00:44.070 --> 00:45.960
and here we're calling them fully connected layers,

00:45.960 --> 00:48.750
because they are hidden layers, but at the same time,

00:48.750 --> 00:51.210
they're a more specific type of hidden layers,

00:51.210 --> 00:52.620
they're a fully connected layer.

00:52.620 --> 00:54.070
In artificial neural networks

00:55.560 --> 00:57.540
hidden layers don't have to be fully connected,

00:57.540 --> 00:59.760
whereas in convolutional neural networks,

00:59.760 --> 01:02.010
we're gonna be using fully connected layers,

01:02.010 --> 01:04.110
and that's why they're generally called

01:04.110 --> 01:05.760
fully connected layers.

01:05.760 --> 01:09.180
And so basically, that whole column or vector of outputs

01:09.180 --> 01:10.890
that we have after the flattening,

01:10.890 --> 01:12.660
we are passing it into the input layer,

01:12.660 --> 01:15.330
and here we've got a very simplified example

01:15.330 --> 01:18.120
just for illustration purposes.

01:18.120 --> 01:20.400
And what the main purpose

01:20.400 --> 01:22.080
of the artificial neural network is,

01:22.080 --> 01:25.350
is to combine our features

01:25.350 --> 01:28.980
into more attributes that predict the classes even better.

01:28.980 --> 01:33.120
So we already, in our vector of outputs,

01:33.120 --> 01:37.890
in the flattened result from what we've already done,

01:37.890 --> 01:41.730
we have some features encoded in the numbers in that vector.

01:41.730 --> 01:44.610
And they can already do probably a pretty good job

01:44.610 --> 01:49.610
at predicting what class we're looking at,

01:49.800 --> 01:51.060
whether it's a dog or a cat,

01:51.060 --> 01:53.910
or whether it's a tumor or not a tumor, and so on.

01:53.910 --> 01:57.450
But at the same time, we know that we have this structure

01:57.450 --> 01:59.010
called artificial neural network,

01:59.010 --> 02:02.640
which is designed, which has a purpose

02:02.640 --> 02:06.510
of dealing with attributes and coming up,

02:06.510 --> 02:09.510
or dealing with features and coming up with new attributes,

02:09.510 --> 02:11.790
and combining attributes together

02:11.790 --> 02:15.570
to even better predict things

02:15.570 --> 02:16.800
that we're trying to predict.

02:16.800 --> 02:18.690
And we know that from the previous part,

02:18.690 --> 02:20.430
so why not leverage that?

02:20.430 --> 02:22.740
And that's exactly what the plan here is,

02:22.740 --> 02:25.320
so how about we pass on those values

02:25.320 --> 02:26.670
into an artificial neural network,

02:26.670 --> 02:29.490
and let it even further optimize everything

02:29.490 --> 02:30.600
that we're doing.

02:30.600 --> 02:31.890
And so that's what we're going to be doing,

02:31.890 --> 02:34.770
but let's look at a more realistic example,

02:34.770 --> 02:36.600
because this one is a bit too simple.

02:36.600 --> 02:40.230
So here we've got a better-looking

02:40.230 --> 02:41.280
artificial neural network,

02:41.280 --> 02:43.546
where we have five attributes on the inputs,

02:43.546 --> 02:45.990
then we have in the first hidden layer,

02:45.990 --> 02:47.430
we have six neurons,

02:47.430 --> 02:51.270
in the second, or in the second fully connected layer,

02:51.270 --> 02:53.820
we have eight neurons, and then we have two outputs,

02:53.820 --> 02:55.590
one for dog and one for cat.

02:55.590 --> 02:59.370
And so an important thing to talk,

02:59.370 --> 03:00.990
for us to talk about here is that,

03:00.990 --> 03:02.220
why do we have two outputs?

03:02.220 --> 03:05.130
We are kind of used to having only one output

03:05.130 --> 03:06.900
in our artificial neural networks.

03:06.900 --> 03:09.780
Well, one output is for kind of

03:09.780 --> 03:12.660
when you're predicting a numerical value,

03:12.660 --> 03:15.780
when you're running a regression type of problem.

03:15.780 --> 03:18.240
But when you're doing classification,

03:18.240 --> 03:20.430
you need an output per class.

03:20.430 --> 03:23.790
The exception is when you have just two classes.

03:23.790 --> 03:25.560
Like, we have two classes here, dog and cat,

03:25.560 --> 03:27.990
and we could've just done one output,

03:27.990 --> 03:29.370
and made it a binary output,

03:29.370 --> 03:32.550
and said one is a dog and zero is a cat,

03:32.550 --> 03:34.050
and that would've worked totally fine.

03:34.050 --> 03:35.010
And actually, in fact,

03:35.010 --> 03:37.890
you'll see Hadelin do that in the practical tutorials,

03:37.890 --> 03:39.270
and that's how they'll be structured.

03:39.270 --> 03:44.100
But at the same time, if you have more than two categories,

03:44.100 --> 03:45.810
for instance, dogs, cats, and birds,

03:45.810 --> 03:49.680
then you have to have a neuron per every category,

03:49.680 --> 03:52.440
and that's why we're going to practice with two categories

03:52.440 --> 03:55.140
in this example, so that we know what to expect

03:55.140 --> 03:58.530
if we ever have more than two categories.
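
NOTE
A small sketch of the one-output versus one-output-per-class choice described above. It assumes a sigmoid for the single binary output and a softmax for the per-class outputs; neither function is named in this lecture, and all scores are made up.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
z = 1.4  # made-up raw score
# Two classes, one output neuron: P(dog) = sigmoid(z), P(cat) = 1 - P(dog).
p_dog_single = sigmoid(z)
# Two classes, two output neurons: softmax over [dog_score, cat_score].
p_dog_pair = softmax([z, 0.0])[0]
# Three classes need three output neurons: dog, cat, bird.
probs3 = softmax([2.0, 0.5, -1.0])
```
With the cat score fixed at zero, the two-neuron softmax gives the same dog probability as the single sigmoid output, which is why one binary output is enough for two classes.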

03:58.530 --> 04:00.030
And so what's going to be happening here?

04:00.030 --> 04:02.130
So we've already done all the groundwork,

04:02.130 --> 04:03.270
we've done the convolution,

04:03.270 --> 04:05.550
we've done the pooling and the flattening,

04:05.550 --> 04:07.950
and now the information's gonna go

04:07.950 --> 04:09.540
through the artificial neural network.

04:09.540 --> 04:12.300
So, let's have a look at how that all happens.

04:12.300 --> 04:15.210
There's the information going through from the very start,

04:15.210 --> 04:18.240
from the moment when the image is processed,

04:18.240 --> 04:22.020
then convolved, then pooled, flattened,

04:22.020 --> 04:23.580
and then through the artificial neural network.

04:23.580 --> 04:25.200
All four steps.

04:25.200 --> 04:28.110
And then, a prediction is made.

04:28.110 --> 04:29.670
And we'll see how this happens in a moment,

04:29.670 --> 04:30.750
it will be very, very interesting.

04:30.750 --> 04:32.910
But for now, let's just say a prediction is made,

04:32.910 --> 04:36.090
and for instance, 80% that it's a dog,

04:36.090 --> 04:37.980
but it turns out to be a cat.

04:37.980 --> 04:40.560
And then an error is calculated,

04:40.560 --> 04:43.170
a, well, what we used to call a cost function

04:43.170 --> 04:45.960
in an artificial neural network,

04:45.960 --> 04:48.630
and we used the mean squared error there,

04:48.630 --> 04:51.390
or in convolutional neural networks,

04:51.390 --> 04:52.890
it's called a loss function,

04:52.890 --> 04:57.630
and we use a cross entropy function for that.

04:57.630 --> 05:00.120
And we'll talk about cross entropy and mean squared errors

05:00.120 --> 05:02.820
in a separate tutorial, and how all that happens,

05:02.820 --> 05:06.540
but for now, let's just say we have a loss type of function,

05:06.540 --> 05:08.730
which tells us how well our network is performing,

05:08.730 --> 05:10.530
and we're trying to optimize it,

05:10.530 --> 05:13.710
or minimize that function to optimize our network.
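
NOTE
A rough sketch of the two error measures mentioned above, applied to the lecture's example of predicting 80% dog for an image that is actually a cat. The 20% assigned to cat is an assumption (the two probabilities are taken to sum to one), and the function names are illustrative only.
```python
import math
def cross_entropy(predicted, target):
    """Cross entropy between predicted probabilities and a one-hot label."""
    return -sum(t * math.log(p) for p, t in zip(predicted, target) if t > 0)
def mean_squared_error(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
predicted = [0.8, 0.2]  # network says 80% dog, 20% cat
target = [0.0, 1.0]     # the image is actually a cat
ce = cross_entropy(predicted, target)        # equals -log(0.2)
mse = mean_squared_error(predicted, target)  # equals (0.8**2 + 0.8**2) / 2
```
Both numbers shrink toward zero as the probability assigned to the true class approaches one, which is what training tries to achieve.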

05:13.710 --> 05:15.570
So, the error is calculated,

05:15.570 --> 05:17.700
and then it's back propagated through the network,

05:17.700 --> 05:20.460
just like we had in the artificial neural networks.

05:20.460 --> 05:24.330
It's back propagated, and some things are adjusted

05:24.330 --> 05:28.020
in the network to help optimize the performance.

05:28.020 --> 05:29.880
And the things that are adjusted are as usual,

05:29.880 --> 05:31.980
the weights in the artificial neural network parts,

05:31.980 --> 05:35.310
so the blue lines that you see here, the synapses,

05:35.310 --> 05:38.970
then also another thing that is adjusted is

05:38.970 --> 05:41.700
the feature detectors.

05:41.700 --> 05:44.490
So we know that we're looking for features,

05:44.490 --> 05:46.140
but what if we're looking for the wrong features?

05:46.140 --> 05:48.060
What if this didn't work out,

05:48.060 --> 05:49.440
because the features are incorrect?

05:49.440 --> 05:51.240
And so the feature detectors,

05:51.240 --> 05:54.210
remember those little matrices that we had,

05:54.210 --> 05:57.053
the three-by-three matrices,

05:57.053 --> 06:01.980
they are adjusted so that maybe next time it'll be better,

06:01.980 --> 06:03.870
and let's see what happens type of thing.

06:03.870 --> 06:08.070
And, but of course it's all done with a lot of science

06:08.070 --> 06:11.220
in the background, and a lot of math, and it's all done

06:11.220 --> 06:14.580
through a gradient descent with back propagation,

06:14.580 --> 06:17.970
so it's all not just random perturbations,

06:17.970 --> 06:21.210
it's actually very thought through how it's done.
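
NOTE
A toy sketch of why gradient descent is not random perturbation: each update moves a weight in the direction that lowers the error. This uses a single weight and a squared error, which is a deliberate simplification and not the backpropagation through a full convolutional network that the lecture refers to. The starting weight, input, target, and learning rate are all made up.
```python
def loss(w, x, y):
    return (w * x - y) ** 2
def grad(w, x, y):
    # Derivative of (w*x - y)**2 with respect to w.
    return 2.0 * (w * x - y) * x
w, x, y = 0.0, 2.0, 1.0  # made-up starting weight, input, and target
learning_rate = 0.1      # hypothetical step size
history = [loss(w, x, y)]
for _ in range(20):
    w -= learning_rate * grad(w, x, y)  # step against the gradient
    history.append(loss(w, x, y))
```
Here the weight settles near 0.5, the value for which w * x matches the target, and the recorded loss falls at every step.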

06:21.210 --> 06:25.920
But nevertheless, the feature detectors are adjusted,

06:25.920 --> 06:26.910
the weights are adjusted,

06:26.910 --> 06:28.800
and this whole process happens again.

06:28.800 --> 06:30.750
And then again the error is back propagated,

06:30.750 --> 06:32.730
and this keeps going on and on and on,

06:32.730 --> 06:35.160
and that's how our network is optimized,

06:35.160 --> 06:38.190
that's how our network trains on the data.

06:38.190 --> 06:40.830
And the important thing here is that the data

06:40.830 --> 06:43.170
goes through the whole network from the very start

06:43.170 --> 06:44.430
to the very end.

06:44.430 --> 06:45.993
Then the error is compared,

06:47.310 --> 06:49.980
so the error is calculated, and then is back propagated.

06:49.980 --> 06:52.590
So same story as with artificial neural networks,

06:52.590 --> 06:55.626
just a bit longer because of that whole,

06:55.626 --> 06:57.843
the first three steps that we already had.

06:58.980 --> 07:01.650
And now, let's have a look at the interesting part,

07:01.650 --> 07:02.520
the really interesting part.

07:02.520 --> 07:05.280
How do these two classes work, because,

07:05.280 --> 07:07.170
or how do these two output neurons work?

07:07.170 --> 07:10.620
Because before we've always kind of had one output neuron.

07:10.620 --> 07:11.993
What happens when we have two?

07:11.993 --> 07:14.610
How does this situation

07:14.610 --> 07:17.640
of classification of images play out?

07:17.640 --> 07:19.650
Well, let's start with the top neuron first.

07:19.650 --> 07:22.080
We're gonna start with the dog.

07:22.080 --> 07:25.020
How do we, the main purpose what we need to do first

07:25.020 --> 07:28.950
is we need to understand what weights to assign

07:28.950 --> 07:32.100
to all of these synapses that connect to the dog

07:32.100 --> 07:35.820
so that we know which of the previous neurons

07:35.820 --> 07:37.860
are actually important for the dog.

07:37.860 --> 07:38.940
And let's see how that is done.

07:38.940 --> 07:42.120
So let's say hypothetically we've got these numbers,

07:42.120 --> 07:46.350
in our previous layer, previous fully connected

07:46.350 --> 07:47.970
or in the final fully connected layer.

07:47.970 --> 07:51.000
And again, these numbers can be absolutely anything,

07:51.000 --> 07:52.170
they don't have to be that,

07:52.170 --> 07:55.050
they can be any numbers, but just for argument's sake,

07:55.050 --> 07:58.920
we're going to agree that we are looking

07:58.920 --> 08:01.877
specifically at numbers between zero and one,

08:01.877 --> 08:05.670
so it's easier for us to argue these things and understand.

08:05.670 --> 08:09.870
And one means that that neuron was very confident

08:09.870 --> 08:11.580
that it found a certain feature.

08:11.580 --> 08:12.960
And zero is going to mean

08:12.960 --> 08:16.080
that that neuron didn't find a feature it's looking for.

08:16.080 --> 08:18.903
So, because at the end of the day, these neurons,

08:20.550 --> 08:23.610
just like anything else on this left side,

08:23.610 --> 08:25.470
are just looking at features of an image.

08:25.470 --> 08:27.540
This is already very, very processed,

08:27.540 --> 08:29.790
but still it's detecting a certain feature

08:29.790 --> 08:32.310
or combination of features on the image, right?

08:32.310 --> 08:34.590
Before we, in the convolved step,

08:34.590 --> 08:36.270
we had kind of recognizable features,

08:36.270 --> 08:38.370
in the pool step, they're less recognizable,

08:38.370 --> 08:40.200
then they become even less recognizable,

08:40.200 --> 08:42.600
in the flattened image, and then they get combined and so on.

08:42.600 --> 08:45.314
But nevertheless, what we are talking about here is

08:45.314 --> 08:47.910
certain features that are present in the image,

08:47.910 --> 08:48.743
or their combination.

08:48.743 --> 08:52.230
So a one, which has been passed, and this is important

08:52.230 --> 08:55.499
has been passed to both the dog and the cat at the same time

08:55.499 --> 08:57.120
to both the output neurons.

08:57.120 --> 09:00.780
So a one means that for us, for our argument,

09:00.780 --> 09:05.100
it means that this neuron is firing up,

09:05.100 --> 09:08.430
it's really rapidly detecting that feature

09:08.430 --> 09:10.140
that you know might be an eyebrow.

09:10.140 --> 09:12.630
It might be detecting this eyebrow for, again,

09:12.630 --> 09:15.240
for simplicity's sake is detecting this eyebrow,

09:15.240 --> 09:17.430
and it's communicating that to the dog neuron

09:17.430 --> 09:18.607
and to the cat neuron, saying,

09:18.607 --> 09:20.310
"I can see an eyebrow, I can see an eyebrow."

09:20.310 --> 09:22.470
And then it's up to the dog and the cat neuron

09:22.470 --> 09:25.860
to understand what that means for them, right?

09:25.860 --> 09:28.590
And so in this case, which neurons are firing up?

09:28.590 --> 09:30.630
These three neurons are firing up, the eyebrow,

09:30.630 --> 09:32.977
and let's say the nose is saying,

09:32.977 --> 09:34.680
"I can see, I can see a big nose,

09:34.680 --> 09:36.690
and I can see floppy ears."

09:36.690 --> 09:39.120
And it's saying that to the dog and to the cat.

09:39.120 --> 09:40.590
And then what the dog,

09:40.590 --> 09:43.440
and then what happens is we know that this is a dog.

09:43.440 --> 09:45.600
So the dog neuron knows that the answer

09:45.600 --> 09:49.140
it is actually a dog, because at the end,

09:49.140 --> 09:52.290
we are comparing to the picture or to the label

09:52.290 --> 09:53.640
on the picture, and it knows it's a dog.

09:53.640 --> 09:55.897
So basically, the dog neuron's gonna say,

09:55.897 --> 09:58.800
"Aha so I should be triggered in this case,

09:58.800 --> 10:00.450
so these are my neurons.

10:00.450 --> 10:02.850
This signal that they're sending

10:02.850 --> 10:04.650
to both me, the dog, and

10:04.650 --> 10:09.000
to the cat is actually an indication for me that it is a dog."

10:09.000 --> 10:11.520
And throughout these lots, and lots, and lots of iterations,

10:11.520 --> 10:13.980
if this happens many times the dog will learn

10:13.980 --> 10:16.740
that these neurons do indeed fire

10:16.740 --> 10:19.650
up when the feature belongs to a dog.

10:19.650 --> 10:21.480
On the other hand, the cat neuron will know

10:21.480 --> 10:22.470
that it's not a cat,

10:22.470 --> 10:24.780
and it will know that this feature is firing up,

10:24.780 --> 10:26.970
And this neuron is telling me it can see floppy ears,

10:26.970 --> 10:28.560
floppy ears, floppy ears, but

10:28.560 --> 10:31.080
at the same time it's not a cat.

10:31.080 --> 10:33.300
So basically to me, that's the signal

10:33.300 --> 10:35.550
that I should ignore this neuron

10:35.550 --> 10:36.870
like, and the more that happens,

10:36.870 --> 10:39.480
the more the cat neuron is gonna ignore

10:39.480 --> 10:41.493
this neuron about the floppy ears.

10:42.390 --> 10:46.110
And so basically that, that's how

10:46.110 --> 10:49.110
through lots and lots of iterations, if this happens often,

10:49.110 --> 10:50.100
so this is just one example,

10:50.100 --> 10:53.370
but if this happens often, maybe a one, maybe a 0.8, 0.9,

10:53.370 --> 10:54.450
maybe sometimes it won't fire.

10:54.450 --> 10:57.900
But overall, on average, this neuron is lighting

10:57.900 --> 11:01.170
up very often when it is indeed a dog.

11:01.170 --> 11:03.930
The dog neuron will start attributing

11:03.930 --> 11:05.910
higher importance to this neuron.

11:05.910 --> 11:06.743
And so there we go.

11:06.743 --> 11:08.430
That's, that's how we're going to signify it.

11:08.430 --> 11:10.770
We're going to say that these three neurons

11:10.770 --> 11:12.900
through this iterative process

11:12.900 --> 11:15.780
with many, with many, many, many, many samples and many

11:15.780 --> 11:19.140
many epochs. Remember, a sample is a row in your dataset,

11:19.140 --> 11:22.170
and an epoch is when you go through your whole dataset again

11:22.170 --> 11:25.200
and again and again through lots and lots of iterations.

11:25.200 --> 11:29.640
This dog neuron learned that this eyebrow neuron,

11:29.640 --> 11:34.290
and this big nose neuron, and this floppy ear neuron,

11:34.290 --> 11:39.210
they all seem to really contribute very well

11:39.210 --> 11:42.750
to the classification of what it's looking for.

11:42.750 --> 11:44.460
And which is a dog.

11:44.460 --> 11:45.720
So, that's how it works.

11:45.720 --> 11:48.930
And again, these ears, and nose, and eyebrows,

11:48.930 --> 11:52.740
those are very very approximate

11:52.740 --> 11:55.680
or like, very far fetched examples

11:55.680 --> 11:58.520
because by this stage in this whole convolution,

11:58.520 --> 12:00.660
convolutional neural network,

12:00.660 --> 12:03.570
it is completely unrecognizable what they're looking for.

12:03.570 --> 12:06.870
But at the same time, it is something in the features

12:06.870 --> 12:09.420
of dogs, or cats, or whatever you're classifying.

12:09.420 --> 12:11.220
And then, so let's move on to the next one.

12:11.220 --> 12:12.540
Now we're going to look at the cat neuron,

12:12.540 --> 12:13.710
but these we're going to remember.

12:13.710 --> 12:15.720
That these weights are,

12:15.720 --> 12:17.910
you know, they have, we've sorted out the dog.

12:17.910 --> 12:19.650
So the dog is kind of like, pretty much

12:19.650 --> 12:22.800
ignoring all these other neurons, 1, 2, 3, 4, 5.

12:22.800 --> 12:24.450
But it's really paying attention

12:24.450 --> 12:26.550
to what these three neurons are saying.

12:26.550 --> 12:28.470
Now, what is the cat listening to?

12:28.470 --> 12:31.445
Well, whenever it is actually a cat, right?

12:31.445 --> 12:34.320
The, this is, this is an example

12:34.320 --> 12:35.610
of a situation when it's actually a cat.

12:35.610 --> 12:38.940
So you'll see that this, these three neurons,

12:38.940 --> 12:42.510
0.9, 0.9 and one, they're saying something

12:42.510 --> 12:44.610
they're saying something to both the dog and the cat.

12:44.610 --> 12:46.562
And this is again, important to remember.

12:46.562 --> 12:48.630
So, this output signal goes both ways.

12:48.630 --> 12:49.798
It's the same, right?

12:49.798 --> 12:52.680
It's saying one to the dog, it's saying one to the cat,

12:52.680 --> 12:54.300
but then it's up to the dog and to the cat

12:54.300 --> 12:57.450
to decide whether to take

12:57.450 --> 13:00.480
into account that signal and learn from it or not.

13:00.480 --> 13:04.140
And both the dog and the cat can see that this is a photo,

13:04.140 --> 13:05.610
I should have put a photo of a cat here.

13:05.610 --> 13:07.080
But basically, imagine a photo of a cat.

13:07.080 --> 13:08.100
Both the dog and the cat

13:08.100 --> 13:10.170
can see that this is actually a cat.

13:10.170 --> 13:14.310
So basically, the dog is like, "Oh, okay, so these whiskers,

13:14.310 --> 13:19.310
and these pointy triangle ears, and this small size,"

13:20.370 --> 13:23.700
I guess or I don't, oh, maybe the, these, this type

13:23.700 --> 13:26.430
you know how cats have these things in their eyes.

13:26.430 --> 13:29.430
Their eyes are like little, they're not circles

13:29.430 --> 13:32.343
they're lines or something like that.

13:33.206 --> 13:34.039
Like cat eyes, basically.

13:34.039 --> 13:37.440
"These cat eyes, they're definitely not working for me.

13:37.440 --> 13:39.330
They're not helping me predict,

13:39.330 --> 13:41.940
because every time these neurons light up

13:41.940 --> 13:44.220
the prediction is not what I'm looking for."

13:44.220 --> 13:45.817
On the other hand, the cat is like,

13:45.817 --> 13:46.890
"Hmm, that's interesting.

13:46.890 --> 13:49.770
Every time these, this one lights up, it's

13:49.770 --> 13:51.600
or most of the time it lights up,

13:51.600 --> 13:53.880
it matches my expectation.

13:53.880 --> 13:55.320
It matches what I'm looking for.

13:55.320 --> 13:58.140
Okay, I'm gonna listen to this guy more than this one.

13:58.140 --> 13:58.973
This one, same thing.

13:58.973 --> 14:01.900
Every time it lights up, or most of the times it lights up,

14:02.760 --> 14:06.180
I happen to get a good, I happen to be rewarded

14:06.180 --> 14:09.780
for my prediction because I get it right, it's a cat.

14:09.780 --> 14:11.430
Okay? So, I'm gonna listen to him more.

14:11.430 --> 14:15.090
You know, this one is useless to me because he's not actually

14:15.090 --> 14:18.060
you know, like he's, he's not even lighting up.

14:18.060 --> 14:19.950
It's a cat, but it's, he's not lighting up.

14:19.950 --> 14:21.030
So the opposite is happening.

14:21.030 --> 14:23.490
And this one as well, it's a cat, but he is not lighting up

14:23.490 --> 14:24.450
so I'm not gonna listen to him.

14:24.450 --> 14:27.630
But this one, when he, what is this?

14:27.630 --> 14:30.630
The eyes, the cat eyes light up, we can see,

14:30.630 --> 14:33.300
I can see that it's a cat, it matches most of the time.

14:33.300 --> 14:35.490
So, I'm gonna learn from that and I'm going to listen

14:35.490 --> 14:38.760
to these three guys more often than not."

14:38.760 --> 14:40.200
And so basically the cat is listening

14:40.200 --> 14:43.200
to these three and it's ignoring the other five.

14:43.200 --> 14:48.200
And that is how these final neurons learn which neurons

14:48.930 --> 14:53.700
in the final fully connected layer to listen to.

14:53.700 --> 14:56.730
So the output neurons learn which of the fully,

14:56.730 --> 14:58.330
which of the final fully connected layer

14:58.330 --> 15:00.120
neurons to listen to.
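
NOTE
A hedged sketch of how the two output neurons could learn which final-layer neurons to listen to. It assumes 8 final-layer neurons where the first three fire for dogs and the last three fire for cats, a softmax output, and plain gradient descent on cross entropy; the data, layout, and learning rate are invented for illustration.
```python
import math
import random
def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
def make_sample(rng):
    """Made-up activations of 8 final-layer neurons plus a one-hot label."""
    is_dog = rng.random() > 0.5
    fires = range(0, 3) if is_dog else range(5, 8)
    x = [rng.uniform(0.8, 1.0) if i in fires else rng.uniform(0.0, 0.2)
         for i in range(8)]
    return x, ([1.0, 0.0] if is_dog else [0.0, 1.0])
rng = random.Random(42)
data = [make_sample(rng) for _ in range(200)]  # 200 samples
weights = [[0.0] * 8 for _ in range(2)]        # row 0: dog neuron, row 1: cat neuron
learning_rate = 0.1
for epoch in range(20):                        # one epoch is one pass over all samples
    for x, target in data:
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        probs = softmax(scores)
        for k in range(2):
            error = probs[k] - target[k]       # gradient of cross entropy for score k
            for i in range(8):
                weights[k][i] -= learning_rate * error * x[i]
dog_weights, cat_weights = weights
```
In this sketch the dog neuron ends up with positive weights on the three dog-related inputs, and the cat neuron on the three cat-related inputs. One difference from the spoken description: the weights on the other class's inputs become negative rather than simply being ignored.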

15:00.120 --> 15:02.151
And that's how they understand,

15:02.151 --> 15:05.220
basically that's how the features are propagated

15:05.220 --> 15:08.940
through the network and conveyed to the output.

15:08.940 --> 15:11.190
And so, even though these features

15:11.190 --> 15:12.660
of course don't have that much meaning

15:12.660 --> 15:15.210
to them, like floppy ears or whiskers,

15:15.210 --> 15:18.377
at the same time they do have some distinctive,

15:18.377 --> 15:21.870
they are a distinctive feature of that specific class.

15:21.870 --> 15:23.610
And that's how the network is trained.

15:23.610 --> 15:24.960
Because we also during,

15:24.960 --> 15:27.270
remember during the back propagation process,

15:27.270 --> 15:29.790
we also adjust the feature detectors.

15:29.790 --> 15:32.400
So, if a feature is useless to the output,

15:32.400 --> 15:36.000
it's going to, it is going to probably be disregarded.

15:36.000 --> 15:37.860
Because this doesn't happen over one

15:37.860 --> 15:38.790
or two, this just happens

15:38.790 --> 15:41.010
through thousands and thousands of iterations.

15:41.010 --> 15:43.920
So with time, a feature that is useless

15:43.920 --> 15:45.780
to the network is going to be disregarded

15:45.780 --> 15:47.190
and replaced with a feature that is useful.

15:47.190 --> 15:48.870
And so, at the end of the day,

15:48.870 --> 15:51.030
in this final layer of neurons,

15:51.030 --> 15:54.270
you are likely to have lots of features or combinations

15:54.270 --> 15:57.819
of features from the image that are indeed representative

15:57.819 --> 16:00.693
or descriptive of dogs and cats.

16:01.680 --> 16:04.232
And so, then once your network is trained up,

16:04.232 --> 16:06.660
then we, this is how it's applied.

16:06.660 --> 16:07.493
So this is the next step,

16:07.493 --> 16:08.850
like we've already trained up our network.

16:08.850 --> 16:09.750
Well this happens,

16:11.135 --> 16:13.050
Let's see what happens when this network is applied.

16:13.050 --> 16:13.883
So let's say we pass

16:13.883 --> 16:18.210
on an image of a dog, the values are propagated

16:18.210 --> 16:20.610
through our network, we get certain values.

16:20.610 --> 16:24.960
And so this time, the dog and the cat neurons don't know,

16:24.960 --> 16:26.730
they don't have the image of the dog here,

16:26.730 --> 16:28.470
they don't know that it's a dog or a cat.

16:28.470 --> 16:30.000
They have no idea what it is.

16:30.000 --> 16:32.528
But they have learned to listen

16:32.528 --> 16:35.670
to what is being shown here, right?

16:35.670 --> 16:36.870
They have learned to listen

16:36.870 --> 16:39.090
to, dog neuron listens to these three neurons,

16:39.090 --> 16:40.890
The cat neuron listens to these three.

16:40.890 --> 16:43.230
And so the dog neuron looks at 1, 2, 3

16:43.230 --> 16:44.910
and says, "Ah-ha, these are pretty high,

16:44.910 --> 16:47.820
so my probability is gonna be high that it's a dog."

16:47.820 --> 16:50.137
The cat neuron looks at these three and says,

16:50.137 --> 16:52.170
"Okay, these, this one is pretty high,

16:52.170 --> 16:54.330
but these are pretty low. Interesting.

16:54.330 --> 16:56.630
So, my probability is gonna be 0.05."

16:56.630 --> 17:00.120
And then, and that's where you get your prediction.

17:00.120 --> 17:02.730
So then, your first choice

17:02.730 --> 17:05.730
for this neural network is dog, second choice is cat.

17:05.730 --> 17:06.960
And that's pretty much it.

17:06.960 --> 17:08.430
So, the answer is dog.

17:08.430 --> 17:11.521
And same thing happens when you pass an image of a cat

17:11.521 --> 17:14.160
you get new values and you can see

17:14.160 --> 17:16.740
that even though this one's high, these ones are low.

17:16.740 --> 17:19.530
And for the cat, this one's high, this one's high

17:19.530 --> 17:20.640
and this one's a bit low.

17:20.640 --> 17:23.970
So the probability here might not be as great as previously,

17:23.970 --> 17:26.757
but still you can see that it's a cat with 79%.

17:26.757 --> 17:29.430
And so therefore, the neural network is gonna vote

17:29.430 --> 17:30.263
that it's a cat.

17:30.263 --> 17:31.096
And so basically,

17:31.096 --> 17:33.330
or the neural network is gonna conclude that it's a cat.

17:33.330 --> 17:36.330
Voting is a term that is used for these guys.

17:36.330 --> 17:39.680
So, these neurons in the final fully connected layer,

17:39.680 --> 17:42.840
they get to vote. And these are their votes.

17:42.840 --> 17:45.660
And again, we are just, for argument's sake,

17:45.660 --> 17:47.160
putting values between zero and one here.

17:47.160 --> 17:49.620
These could be any values, but they get to vote,

17:49.620 --> 17:54.510
and then these weights are the importance of their votes.

17:54.510 --> 17:57.411
So this is, these purple weights are how

17:57.411 --> 18:00.540
the dog neuron views their votes.

18:00.540 --> 18:02.550
How much importance it assigns

18:02.550 --> 18:04.830
to these neurons, and to those votes.

18:04.830 --> 18:09.240
And this is how much importance the cat neuron assigns

18:09.240 --> 18:12.750
to these, to the votes of these neurons.

18:12.750 --> 18:15.420
And so these neurons vote, the dog and the cat, based

18:15.420 --> 18:18.840
on their learned weights, they decide who to listen

18:18.840 --> 18:20.400
to, and then they make their predictions,

18:20.400 --> 18:22.470
and then the whole neural network concludes

18:22.470 --> 18:24.600
that this is, in this case, a cat.

18:24.600 --> 18:27.000
And then that's your conclusion.
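
NOTE
A sketch of the voting step at prediction time, assuming the weights have already been learned. The vote values and weights below are hypothetical, and the softmax used to turn scores into probabilities is an assumption not named in the lecture.
```python
import math
def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
labels = ["dog", "cat"]
# Made-up votes from the 8 neurons in the final fully connected layer.
votes = [0.9, 0.1, 0.2, 0.1, 0.1, 0.9, 1.0, 0.2]
# Hypothetical learned weights: each output neuron listens to three voters.
dog_weights = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
cat_weights = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0]
# Each output neuron sums the votes, scaled by how much it trusts each voter.
scores = [sum(w * v for w, v in zip(row, votes)) for row in (dog_weights, cat_weights)]
probs = softmax(scores)
prediction = labels[probs.index(max(probs))]
```
With these numbers the cat neuron hears two strong votes and one weak one, while the dog neuron hears one strong and two weak, so the network concludes cat.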

18:27.000 --> 18:28.830
And that's how you get images like this,

18:28.830 --> 18:31.590
where you have a cheetah

18:31.590 --> 18:35.010
and then you have a cheetah class with

18:35.010 --> 18:36.840
you know, like a high, high probability.

18:36.840 --> 18:37.830
So this is, you know

18:37.830 --> 18:40.050
the probability that the network has predicted

18:40.050 --> 18:40.883
and these are low.

18:40.883 --> 18:42.120
But these still exist, because

18:42.120 --> 18:44.070
there's still kind of like a small chance

18:44.070 --> 18:47.430
the other neurons are also listening to their voters

18:47.430 --> 18:49.470
and they're saying, "Oh, maybe it's actually a leopard."

18:49.470 --> 18:51.690
And the bullet train, very, very probable here,

18:51.690 --> 18:53.280
scissors, you know, this one won

18:53.280 --> 18:55.099
but hand glass was a very close second.

18:55.099 --> 18:58.350
And then stethoscope, because you could see

18:58.350 --> 19:01.290
like this guy, this neuron, the scissors neuron,

19:01.290 --> 19:02.730
the output scissors neuron,

19:02.730 --> 19:04.560
listened to its voters

19:04.560 --> 19:07.110
and it had the predominant vote overall,

19:07.110 --> 19:10.200
but then the hand glass had a good outcome as well.
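
NOTE
A short sketch of how a ranked list like the one on the slide could be produced from the output probabilities. Only the class names come from the lecture; the probability values are made up, and top_k is a hypothetical helper.
```python
def top_k(probabilities, k):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
# Made-up probabilities; the remaining mass would be spread over classes not listed.
probabilities = {"stethoscope": 0.09, "hand glass": 0.38, "scissors": 0.41}
best_two = top_k(probabilities, 2)
winner = best_two[0][0]
runner_up = best_two[1][0]
```
The winner is the network's answer, and the runner-up shows how close the second output neuron came.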

19:10.200 --> 19:11.033
So, there we go.

19:11.033 --> 19:13.770
That's how the full connection works

19:13.770 --> 19:16.650
and how this is all, this all plays out together.

19:16.650 --> 19:18.780
I hope you enjoyed today's tutorial.

19:18.780 --> 19:21.390
We're gonna summarize all of this in the summary as well.

19:21.390 --> 19:22.860
And I'll see you next time.

19:22.860 --> 19:24.903
Until then, enjoy deep learning.
