WEBVTT

00:00.240 --> 00:02.730
Hello and welcome back to the course on deep learning.

00:02.730 --> 00:03.630
All right, today we're talking

00:03.630 --> 00:05.160
about the activation function.

00:05.160 --> 00:07.020
Let's get straight into it.

00:07.020 --> 00:08.640
So this is where we left off previously.

00:08.640 --> 00:12.000
We talked about the structure of one neuron.

00:12.000 --> 00:13.260
So there it is in the middle.

00:13.260 --> 00:15.750
We know that it has some input values coming in,

00:15.750 --> 00:17.100
it's got some weights.

00:17.100 --> 00:18.750
Then it adds up,

00:18.750 --> 00:22.110
it calculates the weighted sum of those inputs

00:22.110 --> 00:23.730
and then applies the activation function.

00:23.730 --> 00:27.870
And in step three, it passes on the signal

00:27.870 --> 00:28.703
to the next neuron.
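
NOTE
A minimal sketch of the three steps just described, assuming plain Python. The names neuron_output, inputs, weights and activation are hypothetical and do not come from the course.
```python
# Hypothetical helper: one neuron, no bias term (the lecture does not mention one).
def neuron_output(inputs, weights, activation):
    """Weighted sum of the inputs, followed by the activation function."""
    # Steps 1 and 2: multiply each input by its weight and add the results up.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 3: the activated value is what gets passed on to the next neuron.
    return activation(weighted_sum)
# Identity activation, used here only to expose the weighted sum itself.
print(neuron_output([1.0, 2.0], [0.5, 0.25], lambda z: z))
```
Any of the activation functions discussed in this tutorial could be passed in as the activation argument.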

00:28.703 --> 00:29.790
And that's what we're talking about today.

00:29.790 --> 00:31.710
We're talking about the value

00:31.710 --> 00:32.880
that is going to be passed over,

00:32.880 --> 00:34.650
so we're talking about the activation function

00:34.650 --> 00:36.360
that's being applied.

00:36.360 --> 00:39.300
So what options do we have for the activation function?

00:39.300 --> 00:40.133
Well, we're going to look

00:40.133 --> 00:42.510
at four different types of activation functions

00:42.510 --> 00:43.440
that you could choose from.

00:43.440 --> 00:44.273
Of course there are more

00:44.273 --> 00:45.660
different types of activation functions

00:45.660 --> 00:47.280
but these are the predominant ones

00:47.280 --> 00:48.270
that you'll be hearing about

00:48.270 --> 00:50.370
and that we'll be using in this course.

00:50.370 --> 00:53.100
So here is the threshold function.

00:53.100 --> 00:54.270
This is what it looks like.

00:54.270 --> 00:58.920
So on the X axis you have the weighted sum of inputs.

00:58.920 --> 01:03.920
On the Y axis, you have just the values from zero to one.

01:03.960 --> 01:06.420
And basically the threshold function

01:06.420 --> 01:07.920
is a very simple type of function

01:07.920 --> 01:12.920
where if the value is less than zero,

01:13.470 --> 01:16.860
then the threshold function passes on zero.

01:16.860 --> 01:20.250
If the value is more than zero or equal to zero

01:20.250 --> 01:22.950
then the threshold function passes on a one.
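
NOTE
The rule just stated could be sketched like this in Python. The function name threshold is an assumption made for illustration.
```python
# Hypothetical name; implements the rule stated in the lecture.
def threshold(x):
    """Pass on 0 for a negative weighted sum, 1 for zero or above."""
    return 1 if x >= 0 else 0
# A negative, a zero and a positive weighted sum.
print([threshold(x) for x in (-2.0, 0.0, 3.5)])
```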

01:22.950 --> 01:26.970
So it's basically a yes/no type of function.

01:26.970 --> 01:29.130
Very, very straightforward,

01:29.130 --> 01:32.160
a very rigid type of function.

01:32.160 --> 01:33.510
Either yes or no,

01:33.510 --> 01:35.040
no other options.

01:35.040 --> 01:36.180
So there you go, that's how it works.

01:36.180 --> 01:37.410
Very simple function.

01:37.410 --> 01:40.617
Let's move on to something a bit more complex now.

01:40.617 --> 01:42.720
The sigmoid function,

01:42.720 --> 01:46.020
very interesting formula that we have here.

01:46.020 --> 01:47.940
You'll see just now there it is,

01:47.940 --> 01:51.510
one divided by one plus e to the power of minus x.

01:51.510 --> 01:52.800
Where, in this case, of course,

01:52.800 --> 01:57.800
x is the value of the weighted sum.
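
NOTE
The formula read out above, one divided by one plus e to the power of minus x, could be written as follows. This is a sketch using only the Python standard library; the function name sigmoid is an assumption.
```python
import math
# Hypothetical name; x stands for the weighted sum of the inputs.
def sigmoid(x):
    """Return 1 / (1 + e^(-x))."""
    # For very large negative x, math.exp(-x) can overflow; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-x))
# The curve crosses 0.5 at x = 0 and flattens out on both sides.
for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x))
```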

01:57.870 --> 02:02.610
And so, yeah, this is what the sigmoid looks like.

02:02.610 --> 02:05.040
It's a function which is used

02:05.040 --> 02:07.860
in logistic regression, if you recall,

02:07.860 --> 02:09.510
from the machine learning course.

02:09.510 --> 02:10.890
So what is good about this function

02:10.890 --> 02:12.030
is that it is smooth.

02:12.030 --> 02:14.940
Unlike the threshold function,

02:14.940 --> 02:18.060
this one doesn't have those kinks in its curve,

02:18.060 --> 02:21.720
and therefore is just nice and smooth, gradual progression.

02:21.720 --> 02:25.650
So for anything below zero, it just drops off,

02:25.650 --> 02:30.090
above zero it approaches one.

02:30.090 --> 02:33.510
And this sigmoid function is very useful

02:33.510 --> 02:35.610
in the final layer, the output layer,

02:35.610 --> 02:38.910
especially when you are trying to predict probabilities.

02:38.910 --> 02:41.160
And we'll see that throughout this course.

02:41.160 --> 02:43.170
And then we've got the rectifier function.

02:43.170 --> 02:46.080
Rectifier function, even though it has a kink

02:46.080 --> 02:48.990
is one of the most popular functions

02:48.990 --> 02:50.910
for artificial neural networks.

02:50.910 --> 02:53.850
So it stays at zero all the way up to an input of zero.

02:53.850 --> 02:58.170
It's zero and then from there it gradually progresses

02:58.170 --> 03:01.710
as the input value increases as well.
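
NOTE
A minimal sketch of the rectifier shape described here: zero for negative inputs, then rising with the input. The function name rectifier is an assumption made for illustration.
```python
# Hypothetical name; this is the function commonly written as max(0, x).
def rectifier(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, x)
# Flat at zero on the left, then growing in step with the input.
print([rectifier(x) for x in (-3.0, 0.0, 2.5)])
```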

03:01.710 --> 03:03.390
And we'll see that throughout the course.

03:03.390 --> 03:05.040
We'll see that in other intuition tutorials

03:05.040 --> 03:07.560
and we'll also see how we use this function

03:07.560 --> 03:09.720
in the practical side of the course.

03:09.720 --> 03:12.090
And I will comment on this a bit more

03:12.090 --> 03:13.560
in a few slides from now.

03:13.560 --> 03:15.000
So just remember that the rectifier function

03:15.000 --> 03:17.220
is one of the most used functions

03:17.220 --> 03:18.990
in artificial neural networks.

03:18.990 --> 03:21.270
And finally, we've got one more function

03:21.270 --> 03:22.800
that you will probably hear about.

03:22.800 --> 03:25.230
It's the hyperbolic tangent function.

03:25.230 --> 03:27.360
It's very similar to the sigmoid function,

03:27.360 --> 03:32.360
but here the hyperbolic tangent function goes below zero.

03:32.400 --> 03:35.670
So the values go from zero to one

03:35.670 --> 03:36.630
or approximately to one,

03:36.630 --> 03:39.720
and go from zero to minus one on the other side.
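
NOTE
To see the range described here, the standard library's math.tanh can be evaluated at a few points. This is only a sketch; the wrapper name hyperbolic_tangent is an assumption.
```python
import math
# Hypothetical wrapper around the standard library's math.tanh.
def hyperbolic_tangent(x):
    """Return tanh(x), which lies between -1 and 1."""
    return math.tanh(x)
# Negative inputs give negative outputs, unlike the sigmoid.
for x in (-3.0, 0.0, 3.0):
    print(x, hyperbolic_tangent(x))
```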

03:39.720 --> 03:42.360
And that can be useful in some applications.

03:42.360 --> 03:44.790
So we're not going to go into too much depth

03:44.790 --> 03:45.780
on each one of these functions.

03:45.780 --> 03:48.450
I just wanted to acquaint you with them

03:48.450 --> 03:50.130
so that you know what they look like

03:50.130 --> 03:51.750
and what they're called.

03:51.750 --> 03:54.390
If you'd like to get some additional reading,

03:54.390 --> 03:59.323
then check out this paper by Xavier Glorot called,

04:02.167 --> 04:05.760
"Deep Sparse Rectifier Neural Networks," a 2011 paper.

04:05.760 --> 04:08.010
And there you will find out

04:08.010 --> 04:11.010
exactly why the rectifier function

04:11.010 --> 04:14.250
is such a valuable function,

04:14.250 --> 04:16.350
why it's so widely used.

04:16.350 --> 04:17.760
But nevertheless,

04:17.760 --> 04:20.670
for now you don't really need to know all of those things.

04:20.670 --> 04:22.500
For now, we just get to start applying them.

04:22.500 --> 04:24.240
We just start using them more and more and more.

04:24.240 --> 04:26.010
And so when you feel comfortable

04:26.010 --> 04:28.620
with the practical side of things,

04:28.620 --> 04:31.620
then you can go and refer to this paper,

04:31.620 --> 04:32.940
and then you'll be able

04:32.940 --> 04:35.340
to soak in that knowledge much quicker

04:35.340 --> 04:37.200
and it'll make much more sense.

04:37.200 --> 04:38.340
But just keep in mind

04:38.340 --> 04:39.270
that when you're ready,

04:39.270 --> 04:40.650
when you feel that you're ready

04:40.650 --> 04:42.810
then you can go and refer to this paper

04:42.810 --> 04:45.480
and get some valuable knowledge from there.

04:45.480 --> 04:48.330
So just to quickly recap,

04:48.330 --> 04:50.790
we have the threshold activation function,

04:50.790 --> 04:52.560
which looks like this.

04:52.560 --> 04:54.150
The sigmoid activation function,

04:54.150 --> 04:55.740
which looks like this.

04:55.740 --> 04:57.720
We have the rectifier function

04:57.720 --> 05:00.510
and we have the hyperbolic tangent function.

05:00.510 --> 05:02.310
And now to finish off this tutorial,

05:02.310 --> 05:05.040
let's quickly do a few exercises.

05:05.040 --> 05:06.960
So we'll just do two quick exercises

05:06.960 --> 05:09.150
to help that knowledge sink in.

05:09.150 --> 05:11.610
So first one is we've got an example here

05:11.610 --> 05:14.580
of a neural network with just one neuron,

05:14.580 --> 05:16.110
and then right away the output layer.

05:16.110 --> 05:17.580
And the question is,

05:17.580 --> 05:20.010
assuming that your dependent variable is binary,

05:20.010 --> 05:21.210
so it's either zero or one,

05:21.210 --> 05:23.760
which activation function would you use?

05:23.760 --> 05:26.070
So out of the ones that we've discussed,

05:26.070 --> 05:31.070
we have the threshold function, the sigmoid function,

05:31.167 --> 05:32.790
the rectifier function,

05:32.790 --> 05:35.630
and we've got the hyperbolic tangent function.

05:35.630 --> 05:37.980
In their raw forms,

05:37.980 --> 05:42.980
which ones would you be able to use for a binary variable?

05:43.950 --> 05:46.200
Okay, so the answers here are,

05:46.200 --> 05:49.350
there are two options that we can approach this with.

05:49.350 --> 05:52.380
So number one is the threshold activation function,

05:52.380 --> 05:54.780
because we know that it's between zero and one,

05:54.780 --> 05:57.600
and it gives you a zero under certain values,

05:57.600 --> 05:58.740
and then otherwise it gives you one.

05:58.740 --> 06:00.120
So it only can give you two values.

06:00.120 --> 06:04.410
It fits this requirement perfectly

06:04.410 --> 06:06.000
and therefore you can just say

06:06.000 --> 06:11.000
Y equals the threshold function of your weighted sum.

06:13.020 --> 06:13.950
And that's it.

06:13.950 --> 06:16.410
And then the second option you could use

06:16.410 --> 06:18.420
is the sigmoid activation function.

06:18.420 --> 06:20.550
It is actually also between zero and one,

06:20.550 --> 06:21.750
just what we need.

06:21.750 --> 06:25.934
But at the same time, you want just zero or one, right?

06:25.934 --> 06:28.980
It's not exactly what we need,

06:28.980 --> 06:31.320
but in this case what you could use it as

06:31.320 --> 06:36.320
is the probability of Y being yes or no.

06:37.530 --> 06:40.170
So we want Y to be zero or one,

06:40.170 --> 06:45.170
but instead we'll say that the sigmoid activation function

06:45.840 --> 06:50.840
tells us the probability of Y being equal to one.

06:51.870 --> 06:55.170
So basically the closer you get to the top,

06:55.170 --> 06:57.990
the more likely it is that this is indeed

06:57.990 --> 07:00.410
a one or a yes, rather than a no.
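
NOTE
One way to picture the second option: read the sigmoid output as the probability that Y equals one. In this sketch the function names and the 0.5 cutoff used to turn that probability into a yes or no are assumptions, not something stated in the lecture.
```python
import math
# Hypothetical names; the sigmoid output is read as P(Y = 1).
def predict_probability(weighted_sum):
    """Sigmoid of the weighted sum, interpreted as the probability that Y is 1."""
    return 1.0 / (1.0 + math.exp(-weighted_sum))
def predict_label(weighted_sum, cutoff=0.5):
    """Turn the probability into 0 or 1; the 0.5 cutoff is an assumed convention."""
    return 1 if predict_probability(weighted_sum) >= cutoff else 0
# The higher the weighted sum, the closer the probability gets to one.
for z in (-2.0, 0.0, 2.0):
    print(z, predict_probability(z), predict_label(z))
```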

07:00.410 --> 07:02.490
And yeah, so that's very similar

07:02.490 --> 07:04.890
to the logistic regression approach.

07:04.890 --> 07:07.500
And those are just two examples

07:07.500 --> 07:09.990
of if you have a binary variable.

07:09.990 --> 07:12.840
And now let's have a look at another practical application.

07:12.840 --> 07:15.330
Let's have a look at how all this would play out

07:15.330 --> 07:17.400
if we had a neural network like this.

07:17.400 --> 07:21.090
So in the input layer we have some inputs.

07:21.090 --> 07:23.790
They're sent off to our first hidden layer,

07:23.790 --> 07:26.100
and then an activation function is applied.

07:26.100 --> 07:27.960
And usually what you would apply here

07:27.960 --> 07:29.130
and what you'll see throughout this course

07:29.130 --> 07:32.850
is we would apply a rectifier activation function.

07:32.850 --> 07:34.560
So it would look something like that.

07:34.560 --> 07:36.600
We apply the rectifier activation function

07:36.600 --> 07:38.130
and then from there,

07:38.130 --> 07:41.850
the signals would be passed on to the output layer

07:41.850 --> 07:45.060
where the sigmoid activation function would be applied.

07:45.060 --> 07:46.830
And that would be our final output.

07:46.830 --> 07:49.080
And that could predict a probability, for instance.

07:49.080 --> 07:51.000
So this combination is gonna be quite common

07:51.000 --> 07:54.570
where in the hidden layers we apply the rectifier function

07:54.570 --> 07:58.890
and then in the output layer we apply the sigmoid function.
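
NOTE
The combination described here, rectifier in the hidden layer and sigmoid in the output layer, could be sketched as a tiny forward pass. Everything specific below is an assumption for illustration: the function names, the weight values, the layer sizes and the absence of bias terms.
```python
import math
# Hypothetical helpers for the two activation functions.
def rectifier(x):
    return max(0.0, x)
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))
def forward(inputs, hidden_weights, output_weights):
    """One hidden layer with rectifier units, then a single sigmoid output."""
    # Each hidden neuron takes its own weighted sum of the inputs.
    hidden = [rectifier(sum(x * w for x, w in zip(inputs, ws))) for ws in hidden_weights]
    # The output neuron takes a weighted sum of the hidden signals.
    return sigmoid(sum(h * w for h, w in zip(hidden, output_weights)))
# Made-up weights: two inputs, two hidden neurons, one output.
hidden_weights = [[0.5, -1.0], [1.0, 1.0]]
output_weights = [1.0, -1.0]
print(forward([1.0, 2.0], hidden_weights, output_weights))
```
The result lies between zero and one, so it can be read as a predicted probability.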

07:58.890 --> 07:59.850
So there we go,

07:59.850 --> 08:01.410
hope you enjoyed today's tutorial.

08:01.410 --> 08:03.180
Now you are quite well versed

08:03.180 --> 08:05.040
in the four different types of activation functions,

08:05.040 --> 08:08.130
and you will get some hands-on practical experience

08:08.130 --> 08:09.450
with them throughout this course.

08:09.450 --> 08:12.240
We'll be using them all over the place,

08:12.240 --> 08:13.980
so you'll get to know them quite intimately

08:13.980 --> 08:16.560
and you should be quite comfortable with them.

08:16.560 --> 08:18.510
But for now, this is the knowledge

08:18.510 --> 08:21.000
that you need to progress and understand

08:21.000 --> 08:23.910
what is going to be happening further down in this course.

08:23.910 --> 08:26.910
And on that note, I look forward to seeing you next time.

08:26.910 --> 08:28.743
Until then, enjoy deep learning.
