WEBVTT

00:00.760 --> 00:07.540
Hi. In this session, we will discuss neural networks. So let us, first of all, see what a

00:07.540 --> 00:14.800
neural network is. A neural network is a technique for building a computer program that learns from

00:14.800 --> 00:15.370
the data.

00:15.970 --> 00:19.190
Well, this is exactly what machine learning is also doing.

00:19.210 --> 00:21.610
So what is different here?

00:21.790 --> 00:28.510
It is based very loosely on how we think the human brain works.

00:28.810 --> 00:34.090
So, loosely speaking, neural networks work the way the human brain works.

00:36.060 --> 00:43.710
In our brain, there are many neurons which are connected with each other, and they pass messages from

00:43.710 --> 00:51.630
one to another, which are decoded by these neurons, and then we are able to think or feel something.

00:52.180 --> 00:55.620
Now, how is it related to neural networks?

00:55.860 --> 01:03.600
In neural networks, a collection of software neurons is created and connected together, which

01:03.600 --> 01:06.510
allows them to send messages to each other.

01:06.960 --> 01:10.290
Now the network is asked to solve a particular problem.

01:10.290 --> 01:15.780
So we will provide a particular problem to the network and the network will try to solve the problem

01:15.990 --> 01:19.470
and it will try to solve it again and again, again and again.

01:19.710 --> 01:26.760
Each time, it refines what it has learned, until it is actually able to perform

01:26.760 --> 01:28.100
that particular calculation.

01:28.110 --> 01:30.510
It will keep on doing it again and again.

01:31.530 --> 01:39.510
So on every iteration, when it is doing this learning and sending the messages again and again, it actually

01:40.590 --> 01:47.820
keeps connecting the dots and keeps learning from the data, and improves the predictions which

01:47.820 --> 01:57.810
it is making, so that it strengthens the connections that lead to success and diminishes the ones which

01:57.810 --> 02:01.610
lead to failure.

02:02.370 --> 02:04.770
So this is what neural networks are doing.

02:05.130 --> 02:10.750
So how are we actually creating this and what does this actually mean?

02:11.040 --> 02:14.680
So let us concentrate on this particular image here.

02:15.120 --> 02:18.860
So here we have a lot of circles connected.

02:20.290 --> 02:28.420
Now, these are the input nodes, the ones in green. The blue nodes are the other nodes which

02:28.420 --> 02:33.240
the inputs are connected to, and, let us say, the red one is the decision point.

02:33.790 --> 02:41.500
So here, all we want to do is send our input values, that is, the different x values, or the features

02:41.500 --> 02:45.970
or attributes, in through these input nodes.

02:47.140 --> 02:53.170
And these green neurons will pass on the information to the blue nodes.

02:53.980 --> 03:01.930
Now these blue nodes will try to decode the patterns between these green nodes, and then

03:01.930 --> 03:10.060
they pass their result to the red node, which will finally decide what the output should be, what the value

03:10.060 --> 03:10.720
should be.

03:10.840 --> 03:16.990
Let us say we are looking for a continuous value; then what value should be predicted, what price

03:16.990 --> 03:18.030
should be predicted?

03:18.250 --> 03:22.450
And in case we are looking for a class, then whether it should be yes or no.

03:22.480 --> 03:27.280
So these kinds of decisions can be made using this kind of structure.

03:28.840 --> 03:36.580
Now, it might look tricky how this will actually process the information, but this is a very simple

03:36.580 --> 03:38.050
neural network example.

03:38.410 --> 03:45.460
Here we have only one hidden layer, and there is another neural network which is even simpler

03:45.460 --> 03:48.190
than this, which we are looking at right now.

03:48.820 --> 03:50.410
What is that neural net?

03:50.470 --> 03:53.260
Well, you remember logistic regression.

03:53.530 --> 04:00.580
Our very own logistic regression is the smallest example of a neural network.

04:02.450 --> 04:10.070
How that is possible, we will have a look at it in a while, but let us try to connect the dots and

04:10.070 --> 04:11.360
create a picture.

04:12.950 --> 04:18.770
So if we look at this particular image, you can see there are several input values.

04:20.130 --> 04:26.180
And these input values will be giving a certain probability to us.

04:27.640 --> 04:34.400
And how are we getting this probability? This is a simple logistic regression.

04:34.420 --> 04:36.950
What do we do in logistic regression?

04:36.970 --> 04:45.790
We have several input values, and we apply a sigmoid function on top of them. That is, we first put these

04:45.790 --> 04:49.120
input values into our equation.

04:50.310 --> 04:57.780
What will the equation be? The equation will be β0 + β1 x1 + β2 x2 + β3

04:57.990 --> 05:00.830
x3 + β4 x4 and so on.

05:01.680 --> 05:07.570
Right. Then we apply the sigmoid function, also called the logistic function, on it.

05:08.010 --> 05:09.780
What is the logistic function?

05:09.780 --> 05:18.660
The logistic function is 1 / (1 + e^(-z)), where z is the equation which we just formed.

05:20.060 --> 05:24.380
That is the logistic function. So that is how this

05:25.900 --> 05:32.650
one also will return a probability to us. This is how logistic regression looks.
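
NOTE
Editor's sketch of the logistic regression described above, seen as a single neuron.
It assumes plain Python; the coefficient and input values are hypothetical and chosen
only so that the result is easy to check by hand. It is an illustration, not the lecturer's code.
```python
import math
def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))
def predict_probability(betas, xs):
    """betas[0] is the intercept; betas[1:] pair up with the inputs xs."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return sigmoid(z)
# Hypothetical coefficients and inputs, for illustration only.
p = predict_probability([0.0, 1.0, -1.0], [2.0, 2.0])
print(p)  # z = 0 + 2 - 2 = 0, so this prints 0.5
```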

05:33.190 --> 05:34.990
Now let us have a look at it further.

05:36.800 --> 05:43.200
Now, the neural network is an algorithm inspired by the neurons in our brain.

05:44.180 --> 05:54.170
It is designed to recognize patterns in complex data and often performs the best when recognizing patterns

05:54.170 --> 05:56.810
in audio, images, and videos.

05:57.560 --> 06:04.550
Now, neural networks are not something which we will go towards at the very beginning or when we are

06:04.550 --> 06:06.770
starting to solve a particular problem.

06:08.470 --> 06:16.840
Whenever we are starting to solve a particular problem in that situation, we will first of all implement

06:16.840 --> 06:24.160
the linear models, then we will implement the decision tree, and after that we will decide if we want

06:24.160 --> 06:27.310
to go towards bagging or boosting.

06:28.400 --> 06:37.640
In case we have a small amount of data, then the algorithms which we have learned earlier, like SVM,

06:37.970 --> 06:45.560
KNN, naive Bayes, and random forest, all these algorithms will be more favorable.

06:47.050 --> 06:57.730
But in case we are trying to have more precision or a better model, then in that case we will be

06:57.730 --> 06:59.750
going towards neural networks.

07:00.670 --> 07:03.500
But when going towards neural networks,

07:03.760 --> 07:08.200
the main challenge we face is the lack of data.

07:09.550 --> 07:18.910
Because neural networks require a lot of data to train. That is the reason why we cannot implement

07:18.910 --> 07:26.020
neural networks for all types of problems, because we need a lot of data to get a good output from

07:26.020 --> 07:27.210
the neural networks.

07:28.300 --> 07:39.520
Hence, they are used mainly for problems like audio or video data, where we can have a very large number

07:39.520 --> 07:43.320
of examples and then we will apply neural networks on them.

07:45.520 --> 07:55.930
Now, neural networks simply consist of neurons, which are also called nodes. So here, these small

07:55.930 --> 08:00.910
blocks, these small circles, are called nodes.

08:03.280 --> 08:08.190
Each neuron holds a number, and each connection holds a weight.

08:09.050 --> 08:15.250
So each and every node will hold some information in it.

08:16.080 --> 08:22.470
It will hold some information in there, and each connecting line will hold a weight.

08:23.800 --> 08:25.750
So what is this weight?

08:27.300 --> 08:34.020
What is the weight? These weights are nothing but the beta values which we have been talking about

08:34.020 --> 08:35.680
in logistic regression.

08:36.000 --> 08:42.180
So in the case of logistic regression, we had beta values with which we multiplied the input values.

08:42.180 --> 08:42.430
Right.

08:42.960 --> 08:48.710
We created the equation using the beta values.

08:49.050 --> 08:50.250
So β0

08:51.170 --> 09:01.340
+ β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5.

09:01.490 --> 09:06.130
So here, the different beta values that we have are the weights.

09:06.350 --> 09:09.230
So in the case of neural networks, we will be calling them weights.

09:10.040 --> 09:15.460
So this is how it will look: each node will hold a certain

09:16.910 --> 09:25.450
value in it, a certain number, and each connecting line will hold some weight value.

09:28.440 --> 09:35.820
Then we have activation functions, which are usually nonlinear transforming functions that transform

09:35.820 --> 09:42.930
linear functions into a nonlinear form which is capable of capturing complex patterns.

09:44.360 --> 09:52.220
Here we have the input data points, that is, x1, x2, x3 and so on. Here we have the different

09:52.220 --> 09:56.670
weight values, which are similar to the beta values which we had in logistic regression.

09:57.140 --> 10:03.570
Then we have the net input function, which will basically sum all of these things.

10:03.800 --> 10:06.650
So at this point, what will we have?

10:06.660 --> 10:19.130
The equation will be w0 plus, because of the summation, w1 x1 + w2 x2 + w3 x3, up to

10:19.970 --> 10:21.160
wm xm.

10:21.950 --> 10:22.260
Right.

10:22.580 --> 10:23.870
This is what we will have.

10:24.260 --> 10:28.220
Then we have this activation function, which is nothing but

10:29.540 --> 10:37.280
the logistic function, or the sigmoid function, which we have been discussing,

10:37.280 --> 10:42.560
whose equation was 1 / (1 + e to the power

10:43.860 --> 10:51.360
-z), where z is this particular equation, the combination of these inputs and weights. So that is this

10:51.570 --> 10:59.970
activation function. Now, we said that activation functions are usually nonlinear transforming

10:59.970 --> 11:00.460
functions.

11:00.690 --> 11:09.050
So that means that this activation function could also be some function other than the logistic

11:09.060 --> 11:09.850
function.

11:10.050 --> 11:14.040
But how will we decide which activation function to use?

11:15.600 --> 11:25.260
So consider this. The first thing is, if we are applying this logistic function, or the

11:25.260 --> 11:27.090
sigmoid function, as we call it,

11:27.420 --> 11:31.320
then, in that case, what does logistic regression give us?

11:32.690 --> 11:40.280
Because we are saying that this neural network, the very basic neural network, resembles logistic regression.

11:40.490 --> 11:47.180
And here we can clearly see that the equation which we are forming is similar to logistic regression.

11:47.660 --> 11:53.450
So what is the output of logistic regression? The output of logistic regression is a probability,

11:55.590 --> 11:58.500
which ranges from zero to one.

11:59.920 --> 12:09.010
This means that in case we want to get probability as an output or we want to solve a classification

12:09.010 --> 12:16.080
problem, in that case, we can use this logistic function, that is, the sigmoid function.

12:16.720 --> 12:23.740
But in case we want to find out a continuous value, then we should not use this function, because

12:23.740 --> 12:26.450
this will squeeze the output into the range from zero to one.

12:27.280 --> 12:33.520
You remember, during the discussion of linear and logistic regression, we said that the equation of

12:33.520 --> 12:43.360
the line, the linear equation, is converted into a logistic equation because of this activation

12:43.360 --> 12:43.810
function.

12:43.810 --> 12:50.680
If you want, you can go back to the theory of logistic regression, have a look at it, and come

12:50.680 --> 12:52.170
back and see that.

12:52.330 --> 13:01.090
What I'm saying is exactly that: we have converted this linear equation into a nonlinear equation,

13:01.270 --> 13:07.330
that is, something which has a range from zero to one, using this sigmoid function.

13:09.240 --> 13:17.670
Hence, in case we want to find out a continuous value, that is, we want to make a prediction of a continuous

13:17.670 --> 13:24.210
value, that is, if we want to solve a regression problem, then we don't need to have this sigmoid

13:24.210 --> 13:27.050
function at the end of the neural network.

13:28.190 --> 13:35.750
So this is one guideline: if we are trying to solve a regression problem, then at the end of

13:35.750 --> 13:40.910
the neural network, no matter how many layers we have here, no matter what we have on the left-hand

13:40.920 --> 13:47.510
side, we will not have a sigmoid function here, because the sigmoid function will change the output

13:47.780 --> 13:54.980
from a regression output to a classification output; that is, it will change the function from a linear

13:54.980 --> 13:57.550
function to the logistic function.

13:57.830 --> 13:59.550
So we don't want to do that.

13:59.570 --> 14:00.590
That is why we will

14:01.710 --> 14:07.650
remove this. But in case we have a classification problem, then we will keep this

14:07.650 --> 14:08.550
logistic function.

14:10.230 --> 14:18.480
Another thing to keep in mind is this: in case we are solving a problem where we are deciding if an

14:18.480 --> 14:21.870
animal is a cat or a dog.

14:23.430 --> 14:31.320
OK, this is the problem that we have: whether an animal is a cat or a dog. Then how will we be solving

14:31.320 --> 14:31.440
it?

14:31.800 --> 14:39.240
So this one logistic node will only be able to give us an answer on whether the animal is a cat or not, OK?

14:40.120 --> 14:48.820
So for deciding whether an animal is a cat or a dog, we need to have two output nodes instead of one.

14:49.630 --> 14:55.540
So in that case, we will have something like this.

14:56.290 --> 14:58.710
So we will have another function here.

14:59.680 --> 15:03.250
This is one node here and we have another node here.

15:06.360 --> 15:09.440
And these will be connected to this one also.

15:09.450 --> 15:12.780
So one would be connected to this, and another would be

15:14.020 --> 15:15.910
connected to this one also.

15:17.470 --> 15:19.930
So these will also be connected to this one.

15:30.110 --> 15:32.220
And this will again be a logistic function.

15:32.690 --> 15:34.130
So now what will happen?

15:35.230 --> 15:40.600
This one will answer whether it is a cat or not a cat.

15:41.420 --> 15:45.560
And this one will answer whether something is a dog

15:46.900 --> 15:47.960
or not a dog.

15:49.930 --> 15:55.050
OK, so this will answer whether this is a cat or not a cat, and this will answer whether

15:55.210 --> 15:55.870
this is a dog

15:56.290 --> 15:57.120
or not a dog.

15:57.310 --> 16:02.560
So in case the animal is a cat, then the output will be

16:03.660 --> 16:04.590
one here

16:05.680 --> 16:12.810
and something near zero here. And if the animal is a dog, then the output will be zero here

16:13.660 --> 16:14.380
and one here.

16:16.150 --> 16:18.330
OK, so this is what we will be doing.

16:20.020 --> 16:27.400
So always remember that a single logistic function will give you only X or not X, that

16:27.400 --> 16:34.840
is, cat or not cat. If you are trying to find out whether another class is present or not, then

16:34.840 --> 16:40.300
there will have to be another node, dog or not dog, for that class here.
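
NOTE
Editor's sketch of the two output nodes described above: one sigmoid answers "cat or not cat",
the other answers "dog or not dog". The feature vector and the weights are hypothetical,
chosen only so that the "cat" node fires. With independent sigmoids the two outputs
are separate yes/no answers and are not forced to add up to one.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def two_output_nodes(xs, cat_weights, dog_weights):
    """Each output node has its own bias (index 0), its own weights, and its own sigmoid."""
    z_cat = cat_weights[0] + sum(w * x for w, x in zip(cat_weights[1:], xs))
    z_dog = dog_weights[0] + sum(w * x for w, x in zip(dog_weights[1:], xs))
    return sigmoid(z_cat), sigmoid(z_dog)
# Hypothetical feature vector and weights, for illustration only.
p_cat, p_dog = two_output_nodes([1.0, 0.0], [-2.0, 6.0, -6.0], [2.0, -6.0, 6.0])
print(round(p_cat, 3), round(p_dog, 3))  # near one for cat, near zero for dog
```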

16:42.150 --> 16:49.050
Now, let us have a look at how this logistic regression looks. So this was the equation

16:49.050 --> 16:56.310
of logistic regression: p is equal to 1 / (1 + e^-(β0 + β1 x1

16:56.310 --> 16:57.390
+ β2 x2)).

16:57.810 --> 17:06.540
So this comes out as w0 + w1 x1 and so on; the w's take the place of the betas.

17:06.660 --> 17:08.700
The betas can be replaced with the w's.

17:09.180 --> 17:13.800
So here we have w0 + w1 x1 + w2 x2.

17:14.700 --> 17:18.000
So this is the equation which we have got from this.

17:18.510 --> 17:24.420
combination, and this combination is then fed into this sigmoid function.

17:27.050 --> 17:28.540
At this point,

17:31.350 --> 17:32.580
this is the equation.

17:33.790 --> 17:39.880
And after it passes through this function, the equation changes to sigmoid of

17:41.080 --> 17:48.170
(w0 + w1 x1 + w2 x2), which is nothing but this expression.

17:50.220 --> 17:57.890
So this is actually 1 / (1 + e^-(w0 + w1 x1 + w2 x2)).

17:58.410 --> 18:03.030
So this is how a neural network is similar to logistic regression.

18:04.180 --> 18:11.470
But how does it actually help us? Because this is a very simple logistic regression, and this will be

18:11.470 --> 18:18.360
able to solve only a few problems, and it will be solving only simple problems.

18:18.730 --> 18:22.150
So how can we actually use it for complex problems?

18:22.570 --> 18:28.030
So for using it for complex problems, we will be adding several layers to this structure.

18:28.570 --> 18:35.470
So now, instead of having just this kind of simple structure, we will be stacking several of

18:35.470 --> 18:41.740
those structures, so we can have a more complex structure in comparison to this one.

18:44.580 --> 18:49.270
So the activation function, which is being used here, which is the sigmoid function.

18:49.620 --> 18:56.940
This could be replaced by several other activation functions. For example, there is the unit step function,

18:57.300 --> 19:02.540
which is used in the perceptron, and which looks like this.

19:02.940 --> 19:04.710
Then there is the sign function.

19:06.460 --> 19:07.660
It looks like this.

19:09.260 --> 19:15.290
Then there is the linear function, which looks like this; the piecewise linear function, which looks like this;

19:15.650 --> 19:22.470
the logistic function, which looks like this; the hyperbolic tangent, which looks like this; then we have the rectifier,

19:22.550 --> 19:28.700
ReLU, which looks like this; and then we have the rectifier softplus, which looks like this.
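
NOTE
Editor's sketch of the activation functions named above, written as plain Python functions.
The exact form of the piecewise linear function is an assumption (the slide is not in the transcript);
here it clips z + 0.5 into the range from zero to one. The loop prints each function at three sample points.
```python
import math
def unit_step(z):
    return 1.0 if z >= 0 else 0.0
def sign(z):
    return (z > 0) - (z < 0)
def linear(z):
    return z
def piecewise_linear(z):
    # Assumed form: clip z + 0.5 into the range [0, 1].
    return min(1.0, max(0.0, z + 0.5))
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))
def relu(z):
    return max(0.0, z)
def softplus(z):
    return math.log(1.0 + math.exp(z))
for f in (unit_step, sign, linear, piecewise_linear, logistic, math.tanh, relu, softplus):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```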

19:29.100 --> 19:30.980
Now these

19:34.670 --> 19:40.500
functions which we have can be used in different places.

19:40.700 --> 19:43.990
So the most frequently used is the logistic function.

19:44.750 --> 19:50.870
And the second most frequently used is the ReLU function, which is used very heavily.

19:51.350 --> 19:58.960
And among all of these functions, we have the linear function, which may or may not be used.

19:59.330 --> 20:03.380
Why do we not prefer using the linear function?

20:03.530 --> 20:04.700
Let's have a look.

20:06.410 --> 20:07.370
Now let us see.

20:07.640 --> 20:15.500
Say I am building a linear regression or logistic regression, and instead of using the sigmoid function,

20:15.500 --> 20:20.960
I apply a linear function. Then the output of this will be a linear equation.

20:21.230 --> 20:27.860
So after this I will get something like w0 + w1 x1 + w2 x2.

20:28.100 --> 20:35.050
So the same equation will be there if I apply a linear function on it; the only difference is a multiplication

20:35.100 --> 20:35.660
by some constant.

20:36.410 --> 20:43.100
Now again, if I multiply it by something and add something to it, the equation which I will

20:43.100 --> 20:45.960
be getting would be another linear equation.

20:46.850 --> 20:55.580
So if I keep on applying linear functions on this combination of the data, on this function,

20:55.760 --> 21:02.240
then the output which I will be getting will be a linear function only, which is not an ideal model.

21:03.760 --> 21:09.340
If you want to have a linear function, then there is nothing better than applying a linear

21:09.340 --> 21:09.940
regression.

21:10.960 --> 21:16.340
We could have simply applied linear regression, and it would have done the work.

21:16.810 --> 21:18.910
So we don't want to do that.

21:19.540 --> 21:26.520
The thing is, we should always use a nonlinear transforming function so that we can capture different

21:26.890 --> 21:28.840
complex patterns.

21:29.080 --> 21:36.940
If we do not have a complex function created, then it will not be able to capture complex patterns

21:36.940 --> 21:38.000
present in the data.

21:38.440 --> 21:41.560
It will only be able to capture linear relationships in the data.

21:41.860 --> 21:45.820
That is why we use the nonlinear functions.
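
NOTE
Editor's sketch of the point above: stacking layers that use a linear activation still gives
a single linear function. The weights and biases are hypothetical. Two linear layers
3 * (2x + 1) - 4 give exactly the same output as the one linear layer 6x - 1.
```python
def linear_layer(x, w, b):
    """A layer with a linear (identity) activation: just w * x + b."""
    return w * x + b
def two_linear_layers(x):
    # Hypothetical weights and biases, for illustration only.
    hidden = linear_layer(x, 2.0, 1.0)
    return linear_layer(hidden, 3.0, -4.0)
def single_linear_layer(x):
    # 3 * (2x + 1) - 4 simplifies to 6x - 1.
    return linear_layer(x, 6.0, -1.0)
for x in (-1.0, 0.0, 2.5):
    print(x, two_linear_layers(x), single_linear_layer(x))
```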

21:47.550 --> 21:51.530
So now let us have a look at this particular neural network.

21:51.780 --> 21:55.320
So in this particular neural network, we have two hidden layers.

21:55.680 --> 21:56.560
Earlier, what

21:56.560 --> 22:00.330
we had did not have any hidden layer.

22:00.330 --> 22:02.140
We did not have any hidden layers.

22:02.370 --> 22:05.210
We had the input and the output here.

22:06.250 --> 22:10.250
So here we have the input layer and the output layer.

22:10.480 --> 22:13.360
Apart from that, we have the two hidden layers.

22:14.400 --> 22:23.340
Now, notice that each point of the input layer is connected to both the points in hidden layer one, and

22:23.340 --> 22:31.470
both the points in hidden layer one are connected with all the nodes in hidden layer two.

22:32.600 --> 22:39.980
Now, further, these will be connected with the output. Again, they will be connected with all

22:39.980 --> 22:43.070
the nodes.

22:46.000 --> 22:51.550
So it will look something like this: all of these will be connected to each of these.

22:55.340 --> 22:57.740
So after that, the output will be given.

23:05.410 --> 23:13.690
Now, let's look further. To solve this, what will happen is, first of all, we will be working

23:13.690 --> 23:14.230
on this.

23:14.390 --> 23:22.390
What will happen is that we have certain input values, and those values will be multiplied

23:22.390 --> 23:28.480
with their corresponding weights, added together, and pushed into this sigmoid function.

23:29.590 --> 23:34.290
Similarly, the same thing will be pushed into the other sigmoid.

23:37.810 --> 23:43.190
So what comes into this will be somewhat similar to what we have here.

23:43.330 --> 23:47.850
Initially, we could have the same weights, or we can have random weights also.

23:48.010 --> 23:53.950
So based on the values of the weights, we will have different things coming into these sigmoids. Then

23:53.950 --> 24:00.950
the outputs from the sigmoid functions will be multiplied by the weights on all the lines connected with

24:00.970 --> 24:01.650
each of these,

24:02.350 --> 24:05.800
and pushed into this other sigmoid.

24:08.800 --> 24:11.890
So from this point, the values will go forward.

24:12.610 --> 24:14.650
Now let us have a look at this.

24:14.920 --> 24:21.580
So this is the chain rule, which we will be following to find out the values of the weights.

24:21.860 --> 24:30.730
Because in this entire neural network, we know the input values, we know the

24:30.730 --> 24:35.250
output values, and we also know what function we will be applying.

24:35.470 --> 24:38.700
But the only thing that we don't know is the values of the weights

24:38.710 --> 24:39.890
on these lines.

24:40.420 --> 24:43.210
So we need to find the values of the weights.

24:43.930 --> 24:49.120
So how we can find out the values of the weights is actually decided by the chain rule.

24:50.630 --> 24:54.320
So let us have a look at this. Now, this looks like a neural network.

24:55.720 --> 25:00.770
So imagine that this is the first input node, which has the value

25:01.090 --> 25:01.370
a,

25:01.450 --> 25:04.390
and then there is another input node, b.

25:05.890 --> 25:12.070
Now, think of this node like one of the function nodes in the network. Just imagine that in place of the sigmoid we have a simple function,

25:12.070 --> 25:16.340
and in it we get the value c, which is equal to a plus b.

25:17.170 --> 25:20.470
Here we have the value d, which is equal to b plus one.

25:21.160 --> 25:28.590
Now, both of these come together at the end in e, which is given by c multiplied by

25:28.600 --> 25:28.990
d.

25:31.200 --> 25:39.150
So what will happen is that we will be able to calculate these values. We will be able to calculate

25:39.150 --> 25:47.880
the value of c by adding a and b, so we can easily find the value of c. Then the value of

25:47.880 --> 25:48.750
d

25:48.750 --> 25:55.290
we can find by b plus one. Then the value of e can be easily calculated as c into d.

25:55.500 --> 25:57.250
So this will be easy for us to get.

25:58.140 --> 25:59.610
But the thing is,

26:01.350 --> 26:09.330
we have to find out the values going backwards. We want to work in the backward direction.

26:09.330 --> 26:15.650
That is, suppose we only know the value of e,

26:15.960 --> 26:20.760
and then we want to work back towards a and b. How can we do that?

26:21.120 --> 26:21.620
Well, then,

26:21.640 --> 26:30.630
we have two directions. Going from a and b towards e, that is the forward pass.

26:32.490 --> 26:38.590
And when we are going from e back towards a and b, that is the backward pass.

26:39.150 --> 26:47.860
So what happens in the backward pass? We take derivatives. The derivative of e

26:47.880 --> 26:49.980
with respect to e will be equal to one.

26:50.760 --> 26:56.840
Now the derivative of e with respect to c will be equal to ∂e/∂c, which is d, since e is c into d.

26:56.850 --> 27:01.770
Similarly, the derivative of e with respect to d will be equal

27:01.770 --> 27:03.450
to ∂e/∂d, which is c.

27:05.240 --> 27:13.970
Now, if we want to find out the derivative of e with respect to a, we still don't know it. So to calculate

27:13.970 --> 27:17.620
it, again, we will have to find out how c changes with a.

27:17.960 --> 27:19.220
So how will we do that?

27:19.670 --> 27:22.170
So ∂e/∂c is already there.

27:22.670 --> 27:30.200
Now, c depends on a, so we can also take the derivative of c with respect to a, which is ∂c by

27:30.200 --> 27:30.710
∂a.

27:32.420 --> 27:43.280
So the derivative of e with respect to a will be ∂e/∂c into ∂c/∂a.

27:44.480 --> 27:49.640
Similarly, the derivative of e with respect to b will be

27:50.940 --> 27:56.760
∂e/∂c into ∂c/∂b,

27:58.440 --> 28:09.030
plus, similarly, ∂e/∂d into ∂d/∂b. So it will have a combination of both c and

28:09.030 --> 28:09.590
d here,

28:11.560 --> 28:20.440
because e is a product of both, and b feeds into both. So this is what the chain rule says. It means that if y is equal to

28:20.440 --> 28:27.790
f(u) and u is equal to g(x), then y changes with x through u. In the same way, here e is a function of c and d.

28:29.660 --> 28:39.020
So if e is a function of c and d, and I want to find out the change in it, then I

28:39.020 --> 28:44.870
can find it out with respect to the change in c and d. And c and d are functions of a and b.

28:45.140 --> 28:50.120
So the change in e can be found in terms of

28:51.970 --> 28:56.230
its change with respect to c and d, and their change with respect to a and b.

28:57.130 --> 29:03.640
That is, we take the change first with respect to c and d, and then with respect to a and b. So this can be done in this particular way,

29:03.640 --> 29:09.490
that is, ∂y/∂x is equal to ∂y/∂u into ∂u/∂x.

29:10.900 --> 29:17.230
So the intermediate variable which is present, u, cancels out like this, just the way a common factor would.

29:18.440 --> 29:25.790
So ∂y/∂u into ∂u/∂x is equal to ∂y/∂x. So here we have the intermediate

29:25.790 --> 29:26.530
variable,

29:26.840 --> 29:27.650
that is, u.

29:28.750 --> 29:30.430
So this is what the chain rule is.
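
NOTE
Editor's sketch of the chain rule on the small graph described above: c = a + b, d = b + 1, e = c * d.
The input values a = 2 and b = 1 are an assumption made only for illustration.
The forward pass computes c, d and e; the backward pass multiplies the local derivatives
along each path, and adds the paths together where b reaches e through both c and d.
```python
def forward(a, b):
    c = a + b
    d = b + 1
    e = c * d
    return c, d, e
def backward(a, b):
    c, d, _ = forward(a, b)
    de_dc = d            # e = c * d, so de/dc = d
    de_dd = c            # and de/dd = c
    dc_da, dc_db = 1, 1  # c = a + b
    dd_db = 1            # d = b + 1
    de_da = de_dc * dc_da
    de_db = de_dc * dc_db + de_dd * dd_db  # b reaches e through both c and d
    return de_da, de_db
print(forward(2, 1))   # (3, 2, 6)
print(backward(2, 1))  # (2, 5)
```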

29:31.640 --> 29:35.420
Now, how do we actually use this? We will see.

29:37.540 --> 29:38.080
So

29:39.650 --> 29:41.640
let us see backpropagation.

29:42.290 --> 29:50.870
So backpropagation is calculating the gradients. We always start from the final layer and

29:50.870 --> 29:55.830
propagate backwards, updating the weights and biases for each layer.

29:56.510 --> 30:01.580
So we start from the end node and go towards the

30:02.710 --> 30:03.880
input values.

30:05.020 --> 30:10.930
And by going towards the input values, we update the values of the weights which are in between.

30:12.730 --> 30:17.590
And based on the values of the weights, we can actually find out the output values.

30:17.800 --> 30:24.040
So the main thing here is that this is the network which we have.

30:25.640 --> 30:30.230
And in this network, we have these input values, which we know of.

30:30.950 --> 30:35.150
We know these sigmoid functions and we know about these output values.

30:35.510 --> 30:38.810
But the only things which we don't know about are the weights.

30:39.800 --> 30:46.700
Now, in one forward pass, we can evaluate this function and find out these values, and then we find

30:46.700 --> 30:47.150
out the

30:48.380 --> 30:49.950
predicted output value.

30:50.360 --> 30:58.340
So based on some random weight values, we can predict the output. Now, these predicted output values are

30:58.340 --> 31:01.910
not exactly equal to the actual output values.

31:02.300 --> 31:06.150
That is, the predicted values will be different from the actual values.

31:06.350 --> 31:08.180
So we will have to find out the error.

31:08.510 --> 31:16.700
We will have to go in the backward direction to update the weights, because only updating the weights can actually

31:16.700 --> 31:18.200
improve the output values.

31:19.190 --> 31:27.410
So what do we do? We find out the change which needs to be made, and we go back, and based on the

31:27.410 --> 31:32.930
change, we backtrack what the values of these weights should be, and correspondingly

31:32.930 --> 31:35.210
update the values of the weights.

31:35.900 --> 31:42.260
Now, again, the forward pass will happen, and again, the input values will be multiplied by the updated

31:42.340 --> 31:43.130
weight values.

31:43.340 --> 31:46.250
And again, the sigmoid function will be calculated.

31:46.460 --> 31:50.740
And based on the updated sigmoid value, the output will be delivered.

31:51.710 --> 31:59.840
Now, again, if the output value is not equal to the actual output value, it will find the error and

31:59.990 --> 32:06.030
backpropagate towards these input values so that it can update these weights.

32:06.740 --> 32:08.930
So this is what will keep on going.

32:08.940 --> 32:17.390
It will keep on doing the forward and backward pass until it gets output values which

32:17.390 --> 32:19.340
are very close to the actual values.

32:20.940 --> 32:23.410
So this is what neural networks are all about.

32:23.730 --> 32:30.840
It is all about going forward and backward until we are able to improve the weights.
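
NOTE
Editor's sketch of the forward and backward loop described above, for a single sigmoid neuron.
Everything here is an assumption made for illustration: the toy data, the squared-error loss,
the learning rate, the number of passes, and the starting weights of zero (random values would also work).
Each step does a forward pass, measures the error, and uses the chain rule to update the weight and bias.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def train(xs, ys, learning_rate=0.5, epochs=2000):
    w, b = 0.0, 0.0  # starting weight and bias
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = sigmoid(w * x + b)           # forward pass
            error = pred - y                    # predicted minus actual
            grad_z = error * pred * (1 - pred)  # chain rule through the sigmoid, squared-error loss
            w -= learning_rate * grad_z * x     # backward pass: update the weight
            b -= learning_rate * grad_z         # and the bias
    return w, b
# Hypothetical toy data: the label is 1 when x is positive.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w, b = train(xs, ys)
print([round(sigmoid(w * x + b), 2) for x in xs])
```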

32:33.150 --> 32:40.890
So we adjust the weights and biases throughout the network so that we get the desired output in the output layer.

32:41.490 --> 32:41.940
Now,

32:43.330 --> 32:47.260
this is what it looks like. So we have the input value,

32:48.090 --> 32:51.450
and the input value is multiplied by some w1.

32:52.410 --> 32:58.800
Then in hidden layer one we have this activation function. So what goes inside this activation function?

32:59.100 --> 33:03.660
Into this activation function goes the input into w1.

33:04.670 --> 33:14.480
So after this, what comes out? We get the input value into w1 with the activation

33:14.480 --> 33:17.750
function applied. And then whatever value comes out of this,

33:17.750 --> 33:23.660
we multiply by w2. Then again, the activation function is applied on top of it.

33:23.780 --> 33:32.630
So we get the result of hidden layer two, and then we again multiply it with w3 and then again apply the

33:32.630 --> 33:35.840
activation function and we get the answer from this one.

33:36.350 --> 33:37.190
So what does a.

33:38.420 --> 33:42.140
So now, if we read this entire thing, what is the output?

33:43.680 --> 33:53.280
The output is nothing but the activation function applied on W3 multiplied by the result of hidden layer

33:53.280 --> 33:53.520
two.

33:54.940 --> 34:05.110
What is the result of hidden layer two? It is nothing but the activation function applied on W2 times the result

34:05.110 --> 34:06.210
of hidden layer one.

34:07.930 --> 34:14.620
And what is the hidden layer one result? It is nothing but the activation function applied on W1 multiplied by the input.

34:15.760 --> 34:22.210
So you can see that going forward is for calculating the output value, and the backpropagation

34:22.210 --> 34:26.260
is for calculating the updated values of the weights.

34:26.880 --> 34:34.810
So in this particular network, if we go through this particular process, the output value will be the activation

34:34.810 --> 34:44.170
applied on W3 times the activation applied on W2 times the activation applied on W1 times the input.
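
NOTE
A small sketch of the nested forward pass just described: output = f(W3 * f(W2 * f(W1 * input))). It assumes one scalar weight per layer and a sigmoid activation; the function name and the numeric values are made up for illustration.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def forward(x, w1, w2, w3, act=sigmoid):
    """Return the two hidden-layer results and the final output."""
    h1 = act(w1 * x)       # hidden layer one: activation applied on W1 times the input
    h2 = act(w2 * h1)      # hidden layer two: activation applied on W2 times h1
    out = act(w3 * h2)     # output: activation applied on W3 times h2
    return h1, h2, out
h1, h2, out = forward(x=0.5, w1=0.4, w2=-0.3, w3=0.7)
# The same value written as one nested expression, as in the transcript
nested = sigmoid(0.7 * sigmoid(-0.3 * sigmoid(0.4 * 0.5)))
```
Computing layer by layer and writing the single nested expression give the same number; the layer-by-layer form keeps h1 and h2 around, which the backward pass needs later.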

34:46.770 --> 34:53.880
You can read this for a while and actually see the entire thing; you can pause the video and have

34:53.880 --> 34:58.610
a look at it right now to understand it, because this is what the whole concept is built on.

35:00.330 --> 35:07.530
Now, if we want to find out the change in the output value, what change we want to have in the

35:07.530 --> 35:12.840
output value to improve our prediction, then how would that be calculated?

35:13.200 --> 35:22.290
That will be calculated as delta output by delta W1, that is, the change in output with respect

35:22.290 --> 35:23.880
to the change in W1,

35:25.050 --> 35:29.860
that is, how a change in W1 will actually impact the output value.

35:30.420 --> 35:38.190
So what this will be is the change in output with respect to hidden layer

35:38.190 --> 35:38.490
two,

35:39.580 --> 35:47.700
times the change in hidden layer two with respect to hidden layer one, times the change in hidden layer one with respect

35:47.700 --> 35:49.020
to W1.

35:50.980 --> 35:54.970
It is just the change in output with respect to hidden layer two,

35:55.930 --> 36:05.260
times the change in hidden layer two with respect to hidden layer one, times the change in hidden layer one with respect

36:05.750 --> 36:07.860
to W1.

36:08.560 --> 36:10.710
So it's just going this way.

36:17.350 --> 36:20.150
This is the entire equation now.

36:20.200 --> 36:23.320
We will try to understand this further.

36:24.700 --> 36:25.270
So.

36:27.540 --> 36:31.050
For us, the output of the activation function

36:33.240 --> 36:43.170
is the predicted value, and we will be subtracting the actual value from this output so that

36:43.170 --> 36:44.540
it gives us the error value.

36:45.810 --> 36:53.700
Now we want to minimize this error value, so what we will be doing is backtracking this

36:53.700 --> 37:00.300
error value to W1 so that we can gradually update the values of the weights.

37:00.600 --> 37:01.870
So how are we going to do that?

37:02.250 --> 37:06.390
The change in error with respect to W1 is equal to.

37:07.410 --> 37:13.710
The change in error with respect to W1 will be equal to the change in error with respect to

37:14.160 --> 37:15.090
the output,

37:16.610 --> 37:23.550
times the change in output with respect to hidden layer two, times the change in hidden layer two with respect to

37:25.130 --> 37:31.430
hidden layer one, times the change in hidden layer one with respect to W1. So this is what we

37:31.430 --> 37:31.790
are doing.

37:32.690 --> 37:33.200
This is:

37:34.380 --> 37:39.720
the change in error with respect to W1 equals the change in error with respect to output, times the change in output

37:39.990 --> 37:46.800
with respect to hidden layer two, times the change in hidden layer two with respect to hidden layer one, times the change in hidden layer one with respect

37:46.800 --> 37:50.130
to W1.
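
NOTE
A hedged sketch of the chain of derivatives above, checked against a finite-difference estimate. It assumes scalar weights, a sigmoid activation, and a half squared error 0.5 * (out - y)^2 so that the change in error with respect to output is simply out - y; all names and numbers are illustrative.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def error_for(w1, x=0.5, w2=-0.3, w3=0.7, y=1.0):
    """Half squared error of the three-weight network as a function of W1."""
    out = sigmoid(w3 * sigmoid(w2 * sigmoid(w1 * x)))
    return 0.5 * (out - y) ** 2
def grad_w1(w1, x=0.5, w2=-0.3, w3=0.7, y=1.0):
    """Change in error with respect to W1, as a product of four local derivatives."""
    h1 = sigmoid(w1 * x)
    h2 = sigmoid(w2 * h1)
    out = sigmoid(w3 * h2)
    d_err_d_out = out - y                    # error with respect to output
    d_out_d_h2 = out * (1.0 - out) * w3      # output with respect to hidden layer two
    d_h2_d_h1 = h2 * (1.0 - h2) * w2         # hidden layer two with respect to hidden layer one
    d_h1_d_w1 = h1 * (1.0 - h1) * x          # hidden layer one with respect to W1
    return d_err_d_out * d_out_d_h2 * d_h2_d_h1 * d_h1_d_w1
eps = 1e-6
analytic = grad_w1(0.4)
# Central finite difference as an independent check of the chain-rule product
numeric = (error_for(0.4 + eps) - error_for(0.4 - eps)) / (2 * eps)
```
If the product of the four factors is right, analytic and numeric agree to many decimal places.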

37:54.640 --> 38:00.070
So let us calculate the error values, we will calculate these.

38:01.090 --> 38:06.820
So in this scenario, let's consider this as one of the layers.

38:07.870 --> 38:17.770
It can be any layer. Let this be layer C; it may be the output layer or any layer

38:17.770 --> 38:18.780
in between.

38:19.090 --> 38:22.540
So in this layer, there will be an activation function.

38:23.520 --> 38:25.980
And there will be an output, X,

38:27.280 --> 38:28.300
for layer C.

38:29.330 --> 38:36.890
And there will be an input value, S, that is, the summation: the products of every weight and every input,

38:37.130 --> 38:46.340
added up. And what is S? S is nothing but X1 times W1 plus X2 times

38:46.340 --> 38:49.310
W2 plus X3 times W3, coming from the different nodes.

38:50.900 --> 38:55.370
So in this case, we want to find out the change in the error.

38:56.190 --> 39:03.480
We want to find out the change in error with respect to the change in weight.

39:04.700 --> 39:10.040
That is, the change in error with respect to the change in weight.

39:10.100 --> 39:13.960
So how will we do that? Again, with the same chain of derivatives which we saw earlier.

39:15.360 --> 39:22.980
So the change in error with respect to W will be equal to the change in error with respect to this summation

39:22.980 --> 39:23.460
value, S,

39:26.260 --> 39:32.710
multiplied by the change in this summation value with respect to the change in W.

39:34.040 --> 39:34.460
So.

39:35.640 --> 39:36.790
What will this be?

39:37.140 --> 39:44.760
Now, the change in S with respect to W: let us expand it.

39:46.360 --> 39:47.170
What is this?

39:47.410 --> 39:54.500
S is nothing but X times W, where X is a constant and W is the variable.

39:54.760 --> 40:03.940
So if we take the derivative of a constant times the variable, with no power on top of the variable,

40:03.940 --> 40:05.870
the result will be the constant value.

40:06.370 --> 40:10.540
So we get X of layer C minus one, which is the output from the previous layer.
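
NOTE
The step above written out as a formula. The symbols are assumed notation, not the slide's: E is the error, s_c the summation value entering layer c, w_c a weight into layer c, and x_{c-1} the output of the previous layer.
```latex
\frac{\partial E}{\partial w_c} = \frac{\partial E}{\partial s_c}\cdot\frac{\partial s_c}{\partial w_c}, \qquad s_c = w_c\,x_{c-1} \;\Longrightarrow\; \frac{\partial s_c}{\partial w_c} = x_{c-1}
```
With several incoming weights, s_c is a sum of such products, and the derivative with respect to each weight is the corresponding previous-layer output.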

40:12.220 --> 40:17.680
Now we need to find out the change in error with respect to

40:19.250 --> 40:24.080
S, which is the summation value of this layer.

40:24.620 --> 40:25.610
So what is this?

40:26.110 --> 40:27.610
This is layer C.

40:27.660 --> 40:28.700
This is day.

40:28.910 --> 40:30.320
This is this Valentine's Day.

40:31.010 --> 40:32.510
Now, how do we find it?

40:33.490 --> 40:36.590
So let us say we want to find it for the final layer.

40:37.240 --> 40:45.370
So this will be equal to the change in E with respect to S. So for the final layer, the change in E with

40:45.370 --> 40:54.850
respect to S will be as follows. Here, what will the error value be? It will be a function of X minus

40:54.850 --> 40:57.010
Y, the whole squared.

40:58.450 --> 41:03.190
That is, the predicted value minus the actual value, the whole squared.

41:03.880 --> 41:06.500
Now, this X1 is there.

41:06.560 --> 41:08.380
Now, what is this X1?

41:11.380 --> 41:14.020
This X1 is nothing but.

41:15.370 --> 41:19.390
the sigmoid function applied on S, the activation function applied on S.

41:20.920 --> 41:21.300
Right.

41:21.400 --> 41:24.340
This is the activation function applied on S. And what was S?

41:26.300 --> 41:31.620
S is X1 times W1 plus X2 times W2 plus X3 times W3, and so on.

41:32.060 --> 41:37.760
And this is what is going inside: the summation value is what goes inside the

41:37.760 --> 41:38.900
activation function.

41:40.530 --> 41:44.460
And X is the output value after applying the activation function.

41:45.060 --> 41:46.090
So what do we get?

41:46.500 --> 41:52.100
So X is nothing but the sigmoid function applied on the summation value.

41:52.560 --> 41:53.010
So.

41:54.210 --> 42:03.600
Let us say this activation function is tanh; then the derivative of tanh is equal

42:03.600 --> 42:05.070
to one minus tanh squared.
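
NOTE
A quick numerical check, assuming the activation meant in this cue is tanh, whose derivative is 1 - tanh(s)^2. The helper name and the test point s = 0.3 are arbitrary illustrative choices.
```python
import math
def tanh_derivative(s):
    """Closed-form derivative of tanh: one minus tanh squared."""
    return 1.0 - math.tanh(s) ** 2
s = 0.3
eps = 1e-6
# Central finite difference of tanh around s, used only as a sanity check
numeric = (math.tanh(s + eps) - math.tanh(s - eps)) / (2 * eps)
closed_form = tanh_derivative(s)
```
Because the derivative is written in terms of the activation's own output, the backward pass can reuse the value already computed in the forward pass.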

42:05.970 --> 42:08.730
So we can replace it, we can use this value.

42:11.330 --> 42:11.780
So.

42:12.730 --> 42:19.650
If you are checking for the previous layer (this was for the last layer), that is, if we are checking for layer

42:19.660 --> 42:27.820
C minus one, then for the change in E with respect to the previous layer's S, we will do delta E

42:28.240 --> 42:37.720
with respect to S, times delta S with respect to the previous layer's X, times delta of the previous layer's X

42:38.320 --> 42:47.500
with respect to the previous layer's S. So there is the final layer, the previous layer, and the layer before

42:47.500 --> 42:48.510
that.
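
NOTE
The layer-to-layer step above as a formula. The notation is assumed, not taken from the slide: s_c and x_c are the summation value and the output of layer c, w_c is the weight connecting layer c-1 to layer c, and f is the activation function.
```latex
\frac{\partial E}{\partial s_{c-1}} = \frac{\partial E}{\partial s_c}\cdot\frac{\partial s_c}{\partial x_{c-1}}\cdot\frac{\partial x_{c-1}}{\partial s_{c-1}}, \qquad \frac{\partial s_c}{\partial x_{c-1}} = w_c, \quad \frac{\partial x_{c-1}}{\partial s_{c-1}} = f'(s_{c-1})
```
Applying this repeatedly carries the error term from the final layer back to the first, one layer at a time.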

42:48.790 --> 42:55.540
So this is how it will just keep on going back to the previous layers, and it will just keep on finding out values

42:55.540 --> 42:57.400
from the layer one step behind it.

42:57.820 --> 43:01.450
And based on that, it will keep on finding out the different values.

43:02.870 --> 43:08.900
So what is the algorithm which we will be following? You know, we are not going into the entire

43:09.200 --> 43:15.620
details; the only thing that we need to know is this entire flow which is happening. So initially,

43:15.620 --> 43:24.220
you will have some input values, and you will take some random values for the weights and do the forward pass

43:24.230 --> 43:32.240
so that the output value can be calculated. Now, based on the output value and the actual

43:32.240 --> 43:32.690
value,

43:32.690 --> 43:34.650
some amount of error will be there.

43:35.300 --> 43:37.490
And this error we have to minimize.

43:38.150 --> 43:46.760
So to minimize this error, whatever change is needed will be backpropagated so that the values

43:46.850 --> 43:48.500
of the weights can be updated.

43:49.930 --> 43:54.730
And this will update the weights of each and every layer.

43:55.650 --> 44:02.660
So the weight values at each and every layer will be updated so that we get better results.

44:03.330 --> 44:09.500
And we will just keep on doing this entire process again and again until we get good results.

44:11.740 --> 44:13.150
So what do we do?

44:13.180 --> 44:20.440
We initialize the weights to small random numbers, and likewise all the biases. Now, you know, we have

44:20.440 --> 44:22.710
not discussed what a bias is.

44:23.170 --> 44:25.340
So let us look at this now.

44:25.360 --> 44:26.590
So what is bias?

44:30.990 --> 44:35.170
Now, what we have here is this equation.

44:35.580 --> 44:44.970
What does this equation have? This equation has W0 plus W1 X1 plus W2 X2.

44:45.420 --> 44:47.850
So these are the weights.

44:49.260 --> 44:52.500
And this one, which is being added here,

44:54.220 --> 44:55.540
this is called the bias.

44:57.280 --> 45:00.900
So this is also a value which will be updated.

45:03.100 --> 45:05.650
OK, so this is called the bias.
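
NOTE
A small sketch of a neuron with a bias term, s = W0 + W1*X1 + W2*X2, followed by an activation. The function name, the sigmoid choice and the numbers are assumptions for illustration only.
```python
import math
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))
# With all inputs at zero the weights contribute nothing, so only the bias moves the output
without_bias = neuron([0.0, 0.0], [0.5, -0.2], bias=0.0)
with_bias = neuron([0.0, 0.0], [0.5, -0.2], bias=2.0)
```
The bias lets the neuron shift its output even when every input is zero, which the weights alone cannot do; like the weights, it is adjusted during training.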

45:10.510 --> 45:17.600
So the forward pass is the matrix multiplication and the activation function.

45:17.920 --> 45:22.750
Now we will go through the forward and backward pass. For these passes,

45:22.750 --> 45:27.330
what we need to note is that we will not take all the data

45:27.340 --> 45:33.610
in one go. We will take the data in batches, because it will take a lot of time if we pass

45:33.610 --> 45:35.380
all the data at once.

45:35.740 --> 45:38.470
So we will do this in mini-batches.

45:38.970 --> 45:43.830
OK, so we will do the forward pass and calculate the activation values.

45:44.230 --> 45:52.840
Then we will do the backward pass and calculate the gradients by propagating backwards.

45:54.280 --> 46:02.590
And we will update the weights and the biases based on the gradient vector calculated by averaging

46:02.590 --> 46:03.750
over the mini-batch.

46:04.000 --> 46:10.840
So we will find the updates by averaging the gradient vectors from the mini-batch.
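
NOTE
A minimal sketch of mini-batch training with averaged gradients, as outlined above. It assumes one sigmoid neuron with a weight and a bias, a half squared error, a batch size of 4 and a learning rate of 0.5; the toy dataset and all names are invented for illustration.
```python
import math
import random
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def minibatch_step(w, b, batch, lr=0.5):
    """One update using gradients averaged over a mini-batch of (x, y) pairs."""
    grad_w = 0.0
    grad_b = 0.0
    for x, y in batch:
        out = sigmoid(w * x + b)                  # forward pass
        delta = (out - y) * out * (1.0 - out)     # error with respect to the summation value
        grad_w += delta * x                       # accumulate the gradient for the weight
        grad_b += delta                           # accumulate the gradient for the bias
    n = len(batch)
    return w - lr * grad_w / n, b - lr * grad_b / n
def mean_error(w, b, data):
    return sum(0.5 * (sigmoid(w * x + b) - y) ** 2 for x, y in data) / len(data)
random.seed(0)
# Toy data: the label is 1 when x is at least 1.0, otherwise 0
data = [(i / 10.0, 1.0 if i >= 10 else 0.0) for i in range(20)]
w, b = 0.1, 0.0
before = mean_error(w, b, data)
for epoch in range(200):
    random.shuffle(data)
    for start in range(0, len(data), 4):
        w, b = minibatch_step(w, b, data[start:start + 4])
after = mean_error(w, b, data)
```
Each update uses only four examples, so it is cheap, and averaging over the batch keeps the step size independent of the batch length.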

46:11.020 --> 46:12.910
So that is what we will be doing.

46:14.020 --> 46:21.730
Now we will have a look at the code in the next session, so I hope that will make this entire thing

46:21.730 --> 46:27.220
a little clearer so that you will be able to implement these on your own.

46:27.700 --> 46:29.590
So let's meet at the.
