WEBVTT

00:01.140 --> 00:08.860
Now, let us have a look at the different types of kernels which we have in the case of SVM.

00:09.150 --> 00:13.460
We will be working with several hyperparameters, starting with the kernel.

00:14.280 --> 00:18.810
So what is a kernel? A kernel is a mathematical function.

00:19.710 --> 00:27.150
The function of a kernel is to take the data as input and transform it into the required form.

00:27.510 --> 00:27.990
So.

00:29.590 --> 00:37.240
Here you can see we have this data, which is in two dimensions.

00:37.600 --> 00:45.070
Now, what the kernel would do is transform this data from these two dimensions

00:45.280 --> 00:52.340
to three dimensions, where it will add another dimension to it. Earlier,

00:52.510 --> 00:59.980
the problem was that, with only the X and Y dimensions, the data had no linear separation.

01:00.170 --> 01:05.290
So we were not able to separate these classes, and we were not able to draw a line between them.

01:06.380 --> 01:13.550
Now, with the kernel, what happens is that the kernel transforms this data from

01:13.550 --> 01:16.370
two dimensions to three dimensions.

01:17.060 --> 01:27.500
Now, these three dimensions actually allow us to create a plane across the new Z axis, and this plane

01:27.650 --> 01:31.580
is able to separate both of these clusters.

01:32.420 --> 01:40.460
Now, when we transform this data back to the original X and Y plane, we can see

01:40.610 --> 01:47.270
that the plane which we had created forms a circle in the X and Y plane,

01:48.230 --> 01:54.530
which is able to classify both classes and separate them clearly.
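
NOTE
The lifting step described in this part of the lecture can be sketched in a few lines.
This is a minimal illustration, not the lecturer's own code: the toy points, the mapping
z = x^2 + y^2 and the plane height PLANE_Z are all assumptions chosen for the example.
A real SVM applies such a mapping implicitly through the kernel rather than computing it.
```python
import math
# Hypothetical toy data: an inner cluster near the origin and an outer ring around it.
ANGLES = (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in ANGLES]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in ANGLES]
def lift(point):
    """Map a 2D point (x, y) to 3D by adding the assumed feature z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)
# In 3D, the flat plane z = PLANE_Z separates the two classes.
PLANE_Z = 1.0
def classify(point):
    """Return 'inner' if the lifted point lies below the plane, else 'outer'."""
    return "inner" if lift(point)[2] < PLANE_Z else "outer"
inner_labels = [classify(p) for p in inner]
outer_labels = [classify(p) for p in outer]
# Back in 2D, the plane z = PLANE_Z corresponds to the circle x^2 + y^2 = PLANE_Z.
print(inner_labels, outer_labels)
```
Projected back to the X and Y plane, the separating plane becomes a circle between the two groups.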

01:57.340 --> 02:04.630
Now, what does the kernel do? The kernel defines the distance measure between new data and the support

02:04.630 --> 02:12.310
vectors, that is, the observations closest to the hyperplane. With a kernel, the kernel will be defining

02:12.310 --> 02:14.880
the distance for the new data.

02:15.460 --> 02:22.780
So the kernel will make sure that the new data which we are adding to this particular data, the

02:22.780 --> 02:30.880
data which we are actually testing, is evaluated with respect to the kernel, with respect

02:30.880 --> 02:32.480
to the distance from the plane.

02:32.740 --> 02:39.730
So it is able to define the distance, and based on that it can actually find out which observations

02:39.730 --> 02:43.060
are closest to the hyperplane and which are not.

02:44.250 --> 02:51.510
The values which are closest to the hyperplane are known as the support vectors, and based on

02:51.510 --> 02:56.010
the support vectors, it actually defines how the classification needs to be done.

02:57.240 --> 03:05.410
Now, the main task here is how far or how close the line can be to these support vectors.

03:06.270 --> 03:09.660
That is, for the hyperplane which we are creating,

03:09.840 --> 03:14.550
how far or how close can we allow it to get to these support vectors?

03:15.030 --> 03:23.130
The closer the separation line is to the support vectors, the more accurately the

03:24.780 --> 03:27.120
training data will be classified.

03:27.790 --> 03:33.740
Another thing is that there are a few more parameters which allow us to define these things.

03:34.140 --> 03:36.230
So let us have a look at that now.

03:36.240 --> 03:46.680
And regarding the higher dimensions, the kernels will allow the data to be classified in a higher

03:46.680 --> 03:47.290
dimension.

03:47.460 --> 03:52.980
So there are kernels which work in the lower dimensions, and there are kernels which work

03:52.980 --> 03:54.320
in the higher dimensions.

03:54.570 --> 03:59.750
So kernels like the polynomial kernel and the radial kernel,

03:59.910 --> 04:04.350
these are two kernels which are used for the higher dimensions.

04:04.530 --> 04:08.220
So they transform the input space into a higher dimension.

04:08.400 --> 04:13.400
If we have a linear kernel, it will not transform the data into a higher dimension.

04:13.410 --> 04:16.530
It will just work on the X and Y axes.

04:16.740 --> 04:20.790
But when we have the polynomial kernel, it will

04:21.960 --> 04:27.850
allow us to have more complex functions and to consider more complex functions.

04:28.100 --> 04:35.250
A linear kernel will only create a straight line. So a linear kernel will only be able to create a straight

04:35.250 --> 04:45.000
line, while the polynomial and radial kernels will be able to capture the complex features, the complex

04:45.000 --> 04:46.440
shapes of the data.

04:46.920 --> 04:51.590
So that is why, for higher dimensional data, we use the polynomial and radial kernels.
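
NOTE
The difference between the three kernels can be tried out with scikit-learn's SVC class.
This is a minimal sketch, not the lecturer's own code: the ring-shaped toy data from
make_circles and the parameter choices (degree=2 for the polynomial kernel, default
settings otherwise) are assumptions picked for illustration.
```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# Assumed toy data: two concentric rings, which no straight line can separate.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
kernel_settings = {
    "linear": {"kernel": "linear"},
    "poly": {"kernel": "poly", "degree": 2},
    "rbf": {"kernel": "rbf"},
}
scores = {}
for name, params in kernel_settings.items():
    clf = SVC(**params).fit(X, y)
    scores[name] = clf.score(X, y)  # accuracy on the training data
print(scores)
```
On data like this, the linear kernel should score clearly lower than the radial kernel,
because a straight line cannot follow the ring-shaped boundary.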

04:54.830 --> 05:02.030
Now, here we can see the visualization of the same. So here, when we have a linear kernel, we have

05:02.300 --> 05:05.550
a straight line of separation, which is all that is needed.

05:05.960 --> 05:13.540
So here we have just created a straight line, and the straight line was able to classify the data accurately.

05:14.330 --> 05:21.540
But when the data is actually in a polynomial form, that is, it has different curves,

05:21.740 --> 05:31.250
in that case we can use the polynomial kernel. And in situations where the function is a lot more complex and

05:31.250 --> 05:33.500
cannot be fitted with a single curve,

05:33.860 --> 05:38.740
in those cases we will be using the radial basis function.

05:39.620 --> 05:46.700
Here you can see that we have small clusters of data which are separated, and these still belong to

05:46.700 --> 05:47.660
the same class.

05:47.660 --> 05:50.240
And around these we have

05:51.360 --> 05:57.270
another class. So these are more accurately classified by the radial basis function.

05:58.540 --> 06:07.000
Now, how far these lines of separation will be is decided by a hyperparameter.

06:07.990 --> 06:15.650
That is the C value. Now, what is the C value? The C value is the regularization parameter.

06:15.940 --> 06:18.670
Now, when the value of C is large,

06:20.080 --> 06:29.170
when the value of C is large, a smaller-margin hyperplane will be considered, since it stresses

06:29.320 --> 06:38.470
getting all the training points classified correctly. And a small value of C will consider a large-

06:38.470 --> 06:40.180
margin hyperplane,

06:40.180 --> 06:44.630
even if some points are misclassified by the hyperplane.

06:44.920 --> 06:49.600
So what does the C value emphasize? The C value

06:49.600 --> 06:54.340
emphasizes the distance from the support vectors.

06:54.790 --> 06:55.240
So.

06:57.360 --> 07:02.140
Here we have these vectors and here we have this hyperplane.

07:02.760 --> 07:11.010
Now, if the value of C is large, then we will be trying to keep the distance smaller, and when

07:11.010 --> 07:13.750
the value of C is small,

07:13.770 --> 07:19.400
then we will try to maximize the distance from this separating hyperplane.

07:19.770 --> 07:22.410
So let's have a look at the image related to that.

07:22.590 --> 07:27.510
So here you can see, when the C value is very small, the distances get very high.

07:29.120 --> 07:32.630
So when the distances get very high,

07:33.700 --> 07:40.630
a few of the data points actually fall inside the margin, and these might get misclassified.

07:42.320 --> 07:51.080
Whereas when we increase the value of C, what happens is that the margin is kept very small, so

07:51.080 --> 07:55.230
that there is no point which gets misclassified.

07:55.400 --> 08:00.780
So the larger the value of C, the lesser the chance of misclassification on the training data.

08:01.640 --> 08:04.400
So here you can see the diagram.

08:04.440 --> 08:06.730
So here we have created a clear line.

08:07.130 --> 08:07.910
And here,

08:08.090 --> 08:15.560
as the value of C increases, the boundary fits more closely to the data classes which we have.

08:16.880 --> 08:24.530
So the higher the value of C, the lesser the chances of misclassification, because then the margin

08:24.530 --> 08:26.570
of separation will be smaller.
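
NOTE
The effect of C on the margin can be checked numerically with scikit-learn.
This is a minimal sketch under assumptions: the overlapping toy clusters from make_blobs,
the two C values (0.01 and 100) and the helper name margin_width are all illustrative.
For a linear SVM the margin width is 2 / ||w||, where w is the fitted weight vector.
```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# Assumed toy data: two overlapping clusters in 2D.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)
def margin_width(c_value):
    """Fit a linear SVM with the given C and return the margin width 2 / ||w||."""
    clf = SVC(kernel="linear", C=c_value).fit(X, y)
    return 2.0 / np.linalg.norm(clf.coef_)
small_c_margin = margin_width(0.01)  # small C: wide margin, some points inside it
large_c_margin = margin_width(100.0)  # large C: narrow margin, fewer training errors
print(small_c_margin, large_c_margin)
```
A very large C can also fit noise in the training data, so in practice C is usually tuned
on held-out data rather than simply set as high as possible.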

08:28.680 --> 08:32.470
Another hyperparameter is the gamma value.

08:32.670 --> 08:40.890
Now, the gamma parameter defines how far the influence of each training observation reaches in the calculation

08:40.890 --> 08:42.710
of the optimal hyperplane.

08:43.080 --> 08:50.700
That is, how far the influence of each training observation affects the calculation of the optimal hyperplane.

08:51.000 --> 08:56.380
Now it defines how far the influence of a single training example reaches.

08:56.550 --> 09:04.330
So if we have one training example, then how much influence will it have on the hyperplane creation?

09:04.560 --> 09:11.370
That is, each point will have an area around it which will be impacted.

09:12.590 --> 09:21.920
So if we have a low value of gamma, then each data point will have a very far-reaching impact on

09:21.920 --> 09:23.620
the hyperplane creation.

09:23.870 --> 09:26.570
That is, it will have a farther boundary.

09:26.750 --> 09:34.730
When we have a high value of gamma, it means that we want to have the hyperplane close; it will

09:34.730 --> 09:37.650
allow the hyperplane to be created closer to it.

09:38.450 --> 09:43.090
Now, the gamma parameter is the inverse of the radius of influence.

09:43.280 --> 09:52.490
So the higher the value of gamma, the lower the radius of influence will be.
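
NOTE
The inverse relation between gamma and the radius of influence can be seen directly in
the radial basis function, k(a, b) = exp(-gamma * ||a - b||^2). This is a minimal sketch
in plain Python; the two points and the gamma values 0.1 and 10 are assumptions chosen
for illustration.
```python
import math
def rbf_kernel(a, b, gamma):
    """RBF similarity exp(-gamma * squared distance) between points a and b."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)
training_point = (0.0, 0.0)
distant_point = (2.0, 0.0)  # hypothetical point at distance 2 from the training point
# Low gamma: the training point still has noticeable influence at distance 2.
low_gamma_influence = rbf_kernel(training_point, distant_point, gamma=0.1)
# High gamma: the influence has almost vanished at the same distance.
high_gamma_influence = rbf_kernel(training_point, distant_point, gamma=10.0)
print(low_gamma_influence, high_gamma_influence)
```
With the low gamma the similarity at distance 2 is still about 0.67, while with the high
gamma it is practically zero, so each point only shapes the boundary in its immediate
neighbourhood.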

09:53.150 --> 09:53.600
So.

09:55.820 --> 10:03.890
So here we have a small gamma value, so this means that the radius of influence will be high,

10:03.890 --> 10:07.640
so it will try to keep the boundary as far as possible.

10:07.880 --> 10:13.610
That is why the boundaries are straight lines and are farther away from the points.

10:14.000 --> 10:22.270
Whereas, as the value of gamma increases, the points allow the boundary to be closer to them.

10:22.550 --> 10:30.770
So because the reach of each point is low, it is allowing the hyperplane to be closer to

10:30.770 --> 10:30.910
it.

10:31.160 --> 10:33.830
Hence we have such boundaries.

10:36.290 --> 10:45.850
Now, let us have a look at the pros of SVM. It works really well with a clear margin of separation.

10:46.190 --> 10:55.190
So in case we have data which has a clear margin, where the classes are well separated, in those

10:55.190 --> 10:55.910
conditions

10:57.060 --> 11:06.060
SVM will work very well. Next, it is effective in higher dimensional spaces, because we have the kernels

11:06.060 --> 11:13.740
available which can actually deal with high dimensions. Then, it is effective in cases where the number

11:13.740 --> 11:17.370
of dimensions is greater than the number of samples.

11:18.240 --> 11:19.320
Why is this so?

11:19.500 --> 11:27.150
Let's say we have 20 data points and we have a larger number of dimensions.

11:27.490 --> 11:31.590
The number of dimensions would be the number of features that we have. Now,

11:31.710 --> 11:38.700
if we have more features, there will be more dimensions available. Now, because while

11:38.700 --> 11:46.620
classifying using SVM we are not really worried about all the data points which are present, but

11:46.620 --> 11:50.340
only about the support vectors,

11:50.550 --> 11:53.570
we don't really need that large an amount of data.

11:55.070 --> 12:01.760
In SVM, there is no need to have a large amount of data to have better performance. For having

12:01.760 --> 12:07.370
better performance, we need a better, cleaner separation; then we are going to have better

12:07.390 --> 12:14.000
performance. But a larger number of rows of data would not really mean that we

12:14.000 --> 12:17.380
can have a better classifier from the SVM.

12:17.630 --> 12:25.970
OK, so in this case, it can work really well if there is a high number of dimensions, because if

12:25.970 --> 12:31.940
there are a large number of dimensions, then it will allow it to find those dimensions where the data can

12:31.940 --> 12:33.400
actually be separated.

12:35.230 --> 12:40.840
Also, the kernel which we are using is actually being used to convert this data into higher

12:40.840 --> 12:41.480
dimensions.

12:41.740 --> 12:45.670
Now, when we already have higher dimensions, then it is an easier task.

12:46.300 --> 12:53.230
Now, it uses a subset of training points in the decision function, called support vectors.

12:53.250 --> 12:55.000
So it is memory efficient.

12:55.570 --> 13:01.090
Now, again, the same point: there is no need for all the points.

13:01.090 --> 13:03.190
All the data points are not needed.

13:03.190 --> 13:06.070
We only need the support vectors.
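
NOTE
The point that only a subset of the training data is kept can be checked on a fitted model.
This is a minimal sketch with scikit-learn; the well-separated toy clusters from make_blobs
are an assumption for illustration. The support_vectors_ attribute of a fitted SVC holds
the training points that the decision function actually uses.
```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# Assumed toy data: two well-separated clusters in 2D.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.6, random_state=0)
clf = SVC(kernel="linear").fit(X, y)
n_training_points = len(X)
n_support_vectors = len(clf.support_vectors_)  # points that define the boundary
print(n_training_points, n_support_vectors)
```
On cleanly separated data like this, the number of support vectors is typically a small
fraction of the training set.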

13:06.220 --> 13:09.240
That is why it will be more efficient in nature.

13:09.430 --> 13:16.420
And again, it will be better for memory, and it will not be as complex as other algorithms, which will

13:16.420 --> 13:21.790
take a lot of time to actually get trained.

13:23.660 --> 13:30.220
Now, what are the cons? It does not perform well when we have a large data set, because the required

13:30.230 --> 13:31.730
training time is higher.

13:32.660 --> 13:38.030
OK, so it will not perform well when we have a large data set, because it will have to compare the

13:38.030 --> 13:39.970
distances and then find the boundary.

13:40.130 --> 13:42.220
So it is better to have a smaller data set.

13:42.230 --> 13:45.670
If we can clearly find the line of separation, then we can use it. Now,

13:45.680 --> 13:53.690
it also does not perform well when the data has more noise, that is, when the classes are overlapping.

13:55.240 --> 14:02.140
Now, here you can see, when the classes are separated, it is working fine, but when they get closer

14:02.140 --> 14:05.760
together, it actually is not able to classify the data.

14:06.580 --> 14:15.190
So when the classes are overlapping, it will not be able to find a line of separation,

14:15.370 --> 14:18.040
and when it does not find a line of separation,

14:18.040 --> 14:20.890
then obviously it will not be able to classify it properly.

14:21.460 --> 14:27.930
Now, SVM does not directly provide a probability estimate, so we cannot find the probability of a point

14:27.940 --> 14:33.100
being in one particular class. It can straight away give which class it will belong to, but it

14:33.100 --> 14:39.940
will not give a probability, because it is calculating based on the distance from the

14:39.940 --> 14:41.320
line of separation, right?

14:41.500 --> 14:45.790
So it cannot really calculate the probability of it being in one class.

14:46.540 --> 14:53.440
Now, when probabilities are needed, they are calculated using an expensive five-fold cross-validation, which is included in the related

14:53.800 --> 14:57.790
SVC method of the Python scikit-learn library.
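
NOTE
The difference between class labels, distance-based scores and probability estimates can
be seen on a fitted model. This is a minimal sketch, assuming a scikit-learn version in
which SVC accepts probability=True; the toy data and the choice of the radial kernel are
also assumptions for illustration.
```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# Assumed toy data: two overlapping clusters in 2D.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)
# probability=True enables predict_proba at the cost of extra internal cross-validation.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
samples = X[:5]
labels = clf.predict(samples)  # hard class labels
distances = clf.decision_function(samples)  # signed scores relative to the boundary
probabilities = clf.predict_proba(samples)  # one row per sample, one column per class
print(labels, distances, probabilities)
```
Without probability=True, only the labels and the decision scores are available, which
matches the point that the model works from distances rather than probabilities.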

14:59.420 --> 15:07.510
So this is about SVM. In the next session, we will have a look at the whole walkthrough of SVM on a

15:07.530 --> 15:13.450
tax data so that you will have a clearer picture of how this is implemented.
