WEBVTT

00:00.270 --> 00:01.020
Hello, everyone.

00:01.320 --> 00:04.170
So let's start working on a case study.

00:04.650 --> 00:11.610
And for this, we will be using a commonly used dataset, which is house price prediction.

00:12.240 --> 00:16.770
So let us import the basic libraries, which we will need.

00:20.760 --> 00:22.920
So we have imported NumPy.

00:23.280 --> 00:24.300
US, you, my.

00:25.380 --> 00:33.360
And we have also imported the dataset which we will need. Let us check the shape of the

00:33.360 --> 00:33.810
data.

00:33.810 --> 00:38.430
So we can simply say data.shape to get the shape of the data.

00:38.760 --> 00:43.800
So here you can see there are 1,460 rows and 81 columns, which we have.

00:44.370 --> 00:51.750
So let's have a look at the data first, so we can simply say data.head() to view the data.

00:52.050 --> 01:01.020
But first, we can also set some properties in pandas so that we can view all the data points, at least

01:01.020 --> 01:02.040
all the columns.

01:15.060 --> 01:19.950
This will allow us to view all the columns at once, as we have set it to None.

01:20.400 --> 01:23.310
Now we will run this.

01:23.970 --> 01:27.630
And you can see we are able to view all the data points at once.
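
NOTE
A minimal sketch of the display setting described above, assuming pandas is available. The three-row frame is a hypothetical stand-in for the house price data, which is not included in this transcript.
```python
import pandas as pd
# Hypothetical three-row stand-in for the house price data.
data = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})
# None removes the cap on how many columns pandas prints.
pd.set_option("display.max_columns", None)
print(data.shape)
print(data.head())
```
On the real data, data.shape would be expected to report the 1,460 rows and 81 columns mentioned earlier.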

01:27.990 --> 01:30.210
So what does this data show?

01:30.270 --> 01:40.200
So this is data regarding house prices and the different parameters which are present for estimating

01:40.200 --> 01:41.070
that house price.

01:41.460 --> 01:45.240
So this is the column SalePrice, which is the price of the house.

01:48.520 --> 01:52.500
Further, we have different properties regarding the house to be rented.

01:52.860 --> 01:58.380
This is simply an Id, which does not hold any specific information.

01:58.710 --> 02:01.320
It is just a unique identifier for each and every house.

02:01.980 --> 02:05.910
Then we have MSSubClass and MSZoning.

02:05.940 --> 02:08.380
What is the lot frontage?

02:08.430 --> 02:09.520
What is the lot area?

02:09.540 --> 02:10.680
What is the street?

02:11.760 --> 02:15.510
The alley, the lot shape, the contour

02:15.510 --> 02:16.740
of the land?

02:17.130 --> 02:19.290
The utilities which are available?

02:19.410 --> 02:22.740
What is the neighborhood, what is the condition of the house?

02:23.040 --> 02:25.380
And several other details are present.

02:25.950 --> 02:33.570
You can also view the data dictionary which is available with the data, and using that

02:33.570 --> 02:39.240
dictionary, you can easily find out what information is present in this data.

02:39.690 --> 02:46.140
So usually what we do is we actually go through the entire data dictionary to understand what type of

02:46.140 --> 02:46.980
data we have.

02:47.400 --> 02:55.770
And for that, we look at the data frame itself and we try to identify what type of columns are present

02:56.250 --> 02:59.970
and different attributes of a particular dataset.

03:00.300 --> 03:08.410
So for that, let us first have a look at the column names of the data, so we can simply say

03:08.490 --> 03:10.450
data.columns.

03:17.540 --> 03:19.630
And you can see the columns of the data.

03:20.630 --> 03:25.190
And further, let us also have a look at the different

03:26.060 --> 03:33.510
types of information, so we can say data.info() to view

03:34.400 --> 03:35.810
what type of data is present.

03:35.810 --> 03:41.510
So you can see Id is an integer, MSSubClass is an integer, and there are floating point values also.

03:41.510 --> 03:44.660
And there are a lot of categorical columns which are present.

03:44.990 --> 03:46.760
You can see these are object.

03:47.390 --> 03:52.990
These are all categorical columns which have the datatype object. Further,

03:53.010 --> 03:55.740
You can also see the number of null values.

03:55.740 --> 04:00.780
So you can see that there are 1460 non-null values here.

04:00.800 --> 04:04.490
You can see in the Alley column there are a lot of null values present.

04:08.210 --> 04:11.840
Here again, in FireplaceQu, there are a lot of null values present.

04:12.170 --> 04:15.650
So accordingly, you can decide which column you want to keep.

04:15.680 --> 04:23.600
Usually I will not keep any column which is having more than, let's say, 50 percent or more than 20

04:23.600 --> 04:25.280
percent null data.

04:25.850 --> 04:27.860
So it's a big no.

04:27.890 --> 04:29.960
So we remove these columns.

04:31.340 --> 04:32.000
FireplaceQu,

04:32.000 --> 04:38.120
PoolQC, Fence, MiscFeature.
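
NOTE
A small sketch of dropping mostly-null columns, assuming pandas and NumPy. The four-row frame and the 0.5 cutoff are illustrative assumptions; the speaker mentions both 50 percent and 20 percent as possible cutoffs.
```python
import numpy as np
import pandas as pd
# Hypothetical miniature of the data; only Alley is mostly null.
data = pd.DataFrame({
    "Alley": [np.nan, np.nan, "Grvl", np.nan],
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})
null_fraction = data.isnull().mean()  # share of nulls per column
threshold = 0.5  # assumed cutoff, adjust to taste
to_drop = null_fraction[null_fraction > threshold].index.tolist()
cleaned = data.drop(columns=to_drop)
print(to_drop)
```
Computing the fraction first makes the cutoff explicit, so it can be changed without touching a hand-written list of column names.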

04:38.390 --> 04:44.960
You can remove all of these because these will not hold much information. Further,

04:44.990 --> 04:53.120
what you can do is you can have a view of the mean, median, mode and other important metrics of the numeric

04:53.120 --> 04:53.810
variables.

04:54.050 --> 04:59.480
So for that, what you can simply do is you can say data.describe().

05:02.750 --> 05:08.540
And here you can see the count, the mean, the standard deviation, the minimum value, the maximum

05:08.540 --> 05:09.050
value.

05:09.380 --> 05:14.680
Then you can have a look at the 25th percentile, 50th percentile and 75th percentile, wherever

05:14.870 --> 05:18.050
the data is split, and then you can accordingly have a look at that.

05:19.730 --> 05:27.750
You can see this seems to be a categorical value, just counts.

05:28.070 --> 05:29.930
Here we have zero one, two, three.

05:29.940 --> 05:32.240
So maximum three bathrooms are present here.

05:32.750 --> 05:38.300
This is half bath, a maximum of two half baths are present. So you have all the information which is present

05:38.300 --> 05:38.570
here.

05:38.900 --> 05:46.250
You can see the maximum price of a house is 755,000.

05:46.490 --> 05:53.000
While if you see the minimum value, it is this here.

05:53.660 --> 05:54.890
This is the minimum value.

05:56.870 --> 05:58.070
This is the minimum value.

05:58.490 --> 06:07.400
And the maximum value is very much different from the 75th percentile value.

06:07.670 --> 06:12.380
So here you can easily see that the values are here.

06:12.390 --> 06:17.780
We have a lot of outliers, but we will be able to identify and analyze that later.

06:19.280 --> 06:20.570
Let's go further.

06:20.840 --> 06:32.000
And let's see the different sale prices, and let us actually have a look at the distribution of the sale

06:32.000 --> 06:32.540
prices.

06:34.870 --> 06:44.800
So for that, what we can do is we can simply say sns.distplot, which is the distribution

06:44.800 --> 06:48.930
plot, and inside this, you can give the data, and in the data

06:48.930 --> 06:59.640
I want to have a look at the SalePrice column, so I can put that in, and further

07:00.940 --> 07:07.210
I can simply say plt.show().

07:08.190 --> 07:08.350
And

07:11.710 --> 07:17.480
now here you can easily see that this is the distribution.

07:17.500 --> 07:28.510
Initially, the data looks nicely distributed, but later you can see there are a lot of points which

07:28.510 --> 07:31.160
are lying towards the right tail.

07:31.480 --> 07:34.840
So here you can see that data is skewed in nature.

07:37.980 --> 07:41.710
Further, you can easily find out the mean, median and mode.

07:41.730 --> 07:48.660
Although we have seen the mean of these values, you can still find out the median, the mode,

07:50.040 --> 07:52.290
the skewness, the kurtosis.

07:55.080 --> 07:56.610
So let's do that first.

07:59.160 --> 08:07.830
So here I have retrieved all the important metrics, that is min, max, mean, median, mode, standard deviation,

08:07.830 --> 08:09.780
variance, kurtosis and skewness.

08:11.280 --> 08:14.700
So you can see it is highly leptokurtic in nature.

08:16.320 --> 08:17.580
It is highly skewed.
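
NOTE
A hedged sketch of collecting these summary metrics with pandas. The prices are synthetic right-skewed values generated for illustration, not the actual SalePrice column, so the printed numbers will differ from the ones on screen.
```python
import numpy as np
import pandas as pd
# Synthetic right-skewed prices, standing in for the SalePrice column.
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=12.0, sigma=0.4, size=1000))
summary = {
    "min": prices.min(),
    "max": prices.max(),
    "mean": prices.mean(),
    "median": prices.median(),
    "std": prices.std(),
    "skewness": prices.skew(),
    "kurtosis": prices.kurt(),  # excess kurtosis: 0 for a normal distribution
}
for name, value in summary.items():
    print(name, round(value, 2))
```
A positive skewness together with a mean above the median is the usual signature of a long right tail.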

08:18.490 --> 08:21.630
So you can easily see these are the data points which we have.

08:22.110 --> 08:23.520
You see, this is the min value.

08:23.520 --> 08:24.870
This is the max value.

08:25.320 --> 08:26.760
This is the mean value.

08:27.420 --> 08:28.830
And this is the median value.

08:28.920 --> 08:32.760
You can easily see that the mean and the median value is almost similar in nature.

08:33.090 --> 08:38.820
So there is not much difference because here there are very few data points which are present in the

08:38.820 --> 08:39.180
tail.

08:42.840 --> 08:45.030
So this is what we have got from this.

08:45.750 --> 08:53.670
Now, further, what we can do is we can try to find out the sample and population mean.

08:53.670 --> 08:57.730
Let's try to identify something from this particular data.

08:57.780 --> 09:02.690
Let's see if a sample will be similar to the population itself.

09:02.700 --> 09:04.440
So let's try to find that out.

09:06.240 --> 09:08.370
So for this, I can fix a seed.

09:08.710 --> 09:11.700
So let's say np.random

09:12.780 --> 09:13.890
.seed.

09:14.850 --> 09:17.850
So here I am giving the seed as 23.

09:18.780 --> 09:22.460
And now let's take some sample values.

09:22.460 --> 09:25.660
So let's take, oh, how many data points are there?

09:25.680 --> 09:29.220
So the shape is 1460.

09:29.240 --> 09:31.590
So let's take 500 data points from this.

09:34.020 --> 09:37.930
So we can simply say sample

09:45.600 --> 09:50.700
prices is equal to np.random

09:55.580 --> 09:55.890
.choice

10:12.740 --> 10:27.730
of the sale price, and out of this, I want to get 500 values, so I can say size is equal to 500.

10:28.500 --> 10:29.950
So this is my sample.

10:30.520 --> 10:39.040
And for this, I want to get the mean, so I can just simply take this and find

10:39.040 --> 10:41.020
out the mean of this.

10:42.400 --> 10:48.370
So here you can see that the sample mean

11:00.240 --> 11:03.510
is nothing but 181,675.

11:04.170 --> 11:11.580
Now, if we look at the population mean, we can find the mean again, or here we already have

11:11.580 --> 11:11.740
it.

11:12.240 --> 11:14.700
So this is the mean, the third one.

11:15.270 --> 11:20.100
So this is the population mean and this is the sample mean.

11:20.100 --> 11:23.310
So you can see these are clearly very close to each other.
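
NOTE
A minimal sketch of the sample-versus-population comparison, assuming NumPy. The population here is synthetic; on the real data the array passed as a would be the SalePrice column. The name sample_prices is an illustrative assumption.
```python
import numpy as np
# Synthetic population of 1460 prices; the real SalePrice column would go here.
np.random.seed(23)  # same seed value as in the walkthrough
population = np.random.lognormal(mean=12.0, sigma=0.4, size=1460)
sample_prices = np.random.choice(a=population, size=500)  # samples with replacement by default
print(round(population.mean()), round(sample_prices.mean()))
```
With 500 draws the sample mean usually lands within a few percent of the population mean, which is the closeness the speaker points out.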

11:25.350 --> 11:32.490
Now, next, what we will be doing is we will try to find out a little more about it.

11:34.410 --> 11:37.470
Now, let us have a look at the few neighborhoods which are present.

11:39.090 --> 11:44.720
So for that, we can have a look at the Neighborhood column.

11:49.070 --> 11:50.390
These are the names.

12:01.520 --> 12:04.520
So these are the different neighborhoods which are present.

12:05.000 --> 12:18.560
Now let's see for Edwards if the prices in Edwards are similar to the prices in the entire

12:21.350 --> 12:21.840
place.

12:21.840 --> 12:31.100
So what we do is we can simply try to find the comparison between both of them and we can compare this.

12:33.290 --> 12:39.470
So for this, we will be performing the z-test, because we are comparing a sample to the population

12:40.220 --> 12:43.610
and we want to find out if the sample is different from the population.

12:44.240 --> 12:49.880
We want to compare a particular neighborhood to the entire population, which we have.

12:49.910 --> 12:50.750
So let's do that.

12:51.110 --> 13:01.580
Before that, we will first need to import the library for the test, so we can simply say

13:08.120 --> 13:11.260
from statsmodels.stats

13:17.120 --> 13:17.990
.weightstats

13:24.750 --> 13:33.290
import ztest. So we have imported ztest, and we will be using this particular test to find out if the

13:33.860 --> 13:39.110
population is similar to the neighborhood or not, so we can say z-stat

13:39.980 --> 13:53.900
and p-value, and we will have that as an output from this test, and inside this we will have to pass the population

13:53.900 --> 14:03.830
as well as the sample, so we can simply say x1 is equal to data

14:06.700 --> 14:06.950
where

14:13.540 --> 14:18.740
data.Neighborhood is equal to Edwards.

14:24.290 --> 14:31.970
Now, from this, I want to get the sale price, so I say SalePrice here.

14:44.290 --> 14:51.850
Now, apart from this, I need to provide the entire sale price for the population, so I can simply say

14:52.660 --> 15:00.680
value is equal to data.SalePrice.

15:02.410 --> 15:08.170
Let us again have a look at the documentation to be sure of it so we can see.

15:14.480 --> 15:17.600
ztest.

15:29.400 --> 15:31.590
So you can see it takes value

15:35.160 --> 15:42.360
and the value has to be the mean, so we'll say

15:42.840 --> 15:44.090
.mean().

15:49.920 --> 15:56.760
Now let's run this, and also let us print the z-stat and p-value.

16:17.290 --> 16:28.240
So you can see that the z-stat value is minus 12 and the p-value is very small, which means that the neighborhood,

16:28.690 --> 16:33.580
Edwards, is significantly different from the overall population.
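
NOTE
The walkthrough uses ztest from statsmodels. The sketch below instead computes a one-sample z statistic by hand with NumPy and SciPy, which should correspond to the same calculation. All prices are synthetic, and the names neighborhood, pop_mean and std_err are illustrative assumptions.
```python
import numpy as np
from scipy import stats
# Synthetic data: a cheaper neighbourhood inside a larger population.
rng = np.random.default_rng(1)
population = rng.normal(loc=180000, scale=79000, size=1460)
neighborhood = rng.normal(loc=128000, scale=43000, size=100)
pop_mean = population.mean()
# z = (sample mean - reference mean) / standard error of the sample mean
std_err = neighborhood.std(ddof=1) / np.sqrt(len(neighborhood))
z_stat = (neighborhood.mean() - pop_mean) / std_err
p_value = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value
print(z_stat, p_value)
```
A large negative z with a tiny p-value indicates the neighbourhood mean sits well below the reference mean.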

16:34.630 --> 16:41.440
Now, let us perform the same thing for another neighborhood, so we can pick the entire thing from here

16:42.730 --> 16:45.910
and we can replace it with another neighborhood name.

16:49.870 --> 16:52.900
Let's take OldTown for now.

17:06.560 --> 17:08.280
And run this.

17:09.180 --> 17:12.180
So here again, the p-value is very small.

17:13.440 --> 17:27.000
So you can easily see that the mean for the location Edwards and the entire population is different.

17:27.420 --> 17:34.350
And similarly, the mean for the location OldTown is also significantly different from the entire population.

17:35.430 --> 17:38.340
So this is the data which we have.

17:38.340 --> 17:43.680
So we are just trying to see that the means would be different for both for the sample that we have

17:43.680 --> 17:46.740
picked for a particular neighborhood and for the entire population.

17:49.620 --> 17:56.490
Now, when we look at this data, we can easily see that the neighborhood is having a very important

17:56.490 --> 17:58.260
impact on the sale prices.

17:58.590 --> 18:03.570
So in one neighborhood, there would be one type of sale prices present;

18:03.780 --> 18:04.800
in another neighborhood,

18:04.800 --> 18:08.610
there would be some other type of sale prices present.

18:09.060 --> 18:13.470
Now, further, here we have compared with the population.

18:13.770 --> 18:20.630
We can also compare the sale prices and different factors amongst the neighborhoods.

18:20.640 --> 18:25.130
In that case, there will be two different datasets.

18:25.210 --> 18:29.700
One would be for, for example, neighborhood X and the other would be for neighborhood Y.

18:30.090 --> 18:33.300
So here we are comparing two different samples.

18:33.810 --> 18:39.600
So based on two different samples, we can try to find out if those two samples are somewhat similar

18:39.600 --> 18:40.680
to each other or not.

18:41.130 --> 18:43.020
But that is also another thing which we can do.

18:43.320 --> 18:49.320
But whenever we are dealing with two different samples and not comparing with the population, that

18:49.320 --> 18:55.530
is the time when the other tests come into the picture, like the t-tests and chi-square tests.

18:55.800 --> 18:57.450
Those tests will come into picture.

18:57.750 --> 19:04.410
When we are talking about the population, we will try to solve it using the z-test, because the z-test takes

19:04.790 --> 19:06.630
the population into consideration.

19:08.370 --> 19:15.060
Now, let's try to identify how many houses would I be able to get if I had a certain amount of money.

19:16.230 --> 19:24.510
So let's first find out the details about the neighborhood Edwards.

19:28.520 --> 19:36.140
So here, let me get the data for Edwards, and for the same

19:37.190 --> 19:41.570
I will find out the mean value and standard deviation.

19:42.380 --> 19:43.340
So the means

19:46.970 --> 19:47.480
to me

19:53.420 --> 19:58.250
and it would be the standard deviation.

19:59.160 --> 19:59.360
OK.

20:02.690 --> 20:07.550
Now, let us import the stats package from

20:11.650 --> 20:12.400
SciPy.

20:15.050 --> 20:22.130
Usually what we do is we make all imports at the top of the file, so that if someone else is looking

20:22.130 --> 20:25.580
at our code, they'll be able to find out which libraries are imported.

20:25.880 --> 20:30.060
So usually, take this as an example for your code.

20:30.620 --> 20:34.360
So let us find out the z-score.

20:34.370 --> 20:41.990
So here, as we know, the z-score is equal to the value,

20:42.170 --> 20:49.160
so let's say I want to find out how many houses I will be able to get if I pay, let's say, 250,000.

20:50.760 --> 20:53.120
Then I subtract the mean from it

20:56.600 --> 21:00.920
and I divide it by the standard deviation.

21:02.720 --> 21:06.710
And I want to find out later the percentage.

21:07.310 --> 21:17.480
So the percentage can be found from stats.norm.cdf. The CDF will give you the

21:17.960 --> 21:23.350
probability, the percentage probability from the Z score.

21:23.840 --> 21:25.700
So we put the z-score in it.

21:33.170 --> 21:40.870
So here there is a 99.75 percent probability that you will get a house in Edwards

21:40.890 --> 21:43.640
if you are paying 250 thousand.

21:44.540 --> 21:47.540
Now let's decrease it a little bit.

21:47.540 --> 21:49.400
So let's say instead of 250 thousand,

21:49.400 --> 21:51.650
I'm willing to pay only 180 thousand.

21:52.250 --> 21:54.860
So let's see how much is the probability.

21:55.100 --> 22:02.210
So there is an 88.46 percent probability that I will be able to find a good house for myself.
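
NOTE
A sketch of the z-score and CDF step, assuming SciPy. The mean and standard deviation below are hypothetical placeholders rather than the values computed on screen, and share_at_or_below is an illustrative helper name.
```python
from scipy import stats
# Assumed (hypothetical) mean and standard deviation for one neighbourhood.
mean_price = 128000.0
std_price = 43000.0
def share_at_or_below(budget):
    """Normal-model share of houses priced at or below the budget."""
    z_score = (budget - mean_price) / std_price
    return stats.norm.cdf(z_score)
p_250k = share_at_or_below(250000)
p_180k = share_at_or_below(180000)
print(p_250k, p_180k)
```
Strictly, the CDF gives the fraction of a normal distribution lying at or below the budget, so it is only as good as the assumption that prices in the neighbourhood are roughly normal.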

22:03.390 --> 22:09.620
Now, let's do the same and check for some other neighborhood.

22:10.130 --> 22:12.380
So let's have a look at

22:13.640 --> 22:14.030
SawyerW.

22:19.220 --> 22:27.110
I see there are very few houses there, so let's check what the stats are for that particular place.

22:27.680 --> 22:33.200
So let's first check for 250 thousand and run this.

22:33.920 --> 22:36.510
So here you can see there is an 87.2

22:36.530 --> 22:37.390
percent chance

22:37.750 --> 22:40.320
that I'll get a house in SawyerW.

22:40.730 --> 22:44.420
So this means that the amount would have been higher.

22:44.420 --> 22:45.570
The range would have been higher.

22:45.590 --> 22:49.670
So let's try to look at the mean for this particular location.

22:50.330 --> 22:58.850
So the mean is 186,555. Let's get the maximum value.

23:10.200 --> 23:12.620
See, here is a higher value.

23:12.920 --> 23:14.330
So let's put.

23:17.510 --> 23:25.550
So even after paying a much higher amount in comparison to Edwards, there is a comparatively lower probability

23:25.550 --> 23:27.200
that I'll be able to get a house here.

23:27.560 --> 23:38.630
So as we see, Edwards and SawyerW have comparatively much difference in the max value and the values which

23:38.630 --> 23:47.000
I'm getting from the z-score. Let's try to find out from the samples of both if these would be actually

23:47.000 --> 23:51.350
different from each other or are similar in nature.

23:51.560 --> 23:59.410
So if I take a sample from SawyerW and if I take a sample from Edwards, let's try to find out if

23:59.690 --> 24:03.970
the prices are significantly different in Edwards

24:03.970 --> 24:04.010
and

24:04.420 --> 24:05.780
SawyerW. So let's try that.

24:10.220 --> 24:16.130
So for that, first, we need to get the data for both the locations.

24:16.520 --> 24:23.410
So let's get it inside a and b.

24:24.770 --> 24:30.050
So we say a is equal to the data for SawyerW

24:39.620 --> 24:44.940
and b is equal to the data for Edwards.

24:52.340 --> 24:57.450
Now, let's do a t-test on this, the independent t-test.

24:58.560 --> 25:03.870
So there is stats.ttest_ind, which is for independent samples.

25:04.640 --> 25:06.620
And we pass a and b in here.

25:07.250 --> 25:13.630
And we give axis as zero.

25:13.750 --> 25:16.010
We also set equal_var.

25:18.200 --> 25:19.040
And let's run this.

25:21.830 --> 25:24.620
You can see that the P value is very low.

25:25.070 --> 25:31.370
Hence, we can say that the prices in SawyerW and in Edwards are significantly different.
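
NOTE
A minimal sketch of the independent two-sample t-test with SciPy, on synthetic prices for two neighbourhoods. The transcript does not make clear which equal_var value was used, so equal_var=False (Welch's t-test) is an assumption here.
```python
import numpy as np
from scipy import stats
# Synthetic sale prices for two neighbourhoods (made-up means and spreads).
rng = np.random.default_rng(2)
a = rng.normal(loc=186000, scale=56000, size=59)
b = rng.normal(loc=128000, scale=43000, size=100)
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(a, b, axis=0, equal_var=False)
print(t_stat, p_value)
```
Welch's variant tends to be the safer default when the two groups differ in size and spread, as neighbourhood samples often do.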

25:31.790 --> 25:42.350
Now, let's actually try to find out that with how much amount I will be able to be 90 percent sure

25:42.350 --> 25:51.650
that I will get a house in, let's say, Edwards or OldTown or any other location.

25:51.650 --> 25:53.150
So let's try to find that out.

25:53.810 --> 25:56.330
So let's do it below this one also.

25:58.400 --> 26:03.560
So, yes, let's use these same means and standard deviations.

26:04.460 --> 26:06.650
So now let's check for SawyerW.

26:06.920 --> 26:11.810
So let's see with how much amount, in what interval,

26:12.050 --> 26:16.370
I will be almost 90 percent sure that I will be able to get a house there.

26:16.370 --> 26:16.490
And.

26:17.600 --> 26:27.950
So for that, I can simply go with stats.norm.interval.

26:27.960 --> 26:31.240
So here we need to provide how much percentage we want to be sure with,

26:31.350 --> 26:34.110
how much probability we want to be sure with.

26:34.530 --> 26:35.910
So let's say 90.

26:38.970 --> 26:41.710
Then we need to provide the details.

26:42.240 --> 26:46.610
So let's say we're looking for SawyerW.

26:46.830 --> 26:49.660
So let's get the sale prices for SawyerW.

26:52.020 --> 26:53.940
Then we need to provide the mean.

26:56.850 --> 27:00.390
And the scale would be the standard deviation.

27:03.150 --> 27:08.120
Now, we can simply run this.

27:08.190 --> 27:13.650
And these are the different values which we are getting.

27:17.720 --> 27:21.860
We have to provide the lens here, so we say, and then

27:26.450 --> 27:29.720
this will give us the DNA P-value.

27:30.200 --> 27:33.080
So here.

27:36.570 --> 27:40.180
You can see the lower and the upper bound.

27:40.830 --> 27:48.540
So we need to have at least around 279 thousand to be sure that there is a 90 percent probability

27:48.540 --> 27:51.030
of getting the house here.
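
NOTE
A sketch of the interval step using stats.norm.interval from SciPy. The mean and standard deviation are hypothetical placeholders, so the bounds will not match the figures read out above.
```python
from scipy import stats
# Assumed (hypothetical) mean and standard deviation of one neighbourhood's prices.
mean_price = 186000.0
std_price = 56000.0
# Central 90% interval of a normal distribution with that mean and spread.
lower, upper = stats.norm.interval(0.90, loc=mean_price, scale=std_price)
print(round(lower), round(upper))
```
Note that the upper bound of a central 90 percent interval is the 95th percentile. If the goal is a budget that covers 90 percent of houses, stats.norm.ppf(0.90, loc=mean_price, scale=std_price) may be the more direct figure.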

27:51.300 --> 27:53.520
So let's look at some other

27:56.580 --> 27:57.440
neighborhood.

27:57.610 --> 28:00.810
So let's check for Edwards.

28:01.260 --> 28:02.670
Let's run this entire thing.

28:02.730 --> 28:03.720
for Edwards.

28:11.110 --> 28:13.360
So we need to get assistance.

28:14.550 --> 28:24.580
Let's run this. So for 90 percent probability, we need to have around

28:27.130 --> 28:28.300
200000.

28:29.590 --> 28:36.730
Let's go further and let's try to look at it in some other way.

28:37.300 --> 28:50.050
So we actually saw that when we are comparing the neighborhood Edwards with the actual sale price

28:50.050 --> 28:53.920
of the entire data, it is significantly different.

28:54.550 --> 28:57.590
The same is the case for OldTown.

28:57.970 --> 29:00.320
So for both, it is significantly different.

29:00.340 --> 29:07.010
And we found that using the z-test. Now, the z-test is usually used when we are comparing with the

29:07.510 --> 29:09.760
population values.

29:10.000 --> 29:13.540
And then we don't really have these samples in hand.

29:13.870 --> 29:20.530
But when we're dealing with samples, then in that case, instead of using the z-test, we should

29:20.530 --> 29:22.680
have used the one-sample t-test.

29:23.050 --> 29:25.360
So let's apply the same.

29:25.360 --> 29:33.760
Let's try to find out if the sample for neighborhood Edwards or neighborhood OldTown

29:33.940 --> 29:38.850
is significantly different from the entire population mean.

29:39.070 --> 29:40.630
So let's try to do that.

29:40.630 --> 29:42.190
Let's try to find that out.

29:46.420 --> 29:49.720
So let's take this from here.

29:51.310 --> 29:54.100
Let's get this news.

29:57.070 --> 30:00.060
So now we are applying the one-sample t-test.

30:07.180 --> 30:12.550
So let's get the t-score and the p-value.

30:14.230 --> 30:27.490
And this we will be getting from stats.ttest_1samp, the t-test for one sample.

30:28.300 --> 30:30.040
Then we get the data.

30:30.040 --> 30:34.270
So the data would be just the same from here.

30:36.790 --> 30:38.680
Let's check for Edwards.

30:42.260 --> 30:45.140
And this will be the population mean.

30:53.650 --> 30:55.780
Now, this is popmean.

30:59.540 --> 31:05.630
And here we have the data, so instead of providing the entire population, we'll have to provide the

31:05.660 --> 31:06.350
sample here.

31:06.740 --> 31:16.220
So let's take a sample of, say, 60 values, taking a sample of 60 values.

31:17.000 --> 31:17.630
Now

31:20.150 --> 31:24.800
let's get the Peace Corps and the P-value.

31:46.550 --> 31:51.590
So here you can see again, the P value is significantly low.

31:51.980 --> 31:58.670
Hence, we can easily see the same thing, that the neighborhood Edwards is significantly different from

31:58.670 --> 32:01.940
the population on the basis of the one-sample t-test.
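
NOTE
A minimal sketch of the one-sample t-test with stats.ttest_1samp, assuming SciPy. The population and the 60-value sample are synthetic stand-ins for the SalePrice column and the neighbourhood sample.
```python
import numpy as np
from scipy import stats
# Synthetic population and a 60-value sample from a cheaper neighbourhood.
rng = np.random.default_rng(3)
population = rng.normal(loc=180000, scale=79000, size=1460)
sample = rng.normal(loc=128000, scale=43000, size=60)
t_score, p_value = stats.ttest_1samp(sample, popmean=population.mean())
print(t_score, p_value)
```
The test compares the sample mean with a single fixed number, so only the population mean is passed in, not the population values themselves.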

32:02.240 --> 32:07.560
Same thing we can do repetitively for different neighborhoods.

32:07.940 --> 32:10.370
Let's pick a different neighborhood.

32:10.370 --> 32:12.300
So let's see.

32:13.730 --> 32:22.630
For now, let's pick this one, and instead of 60, let's check how many values are here.

32:23.600 --> 32:24.710
What is the actual count?

32:25.490 --> 32:27.260
That's 51, so let's put 50.

32:28.370 --> 32:35.390
So here you can see here the P value is zero point zero two five three.

32:35.720 --> 32:39.920
So this is not much different.

32:40.400 --> 32:44.300
But yes, it is marginally at the exact point.

32:44.540 --> 32:47.780
So it completely depends on the significance level which you are looking at.

32:48.080 --> 32:54.080
So if you're looking at, let's say, one percent, then it is not significantly different.

32:54.380 --> 32:59.930
If you are looking at 2.5 or 2.5 percent, then it is almost that.

33:00.200 --> 33:03.480
So this is completely up to you how you look at it.

33:03.510 --> 33:09.740
And again, if you're looking at, let's say, a five percent significance level, then the p-value is below

33:09.740 --> 33:12.860
five percent,

33:12.860 --> 33:15.290
so you can say that it is significantly different.

33:15.620 --> 33:20.750
So that's how you can easily evaluate using the one-sample t-test.

33:20.990 --> 33:24.650
So it completely depends on the significance level that you're looking at.

33:26.030 --> 33:31.880
Now, let's look at this particular location.

33:33.650 --> 33:36.600
You can do your own analysis.

33:36.620 --> 33:42.020
I'm just giving you an idea of how you would perform an analysis, how you would derive an insight.

33:42.410 --> 33:46.190
Now, let's take a sample of, say, 60 values.

33:47.810 --> 33:54.290
Here you can see this is very clearly visible, that there is a significant difference between the mean

33:54.290 --> 33:55.730
of the population and this neighborhood.

33:56.030 --> 33:58.520
So that's how you get an insight out of it.

33:59.090 --> 34:03.260
Now, further, we can have a look at something different.

34:04.040 --> 34:10.010
We have been comparing a particular neighborhood with the entire population.

34:10.310 --> 34:18.050
Now, in case we want to compare one neighborhood with another neighborhood, here, you can see this

34:18.140 --> 34:23.570
is how we do it with the independent t-test, to compare two different neighborhoods.

34:23.900 --> 34:29.780
So this is what we have under the t-test umbrella.

34:30.230 --> 34:33.800
We have the independent t-test.

34:33.800 --> 34:36.380
We have the one-sample t-test.

34:36.650 --> 34:39.740
So you can perform these t-tests like this.

34:40.070 --> 34:43.020
Then apart from that, we also have the paired t-test.

34:43.040 --> 34:44.810
So then we have the paired t-test.

34:44.840 --> 34:51.980
It has to be done when we have some particular data in which the same sample is being tested in

34:51.980 --> 34:53.030
another time frame.

34:53.390 --> 35:00.980
So in that case, in this particular data, we don't have that example, but you can surely use that

35:00.980 --> 35:03.450
in any other case from any other dataset.

35:03.460 --> 35:04.970
So that is completely up to you.

35:06.560 --> 35:12.740
But just to give you a particular insight, let me try to give you an example from this particular data.

35:13.190 --> 35:18.330
So let's take YearBuilt and get the values for it.

35:25.250 --> 35:27.730
Now, the t the values in it.

35:27.770 --> 35:29.420
So these are the different values.

35:29.780 --> 35:32.960
Now, let me plot these for you in the same plot.

35:35.540 --> 35:39.350
So this might take a little time to generate the plot.

35:39.680 --> 35:43.000
So here you can see the plot for YearBuilt.

35:43.400 --> 35:48.740
Now, when you see this, you can see that there are few houses which were built earlier, and the majority

35:48.740 --> 35:51.920
are houses which are very new here.

35:52.460 --> 35:57.010
So let's try to compare the houses which were built before

35:57.780 --> 36:01.500
1960 and the houses which were built after 1960.

36:02.020 --> 36:07.550
So let's do that and compare using the independent t-test.

36:08.190 --> 36:13.530
So let's see A and B and let it be.

36:14.560 --> 36:16.590
Now let's get this condition.

36:16.950 --> 36:17.790
So.

36:26.700 --> 36:29.120
Let's get the sale price from the data.

36:32.550 --> 36:34.980
Now, we need to compare on YearBuilt,

36:39.120 --> 36:50.130
and let's say we're looking for less than or equal to 1960, the sale price for that.

36:51.120 --> 36:57.480
And the ones which were built after 1960, not including 1960.

36:58.230 --> 37:00.600
These are the two which we wanted to compare.

37:00.930 --> 37:11.850
Now, further, let's get the t-stat and the p-value from stats.ttest_ind, the t-test for independent samples.

37:12.570 --> 37:28.320
And we provide a and b, and we say axis is equal to zero, and we also set equal_var.

37:29.700 --> 37:31.140
Now let's run this.

37:32.970 --> 37:36.320
So here you can see the P value is very, very, very low.

37:36.330 --> 37:43.620
So which means that there is a very highly significant difference between the sale prices of the houses built

37:43.950 --> 37:45.510
before and after 1960.
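
NOTE
A sketch of the before-and-after split on YearBuilt followed by the independent t-test, assuming pandas and SciPy. The frame is synthetic, with newer houses deliberately priced higher, and equal_var=False is an assumption since the spoken value is unclear.
```python
import numpy as np
import pandas as pd
from scipy import stats
# Synthetic frame: newer houses are made more expensive on purpose.
rng = np.random.default_rng(4)
year_built = rng.integers(1900, 2011, size=500)
sale_price = 100000 + 1000 * (year_built - 1900) + rng.normal(0, 20000, size=500)
data = pd.DataFrame({"YearBuilt": year_built, "SalePrice": sale_price})
a = data.loc[data["YearBuilt"] <= 1960, "SalePrice"]  # built up to and including 1960
b = data.loc[data["YearBuilt"] > 1960, "SalePrice"]   # built after 1960
t_stat, p_value = stats.ttest_ind(a, b, axis=0, equal_var=False)
print(len(a), len(b), p_value)
```
Changing the 1960 cutoff to 1980 repeats the comparison for a different split without touching the rest of the code.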

37:45.550 --> 37:53.850
Now, let's do the same comparison on the houses which were built after, let's say, 1980.

37:55.990 --> 37:58.350
Let's compare on the basis of 1980.

37:58.980 --> 38:04.200
So here again, the prices are very different.

38:04.860 --> 38:07.500
Now, let us try for 1980.

38:18.190 --> 38:18.910
Same thing.

38:19.300 --> 38:25.830
So here you can see the result now.

38:29.260 --> 38:41.860
Let us move to the chi-square umbrella of tests. So let us check if the different land contours are independent,

38:42.250 --> 38:51.670
so let us perform a test to check if knowing a particular land contour will have some impact.

38:52.150 --> 38:57.970
So for that, we will have to divide the sale price into different categories.

38:57.970 --> 39:04.600
So we'll divide the sale price into high, medium and low sale prices, and then we will compare

39:04.600 --> 39:07.450
the different land contours with each other.

39:07.780 --> 39:09.370
So let's do that.

39:09.640 --> 39:14.050
So for that, let me simply create.

39:14.260 --> 39:20.890
Let me, first of all, import stats, but we already have stats.

39:21.250 --> 39:22.780
So that is perfectly fine.

39:23.200 --> 39:35.020
So let us write a very small function, which will basically compute the frequencies for the chi-square test.

39:35.260 --> 39:40.360
And then we can perform the chi-square test using those frequencies.

39:40.660 --> 39:46.570
So let me just write that function for you and then explain it to you.

39:48.760 --> 39:51.730
So here is the function that is very simple in nature.

39:52.060 --> 39:56.050
So I have defined this function, compute frequency, chi square.

39:56.380 --> 40:01.030
So what it does is it will simply take the two columns which you are trying to compare.

40:01.360 --> 40:05.530
So I am providing the X and Y here.

40:05.950 --> 40:10.420
So in actuality, we are comparing land contour and sale price.

40:11.080 --> 40:13.510
So now let's get this.

40:13.810 --> 40:21.610
Now what we do is we just create the crosstabs of these frequencies.

40:22.090 --> 40:30.070
And using that particular crosstab, we will simply apply the chi-square test for contingency on

40:30.070 --> 40:30.730
top of it.

40:31.180 --> 40:32.560
And using this chi-square

40:32.580 --> 40:33.850
test for contingency,

40:33.860 --> 40:41.110
we will be able to find out if land contour actually has an impact on the sale prices or not.
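
NOTE
A function along the lines described above might look like this. It is a sketch under stated assumptions: the name compute_freq_chi2, the synthetic data, the made-up group means and the PriceBand labels are illustrative and not taken from the notebook. pandas.crosstab builds the frequency table and scipy.stats.chi2_contingency runs the test of independence on it.
```python
import numpy as np
import pandas as pd
from scipy import stats
def compute_freq_chi2(x, y):
    """Cross-tabulate two categorical series and run a chi-square test of independence."""
    freq_table = pd.crosstab(x, y)
    chi2, p_value, dof, expected = stats.chi2_contingency(freq_table)
    return freq_table, chi2, p_value
# Synthetic stand-in data using the four contour labels named in the talk.
rng = np.random.default_rng(1)
contour = pd.Series(
    rng.choice(["Bnk", "HLS", "Low", "Lvl"], size=400, p=[0.1, 0.1, 0.1, 0.7]),
    name="LandContour",
)
# Assumption for illustration only: each contour gets its own made-up average price.
base = contour.map({"Bnk": 140_000, "HLS": 230_000, "Low": 200_000, "Lvl": 180_000})
price = base + rng.normal(0, 30_000, size=400)
# Bin the numeric price into three equally populated bands.
price_band = pd.qcut(price, 3, labels=["Low", "Medium", "High"]).rename("PriceBand")
table, chi2, p_value = compute_freq_chi2(contour, price_band)
print(table)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
```
A small p-value here means the table is unlikely under independence, so the price band and the contour are treated as related.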

40:42.100 --> 40:45.040
So let's simply run this now.

40:48.490 --> 40:54.670
So here I have run this, and you can see it simply created the crosstab.

40:54.670 --> 40:57.340
And then I simply printed the crosstab for you.

40:57.700 --> 40:59.680
This is the cross tab which was generated.

40:59.680 --> 41:06.700
So you can see for our land contour, there are four land contours: Bnk, HLS, Low and Lvl.

41:07.090 --> 41:12.830
And then the sale prices are divided into high, medium and low using pd dot qcut.

41:13.300 --> 41:21.940
qcut is basically a pandas function which cuts the data, or you can say divides the data, into equal

41:21.940 --> 41:22.450
quantiles.

41:22.490 --> 41:30.160
So when I say three, when I pass three to qcut, it creates three equal regions, or

41:30.160 --> 41:36.970
three equally distributed categories for me, based on the numerical values present in the sale price.
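
NOTE
To make the equal-sized binning concrete, here is a tiny example with nine made-up prices. The values and the Low/Medium/High labels are assumptions chosen for illustration; pandas.qcut with q=3 places the cut points at the one-third and two-thirds quantiles.
```python
import pandas as pd
# Nine made-up prices, purely for illustration.
prices = pd.Series([100, 120, 130, 150, 160, 170, 200, 250, 400])
bands = pd.qcut(prices, 3, labels=["Low", "Medium", "High"])
counts = bands.value_counts().sort_index()
print(counts)
# Each band holds 3 of the 9 values, even though the price ranges differ in width.
```
By contrast, pandas.cut splits the value range into equal-width intervals, so the number of rows per bin can differ a lot.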

41:38.170 --> 41:42.070
So here I have the chi-square test statistic.

41:42.070 --> 41:46.360
So it comes out to be twenty six and the P value comes out to be a very low value.

41:46.630 --> 41:48.040
It is almost zero here.

41:48.280 --> 41:54.340
So this shows that there is a very significant difference between the sale prices with respect to the

41:54.340 --> 41:57.550
different land contours, which are themselves independent categories.

41:58.030 --> 42:00.730
So this is what we get here.

42:00.880 --> 42:09.100
We get to know that there is a lot of dependence between the land contours and their corresponding sale

42:09.100 --> 42:11.650
prices.

42:11.710 --> 42:17.920
If we look at this particular p-value, we get to know that, when we are performing the chi-square test,

42:18.310 --> 42:20.980
we have this very small p-value.

42:21.250 --> 42:26.530
So it also confirms the relationship between the land contour and the sale price.

42:26.860 --> 42:32.430
Now, you can see the land contour is having a very high impact on the sale prices.

42:32.860 --> 42:33.220
Right.

42:33.370 --> 42:40.090
When the land contour is level, the sale prices are comparatively higher.

42:40.420 --> 42:42.370
When the land contour is low,

42:42.550 --> 42:44.890
You can see the sales prices are very low.

42:45.490 --> 42:54.300
So this shows that there is a very important dependency of sale prices on the land contour.

42:54.550 --> 43:03.280
So although the land contours internally are independent in nature, the sale prices are highly dependent

43:03.280 --> 43:04.870
on the land contour.

43:05.140 --> 43:10.140
So basically, land contour is a very important feature to identify the sale prices.

43:10.410 --> 43:16.600
This is what you will find out using the chi-square test: whether the sale price is dependent

43:16.600 --> 43:18.130
on the land contour or not.

43:18.400 --> 43:27.430
So when I say that the p-value is very low, that means that the sale price is dependent on the land

43:27.430 --> 43:28.060
contour.

43:30.620 --> 43:34.420
Very high now.

43:34.550 --> 43:44.480
It simply shows that the two variables, sale price and land contour, are not independent.

43:44.720 --> 43:50.510
They are highly dependent, because price is very highly dependent on the land

43:50.510 --> 43:51.070
contour.

43:53.780 --> 43:58.010
Now, let us perform the one-way analysis of variance, which is ANOVA.

43:58.340 --> 44:05.000
It is used to determine whether there are any statistically significant differences between the

44:05.000 --> 44:08.270
means of three or more independent, or unrelated, groups.

44:08.720 --> 44:11.720
So again, let's check for the same thing.

44:11.720 --> 44:13.550
We'll check for the land contour.

44:13.880 --> 44:18.620
So the land contours are Bnk, HLS, Low and Lvl.

44:18.950 --> 44:21.440
They are significantly different from each other.

44:21.470 --> 44:23.240
There are different categories.

44:23.510 --> 44:25.790
So these are independent and unrelated groups.

44:25.790 --> 44:32.570
So we can actually check if the sales prices for these are different internally or not.

44:32.960 --> 44:41.390
So with the chi-square test we were able to see that they had an impact on the sale prices,

44:41.390 --> 44:43.700
a significant difference with respect to sale prices.

44:44.660 --> 44:46.760
But we were not able to compare them.

44:47.270 --> 44:49.400
So let's compare them among themselves.

44:51.290 --> 44:53.330
So let's check again.

44:53.630 --> 45:00.440
Now, here, each group which we will be considering has to have different participants;

45:00.440 --> 45:02.240
each group should have different members.

45:02.810 --> 45:09.800
So when we are checking the sale prices, we are looking at the sale prices of these different locations.

45:09.800 --> 45:12.950
So the houses will be different for each location.

45:13.250 --> 45:14.750
So the condition is perfect.

45:15.350 --> 45:20.460
Now, next, basically what we are trying to find out

45:20.640 --> 45:28.280
when we perform the one-way ANOVA: the one-way ANOVA test will actually tell us if these are

45:28.280 --> 45:29.180
different or not.

45:29.600 --> 45:36.980
So we will be comparing multiple categories here and finding out whether any one is actually significantly

45:36.980 --> 45:40.520
different from the other categories, or whether all are similar.

45:41.090 --> 45:48.800
So if even one of the categories is significantly different, there will be a significant value

45:48.800 --> 45:49.550
coming out.

45:49.910 --> 45:53.780
So first, let's do the group-by for this.

45:54.080 --> 45:56.960
And we'll also plot the box plot for the same.

45:57.260 --> 46:01.910
I'm grouping the data by land contour and getting the sale prices.

46:02.240 --> 46:06.070
And later, I'm plotting land contour versus sale prices.

46:06.080 --> 46:07.190
So let's try this.

46:07.520 --> 46:14.300
So here you can see I'm getting the land contours and their respective count, mean, standard deviation

46:14.300 --> 46:15.440
and the maximum value.
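
NOTE
The per-group summary mentioned above can be produced with groupby and describe. This sketch uses synthetic data with deliberately unequal group sizes; the column names LandContour and SalePrice and all numbers are assumptions for illustration, not values from the real dataset.
```python
import numpy as np
import pandas as pd
rng = np.random.default_rng(2)
# Synthetic stand-in data with deliberately unequal group sizes.
sizes = {"Bnk": 60, "HLS": 50, "Low": 40, "Lvl": 300}
df = pd.DataFrame({
    "LandContour": np.repeat(list(sizes), list(sizes.values())),
    "SalePrice": rng.normal(180_000, 40_000, size=sum(sizes.values())),
})
# describe() returns count, mean, std, min, quartiles and max for each group.
summary = df.groupby("LandContour")["SalePrice"].describe()
print(summary[["count", "mean", "std", "50%", "75%", "max"]])
# A box plot of the same split would be df.boxplot(column="SalePrice", by="LandContour"),
# which needs matplotlib installed.
```
Reading the count column first is useful, because very uneven group sizes affect how much weight each group mean deserves.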

46:15.770 --> 46:19.930
Here you can see the mean for Bnk is 143104.

46:19.970 --> 46:22.040
Twenty three, twenty nine, eighteen.

46:22.510 --> 46:29.510
The maximum values are not much different, but for Lvl the max value is quite high.

46:29.870 --> 46:39.960
The 75th percentile values are somewhat similar for all; the 50th percentile value is again higher for the Low one.

46:39.980 --> 46:42.740
So these are the details which we are getting from this.

46:43.040 --> 46:48.260
And the total count is also different: for Bnk, HLS

46:48.260 --> 46:52.070
and Low, we have a smaller number of houses.

46:52.310 --> 46:54.890
For Lvl, we have a higher number of houses.

46:56.300 --> 46:59.150
So now let's look further.

46:59.510 --> 47:03.770
So we'll run stats dot f_oneway for the one-way ANOVA.

47:04.070 --> 47:06.680
So here when we run this, we are checking.

47:06.950 --> 47:07.820
We are running it.

47:07.820 --> 47:16.340
We are comparing the sale prices for land contour Lvl, for Bnk, for Low and for HLS.

47:16.820 --> 47:18.200
Now, let's run this.

47:18.920 --> 47:25.250
So when we run this, we get the p-value as a very low value, which means that at least one

47:25.250 --> 47:29.570
of the classes is significantly different from the other three.

47:29.870 --> 47:31.880
So at least one of them is different.
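
NOTE
A minimal sketch of the one-way ANOVA call, assuming scipy.stats.f_oneway is the function being run. The four arrays are synthetic, with made-up means and the Bnk group set apart on purpose, so the output does not reproduce the notebook's numbers.
```python
import numpy as np
from scipy import stats
rng = np.random.default_rng(3)
# Synthetic sale prices per contour; Bnk is given a lower mean on purpose.
groups = {
    "Lvl": rng.normal(180_000, 30_000, size=300),
    "Bnk": rng.normal(140_000, 30_000, size=60),
    "Low": rng.normal(185_000, 30_000, size=40),
    "HLS": rng.normal(182_000, 30_000, size=50),
}
# f_oneway takes one array per group and tests whether all group means are equal.
f_stat, p_value = stats.f_oneway(groups["Lvl"], groups["Bnk"], groups["Low"], groups["HLS"])
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```
A low p-value only says that not all means are equal. It does not say which group differs, which is why pairwise follow-up comparisons are needed.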

47:32.150 --> 47:37.900
So what you can do is compare these with one another and then find out

47:38.210 --> 47:40.370
which one differs even further here.

47:40.380 --> 47:43.760
And you can see that the counts are different.

47:44.030 --> 47:49.310
So let me take a sample of thirty-five from each one of these.

47:50.090 --> 47:53.420
So let's take a sample of thirty five from each one of these.

47:54.290 --> 47:56.180
So let's get this again.

47:59.000 --> 48:02.870
And the code would go something

48:06.870 --> 48:07.460
like this.

48:08.740 --> 48:12.530
They've done this for.

48:29.420 --> 48:36.890
Here you see the p-value is still coming out to be very low, which simply means that there is at least

48:36.890 --> 48:42.440
one category which has a significant difference from the other three categories.
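
NOTE
Drawing an equal-sized sample of 35 from each contour and re-running the ANOVA could look like this. The data, group sizes and means are synthetic assumptions; DataFrameGroupBy.sample (pandas 1.1 or later) is one way to do the sampling and may differ from what the notebook uses.
```python
import numpy as np
import pandas as pd
from scipy import stats
rng = np.random.default_rng(4)
sizes = {"Bnk": 60, "HLS": 50, "Low": 40, "Lvl": 300}
means = {"Bnk": 140_000, "HLS": 230_000, "Low": 200_000, "Lvl": 180_000}  # made-up means
df = pd.DataFrame({"LandContour": np.repeat(list(sizes), list(sizes.values()))})
df["SalePrice"] = df["LandContour"].map(means) + rng.normal(0, 30_000, size=len(df))
# Draw 35 rows per contour without replacement so every group has the same size.
balanced = df.groupby("LandContour").sample(n=35, random_state=0)
samples = [g["SalePrice"].to_numpy() for _, g in balanced.groupby("LandContour")]
f_stat, p_value = stats.f_oneway(*samples)
print(balanced["LandContour"].value_counts())
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```
Equal group sizes make the F-test less sensitive to unequal variances, at the cost of discarding most of the largest group.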

48:42.740 --> 48:46.160
Now, this actually calls for further analysis.

48:46.460 --> 48:53.150
So what you will be doing is comparing Lvl with Bnk, Lvl with Low and Lvl with HLS.

48:53.450 --> 48:58.490
And then you will find out which one is actually different so that you can get an idea what you will

48:58.490 --> 48:58.880
be doing.
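
NOTE
The pairwise follow-up is not shown in the talk, so the following is only one possible way to do it. It assumes Welch's two-sample t-test for each pair and a Bonferroni-adjusted threshold, run on synthetic data with made-up group means.
```python
import numpy as np
from scipy import stats
rng = np.random.default_rng(5)
groups = {
    "Lvl": rng.normal(180_000, 30_000, size=300),
    "Bnk": rng.normal(140_000, 30_000, size=60),
    "Low": rng.normal(181_000, 30_000, size=40),
    "HLS": rng.normal(230_000, 30_000, size=50),
}
pairs = [("Lvl", "Bnk"), ("Lvl", "Low"), ("Lvl", "HLS")]
alpha = 0.05 / len(pairs)  # Bonferroni correction for three comparisons
results = {}
for first, second in pairs:
    # Welch's t-test (equal_var=False) does not assume equal group variances.
    t_stat, p_value = stats.ttest_ind(groups[first], groups[second], equal_var=False)
    results[(first, second)] = p_value
    print(f"{first} vs {second}: p = {p_value:.3g}, significant = {p_value < alpha}")
```
Dividing the threshold by the number of comparisons keeps the overall false-positive rate near 5 percent across all three tests.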

48:59.150 --> 49:01.340
And you can simply do that using this.

49:01.850 --> 49:04.500
So that does not involve you can easily find that out.

49:07.040 --> 49:09.230
So this is one-way ANOVA.

49:09.470 --> 49:16.310
And similarly, you can try this out for multiple different variables.

49:16.310 --> 49:18.620
So I have considered land contour.

49:18.620 --> 49:23.630
You can pick up any of these variables and then work on it and then test.