﻿1
00:00:01,024 --> 00:00:02,816
‫Now let's move on to

2
00:00:03,072 --> 00:00:05,376
the next library, which is seaborn.

3
00:00:06,912 --> 00:00:10,240
‫Seaborn is a library for data visualization

4
00:00:10,496 --> 00:00:13,824
The most commonly used library is matplotlib,

5
00:00:14,336 --> 00:00:20,480
but for our course we think seaborn is better suited, which is why we will be discussing only seaborn.

6
00:00:21,760 --> 00:00:27,392
If you want, you can learn about matplotlib on your own.

7
00:00:28,928 --> 00:00:34,560
‫First we will import the

8
00:00:45,568 --> 00:00:49,152
seaborn library. We will write import seaborn as sns.

9
00:00:49,920 --> 00:00:50,432
‫Now

10
00:00:51,200 --> 00:00:54,784
‫To plot the distribution of our age variable

11
00:00:55,296 --> 00:00:58,880
from our customer table, we will write sns

12
00:00:59,904 --> 00:01:02,720
.distplot

13
00:01:02,976 --> 00:01:06,048
In brackets, I will mention my

14
00:01:06,816 --> 00:01:09,632
column name. It is customer

15
00:01:10,912 --> 00:01:11,936
‫data 2

16
00:01:12,960 --> 00:01:13,984
‫Dot

17
00:01:14,240 --> 00:01:16,544
‫age capital A.

18
00:01:22,432 --> 00:01:26,272
‫You can see this histogram of our age variable

19
00:01:28,320 --> 00:01:29,344
It is first

20
00:01:29,600 --> 00:01:32,160
creating bins of all the ages.

21
00:01:33,184 --> 00:01:38,048
So the minimum value in our age variable is 18 and the maximum is 70.

22
00:01:38,560 --> 00:01:40,864
Python has created different bins

23
00:01:41,376 --> 00:01:43,168
between these two values,

24
00:01:43,936 --> 00:01:44,704
and then

25
00:01:44,960 --> 00:01:51,104
plotted the number of observations in those bins in the form of bars. This is called a histogram.
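
To make the binning idea concrete, here is a minimal sketch in plain Python that counts values into equal-width bins between the minimum and the maximum. The ages and the helper name `histogram_counts` are made up for illustration; seaborn chooses its own bin edges, so this shows the concept rather than its exact algorithm.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot build bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up ages, for illustration only.
ages = [18, 22, 25, 31, 38, 44, 52, 59, 63, 66, 68, 70]
edges, counts = histogram_counts(ages, 4)
print(edges)   # [18.0, 31.0, 44.0, 57.0, 70.0]
print(counts)  # [3, 2, 2, 5]
```

Each count is the height of one bar in the histogram.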

26
00:01:52,640 --> 00:01:56,736
You can see most of our customers are in this last bucket.

27
00:01:59,040 --> 00:02:04,160
This line is known as the KDE, or kernel density estimate.

28
00:02:05,440 --> 00:02:11,584
We are not going to explain how this line is derived, so

29
00:02:11,840 --> 00:02:13,632
we are just going to remove this line.

30
00:02:13,888 --> 00:02:15,168
‫To remove this line

31
00:02:15,424 --> 00:02:21,568
we'll write sns.distplot with customer data 2 dot Age

32
00:02:24,384 --> 00:02:28,224
and then kde equal to False.

33
00:02:35,904 --> 00:02:38,720
‫Now the KDE line is removed.

34
00:02:38,976 --> 00:02:40,768
‫If you want to see

35
00:02:41,536 --> 00:02:46,144
the arguments of any function, you can just write help

36
00:02:46,400 --> 00:02:49,216
and in brackets you can write that function.

37
00:02:49,472 --> 00:02:53,568
For example, in our case we'll write sns.distplot.

38
00:02:57,664 --> 00:03:01,248
‫It will show you the syntax of that function

39
00:03:01,504 --> 00:03:04,320
all the arguments that it takes,

40
00:03:04,832 --> 00:03:07,392
and the default values of those arguments.

41
00:03:07,648 --> 00:03:12,512
So by default kde was True; that's why we were getting the KDE plot.

42
00:03:13,536 --> 00:03:17,632
You can change the value to False to remove the KDE.
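
Besides reading the full `help(...)` output, a default value can be looked up programmatically. The sketch below inspects `sns.histplot`, chosen here as an assumption because it is the current replacement for the deprecated `distplot`; note that its `kde` default is `False`, unlike the older function discussed in the video.

```python
import inspect

import seaborn as sns

# help(sns.histplot) would print the full documentation;
# the signature alone is enough to read off the defaults.
signature = inspect.signature(sns.histplot)
kde_default = signature.parameters["kde"].default
print("kde default:", kde_default)
```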

43
00:03:18,144 --> 00:03:20,704
Similarly, there are other arguments as well.

44
00:03:20,960 --> 00:03:25,056
You can create a rug plot if you write rug equal to True.

45
00:03:25,312 --> 00:03:29,664
You can have a look at this and read the documentation.

46
00:03:29,920 --> 00:03:32,736
Now we will change the colour of this graph.

47
00:03:41,440 --> 00:03:44,512
You can see that color is also an argument.

48
00:03:45,024 --> 00:03:45,792
‫We will

49
00:03:46,048 --> 00:03:51,680
just write sns.distplot

50
00:04:02,432 --> 00:04:05,504
color equal to red,

51
00:04:06,784 --> 00:04:12,928
with red in double quotes. Run this and you can see the colour is now

52
00:04:13,184 --> 00:04:13,952
changed to red.

53
00:04:14,976 --> 00:04:19,071
The seaborn library comes with various datasets.

54
00:04:19,839 --> 00:04:24,703
Now we will just import one such dataset, known as Iris.

55
00:04:25,471 --> 00:04:28,543
So we'll write iris; iris is the

56
00:04:28,799 --> 00:04:32,127
name of our variable containing the Iris dataset.

57
00:04:32,383 --> 00:04:35,199
Then we'll write sns.load

58
00:04:35,455 --> 00:04:36,991
underscore dataset,

59
00:04:41,087 --> 00:04:44,415
and in brackets we'll write iris.

60
00:04:45,951 --> 00:04:47,743
‫Run this

61
00:04:49,279 --> 00:04:53,119
You can view a sample of this dataset by using iris

62
00:04:53,375 --> 00:04:54,655
.head()

63
00:05:01,055 --> 00:05:03,871
‫So our data contains

64
00:05:04,383 --> 00:05:05,663
five columns:

65
00:05:05,919 --> 00:05:08,223
sepal length, sepal width,

66
00:05:08,735 --> 00:05:10,527
petal length, petal width,

67
00:05:10,783 --> 00:05:16,927
and species. This is data on flowers, where we have sepal length, sepal width, petal

68
00:05:17,183 --> 00:05:18,207
length and petal width,

69
00:05:18,719 --> 00:05:21,279
and then the species of that flower.

70
00:05:21,535 --> 00:05:26,399
To get the number of rows and columns, we'll write iris.shape

71
00:05:28,959 --> 00:05:30,751
‫and run this

72
00:05:33,055 --> 00:05:36,639
You can see there are a total of 150 rows

73
00:05:36,895 --> 00:05:38,431
and five columns.

74
00:05:40,735 --> 00:05:46,879
If you want to get the mean, median, minimum and maximum values, we can also write iris

75
00:05:47,135 --> 00:05:47,647
.describe()

76
00:05:58,911 --> 00:06:03,007
This will show us some statistics for all these

77
00:06:03,263 --> 00:06:04,287
four columns.
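
The inspection steps above (`head`, `shape`, `describe`) can be sketched on a small frame. In the video the full dataset comes from `sns.load_dataset("iris")`, which downloads the data on first use; to keep this sketch runnable offline, it assumes a hand-typed six-row sample with the same five column names, so the shape here is (6, 5) rather than (150, 5).

```python
import pandas as pd

# Hand-typed six-row sample with the same columns as the Iris dataset.
# With a network connection: iris = sns.load_dataset("iris")
iris = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "species": ["setosa", "setosa", "versicolor",
                    "versicolor", "virginica", "virginica"],
    }
)

print(iris.head())   # first rows of the table
print(iris.shape)    # (rows, columns)

# describe() summarises only the numeric columns by default;
# the 50% row is the median.
summary = iris.describe()
print(summary)
```

Because `species` is text, it is left out of the summary, which is why only four columns appear.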

78
00:06:06,335 --> 00:06:06,847
‫Now

79
00:06:07,103 --> 00:06:08,895
we will plot a scatter plot

80
00:06:09,663 --> 00:06:12,223
between sepal length and sepal width.

81
00:06:12,735 --> 00:06:15,295
To do that, we'll write sns

82
00:06:16,575 --> 00:06:18,879
.jointplot

83
00:06:23,999 --> 00:06:25,535
‫our x variable

84
00:06:25,791 --> 00:06:31,935
should be sepal length, so I'll write sepal length,

85
00:06:33,727 --> 00:06:39,871
and the y variable is sepal width,

86
00:06:41,151 --> 00:06:44,479
and then our data is iris.

87
00:06:49,343 --> 00:06:51,135
‫If you run this

88
00:06:51,391 --> 00:06:52,671
you get

89
00:06:53,183 --> 00:06:54,975
a scatter plot

90
00:06:55,231 --> 00:06:57,535
between sepal length and sepal width.

91
00:06:57,791 --> 00:07:01,119
On the top we have the distribution of sepal length,

92
00:07:01,887 --> 00:07:02,655
and

93
00:07:03,679 --> 00:07:07,519
on the right-hand side we have the distribution of sepal width.

94
00:07:13,919 --> 00:07:20,063
There are other variations of this scatter plot as well: if you want to change the colour of

95
00:07:20,319 --> 00:07:26,463
the dots or the size of the dots, or if you want a shaded plot with these histograms, or

96
00:07:26,719 --> 00:07:31,583
#######, you can find all of this using help on jointplot.

97
00:07:31,839 --> 00:07:36,447
We will discuss that during our univariate analysis.

98
00:07:37,471 --> 00:07:37,983
‫Next

99
00:07:38,239 --> 00:07:40,799
another important function is pairplot.

100
00:07:41,311 --> 00:07:45,663
So while doing our analysis, instead of plotting

101
00:07:45,919 --> 00:07:48,735
a separate scatter plot for each pair of variables, we can

102
00:07:49,503 --> 00:07:52,575
do it for all the variables using just one command.

103
00:07:53,343 --> 00:07:55,903
‫That is sns.pairplot

104
00:07:59,743 --> 00:08:03,583
and in brackets we just have to mention the data frame, that is, iris.

105
00:08:06,655 --> 00:08:10,239
It will create scatter plots for all the

106
00:08:10,495 --> 00:08:11,519
variables.

107
00:08:14,591 --> 00:08:17,663
For example, this is the distribution of

108
00:08:17,919 --> 00:08:18,943
sepal length.

109
00:08:20,223 --> 00:08:23,807
This is a scatter plot of sepal length and sepal width.

110
00:08:24,063 --> 00:08:28,671
This is a scatter plot of sepal length and petal length.

111
00:08:28,927 --> 00:08:32,255
‫This is a scatter plot of sepal length

112
00:08:33,023 --> 00:08:34,303
‫and petal width.

113
00:08:35,071 --> 00:08:41,215
This is a very useful command: with a single command we get scatter plots for all our variables.

114
00:08:42,239 --> 00:08:46,591
That's all for this video, and that's all for the Python crash course.

115
00:08:46,847 --> 00:08:48,639
‫This is just a crash course

116
00:08:48,895 --> 00:08:51,199
We are not covering anything in depth,

117
00:08:51,711 --> 00:08:56,063
and as we go along we will discuss all these things in more detail.

