1
00:00:01,610 --> 00:00:08,090
So once we have trained our model, I told you that we can quantify the quality of fit of our model

2
00:00:08,360 --> 00:00:12,410
using the mean squared error (MSE), which is given by this formula.

3
00:00:12,420 --> 00:00:20,180
It is basically the residual sum of squares divided by n. If the predicted responses are very close to the

4
00:00:20,180 --> 00:00:21,590
observations,

5
00:00:21,590 --> 00:00:23,330
the mean squared error will be small.
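
A minimal pure-Python sketch of the formula referred to above, assuming the standard definition MSE = (1/n) Σ (y_i − ŷ_i)², where ŷ_i = f̂(x_i) is the model's prediction for observation i. The function name `mse` and the numbers are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: the residual sum of squares divided by n."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

observed = [3.0, 5.0, 7.0]
close_fit = [2.9, 5.1, 7.0]  # predictions near the observations -> small MSE
poor_fit = [1.0, 8.0, 4.0]   # predictions far from the observations -> large MSE
print(mse(observed, close_fit), mse(observed, poor_fit))
```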

6
00:00:27,300 --> 00:00:34,650
But if we compute the mean squared error on the same data that we used to train our model, it is called

7
00:00:34,680 --> 00:00:35,610
the training MSE.

8
00:00:38,470 --> 00:00:41,180
The training MSE is not something we are interested in.

9
00:00:41,430 --> 00:00:48,540
We are interested in the accuracy of the predictions when we apply our method to previously unseen test

10
00:00:48,600 --> 00:00:49,240
data.

11
00:00:50,670 --> 00:00:54,380
For example, suppose I am predicting house prices.

12
00:00:54,470 --> 00:01:00,720
I don't really care how well our method predicts the house prices of previously completed transactions.

13
00:01:00,950 --> 00:01:08,590
I care about how well it will predict the house prices of future transactions. Similarly, if I want

14
00:01:08,590 --> 00:01:13,320
to predict the risk of a particular disease in different individuals,

15
00:01:13,400 --> 00:01:19,630
I want to do it for future patients and not for the ones whose outcome I already know.

16
00:01:19,700 --> 00:01:26,000
So what we are going to do is split our data into two parts.

17
00:01:26,030 --> 00:01:29,000
One part will be called the training set.

18
00:01:29,360 --> 00:01:36,210
This will be used to train the model, and the other part will be called the test set.

19
00:01:36,440 --> 00:01:45,300
This will be the unseen data, and it will be used to assess the accuracy of our model. So, mathematically,

20
00:01:45,300 --> 00:01:53,910
I have these n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn). These will be part of my training

21
00:01:54,210 --> 00:01:57,750
data, and I will use them to train my model.

22
00:01:57,750 --> 00:02:05,540
Once I have used them, I will have identified the functional form of f̂, the estimate of f.

23
00:02:05,970 --> 00:02:16,040
Now I will use a previously unseen set of observations (x0, y0). These observations will come from our test

24
00:02:16,070 --> 00:02:23,170
set, and I will try to find the test error, which is given by this formula.

25
00:02:23,220 --> 00:02:24,820
So this is the test MSE.

26
00:02:24,870 --> 00:02:33,720
It is the average of the squared differences between the predicted value of y and the actual value of y on

27
00:02:33,720 --> 00:02:39,360
the given test data. So, for different types of models,

28
00:02:39,560 --> 00:02:48,590
I will compare the value of this test error and then select the model with the least test error.
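
The train-then-compare procedure just described can be sketched as follows. This is an illustration under assumptions, not the course's software: the data are synthetic (y = 1 + 2x + noise), the two candidate models (a constant mean predictor and a least-squares line) and the helper names `fit_constant`, `fit_line` and `holdout_mse` are hypothetical, and the split is a random 80/20.

```python
import random

def fit_constant(xs, ys):
    """Least flexible model: ignore x and always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x with a single predictor."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x

def holdout_mse(model, xs, ys):
    """Average squared difference between predicted and actual y on held-out data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical data generated from y = 1 + 2x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

# Random split: shuffle the indices, use 80 for training and 20 for testing.
idx = list(range(len(xs)))
rng.shuffle(idx)
train_idx, held_idx = idx[:80], idx[80:]
train_x, train_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
test_x, test_y = [xs[i] for i in held_idx], [ys[i] for i in held_idx]

candidates = {"constant": fit_constant, "linear": fit_line}
test_errors = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)                           # train on the training set only
    test_errors[name] = holdout_mse(model, test_x, test_y)  # score on unseen test data

best = min(test_errors, key=test_errors.get)  # model with the least test error
print(test_errors, "->", best)
```

Since the synthetic data really are linear, the line should win here; with real data the ranking is only as reliable as the single split behind it.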

29
00:02:48,770 --> 00:02:54,830
I hope you understand the idea behind having test and training data. Basically,

30
00:02:54,990 --> 00:02:57,860
we have training data and a corresponding training

31
00:02:57,870 --> 00:03:05,890
error on which a model could be selected, but there is no guarantee that the method with the lowest training error will also

32
00:03:05,890 --> 00:03:07,000
have the lowest test error.

33
00:03:08,760 --> 00:03:16,390
Roughly speaking, many statistical methods specifically estimate coefficients so as to minimize

34
00:03:16,520 --> 00:03:24,300
the training set MSE, and for such methods the training MSE will be small, but the actual test

35
00:03:24,300 --> 00:03:26,460
set MSE can be quite large.

36
00:03:30,220 --> 00:03:41,100
In this graph you can see four different lines. This black one is the true function that we want to

37
00:03:41,100 --> 00:03:41,580
predict.

38
00:03:43,610 --> 00:03:47,120
This orange line is the output of a linear regression model.

39
00:03:48,520 --> 00:03:53,590
And these blue and green lines are the results of some other, more flexible models.

40
00:03:55,650 --> 00:04:02,650
And these small circles are the data points that were used to train the models.

41
00:04:02,770 --> 00:04:10,090
You can see that as I increase the flexibility of the model, that is, as I allow it to change its shape

42
00:04:10,750 --> 00:04:11,640
or its direction

43
00:04:11,650 --> 00:04:15,220
many times, it touches more points on this graph.

44
00:04:16,920 --> 00:04:23,940
So this green curve, which has the highest flexibility, touches the maximum number of points, whereas the

45
00:04:24,000 --> 00:04:33,040
orange curve, which has the least flexibility, touches very few points. You can see that after a certain level

46
00:04:33,040 --> 00:04:41,950
of flexibility, the extra flexibility makes the curve more wiggly; that is, it follows the individual

47
00:04:41,950 --> 00:04:50,930
data points and not the overall function. The effect of flexibility on the training error and test error can

48
00:04:50,930 --> 00:04:52,470
be seen in the graph on the right.

49
00:04:54,540 --> 00:05:00,800
You can see that this grey curve is the training error. As you keep increasing the flexibility, the

50
00:05:00,870 --> 00:05:09,840
training error keeps coming down; that is, the model will be fitting, or touching, a lot of these

51
00:05:09,840 --> 00:05:19,240
sample points. But after a certain point, the test error, which is given by this red curve, starts increasing

52
00:05:19,240 --> 00:05:21,640
with increasing flexibility.

53
00:05:24,020 --> 00:05:32,420
You can see that these orange points show the test and training error for this orange curve, which is the straight line.

54
00:05:34,810 --> 00:05:42,820
This blue point is for the blue curve, which approximates the true function very closely, and this green

55
00:05:42,940 --> 00:05:47,020
point is for the green curve, which is very flexible.

56
00:05:47,020 --> 00:05:55,730
It has lower training error than the blue curve because it fits the points more closely, but it has high

57
00:05:55,730 --> 00:06:03,730
test error because it is not approximating the true function. So we want to identify this blue point,

58
00:06:04,130 --> 00:06:08,640
where we get the minimum test error.
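
To reproduce the qualitative shape of the right-hand graph, the sketch below uses k-nearest-neighbours regression as a stand-in for the flexible curves in the figure (an assumption; the lecture does not say which methods produced them). A smaller k means a more flexible fit, and k = 1 reproduces every training response, so its training MSE is zero. The sine-plus-noise data and all helper names are hypothetical.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """Average the responses of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def knn_mse(train_x, train_y, xs, ys, k):
    """MSE of a k-NN fit (built from the training set) evaluated on (xs, ys)."""
    preds = [knn_predict(train_x, train_y, x, k) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical data: true function sin(x) plus Gaussian noise with sd 0.3.
rng = random.Random(1)

def sample(n):
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return xs, [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

train_x, train_y = sample(50)
test_x, test_y = sample(200)

# Smaller k = more flexible fit; k = 1 passes through every training point.
train_mse, test_mse = {}, {}
for k in range(1, 26):
    train_mse[k] = knn_mse(train_x, train_y, train_x, train_y, k)  # training MSE
    test_mse[k] = knn_mse(train_x, train_y, test_x, test_y, k)     # test MSE

best_k = min(test_mse, key=test_mse.get)
print("k=1  train:", round(train_mse[1], 3), "test:", round(test_mse[1], 3))
print("best k by test MSE:", best_k, "test:", round(test_mse[best_k], 3))
```

The most flexible fit (k = 1) has zero training error but not the lowest test error; printing both dictionaries over all k should typically show the test error dipping at an intermediate k, much like the red curve.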

59
00:06:09,010 --> 00:06:15,350
Now, there are several techniques for splitting the data into training and test sets so that we can find this minimum

60
00:06:15,350 --> 00:06:15,770
point.

61
00:06:18,820 --> 00:06:23,090
So we are going to discuss the three most popular techniques.

62
00:06:23,090 --> 00:06:25,570
The first is called the validation set approach.

63
00:06:25,570 --> 00:06:33,800
The second is leave-one-out cross-validation, and the third is k-fold cross-validation.

64
00:06:33,850 --> 00:06:40,160
The first technique, the validation set approach, is the simplest. In this method,

65
00:06:40,190 --> 00:06:48,500
we randomly divide the data into two parts, a training set and a test set. The model will be fitted

66
00:06:48,740 --> 00:06:56,050
on the training set, and once the model is trained, the error on the test set will be calculated to estimate

67
00:06:56,060 --> 00:07:04,530
the test error. We usually do an 80/20 split, that is, we use 80 percent of the data for training purposes

68
00:07:04,860 --> 00:07:11,860
and 20 percent of the data for testing purposes. We will be running this approach in our software package

69
00:07:11,870 --> 00:07:18,090
in a separate video. There are basically two limitations of this approach.

70
00:07:18,090 --> 00:07:25,430
One is that part of the available data will not be used for training, and as we know, the more data we

71
00:07:25,430 --> 00:07:31,730
use during training, the better the model tends to perform. So if we keep some data for testing,

72
00:07:32,130 --> 00:07:34,420
the trained model will not be as good.

73
00:07:36,610 --> 00:07:41,400
And if you have a limited number of observations, your training will be severely impacted.

74
00:07:43,340 --> 00:07:50,090
Secondly, the estimated test error can be highly variable, depending on which observations are selected for training

75
00:07:50,510 --> 00:07:52,570
and which observations are selected for testing.
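
A minimal sketch of the validation set approach with an 80/20 split, which also illustrates the second limitation: running it with different random seeds on the same data gives different test MSE estimates. The model (a least-squares line), the synthetic data and the function name `validation_set_mse` are assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (hypothetical model choice)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x

def validation_set_mse(xs, ys, train_fraction=0.8, seed=None):
    """One random split: fit on the training part, return MSE on the held-out part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_train = int(train_fraction * len(xs))
    train_idx, held_idx = idx[:n_train], idx[n_train:]
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return sum((ys[i] - model(xs[i])) ** 2 for i in held_idx) / len(held_idx)

# Hypothetical data: 50 observations from y = 1 + 2x + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [1 + 2 * x + rng.gauss(0, 2) for x in xs]

# Same data, five different random 80/20 splits -> five different test MSE estimates.
estimates = [validation_set_mse(xs, ys, seed=s) for s in range(5)]
print([round(e, 2) for e in estimates])
```

Only 40 of the 50 observations are ever used to fit each model, which is the first limitation; the spread across seeds is the second.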

76
00:07:53,740 --> 00:08:01,290
So, to handle these two issues, there are two alternative approaches. In leave-one-out cross-validation,

77
00:08:02,280 --> 00:08:10,200
if we have n observations, we will keep the first observation for testing purposes and run the model

78
00:08:10,200 --> 00:08:13,390
on the remaining n minus 1 observations.

79
00:08:13,710 --> 00:08:20,980
Then we will keep the second observation for testing purposes and run the model on the remaining n

80
00:08:20,980 --> 00:08:24,440
minus 1 observations, and we will run this model

81
00:08:24,670 --> 00:08:31,590
n times, where every time we keep one observation for testing and the other n minus 1 for training,

82
00:08:32,950 --> 00:08:38,330
and we will take the average of the errors on each of these test observations.

83
00:08:41,070 --> 00:08:48,410
Since we need to fit this model n times, this method can be computationally expensive.
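
The leave-one-out procedure above can be sketched in pure Python as follows; `fit` is any function that takes training xs and ys and returns a predictor. The least-squares line `fit_line` and the three-point example are hypothetical. The loop fits the model n times, which is where the computational cost comes from.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (hypothetical model choice)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x

def loocv_mse(xs, ys, fit):
    """Hold out each observation once, fit on the other n - 1, average the n squared errors."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])  # n - 1 training points
        total += (ys[i] - model(xs[i])) ** 2                    # error on the held-out point
    return total / n

# Each leave-one-out fit is a line through the other two points: errors 4, 1 and 4.
print(loocv_mse([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], fit_line))  # prints 3.0
```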

84
00:08:49,200 --> 00:08:56,440
An alternative to leave-one-out cross-validation is k-fold cross-validation. In this,

85
00:08:56,500 --> 00:09:05,710
we divide the data into k sets, and then we train the model on k minus 1 sets and use the

86
00:09:05,710 --> 00:09:09,810
remaining set for testing purposes, repeating this k times so that each set is held out once.
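
A sketch of k-fold cross-validation under the same kind of assumptions (a hypothetical least-squares `fit_line` model and synthetic data). It shuffles the indices, deals them into k folds, holds each fold out once, and averages the k fold MSEs, so the model is fit k times rather than n times. The name `k_fold_mse` is illustrative only.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (hypothetical model choice)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x

def k_fold_mse(xs, ys, fit, k, seed=None):
    """Average of the k per-fold test MSEs; each fold is held out exactly once."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of observations")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # fold sizes differ by at most one
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]  # the other k - 1 folds
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_mses.append(sum((ys[i] - model(xs[i])) ** 2 for i in held_out) / len(held_out))
    return sum(fold_mses) / k

# Hypothetical data: 60 observations from y = 1 + 2x + noise.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1 + 2 * x + rng.gauss(0, 2) for x in xs]
print(k_fold_mse(xs, ys, fit_line, k=5, seed=0))  # 5 model fits instead of 60
```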

87
00:09:09,870 --> 00:09:18,610
You can see that leave-one-out cross-validation is a special case of k-fold cross-validation: if you have k

88
00:09:18,610 --> 00:09:21,630
equal to n, then k-fold cross-validation and leave-one-out

89
00:09:21,630 --> 00:09:27,100
cross-validation are the same thing. We will not be covering these two techniques in the software

90
00:09:27,100 --> 00:09:31,990
package; we will only cover the validation set approach there.
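
A quick numerical check of the special case k = n stated above, assuming hypothetical helpers `k_fold_mse` and `loocv_mse` and a mean-only model (`fit_constant`) for brevity: with k equal to n every fold holds a single observation, so both functions average the same n squared errors.

```python
import random

def fit_constant(xs, ys):
    """Hypothetical stand-in model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def k_fold_mse(xs, ys, fit, k, seed=0):
    """k-fold CV: hold out each of k shuffled folds once, average the fold MSEs."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_mses.append(sum((ys[i] - model(xs[i])) ** 2 for i in held_out) / len(held_out))
    return sum(fold_mses) / k

def loocv_mse(xs, ys, fit):
    """Leave-one-out CV: n fits, each on n - 1 observations."""
    n = len(xs)
    return sum((ys[i] - fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])(xs[i])) ** 2
               for i in range(n)) / n

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
print(k_fold_mse(xs, ys, fit_constant, k=len(xs)), loocv_mse(xs, ys, fit_constant))  # prints 3.125 3.125
```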