1
00:00:11,130 --> 00:00:17,160
OK, so in this lecture, we are going to look at how to implement the naive forecast as well as evaluate

2
00:00:17,160 --> 00:00:19,110
it using the metrics we learned about.

3
00:00:20,810 --> 00:00:23,690
OK, so let's start by importing NumPy and pandas.

4
00:00:29,950 --> 00:00:35,470
The next step is to upgrade scikit-learn. The reason for this is that the current version of scikit-learn

5
00:00:35,470 --> 00:00:38,640
installed in Google Colab does not have the MAPE metric.

6
00:00:39,070 --> 00:00:43,570
So, as I frequently mention in my courses, machine learning is a field that moves fast.

7
00:00:43,870 --> 00:00:45,610
This kind of thing is completely normal.

8
00:00:45,610 --> 00:00:47,070
Libraries change all the time.

9
00:00:53,910 --> 00:01:00,420
The next step is to import our metrics from scikit-learn, so we have the MAPE, the MAE, the R-squared

10
00:01:00,420 --> 00:01:01,330
and the MSE.

11
00:01:02,190 --> 00:01:05,880
You'll notice that this does not include the RMSE or the sMAPE.

12
00:01:06,180 --> 00:01:07,710
We'll see how to deal with those later.

13
00:01:12,930 --> 00:01:19,050
So for this exercise, we'll be using prices from the S&P 500, which can be downloaded from my website.

14
00:01:25,750 --> 00:01:30,730
The next step is to call pd.read_csv in order to get a DataFrame of our data.

15
00:01:34,000 --> 00:01:38,740
As always, I like to call df.head() to get some sense of the data we're working with.

16
00:01:42,970 --> 00:01:47,070
So we can see all the expected columns: open, high, low, and so forth.
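Loading the CSV and inspecting it with df.head() can be sketched as below. The real file from the lecture is not reproduced here, so this assumes a hypothetical CSV with Date, Open, High, Low and Close columns and a few made-up prices, read from an in-memory buffer instead of a file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded S&P 500 CSV: the column names
# and these prices are made up for illustration only.
csv_text = """Date,Open,High,Low,Close
2020-01-02,100.0,101.5,99.5,101.0
2020-01-03,101.0,102.0,100.2,100.8
2020-01-06,100.8,101.9,100.1,101.6
2020-01-07,101.6,102.4,101.0,102.1
"""

# With the real file you would pass its path instead of a StringIO buffer.
# index_col="Date" with parse_dates=True turns the dates into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(df.head())
```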

17
00:01:51,070 --> 00:01:56,980
The next step is to generate our predictions. Basically, the naive forecast is a forecast where we

18
00:01:57,130 --> 00:02:03,370
simply predict the previous value, and we can accomplish this by calling the shift function on the close

19
00:02:03,370 --> 00:02:03,580
column.

20
00:02:03,580 --> 00:02:06,580
And we'll call this new column close prediction.
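A minimal sketch of the shift step, assuming the columns are named `Close` and `ClosePrediction` (the exact identifiers used in the notebook are not shown) and using a few made-up closing prices:

```python
import pandas as pd

# Made-up closing prices (not the lecture's data).
df = pd.DataFrame({"Close": [101.0, 100.8, 101.6, 102.1, 101.7]})

# Naive forecast: each row's prediction is the previous row's close.
# shift(1) moves every value down one row, so row t holds the close from row t-1,
# and the first row becomes NaN because nothing precedes it.
df["ClosePrediction"] = df["Close"].shift(1)
print(df.head())
```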

21
00:02:10,470 --> 00:02:14,550
The next step is to call df.head() once again to see our new column.

22
00:02:19,930 --> 00:02:25,750
Notice that the first row now contains NaN (not a number), since, of course, there is no previous value for

23
00:02:25,750 --> 00:02:26,560
the first row.

24
00:02:30,150 --> 00:02:36,030
OK, so for convenience, we're going to assign the true close prices to a variable called y_true and

25
00:02:36,030 --> 00:02:39,210
the predicted close prices to a variable called y_pred.

26
00:02:40,140 --> 00:02:42,470
This is what these arguments are called inside scikit-learn.

27
00:02:42,660 --> 00:02:45,720
So I felt it would be appropriate to call them the same thing.
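The transcript does not show how the NaN in the first row is handled, so the sketch below assumes it is simply skipped with `iloc[1:]`; scikit-learn's metric functions reject NaN inputs, so some such step is needed. The column names and prices are again hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Close": [101.0, 100.8, 101.6, 102.1, 101.7]})
df["ClosePrediction"] = df["Close"].shift(1)

# Assumption: drop the first row, whose prediction is NaN.
# iloc[1:] keeps every row except the first; to_numpy() gives plain 1-D arrays.
y_true = df.iloc[1:]["Close"].to_numpy()
y_pred = df.iloc[1:]["ClosePrediction"].to_numpy()
print(y_true, y_pred)
```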

28
00:02:51,380 --> 00:02:56,810
OK, so the next portion of this notebook will be to look at our metrics. The main purpose of this

29
00:02:56,810 --> 00:03:01,070
is not just to understand how to do this in code, since that's pretty easy.

30
00:03:01,460 --> 00:03:05,070
But I want you to pay attention to how these values relate to each other.

31
00:03:05,540 --> 00:03:09,590
Think about what would be considered a good value and what would be considered bad.

32
00:03:12,410 --> 00:03:17,090
OK, so let's start with the sum of squared errors. Since there's no function for this, we're going

33
00:03:17,090 --> 00:03:18,630
to calculate it ourselves.

34
00:03:19,280 --> 00:03:24,920
So since y_true and y_pred are effectively one-dimensional arrays, we can just take the difference

35
00:03:24,920 --> 00:03:27,460
and take the dot product of that difference with itself.
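The dot-product trick for the sum of squared errors looks like this in NumPy, using a few made-up values rather than the S&P 500 data:

```python
import numpy as np

# Made-up values for illustration.
y_true = np.array([100.8, 101.6, 102.1, 101.7])
y_pred = np.array([101.0, 100.8, 101.6, 102.1])

diff = y_true - y_pred
# For a 1-D array, diff.dot(diff) equals the sum of diff[i] ** 2,
# i.e. the sum of squared errors (SSE).
sse = diff.dot(diff)
print(sse)
```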

36
00:03:30,640 --> 00:03:32,990
OK, so the result is about 6000.

37
00:03:33,430 --> 00:03:36,490
Of course, we don't necessarily know whether this is bad or good.

38
00:03:36,490 --> 00:03:37,630
It's just a number.

39
00:03:39,560 --> 00:03:44,090
The next step is to calculate the mean squared error, where we use our scikit-learn function.

40
00:03:46,620 --> 00:03:52,410
OK, so this is the result. As you can see, this brings the number down to a more reasonable range.

41
00:03:55,090 --> 00:03:59,180
Now, as Python coders, we can't be afraid of implementing things ourselves.

42
00:03:59,590 --> 00:04:03,460
Some students get absolutely frightened when they see that you're not using a library.

43
00:04:03,610 --> 00:04:06,230
But I urge all students not to take this approach.

44
00:04:06,670 --> 00:04:09,090
In fact, implementing the MSE is trivial.

45
00:04:09,400 --> 00:04:13,840
It's just what we had before, divided by the length of either y_true or y_pred.

46
00:04:17,010 --> 00:04:19,850
OK, and so we get the same answer as expected.
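As a sketch of the hand-rolled MSE (toy values, not the lecture's data): dividing the SSE by the number of samples gives the same result as taking the mean of the squared differences, which is what `sklearn.metrics.mean_squared_error` computes for 1-D inputs.

```python
import numpy as np

y_true = np.array([100.8, 101.6, 102.1, 101.7])
y_pred = np.array([101.0, 100.8, 101.6, 102.1])

diff = y_true - y_pred
sse = diff.dot(diff)
# MSE is just the SSE divided by the number of samples.
mse = sse / len(y_true)
# The same quantity written as a mean, for comparison.
print(mse, np.mean(diff ** 2))
```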

47
00:04:22,930 --> 00:04:28,750
The next step is to calculate the root mean squared error. Now, surprisingly, this is done by the mean

48
00:04:28,750 --> 00:04:32,900
squared error function, where you can pass in the argument squared=False.

49
00:04:33,370 --> 00:04:34,270
So let's try that.

50
00:04:37,050 --> 00:04:40,370
OK, so we get about 1.67, which makes sense.

51
00:04:43,430 --> 00:04:48,230
And of course, we can just take the square root of our previous calculation, so let's try that also.

52
00:04:50,650 --> 00:04:52,900
And we get the same answer as expected.
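Taking the square root of the MSE yourself, as sketched below with toy values, works on any library version. As a version note: the `squared` argument of `mean_squared_error` was deprecated in scikit-learn 1.4 in favour of a separate `root_mean_squared_error` function and removed in 1.6, so `squared=False` fails on recent releases.

```python
import numpy as np

y_true = np.array([100.8, 101.6, 102.1, 101.7])
y_pred = np.array([101.0, 100.8, 101.6, 102.1])

mse = np.mean((y_true - y_pred) ** 2)
# RMSE is the square root of the MSE, which puts it back in the same units as the prices.
rmse = np.sqrt(mse)
print(rmse)
```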

53
00:04:55,850 --> 00:05:00,950
The next step is to calculate the mean absolute error. Since we have a scikit-learn function for

54
00:05:00,950 --> 00:05:02,840
this, we are going to make use of it.

55
00:05:06,020 --> 00:05:11,270
OK, so you can see that although the root mean squared error and the mean absolute error have the same

56
00:05:11,270 --> 00:05:13,880
units, they do not give you the same value.
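A small NumPy sketch (made-up values) of why the two differ: the RMSE squares the errors before averaging, so larger errors weigh more, and the RMSE is always at least as large as the MAE, with equality only when every absolute error is the same.

```python
import numpy as np

y_true = np.array([100.8, 101.6, 102.1, 101.7])
y_pred = np.array([101.0, 100.8, 101.6, 102.1])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))          # average size of the errors
rmse = np.sqrt(np.mean(errors ** 2))   # squares first, so large errors weigh more
print(mae, rmse)
```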

57
00:05:17,070 --> 00:05:21,560
Now, we know that all the previous metrics we've seen are scale-dependent.

58
00:05:22,140 --> 00:05:26,990
So the next step is to look at the R-squared, which does not depend on the scale of the data.
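To see the difference between scale-dependent and scale-free metrics, the sketch below rescales a made-up trending series (standing in for prices quoted in cents instead of dollars) and compares RMSE with R-squared, computed as 1 - SSE / SST, which matches scikit-learn's `r2_score` for 1-D inputs.

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # R^2 = 1 - SSE / SST, where SST is the SSE of always predicting the mean.
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - sse / sst

# Made-up trending series with a naive (previous-value) forecast.
t = np.arange(50)
prices = 100 + 0.5 * t + np.sin(t)
y_true, y_pred = prices[1:], prices[:-1]

# Rescale everything, e.g. as if the prices were quoted in cents instead of dollars.
scale = 100.0
print(rmse(y_true, y_pred), rmse(scale * y_true, scale * y_pred))  # RMSE grows 100x
print(r2(y_true, y_pred), r2(scale * y_true, scale * y_pred))      # R^2 is unchanged
```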

59
00:05:30,660 --> 00:05:35,820
OK, so this should be surprising. As you recall, we said that the best R-squared is one.

60
00:05:36,240 --> 00:05:39,670
Meanwhile, the R-squared of simply predicting the mean value is zero.

61
00:05:40,770 --> 00:05:45,550
It turns out that our naive forecast gets an R-squared of 0.999.

62
00:05:46,080 --> 00:05:51,300
This kind of makes sense since stock prices don't vary that wildly from one day to the next.

63
00:05:51,690 --> 00:05:56,260
So predicting the last value in the series should give us pretty good predictions.
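The same effect shows up on a made-up, smoothly trending series (not real S&P 500 prices) in which consecutive values stay close together. This sketch compares the naive forecast against the "always predict the mean" baseline, whose R-squared is zero by construction.

```python
import numpy as np

def r2(y_true, y_pred):
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - sse / sst

# Made-up trending series: each value is close to the one before it.
t = np.arange(200)
prices = 100 + 0.5 * t + np.sin(t)

y_true = prices[1:]
y_naive = prices[:-1]                         # naive forecast: previous value
y_mean = np.full_like(y_true, y_true.mean())  # always predict the mean

print(r2(y_true, y_naive))  # close to 1
print(r2(y_true, y_mean))   # 0, by construction
```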

64
00:05:56,820 --> 00:06:01,800
However, in another sense, these are also very bad predictions because they are really just the dumbest

65
00:06:01,800 --> 00:06:02,840
predictions possible.

66
00:06:03,510 --> 00:06:09,120
So let this be a lesson that if you see a model that happens to predict stock prices very well, don't

67
00:06:09,120 --> 00:06:11,430
assume that such a model is actually useful.

68
00:06:14,840 --> 00:06:19,130
The next step is to compute the MAPE, which is another scale-invariant metric.

69
00:06:22,120 --> 00:06:26,620
OK, so this is nearly zero, which makes sense since the R-squared is nearly one.
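The MAPE itself is a one-liner in NumPy, sketched here with toy values. Like scikit-learn's `mean_absolute_percentage_error`, this returns a fraction (0.01 means 1%) rather than a number already multiplied by 100, which is why a good forecast shows up as "nearly zero".

```python
import numpy as np

y_true = np.array([100.8, 101.6, 102.1, 101.7])
y_pred = np.array([101.0, 100.8, 101.6, 102.1])

# MAPE = mean(|y_true - y_pred| / |y_true|), expressed as a fraction.
mape = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
print(mape)
```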

70
00:06:30,110 --> 00:06:32,090
The next step is to compute the sMAPE.

71
00:06:32,600 --> 00:06:38,240
Now you'll notice that I didn't import this function from scikit-learn. This is because no such function

72
00:06:38,240 --> 00:06:38,850
exists.

73
00:06:39,290 --> 00:06:42,800
So this is the power of being able to implement these things yourself.

74
00:06:43,190 --> 00:06:48,320
Someone who does not have these skills would probably start by going to Google and then checking Stack

75
00:06:48,320 --> 00:06:49,670
Overflow and so forth.

76
00:06:49,910 --> 00:06:54,500
They might end up wasting their whole day trying to figure out where the function for sMAPE is.

77
00:06:54,890 --> 00:06:59,840
But when you do have these skills, you're able to get this done in just a few lines of code and a few

78
00:06:59,840 --> 00:07:00,950
seconds of effort.
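The lecture's own sMAPE code is not shown, so the following is a sketch of one common definition, the mean of 2|y - ŷ| / (|y| + |ŷ|); other variants exist (for example without the factor of 2, or reported as a percentage). The `smape` and `mape` helper names and the toy values are illustrative.

```python
import numpy as np

def smape(y_true, y_pred):
    # One common sMAPE definition: mean of 2|y - y_hat| / (|y| + |y_hat|).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    numerator = 2 * np.abs(y_true - y_pred)
    denominator = np.abs(y_true) + np.abs(y_pred)
    return np.mean(numerator / denominator)

def mape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

y_true = [100.8, 101.6, 102.1, 101.7]
y_pred = [101.0, 100.8, 101.6, 102.1]
print(smape(y_true, y_pred), mape(y_true, y_pred))  # the two values are close
```

Swapping the arguments leaves `smape` unchanged, which is the "symmetric" part of the name; `mape` divides only by the true values, so it lacks that property.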

79
00:07:04,480 --> 00:07:09,430
OK, and we see that the result is pretty close to the non-symmetric MAPE, which makes sense.