1
00:00:11,060 --> 00:00:16,250
OK, so in this lecture, we are going to look at applying the Holt-Winters model to a new dataset,

2
00:00:16,520 --> 00:00:18,170
specifically stock prices.

3
00:00:18,740 --> 00:00:23,840
Now, before we begin this lecture, I want to give you an opportunity to stop this video and to try

4
00:00:23,840 --> 00:00:24,970
to do this on your own.

5
00:00:25,580 --> 00:00:30,440
Since you've already learned all the code you need to complete this exercise, you technically do not

6
00:00:30,440 --> 00:00:31,200
need my help.

7
00:00:31,760 --> 00:00:37,520
So if you want to treat this as an exercise, please download the data set from the URL given in this

8
00:00:37,520 --> 00:00:38,120
notebook.

9
00:00:38,420 --> 00:00:42,590
But after you get the URL, close the notebook and build a model on your own.

10
00:00:43,250 --> 00:00:47,130
OK, so if you want to try this as an exercise, please do so now.

11
00:00:47,330 --> 00:00:48,830
Otherwise, let's continue.

12
00:00:49,770 --> 00:00:52,910
OK, so let's skip the imports since you have seen this all before.

13
00:00:56,690 --> 00:01:02,870
The next step is to download our data. So clearly this is a CSV of stock prices for stocks in the

14
00:01:02,870 --> 00:01:04,220
S&P 500.

15
00:01:08,310 --> 00:01:11,910
We'll now run the head command in order to check what's inside our CSV.

16
00:01:16,740 --> 00:01:22,860
Note that unlike a typical stock price CSV, this contains multiple tickers at once. You can see

17
00:01:22,860 --> 00:01:26,910
that it's essentially multiple tickers concatenated in the same data frame.

18
00:01:27,430 --> 00:01:29,610
You can tell them apart by the name column.

19
00:01:34,040 --> 00:01:37,690
The next step is to load in our data using pd.read_csv.

20
00:01:41,970 --> 00:01:45,630
The next step is to call the head to see what's in our data frame.

21
00:01:49,710 --> 00:01:51,390
OK, so nothing unexpected.

22
00:01:54,680 --> 00:01:57,480
The next step is to grab the close prices for Google.

23
00:01:57,860 --> 00:02:00,350
Please feel free to choose your own ticker if you like.

24
00:02:04,280 --> 00:02:07,660
The next step is to call the plot function to see what our data looks like.

25
00:02:11,330 --> 00:02:15,020
OK, so we see that Google is performing very well in the stock market.

26
00:02:18,340 --> 00:02:23,950
The next step is to take the log transform of the close price, we'll also plot this new column to see

27
00:02:23,950 --> 00:02:24,850
what it looks like.

28
00:02:30,470 --> 00:02:33,410
Interestingly, we can now see a pretty linear trend.
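
Why the log tends to straighten the curve can be sketched in a few lines. This is a minimal illustration, assuming a made-up price that grows by a constant 1% per step; it is not the Google data from the notebook, and the variable names are hypothetical.

```python
import math

# Hypothetical series: a price that grows 1% per step (not real stock data).
prices = [100.0 * 1.01 ** t for t in range(10)]
log_prices = [math.log(p) for p in prices]

# Step-to-step changes in the raw price keep getting larger...
price_steps = [b - a for a, b in zip(prices, prices[1:])]

# ...while step-to-step changes in the log price are constant: log(1.01).
log_steps = [b - a for a, b in zip(log_prices, log_prices[1:])]

print(price_steps[0], price_steps[-1])
print(log_steps[0], log_steps[-1])
```

Constant percentage growth becomes a constant additive step after the log, which is the kind of trend a linear trend model is designed to capture.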

29
00:02:37,280 --> 00:02:40,700
So you'll notice that I've commented out this line, which does not work.

30
00:02:41,960 --> 00:02:46,040
So the problem with stock prices is they only exist for trading days.

31
00:02:46,520 --> 00:02:49,370
However, trading days are not evenly spaced in time.

32
00:02:49,940 --> 00:02:54,470
Furthermore, they don't correspond with business days, which is the closest thing in pandas.

33
00:02:55,460 --> 00:02:57,580
Unfortunately, this line throws an error.

34
00:02:57,830 --> 00:03:01,730
So we'll just have to make do without having a frequency for the index.

35
00:03:03,740 --> 00:03:09,560
The next step is to split our data into train and test sets. I've chosen Ntest equals 30, which might be

36
00:03:09,560 --> 00:03:10,160
a bit long.

37
00:03:10,160 --> 00:03:12,640
So feel free to change this to whatever you like.

38
00:03:16,450 --> 00:03:18,180
OK, so you've seen this before.

39
00:03:23,360 --> 00:03:28,950
OK, so the next step is to instantiate our model. Since I don't believe this data has any seasonality,

40
00:03:29,090 --> 00:03:31,470
I've set seasonal to None. Again,

41
00:03:31,490 --> 00:03:34,580
feel free to test different parameters and see what works best.

42
00:03:37,480 --> 00:03:42,730
Note that we technically didn't have to take the log ourselves, since statsmodels can do this internally.

43
00:03:43,390 --> 00:03:48,490
Also observe that we get a warning since our data frame index doesn't have a frequency.

44
00:03:51,490 --> 00:03:54,970
The next step is to assign our predictions to the Google data frame.

45
00:03:58,470 --> 00:04:02,550
For the test set, you see that I've converted the result into a NumPy array.

46
00:04:03,330 --> 00:04:09,390
This is because not having a frequency for the index messes up the forecast indices and this will mess

47
00:04:09,390 --> 00:04:11,240
up where it goes in the data frame.

48
00:04:16,200 --> 00:04:18,990
OK, so the next step is to plot our predictions.

49
00:04:26,230 --> 00:04:29,330
Notice how nice they look for the train set, but don't be fooled.

50
00:04:29,650 --> 00:04:31,930
Of course, what really matters is the test set.

51
00:04:35,300 --> 00:04:40,310
OK, so in order to get a better picture of what's going on, we're going to plot only the last one

52
00:04:40,310 --> 00:04:41,330
hundred data points.

53
00:04:45,530 --> 00:04:47,670
So the result is not too surprising.

54
00:04:48,140 --> 00:04:53,700
We see that our model only appears to do well on the train set because it copies the last value.

55
00:04:54,080 --> 00:04:57,620
This makes sense since the log price nearly follows a random walk.

56
00:04:58,100 --> 00:05:03,230
And of course, the forecast is a straight line since we've used Holt's linear trend model to fit our

57
00:05:03,230 --> 00:05:06,740
data set. As an exercise,

58
00:05:06,770 --> 00:05:12,680
try to compute one of our forecasting metrics for this prediction and compare it to the naive forecast.

59
00:05:13,130 --> 00:05:18,380
Remember that for the naive forecast, we have to propagate the last known value through the forecast

60
00:05:18,380 --> 00:05:19,020
horizon.

61
00:05:19,430 --> 00:05:24,890
So don't copy the last value within the test set, but only copy the last value from the train set.

62
00:05:25,490 --> 00:05:27,380
This should give you a horizontal line.

63
00:05:28,010 --> 00:05:31,280
So in other words, see if a horizontal line is better than Holt's
64
00:05:31,340 --> 00:05:32,390
linear trend line.
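
One way to set up this comparison is sketched below. It is a rough, self-contained illustration and not the notebook's code: the Holt recursion is written by hand with fixed, hypothetical smoothing parameters (alpha and beta), whereas statsmodels would estimate them, and the series is a synthetic random walk standing in for the log close price. The helper names are assumptions made for this example.

```python
import math
import random

def holt_forecast(train, horizon, alpha=0.8, beta=0.2):
    """Holt's linear trend recursion with fixed (hypothetical) smoothing parameters."""
    level, trend = train[0], train[1] - train[0]
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecasts extend the final level along the final trend: a straight line.
    return [level + (h + 1) * trend for h in range(horizon)]

def naive_forecast(train, horizon):
    """Repeat the last training value across the horizon: a horizontal line."""
    return [train[-1]] * horizon

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Synthetic stand-in for the log close price: a random walk with a small drift.
random.seed(0)
series = [0.0]
for _ in range(299):
    series.append(series[-1] + random.gauss(0.0005, 0.01))

n_test = 30
train, test = series[:-n_test], series[-n_test:]

holt_pred = holt_forecast(train, n_test)
naive_pred = naive_forecast(train, n_test)
print("Holt RMSE: ", rmse(test, holt_pred))
print("Naive RMSE:", rmse(test, naive_pred))
```

Note that the naive forecast only uses the last value of the train set, never anything from the test set. Which of the two lines scores better depends on the data and the test window, so it is worth trying several tickers and values of Ntest.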