1
00:00:11,140 --> 00:00:16,750
So in this lecture, we are going to discuss some common time series transformations. If you're

2
00:00:16,750 --> 00:00:22,000
familiar with machine learning, then you know that it's often useful to transform your data before

3
00:00:22,000 --> 00:00:24,000
passing it into a machine learning model.

4
00:00:24,550 --> 00:00:32,380
For example, standardization or min-max scaling. For time series, we'll be discussing three common transformations:

5
00:00:32,620 --> 00:00:36,700
the power transform, the log transform, and the Box-Cox transform.

6
00:00:37,240 --> 00:00:40,210
As you'll see, these all essentially serve the same purpose.

7
00:00:44,910 --> 00:00:50,160
So let's start with the power transform. The power transform involves raising all your data points

8
00:00:50,160 --> 00:00:55,530
to a power. For example, by raising every data point to the power of one half, you'll be taking the

9
00:00:55,530 --> 00:00:56,900
square root of your data set.

10
00:00:57,810 --> 00:00:59,100
So why is this useful?

11
00:01:00,030 --> 00:01:03,560
Well, imagine that your data appears to grow quadratically in time.

12
00:01:04,020 --> 00:01:09,090
If you take the square root, the result would be that you transform your data to grow linearly.

13
00:01:09,630 --> 00:01:11,070
So why is that useful?

14
00:01:11,670 --> 00:01:16,250
Well, you'll soon learn about some machine learning models that can learn linear trends very well,

15
00:01:16,680 --> 00:01:20,480
but there's no model for quadratic trends or cubic trends and so forth.

16
00:01:21,090 --> 00:01:26,430
Thus, by transforming your data to appear like it has a linear trend, you give your model a better

17
00:01:26,430 --> 00:01:31,650
chance of forecasting future data points and modeling the true nature of the time series more closely.
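
As a minimal sketch of the idea above, the following takes a made-up series that grows quadratically and raises it to the power of one half. The series and the helper names `power_transform` and `first_differences` are assumptions for illustration, not something from the lecture.

```python
# Hypothetical series that grows quadratically in time: y_t = t^2.
series = [t ** 2 for t in range(1, 11)]


def power_transform(data, power):
    """Raise every (non-negative) data point to the given power."""
    return [x ** power for x in data]


def first_differences(data):
    """Change between consecutive points; constant for a straight line."""
    return [b - a for a, b in zip(data, data[1:])]


# Power of one half is the same as taking the square root.
transformed = power_transform(series, 0.5)

print(first_differences(series))       # the steps keep growing: 3, 5, 7, ...
print(first_differences(transformed))  # the steps are all approximately 1
```

Constant steps after the transform are what a linear trend looks like, which is the shape the lecture says is easiest to model.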

18
00:01:36,370 --> 00:01:42,760
So another transformation with a similar purpose is the log transform. Like the power transform, it

19
00:01:42,760 --> 00:01:46,160
basically ends up squashing your data into a smaller range.

20
00:01:46,600 --> 00:01:51,940
In fact, a lot of the time I'll just end up using the log transform by default without considering

21
00:01:51,940 --> 00:01:52,880
other options.

22
00:01:53,800 --> 00:01:58,530
One common application of the log transform is in finance.

23
00:01:58,540 --> 00:02:02,620
In finance, it's common to model stock prices as following a log-normal distribution.

24
00:02:03,640 --> 00:02:07,960
It's also common to model log returns instead of returns based on percentages.

25
00:02:08,770 --> 00:02:12,340
As an example, this is the basis for the famous Black-Scholes formula.

26
00:02:13,750 --> 00:02:19,630
Note that one possible issue with the log transform is that it doesn't accept zero or negative values

27
00:02:19,630 --> 00:02:20,270
as input.

28
00:02:21,190 --> 00:02:27,450
For this reason, it can only be used for data which is strictly positive. For data that is merely non-negative,

29
00:02:27,460 --> 00:02:30,610
it's common to simply add one before taking the log.
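
The points above can be sketched with the standard library alone. The price and count values are made up for illustration, and `math.log1p(x)` is used as the "add one, then take the log" step.

```python
import math

# Hypothetical prices that grow by 10% at every step.
prices = [100.0, 110.0, 121.0]

# Log transform: only valid for strictly positive inputs.
log_prices = [math.log(p) for p in prices]

# Log returns are the differences of consecutive log prices.
log_returns = [b - a for a, b in zip(log_prices, log_prices[1:])]
print(log_returns)  # both values are approximately log(1.1)

# For non-negative data that may contain zeros, add one before the log.
counts = [0, 3, 12]  # hypothetical counts, including a zero
log_counts = [math.log1p(c) for c in counts]  # log1p(x) computes log(1 + x)
print(log_counts)  # the first entry is 0.0, since log(1 + 0) = 0
```

Calling `math.log(0)` directly raises a `ValueError`, which is the restriction described above.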

30
00:02:35,170 --> 00:02:41,350
OK, so a third transform we're going to discuss is the Box-Cox transform, which generalizes the concept

31
00:02:41,350 --> 00:02:44,270
of both the power transform and the log transform.

32
00:02:44,740 --> 00:02:50,020
You can see that it involves this parameter lambda, which is the power to use when taking the transform.

33
00:02:51,010 --> 00:02:52,630
So why does this make sense?

34
00:02:53,200 --> 00:02:58,900
This makes sense because the natural logarithm is actually the limit of this specific power transform

35
00:02:59,140 --> 00:03:00,730
as the power approaches zero.
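
To make the limit concrete, here is a small sketch of the standard Box-Cox definition, (x^lambda - 1) / lambda for nonzero lambda and log(x) for lambda equal to zero. The function name `box_cox` and the sample value are assumptions for illustration.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single strictly positive value.

    (x**lam - 1) / lam for lam != 0, and log(x) for lam == 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


x = 5.0
for lam in [1.0, 0.5, 0.1, 0.001, 0.0]:
    print(lam, box_cox(x, lam))
# As lam shrinks toward 0, the printed values approach log(5).
```

With lambda equal to one half this is a shifted and rescaled square root, and with lambda equal to zero it is the log transform, which is why one formula covers both.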

36
00:03:01,990 --> 00:03:06,850
Now, SciPy's Box-Cox function will automatically choose the value of lambda for us.

37
00:03:07,210 --> 00:03:10,620
So we don't need to worry about finding the optimal value ourselves.

38
00:03:11,080 --> 00:03:15,880
But if you're interested in learning how this value is chosen, I'd encourage you to check out the SciPy

39
00:03:15,880 --> 00:03:20,440
documentation, as well as this article I've included in the extra reading.
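
For a rough feel of how lambda can be chosen, the sketch below does a plain grid search over a log-likelihood. It assumes the maximum-likelihood criterion that the SciPy documentation describes for `scipy.stats.boxcox` when no lambda is given. The grid search, the helper names, and the data are simplifications for illustration, not SciPy's actual implementation.

```python
import math


def box_cox_all(data, lam):
    """Apply the Box-Cox transform to each strictly positive value."""
    if lam == 0:
        return [math.log(x) for x in data]
    return [(x ** lam - 1.0) / lam for x in data]


def log_likelihood(data, lam):
    """Box-Cox log-likelihood (up to a constant), assuming normality."""
    n = len(data)
    y = box_cox_all(data, lam)
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / n
    log_sum = sum(math.log(x) for x in data)
    return (lam - 1.0) * log_sum - 0.5 * n * math.log(variance)


def choose_lambda(data, candidates):
    """Return the candidate lambda with the highest log-likelihood."""
    return max(candidates, key=lambda lam: log_likelihood(data, lam))


# Hypothetical data whose logarithms are evenly spaced and symmetric around 0.
data = [math.exp(z / 4.0) for z in range(-8, 9)]
candidates = [i / 10.0 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0

print(choose_lambda(data, candidates))  # selects 0.0, i.e. the log transform
```

In practice you would call the library function rather than hand-roll this. The sketch is only meant to show that "choosing lambda" is an optimization over candidate powers.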

40
00:03:25,150 --> 00:03:30,790
So one common reason people give for why they use the Box-Cox transform is that they want to make

41
00:03:30,790 --> 00:03:32,510
the data normally distributed.

42
00:03:33,370 --> 00:03:37,510
However, note that this motivation does not apply to raw time series.

43
00:03:38,020 --> 00:03:39,110
So why is this?

44
00:03:39,940 --> 00:03:42,850
Well, remember that time series data is dynamic.

45
00:03:43,000 --> 00:03:44,300
It changes in time.

46
00:03:44,350 --> 00:03:45,470
It can have a trend.

47
00:03:46,000 --> 00:03:51,340
So when you take time series data and plot a histogram hoping that it will be normal, this is actually

48
00:03:51,340 --> 00:03:55,290
the wrong thing to do. We'll discuss this more later in the course,

49
00:03:55,300 --> 00:04:01,300
but in order to take data over time and plot its distribution or histogram, we need that data to be

50
00:04:01,300 --> 00:04:02,140
stationary.

51
00:04:02,740 --> 00:04:06,490
Stationary essentially means the distribution doesn't change over time.

52
00:04:07,780 --> 00:04:09,430
So why is this a requirement?

53
00:04:10,060 --> 00:04:14,380
Well, imagine you have some data which simply follows a line that grows at a constant rate.

54
00:04:15,010 --> 00:04:17,670
Does plotting the histogram of this data make sense?

55
00:04:18,100 --> 00:04:19,060
The answer is no.

56
00:04:19,720 --> 00:04:22,030
Would we want this to be normally distributed?

57
00:04:22,420 --> 00:04:23,440
The answer is no.

58
00:04:23,950 --> 00:04:27,310
In fact, this data behaves much better with a linear trend.

59
00:04:27,820 --> 00:04:33,460
The point of plotting a histogram is to understand the distribution of the data, but the distribution

60
00:04:33,460 --> 00:04:38,020
at the bottom of this plot is clearly different from the distribution at the top of this plot.

61
00:04:38,650 --> 00:04:43,390
Therefore, it makes no sense to mix this data together into a single histogram.

62
00:04:43,780 --> 00:04:46,330
This does not tell us how the data is distributed.
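
The argument above can be checked numerically with a made-up trending series. The slope, noise level, and seed are arbitrary assumptions chosen only to make the effect visible.

```python
import random

random.seed(0)  # fixed seed so the hypothetical data is reproducible

# Hypothetical series: a line growing at a constant rate, plus small noise.
n = 200
series = [0.5 * t + random.gauss(0.0, 1.0) for t in range(n)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


first_half = series[: n // 2]
second_half = series[n // 2 :]

# The two halves are centred on very different values, so one histogram of
# the whole series would mix different distributions together.
print(mean(first_half), mean(second_half))
```

The early and late parts of the series sit around different levels, so a pooled histogram would describe neither of them.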

63
00:04:50,960 --> 00:04:56,510
The final topic I want to discuss in this lecture is why the log transform is deeply fundamental.

64
00:04:56,990 --> 00:05:01,490
Not only is it useful mathematically, but it also seems to be part of nature itself.

65
00:05:02,180 --> 00:05:04,170
One example of this is perception.

66
00:05:04,730 --> 00:05:10,070
For example, although a normal conversation is ten thousand times louder than a whisper, it doesn't

67
00:05:10,070 --> 00:05:12,750
have ten thousand times the effect on your senses.

68
00:05:13,310 --> 00:05:18,050
That's why we use the decibel scale to measure sound, which is essentially a log transform.
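
As a quick worked example, the decibel difference between two sounds is ten times the base-10 logarithm of their intensity ratio. The sketch assumes that "ten thousand times louder" refers to a ratio of sound intensities. The function name is made up for illustration.

```python
import math


def decibel_difference(intensity_ratio):
    """Difference in decibels for a given ratio of sound intensities."""
    return 10.0 * math.log10(intensity_ratio)


print(decibel_difference(10_000))  # about 40 dB
```

A factor of ten thousand in intensity collapses to a difference of about 40 on the decibel scale.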

69
00:05:19,810 --> 00:05:25,540
Another example of how the logarithm seems to simply be a part of nature is how we as humans interpret

70
00:05:25,540 --> 00:05:26,210
numbers.

71
00:05:26,740 --> 00:05:31,630
For example, if you have one thousand dollars in the bank, then losing one thousand dollars would

72
00:05:31,630 --> 00:05:32,710
be a pretty big deal.

73
00:05:33,190 --> 00:05:38,080
But if you have one billion dollars in the bank, spending one thousand dollars on a pair of jeans would

74
00:05:38,080 --> 00:05:39,300
feel completely normal.

75
00:05:40,330 --> 00:05:45,490
Another way to think of this is to imagine going from zero dollars in wealth to one million.

76
00:05:45,880 --> 00:05:47,050
That's a pretty big jump.

77
00:05:47,620 --> 00:05:49,570
How about one million to two million?

78
00:05:49,990 --> 00:05:54,400
Although you still made the same amount of money, its utility is less.

79
00:05:54,400 --> 00:05:58,690
So one might model the utility of wealth as the logarithm of the wealth.
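
A minimal sketch of log utility, using made-up wealth levels: the same one-million-dollar gain is compared from two different starting points. The starting point of zero dollars is left out because the logarithm of zero is undefined.

```python
import math


def log_utility(wealth):
    """Utility of a strictly positive amount of wealth, modeled as its log."""
    return math.log(wealth)


# The same $1,000,000 gain from two hypothetical starting points.
gain_from_one_million = log_utility(2_000_000) - log_utility(1_000_000)
gain_from_one_billion = log_utility(1_001_000_000) - log_utility(1_000_000_000)

print(gain_from_one_million)  # approximately log(2), about 0.69
print(gain_from_one_billion)  # approximately log(1.001), about 0.001
```

Under this model the same dollar amount matters less the more wealth you already have.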
