1
00:00:00,000 --> 00:00:02,820
Time series come in
all shapes and sizes,

2
00:00:02,820 --> 00:00:05,655
but there are a number
of very common patterns.

3
00:00:05,655 --> 00:00:08,355
So it's useful to recognize
them when you see them.

4
00:00:08,355 --> 00:00:09,720
For the next few minutes we'll

5
00:00:09,720 --> 00:00:11,895
take a look at some examples.

6
00:00:11,895 --> 00:00:14,234
The first is trend,

7
00:00:14,234 --> 00:00:15,540
where time series have

8
00:00:15,540 --> 00:00:18,045
a specific direction
that they're moving in.

9
00:00:18,045 --> 00:00:19,725
As you can see from
the Moore's Law

10
00:00:19,725 --> 00:00:21,285
example we showed earlier,

11
00:00:21,285 --> 00:00:23,880
this is an upward-facing trend.

12
00:00:23,880 --> 00:00:26,760
Another concept is seasonality,

13
00:00:26,760 --> 00:00:28,170
which is seen when patterns

14
00:00:28,170 --> 00:00:30,480
repeat at predictable intervals.

15
00:00:30,480 --> 00:00:32,925
For example, take a look
at this chart showing

16
00:00:32,925 --> 00:00:36,195
active users at a website
for software developers.

17
00:00:36,195 --> 00:00:39,950
It follows a very distinct
pattern of regular dips.

18
00:00:39,950 --> 00:00:41,335
Can you guess what they are?

19
00:00:41,335 --> 00:00:43,220
Well, what if I told
you it was up for

20
00:00:43,220 --> 00:00:45,290
five units and then down for two?

21
00:00:45,290 --> 00:00:47,090
Then you could tell that it very

22
00:00:47,090 --> 00:00:48,560
clearly dips on the weekends

23
00:00:48,560 --> 00:00:50,015
when fewer people are working

24
00:00:50,015 --> 00:00:52,265
and thus it shows seasonality.

25
00:00:52,265 --> 00:00:54,200
Other seasonal series could

26
00:00:54,200 --> 00:00:55,700
be shopping sites that peak on

27
00:00:55,700 --> 00:00:57,710
weekends or sports sites

28
00:00:57,710 --> 00:00:59,915
that peak at various times
throughout the year,

29
00:00:59,915 --> 00:01:01,910
like the draft or opening day,

30
00:01:01,910 --> 00:01:03,560
the All-Star day, playoffs,

31
00:01:03,560 --> 00:01:05,755
and maybe the championship game.

32
00:01:05,755 --> 00:01:09,485
Of course, some time series
can have a combination

33
00:01:09,485 --> 00:01:13,135
of both trend and seasonality
as this chart shows.

34
00:01:13,135 --> 00:01:15,290
There's an overall upwards trend

35
00:01:15,290 --> 00:01:18,190
but there are
local peaks and troughs.

36
00:01:18,190 --> 00:01:20,840
But of course, there
are also some that are

37
00:01:20,840 --> 00:01:23,330
probably not
predictable at all and

38
00:01:23,330 --> 00:01:25,655
just a complete set
of random values

39
00:01:25,655 --> 00:01:28,400
producing what's typically
called white noise.

40
00:01:28,400 --> 00:01:29,510
There's not a whole lot you can

41
00:01:29,510 --> 00:01:31,385
do with this type of data.

42
00:01:31,385 --> 00:01:34,190
But then consider
this time series.

43
00:01:34,190 --> 00:01:36,800
There's no trend and
there's no seasonality.

44
00:01:36,800 --> 00:01:39,185
The spikes appear at
random timestamps.

45
00:01:39,185 --> 00:01:40,610
You can't predict when they will

46
00:01:40,610 --> 00:01:42,650
happen next or how
strong they will be.

47
00:01:42,650 --> 00:01:45,865
But clearly, the entire
series isn't random.

48
00:01:45,865 --> 00:01:47,420
Between the spikes there's

49
00:01:47,420 --> 00:01:50,390
a very deterministic
type of decay.

50
00:01:50,390 --> 00:01:54,020
We can see here that the value
of each time step is

51
00:01:54,020 --> 00:01:55,580
99 percent of the value of

52
00:01:55,580 --> 00:01:58,670
the previous time step
plus an occasional spike.

53
00:01:58,670 --> 00:02:01,860
This is an
autocorrelated time series.

54
00:02:01,860 --> 00:02:03,590
Namely, it correlates with

55
00:02:03,590 --> 00:02:07,770
a delayed copy of itself
often called a lag.

56
00:02:07,900 --> 00:02:10,835
In this example, you can see at lag

57
00:02:10,835 --> 00:02:13,415
one there's a strong
autocorrelation.

58
00:02:13,415 --> 00:02:16,460
Often a time series like
this is described as having

59
00:02:16,460 --> 00:02:20,465
memory as steps are
dependent on previous ones.

60
00:02:20,465 --> 00:02:22,820
The spikes which
are unpredictable

61
00:02:22,820 --> 00:02:25,025
are often called innovations.

62
00:02:25,025 --> 00:02:26,390
In other words, they cannot be

63
00:02:26,390 --> 00:02:28,865
predicted based on past values.

64
00:02:28,865 --> 00:02:31,070
Another example is here where

65
00:02:31,070 --> 00:02:33,080
there are multiple
autocorrelations,

66
00:02:33,080 --> 00:02:35,840
in this case, at lags
one and 50.

67
00:02:35,840 --> 00:02:38,570
The lag one autocorrelation gives

68
00:02:38,570 --> 00:02:42,130
these very quick short-term
exponential decays,

69
00:02:42,130 --> 00:02:46,290
and the 50 gives the small
bounce after each spike.

70
00:02:46,290 --> 00:02:49,910
Time series you'll encounter
in real life probably

71
00:02:49,910 --> 00:02:52,850
have a bit of each of
these features: trend,

72
00:02:52,850 --> 00:02:56,275
seasonality,
autocorrelation, and noise.

73
00:02:56,275 --> 00:02:59,075
As we've learned,
a machine learning model

74
00:02:59,075 --> 00:03:01,115
is designed to spot patterns,

75
00:03:01,115 --> 00:03:04,325
and when we spot patterns
we can make predictions.

76
00:03:04,325 --> 00:03:06,740
For the most part this
can also work with

77
00:03:06,740 --> 00:03:09,800
time series except for the noise
which is unpredictable.

78
00:03:09,800 --> 00:03:11,810
But we should recognize that this

79
00:03:11,810 --> 00:03:13,610
assumes that patterns
that existed

80
00:03:13,610 --> 00:03:18,035
in the past will of course
continue on into the future.

81
00:03:18,035 --> 00:03:20,855
Of course, real-life time series

82
00:03:20,855 --> 00:03:22,295
are not always that simple.

83
00:03:22,295 --> 00:03:25,220
Their behavior can change
drastically over time.

84
00:03:25,220 --> 00:03:27,470
For example, this time series had

85
00:03:27,470 --> 00:03:28,640
a positive trend and

86
00:03:28,640 --> 00:03:31,460
a clear seasonality
up to time step 200.

87
00:03:31,460 --> 00:03:33,155
But then something happened

88
00:03:33,155 --> 00:03:35,300
to change its behavior
completely.

89
00:03:35,300 --> 00:03:37,130
If this were a stock price, then

90
00:03:37,130 --> 00:03:39,110
maybe it was a big
financial crisis or

91
00:03:39,110 --> 00:03:40,490
a big scandal or

92
00:03:40,490 --> 00:03:43,160
perhaps a disruptive
technological breakthrough

93
00:03:43,160 --> 00:03:45,020
causing a massive change.

94
00:03:45,020 --> 00:03:47,630
After that the time series
started to trend

95
00:03:47,630 --> 00:03:50,720
downward without
any clear seasonality.

96
00:03:50,720 --> 00:03:54,980
We'll typically call this a
non-stationary time series.

97
00:03:54,980 --> 00:03:57,680
To predict on this we could

98
00:03:57,680 --> 00:03:59,870
just train for
a limited period of time.

99
00:03:59,870 --> 00:04:03,710
For example, here where I
take just the last 100 steps.

100
00:04:03,710 --> 00:04:05,930
You'll probably get
better performance than

101
00:04:05,930 --> 00:04:08,360
if you had trained on
the entire time series.

102
00:04:08,360 --> 00:04:11,030
But that's breaking the mold
for typical machine

103
00:04:11,030 --> 00:04:14,540
learning, where we always assume
that more data is better.

104
00:04:14,540 --> 00:04:16,475
But for time series forecasting

105
00:04:16,475 --> 00:04:18,590
it really depends
on the time series.

106
00:04:18,590 --> 00:04:20,600
If it's stationary, meaning

107
00:04:20,600 --> 00:04:23,210
its behavior does not change
over time, then great.

108
00:04:23,210 --> 00:04:24,890
The more data you
have the better.

109
00:04:24,890 --> 00:04:26,825
But if it's not stationary

110
00:04:26,825 --> 00:04:28,580
then the optimal
time window that you

111
00:04:28,580 --> 00:04:31,070
should use for
training will vary.

112
00:04:31,070 --> 00:04:33,230
Ideally, we would like

113
00:04:33,230 --> 00:04:34,880
to be able to take
the whole series

114
00:04:34,880 --> 00:04:36,500
into account and generate

115
00:04:36,500 --> 00:04:38,765
a prediction for what
might happen next.

116
00:04:38,765 --> 00:04:41,810
As you can see, this isn't
always as simple as you might

117
00:04:41,810 --> 00:04:46,020
think given a drastic change
like the one we see here.

118
00:04:46,360 --> 00:04:48,680
So that's some of
what you're going to

119
00:04:48,680 --> 00:04:50,360
be looking at in this course.

120
00:04:50,360 --> 00:04:52,910
But let's start by going
through a workbook that

121
00:04:52,910 --> 00:04:56,390
generates sequences like
those you saw in this video.

122
00:04:56,390 --> 00:04:59,030
After that we'll then
try to predict some of

123
00:04:59,030 --> 00:05:00,980
these synthesized sequences as

124
00:05:00,980 --> 00:05:04,830
practice before we later
move on to real-world data.