1
00:00:00,000 --> 00:00:02,730
In the previous video,
you saw how to prepare

2
00:00:02,730 --> 00:00:06,000
a time series for machine
learning by windowing the data.

3
00:00:06,000 --> 00:00:09,150
You saw the code to create
a very simple dataset,

4
00:00:09,150 --> 00:00:10,980
and then you learned
how you can prepare

5
00:00:10,980 --> 00:00:14,325
that dataset as features
and labels or x's and y's.

6
00:00:14,325 --> 00:00:16,080
In this video, you'll go through

7
00:00:16,080 --> 00:00:17,670
a screencast of a notebook

8
00:00:17,670 --> 00:00:19,725
that contains all of that code.

9
00:00:19,725 --> 00:00:21,690
Okay. To start, you'll need to

10
00:00:21,690 --> 00:00:23,670
have TensorFlow 2.0 installed.

11
00:00:23,670 --> 00:00:25,830
To check your version,
run this code block.
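
The version check described here can be sketched as follows. This is a minimal example that assumes TensorFlow is already installed; the notebook's exact cell may differ, and the variable name `tf_version` is only illustrative.

```python
import tensorflow as tf

# Read the installed TensorFlow version string and print it.
tf_version = tf.__version__
print(tf_version)
```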

12
00:00:25,830 --> 00:00:28,875
The output should
read 2.0.0 something.

13
00:00:28,875 --> 00:00:30,810
In this case, I'm
running a nightly build

14
00:00:30,810 --> 00:00:33,060
hence the dev code at the end.

15
00:00:33,060 --> 00:00:36,315
If you need 2.0, you can
install with code like this.

16
00:00:36,315 --> 00:00:39,420
The tf-nightly-2.0-preview
is the dev version

17
00:00:39,420 --> 00:00:41,580
I just mentioned
or alternatively,

18
00:00:41,580 --> 00:00:44,600
you can install with pip
install tensorflow like this.

19
00:00:44,600 --> 00:00:46,190
The version might
be different than

20
00:00:46,190 --> 00:00:48,230
b1 because at the time
I recorded this,

21
00:00:48,230 --> 00:00:50,705
TensorFlow had been
released as beta 1.

22
00:00:50,705 --> 00:00:53,240
Check tensorflow.org for
the latest instructions

23
00:00:53,240 --> 00:00:55,475
for installing
the production bits.

24
00:00:55,475 --> 00:00:57,290
Once you've installed TensorFlow,

25
00:00:57,290 --> 00:00:58,720
you can run this code.

26
00:00:58,720 --> 00:01:00,975
First, we'll create
a simple dataset,

27
00:01:00,975 --> 00:01:02,750
and it's just a simple
range containing

28
00:01:02,750 --> 00:01:04,655
10 elements from zero to nine.

29
00:01:04,655 --> 00:01:07,745
We'll print each one out on
its own line as you can see.
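
A minimal sketch of this step, assuming TensorFlow 2.x with eager execution. The `values` list is an illustrative addition so the result can be inspected afterwards.

```python
import tensorflow as tf

# A dataset holding the integers 0 through 9.
dataset = tf.data.Dataset.range(10)

values = []
for val in dataset:
    # Each element is a scalar tensor; .numpy() extracts its value.
    values.append(int(val.numpy()))
    print(val.numpy())
```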

30
00:01:07,745 --> 00:01:11,060
Next, we'll window the data
into chunks of five items,

31
00:01:11,060 --> 00:01:12,770
shifting by one each time.

32
00:01:12,770 --> 00:01:14,240
We'll see that this gives us

33
00:01:14,240 --> 00:01:16,255
the output of
the first five items,

34
00:01:16,255 --> 00:01:17,710
and then the second five items,

35
00:01:17,710 --> 00:01:20,255
and then the third
five items, etc.

36
00:01:20,255 --> 00:01:21,915
At the end of the dataset,

37
00:01:21,915 --> 00:01:24,155
when there isn't enough data
to give us five items,

38
00:01:24,155 --> 00:01:26,155
you'll see shorter lines.
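
The windowing described above might look like the sketch below, assuming TensorFlow 2.x. The names `window_dataset` and `windows` are illustrative. Each window is itself a small dataset, so it is iterated in an inner loop.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
# Windows of up to five items, each starting one element later than the previous one.
dataset = dataset.window(5, shift=1)

windows = []
for window_dataset in dataset:
    # Collect the scalars of this window into a plain Python list.
    window = [int(val.numpy()) for val in window_dataset]
    windows.append(window)
    print(*window)
```

The last few windows are shorter because fewer than five items remain at the end of the range.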

39
00:01:26,155 --> 00:01:29,130
To just get chunks
of five records,

40
00:01:29,130 --> 00:01:31,280
we'll set drop_remainder to True.

41
00:01:31,280 --> 00:01:34,475
When we run it, we'll see that
our data looks like this.

42
00:01:34,475 --> 00:01:37,690
We've got even sets
that are the same size.
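
A sketch of the same windowing with incomplete windows discarded, assuming TensorFlow 2.x. Variable names are illustrative.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
# drop_remainder=True discards any window with fewer than five items.
dataset = dataset.window(5, shift=1, drop_remainder=True)

windows = []
for window_dataset in dataset:
    window = [int(val.numpy()) for val in window_dataset]
    windows.append(window)
    print(*window)
```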

43
00:01:37,690 --> 00:01:41,480
TensorFlow likes its data
to be in numpy format.

44
00:01:41,480 --> 00:01:43,310
So we can convert it
easily by calling

45
00:01:43,310 --> 00:01:45,845
the dot numpy method
and when we print it,

46
00:01:45,845 --> 00:01:49,020
we can see it's now listed
in square brackets.
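
One way to get each window as a single array is sketched below. The narration does not name the step that turns each window into one tensor; using `flat_map` with `batch(5)` here is an assumption, as is the `arrays` list.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
# Assumption: batch each five-item window into one tensor and flatten the result.
dataset = dataset.flat_map(lambda window: window.batch(5))

arrays = []
for window in dataset:
    # .numpy() converts the tensor to a NumPy array, which prints in square brackets.
    arrays.append(window.numpy())
    print(window.numpy())
```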

47
00:01:49,810 --> 00:01:52,460
Next up is to split into

48
00:01:52,460 --> 00:01:54,605
x's and y's or
features and labels.

49
00:01:54,605 --> 00:01:56,675
We'll take the last
column as the label,

50
00:01:56,675 --> 00:01:58,445
and we'll split using a lambda.

51
00:01:58,445 --> 00:02:00,935
We'll split the data
into column minus one,

52
00:02:00,935 --> 00:02:03,500
which is all of the columns
except the last one,

53
00:02:03,500 --> 00:02:06,425
and minus one column which
is the last one only.

54
00:02:06,425 --> 00:02:08,570
Now we can see that
we have a set of

55
00:02:08,570 --> 00:02:10,820
four items and a single item.

56
00:02:10,820 --> 00:02:12,830
Remember that
the minus one column

57
00:02:12,830 --> 00:02:14,675
denotes the last value
in the list,

58
00:02:14,675 --> 00:02:16,100
and column minus one

59
00:02:16,100 --> 00:02:18,185
denotes everything
but the last value.

60
00:02:18,185 --> 00:02:20,360
As such, we can see
zero, one, two,

61
00:02:20,360 --> 00:02:21,890
three and one, two, three,

62
00:02:21,890 --> 00:02:25,290
four before the split
just for example.
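
The split into features and labels can be sketched with slicing inside a lambda, assuming TensorFlow 2.x. The `flat_map` step and the `pairs` list are assumptions added to make the example self-contained.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
# window[:-1] is everything but the last value; window[-1:] is the last value only.
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))

pairs = []
for x, y in dataset:
    pairs.append((x.numpy().tolist(), y.numpy().tolist()))
    print(x.numpy(), y.numpy())
```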

63
00:02:25,390 --> 00:02:28,640
Next, of course, is
to shuffle the data.

64
00:02:28,640 --> 00:02:30,755
This is achieved with
the shuffle method.

65
00:02:30,755 --> 00:02:33,260
This helps us to rearrange
the data so as not to

66
00:02:33,260 --> 00:02:36,155
accidentally introduce
a sequence bias.

67
00:02:36,155 --> 00:02:38,390
Multiple runs will
show the data in

68
00:02:38,390 --> 00:02:41,480
different arrangements because
it gets shuffled randomly.
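
A sketch of the shuffle step, assuming TensorFlow 2.x. The `buffer_size` of 10 is an assumed value, chosen to be at least as large as the six items in this small dataset so that the whole set is shuffled.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
# Assumed buffer size; it covers all six (x, y) pairs.
dataset = dataset.shuffle(buffer_size=10)

pairs = []
for x, y in dataset:
    pairs.append((x.numpy().tolist(), y.numpy().tolist()))
    print(x.numpy(), y.numpy())
```

The pairs stay intact; only the order in which they appear changes from run to run.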

69
00:02:41,480 --> 00:02:43,885
Finally comes batching.

70
00:02:43,885 --> 00:02:45,725
By setting a batch size of two,

71
00:02:45,725 --> 00:02:48,865
our data gets batched into two
x's and two y's at a time.

72
00:02:48,865 --> 00:02:50,860
For example, as we saw earlier,

73
00:02:50,860 --> 00:02:52,615
if x is zero, one, two, three,

74
00:02:52,615 --> 00:02:54,100
we can see that
the corresponding y

75
00:02:54,100 --> 00:02:55,720
is four or if x is five,

76
00:02:55,720 --> 00:02:58,670
six, seven, eight,
then our y is nine.
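
Putting the steps together, batching might be sketched as below, assuming TensorFlow 2.x. The shuffle buffer size and the `batches` list are assumptions for illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)  # assumed buffer size
# Group the (x, y) pairs two at a time.
dataset = dataset.batch(2)

batches = []
for x, y in dataset:
    batches.append((x.numpy(), y.numpy()))
    print("x =", x.numpy())
    print("y =", y.numpy())
```

Each batch holds two rows of four features and two matching labels, and every label is the value that follows its row of features.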

77
00:02:58,950 --> 00:03:01,840
So that's the workbook
with the code that

78
00:03:01,840 --> 00:03:04,150
splits a data series
into windows.

79
00:03:04,150 --> 00:03:05,380
Try it out for yourself,

80
00:03:05,380 --> 00:03:07,060
and once you're familiar
with what it does,

81
00:03:07,060 --> 00:03:08,765
proceed to the next video.

82
00:03:08,765 --> 00:03:11,050
There you'll move to
the seasonal dataset

83
00:03:11,050 --> 00:03:12,595
that you've been using to date,

84
00:03:12,595 --> 00:03:14,410
and with this
windowing technique,

85
00:03:14,410 --> 00:03:17,680
you'll see how to set up x's
and y's that can be fed into

86
00:03:17,680 --> 00:03:19,405
a neural network to see how

87
00:03:19,405 --> 00:03:22,490
it performs with
predicting values.