1
00:00:00,920 --> 00:00:03,930
In the last few videos you saw how
to prepare time series data for

2
00:00:03,930 --> 00:00:08,000
machine learning by creating
a window dataset where the previous

3
00:00:08,000 --> 00:00:11,350
n values could be seen as
the input features, or x.

4
00:00:11,350 --> 00:00:17,240
And the current value at any time
stamp is the output label, or the y.

5
00:00:17,240 --> 00:00:19,050
It would then look a little bit like this.

6
00:00:19,050 --> 00:00:23,710
With a number of input values on x,
typically called a window on the data.

7
00:00:23,710 --> 00:00:26,306
You saw in the previous
videos how to use the tools

8
00:00:26,306 --> 00:00:29,520
in TensorFlow datasets
to create these windows.

9
00:00:29,520 --> 00:00:33,010
This week you'll adapt that code
to feed a neural network and

10
00:00:33,010 --> 00:00:34,010
then train it on the data.

11
00:00:35,780 --> 00:00:39,310
So let's start with this function
that we'll call windowed_dataset.

12
00:00:39,310 --> 00:00:42,570
It will take in a data series
along with the parameters for

13
00:00:42,570 --> 00:00:44,740
the size of the window that we want.

14
00:00:44,740 --> 00:00:47,400
The size of the batches
to use when training, and

15
00:00:47,400 --> 00:00:50,849
the size of the shuffle buffer, which
determines how the data will be shuffled.

16
00:00:51,920 --> 00:00:57,274
The first step will be to create a dataset
from the series using tf.data.Dataset.

17
00:00:57,274 --> 00:01:01,030
And we'll pass the series to it
using its from_tensor_slices method.

18
00:01:02,890 --> 00:01:05,400
We will then use the window
method of the dataset

19
00:01:05,400 --> 00:01:09,880
based on our window_size to slice
the data up into the appropriate windows.

20
00:01:09,880 --> 00:01:12,420
Each one being shifted by one time step.

21
00:01:12,420 --> 00:01:15,840
We'll keep them all the same size
by setting drop_remainder to True.

22
00:01:18,180 --> 00:01:21,380
We then flatten the data out to
make it easier to work with.

23
00:01:21,380 --> 00:01:25,220
And it will be flattened into chunks
the size of our window_size + 1.

24
00:01:26,860 --> 00:01:29,550
Once it's flattened,
it's easy to shuffle it.

25
00:01:29,550 --> 00:01:32,740
You call shuffle and
you pass it the shuffle buffer.

26
00:01:32,740 --> 00:01:35,360
Using a shuffle buffer
speeds things up a bit.

27
00:01:35,360 --> 00:01:38,730
So for example, if you have
100,000 items in your dataset, but

28
00:01:38,730 --> 00:01:40,810
you set the buffer to a thousand.

29
00:01:40,810 --> 00:01:43,888
It will just fill the buffer with
the first thousand elements,

30
00:01:43,888 --> 00:01:45,236
pick one of them at random.

31
00:01:45,236 --> 00:01:47,985
And then it will replace
that with the 1,001st

32
00:01:47,985 --> 00:01:51,056
element before randomly
picking again, and so on.

33
00:01:51,056 --> 00:01:55,245
This way, with super large datasets, the
random selection only has to choose from

34
00:01:55,245 --> 00:01:58,089
a smaller number of elements, which
effectively speeds things up.

35
00:02:00,128 --> 00:02:02,930
The shuffled dataset is
then split into the xs,

36
00:02:02,930 --> 00:02:06,970
which are all of the elements except the
last, and the y, which is the last element.

37
00:02:09,330 --> 00:02:12,280
It's then batched into the selected
batch size and returned.
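
To see what each step produces, the pipeline described above can be mirrored in plain Python. This is only an illustrative stand-in, not the tf.data code shown in the video: the helper names sliding_windows, buffered_shuffle and windowed_batches, and the seed parameter, are hypothetical and exist only for this sketch. It assumes shuffle_buffer is at least 1.

```python
import random


def sliding_windows(series, window_size):
    """Yield windows of window_size + 1 values, each shifted by one time step.

    Trailing windows that would be too short are skipped, which mirrors
    setting drop_remainder to True.
    """
    width = window_size + 1
    for start in range(len(series) - width + 1):
        yield list(series[start:start + width])


def buffered_shuffle(items, buffer_size, rng):
    """Shuffle with a fixed-size buffer, as described in the transcript.

    Fill the buffer, pick one entry at random, replace it with the next
    incoming item, and repeat until the input runs out.
    """
    iterator = iter(items)
    buffer = []
    for item in iterator:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    for item in iterator:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item  # the next element takes the freed slot
    # No more input: emit whatever is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


def windowed_batches(series, window_size, batch_size, shuffle_buffer, seed=None):
    """Return batches of (x, y) pairs: x is the window, y the value after it."""
    rng = random.Random(seed)  # seed is a hypothetical extra for repeatability
    windows = sliding_windows(series, window_size)
    shuffled = buffered_shuffle(windows, shuffle_buffer, rng)
    # Split each window into all elements except the last (x) and the last (y).
    pairs = [(window[:-1], window[-1]) for window in shuffled]
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


if __name__ == "__main__":
    for batch in windowed_batches(list(range(10)), window_size=4,
                                  batch_size=2, shuffle_buffer=3, seed=0):
        print(batch)
```

With a ten-value series and a window_size of 4, this yields six (x, y) pairs grouped into three batches of two. In the video the same steps are carried out by the dataset's own methods (from_tensor_slices, window, shuffle and batch) rather than by hand-written loops.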