1
00:00:00,000 --> 00:00:02,610
Now that we have
a windowed dataset,

2
00:00:02,610 --> 00:00:05,145
we can start training
neural networks with it.

3
00:00:05,145 --> 00:00:07,035
Let's start with
a super simple one

4
00:00:07,035 --> 00:00:09,210
that's effectively a
linear regression.

5
00:00:09,210 --> 00:00:10,920
We'll measure its accuracy,

6
00:00:10,920 --> 00:00:13,605
and then we'll work from
there to improve that.

7
00:00:13,605 --> 00:00:15,975
Before we can do any training,

8
00:00:15,975 --> 00:00:17,580
we have to split our dataset

9
00:00:17,580 --> 00:00:19,605
into training and
validation sets.

10
00:00:19,605 --> 00:00:22,830
Here's the code to do
that at time step 1000.

11
00:00:22,830 --> 00:00:26,250
We can see that the training
data is the subset of

12
00:00:26,250 --> 00:00:30,750
the series called x_train
up to the split time.
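
A minimal sketch of the split described here, not the notebook's own code. The `series` list is a hypothetical stand-in for the synthetic series, and the names `split_time` and `x_valid` are assumptions; only `x_train` and the split at time step 1000 come from the narration.

```python
# Hypothetical stand-in for the synthetic series: 1461 made-up values.
series = [float(t % 365) for t in range(1461)]

split_time = 1000  # split point named in the narration

# Training data: every value before the split time.
x_train = series[:split_time]
# Validation data: every value from the split time onward.
x_valid = series[split_time:]

print(len(x_train), len(x_valid))
```

Slicing at a fixed time step keeps the validation values strictly later in time than the training values, which is what a forecasting model needs.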

13
00:00:30,750 --> 00:00:34,170
Here's the code to do
a simple linear regression.

14
00:00:34,170 --> 00:00:36,195
Let's look at it line by line.

15
00:00:36,195 --> 00:00:39,270
We'll start by setting
up all the constants

16
00:00:39,270 --> 00:00:42,300
that we want to pass to
the window dataset function.

17
00:00:42,300 --> 00:00:44,900
These include the window
size on the data,

18
00:00:44,900 --> 00:00:46,940
the batch size that
we want for training,

19
00:00:46,940 --> 00:00:48,605
and the size of
the shuffle buffer

20
00:00:48,605 --> 00:00:50,360
as we've just discussed.

21
00:00:50,360 --> 00:00:52,970
Then we'll create our dataset.

22
00:00:52,970 --> 00:00:55,210
We'll do this by
taking our series,

23
00:00:55,210 --> 00:00:57,170
and in the notebook that
you'll go through later,

24
00:00:57,170 --> 00:00:58,700
you'll create the same synthetic

25
00:00:58,700 --> 00:01:00,470
series as you did in week one.

26
00:01:00,470 --> 00:01:02,540
You'll pass it your series along

27
00:01:02,540 --> 00:01:04,760
with your desired window
size, batch size,

28
00:01:04,760 --> 00:01:06,200
and shuffle buffer size,

29
00:01:06,200 --> 00:01:08,810
and it will give you back
a formatted dataset

30
00:01:08,810 --> 00:01:10,430
that you could use for training.
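
To make the windowing step concrete, here is a plain-Python illustration of what such a function produces. It is not the tf.data version used in the course: the function name `windowed_examples`, the toy series, the batch size of 32, and the shuffle buffer size of 1000 are all assumptions for the sketch. The window size of 20 matches the 20 weights mentioned later in the narration.

```python
import random


def windowed_examples(series, window_size, batch_size, shuffle_buffer_size, seed=0):
    """Plain-Python illustration of windowing, shuffling, and batching."""
    rng = random.Random(seed)

    # Each example pairs window_size consecutive values (x) with the next value (y).
    examples = [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]

    # Buffered shuffle: hold up to shuffle_buffer_size examples in memory
    # and emit a random one each time the buffer is full.
    shuffled, buffer = [], []
    for example in examples:
        buffer.append(example)
        if len(buffer) == shuffle_buffer_size:
            shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    # Drain whatever is left in the buffer, still in random order.
    while buffer:
        shuffled.append(buffer.pop(rng.randrange(len(buffer))))

    # Group the shuffled examples into batches of batch_size.
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


window_size = 20            # matches the 20 weights mentioned in the narration
batch_size = 32             # assumed value
shuffle_buffer_size = 1000  # assumed value

x_train = [float(t) for t in range(1000)]  # hypothetical training series
batches = windowed_examples(x_train, window_size, batch_size, shuffle_buffer_size)

x, y = batches[0][0]
print(len(batches), len(batches[0]), len(x), y)
```

Each batch holds (x, y) pairs, where x is a window of 20 values and y is the value that follows it, which is the preformatted shape the model is later fitted on.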

31
00:01:10,430 --> 00:01:12,065
I'm then going to create

32
00:01:12,065 --> 00:01:13,910
a single dense layer with

33
00:01:13,910 --> 00:01:16,610
its input shape being
the window size.

34
00:01:16,610 --> 00:01:18,945
For linear regression,
that's all you need.

35
00:01:18,945 --> 00:01:20,190
I'm using this approach,

36
00:01:20,190 --> 00:01:23,060
assigning the layer to
a variable called L0,

37
00:01:23,060 --> 00:01:26,630
because later I want to
print out its learned weights,

38
00:01:26,630 --> 00:01:28,520
and it's a lot easier for
me to do that if I have

39
00:01:28,520 --> 00:01:31,270
a variable to refer to
the layer for that.

40
00:01:31,270 --> 00:01:33,860
Then I simply define
my model as a

41
00:01:33,860 --> 00:01:37,190
Sequential containing
the sole layer, just like this.

42
00:01:37,190 --> 00:01:40,610
Now I'll compile and fit
my model with this code.

43
00:01:40,610 --> 00:01:42,830
I'll use the mean
squared error loss

44
00:01:42,830 --> 00:01:45,535
function by setting loss to MSE,

45
00:01:45,535 --> 00:01:48,770
and my optimizer will use
Stochastic Gradient Descent.

46
00:01:48,770 --> 00:01:51,610
I use this approach
instead of the raw string,

47
00:01:51,610 --> 00:01:54,530
so I can set parameters on
it to initialize it such as

48
00:01:54,530 --> 00:01:57,850
the learning rate, or
LR, and the momentum.

49
00:01:57,850 --> 00:01:59,570
Experiment with different values

50
00:01:59,570 --> 00:02:00,590
here to see if you can get

51
00:02:00,590 --> 00:02:04,175
your model to converge more
quickly or more accurately.

52
00:02:04,175 --> 00:02:06,500
Next you can fit your model

53
00:02:06,500 --> 00:02:08,330
by just passing it the dataset,

54
00:02:08,330 --> 00:02:10,070
which has already
been preformatted

55
00:02:10,070 --> 00:02:11,600
with the x and y values.

56
00:02:11,600 --> 00:02:13,790
I'm going to run for
100 epochs here,

57
00:02:13,790 --> 00:02:15,380
ignoring the epoch-by-epoch

58
00:02:15,380 --> 00:02:18,500
output by setting
verbose to zero.
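
The steps narrated so far can be sketched in Keras roughly as follows. This is a sketch under assumptions, not the notebook's code: the toy series, the inline windowing, the lowercase spelling `l0`, and the learning rate of 1e-6 and momentum of 0.9 are illustrative choices. The single Dense layer, the Sequential model, the MSE loss, the SGD optimizer, 100 epochs, and verbose set to zero come from the narration.

```python
import numpy as np
import tensorflow as tf

window_size = 20

# Hypothetical toy data: a noisy upward trend, windowed into (x, y) pairs.
rng = np.random.default_rng(0)
series = np.arange(300, dtype="float32") * 0.1 + rng.normal(size=300).astype("float32")
x = np.stack([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1000).batch(32)

# Keep the layer in a variable so its learned weights can be inspected later.
l0 = tf.keras.layers.Dense(1, input_shape=[window_size])
model = tf.keras.models.Sequential([l0])

# Passing an optimizer object instead of the string "sgd" lets us set its
# parameters. The values below are assumptions; experiment with them.
model.compile(
    loss="mse",
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9),
)

# The dataset already yields (x, y) batches, so fit needs nothing else.
model.fit(dataset, epochs=100, verbose=0)
```

Recent Keras versions may warn that an `Input` object is preferred over `input_shape`; the layer is written this way only to mirror the narration.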

59
00:02:18,500 --> 00:02:20,345
Once it's done training,

60
00:02:20,345 --> 00:02:21,695
you can actually inspect

61
00:02:21,695 --> 00:02:23,810
the different weights
with this code.

62
00:02:23,810 --> 00:02:25,790
Remember earlier
when we referred to

63
00:02:25,790 --> 00:02:28,190
the layer with
a variable called L0?

64
00:02:28,190 --> 00:02:29,975
Well, here's where that's useful.

65
00:02:29,975 --> 00:02:32,195
The output will look like this.

66
00:02:32,195 --> 00:02:33,530
If you inspect it closely,

67
00:02:33,530 --> 00:02:36,350
you will see that the first
array has 20 values in it,

68
00:02:36,350 --> 00:02:38,515
and the second array
has only one value.

69
00:02:38,515 --> 00:02:40,655
This is because
the network has learned

70
00:02:40,655 --> 00:02:41,960
a linear regression to

71
00:02:41,960 --> 00:02:43,910
fit the values as
best as it can.

72
00:02:43,910 --> 00:02:46,520
So each of the values
in the first array can

73
00:02:46,520 --> 00:02:49,475
be seen as the weights
for the 20 values in x,

74
00:02:49,475 --> 00:02:53,250
and the value in the second
array is the b value, or bias.
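
To see how the two arrays combine into a prediction, here is a small illustration with made-up numbers. The function name `predict` and every value below are hypothetical; real weights would come from the trained layer.

```python
def predict(window, weights, b):
    """Linear regression: y = w0*x0 + w1*x1 + ... + w19*x19 + b."""
    return sum(w * x for w, x in zip(weights, window)) + b


# Made-up stand-ins for the learned values: 20 weights and one bias.
weights = [0.05] * 20
b = 0.5

# A made-up window of 20 consecutive series values.
window = [10.0] * 20

print(predict(window, weights, b))
```

Each weight multiplies the value at the same position in the window, and the bias is added once at the end.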