1
00:00:11,550 --> 00:00:15,390
In this lecture we are going to do what I call code preparation.

2
00:00:15,390 --> 00:00:21,330
It's like the in-between step that will help us get from theory to code. So we know what each of the

3
00:00:21,330 --> 00:00:24,660
steps are and we know conceptually how they work.

4
00:00:24,660 --> 00:00:29,840
The next question is: what's the right syntax in PyTorch to actually make it work?

5
00:00:29,850 --> 00:00:33,430
I like to think of this like how I think of learning programming languages.

6
00:00:33,480 --> 00:00:38,820
I hope everyone in this class knows Python since that's the language we'll be using.

7
00:00:39,030 --> 00:00:43,500
So let's say what I want to do is print the numbers from zero to nine inclusive.

8
00:00:43,800 --> 00:00:45,510
I know how to do that in Python.

9
00:00:45,540 --> 00:00:51,900
It's just for i in range(10): print(i). But let's say that now I want to learn how to do this in Java.

10
00:00:52,440 --> 00:00:53,280
If you know Java,

11
00:00:53,280 --> 00:00:55,520
just pretend that you don't for a moment.

12
00:00:55,710 --> 00:00:57,990
What's the situation now?

13
00:00:57,990 --> 00:00:59,780
Well, I understand the concept.

14
00:00:59,820 --> 00:01:00,890
The only challenge is:

15
00:01:00,900 --> 00:01:03,500
what is the Java syntax for it?

16
00:01:03,510 --> 00:01:06,260
I know the concept but I don't know the syntax.

17
00:01:06,510 --> 00:01:13,150
So let's look at the syntax. Looking at this, all the components should feel very familiar to you,

18
00:01:13,510 --> 00:01:15,030
even if you've never seen it before.

19
00:01:15,760 --> 00:01:19,350
I can see the keyword "for". That's the same in both languages.

20
00:01:19,390 --> 00:01:21,550
That tells me I'm doing a for loop.

21
00:01:21,550 --> 00:01:24,400
Same concept, different syntax.

22
00:01:24,400 --> 00:01:31,510
I see the index variable i. Again, it's the same concept in both languages. In a for loop,

23
00:01:31,520 --> 00:01:38,430
I might have an index variable, something that iterates as the loop progresses. I see a lower limit and

24
00:01:38,430 --> 00:01:46,190
an upper limit. The lower limit is zero and the upper limit is 10, which are inclusive and exclusive, respectively.

25
00:01:46,200 --> 00:01:51,960
Same thing in Python, except that the lower limit is implicitly zero in Python, so you don't have to specify

26
00:01:51,960 --> 00:01:56,270
it unless you want something other than zero.

27
00:01:56,310 --> 00:02:02,460
Finally, I see that inside the loop we again have the same concept, printing, but again just with a different

28
00:02:02,460 --> 00:02:03,850
syntax.

29
00:02:03,990 --> 00:02:10,560
So I hope you get the analogy here, which is that, number one, concepts and syntax are separate things.

30
00:02:10,650 --> 00:02:15,360
You might know the concepts, but in different libraries the syntax will be different.

31
00:02:15,360 --> 00:02:21,600
Number two: since we already know the concepts, our next task is to ask, what is the syntax?
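
To make the analogy concrete, here is a minimal sketch of the loop I just described. The Python version runs as written. The Java version appears only as a comment for comparison, and the list named printed is just something I added so the result can be checked.

```python
# Python: the lower limit of range() is implicitly 0 (inclusive),
# and the upper limit, 10, is exclusive.
printed = []
for i in range(10):
    print(i)
    printed.append(i)  # kept only so we can inspect what was printed

# Roughly the same loop in Java, shown as a comment for comparison:
#   for (int i = 0; i < 10; i++) {
#       System.out.println(i);
#   }
```

Both versions have the same pieces: the for keyword, the index variable i, the limits 0 and 10, and a print statement inside the loop.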

32
00:02:26,770 --> 00:02:31,870
What I want you to realize at this point is that while this is the simplest script in this course,

33
00:02:32,110 --> 00:02:37,600
it is also the most important. You're going to notice a pattern in this course, which is that no matter

34
00:02:37,600 --> 00:02:42,300
how complex what we do is, these steps will essentially remain the same.

35
00:02:42,340 --> 00:02:45,590
Remember that what we want to focus on right now is, number one:

36
00:02:45,700 --> 00:02:47,020
how do we build the model?

37
00:02:47,230 --> 00:02:48,060
Number two:

38
00:02:48,160 --> 00:02:49,450
how do we train the model?

39
00:02:49,450 --> 00:02:52,570
And number three: how do we make predictions using the model?

40
00:02:57,710 --> 00:02:58,950
Let's start with number one:

41
00:02:58,970 --> 00:03:00,920
how do we build the model?

42
00:03:00,920 --> 00:03:03,150
At this stage it's going to be very easy.

43
00:03:03,200 --> 00:03:07,850
Just one line of code: model = nn.Linear(1, 1).

44
00:03:07,850 --> 00:03:13,040
What this means is that we want a linear model with one input and one output.

45
00:03:13,040 --> 00:03:19,400
The first argument is for the input size and the second argument is for the output size. There are cases

46
00:03:19,400 --> 00:03:23,340
where the input and output might be sizes other than one and one,

47
00:03:23,450 --> 00:03:25,450
but we'll see those later.

48
00:03:25,460 --> 00:03:31,070
Right now we're interested in doing simple linear regression, where there is one x-axis and one y-axis.
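
As a minimal sketch, building the model looks like this. The import alias nn is the usual convention, and the print at the end is only there so you can see what the layer contains.

```python
import torch
import torch.nn as nn

# A linear model with one input feature and one output value.
# First argument: input size. Second argument: output size.
model = nn.Linear(1, 1)

# The layer stores a weight of shape (1, 1) and a bias of shape (1,).
print(model.weight.shape, model.bias.shape)
```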

49
00:03:36,240 --> 00:03:36,730
Next step,

50
00:03:36,730 --> 00:03:39,180
number two: how do we train the model?

51
00:03:39,340 --> 00:03:44,380
If you've ever used other libraries such as Keras and scikit-learn, you might assume this would be easy

52
00:03:44,380 --> 00:03:45,100
in PyTorch.

53
00:03:45,130 --> 00:03:49,240
But unfortunately, it doesn't encapsulate all the steps as you would expect.

54
00:03:50,860 --> 00:03:56,590
Luckily, the concepts we just learned will help us in understanding the syntax.

55
00:03:56,710 --> 00:04:02,950
The first step in training the model is to define a loss and an optimizer. As we discussed, the loss that

56
00:04:02,950 --> 00:04:08,500
we need for linear regression is the mean squared error, and there just so happens to be this object

57
00:04:08,530 --> 00:04:11,210
called MSELoss in PyTorch.

58
00:04:11,260 --> 00:04:15,070
Typically we call this object a criterion.

59
00:04:15,230 --> 00:04:19,930
There's one interesting thing to notice here which is that this is a bit of an exercise in what I call

60
00:04:19,960 --> 00:04:21,640
API hunting.

61
00:04:21,640 --> 00:04:26,400
If you think only about the concepts, you know the equation for the MSE loss.

62
00:04:26,410 --> 00:04:32,110
If I gave you the y's and the y-hats, you could calculate this loss yourself. But in PyTorch, it's

63
00:04:32,110 --> 00:04:34,040
not a matter of knowing the formula.

64
00:04:34,240 --> 00:04:36,820
It's a matter of knowing the syntax.

65
00:04:36,820 --> 00:04:41,590
So you need to find the right object that implements the loss you were looking for, and then do the right

66
00:04:41,620 --> 00:04:44,920
things with that object. At the same time,

67
00:04:44,950 --> 00:04:49,180
it's nice to know a little bit about the concepts so that you actually have some sense of what's going

68
00:04:49,180 --> 00:04:50,890
on.

69
00:04:50,890 --> 00:04:52,840
Next we have the optimizer.

70
00:04:52,840 --> 00:04:57,940
There are many such optimizers in PyTorch, but the simplest one is called SGD.

71
00:04:58,060 --> 00:05:03,760
When we create the SGD object, we pass in two arguments: the model's parameters, which we get by calling

72
00:05:03,760 --> 00:05:06,640
the convenient function model.parameters(),

73
00:05:06,640 --> 00:05:15,060
and the learning rate. As a side note, SGD stands for stochastic gradient descent. Again, in these

74
00:05:15,060 --> 00:05:19,740
lectures, I'm not so concerned about what's going on mathematically, but rather the syntax.

75
00:05:19,890 --> 00:05:24,480
If you'd like to understand what's going on mathematically, I would encourage you to check out the in-depth

76
00:05:24,540 --> 00:05:25,980
sections of this course.
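
Here is a minimal sketch of defining the loss and the optimizer. The learning rate of 0.1 and the small y and y_hat tensors are values I made up for illustration. The last few lines just check that the MSELoss object agrees with the formula you would compute by hand.

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)

# The loss object, typically called the criterion.
criterion = nn.MSELoss()

# The optimizer: model parameters plus a learning rate (0.1 is an assumed value).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Sanity check with made-up targets (y) and predictions (y_hat).
y = torch.tensor([[1.0], [2.0], [3.0]])
y_hat = torch.tensor([[1.5], [2.0], [2.0]])

by_hand = ((y_hat - y) ** 2).mean()  # mean of the squared errors
by_object = criterion(y_hat, y)      # same quantity via the PyTorch object
print(by_hand.item(), by_object.item())
```

Notice that the criterion is an object, but we call it like a function.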

77
00:05:31,170 --> 00:05:31,480
All right.

78
00:05:31,510 --> 00:05:37,360
So at this point we've defined a loss object and an optimizer object, but we haven't yet done any computation

79
00:05:37,360 --> 00:05:38,500
with them.

80
00:05:38,590 --> 00:05:42,870
As you recall, gradient descent involves doing a loop, and inside the loop

81
00:05:42,880 --> 00:05:48,240
you repeatedly apply the gradient descent equation. So that's essentially what we want to do right now,

82
00:05:48,250 --> 00:05:49,420
but in PyTorch.

83
00:05:49,570 --> 00:05:51,880
So obviously we need a loop.

84
00:05:51,880 --> 00:05:55,720
Usually we call each iteration of the loop an epoch.

85
00:05:55,720 --> 00:06:01,990
Unfortunately it's not so easy as simply translating this one line of mathematical expression into one

86
00:06:01,990 --> 00:06:03,170
line of code.

87
00:06:03,190 --> 00:06:06,610
There are multiple steps in the code that have to be done.

88
00:06:06,640 --> 00:06:09,270
The weirdest part just happens to come first.

89
00:06:09,280 --> 00:06:14,360
We have to call optimizer.zero_grad(). At this stage,

90
00:06:14,370 --> 00:06:18,620
I like to think of it just as some boilerplate code, something that needs to be done.

91
00:06:18,660 --> 00:06:21,070
So just put it there and forget about it.

92
00:06:21,300 --> 00:06:26,640
If you want to know what it does, then what we can say is that internally, PyTorch accumulates gradients

93
00:06:26,670 --> 00:06:31,970
whenever you call loss.backward(), which is a line that shows up later in this loop.

94
00:06:31,980 --> 00:06:37,020
Personally, I would not worry about that, because 99 percent of the time we'll be doing exactly

95
00:06:37,020 --> 00:06:37,940
what you see here.

96
00:06:38,100 --> 00:06:44,280
And so far, none of the examples in this course require you to actually accumulate gradients.

97
00:06:44,280 --> 00:06:49,980
The rest of the steps are relatively straightforward. As you know, in order to calculate the loss,

98
00:06:49,980 --> 00:06:51,150
we need the predictions,

99
00:06:51,150 --> 00:06:56,310
the y-hats. In order to get those, we pass the inputs into our model.

100
00:06:56,310 --> 00:06:59,790
This allows us to see the functional aspect of PyTorch.

101
00:07:00,780 --> 00:07:06,810
Even though the variable model is an object of type nn.Linear (as you recall, that's what we created earlier),

102
00:07:06,810 --> 00:07:09,770
it can be used as if it were a function.

103
00:07:09,840 --> 00:07:16,470
So when we pass in some inputs into the model and call it like a function we get the model outputs.

104
00:07:16,470 --> 00:07:22,220
The next step is to pass the outputs and the targets into our criterion, the mean squared error.

105
00:07:22,220 --> 00:07:26,350
Again this is an object but we're using it like a function.

106
00:07:26,370 --> 00:07:29,010
The next step is to call loss.backward().

107
00:07:29,190 --> 00:07:32,350
Conceptually you can think of this as calculating the gradients.

108
00:07:32,370 --> 00:07:38,310
For example, the partial derivative of L with respect to m and the partial derivative of L with respect

109
00:07:38,310 --> 00:07:46,180
to b. The final step is to actually apply the gradients using the gradient descent equation. As you'll

110
00:07:46,180 --> 00:07:47,100
see later,

111
00:07:47,140 --> 00:07:50,300
there are a lot of variations on basic gradient descent.

112
00:07:50,440 --> 00:07:53,620
So once you have the gradients, there are a few different things you can do with them

113
00:07:53,650 --> 00:07:54,850
aside from just this.
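
Putting those steps together, a minimal sketch of the training loop might look like this. The synthetic data, the seed, the learning rate, and the number of epochs are all assumptions I chose for illustration. For the final step of applying the gradients, I'm assuming the usual call, optimizer.step().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, only for repeatability

# Made-up data on the line y = 2x + 1, already shaped N x 1 as torch tensors.
inputs = torch.linspace(-1, 1, 20).reshape(-1, 1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed learning rate

n_epochs = 200  # assumed number of iterations
losses = []
for epoch in range(n_epochs):
    optimizer.zero_grad()                # boilerplate: clear accumulated gradients

    outputs = model(inputs)              # forward pass: the object is called like a function
    loss = criterion(outputs, targets)   # mean squared error

    loss.backward()                      # compute the gradients
    optimizer.step()                     # apply one gradient descent update

    losses.append(loss.item())

print(losses[0], losses[-1])
```

If the loop is working, the last loss should be much smaller than the first.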

114
00:07:59,870 --> 00:08:05,130
You may have noticed something strange in the previous code snippet. Previously, I called the data

115
00:08:05,190 --> 00:08:11,280
X and Y, but in the previous code snippet I used the variable names inputs and targets.

116
00:08:11,280 --> 00:08:14,010
Is this just a matter of some trivial renaming?

117
00:08:14,190 --> 00:08:17,580
Or did I really mean to say that they are different things?

118
00:08:17,580 --> 00:08:20,070
In fact they are different things.

119
00:08:20,220 --> 00:08:26,090
When I load in the data, I usually call the inputs X and the targets Y, and usually they are NumPy

120
00:08:26,090 --> 00:08:32,610
arrays. But one important detail that we haven't looked at yet is that PyTorch does not work with regular

121
00:08:32,610 --> 00:08:38,310
NumPy arrays. Instead, PyTorch works with torch tensors.

122
00:08:38,480 --> 00:08:43,550
These are basically arrays, but they are special arrays that are part of the PyTorch library rather

123
00:08:43,550 --> 00:08:45,050
than the NumPy library,

124
00:08:45,140 --> 00:08:50,780
so they have some special functionality. So some conversion has to take place before you pass the data

125
00:08:50,810 --> 00:08:51,620
into your model.

126
00:08:56,830 --> 00:08:58,080
In the following example,

127
00:08:58,090 --> 00:09:02,130
for simple linear regression, the steps are what you see here.

128
00:09:02,140 --> 00:09:07,900
First, we're going to reshape X and Y to be N by 1 instead of just N. In machine learning,

129
00:09:07,900 --> 00:09:13,240
we typically assume that our data is two-dimensional, with the number of samples along the rows and the

130
00:09:13,240 --> 00:09:15,790
number of dimensions along the columns.

131
00:09:15,790 --> 00:09:21,520
This applies even for the targets, which is not the case in libraries such as Keras or scikit-learn, so

132
00:09:21,550 --> 00:09:26,600
be aware of that. Second, PyTorch is really picky about data types.

133
00:09:26,710 --> 00:09:32,830
It won't allow you to multiply a float32 by a float64, which is also known as a double.

134
00:09:32,830 --> 00:09:36,120
So you have to be mindful of this. Also, by default,

135
00:09:36,130 --> 00:09:42,300
if you don't specify anything, PyTorch will create all your variables as float32, whereas in NumPy,

136
00:09:42,310 --> 00:09:46,390
if you don't specify anything, your arrays will be float64s.

137
00:09:46,480 --> 00:09:53,820
So before we create torch tensors, we're going to cast our NumPy arrays to float32. The final step

138
00:09:53,850 --> 00:09:57,020
is to call the function torch.from_numpy().

139
00:09:57,030 --> 00:10:01,970
This takes as input our NumPy arrays and returns the corresponding torch tensors.

140
00:10:02,070 --> 00:10:06,330
So these are the inputs and targets we were working with in the previous training loop.
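
As a minimal sketch, the conversion steps look like this. The data itself is made up: N, the slope, the intercept, and the noise are assumptions for illustration only.

```python
import numpy as np
import torch

# Made-up data: 20 points near an assumed line y = 0.5x - 1.
N = 20
X = np.random.random(N) * 10 - 5       # shape (N,), dtype float64 by default
Y = 0.5 * X - 1 + np.random.randn(N)   # shape (N,), dtype float64 by default

# Step 1: reshape to N x 1 (samples along rows, dimensions along columns).
X = X.reshape(N, 1)
Y = Y.reshape(N, 1)

# Step 2: cast to float32, then convert the NumPy arrays to torch tensors.
inputs = torch.from_numpy(X.astype(np.float32))
targets = torch.from_numpy(Y.astype(np.float32))

print(inputs.shape, inputs.dtype)
```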

141
00:10:11,650 --> 00:10:14,160
The final step is to make predictions.

142
00:10:14,230 --> 00:10:19,990
You might think we already did that in the training loop when we called model(inputs), but the important

143
00:10:19,990 --> 00:10:26,460
thing to remember is that PyTorch only works with torch tensors, and the inputs variable is a torch tensor.

144
00:10:27,310 --> 00:10:32,080
It's reasonable to assume then that the output is also a torch tensor.

145
00:10:32,080 --> 00:10:37,570
Of course, if you want to do any later processing, such as plotting the array with Matplotlib, you might

146
00:10:37,570 --> 00:10:41,670
want to have a NumPy array rather than a torch tensor.

147
00:10:41,830 --> 00:10:47,020
Therefore, in addition to actually calculating the output, you'll also want to convert it back into a

148
00:10:47,020 --> 00:10:47,650
NumPy array.

149
00:10:48,580 --> 00:10:51,790
Normally, you can do this just using the numpy() function.

150
00:10:51,790 --> 00:10:57,040
However, PyTorch also does some stuff behind the scenes, where it creates a graph and keeps track of

151
00:10:57,040 --> 00:10:59,040
the gradients in the graph.

152
00:10:59,100 --> 00:11:03,170
Basically, you can just call the detach() function before you call the numpy() function

153
00:11:03,310 --> 00:11:05,890
in order to detach that tensor from the graph.
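
Here is a minimal sketch of making predictions and converting them back. The untrained model and the five input points are stand-ins I made up, so the predicted values themselves are not meaningful.

```python
import numpy as np
import torch
import torch.nn as nn

model = nn.Linear(1, 1)  # stand-in for the trained model

# Five made-up input points, shaped N x 1 and cast to float32.
inputs = torch.from_numpy(np.linspace(-1, 1, 5, dtype=np.float32).reshape(-1, 1))

# The output is a torch tensor that is still attached to the graph.
outputs = model(inputs)

# Detach from the graph first, then convert to a NumPy array.
# Calling outputs.numpy() directly would raise an error here,
# because outputs requires grad.
predicted = outputs.detach().numpy()

print(type(predicted), predicted.shape)
```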

154
00:11:11,100 --> 00:11:11,450
All right.

155
00:11:11,460 --> 00:11:15,450
So that was our code preparation for simple linear regression.

156
00:11:15,450 --> 00:11:21,210
I said this before, but it's worth repeating: while this is the simplest example in the course, it's also

157
00:11:21,210 --> 00:11:27,990
the most important, because it introduces a lot of new concepts, mostly about how to use PyTorch. In future

158
00:11:27,990 --> 00:11:28,590
examples,

159
00:11:28,590 --> 00:11:31,770
we'll just be repeating these same concepts over and over again,

160
00:11:31,860 --> 00:11:33,980
so you'll get lots of practice.

161
00:11:34,140 --> 00:11:39,150
You've seen that PyTorch is quite a bit more complicated than simply instantiating a model, calling

162
00:11:39,150 --> 00:11:40,770
fit and then calling predict.
