1
00:00:11,540 --> 00:00:16,550
In this lecture, we are going to look at the Colab notebook that performs linear regression in order

2
00:00:16,550 --> 00:00:22,750
to show that Moore's Law is true: that the transistor count doubles approximately every two years.

3
00:00:22,790 --> 00:00:27,950
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise,

4
00:00:27,980 --> 00:00:33,890
which I always recommend, is, once you know how this is done, to try to recreate it yourself with as few

5
00:00:33,890 --> 00:00:35,870
references as possible.

6
00:00:35,870 --> 00:00:40,760
As usual you can look at the title of the notebook to determine what notebook we are currently looking

7
00:00:40,760 --> 00:00:41,030
at.

8
00:00:43,010 --> 00:00:49,070
To start, we're going to have all the same imports as before, with one additional import, which is pandas.

9
00:00:49,070 --> 00:00:55,340
Pandas is useful for working with CSV files, which is the format our data happens to be in. To obtain

10
00:00:55,340 --> 00:01:00,620
the data, we are going to use the wget command to download the CSV file hosted on my GitHub.

11
00:01:06,290 --> 00:01:06,640
Next,

12
00:01:06,650 --> 00:01:08,900
we are going to load in the data.

13
00:01:08,900 --> 00:01:13,040
The data doesn't contain any headers and it only contains two columns.

14
00:01:13,040 --> 00:01:16,240
I would recommend verifying that this is the case.
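As a rough sketch of that loading step: the snippet below reads a two-column, headerless CSV with pandas. The in-memory text and its values are illustrative stand-ins, not the lecture's downloaded file.
```python
import io
import pandas as pd
# Hypothetical stand-in for the downloaded file: two columns, no header row.
# These values are illustrative only, not the lecture's dataset.
csv_text = "1971,2300\n1974,4500\n1978,29000\n1982,134000\n"
# header=None tells pandas not to treat the first row as column names.
data = pd.read_csv(io.StringIO(csv_text), header=None).to_numpy()
print(data.shape)  # (4, 2): four rows, two columns
X = data[:, 0]  # first column: year
Y = data[:, 1]  # second column: transistor count
```
Printing the shape is one quick way to verify that there really are only two columns.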

15
00:01:16,370 --> 00:01:23,180
The next thing we want to do is reshape X so that it's a two-dimensional array of size N by 1 rather

16
00:01:23,180 --> 00:01:24,570
than a one-dimensional array of length N.

17
00:01:24,560 --> 00:01:28,450
We also need to do the same thing for Y in PyTorch,

18
00:01:28,550 --> 00:01:32,060
although this generally doesn't extend to other libraries.

19
00:01:32,060 --> 00:01:36,370
Remember that the convention is that X is a 2D array of size N by D,

20
00:01:36,380 --> 00:01:46,050
and Y is a 2D array of size N by K, where K is the number of outputs.
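A minimal sketch of that reshape, using NumPy and illustrative values (not the lecture's dataset):
```python
import numpy as np
# Illustrative values only.
X = np.array([1971.0, 1974.0, 1978.0, 1982.0])
Y = np.array([2300.0, 4500.0, 29000.0, 134000.0])
print(X.shape)  # (4,): a 1D array of length N
# -1 lets NumPy infer N; the trailing 1 is D for X and K for Y.
X = X.reshape(-1, 1)
Y = Y.reshape(-1, 1)
print(X.shape, Y.shape)  # (4, 1) (4, 1)
```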

21
00:01:46,090 --> 00:01:49,240
Next we are going to do a scatter plot of this data.

22
00:01:49,240 --> 00:01:52,360
Notice that it shows exponential growth which we already know

23
00:02:01,790 --> 00:02:06,530
So next, we're going to take the log of Y, which is what we discussed previously.

24
00:02:06,740 --> 00:02:19,770
We see that we get back what looks like a good candidate for linear regression.
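To see why the log helps, here is a small sketch on synthetic counts that double every two years (an assumption made for illustration): after taking the log, each step adds the same constant, which is what a straight line looks like.
```python
import numpy as np
# Synthetic counts that double every 2 years (illustration only).
years = np.arange(1970, 1990, 2, dtype=np.float64)
counts = 1000.0 * 2.0 ** ((years - 1970) / 2.0)
log_counts = np.log(counts)
# For exactly exponential data, the log grows by a constant amount per step.
steps = np.diff(log_counts)
print(np.allclose(steps, np.log(2.0)))  # True
```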

25
00:02:19,780 --> 00:02:23,630
Next, we are going to do a little bit of preprocessing on the data.

26
00:02:23,740 --> 00:02:30,100
Specifically, we're going to standardize, or normalize, X and Y. We'll want to keep these means and variances

27
00:02:30,100 --> 00:02:31,780
around for later use.

28
00:02:31,810 --> 00:02:36,470
So let's call them m_x for the mean of X, s_x for the standard deviation of X,

29
00:02:36,580 --> 00:02:38,130
and similarly for Y.

30
00:02:38,680 --> 00:02:44,610
What we have to keep in mind is that by doing this the units of X and Y no longer make sense.

31
00:02:44,650 --> 00:02:50,740
Previously X had units of years but this is no longer the case and Y had units of log counts which is

32
00:02:50,740 --> 00:02:52,300
no longer the case.

33
00:02:52,450 --> 00:02:57,520
We'll have to reverse the transformation later on if we want to get back to the original units
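A minimal sketch of standardizing and then reversing the transformation, with illustrative values; the variable names mx, sx, my, sy mirror the names used in the lecture.
```python
import numpy as np
# Illustrative values only: years, and the log of some transistor counts.
X = np.array([[1971.0], [1974.0], [1978.0], [1982.0]])
Y = np.log(np.array([[2300.0], [4500.0], [29000.0], [134000.0]]))
# Keep the means and standard deviations around for later use.
mx, sx = X.mean(), X.std()
my, sy = Y.mean(), Y.std()
X_norm = (X - mx) / sx  # no longer in units of years
Y_norm = (Y - my) / sy  # no longer in units of log counts
# Reversing the transformation recovers the original units.
X_back = X_norm * sx + mx
Y_back = Y_norm * sy + my
```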

34
00:03:04,290 --> 00:03:05,190
In the next plot,

35
00:03:05,190 --> 00:03:10,740
we do a scatter plot again, which at first glance looks exactly the same as it did before, but if you

36
00:03:10,740 --> 00:03:16,680
look closely you'll see that both the x and y axes are centered around zero and the range of values

37
00:03:16,680 --> 00:03:19,500
is now small which is exactly what we want.

38
00:03:27,310 --> 00:03:32,020
Next, we're going to do a little more preprocessing to convert our data into float32.

39
00:03:37,970 --> 00:03:38,360
Next,

40
00:03:38,360 --> 00:03:41,840
it's time to create our linear regression model in PyTorch.

41
00:03:41,840 --> 00:03:44,330
Thanks to our rule that all data is the same,

42
00:03:44,330 --> 00:03:47,430
this code will be exactly the same as before.

43
00:03:47,480 --> 00:03:53,540
Once we have our model, which is just one line of code, we're going to create our loss and optimizer. As

44
00:03:53,540 --> 00:03:57,880
you can see, there's one argument here that might be new to you: momentum.

45
00:03:58,040 --> 00:04:02,150
This is one of those variations on gradient descent that I was talking about earlier.

46
00:04:02,750 --> 00:04:15,850
If you want to know more about these, you're encouraged to check out the in-depth section of the course.
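A sketch of the model, loss, and optimizer described here. The learning rate and momentum values below are assumptions for illustration; the transcript does not state the ones used in the notebook.
```python
import torch
import torch.nn as nn
# One input feature, one output: a single weight and a single bias.
model = nn.Linear(1, 1)
# Mean squared error is the usual loss for linear regression.
criterion = nn.MSELoss()
# lr=0.1 and momentum=0.7 are assumed values, not taken from the lecture.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.7)
```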

47
00:04:15,870 --> 00:04:27,910
Next, we're going to convert our NumPy data arrays into Torch tensors using the from_numpy function.
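A small sketch of the float32 cast followed by the conversion to a tensor, on made-up standardized values:
```python
import numpy as np
import torch
# NumPy defaults to float64; nn.Linear parameters are float32 by default.
X = np.array([[-1.0], [0.0], [1.0]])
X32 = X.astype(np.float32)
# from_numpy shares memory with the NumPy array rather than copying it.
inputs = torch.from_numpy(X32)
print(inputs.dtype)  # torch.float32
```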

48
00:04:27,930 --> 00:04:32,070
Next, we're going to run our main training loop. As promised,

49
00:04:32,070 --> 00:04:37,480
this is the exact same thing as what we had earlier, just with a different number of epochs.

50
00:04:37,500 --> 00:04:42,600
I just want to mention again because a lot of people ask me how do I know what number of epochs are

51
00:04:42,600 --> 00:04:43,860
needed.

52
00:04:43,860 --> 00:04:46,940
Just remember that there is no way around it: trial and error.

53
00:04:47,070 --> 00:04:49,080
There is no magic formula.

54
00:04:49,080 --> 00:04:51,930
You just have to keep trying until your loss per iteration

55
00:04:51,930 --> 00:04:52,520
looks right.

56
00:04:53,900 --> 00:04:57,230
You'll be seeing the same loop over and over again throughout the course.

57
00:04:57,230 --> 00:05:00,710
So each time I'm going to spend less and less time explaining it.
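For reference, a sketch of such a training loop on synthetic, already-standardized data. The data, the hyperparameters, and the epoch count are all assumptions for illustration, not the notebook's values.
```python
import numpy as np
import torch
import torch.nn as nn
torch.manual_seed(0)
# Synthetic standardized-looking data lying on a line of slope 1.
x = np.linspace(-1.5, 1.5, 50, dtype=np.float32).reshape(-1, 1)
inputs = torch.from_numpy(x)
targets = torch.from_numpy(x.copy())
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.7)
n_epochs = 100  # found by trial and error; there is no formula for this
losses = []
for it in range(n_epochs):
    optimizer.zero_grad()               # clear gradients from the last step
    outputs = model(inputs)             # forward pass
    loss = criterion(outputs, targets)
    losses.append(loss.item())          # record the loss for plotting later
    loss.backward()                     # backward pass
    optimizer.step()                    # update the weight and bias
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```
Plotting the recorded losses is how you would judge whether the chosen number of epochs was enough.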

58
00:05:12,240 --> 00:05:12,540
All right.

59
00:05:12,570 --> 00:05:14,820
So from the loss values it's printing out,

60
00:05:14,970 --> 00:05:23,970
it looks like the loss has converged.

61
00:05:24,180 --> 00:05:27,450
All right so next we're going to plot the loss per iteration as usual.

62
00:05:27,640 --> 00:05:38,580
And from this we can confirm that our loss has converged.

63
00:05:38,590 --> 00:05:41,190
Next we're going to plot our line of best fit.

64
00:05:41,230 --> 00:05:45,520
Again this is the same as before so there's no need to explain this code again.

65
00:05:45,520 --> 00:05:48,580
As you can see our line is indeed a line of best fit

66
00:05:55,210 --> 00:05:55,830
Next,

67
00:05:55,840 --> 00:05:58,240
let's get the trained weights of the model.

68
00:05:58,240 --> 00:05:59,560
You already know how to do this.

69
00:05:59,560 --> 00:06:01,300
It's model.weight.data.numpy().

70
00:06:01,370 --> 00:06:03,420
As expected,

71
00:06:03,430 --> 00:06:07,290
it's a two-dimensional array that holds a single number.
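A quick sketch showing the shape of that weight array. The model here is freshly created and untrained, so the value itself is meaningless; only the shape matters.
```python
import torch
import torch.nn as nn
model = nn.Linear(1, 1)  # untrained stand-in for the fitted model
w = model.weight.data.numpy()
print(w.shape)  # (1, 1): a 2D array holding a single number
# Index row 0, column 0 to get the plain scalar.
w_scalar = w[0, 0]
```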

72
00:06:07,300 --> 00:06:10,150
The question is: what does this number even mean?

73
00:06:10,150 --> 00:06:12,310
To answer that question we need a little theory

74
00:06:17,210 --> 00:06:20,540
Let's start by recalling our model of exponential growth.

75
00:06:20,540 --> 00:06:22,100
We have C equals C_0

76
00:06:22,100 --> 00:06:28,640
times r to the power t. As you recall, we convert this into a linear equation by taking the log of both

77
00:06:28,640 --> 00:06:29,390
sides.

78
00:06:29,900 --> 00:06:35,460
So now we have log C equals log of C_0 plus log r times t.

79
00:06:35,480 --> 00:06:38,930
If we rename the variables we can make this look a little more familiar

80
00:06:42,720 --> 00:06:49,320
So let's call the input x, the output y, and the slope a. A good thing to remember is that the slope of this

81
00:06:49,320 --> 00:06:52,920
line is the log of r, not r itself.

82
00:06:52,920 --> 00:06:57,140
We can leave the intercept term untouched because it's not related to the rate of growth.

83
00:06:57,480 --> 00:06:59,450
It's just the initial value of the count.
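Written out, the steps just described are as follows (the symbols follow the lecture's naming):
```latex
\begin{align*}
C &= C_0 \, r^{t} \\
\log C &= \log C_0 + (\log r)\, t \\
y &= a x + \log C_0, \qquad y = \log C,\quad x = t,\quad a = \log r
\end{align*}
```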

84
00:07:07,300 --> 00:07:11,420
Next we have to take into account that our data is normalized.

85
00:07:11,740 --> 00:07:16,270
As you recall we normalize the data using the equations we see here.

86
00:07:16,270 --> 00:07:19,630
Let's call the transformed data x prime and y prime.

87
00:07:19,630 --> 00:07:27,130
So y prime is equal to y minus m_y, divided by s_y, and x prime is equal to x minus m_x, divided by

88
00:07:27,130 --> 00:07:28,410
s_x.

89
00:07:28,480 --> 00:07:35,110
Now we can begin to understand our own model. Our own model is also just a line, just with different units

90
00:07:35,140 --> 00:07:44,770
and therefore a different slope and intercept. So let's call the parameters of this line w and b. So the

91
00:07:44,770 --> 00:07:49,840
line that we actually fitted would be y prime equals w times x prime plus b.

92
00:07:57,230 --> 00:08:04,360
What we can do next is recover our original model by replacing x prime and y prime with their expressions

93
00:08:04,370 --> 00:08:07,520
in terms of the original variables x and y.

94
00:08:07,520 --> 00:08:17,750
So now we have y minus m_y, divided by s_y, is equal to w times x minus m_x, divided by s_x, plus b. You

95
00:08:17,750 --> 00:08:21,790
should be able to see intuitively that this is still a linear equation.

96
00:08:22,130 --> 00:08:28,580
After doing some algebraic manipulation we can isolate y in terms of all the other variables.

97
00:08:28,700 --> 00:08:34,650
If you don't understand how we got to this step I would strongly recommend trying this on paper by yourself.

98
00:08:35,030 --> 00:08:41,450
What we can see is that there is one term involving x and three other terms not involving x.

99
00:08:41,450 --> 00:08:47,950
Clearly, the three terms not involving x are equal to the intercept, log of C_0. Remember,

100
00:08:48,010 --> 00:08:51,680
we don't care about that because that doesn't involve the rate of growth.

101
00:08:51,710 --> 00:08:53,560
Instead what we care about is the slope.

102
00:08:54,380 --> 00:09:00,050
Luckily, it has a simple expression: it's just w times s_y divided by s_x.

103
00:09:06,220 --> 00:09:12,760
We know that our original line has the form a x plus log of C_0, and therefore we can conclude that,

104
00:09:13,150 --> 00:09:21,430
if we had fitted a line to our original data, the slope would be a equals w times s_y divided by s_x.
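For anyone working this out on paper, the algebra goes roughly as follows (same symbols as in the lecture):
```latex
\begin{align*}
\frac{y - m_y}{s_y} &= w \, \frac{x - m_x}{s_x} + b \\
y &= \frac{w\, s_y}{s_x}\, x \;-\; \frac{w\, s_y}{s_x}\, m_x \;+\; b\, s_y \;+\; m_y \\
a &= \frac{w\, s_y}{s_x}, \qquad \log C_0 = -\frac{w\, s_y}{s_x}\, m_x + b\, s_y + m_y
\end{align*}
```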

105
00:09:27,440 --> 00:09:27,780
All right.

106
00:09:27,810 --> 00:09:30,930
So the next question we want to ask is: what is the value of a?

107
00:09:31,740 --> 00:09:39,900
Luckily, we know w, we know s_y, and we know s_x, so we can just plug them in to find a. Remember that w

108
00:09:39,900 --> 00:09:42,730
is a two dimensional array holding a single value.

109
00:09:42,930 --> 00:09:49,070
So in order to retrieve its value, we have to index w twice, at row 0 and column 0.

110
00:09:49,080 --> 00:09:53,480
We see that we get another sort of strange number: 0.34.

111
00:09:53,550 --> 00:09:55,980
It's not clear how this would lead to a rate of growth

112
00:09:56,070 --> 00:09:59,040
that means the transistor count doubles every two years.

113
00:09:59,160 --> 00:10:01,650
Of course, that's because there is still more work to do.
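As a sanity check on the slope formula, the sketch below uses NumPy least squares (np.polyfit, swapped in here for the lecture's gradient descent) on synthetic data, and compares w times s_y over s_x against a line fitted directly to the unstandardized data.
```python
import numpy as np
# Synthetic log-counts on a noisy line (illustration only, not the lecture's data).
rng = np.random.default_rng(0)
x = np.arange(1970.0, 2020.0)
y = 0.35 * (x - 1970.0) + 7.0 + rng.normal(0.0, 0.5, size=x.shape)
mx, sx, my, sy = x.mean(), x.std(), y.mean(), y.std()
# Least-squares fit on the standardized data gives w (slope) and b (intercept).
w, b = np.polyfit((x - mx) / sx, (y - my) / sy, 1)
a_recovered = w * sy / sx
# Least-squares fit directly on the original data, for comparison.
a_direct, _ = np.polyfit(x, y, 1)
print(np.isclose(a_recovered, a_direct))  # True
```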

114
00:10:07,760 --> 00:10:13,400
So let's go through a quick calculation. To review once again, our model for exponential growth is: the

115
00:10:13,400 --> 00:10:15,890
count C is equal to C_0

116
00:10:15,890 --> 00:10:21,680
times r to the power t. C_0 is the count when t equals zero, and r is the rate of growth.

117
00:10:22,460 --> 00:10:26,540
If we take the log of both sides, we get an equation that's linear in t.

118
00:10:31,440 --> 00:10:37,950
We can express this in terms of our usual linear regression variables: y equals a x plus intercept.

119
00:10:38,160 --> 00:10:45,030
This shows us that y is equal to log C, a is equal to log r, x is just the time t, and the intercept is

120
00:10:45,030 --> 00:10:52,270
log of C_0. Because a is equal to log r, we can just take the exponential of both sides to get that

121
00:10:52,290 --> 00:10:54,680
r is equal to e to the power a.

122
00:10:54,840 --> 00:10:59,460
From here, we can plug in the value for the slope that we found to get that r is equal to

123
00:10:59,460 --> 00:11:04,460
1.407, although as you'll see, we won't need to use this value directly.
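A one-line check of that step. Note that it plugs in the slope rounded to 0.34, so the result comes out slightly below the 1.407 quoted in the lecture, which presumably used the unrounded slope.
```python
import math
a = 0.34  # slope quoted in the lecture, rounded to two decimals
r = math.exp(a)  # a = log r, so r = e^a
print(round(r, 3))  # 1.405
```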

124
00:11:11,450 --> 00:11:12,100
Next,

125
00:11:12,120 --> 00:11:17,150
let's consider the doubling time, or in other words, how long it takes for C to double.

126
00:11:17,220 --> 00:11:23,400
We can do this by thinking of the original equation as giving the count C at some original time t.

127
00:11:23,580 --> 00:11:30,210
This origin is completely arbitrary and as you'll see very shortly its values don't actually matter.

128
00:11:30,270 --> 00:11:36,990
So in order to double this count, the left-hand side would become 2C, and on the right-hand side the

129
00:11:36,990 --> 00:11:39,160
time would just be t prime.

130
00:11:39,180 --> 00:11:41,330
We call it t prime because we don't know what it is

131
00:11:41,330 --> 00:11:43,170
yet. That's what we're trying to solve for.

132
00:11:49,700 --> 00:11:56,360
So if we take these two equations and divide one by the other, we see that the C disappears, the C_0

133
00:11:56,420 --> 00:12:03,500
also cancels out, and as I promised, the origin point doesn't matter. We get 2 is equal to r to the power

134
00:12:03,710 --> 00:12:09,150
t prime minus t. t prime minus t is actually the time duration that we care about.

135
00:12:09,320 --> 00:12:16,070
That's how long it takes for the transistor count to double: how far from t is t prime. We can rearrange

136
00:12:16,070 --> 00:12:24,020
this to get that t prime minus t is equal to log 2 divided by log r, which is equal to log 2 over a. Since

137
00:12:24,020 --> 00:12:24,870
we found a,

138
00:12:24,890 --> 00:12:26,630
this is just a number.

139
00:12:26,930 --> 00:12:32,990
And more importantly, it's a constant number. One important note is that the defining characteristic of

140
00:12:32,990 --> 00:12:37,670
exponential growth is that the growth rate does not change over time.

141
00:12:37,730 --> 00:12:43,490
In other words, the value of t itself doesn't matter. The same equation holds no matter what the value

142
00:12:43,490 --> 00:12:44,890
of t is.

143
00:12:44,960 --> 00:12:50,420
This is like saying the duration from t to t prime is always constant, and as you can see here, it's always

144
00:12:50,420 --> 00:12:51,800
log 2 divided by a.
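Spelled out, the doubling-time derivation above reads (same symbols as the lecture, with t' for t prime):
```latex
\begin{align*}
C &= C_0 \, r^{t}, \qquad 2C = C_0 \, r^{t'} \\
2 &= r^{\,t' - t} \\
t' - t &= \frac{\log 2}{\log r} = \frac{\log 2}{a}
\end{align*}
```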

145
00:12:56,430 --> 00:13:02,550
So in the next block of code, we can print out this value. Log 2 divided by a is approximately equal to

146
00:13:02,550 --> 00:13:05,920
2, just as Moore's Law says it should be.

147
00:13:06,000 --> 00:13:08,760
Therefore, we've confirmed that Moore's Law is true.
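A short sketch of that final calculation, using the rounded slope 0.34 from earlier. It also checks, for a few arbitrary starting times, that the count really does double after that duration regardless of t.
```python
import math
a = 0.34  # slope quoted in the lecture, rounded to two decimals
doubling_time = math.log(2) / a
print(round(doubling_time, 2))  # 2.04 (years)
# The starting time t does not matter: the ratio is 2 for any t.
r = math.exp(a)
ratios = [r ** (t + doubling_time) / r ** t for t in (0.0, 10.0, 37.5)]
```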

148
00:13:15,170 --> 00:13:16,820
In the final block for this notebook,

149
00:13:16,820 --> 00:13:20,030
I have a little exercise for you. In this lecture,

150
00:13:20,030 --> 00:13:25,640
one of the most difficult parts was transforming the data using standardization and then reversing that

151
00:13:25,640 --> 00:13:29,560
transformation, and you might ask yourself: is that really necessary?

152
00:13:30,140 --> 00:13:36,200
So the exercise for you is to find out what happens if you don't try to normalize the data.

153
00:13:36,200 --> 00:13:41,570
You might assume that since the data falls on a line it should be pretty easy to apply linear regression

154
00:13:41,900 --> 00:13:44,180
as we've done multiple times so far.

155
00:13:44,180 --> 00:13:46,780
So give that a try and I'll see you in the next lecture.