1
00:00:11,570 --> 00:00:16,730
In the next couple of lectures, we are going to see how easy it is to apply what we just learned to a

2
00:00:16,730 --> 00:00:18,490
real-world dataset.

3
00:00:18,590 --> 00:00:24,320
In fact, this real-world dataset is one of the most significant datasets of all time.

4
00:00:24,320 --> 00:00:26,610
This lecture is about Moore's Law.

5
00:00:26,750 --> 00:00:32,570
As you recall, Moore's Law is the observation that the number of transistors per square inch on integrated

6
00:00:32,570 --> 00:00:36,070
circuits doubles approximately every two years.

7
00:00:36,080 --> 00:00:40,350
In other words, roughly speaking, computing power grows exponentially.

8
00:00:40,370 --> 00:00:45,650
Some futurists, such as Ray Kurzweil, have observed that this pattern applies to technology as a whole,

9
00:00:45,920 --> 00:00:51,470
so that even when Moore's Law runs out for transistors, perhaps some new invention will appear to take

10
00:00:51,470 --> 00:00:52,630
its place.

11
00:00:52,640 --> 00:01:01,720
Maybe that will be quantum computing, AI, or something we can't yet imagine.

12
00:01:01,830 --> 00:01:05,540
Now you might be thinking, wait a minute, this isn't linear at all.

13
00:01:05,640 --> 00:01:11,520
If something grows by a constant factor over a given period of time, that's actually exponential growth.

14
00:01:11,550 --> 00:01:17,880
For example, it will grow from 1 to 2, 4, 8, 16, 32, 64, and so on.

15
00:01:23,000 --> 00:01:27,880
We can write the generic equation for exponential growth as C equals C-naught

16
00:01:27,980 --> 00:01:30,410
times r to the power t. Here,

17
00:01:30,410 --> 00:01:35,300
C is the output variable, the thing we're counting. C-naught is the initial value,

18
00:01:35,300 --> 00:01:36,870
when t equals zero.

19
00:01:37,190 --> 00:01:43,170
t is the input variable, which in our case represents time, and finally, r is the rate of growth.

20
00:01:43,280 --> 00:01:48,580
You can see that if t increases by 1, then C increases by a factor of r.

21
00:01:48,800 --> 00:01:51,590
This holds true for all time periods.

22
00:01:51,650 --> 00:01:57,460
So if I'm at t equals 1 going to t equals 2, I can find the next value simply by multiplying by r.

23
00:01:57,560 --> 00:02:00,800
And similarly for going from t equals 2 to t equals 3.

24
00:02:05,930 --> 00:02:11,380
Luckily, by taking the log of the count variable, we can turn this into a linear equation.

25
00:02:11,450 --> 00:02:17,450
Basically, we take the log of both sides and show that the log of C is linear with respect

26
00:02:17,450 --> 00:02:18,900
to the time t.

27
00:02:18,980 --> 00:02:20,910
This is exactly what we want.

28
00:02:20,930 --> 00:02:26,390
So if we use a linear regression on the log of transistor counts with respect to time, we should find

29
00:02:26,390 --> 00:02:30,530
that the time it takes for transistor counts to double is approximately two years.

30
00:02:35,770 --> 00:02:37,580
If you can't see how this works right away,

31
00:02:37,600 --> 00:02:42,760
it might help to substitute the variables in the previous equation with some more familiar variables.

32
00:02:43,240 --> 00:02:45,250
So let's replace log C with y.

33
00:02:45,250 --> 00:02:46,930
This will be the output variable.

34
00:02:47,080 --> 00:02:49,490
Let's replace log r with a, the slope.

35
00:02:50,050 --> 00:02:51,680
Let's replace t with x.

36
00:02:51,700 --> 00:02:53,440
This will be the input variable.

37
00:02:53,440 --> 00:02:57,710
And finally, let's replace log of C-naught with b, the intercept.

38
00:02:57,730 --> 00:03:02,940
Now we just have y equals ax plus b, which is our linear equation from earlier.

39
00:03:02,950 --> 00:03:07,210
Note that I'm using the letter a instead of m for the slope, which is perfectly okay.

40
00:03:12,350 --> 00:03:13,050
At this point,

41
00:03:13,070 --> 00:03:16,840
the problem becomes exactly the same as our previous problem.

42
00:03:17,030 --> 00:03:22,700
As long as we can transform our dataset into a quote-unquote Excel spreadsheet, which has the log

43
00:03:22,700 --> 00:03:28,550
of transistor counts in the y column and the year in the x column, then our code for building, training,

44
00:03:28,550 --> 00:03:33,660
and predicting with the PyTorch model should be exactly the same as it was before.

45
00:03:33,680 --> 00:03:37,570
This is what we mean by "all data is the same." As you will see,

46
00:03:37,580 --> 00:03:43,280
there is no difference in the model between this example with transistor counts and the last example

47
00:03:43,280 --> 00:03:48,710
with synthetic data. You might wonder, why do I say this so often?

48
00:03:48,840 --> 00:03:51,750
And really, I just go by how often I get asked.

49
00:03:51,750 --> 00:03:52,700
So it's really you,

50
00:03:52,710 --> 00:03:55,180
the students, who guide what I say.

51
00:03:55,410 --> 00:03:59,790
If I find that a lot of students have a problem with something, then I will emphasize that point more

52
00:03:59,790 --> 00:04:00,840
often.

53
00:04:00,840 --> 00:04:06,120
So despite the fact that I repeat this constantly, students will still sometimes ask me, can you do an

54
00:04:06,120 --> 00:04:08,150
example on such-and-such dataset?

55
00:04:08,310 --> 00:04:12,110
And my response is, as always: the code is exactly the same.

56
00:04:12,150 --> 00:04:18,090
No changes are needed. That said, this message is pretty effective; I think it has definitely helped

57
00:04:18,090 --> 00:04:25,740
a lot of students understand the idea a lot better.

58
00:04:25,760 --> 00:04:29,640
There is one caveat to this which you will see in the following lecture.

59
00:04:29,840 --> 00:04:35,270
In order to use linear regression, or any other kind of model in this course, effectively, you should make

60
00:04:35,270 --> 00:04:40,050
sure your data is either normalized or in a small range of values and centered around zero,

61
00:04:40,160 --> 00:04:44,080
or at least near zero. In order to do this,

62
00:04:44,090 --> 00:04:48,360
we're going to subtract the mean and divide by the standard deviation.

63
00:04:48,380 --> 00:04:54,240
This makes it so that the sample has mean zero and variance one. Please note that in this course we

64
00:04:54,240 --> 00:05:00,300
will use the terms "normalize" and "standardize" pretty much interchangeably. "Standardize" is a term that

65
00:05:00,300 --> 00:05:05,850
is used more in statistics and it refers specifically to the case where you take a sample and transform

66
00:05:05,850 --> 00:05:09,020
it so that it has mean zero and variance one.

67
00:05:09,030 --> 00:05:14,700
This is opposed to the term "normalize," which could refer to standardization but is a more general term

68
00:05:14,700 --> 00:05:21,170
overall. For example, it could mean transforming your data so that it stays in the range 0 to 1.

69
00:05:21,180 --> 00:05:25,820
This is not the same as having mean zero and variance one. On the other hand,

70
00:05:25,830 --> 00:05:31,080
in cases like batch normalization, which you'll learn about later in this course, it does refer to the

71
00:05:31,080 --> 00:05:37,940
standardization operation.

72
00:05:37,950 --> 00:05:40,710
The real question is, why do I mention this as a caveat?

73
00:05:41,310 --> 00:05:45,250
Here's the problem: we're doing two transformations on our data.

74
00:05:45,390 --> 00:05:50,880
This is going to make it more difficult to prove that transistor counts double every two years.

75
00:05:50,940 --> 00:05:57,090
So here's a quick question for you. Given that the equation for exponential growth is C equals C-naught

76
00:05:57,090 --> 00:05:58,920
times r to the power t,

77
00:05:58,920 --> 00:06:04,000
what does r have to be in order for the transistor count to double every two years,

78
00:06:04,020 --> 00:06:10,140
if t is in units of years? I would strongly recommend you try to figure this out for yourself before

79
00:06:10,140 --> 00:06:12,520
we move on to the next lecture.

80
00:06:12,540 --> 00:06:17,500
The next question you want to consider is: if we take the log of both sides of this equation,

81
00:06:17,610 --> 00:06:21,870
how does that value of r relate to the slope of the line?

82
00:06:21,870 --> 00:06:28,350
And finally, if we normalize the data in both the x and y axes, how does that affect the slope and intercept

83
00:06:28,350 --> 00:06:29,580
of the line?

84
00:06:29,790 --> 00:06:34,800
We'll go through these calculations in detail in the next lecture, but the main point is we are not going

85
00:06:34,800 --> 00:06:37,950
to get the answer directly just by fitting a line to the data.

86
00:06:38,770 --> 00:06:44,320
For example, just because the transistor count doubles every two years, this does not mean the slope of

87
00:06:44,320 --> 00:06:46,270
the line is going to be 2.

88
00:06:46,420 --> 00:06:52,780
It's going to be related to the number two somehow, but to find out exactly how, you have to do the math.
