1
00:00:11,610 --> 00:00:15,570
In this lecture, we're going to look at the models we just used more in-depth.

2
00:00:16,230 --> 00:00:21,150
Specifically, if you know a little bit about machine learning, you may recognize that the models we

3
00:00:21,150 --> 00:00:24,990
just used are called linear regression and logistic regression.

4
00:00:25,680 --> 00:00:28,350
We use logistic regression for classification.

5
00:00:28,560 --> 00:00:30,690
And we use linear regression for regression.

6
00:00:31,380 --> 00:00:38,130
So this lecture will look at how these models work and also explain why we refer to logistic regression

7
00:00:38,160 --> 00:00:38,880
as a neuron.

8
00:00:39,840 --> 00:00:45,720
Just to remind you of the high-level picture here: deep learning is the study of neural networks, and

9
00:00:45,720 --> 00:00:48,330
neural networks are networks of neurons.

10
00:00:48,840 --> 00:00:54,000
So you can think of these as the basic fundamental unit of computation in deep learning.

11
00:00:59,130 --> 00:01:04,739
Let's start with linear regression. In its most basic form, linear regression just means line of best

12
00:01:04,739 --> 00:01:05,040
fit.

13
00:01:05,790 --> 00:01:11,910
As you may recall from your high school math studies, a line has the equation Y hat equals m x plus

14
00:01:11,910 --> 00:01:12,300
b.

15
00:01:13,560 --> 00:01:20,910
Here, x is the input variable, m is the slope, and b is the y-intercept. When we put these together, we

16
00:01:20,910 --> 00:01:22,440
get the equation for a line.

17
00:01:23,400 --> 00:01:27,150
Our job, of course, is to find a line that best fits our input data.

18
00:01:27,870 --> 00:01:30,390
The next lecture will cover exactly how to do that.

19
00:01:30,660 --> 00:01:34,230
But in this lecture, we'll be concerned with the form of the model itself.

20
00:01:39,440 --> 00:01:42,560
To give you a simple example of how we would use linear regression,

21
00:01:42,920 --> 00:01:47,810
let's suppose we are trying to predict your salary based on how many years of experience you have.

22
00:01:48,380 --> 00:01:53,270
So in this case, X would be the number of years of experience and Y would be your salary.

23
00:01:54,020 --> 00:01:59,900
Now you might be wondering: what is the interpretation of the slope and y-intercept in this scenario?

24
00:02:00,710 --> 00:02:04,430
Well, let's try to play around with this equation a little bit and see what happens.

25
00:02:05,120 --> 00:02:10,430
We know that b, the y-intercept, is the value of y hat when x equals zero.

26
00:02:11,120 --> 00:02:15,590
So in other words, if you have zero years of experience, this would be your salary:

27
00:02:15,620 --> 00:02:22,730
b. If x equals one, that means you have one year of experience, and then your salary would be m plus

28
00:02:22,790 --> 00:02:25,250
b. If x equals two,

29
00:02:25,310 --> 00:02:30,410
that means you have two years of experience, and then your salary would be 2m plus b.

30
00:02:31,190 --> 00:02:37,070
In other words, M is the increase in salary that you get for each additional year of experience.
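Here's a minimal sketch of that interpretation in Python. The values of m and b are made-up numbers chosen for illustration, not fitted from any data.
```python
def predict_salary(years, m=5000.0, b=40000.0):
    # m (slope) and b (y-intercept) are hypothetical illustration values
    return m * years + b
# Predictions for 0, 1, and 2 years of experience: b, m + b, 2m + b
salaries = [predict_salary(x) for x in (0, 1, 2)]
# Each additional year of experience adds exactly m to the prediction
raise_per_year = predict_salary(3) - predict_salary(2)
```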

31
00:02:42,230 --> 00:02:47,690
Of course, in the real world, we might want to make a prediction about your salary based on multiple

32
00:02:47,690 --> 00:02:54,080
factors. Instead of just years of experience, we might have another input, let's say average industry

33
00:02:54,080 --> 00:02:54,620
salary.

34
00:02:55,190 --> 00:02:59,450
So now your salary can also depend on what industry you are working in.

35
00:03:00,170 --> 00:03:05,510
So we'll call years of experience x1, and we'll call average industry salary x2.

36
00:03:06,260 --> 00:03:13,400
In this scenario, we would write out our model as y equals w1 x1 plus w2 x2 plus b.

37
00:03:15,260 --> 00:03:21,710
In this equation, W1 and W2 are called the weights of the model, and they are essentially the slope

38
00:03:21,710 --> 00:03:24,770
for each of the individual inputs, x1 and x2.

39
00:03:26,320 --> 00:03:29,920
Luckily, the interpretation is the same as what we saw previously.

40
00:03:31,140 --> 00:03:40,090
w1 is the increase in salary when x1 increases by one. w2 is the increase in salary when x2 increases

41
00:03:40,090 --> 00:03:40,660
by one.

42
00:03:41,440 --> 00:03:45,130
b is the salary when both x1 and x2 are zero.
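As a rough sketch, the two-input model can be written like this in Python. The weights and bias here are hypothetical values for illustration only.
```python
def predict_salary_2(x1, x2, w1=5000.0, w2=0.5, b=10000.0):
    # x1: years of experience, x2: average industry salary
    # w1, w2 (weights) and b (bias) are hypothetical illustration values
    return w1 * x1 + w2 * x2 + b
base = predict_salary_2(0, 0)             # equals b, since both inputs are zero
gain_x1 = predict_salary_2(1, 0) - base   # equals w1
gain_x2 = predict_salary_2(0, 1) - base   # equals w2
```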

43
00:03:50,190 --> 00:03:55,980
Another way to think of the weights is that they tell us how important each input is to predicting the output.

44
00:03:56,730 --> 00:04:01,230
Imagine the extreme scenario where we have some w sub i equals zero.

45
00:04:01,860 --> 00:04:07,710
In this case, it doesn't matter how much we increase or decrease x sub i, the output will not change

46
00:04:07,710 --> 00:04:08,130
at all.

47
00:04:08,850 --> 00:04:12,060
So in other words, x sub i has no influence on the output.

48
00:04:12,510 --> 00:04:15,330
That's another way of saying x sub i is irrelevant.

49
00:04:16,230 --> 00:04:19,500
Now, imagine that w sub i is very large in magnitude.

50
00:04:20,040 --> 00:04:23,040
Note that it can be either very positive or very negative.

51
00:04:23,670 --> 00:04:27,320
This means that x sub i has a large influence on the output.

52
00:04:28,280 --> 00:04:32,450
Because a small change in x sub i leads to a large change in the output.

53
00:04:33,320 --> 00:04:36,920
That's just another way of saying that this input feature is very important.

54
00:04:38,340 --> 00:04:45,630
Also, the sign of w sub i controls the direction of the influence. If w sub i is very positive, then an

55
00:04:45,630 --> 00:04:49,560
increase in x sub i will lead to a large increase in the output.

56
00:04:50,250 --> 00:04:56,820
If w sub i is very negative, then an increase in x sub i will lead to a large decrease in the output.

57
00:05:01,980 --> 00:05:04,950
Now, in what way is any of this related to neurons?

58
00:05:05,430 --> 00:05:09,360
Well, to understand this, you have to understand a little bit about biology.

59
00:05:10,170 --> 00:05:12,090
A neuron is a cell in your brain.

60
00:05:12,330 --> 00:05:17,160
You can think of it, as we said before, as the fundamental unit of computation.

61
00:05:18,090 --> 00:05:22,740
Neurons can talk to other neurons using electrical and chemical signals.

62
00:05:23,280 --> 00:05:28,830
So if you picture a neuron, you can imagine that there are many other incoming neurons attached to

63
00:05:28,830 --> 00:05:29,010
it.

64
00:05:29,670 --> 00:05:33,330
These input terminals are called dendrites, as you can see in this image.

65
00:05:34,350 --> 00:05:40,560
Those incoming neurons might come from, say, your eyes or your ears or your nose or your hands. So

66
00:05:40,560 --> 00:05:46,050
if your eyes see something, an electrical signal goes along the nerves in your eyes and travels to

67
00:05:46,050 --> 00:05:46,560
your brain.

68
00:05:47,190 --> 00:05:51,330
The same thing happens when you hear something or you smell something or you touch something.

69
00:05:52,230 --> 00:05:54,450
Now, remember, we're picturing a single neuron.

70
00:05:54,480 --> 00:05:57,330
It's taking in inputs from a lot of different places.

71
00:05:58,890 --> 00:06:04,560
Now, this neuron, it has to decide if it's going to pass this signal on to outgoing neurons.

72
00:06:05,680 --> 00:06:10,270
Because remember, this neuron's kind of in the middle: there are some neurons going into it, and it's

73
00:06:10,420 --> 00:06:12,160
going out to some other neurons.

74
00:06:12,910 --> 00:06:13,870
Now how does it do that?

75
00:06:15,010 --> 00:06:19,030
Well, it does that in a very similar way to linear and logistic regression.

76
00:06:19,600 --> 00:06:21,790
It sums up all the incoming signals.

77
00:06:22,060 --> 00:06:27,310
And then this summation becomes the outgoing signal that gets passed on further down the chain.

78
00:06:32,380 --> 00:06:36,700
The next thing you have to know is that not all neuron connections are created equal.

79
00:06:37,270 --> 00:06:39,730
Some connections are strong, while others are weak.

80
00:06:40,300 --> 00:06:45,700
Some connections excite the receiving neuron, while other connections inhibit the receiving neuron.

81
00:06:46,330 --> 00:06:48,610
This is just like the weights of a regression model.

82
00:06:49,120 --> 00:06:53,500
A large weight, either positive or negative, has a strong influence on the output.

83
00:06:53,650 --> 00:06:54,850
That's a strong connection.

84
00:06:55,390 --> 00:06:59,200
A small or nearly zero weight has only a weak influence on the output.

85
00:06:59,470 --> 00:07:05,170
So that's a weak connection. A positive weight influences the output in the positive direction.

86
00:07:05,590 --> 00:07:08,770
So that's like an excitatory neuron. A negative weight

87
00:07:08,770 --> 00:07:11,260
influences the output in the negative direction.

88
00:07:11,590 --> 00:07:13,390
So that's like an inhibitory neuron.

89
00:07:18,480 --> 00:07:21,540
The signal that gets passed along neurons has a special name.

90
00:07:21,750 --> 00:07:23,400
It's called an action potential.

91
00:07:23,940 --> 00:07:26,460
Basically, it's a spike in electrical potential.

92
00:07:27,240 --> 00:07:32,520
So if you measure the electrical potential over time at a particular point in the neuron, you would

93
00:07:32,520 --> 00:07:33,780
see a signal like this.

94
00:07:34,530 --> 00:07:37,620
Now, action potentials don't behave in a very intuitive way.

95
00:07:38,310 --> 00:07:42,510
In particular, you can think of them as binary outcomes, just like logistic regression.

96
00:07:43,230 --> 00:07:48,780
In other words, the neuron is more like logistic regression than linear regression, although the linear

97
00:07:48,780 --> 00:07:52,950
regression equation is the main calculation that takes place in both cases.

98
00:07:54,030 --> 00:07:56,130
So here's how the action potential works.

99
00:07:56,850 --> 00:08:01,770
Basically, we're going to sum up all the influences from all the incoming neurons.

100
00:08:02,280 --> 00:08:08,400
If the electrical potential of this sum is greater than some threshold, then an action potential will

101
00:08:08,400 --> 00:08:10,650
propagate through the receiving neuron.

102
00:08:12,230 --> 00:08:16,670
If the electrical potential is less than this threshold, then nothing happens at all.

103
00:08:17,240 --> 00:08:22,610
This is very much like binary classification, where we make predictions which are either zero or one.
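A simplified sketch of this all-or-nothing behavior in Python is shown here. It's a toy model under stated assumptions, not a biological simulation, and the function name and example numbers are made up for illustration.
```python
def fires(inputs, weights, threshold):
    # Sum the weighted incoming signals; positive weights excite, negative weights inhibit
    total = sum(w * x for w, x in zip(weights, inputs))
    # All-or-nothing: output 1 only if the sum exceeds the threshold, otherwise 0
    return 1 if total > threshold else 0
# Two excitatory inputs together cross a threshold of 1.0
strong = fires([1.0, 1.0], [0.6, 0.6], threshold=1.0)
# One excitatory and one inhibitory input cancel out, so nothing happens
cancelled = fires([1.0, 1.0], [0.6, -0.6], threshold=1.0)
```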

104
00:08:27,700 --> 00:08:33,580
In biology, we call this the all-or-nothing principle: either an action potential fires or it doesn't.

105
00:08:34,270 --> 00:08:39,260
As you can see here, when the stimulus is too weak, there's only a little bump in the response.

106
00:08:39,280 --> 00:08:41,140
So those are not action potentials.

107
00:08:41,770 --> 00:08:46,990
But once the stimulus reaches a certain threshold, a full action potential spike occurs.

108
00:08:47,560 --> 00:08:52,570
So there's no such thing as a spectrum of action potentials, like an action potential of one volt, two

109
00:08:52,570 --> 00:08:54,070
volts, three volts and so forth.

110
00:08:54,460 --> 00:08:56,290
Instead, it's just a binary decision.

111
00:08:56,620 --> 00:09:00,100
You get a spike or you don't. The incoming signals,

112
00:09:00,100 --> 00:09:06,010
summed together, are either strong enough to generate the next action potential, or they will simply be ignored.

113
00:09:11,140 --> 00:09:18,280
So that's how you can think of logistic regression as a neuron. Each x sub i is an input from some incoming

114
00:09:18,280 --> 00:09:18,640
neuron.

115
00:09:19,330 --> 00:09:26,800
Each w sub i is a weight that tells us how strong the connection with x sub i is, and the sign of w sub i

116
00:09:27,100 --> 00:09:32,680
tells us whether that connection is excitatory, meaning it positively influences an action potential

117
00:09:32,980 --> 00:09:36,940
or inhibitory, meaning it negatively influences an action potential.

118
00:09:37,660 --> 00:09:43,900
The weighted sum of the x sub i, which is added to the bias term or threshold, is then passed

119
00:09:43,900 --> 00:09:45,040
through the sigmoid function.

120
00:09:45,790 --> 00:09:51,820
Once the sigmoid function is rounded, we get either one or zero telling us whether or not an action

121
00:09:51,820 --> 00:09:53,080
potential should occur.
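Putting those steps together, a minimal Python sketch of logistic regression as a neuron might look like this. The function name and the example numbers are assumptions for illustration, and the rounding is written as an explicit comparison with 0.5, with ties counted as 1.
```python
import math
def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias term
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The sigmoid squashes the sum into a value between 0 and 1
    p = 1.0 / (1.0 + math.exp(-a))
    # Rounding the sigmoid output gives the binary decision
    return p, (1 if p >= 0.5 else 0)
# The weighted sum is -3 here, so the sigmoid output is well below 0.5
p_low, y_low = neuron([1.0], [2.0], bias=-5.0)
# The weighted sum is +3 here, so the sigmoid output is well above 0.5
p_high, y_high = neuron([1.0], [2.0], bias=1.0)
```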

