1
00:00:11,140 --> 00:00:17,670
In this lecture, we are going to look at the linear models we used previously in more depth. Specifically,

2
00:00:17,710 --> 00:00:23,050
our final goal is that we want to know how linear regression and logistic regression make up the building

3
00:00:23,050 --> 00:00:24,500
blocks of a neural network.

4
00:00:25,120 --> 00:00:30,220
You've seen that we can use logistic regression for classification and we use linear regression for

5
00:00:30,220 --> 00:00:30,910
regression.

6
00:00:31,990 --> 00:00:37,960
So this lecture will review how those models work and also explain why we refer to logistic regression

7
00:00:38,110 --> 00:00:38,970
as a neuron.

8
00:00:40,120 --> 00:00:45,310
And just to remind you of the high-level picture here, deep learning is the study of neural networks

9
00:00:45,310 --> 00:00:47,750
and neural networks are networks of neurons.

10
00:00:48,430 --> 00:00:53,650
So you can think of these as the basic fundamental unit of computation in deep learning.

11
00:00:59,130 --> 00:01:04,770
Let's start with linear regression. In its most basic form, linear regression just means line of best

12
00:01:04,770 --> 00:01:05,070
fit.

13
00:01:05,760 --> 00:01:10,080
As you may recall from your high school math studies, a line has the equation

14
00:01:10,260 --> 00:01:12,300
Y hat equals M X plus B.

15
00:01:13,560 --> 00:01:18,540
Here, X is the input variable, M is the slope, and B is the Y intercept.

16
00:01:19,170 --> 00:01:22,450
When we put these together, we get the equation for a line.

17
00:01:23,370 --> 00:01:27,180
Our job, of course, is to find a line that best fits our input data.

18
00:01:27,870 --> 00:01:32,820
The next lecture will cover exactly how to do that, but in this lecture we will be concerned with the

19
00:01:32,820 --> 00:01:34,280
form of the model itself.

20
00:01:39,470 --> 00:01:44,150
To give you a simple example of how we would use linear regression, let's suppose we are trying to

21
00:01:44,150 --> 00:01:47,840
predict your salary based on how many years of experience you have.

22
00:01:48,350 --> 00:01:53,270
So in this case, X would be the number of years of experience and Y would be your salary.

23
00:01:54,020 --> 00:01:59,900
Now, you might be wondering what is the interpretation of the slope and y intercept in this scenario?

24
00:02:00,710 --> 00:02:04,440
Well, let's try to play around with this equation a little bit and see what happens.

25
00:02:05,150 --> 00:02:10,430
We know that B, the Y intercept is the value of Y hat when X equals zero.

26
00:02:11,090 --> 00:02:15,580
So in other words, if you have zero years of experience, your salary would be

27
00:02:15,620 --> 00:02:22,790
B. If X equals one, that means you have one year of experience, and then your salary would be M plus

28
00:02:22,790 --> 00:02:29,650
B. If X equals two, that means you have two years of experience, and then your salary would be two M

29
00:02:29,650 --> 00:02:30,410
plus B.

30
00:02:31,190 --> 00:02:37,100
In other words, M is the increase in salary that you get for each additional year of experience.
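To make that concrete, here is a minimal sketch of the line equation in Python. The values of M and B are made up purely for illustration and do not come from the lecture; the function name `predict_salary` is also just a placeholder.

```python
# Hypothetical values chosen only for illustration; they are not from the lecture.
m = 5000.0   # slope: salary increase per additional year of experience
b = 40000.0  # y-intercept: predicted salary at zero years of experience


def predict_salary(years):
    """Return y_hat = m * x + b for a given number of years of experience."""
    return m * years + b


# Predictions for x = 0, 1, 2 are b, m + b, and 2m + b.
predictions = [predict_salary(x) for x in range(3)]
print(predictions)

# Each extra year of experience raises the prediction by exactly m.
differences = [predictions[i + 1] - predictions[i] for i in range(2)]
print(differences)
```

With these assumed numbers, the prediction at zero years is just B, and every additional year adds M.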

31
00:02:42,170 --> 00:02:47,690
Of course, in the real world, we might want to make a prediction about your salary based on multiple

32
00:02:47,690 --> 00:02:54,110
factors. Instead of just years of experience, we might have another input, let's say average industry

33
00:02:54,110 --> 00:02:54,620
salary.

34
00:02:55,160 --> 00:02:59,460
So now your salary can also depend on what industry you are working in.

35
00:03:00,170 --> 00:03:05,540
So we'll call years of experience X one and we'll call average industry salary X two.

36
00:03:06,230 --> 00:03:13,400
In this scenario, we would write out our model as Y hat equals W one X one plus W two X two plus B.

37
00:03:15,200 --> 00:03:21,740
In this equation, W one and W two are called the weights of the model, and they are essentially the slope

38
00:03:21,740 --> 00:03:25,190
for each of the individual inputs, X one and X two.

39
00:03:26,320 --> 00:03:29,950
Luckily, the interpretation is the same as what we saw previously.

40
00:03:31,090 --> 00:03:39,550
W one is the increase in salary when X one increases by one. W two is the increase in salary when X two

41
00:03:39,550 --> 00:03:45,160
increases by one. B is the salary when both X one and X two are zero.
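The two-input model can be sketched the same way. The weights and bias below are assumptions picked for illustration only, and `predict` is a hypothetical helper name.

```python
# Hypothetical weights and bias, chosen only for illustration.
w1 = 3000.0   # change in salary per extra year of experience (x1)
w2 = 0.5      # change in salary per unit of average industry salary (x2)
b = 10000.0   # predicted salary when both x1 and x2 are zero


def predict(x1, x2):
    """Return y_hat = w1 * x1 + w2 * x2 + b."""
    return w1 * x1 + w2 * x2 + b


base = predict(2, 60000)

# Increasing x1 by one (x2 held fixed) changes the prediction by w1.
effect_of_x1 = predict(3, 60000) - base

# Increasing x2 by one (x1 held fixed) changes the prediction by w2.
effect_of_x2 = predict(2, 60001) - base

print(base, effect_of_x1, effect_of_x2, predict(0, 0))
```

Note that each weight describes the change in the output when its own input increases by one while the other input stays the same.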

42
00:03:50,160 --> 00:03:55,470
Another way to think of the weights is that they tell us how important each input is to predicting the

43
00:03:55,470 --> 00:03:55,980
output.

44
00:03:56,700 --> 00:04:01,210
Imagine the extreme scenario where we have some W sub i equals zero.

45
00:04:01,860 --> 00:04:07,710
In this case, it doesn't matter how much we increase or decrease X sub i; the output will not change

46
00:04:07,710 --> 00:04:08,100
at all.

47
00:04:08,850 --> 00:04:12,090
So in other words, X sub i has no influence on the output.

48
00:04:12,450 --> 00:04:15,330
That's another way of saying X sub i is irrelevant.

49
00:04:16,230 --> 00:04:22,470
Now imagine that W sub i is very large in magnitude. Note that it can be either very positive or very

50
00:04:22,470 --> 00:04:23,070
negative.

51
00:04:23,700 --> 00:04:27,360
This means that X sub i has a large influence on the output.

52
00:04:28,280 --> 00:04:32,460
Because a small increase in X sub i leads to a large change in the output.

53
00:04:33,320 --> 00:04:36,940
That's just another way of saying that this input feature is very important.

54
00:04:38,340 --> 00:04:46,200
Also, the sign of W sub i controls the direction of the influence. If W sub i is very positive, then an increase

55
00:04:46,200 --> 00:04:53,430
in X sub i will lead to a large increase in the output. If W sub i is very negative, then an increase

56
00:04:53,430 --> 00:04:56,880
in X sub i will lead to a large decrease in the output.
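One way to see this is to bump each input by one and watch how the output moves. This is only a sketch: the three weights (zero, large positive, large negative) are invented for illustration, and `predict` is a hypothetical helper.

```python
def predict(weights, inputs, bias=0.0):
    """Return the weighted sum of the inputs plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical weights: one irrelevant input, one strongly positive, one strongly negative.
weights = [0.0, 10.0, -10.0]
inputs = [1.0, 1.0, 1.0]

baseline = predict(weights, inputs)

# Increase each input by one in turn and record the change in the output.
effects = []
for i in range(len(inputs)):
    bumped = list(inputs)
    bumped[i] += 1.0
    effects.append(predict(weights, bumped) - baseline)

print(effects)
```

The change in the output for each input equals that input's weight, so a zero weight means no influence, and the sign of the weight sets the direction.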

57
00:05:01,980 --> 00:05:04,950
Now, in what way is any of this related to neurons?

58
00:05:05,430 --> 00:05:09,350
Well, to understand this, you have to understand a little bit about biology.

59
00:05:10,140 --> 00:05:12,120
A neuron is a cell in your brain.

60
00:05:12,330 --> 00:05:19,650
You can think of it, as we said before, as the fundamental unit of computation. Neurons can talk to

61
00:05:19,650 --> 00:05:22,750
other neurons using electrical and chemical signals.

62
00:05:23,250 --> 00:05:28,860
So if you picture a neuron, you can imagine that there are many other incoming neurons attached to

63
00:05:28,860 --> 00:05:29,030
it.

64
00:05:29,670 --> 00:05:33,360
These input terminals are called dendrites, as you can see in this image.

65
00:05:34,350 --> 00:05:39,880
Those incoming neurons might come from, say, your eyes or your ears or your nose or your hands.

66
00:05:40,350 --> 00:05:45,990
So if your eyes see something, an electrical signal goes along the nerves in your eyes and travels

67
00:05:45,990 --> 00:05:46,590
to your brain.

68
00:05:47,160 --> 00:05:51,330
The same thing happens when you hear something or you smell something or you touch something.

69
00:05:52,230 --> 00:05:54,470
Now, remember, we're picturing a single neuron.

70
00:05:54,480 --> 00:05:57,360
It's taking in inputs from a lot of different places.

71
00:05:58,890 --> 00:06:04,590
Now, this neuron, it has to decide if it's going to pass this signal on to outgoing neurons.

72
00:06:05,650 --> 00:06:10,450
Because remember, this neuron's kind of in the middle. There are some neurons going into it and it's

73
00:06:10,450 --> 00:06:12,200
going out to some other neurons.

74
00:06:12,880 --> 00:06:13,900
Now, how does it do that?

75
00:06:15,010 --> 00:06:21,160
Well, it does that in a very similar way to linear and logistic regression. It sums up all the incoming

76
00:06:21,160 --> 00:06:27,310
signals and then this summation becomes the outgoing signal that gets passed on further down the chain.

77
00:06:32,380 --> 00:06:38,050
The next thing you have to know is that not all neuron connections are created equal. Some connections

78
00:06:38,050 --> 00:06:39,790
are strong, while others are weak.

79
00:06:40,300 --> 00:06:45,730
Some connections excite the receiving neuron while other connections inhibit the receiving neuron.

80
00:06:46,330 --> 00:06:48,630
This is just like the weights of a regression model.

81
00:06:49,090 --> 00:06:53,500
A large weight, either positive or negative, has a strong influence on the output.

82
00:06:53,620 --> 00:06:54,830
That's a strong connection.

83
00:06:55,340 --> 00:06:59,210
A small or nearly zero weight has only a weak influence on the output.

84
00:06:59,440 --> 00:07:02,680
So that's a weak connection. A positive weight

85
00:07:02,680 --> 00:07:05,170
influences the output in the positive direction.

86
00:07:05,560 --> 00:07:08,770
So that's like an excitatory neuron. A negative weight

87
00:07:08,770 --> 00:07:11,270
influences the output in the negative direction.

88
00:07:11,590 --> 00:07:13,390
So that's like an inhibitory neuron.

89
00:07:18,480 --> 00:07:23,400
The signal that gets passed along neurons has a special name: it's called an action potential.

90
00:07:23,970 --> 00:07:26,490
Basically, it's a spike in electrical potential.

91
00:07:27,210 --> 00:07:32,520
So if you measure the electrical potential over time at a particular point in the neuron, you would

92
00:07:32,520 --> 00:07:33,790
see a signal like this.

93
00:07:34,530 --> 00:07:37,640
Now, action potentials don't behave in a very intuitive way.

94
00:07:38,280 --> 00:07:42,590
In particular, you can think of them as binary outcomes, just like logistic regression.

95
00:07:43,200 --> 00:07:48,780
In other words, the neuron is more like logistic regression than linear regression, although the linear

96
00:07:48,780 --> 00:07:52,960
regression equation is the main calculation that takes place in both cases.

97
00:07:54,030 --> 00:07:56,130
So here's how the action potential works.

98
00:07:56,880 --> 00:08:01,800
Basically, we're going to sum up all the influences from all the incoming neurons.

99
00:08:02,250 --> 00:08:08,400
If the electrical potential of this sum is greater than some threshold, then an action potential will

100
00:08:08,400 --> 00:08:10,770
propagate through the receiving neuron.

101
00:08:12,190 --> 00:08:16,660
If the electrical potential is less than this threshold, then nothing happens at all.

102
00:08:17,230 --> 00:08:22,630
This is very much like binary classification where we make predictions which are either zero or one.

103
00:08:27,670 --> 00:08:33,600
In biology, we call this the all-or-nothing principle: either an action potential fires or it doesn't.

104
00:08:34,240 --> 00:08:39,300
As you can see here, when the stimulus is too weak, there's only a little bump in the response.

105
00:08:39,310 --> 00:08:41,160
So those are not action potentials.

106
00:08:41,770 --> 00:08:47,050
But once the stimulus reaches a certain threshold, a full action potential spike occurs.

107
00:08:47,530 --> 00:08:52,600
So there's no such thing as a spectrum of action potentials, like an action potential of one or two

108
00:08:52,600 --> 00:08:54,090
volts, three volts and so forth.

109
00:08:54,430 --> 00:08:56,320
Instead, it's just a binary decision.

110
00:08:56,620 --> 00:08:58,150
You get a spike or you don't.

111
00:08:58,900 --> 00:09:04,630
The incoming signals, summed together, are either strong enough to generate the next action potential or they

112
00:09:04,630 --> 00:09:06,010
will simply be ignored.

113
00:09:11,110 --> 00:09:18,280
So that's how you can think of logistic regression as a neuron. Each X sub i is an input from some incoming

114
00:09:18,280 --> 00:09:18,780
neuron.

115
00:09:19,330 --> 00:09:24,940
Each W sub i is a weight that tells us how strong the connection with X sub i is.

116
00:09:25,270 --> 00:09:31,660
And the sign of W sub i tells us whether that connection is excitatory, meaning it positively influences

117
00:09:31,660 --> 00:09:36,920
an action potential or inhibitory, meaning it negatively influences an action potential.

118
00:09:37,660 --> 00:09:44,020
The weighted sum of the X sub i, which is added to the bias term or threshold, is then passed through

119
00:09:44,020 --> 00:09:45,030
the sigmoid function.

120
00:09:45,790 --> 00:09:51,850
Once the output of the sigmoid function is rounded, we get either one or zero telling us whether or not an action

121
00:09:51,850 --> 00:09:53,100
potential should occur.
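Putting those pieces together, a logistic regression "neuron" might be sketched as follows. The weights, bias, and inputs are hypothetical values for illustration, and rounding is written here as a cutoff at 0.5, which is an assumption about how ties are handled.

```python
import math


def sigmoid(a):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron(inputs, weights, bias):
    """Weighted sum plus bias, passed through the sigmoid, then thresholded."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    probability = sigmoid(activation)
    fires = 1 if probability >= 0.5 else 0  # 1 = action potential, 0 = nothing
    return probability, fires


# Hypothetical connection strengths: one excitatory (positive), one inhibitory (negative).
weights = [2.0, -1.0]
bias = -1.0

weak = neuron([0.2, 0.5], weights, bias)    # weighted sum is below zero
strong = neuron([1.5, 0.5], weights, bias)  # weighted sum is above zero

print(weak, strong)
```

A sigmoid output of 0.5 corresponds to a weighted sum plus bias of exactly zero, so the bias plays the role of the threshold: with these assumed values the weak stimulus gives 0 and the strong stimulus gives 1.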
