1
00:00:11,710 --> 00:00:16,300
In this lecture, we are going to look in more depth at the models we just used.

2
00:00:16,300 --> 00:00:21,370
Specifically, if you know a little bit about machine learning, you may recognize that the models we just

3
00:00:21,370 --> 00:00:25,600
used are called linear regression and logistic regression.

4
00:00:25,750 --> 00:00:31,450
We use logistic regression for classification, and we use linear regression for regression.

5
00:00:31,450 --> 00:00:38,140
So this lecture will look at how these models work and also explain why we refer to logistic regression

6
00:00:38,230 --> 00:00:39,490
as a neuron.

7
00:00:39,880 --> 00:00:42,400
Just to remind you of the high-level picture here.

8
00:00:42,640 --> 00:00:48,720
Deep learning is the study of neural networks, and neural networks are networks of neurons.

9
00:00:48,880 --> 00:00:54,010
So you can think of neurons as the fundamental unit of computation in deep learning.

10
00:00:59,220 --> 00:01:04,740
Let's start with linear regression. In its most basic form, linear regression just means line of best

11
00:01:04,740 --> 00:01:05,580
fit.

12
00:01:05,880 --> 00:01:12,330
As you may recall from your high school math studies, a line has the equation y hat equals mx plus b.

13
00:01:13,620 --> 00:01:18,940
Here, x is the input variable, m is the slope, and b is the y-intercept.

14
00:01:19,260 --> 00:01:25,980
When we put these together, we get the equation for a line. Our job, of course, is to find the line that best

15
00:01:25,980 --> 00:01:27,140
fits our input data.

16
00:01:32,420 --> 00:01:35,860
To give you a simple example of how we would use linear regression,

17
00:01:35,870 --> 00:01:41,330
let's suppose we are trying to predict your salary based on how many years of experience you have.

18
00:01:41,330 --> 00:01:47,000
So in this case, x would be the number of years of experience and y would be your salary.

19
00:01:47,000 --> 00:01:51,830
Now you might be wondering: what is the interpretation of the slope and y-intercept

20
00:01:51,830 --> 00:01:52,810
in this scenario?

21
00:01:53,690 --> 00:01:57,980
Well, let's try to play around with this equation a little bit and see what happens.

22
00:01:58,100 --> 00:02:04,030
We know that b, the y-intercept, is the value of y hat when x equals zero.

23
00:02:04,040 --> 00:02:08,550
So in other words, if you have zero years of experience, this would be your salary:

24
00:02:08,570 --> 00:02:17,150
b. If x equals 1, that means you have one year of experience, and then your salary would be m plus b. If

25
00:02:17,150 --> 00:02:18,270
x equals 2,

26
00:02:18,290 --> 00:02:24,170
that means you have two years of experience, and then your salary would be 2m plus b.

27
00:02:24,170 --> 00:02:30,040
In other words, m is the increase in salary that you get for each additional year of experience.
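
To make the interpretation of m and b concrete, here is a minimal sketch of the salary line. The numeric values of m and b and the function name predict_salary are assumptions made up for illustration; they do not come from the lecture.

```python
# Hypothetical values chosen only for illustration; they are not from the lecture.
m = 5000.0   # slope: salary increase per additional year of experience
b = 40000.0  # y-intercept: predicted salary with zero years of experience


def predict_salary(years):
    """Line of best fit: y hat = m * x + b."""
    return m * years + b


# With x = 0, 1, 2 the predictions are b, m + b, and 2m + b respectively.
for years in range(3):
    print(years, predict_salary(years))
```

Each step from one year to the next changes the prediction by exactly m, which is the sense in which the slope is the raise per additional year.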

28
00:02:35,190 --> 00:02:41,340
Of course, in the real world, we might want to make a prediction about your salary based on multiple factors.

29
00:02:42,120 --> 00:02:43,850
Instead of just years of experience,

30
00:02:43,860 --> 00:02:48,030
we might have another input, let's say average industry salary.

31
00:02:48,120 --> 00:02:52,340
So now your salary can also depend on what industry you are working in.

32
00:02:53,130 --> 00:03:00,140
So we'll call years of experience x1 and we'll call average industry salary x2. In this scenario,

33
00:03:00,140 --> 00:03:11,330
we would write out our model as y equals w1 x1 plus w2 x2 plus b. In this equation, w1 and w2 are

34
00:03:11,330 --> 00:03:16,520
called the weights of the model, and they are essentially the slopes for each of the individual inputs,

35
00:03:16,880 --> 00:03:22,850
x1 and x2. Luckily, the interpretation is the same as what we saw previously.

36
00:03:24,040 --> 00:03:32,980
w1 is the increase in salary when x1 increases by 1. w2 is the increase in salary when x2 increases

37
00:03:32,980 --> 00:03:34,260
by 1.

38
00:03:34,390 --> 00:03:38,020
b is the salary when both x1 and x2 are 0.
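
The two-input model can be sketched the same way. The weights, the bias, and the function name predict below are hypothetical values picked for illustration, not numbers from the lecture.

```python
# Hypothetical weights and bias, for illustration only.
w1 = 3000.0   # weight on x1, years of experience
w2 = 0.5      # weight on x2, average industry salary
b = 10000.0   # prediction when both inputs are 0


def predict(x1, x2):
    """Linear regression with two inputs: y = w1*x1 + w2*x2 + b."""
    return w1 * x1 + w2 * x2 + b


print(predict(0.0, 0.0))                             # equals b
print(predict(3.0, 60000.0) - predict(2.0, 60000.0)) # equals w1
print(predict(2.0, 60001.0) - predict(2.0, 60000.0)) # equals w2
```

Note that each weight is read off by increasing one input by 1 while the other input is held fixed.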

39
00:03:43,200 --> 00:03:48,390
Another way to think of the weights is that they tell us how important each input is to predicting the

40
00:03:48,390 --> 00:03:49,700
output.

41
00:03:49,710 --> 00:03:54,870
Imagine the extreme scenario where we have some w sub i equal to zero.

42
00:03:54,870 --> 00:04:00,600
In this case, it doesn't matter how much we increase or decrease x sub i; the output will not change

43
00:04:00,600 --> 00:04:01,780
at all.

44
00:04:01,800 --> 00:04:05,460
So in other words, x sub i has no influence on the output.

45
00:04:05,460 --> 00:04:09,140
That's another way of saying x sub i is irrelevant.

46
00:04:09,210 --> 00:04:13,020
Now imagine that w sub i is very large in magnitude.

47
00:04:13,020 --> 00:04:16,630
Note that it can be either very positive or very negative.

48
00:04:16,650 --> 00:04:20,250
This means that x sub i has a large influence on the output,

49
00:04:21,210 --> 00:04:27,540
because a small change in x sub i leads to a large change in the output. That's just another way of saying

50
00:04:27,570 --> 00:04:29,820
that this input feature is very important.

51
00:04:31,330 --> 00:04:35,580
Also, the sign of w sub i controls the direction of the influence.

52
00:04:35,710 --> 00:04:43,240
If w sub i is very positive, then an increase in x sub i will lead to a large increase in the output.

53
00:04:43,240 --> 00:04:49,750
If w sub i is very negative, then an increase in x sub i will lead to a large decrease in the output.
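
As a rough sketch of how the size and sign of a weight control its influence, the snippet below computes how much a linear model's output moves when one input increases by 1 and everything else stays fixed. The function name output_change and the example weights are assumptions for illustration.

```python
def output_change(w_i, delta_x=1.0):
    """Change in a linear model's output when x sub i increases by delta_x,
    with every other input held fixed."""
    return w_i * delta_x


# Hypothetical weights: zero, tiny, very positive, very negative.
for w_i in [0.0, 0.01, 50.0, -50.0]:
    print(w_i, output_change(w_i))
```

A zero weight gives no change at all, a large positive weight gives a large increase, and a large negative weight gives a large decrease.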

54
00:04:54,970 --> 00:04:58,420
Now, in what way is any of this related to neurons?

55
00:04:58,420 --> 00:05:02,930
Well, to understand this, you have to understand a little bit about biology.

56
00:05:03,160 --> 00:05:05,290
A neuron is a cell in your brain.

57
00:05:05,290 --> 00:05:12,740
You can think of it, as we said before, as the fundamental unit of computation. Neurons can talk to other

58
00:05:12,740 --> 00:05:16,180
neurons using electrical and chemical signals.

59
00:05:16,220 --> 00:05:22,610
So if you picture a neuron you can imagine that there are many other incoming neurons attached to it.

60
00:05:22,610 --> 00:05:24,820
These input terminals are called dendrites.

61
00:05:24,830 --> 00:05:31,580
As you can see in this image, those incoming neurons might come from, say, your eyes or your ears or your

62
00:05:31,580 --> 00:05:33,290
nose or your hands.

63
00:05:33,290 --> 00:05:38,990
So if your eyes see something, an electrical signal goes along the nerves in your eyes and travels to

64
00:05:38,990 --> 00:05:45,230
your brain. The same thing happens when you hear something or you smell something or you touch something.

65
00:05:45,230 --> 00:05:51,830
Now remember, we're picturing a single neuron. It's taking in inputs from a lot of different places.

66
00:05:51,860 --> 00:05:59,210
Now this neuron has to decide if it's going to pass this signal on to outgoing neurons, because remember,

67
00:05:59,210 --> 00:06:04,460
this neuron's kind of in the middle: there are some neurons going into it, and it's going out to some other

68
00:06:04,460 --> 00:06:05,860
neurons.

69
00:06:05,870 --> 00:06:07,800
Now how does it do that?

70
00:06:08,010 --> 00:06:12,450
Well, it does that in a very similar way to linear and logistic regression.

71
00:06:12,570 --> 00:06:18,690
It sums up all the incoming signals and then this summation becomes the outgoing signal that gets passed

72
00:06:18,690 --> 00:06:25,330
on further down the chain.

73
00:06:25,350 --> 00:06:29,940
The next thing you have to know is that not all neuron connections are created equal.

74
00:06:30,210 --> 00:06:33,040
Some connections are strong while others are weak.

75
00:06:33,270 --> 00:06:39,210
Some connections excite the receiving neuron while other connections inhibit the receiving neuron.

76
00:06:39,270 --> 00:06:42,010
This is just like the weights of a regression model.

77
00:06:42,090 --> 00:06:46,570
A large weight, either positive or negative, has a strong influence on the output.

78
00:06:46,620 --> 00:06:48,190
That's a strong connection.

79
00:06:48,360 --> 00:06:54,410
A small or nearly zero weight has only a weak influence on the output, so that's a weak connection.

80
00:06:54,840 --> 00:06:58,530
A positive weight influences the output in the positive direction.

81
00:06:58,530 --> 00:07:04,510
So that's like an excitatory neuron. A negative weight influences the output in the negative direction.

82
00:07:04,560 --> 00:07:06,270
So that's like an inhibitory neuron.

83
00:07:11,430 --> 00:07:14,690
The signal that gets passed along neurons has a special name.

84
00:07:14,730 --> 00:07:16,890
It's called an action potential.

85
00:07:16,890 --> 00:07:20,050
Basically it's a spike in electrical potential.

86
00:07:20,190 --> 00:07:25,620
So if you measure the electrical potential over time at a particular point in the neuron, you would see

87
00:07:25,620 --> 00:07:27,410
a signal like this.

88
00:07:27,510 --> 00:07:31,210
Now action potentials don't behave in a very intuitive way.

89
00:07:31,290 --> 00:07:36,180
In particular, you can think of them as binary outcomes, just like logistic regression.

90
00:07:36,210 --> 00:07:41,700
In other words, the neuron is more like logistic regression than linear regression, although the linear

91
00:07:41,700 --> 00:07:46,930
regression equation is the main calculation that takes place in both cases.

92
00:07:46,980 --> 00:07:49,800
So here's how the action potential works.

93
00:07:49,800 --> 00:07:55,100
Basically, we're going to sum up all the influences from all the incoming neurons.

94
00:07:55,230 --> 00:08:01,290
If the electrical potential of this sum is greater than some threshold, then an action potential will

95
00:08:01,290 --> 00:08:08,410
propagate through the receiving neuron. If the electrical potential is less than this threshold, then

96
00:08:08,410 --> 00:08:10,170
nothing happens at all.

97
00:08:10,180 --> 00:08:15,520
This is very much like binary classification, where we make predictions which are either 0 or 1.
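
The thresholding idea can be sketched as a tiny function: sum the weighted incoming signals and output 1 only if the total is above a threshold. This is a simplified illustration, not a biological model, and the function name fires, the weights, and the threshold are all hypothetical.

```python
def fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of incoming signals exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical connection strengths: positive = excitatory, negative = inhibitory.
weights = [0.8, -0.5, 0.3]
threshold = 0.5  # hypothetical firing threshold

print(fires([1.0, 0.0, 1.0], weights, threshold))  # sum is about 1.1, above threshold
print(fires([1.0, 1.0, 0.0], weights, threshold))  # sum is about 0.3, below threshold
```

The output is only ever 0 or 1, no matter how far above or below the threshold the sum lands.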

98
00:08:20,690 --> 00:08:21,460
In biology,

99
00:08:21,470 --> 00:08:27,210
we call this the all-or-nothing principle: either an action potential fires or it doesn't.

100
00:08:27,260 --> 00:08:32,210
As you can see here, when the stimulus is too weak, there's only a little bump in the response.

101
00:08:32,240 --> 00:08:38,420
So those are not action potentials. But once the stimulus reaches a certain threshold, a full action

102
00:08:38,420 --> 00:08:40,380
potential spike occurs.

103
00:08:40,490 --> 00:08:45,500
So there's no such thing as a spectrum of action potentials, like an action potential of one volt, two

104
00:08:45,500 --> 00:08:47,420
volts, three volts, and so forth.

105
00:08:47,420 --> 00:08:49,470
Instead it's just a binary decision.

106
00:08:49,580 --> 00:08:51,890
You get a spike or you don't.

107
00:08:51,890 --> 00:08:57,650
Either the incoming signals summed together are strong enough to generate the next action potential, or they will

108
00:08:57,650 --> 00:08:58,870
simply be ignored.

109
00:09:04,100 --> 00:09:07,710
So that's how you can think of logistic regression as a neuron.

110
00:09:07,760 --> 00:09:11,920
Each x sub i is an input from some incoming neuron.

111
00:09:12,290 --> 00:09:19,700
Each w sub i is a weight that tells us how strong the connection with x sub i is, and the sign of w sub i

112
00:09:20,060 --> 00:09:25,580
tells us whether that connection is excitatory, meaning it positively influences an action potential,

113
00:09:25,940 --> 00:09:30,350
or inhibitory, meaning it negatively influences an action potential.

114
00:09:30,620 --> 00:09:37,010
The weighted sum of the x sub i, which is added to the bias term (or threshold), is then passed to the

115
00:09:37,010 --> 00:09:38,390
sigmoid function.

116
00:09:38,750 --> 00:09:45,320
Once the output of the sigmoid function is rounded, we get either 1 or 0, telling us whether or not an action potential

117
00:09:45,320 --> 00:09:46,010
should occur.
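
Putting the pieces together, a logistic regression "neuron" might be sketched as follows. The function names sigmoid and neuron, and the example weights and bias, are assumptions for illustration. The rounding step is written as an explicit comparison with 0.5, which is one common convention.

```python
import math


def sigmoid(a):
    """Squash any real number into the range (0, 1). Fine for moderate values of a."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid, then rounded.

    Returns the sigmoid output and the 0/1 decision.
    """
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    p = sigmoid(a)
    return p, int(p >= 0.5)


# Hypothetical parameters: one excitatory and one inhibitory connection.
weights = [2.0, -3.0]
bias = -0.5

print(neuron([1.0, 0.0], weights, bias))  # weighted sum is 1.5, so the decision is 1
print(neuron([0.0, 1.0], weights, bias))  # weighted sum is -3.5, so the decision is 0
```

The bias plays the role of the threshold here: the decision is 1 exactly when the weighted sum of the inputs is at least minus the bias.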
