1
00:00:11,040 --> 00:00:14,860
OK, so in this video, we are going to discuss linear regression.

2
00:00:15,270 --> 00:00:19,330
Now, I'm sure most of you already know at least a little bit about linear regression.

3
00:00:19,350 --> 00:00:23,670
So I'll try to keep this short and only focus on new and noteworthy concepts.

4
00:00:24,270 --> 00:00:28,950
OK, so let's start with the simplest form of linear regression, which is just the line of best fit

5
00:00:29,130 --> 00:00:30,220
on a 2D plane.

6
00:00:31,250 --> 00:00:36,540
As usual, we have a bunch of data points called (x1, y1), (x2, y2), and so forth.

7
00:00:36,990 --> 00:00:41,900
As an example, X might represent years of experience and Y might represent salary.

8
00:00:42,360 --> 00:00:46,700
So hopefully it makes sense how you might collect such data points in the real world.

9
00:00:47,160 --> 00:00:52,650
Perhaps you work for the human resources department and you keep this kind of data inside an Excel spreadsheet

10
00:00:53,100 --> 00:00:54,330
And in fact, a real-world

11
00:00:54,330 --> 00:01:00,330
use of this model might be to figure out what salary to offer to new hires based on their past

12
00:01:00,330 --> 00:01:01,530
years of experience.

13
00:01:02,400 --> 00:01:06,690
Such a model would allow you to pay your employees in a fair and unbiased way.

14
00:01:07,200 --> 00:01:11,670
Our job, of course, is to find a line that fits nicely through these data points.

15
00:01:12,780 --> 00:01:16,580
In this scenario, we have two parameters: the slope and the intercept.

16
00:01:17,070 --> 00:01:21,690
Thus our job boils down to finding out what the slope and intercept should be.
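
As a rough sketch of what "finding the slope and intercept" can look like, the snippet below uses the standard least-squares formulas. The lecture does not name a fitting method at this point, so least squares is an assumption here, and the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares slope and intercept for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up data: years of experience vs. salary (in thousands)
years = [1, 2, 3, 4, 5]
salary = [45, 50, 55, 60, 65]
slope, intercept = fit_line(years, salary)
print(slope, intercept)  # 5.0 40.0
```

With these invented numbers the data lie exactly on a line, so the fit recovers it exactly; real data would only be approximated.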

17
00:01:26,300 --> 00:01:32,870
Note one limitation of this model, which is that it can only fit lines. The equation y = mx

18
00:01:32,870 --> 00:01:34,970
+ b must be a line.

19
00:01:35,420 --> 00:01:40,400
So if you're trying to fit a data set that looks like it curves or it turns at any point, then linear

20
00:01:40,400 --> 00:01:41,850
regression is not a good fit.

21
00:01:42,680 --> 00:01:46,520
Later in this course, we'll learn about machine learning models which are non-linear.

22
00:01:51,240 --> 00:01:56,170
Another way that linear regression becomes more complicated is when you have more than one input.

23
00:01:56,850 --> 00:02:01,980
Perhaps you'd like to predict a student's exam grade using the number of hours they studied and the

24
00:02:01,980 --> 00:02:03,990
number of hours they slept the previous night.

25
00:02:04,590 --> 00:02:06,450
Let's just call these x1 and x2.

26
00:02:07,230 --> 00:02:13,020
In this case, we typically call the weights w1 and w2, and we still have an intercept called b.

27
00:02:14,400 --> 00:02:19,350
Note that in this scenario, the object we are trying to fit is no longer a line, but a plane.

28
00:02:24,030 --> 00:02:29,310
In fact, it's possible to fit a linear regression model with any number of input dimensions.

29
00:02:29,850 --> 00:02:36,360
Now, of course, if we have a million dimensions, we don't want to write w1 x1, w2 x2, and so on a

30
00:02:36,360 --> 00:02:37,380
million times.

31
00:02:37,800 --> 00:02:44,550
Instead, it's more convenient to express all the w's in a single vector called w and all the x's in a

32
00:02:44,550 --> 00:02:45,960
single vector called X.

33
00:02:47,280 --> 00:02:49,640
As you recall from your high school math studies,

34
00:02:49,800 --> 00:02:53,890
this element-wise product and summation is also known as a dot product.

35
00:02:54,540 --> 00:02:58,740
Therefore, our model can be written as w^T x + b.
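
The "element-wise product and summation" pattern can be written out directly. This is only a minimal sketch: the function name `predict` and the weights and inputs are hypothetical values chosen for illustration, not taken from the lecture.

```python
def predict(w, x, b):
    """Compute w^T x + b: multiply element-wise, sum, then add the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical example: x1 = hours studied, x2 = hours slept
w = [2.0, 0.5]   # one weight per input
x = [3.0, 8.0]
b = 1.0          # intercept
y_hat = predict(w, x, b)
print(y_hat)  # 2.0*3.0 + 0.5*8.0 + 1.0 = 11.0
```

The same function works unchanged for any number of inputs, which is the point of collecting the weights and inputs into vectors.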

36
00:03:00,020 --> 00:03:05,420
Now, this is such an important expression. In fact, you shouldn't even think of this as math, but

37
00:03:05,420 --> 00:03:08,870
rather you should automatically recognize it as merely a pattern.

38
00:03:09,470 --> 00:03:14,390
You'll see the same pattern when you study logistic regression, when you study neural networks, when

39
00:03:14,390 --> 00:03:16,850
you study support vector machines and so forth.

40
00:03:17,240 --> 00:03:21,260
So basically, do not run away when you see this because it looks like scary math.

41
00:03:21,590 --> 00:03:25,230
Nearly everything important in machine learning is based on this.

42
00:03:25,610 --> 00:03:28,720
So if you want to do machine learning, this is fundamental.

43
00:03:29,270 --> 00:03:31,990
And again, it's not math, it's just a pattern.

44
00:03:32,930 --> 00:03:35,840
When we talk about neurons, you'll understand this more in depth.

45
00:03:35,990 --> 00:03:37,790
So just keep this in mind until then.

46
00:03:42,510 --> 00:03:48,180
So the noteworthy part of this lecture is this: hopefully you've realized by now that the autoregressive

47
00:03:48,180 --> 00:03:50,510
model is nothing but linear regression.

48
00:03:51,360 --> 00:03:56,820
If we replace the x vector with the lags of a time series and we replace the output with the current

49
00:03:56,820 --> 00:03:59,640
value of the time series, we have an autoregression.
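
One way to picture this is to turn a time series into ordinary regression inputs and targets, where each input row holds the previous p values and the target is the current value. The sketch below assumes that layout; the helper name `make_lagged` and the series values are invented for illustration.

```python
def make_lagged(series, p):
    """Build (X, y) where each row of X is the p values preceding y."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t])  # the p lags, oldest first
        y.append(series[t])        # the current value to predict
    return X, y

# Made-up time series
series = [10, 12, 13, 15, 18, 21]
X, y = make_lagged(series, p=3)
print(X)  # [[10, 12, 13], [12, 13, 15], [13, 15, 18]]
print(y)  # [15, 18, 21]
```

Once the data is in this form, any linear regression fitting routine can be applied to X and y without knowing it came from a time series.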

50
00:04:04,170 --> 00:04:10,000
In fact, we can immediately extend what we learned and make it more powerful. As you recall, our

51
00:04:10,250 --> 00:04:13,080
models can only predict one step into the future.

52
00:04:13,500 --> 00:04:18,490
But with linear regression in general, there's nothing stopping us from simply adding more outputs.

53
00:04:18,900 --> 00:04:21,860
We can have a vector output along with a vector input.

54
00:04:22,350 --> 00:04:27,090
For example, we can use y1, y2, and y3 to predict y4 and y5.

55
00:04:28,380 --> 00:04:33,930
In this case, our weight vector becomes a weight matrix and our bias term becomes a bias vector.

56
00:04:34,500 --> 00:04:41,340
Basically, if you have D inputs and K outputs, then W will be of shape D by K and b will be a vector

57
00:04:41,340 --> 00:04:42,300
of size K.

58
00:04:43,320 --> 00:04:47,410
Alternatively, you can think of this like having parallel linear regressions.

59
00:04:47,790 --> 00:04:53,520
So if you have K parallel models and each of them has a weight vector of size D, then clearly by putting

60
00:04:53,520 --> 00:05:00,150
them together you'll have something of size D by K or K by D, but we usually use D by K as convention.
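
To make the shapes concrete, here is a small sketch of a multi-output prediction with a D by K weight matrix, where column k acts as the weight vector of the k-th parallel regression. The function name `predict_multi` and all numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def predict_multi(W, x, b):
    """Compute W^T x + b for a D-by-K matrix W, giving K outputs."""
    D, K = len(W), len(b)
    # Output k is the dot product of column k of W with x, plus b[k]
    return [sum(W[d][k] * x[d] for d in range(D)) + b[k] for k in range(K)]

# D = 3 inputs (e.g. three lags), K = 2 outputs (e.g. two future steps)
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, -1.0]]
x = [1.0, 2.0, 3.0]
b = [0.5, 0.5]
y_hat = predict_multi(W, x, b)
print(y_hat)  # [4.5, -0.5]
```

Storing W as D by K means each column can be read on its own as one ordinary linear regression, which matches the "parallel models" view.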