1
00:00:02,130 --> 00:00:04,310
Hey everyone, and welcome back to this class.

2
00:00:08,310 --> 00:00:14,070
This lecture is designed to answer some common questions I get about how difficult this course is and

3
00:00:14,070 --> 00:00:19,260
where it fits in terms of academia, the professional world and its practical applicability.

4
00:00:20,040 --> 00:00:25,500
So rather than answering this question individually, over and over again, it seems like a better idea

5
00:00:25,500 --> 00:00:28,620
to make a lecture about it and answer everybody at the same time.

6
00:00:29,490 --> 00:00:33,930
A lot of people will become confused because this is their first machine learning course, and they

7
00:00:33,930 --> 00:00:36,840
just don't know where it fits in the grand scheme of things.

8
00:00:37,350 --> 00:00:39,780
So is this course academic or practical?

9
00:00:40,110 --> 00:00:41,850
And is it for beginners or experts?

10
00:00:42,120 --> 00:00:43,920
Is it fast paced or slow paced?

11
00:00:49,050 --> 00:00:53,910
I'm going to answer the second question first, is this course for beginners or experts?

12
00:00:55,160 --> 00:00:59,450
Interestingly, the problem with this question is a problem for machine learning, too.

13
00:01:00,290 --> 00:01:03,140
Let me explain why this question is itself problematic.

14
00:01:03,910 --> 00:01:09,830
One of the challenges in natural language processing is that of ambiguity. When someone says some words,

15
00:01:09,850 --> 00:01:15,190
the problem is those words can mean multiple things. And it just so happens that I have some courses

16
00:01:15,190 --> 00:01:15,760
on NLP,

17
00:01:15,760 --> 00:01:18,010
so this is a problem I think about often.

18
00:01:18,550 --> 00:01:19,960
So how does that apply here?

19
00:01:20,890 --> 00:01:26,110
Well, when we use the words beginner and expert, we have to ask: a beginner from whose perspective?

20
00:01:26,350 --> 00:01:31,900
An expert from whose perspective? For some of you, you might think that a beginner is a random person

21
00:01:31,900 --> 00:01:35,650
you take off the street who knows nothing, but even that is ambiguous.

22
00:01:36,550 --> 00:01:41,080
There are people out there who do not even really know how to use a computer beyond turning it on and

23
00:01:41,080 --> 00:01:42,760
checking their email and browsing the web.

24
00:01:43,510 --> 00:01:48,310
There are people out there who are pros at Microsoft Office, so they know how to use a computer and

25
00:01:48,310 --> 00:01:53,080
make a Microsoft Word document or a spreadsheet, but they've never done any programming before.

26
00:01:55,020 --> 00:01:59,520
There are people out there who are very good developers and have lots of programming experience, but

27
00:01:59,520 --> 00:02:01,320
they have no idea what machine learning is.

28
00:02:02,430 --> 00:02:04,740
So all of these people are beginners in some way.

29
00:02:05,250 --> 00:02:09,900
But each of them would require a vastly different amount of work to prepare for this course.

30
00:02:14,940 --> 00:02:17,010
It's the same situation with experts.

31
00:02:17,910 --> 00:02:21,480
Most people consider anyone who knows more than them to be an expert.

32
00:02:22,110 --> 00:02:26,070
But of course, how much you know and how much some other person knows is relative.

33
00:02:26,940 --> 00:02:30,240
If you know how to plug and play into scikit-learn and nothing else,

34
00:02:30,690 --> 00:02:35,940
you might seem like an expert to someone who only knows Excel, but you'll seem like a beginner to someone

35
00:02:35,940 --> 00:02:36,960
who's read Bishop's

36
00:02:37,200 --> 00:02:39,240
Pattern Recognition and Machine Learning book.

37
00:02:44,330 --> 00:02:49,010
So just to finalize our answer to this question, is this course for beginners or experts?

38
00:02:49,610 --> 00:02:55,070
This question is misinformed to begin with because beginner and expert are not well-defined terms.

39
00:02:55,880 --> 00:03:01,880
Instead, we have to go back to what is the specific skill set you need before starting this course.

40
00:03:02,420 --> 00:03:06,830
And of course, that's defined by the prerequisites which are listed in the course description.

41
00:03:07,370 --> 00:03:09,050
So precision is the key here.

42
00:03:09,470 --> 00:03:15,170
We want to know what is the precise list of skills that I need before taking this course.

43
00:03:15,710 --> 00:03:18,920
This is much better than ambiguous terms like beginner and expert.

44
00:03:23,920 --> 00:03:27,490
Another matter is that people are going to measure their own skills differently.

45
00:03:28,060 --> 00:03:29,920
That's a much tougher issue to deal with.

46
00:03:30,460 --> 00:03:35,770
Sure, I can say you need to know Python coding before starting this course, but you can be sure that

47
00:03:35,770 --> 00:03:41,200
there's a big difference between someone who just learned Hello World and for loops versus someone working

48
00:03:41,200 --> 00:03:42,790
at a big financial institution.

49
00:03:47,820 --> 00:03:53,130
Now, even in those big financial institutions, there could be big gaps in performance. Even within

50
00:03:53,130 --> 00:03:53,860
your own team,

51
00:03:53,880 --> 00:03:55,560
there can be big gaps in performance.

52
00:03:56,160 --> 00:04:01,080
I've seen guys who have been working for 15 years do worse than guys who've been working less than one

53
00:04:01,080 --> 00:04:01,380
year.

54
00:04:02,070 --> 00:04:08,160
So even if I say you need to know Python coding, there can be a lot of variation in skill among those

55
00:04:08,160 --> 00:04:09,600
who consider themselves good at it.

56
00:04:10,350 --> 00:04:12,720
A similar situation happens with deep learning.

57
00:04:13,080 --> 00:04:17,670
I can say you need to know how neural networks work, but there's a big difference between someone who

58
00:04:17,670 --> 00:04:22,680
can code them from scratch and has experience using them versus someone who just watched some online

59
00:04:22,680 --> 00:04:24,570
animations for a few hours.

60
00:04:25,230 --> 00:04:28,740
If you had to bet on who knows neural networks better, who would you pick?

61
00:04:29,790 --> 00:04:33,180
So to really answer this question, we have to turn to the course itself.

62
00:04:33,660 --> 00:04:39,180
If I say you need to know Python coding and you claim that you do, but you don't understand some Python

63
00:04:39,180 --> 00:04:44,760
code in this course, then simply put, your Python coding skills are not yet up to par.

64
00:04:45,180 --> 00:04:51,000
So you have to improve your Python coding skills by asking questions on the Q&A and taking the appropriate

65
00:04:51,000 --> 00:04:52,710
steps to cover your gaps.

66
00:04:57,910 --> 00:05:03,400
Quite often, I get people who are just too confident in their own abilities, when unfortunately the

67
00:05:03,400 --> 00:05:09,100
reality is there is a huge discrepancy between what they think they know and what they actually know.

68
00:05:09,940 --> 00:05:13,990
What's funny is these people sometimes say this course requires Ph.D. math.

69
00:05:14,590 --> 00:05:20,980
I assure you, there's absolutely no math in any of my courses that requires a Ph.D. or that only a

70
00:05:21,310 --> 00:05:22,390
Ph.D. can understand.

71
00:05:22,870 --> 00:05:26,080
My courses contain only undergraduate math at most.

72
00:05:26,860 --> 00:05:32,140
Sometimes people even claim to have a Ph.D. and then claim that it's still too hard, which is funny

73
00:05:32,140 --> 00:05:35,060
because that essentially invalidates your entire Ph.D.

74
00:05:35,890 --> 00:05:40,060
The entire point of a Ph.D. is to turn you into an independent researcher.

75
00:05:40,390 --> 00:05:45,010
And so if you're supposed to be at the level of doing independent research, but you can't understand

76
00:05:45,010 --> 00:05:47,800
some undergraduate math, well, that's not very good.

77
00:05:52,960 --> 00:05:58,510
Essentially, what it boils down to is people are really trying to say everything I don't understand

78
00:05:58,510 --> 00:05:59,320
is not important.

79
00:05:59,800 --> 00:06:05,500
Everything I do understand is important and therefore everything you say that I don't understand, I

80
00:06:05,500 --> 00:06:06,220
don't need to know.

81
00:06:06,910 --> 00:06:08,980
You can see why this thinking is dangerous.

82
00:06:09,490 --> 00:06:14,440
If you truly believe that everything you do not know is not important, then you'll never learn anything

83
00:06:14,440 --> 00:06:14,800
new.

84
00:06:19,830 --> 00:06:24,300
Now this brings us to our third question, is this course fast paced or slow paced?

85
00:06:24,930 --> 00:06:26,850
And I think you can see where I'm going with this.

86
00:06:27,360 --> 00:06:33,990
This is yet another example of ambiguity. Fast paced to whom? Whether it's fast paced or not depends

87
00:06:33,990 --> 00:06:35,880
on which person is taking the course.

88
00:06:36,450 --> 00:06:39,780
So if someone is well prepared, then things will be just right.

89
00:06:40,500 --> 00:06:44,520
If someone already knows most of these topics, then it will probably be too slow.

90
00:06:45,540 --> 00:06:50,910
If someone chose not to prepare for the course or they have an inadequate understanding of the prerequisites,

91
00:06:51,240 --> 00:06:52,800
then it's going to be too fast.

92
00:06:53,490 --> 00:06:58,080
So if I expected you to know something and you didn't, then you have to look it up, which is going

93
00:06:58,080 --> 00:07:00,240
to make the course feel like it's going too fast.

94
00:07:05,320 --> 00:07:06,760
So here is the real test.

95
00:07:07,360 --> 00:07:13,120
If I say something depends on calculus and probability and you claim to know calculus and probability,

96
00:07:13,510 --> 00:07:18,460
but you don't understand the thing that depends on calculus and probability, you have to ask yourself,

97
00:07:18,460 --> 00:07:21,970
honestly, do I really understand calculus and probability?

98
00:07:22,450 --> 00:07:27,700
Or did the instructor invent his own kind of calculus and probability that I don't yet understand?

99
00:07:28,180 --> 00:07:29,920
Well, of course, that's not very likely.

100
00:07:30,430 --> 00:07:35,320
It's most likely that you thought you understood calculus and probability, but you really don't.

101
00:07:40,350 --> 00:07:43,950
Now on to our initial question: what kind of course is this?

102
00:07:44,040 --> 00:07:45,060
Is it academic?

103
00:07:45,390 --> 00:07:46,610
Is it for professionals?

104
00:07:46,620 --> 00:07:47,520
Is it practical?

105
00:07:48,000 --> 00:07:53,310
So to understand this, you have to understand where machine learning is usually taught and how it is

106
00:07:53,310 --> 00:07:53,940
usually taught.

107
00:07:55,010 --> 00:08:00,050
Firstly, machine learning is usually taught in upper year computer science and engineering programs

108
00:08:00,260 --> 00:08:01,820
in university and college.

109
00:08:02,390 --> 00:08:07,340
So these are students who already took Calculus One to Three, linear algebra, differential equations,

110
00:08:07,550 --> 00:08:13,700
probability and statistics, discrete math, programming, and possibly more, all in their first two or three

111
00:08:13,700 --> 00:08:14,570
years of college.

112
00:08:15,170 --> 00:08:19,460
So by the time they get to third and fourth year, they know all this stuff and they're ready to apply

113
00:08:19,460 --> 00:08:19,580
it.

114
00:08:24,720 --> 00:08:30,390
So a typical academic course might cover a whole library of machine learning algorithms, so you might

115
00:08:30,390 --> 00:08:36,809
learn k-nearest neighbor, logistic regression, k-means clustering, PCA, and so on, all in the same course.

116
00:08:37,840 --> 00:08:42,559
Typically, you'd cover one algorithm per lecture, which is about two hours. Overall,

117
00:08:42,580 --> 00:08:48,190
you might have something like 12 or maybe 20 of those lectures in a term. And by cover, I mean mostly

118
00:08:48,190 --> 00:08:50,890
via math, geometry and derivations.

119
00:08:51,430 --> 00:08:55,090
These courses don't typically have you at your computer doing programming work.

120
00:08:55,900 --> 00:09:00,820
If you do end up doing programming, it's usually part of a lab assignment, which is actually just

121
00:09:00,820 --> 00:09:03,820
a very small part of the course compared to the rest.

122
00:09:04,510 --> 00:09:10,000
So it's mostly theoretical dealing with equations, solving equations, reasoning about equations.

123
00:09:10,450 --> 00:09:14,200
Those equations typically involve probabilities and information theory.

124
00:09:14,710 --> 00:09:18,130
So you have two hours to go through the theory for one algorithm.

125
00:09:18,130 --> 00:09:22,870
And then if there's coding involved, which there usually isn't, you do it on your own time.

126
00:09:23,560 --> 00:09:26,110
The lectures themselves do not involve any coding.

127
00:09:31,160 --> 00:09:34,100
But there's an even bigger problem with these academic courses.

128
00:09:34,240 --> 00:09:34,940
And what is that?

129
00:09:35,870 --> 00:09:38,510
What happens is that the grades turn out to be very low.

130
00:09:41,060 --> 00:09:45,620
Out of 100 percent, the average might be around 25 percent or up to 40 percent.

131
00:09:46,220 --> 00:09:48,440
Now you might think, wow, so everyone fails.

132
00:09:48,890 --> 00:09:53,270
And of course, that's not the case since everyone gets shifted based on the average.

133
00:09:53,900 --> 00:09:55,490
So what is the real problem here?

134
00:09:56,150 --> 00:10:01,040
A lot of people ask me, well, why don't you cover every machine learning algorithm in one big massive

135
00:10:01,040 --> 00:10:03,230
course and just call it machine learning?

136
00:10:04,010 --> 00:10:09,860
Well, the answer is that if you try to do too much, you only end up understanding 25 percent of

137
00:10:09,860 --> 00:10:12,650
the material, which I think is clearly demonstrated here.

138
00:10:13,520 --> 00:10:15,500
And as they say, the proof is in the pudding.

139
00:10:16,010 --> 00:10:20,780
If you look at these courses where students are trying to digest 10 different algorithms at the same

140
00:10:20,780 --> 00:10:22,400
time, what is the result?

141
00:10:22,940 --> 00:10:24,980
The result is they don't understand any of them.

142
00:10:25,400 --> 00:10:29,120
If you try to understand too many things, you end up understanding nothing.

143
00:10:30,050 --> 00:10:35,750
I take the perspective that if you're going to take a course, you should aim to understand 100 percent

144
00:10:35,750 --> 00:10:36,620
of the material.

145
00:10:37,880 --> 00:10:43,460
Each course is broken up into appropriate levels based on how difficult they are and by topic.

146
00:10:43,940 --> 00:10:50,030
So for example, you can typically take Calculus One, which focuses on differential calculus, Calculus

147
00:10:50,030 --> 00:10:55,880
Two, which focuses on integral calculus, and Calculus Three, which focuses on multivariable calculus.

148
00:10:56,420 --> 00:11:01,340
You would never learn all of these in just one huge course called math, because that's just too much

149
00:11:01,340 --> 00:11:03,410
stuff to understand all at once.

150
00:11:04,040 --> 00:11:10,790
I want you to understand 100 percent of the material. Even 50 percent or 75 percent is not good enough,

151
00:11:10,790 --> 00:11:15,140
in my opinion. If you only understand 75 percent of the material

152
00:11:15,380 --> 00:11:19,460
and the next course depends on the 25 percent that you didn't understand,

153
00:11:19,760 --> 00:11:22,160
well, then you're not going to make it to the next level.

154
00:11:27,200 --> 00:11:29,630
Now, let's look at the opposite end of the spectrum:

155
00:11:30,020 --> 00:11:32,780
what most people might consider to be a practical course.

156
00:11:33,780 --> 00:11:37,110
I don't consider these to be practical courses, and I'll explain why later.

157
00:11:37,470 --> 00:11:42,270
But I use the word practical here in quotes, since that's what some of the more beginner students will

158
00:11:42,270 --> 00:11:43,170
refer to them as.

159
00:11:43,980 --> 00:11:46,380
So how does a practical course typically proceed?

160
00:11:47,970 --> 00:11:52,950
Well, first, you are never really taught how the algorithms work other than at a very high level.

161
00:11:53,520 --> 00:11:56,880
You might learn how they work, but via analogies and metaphors.

162
00:11:57,360 --> 00:12:00,600
So, for example, suppose you're learning about gradient descent.

163
00:12:01,170 --> 00:12:04,560
Well, the instructor might relate this to a ball rolling down a hill.

164
00:12:05,850 --> 00:12:10,680
Now, some beginner students might conclude, "I understand a ball rolling down a hill,

165
00:12:10,860 --> 00:12:16,020
therefore I understand gradient descent." But they neglect to understand how to take derivatives.

166
00:12:16,260 --> 00:12:18,630
They neglect to understand how to choose a good learning rate.

167
00:12:19,020 --> 00:12:22,620
And most importantly, they neglect how to implement gradient descent in code.

168
00:12:24,160 --> 00:12:29,200
So it's totally wrong to assume that just because you can visualize a ball rolling down a hill that

169
00:12:29,200 --> 00:12:31,690
you understand gradient descent, why?

170
00:12:32,840 --> 00:12:38,060
Well, the problem is, everybody can imagine a ball rolling down a hill. Your grandparents, who can't

171
00:12:38,060 --> 00:12:41,600
even turn on a computer, can understand a ball rolling down a hill.

172
00:12:41,990 --> 00:12:46,550
So don't equate understanding an analogy to understanding the real topic.
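
To make the gap between the analogy and the real thing concrete, here is a minimal sketch of gradient descent in plain Python. The toy objective f(x) = (x - 3)**2, the function names, and the default learning rate are assumptions chosen for illustration, not code from the course. Notice that it needs exactly the three things the analogy skips: a derivative, a learning rate, and an implementation.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Derivative of the toy objective f(x) = (x - 3)**2, worked out by hand.
    return 2.0 * (x - 3.0)


# With a reasonable learning rate the iterates approach the minimizer x = 3.
x_min = gradient_descent(grad_f, x0=0.0)

# With a learning rate that is too large, the same code runs without error
# but the iterates move away from the minimizer instead of toward it.
x_bad = gradient_descent(grad_f, x0=0.0, learning_rate=1.1, steps=50)

print(x_min, x_bad)
```

The second call is the part the ball-and-hill picture does not prepare you for: nothing crashes, yet the answer is useless.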

173
00:12:51,790 --> 00:12:57,550
So what's the next thing these practical courses do? The next thing they usually teach you is how to

174
00:12:57,550 --> 00:13:02,380
plug into an API or, in other words, make use of a library someone else wrote.

175
00:13:02,980 --> 00:13:04,570
This is also not ideal.

176
00:13:05,200 --> 00:13:07,810
Typically, this involves just two or three lines of code.

177
00:13:08,410 --> 00:13:13,060
You have to wonder: if machine learning just boils down to two or three lines of code,

178
00:13:13,420 --> 00:13:15,190
why is it such a big deal these days?

179
00:13:15,580 --> 00:13:18,610
Why did it take decades for people to realize its usefulness?

180
00:13:19,210 --> 00:13:23,470
And of course, the answer is that there's much more to machine learning than just these two or three

181
00:13:23,470 --> 00:13:24,190
lines of code.

182
00:13:24,880 --> 00:13:29,110
When you use a deep learning library like Keras, you're not doing deep learning.

183
00:13:29,470 --> 00:13:31,600
You are using a deep learning interface.

184
00:13:31,750 --> 00:13:33,010
You're simply programming.

185
00:13:33,430 --> 00:13:39,580
It's more accurate to say I can program with deep learning libraries rather than saying I actually know

186
00:13:39,580 --> 00:13:40,180
deep learning.

187
00:13:44,860 --> 00:13:47,500
But some of you might say, well, that's all I really want to do.

188
00:13:47,710 --> 00:13:51,100
All I need is an API and for the API to give me an answer.

189
00:13:51,520 --> 00:13:53,050
I don't care about how it works.

190
00:13:53,560 --> 00:13:55,450
But then here is where you run into trouble.

191
00:13:56,140 --> 00:13:58,750
Recall that in addition to "all data is the same,"

192
00:13:59,080 --> 00:14:01,930
we know that all machine learning interfaces are the same.

193
00:14:02,470 --> 00:14:06,580
So the question becomes: which API do you choose if they're all the same?

194
00:14:07,330 --> 00:14:10,420
How can you make an informed choice about which one to use?

195
00:14:11,080 --> 00:14:15,910
Well, the reality is, if you don't know how these things work, then you can't answer this question.

196
00:14:16,540 --> 00:14:21,820
This is why some of these so-called practical courses can cover 20 algorithms in the same course.

197
00:14:22,180 --> 00:14:25,390
It's because they never actually talk about how the algorithms work.

198
00:14:26,080 --> 00:14:28,120
Since the API is the same every time,

199
00:14:28,300 --> 00:14:32,560
it's very easy to just do the same thing 20 times and then not bother with the details.

200
00:14:33,310 --> 00:14:37,510
So this gives people a lot of confidence that they were able to do the same thing 20 times.

201
00:14:38,050 --> 00:14:42,740
But it's dangerous thinking because you don't realize you really only learned one thing.

202
00:14:42,760 --> 00:14:44,470
You just repeated it 20 times.
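
As a rough illustration of "the API is the same every time," the sketch below defines two deliberately tiny classifiers in plain Python. The class names and the fit/predict method names are hypothetical stand-ins that mimic the convention many libraries follow; they are not taken from any real library. The calling code in the loop is identical for both models, even though what happens inside them is completely different.

```python
from collections import Counter


class MajorityClassifier:
    """Always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Predicts the label whose mean training point is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            # Squared Euclidean distance from the row to each class centroid.
            return min(
                self.centroids_,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[label])
                ),
            )

        return [closest(row) for row in X]


X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
y_train = ["a", "a", "a", "b", "b"]
X_new = [[0.5, 0.5], [5.5, 5.0]]

# The same two calls work for every model; only the internals differ.
predictions = {}
for model in (MajorityClassifier(), NearestCentroidClassifier()):
    predictions[type(model).__name__] = model.fit(X_train, y_train).predict(X_new)

print(predictions)
```

Running the loop teaches you one thing, the calling convention. Knowing why the two models disagree on the second point requires looking inside them.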

203
00:14:49,530 --> 00:14:51,990
So how about when it comes to the professional world?

204
00:14:52,930 --> 00:14:58,390
Sometimes people tend to believe that in your profession, you'll be leaning more toward the practical

205
00:14:58,390 --> 00:15:02,800
end of the spectrum, where you don't really need to know how a machine learning model works.

206
00:15:03,040 --> 00:15:04,570
You just have to know its API.

207
00:15:05,500 --> 00:15:08,710
But what it really comes down to is what your job actually is.

208
00:15:09,220 --> 00:15:13,900
If you're a regular everyday programmer and you just want to see how a machine learning model might

209
00:15:13,900 --> 00:15:17,350
perform on your data, then you're going to use a pre-built library,

210
00:15:17,350 --> 00:15:18,070
most likely.

211
00:15:18,490 --> 00:15:21,250
That's just the two or three lines of code I referenced earlier.

212
00:15:22,780 --> 00:15:28,180
However, that does not mean you don't still have to conform to best practices. Make sure you're not

213
00:15:28,180 --> 00:15:30,040
mixing your test set with your train set.

214
00:15:30,460 --> 00:15:33,040
Make sure you're doing proper validation and so on.
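
One way to keep the test set from mixing with the train set is to split by index once, up front, and never touch the test indices until the very end. The sketch below is a minimal version in plain Python. The function name, the 20 percent fractions, and the fixed seed are assumptions for illustration only.

```python
import random


def train_validation_test_split(n_samples, validation_fraction=0.2,
                                test_fraction=0.2, seed=0):
    """Shuffle sample indices once and cut them into three disjoint groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * validation_fraction)

    test = indices[:n_test]
    validation = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, validation, test


train_idx, val_idx, test_idx = train_validation_test_split(10)

# Every sample lands in exactly one group, so nothing leaks between them.
print(len(train_idx), len(val_idx), len(test_idx))
```

The validation group is what you tune against; the test group is only for the final check.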

215
00:15:33,550 --> 00:15:38,260
And when it comes to tuning your model, then you'll have to be able to dig down into the details and

216
00:15:38,260 --> 00:15:42,790
see which hyperparameters are possible to change and which are likely to have the most impact.
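
As a small, hedged illustration of how much one hyperparameter can matter, the sketch below tries several learning rates for plain gradient descent on a toy objective, f(x) = (x - 3)**2. The objective, the candidate values, and the step count are all assumptions made up for this example. In a real project you would compare models on a validation set rather than on a hand-written function.

```python
def loss(x):
    # Toy objective with its minimum at x = 3.
    return (x - 3.0) ** 2


def grad(x):
    # Derivative of the toy objective.
    return 2.0 * (x - 3.0)


def final_loss(learning_rate, steps=20, x0=0.0):
    """Run a fixed number of gradient descent steps and report the final loss."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return loss(x)


candidates = [0.001, 0.01, 0.1, 0.5, 1.1]
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)

for lr in candidates:
    print(lr, results[lr])
print("best:", best_lr)
```

The smallest learning rates barely move in 20 steps and the largest one diverges. None of the runs crash, which is exactly why you have to look at the numbers.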

217
00:15:43,780 --> 00:15:48,250
So only knowing how to use an API is only going to get you past the first stage.

218
00:15:48,640 --> 00:15:52,600
You can plug in your data and at least your code has the right syntax and doesn't crash.

219
00:15:52,960 --> 00:15:56,710
But of course, your code not crashing doesn't necessarily mean it's right.

220
00:16:01,680 --> 00:16:06,630
Back on the other end of the spectrum, you have programmers and scientists whose sole job it is to

221
00:16:06,630 --> 00:16:07,530
do machine learning.

222
00:16:08,040 --> 00:16:09,930
These guys are machine learning experts.

223
00:16:10,350 --> 00:16:14,070
They know the math inside and out and have a full computer science education.

224
00:16:15,000 --> 00:16:20,370
And so to even be able to communicate your ideas effectively, you have to be comfortable with the theory

225
00:16:20,370 --> 00:16:23,400
and not just how to plug into an API with two lines of code.

226
00:16:24,300 --> 00:16:29,880
These scientists are building custom models, really digging down into the math to not only use but

227
00:16:29,880 --> 00:16:33,360
invent new things. And to be able to invent new things,

228
00:16:33,600 --> 00:16:35,610
you really need to know the math very well.

229
00:16:36,240 --> 00:16:40,260
And so if you're on a team like this, you can't just know how to write two lines of code.

230
00:16:40,560 --> 00:16:43,020
Your skill set has to be far beyond that.

231
00:16:43,980 --> 00:16:49,260
So your theoretical background has to be very solid if you want to be able to communicate effectively

232
00:16:49,410 --> 00:16:50,580
with other scientists.

233
00:16:55,770 --> 00:16:59,880
Now, a lot of people say, well, I don't ever want to be a scientist, so I don't really care how

234
00:16:59,880 --> 00:17:00,900
these algorithms work.

235
00:17:01,560 --> 00:17:05,220
Unfortunately, that's a very bad attitude to have in the workplace.

236
00:17:05,849 --> 00:17:10,150
Suppose, for example, I am running a business. Now, an employee of mine,

237
00:17:10,170 --> 00:17:13,140
he wants to show off and say he's learning about machine learning.

238
00:17:13,650 --> 00:17:16,319
He says, "Oh, I'm taking this machine learning course online,

239
00:17:16,589 --> 00:17:18,960
therefore I'll become a machine learning master."

240
00:17:19,619 --> 00:17:23,040
But his approach to machine learning is to learn as little as possible.

241
00:17:23,430 --> 00:17:25,050
He says he doesn't care about the math.

242
00:17:25,079 --> 00:17:26,579
He only wants the practical stuff.

243
00:17:26,940 --> 00:17:28,980
He only wants to know the bare minimum.

244
00:17:29,640 --> 00:17:31,550
He tells me he doesn't care how it works.

245
00:17:31,560 --> 00:17:36,200
He's just going to plug into some API because that's what his buddy told him he could do.

246
00:17:37,270 --> 00:17:40,810
As a business owner, I really don't want this guy working for me.

247
00:17:41,440 --> 00:17:47,140
I run a business that sells something and the life of the company itself and the livelihoods of the

248
00:17:47,140 --> 00:17:51,430
employees all depend on our products being the best they can possibly be.

249
00:17:51,970 --> 00:17:55,210
So if someone comes to me and says, "I don't care about being the best,

250
00:17:55,510 --> 00:17:57,280
I just care about doing the bare minimum,"

251
00:17:57,730 --> 00:18:00,160
my response is, I don't want that guy on my team.

252
00:18:00,490 --> 00:18:01,900
In fact, he should be fired.

253
00:18:02,560 --> 00:18:05,080
So be careful if this is your approach to machine learning.

254
00:18:05,620 --> 00:18:10,720
No business owner wants to hear that your approach to the business is to do the bare minimum and to

255
00:18:10,720 --> 00:18:12,940
put off the math because you think it's too hard.

256
00:18:13,450 --> 00:18:15,590
That's a scary prospect for a business owner.

257
00:18:20,640 --> 00:18:25,020
Now, if we go back to our original question, maybe you've realized we haven't answered it yet.

258
00:18:25,650 --> 00:18:27,060
What kind of course is this?

259
00:18:27,720 --> 00:18:30,780
Well, you can think of it as a marriage between these approaches.

260
00:18:31,470 --> 00:18:35,310
We want to cover all the math and derivations because that's very important

261
00:18:35,670 --> 00:18:40,530
if you want to be able to communicate effectively with other scientists, to be able to debug your code

262
00:18:40,530 --> 00:18:46,590
effectively, and to be able to invent new approaches. This is also what we mean by machine learning.

263
00:18:47,100 --> 00:18:51,180
We mean really knowing machine learning and not just programming with an API.

264
00:18:51,990 --> 00:18:54,330
So the theoretical background is very important.

265
00:18:54,810 --> 00:19:00,630
At the same time, we're not going to introduce 20 different algorithms in the same course because unlike

266
00:19:00,630 --> 00:19:06,840
a typical academic course, where you can pass just by understanding 25 percent of the material, this

267
00:19:06,840 --> 00:19:12,120
course is structured in a way that if you have any gaps, it's going to be impossible for

268
00:19:12,120 --> 00:19:13,680
you to move on to the next stage.

269
00:19:15,080 --> 00:19:18,140
But academic courses don't contain much coding, if at all.

270
00:19:18,620 --> 00:19:22,880
So in this course, every algorithm we learn about will be implemented in code.

271
00:19:23,480 --> 00:19:27,080
My motto is if you can't implement it, then you don't understand it.

272
00:19:27,830 --> 00:19:29,360
You can pretend to understand it.

273
00:19:29,630 --> 00:19:32,240
And when you talk about it, you might sound like you understand it.

274
00:19:32,510 --> 00:19:34,580
But implementation is the true test.

275
00:19:35,060 --> 00:19:36,170
It's like a math exam

276
00:19:36,170 --> 00:19:41,360
rather than writing an essay. When you write an essay, you might be able to fool the teacher into thinking

277
00:19:41,360 --> 00:19:42,440
you understand something.

278
00:19:42,830 --> 00:19:46,790
But in a math exam, either your answers are right or your answers are wrong.

279
00:19:47,330 --> 00:19:50,000
In this context, either your code works or it doesn't.

280
00:19:50,180 --> 00:19:51,230
It's very clear-cut.

281
00:19:55,910 --> 00:19:58,490
When it comes to practicality, we cover that, too.

282
00:19:59,030 --> 00:20:03,350
We demonstrate our algorithms both on real-world data like text and images,

283
00:20:03,620 --> 00:20:08,930
and we demonstrate our algorithms on two-dimensional data sets so that you can visualize what an algorithm

284
00:20:08,930 --> 00:20:09,410
is doing.

285
00:20:10,190 --> 00:20:11,420
Both of these are important.

286
00:20:12,290 --> 00:20:16,040
Text and images are quite possibly the most practical data there is.

287
00:20:16,460 --> 00:20:20,660
Some billion-dollar companies exist and thrive because of those capabilities.

288
00:20:21,470 --> 00:20:26,180
2D data sets are critical as well because as humans, that's the only thing we can see.

289
00:20:26,720 --> 00:20:31,460
But at the same time, we don't try to pull the wool over your eyes and pretend like you need 20 different

290
00:20:31,460 --> 00:20:32,750
data sets to practice on.

291
00:20:33,290 --> 00:20:35,460
Instead, we learn something much more valuable:

292
00:20:35,810 --> 00:20:41,000
how to use an algorithm on an infinite number of data sets because we are able to abstract the idea

293
00:20:41,000 --> 00:20:42,620
that all data is the same.

294
00:20:43,280 --> 00:20:45,910
Remember that all the computer sees is a list of numbers.

295
00:20:45,920 --> 00:20:51,290
It doesn't know that those numbers represent height measurements or image pixels. And of course,

296
00:20:51,290 --> 00:20:56,060
by experiencing both ends of the spectrum, both the theory and the implementation,

297
00:20:56,330 --> 00:21:01,340
and then finally, plugging data into your model, that's going to make you very well-rounded professionally.

298
00:21:06,580 --> 00:21:11,950
To conclude this lecture, I want to pose a final question: how does all this knowledge help you?

299
00:21:12,610 --> 00:21:17,170
Why does understanding the approach of this course make you better at machine learning in general?

300
00:21:18,070 --> 00:21:22,360
Well, my hope is that this gives you a very high-level, bird's-eye view of the landscape.

301
00:21:22,900 --> 00:21:26,140
You can see the different approaches of people from different backgrounds.

302
00:21:26,920 --> 00:21:32,050
Academic people typically don't see what other non-academic people are doing, and those people who

303
00:21:32,050 --> 00:21:35,500
hate math typically don't see what people who love math are doing.

304
00:21:35,950 --> 00:21:38,440
So with this bird's eye view, you can see everybody.

305
00:21:39,130 --> 00:21:43,840
If you're learning machine learning, it helps you understand who am I competing with when I apply for

306
00:21:43,840 --> 00:21:44,380
a job?

307
00:21:44,770 --> 00:21:46,300
And what kinds of jobs are out there?

308
00:21:46,300 --> 00:21:48,190
And what would be adequate preparation?

309
00:21:48,760 --> 00:21:53,290
If you're hiring for a machine learning team, it helps you understand the different backgrounds of

310
00:21:53,290 --> 00:21:54,070
individuals

311
00:21:54,070 --> 00:21:58,720
you have to compare to one another, to understand their strengths and weaknesses and what they bring to

312
00:21:58,720 --> 00:21:59,200
the table.

