1
00:00:02,100 --> 00:00:04,350
Hello everyone, and welcome back to this class.

2
00:00:08,310 --> 00:00:14,100
This lecture is designed to answer some common questions I get about how difficult this course is and

3
00:00:14,100 --> 00:00:19,280
where it fits in terms of academia, the professional world and its practical applicability.

4
00:00:20,010 --> 00:00:25,530
So rather than answering this question individually over and over again, it seems like a better idea

5
00:00:25,530 --> 00:00:27,750
to make a lecture about it and answer everybody.

6
00:00:27,750 --> 00:00:33,630
At the same time, a lot of people become confused because this is their first machine learning course

7
00:00:33,630 --> 00:00:36,840
and they just don't know where it fits in the grand scheme of things.

8
00:00:37,320 --> 00:00:41,880
So, is this course academic or practical, and is it for beginners or experts?

9
00:00:42,120 --> 00:00:43,950
Is it fast-paced or slow-paced?

10
00:00:49,020 --> 00:00:53,970
I'm going to answer the second question first: is this course for beginners or experts?

11
00:00:55,130 --> 00:01:01,250
Interestingly, the problem with this question is a problem for machine learning, too. Let me explain why

12
00:01:01,250 --> 00:01:03,170
this question is itself problematic.

13
00:01:03,940 --> 00:01:09,870
One of the challenges in natural language processing is ambiguity: when someone says some words,

14
00:01:09,880 --> 00:01:15,220
the problem is those words can mean multiple things and it just so happens that I have some courses

15
00:01:15,220 --> 00:01:15,740
on NLP.

16
00:01:15,760 --> 00:01:18,020
So this is a problem I think about often.

17
00:01:18,550 --> 00:01:19,950
So how does that apply here?

18
00:01:20,890 --> 00:01:26,170
Well, when we use the words "beginner" and "expert", we have to ask: a beginner from whose perspective?

19
00:01:26,320 --> 00:01:31,920
An expert from whose perspective? Some of you might think that a beginner is a random person

20
00:01:31,930 --> 00:01:33,700
you take off the street who knows nothing.

21
00:01:34,180 --> 00:01:35,660
But even that is ambiguous.

22
00:01:36,550 --> 00:01:41,110
There are people out there who do not even really know how to use a computer beyond turning it on and

23
00:01:41,110 --> 00:01:42,800
checking their email and browsing the web.

24
00:01:43,510 --> 00:01:48,550
There are people out there who are pros at Microsoft Office, so they know how to use a computer and make

25
00:01:48,550 --> 00:01:53,110
a Microsoft Word document or a spreadsheet, but they've never done any programming before.

26
00:01:55,020 --> 00:01:59,520
There are people out there who are very good developers and have lots of programming experience, but

27
00:01:59,550 --> 00:02:01,320
they have no idea what machine learning is.

28
00:02:02,430 --> 00:02:07,470
So all of these people are beginners in some way, but each of them would require a vastly different

29
00:02:07,470 --> 00:02:09,930
amount of work to prepare for this course.

30
00:02:14,910 --> 00:02:21,500
It's the same situation with experts. Most people consider anyone who knows more than them to be an expert,

31
00:02:22,080 --> 00:02:26,100
but of course, how much you know and how much some other person knows is relative.

32
00:02:26,940 --> 00:02:32,040
If you know how to plug and play with scikit-learn and nothing else, you might seem like an expert to

33
00:02:32,040 --> 00:02:37,620
someone who only knows Excel, but you'll seem like a beginner to someone who's read Bishop's Pattern

34
00:02:37,620 --> 00:02:39,270
Recognition and Machine Learning book.

35
00:02:44,330 --> 00:02:49,020
So, just to finalize our answer to this question: is this course for beginners or experts?

36
00:02:49,580 --> 00:02:55,080
This question is ill-posed to begin with, because "beginner" and "expert" are not well-defined terms.

37
00:02:55,850 --> 00:03:01,900
Instead, we have to go back to the specific skill set you need before starting this course.

38
00:03:02,420 --> 00:03:06,880
And of course, that's defined by the prerequisites which are listed in the course description.

39
00:03:07,370 --> 00:03:09,050
So precision is the key here.

40
00:03:09,500 --> 00:03:15,200
We want to know the precise list of skills I need before taking this course.

41
00:03:15,710 --> 00:03:18,950
This is much better than ambiguous terms like beginner and expert.

42
00:03:23,860 --> 00:03:28,990
Another matter is that people are going to measure their own skills differently. That's a much tougher

43
00:03:28,990 --> 00:03:29,950
issue to deal with.

44
00:03:30,460 --> 00:03:35,800
Sure, I can say you need to know Python coding before starting this course, but you can be sure that

45
00:03:35,800 --> 00:03:41,230
there's a big difference between someone who just learned "Hello, World!" and for loops and someone working

46
00:03:41,230 --> 00:03:42,820
at a big financial institution.

47
00:03:47,820 --> 00:03:53,160
Now, even in those big financial institutions, there could be big gaps in performance, even within

48
00:03:53,160 --> 00:03:55,580
your own team.

49
00:03:56,130 --> 00:04:01,080
I've seen guys who've been working for 15 years do worse than guys who've been working less than one

50
00:04:01,080 --> 00:04:01,390
year.

51
00:04:02,070 --> 00:04:08,190
So even if I say you need to know Python coding, there can be a lot of variation in skill among those

52
00:04:08,190 --> 00:04:09,600
who consider themselves good at it.

53
00:04:10,340 --> 00:04:12,750
A similar situation happens with deep learning.

54
00:04:13,080 --> 00:04:17,670
I can say you need to know how neural networks work, but there's a big difference between someone who

55
00:04:17,670 --> 00:04:22,710
can code them from scratch and has experience using them versus someone who has only watched

56
00:04:22,710 --> 00:04:24,630
online animations of them for a few hours.

57
00:04:25,230 --> 00:04:27,610
If you had to bet on who knows neural networks better,

58
00:04:27,900 --> 00:04:28,770
who would you pick?

59
00:04:29,790 --> 00:04:34,740
So to really answer this question, we have to turn to the course itself. If I say you need to

60
00:04:34,740 --> 00:04:40,170
know Python coding and you claim that you do, but you don't understand some Python code in this course,

61
00:04:40,860 --> 00:04:44,760
then simply put, your Python coding skills are not yet up to par.

62
00:04:45,160 --> 00:04:51,000
So you have to improve your Python coding skills by asking questions in the Q&A and taking the appropriate

63
00:04:51,000 --> 00:04:52,730
steps to cover your gaps.

64
00:04:57,910 --> 00:05:03,940
Quite often I get people who are overconfident in their own abilities when, unfortunately, the reality

65
00:05:03,940 --> 00:05:09,100
is there is a huge discrepancy between what they think they know and what they actually know.

66
00:05:09,970 --> 00:05:14,020
What's funny is these people sometimes say this course requires PhD math.

67
00:05:14,590 --> 00:05:21,520
I assure you there's absolutely no math in any of my courses that requires a PhD or that only a PhD

68
00:05:21,530 --> 00:05:22,420
can understand.

69
00:05:22,840 --> 00:05:26,110
My courses contain only undergraduate math at most.

70
00:05:26,860 --> 00:05:32,170
Sometimes people even claim to have a PhD and then claim that it's still too hard, which is funny

71
00:05:32,170 --> 00:05:38,560
because that essentially invalidates your entire PhD. The entire point of a PhD is to turn you into

72
00:05:38,560 --> 00:05:40,060
an independent researcher.

73
00:05:40,330 --> 00:05:45,040
And so if you're supposed to be at the level of doing independent research, but you can't understand

74
00:05:45,040 --> 00:05:47,830
some undergraduate math, well, that's not very good.

75
00:05:52,930 --> 00:05:58,540
Essentially, what it boils down to is that people are really saying: everything I don't understand

76
00:05:58,540 --> 00:05:59,330
is not important.

77
00:05:59,800 --> 00:06:02,020
Everything I do understand is important.

78
00:06:02,290 --> 00:06:06,250
And therefore, everything you say that I don't understand, I don't need to know.

79
00:06:06,890 --> 00:06:09,010
You can see why this thinking is dangerous.

80
00:06:09,460 --> 00:06:14,470
If you truly believe that everything you do not know is not important, then you'll never learn anything

81
00:06:14,470 --> 00:06:14,800
new.

82
00:06:19,800 --> 00:06:24,330
Now, this brings us to our third question: is this course fast-paced or slow-paced?

83
00:06:24,900 --> 00:06:26,850
And I think you can see where I'm going with this.

84
00:06:27,360 --> 00:06:29,610
This is yet another example of ambiguity.

85
00:06:30,150 --> 00:06:35,890
Fast-paced to whom? Whether it's fast-paced or not depends on who is taking the course.

86
00:06:36,420 --> 00:06:39,800
So if someone is well prepared, then things will be just right.

87
00:06:40,470 --> 00:06:44,550
If someone already knows most of these topics, then it will probably be too slow.

88
00:06:45,480 --> 00:06:50,940
If someone chose not to prepare for the course or they have an inadequate understanding of the prerequisites,

89
00:06:51,270 --> 00:06:52,810
then it's going to be too fast.

90
00:06:53,460 --> 00:06:58,110
So if I expected you to know something and you didn't, then you have to look it up, which is going

91
00:06:58,110 --> 00:07:00,270
to make the course feel like it's going too fast.

92
00:07:05,290 --> 00:07:11,620
So here is the real test: if I say something depends on calculus and probability, and you claim to know

93
00:07:11,830 --> 00:07:17,070
calculus and probability, but you don't understand the thing that depends on calculus and probability,

94
00:07:17,380 --> 00:07:23,800
you have to ask yourself, honestly, do I really understand calculus and probability or did the instructor

95
00:07:24,040 --> 00:07:27,700
invent his own kind of calculus and probability that I don't yet understand?

96
00:07:28,180 --> 00:07:29,920
Well, of course, that's not very likely.

97
00:07:30,430 --> 00:07:35,320
It's most likely that you thought you understood calculus and probability, but you really don't.

98
00:07:40,350 --> 00:07:44,040
Now on to our initial question: what kind of course is this?

99
00:07:44,070 --> 00:07:45,060
Is it academic?

100
00:07:45,390 --> 00:07:46,650
Is it for professionals?

101
00:07:46,650 --> 00:07:49,340
Is it practical? To understand this,

102
00:07:49,350 --> 00:07:53,970
you have to understand where machine learning is usually taught and how it is usually taught.

103
00:07:54,980 --> 00:08:00,110
Firstly, machine learning is usually taught in upper year computer science and engineering programs

104
00:08:00,260 --> 00:08:01,850
in university and college.

105
00:08:02,360 --> 00:08:07,370
So these are students who have already taken Calculus One to Three, linear algebra, differential equations,

106
00:08:07,550 --> 00:08:13,700
probability and statistics, discrete math, programming, and possibly more, all in their first two or three

107
00:08:13,700 --> 00:08:14,580
years of college.

108
00:08:15,140 --> 00:08:19,610
So by the time they get to third and fourth year, they know this stuff and they're ready to apply it.

109
00:08:24,690 --> 00:08:30,420
So a typical academic course might cover a whole library of machine learning algorithms, so you might

110
00:08:30,420 --> 00:08:36,840
learn k-nearest neighbors, logistic regression, k-means clustering, PCA, and so on, all in the same course.

111
00:08:37,870 --> 00:08:43,090
Typically, you'd cover one algorithm per lecture, which is about two hours. Overall, you might have

112
00:08:43,090 --> 00:08:45,960
something like 12 or maybe 20 of those lectures in a term.

113
00:08:46,660 --> 00:08:50,920
And by cover, I mean mostly via math, geometry and derivations.

114
00:08:51,430 --> 00:08:55,150
These courses don't typically have you at your computer doing programming work.

115
00:08:55,870 --> 00:09:00,850
If you do end up doing programming, it's usually part of a lab assignment, which is actually just

116
00:09:00,850 --> 00:09:03,830
a very small part of the course compared to the rest.

117
00:09:04,510 --> 00:09:10,030
So it's mostly theoretical dealing with equations, solving equations, reasoning about equations.

118
00:09:10,480 --> 00:09:14,220
Those equations typically involve probabilities and information theory.

119
00:09:14,680 --> 00:09:18,160
So you have two hours to go through the theory for one algorithm.

120
00:09:18,160 --> 00:09:22,860
And then if there's coding involved, which there usually isn't, you do it on your own time.

121
00:09:23,530 --> 00:09:26,170
The lectures themselves do not involve any coding.

122
00:09:31,160 --> 00:09:34,970
But there's an even bigger problem with these academic courses. What is it?

123
00:09:35,870 --> 00:09:38,540
What happens is that the grades turn out to be very low.

124
00:09:41,030 --> 00:09:45,650
Out of 100 percent, the average might be around 25 percent or up to 40 percent.

125
00:09:46,190 --> 00:09:48,460
Now you might think, wow, so everyone fails.

126
00:09:48,860 --> 00:09:53,280
And of course, that's not the case, since grades get curved based on the average.

127
00:09:53,870 --> 00:09:55,490
So what is the real problem here?

128
00:09:56,090 --> 00:10:01,040
A lot of people ask me, well, why don't you cover every machine learning algorithm in one big, massive

129
00:10:01,040 --> 00:10:03,230
course and just call it machine learning?

130
00:10:04,040 --> 00:10:09,860
Well, the answer is that if you try to do too much, you end up understanding only 25 percent of

131
00:10:09,860 --> 00:10:12,670
the material, which I think is clearly demonstrated here.

132
00:10:13,460 --> 00:10:15,550
And as they say, the proof is in the pudding.

133
00:10:16,010 --> 00:10:20,780
If you look at these courses where students are trying to digest 10 different algorithms at the same

134
00:10:20,780 --> 00:10:22,410
time, what is the result?

135
00:10:22,970 --> 00:10:24,990
The result is they don't understand any of them.

136
00:10:25,370 --> 00:10:29,150
If you try to understand too many things, you end up understanding nothing.

137
00:10:30,020 --> 00:10:35,780
I take the perspective that if you're going to take a course, you should aim to understand 100 percent

138
00:10:35,780 --> 00:10:36,650
of the material.

139
00:10:37,880 --> 00:10:43,490
That's why courses are broken up into appropriate levels by difficulty and by topic.

140
00:10:43,940 --> 00:10:50,030
So, for example, you can typically take Calculus One, which focuses on differential calculus; Calculus

141
00:10:50,030 --> 00:10:55,890
Two, which focuses on integral calculus; and Calculus Three, which focuses on multivariable calculus.

142
00:10:56,420 --> 00:11:01,340
You would never learn all of these in just one huge course called Math, because that's just too much

143
00:11:01,340 --> 00:11:03,410
stuff to understand all at once.

144
00:11:04,040 --> 00:11:10,100
I want you to understand 100 percent of the material; even 50 percent or 75 percent is

145
00:11:10,100 --> 00:11:11,600
not good enough in my opinion.

146
00:11:12,140 --> 00:11:18,470
If you only understand 75 percent of the material and the next course depends on the 25 percent that

147
00:11:18,470 --> 00:11:22,160
you didn't understand, well, then you're not going to make it to the next level.

148
00:11:27,200 --> 00:11:32,360
Now, let's look at the opposite end of the spectrum, what most people might consider to be a practical

149
00:11:32,360 --> 00:11:32,810
course.

150
00:11:33,750 --> 00:11:38,790
I don't consider these to be practical courses, and I'll explain why later, but I use the word practical

151
00:11:38,790 --> 00:11:43,170
here in quotes, since that's what some of the more beginner students call them.

152
00:11:43,980 --> 00:11:46,440
So how does a practical course typically proceed?

153
00:11:48,000 --> 00:11:52,950
Well, first, you're never really taught how the algorithms work other than at a very high level.

154
00:11:53,520 --> 00:11:56,890
You might learn how they work by way of analogies and metaphors.

155
00:11:57,330 --> 00:12:00,610
So, for example, suppose you're learning about gradient descent.

156
00:12:01,200 --> 00:12:04,560
Well, the instructor might relate this to a ball rolling down a hill.

157
00:12:05,820 --> 00:12:10,680
Now, some beginner students might reason: I understand a ball rolling down a hill.

158
00:12:10,860 --> 00:12:12,960
Therefore, I understand gradient descent.

159
00:12:13,500 --> 00:12:16,050
But they neglect to understand how to take derivatives.

160
00:12:16,260 --> 00:12:18,690
They neglect to understand how to choose a good learning rate.

161
00:12:19,020 --> 00:12:22,650
And most importantly, they neglect to learn how to implement gradient descent in code.

162
00:12:24,130 --> 00:12:29,230
So it's totally wrong to assume that just because you can visualize a ball rolling down a hill,

163
00:13:29,230 --> 00:13:31,720
you understand gradient descent. Why?

164
00:12:32,840 --> 00:12:36,210
Well, the problem is everybody can imagine a ball rolling down a hill.

165
00:12:36,650 --> 00:12:41,600
Your grandparents, who can't even turn on a computer, can understand a ball rolling down a hill.

166
00:12:41,990 --> 00:12:46,600
So don't equate understanding an analogy to understanding the real topic.
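
The analogy leaves out exactly the three pieces named above: the derivative, the learning rate, and the code. Here is a minimal sketch in Python, assuming a toy one-parameter objective f(w) = (w - 3)^2 chosen purely for illustration; the names `gradient_descent` and `f_prime` are hypothetical and not taken from any library.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy objective (an illustrative assumption): f(w) = (w - 3)^2.
# Its derivative, worked out by hand, is f'(w) = 2 * (w - 3).
def f_prime(w):
    return 2 * (w - 3)


w_good = gradient_descent(f_prime, w0=0.0, learning_rate=0.1)  # settles near w = 3
w_bad = gradient_descent(f_prime, w0=0.0, learning_rate=1.1)   # overshoots further each step
```

With a learning rate of 0.1, each step multiplies the distance to the minimum by 0.8, so `w_good` ends up essentially at 3. With 1.1, each step multiplies that distance by -1.2, so `w_bad` oscillates and blows up. Nothing about a ball on a hill tells you which of these you will get.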

167
00:12:51,760 --> 00:12:57,580
So what's the next thing these practical courses do? The next thing they usually teach you is how to

168
00:12:57,580 --> 00:13:02,420
plug into an API, or in other words, make use of a library someone else wrote.

169
00:13:02,980 --> 00:13:04,570
This is also not ideal.

170
00:13:05,230 --> 00:13:07,830
Typically, this involves just two or three lines of code.

171
00:13:08,410 --> 00:13:13,100
You have to wonder if machine learning just boils down to two or three lines of code.

172
00:13:13,450 --> 00:13:15,200
Why is it such a big deal these days?

173
00:13:15,610 --> 00:13:18,640
Why did it take decades for people to realize its usefulness?

174
00:13:19,210 --> 00:13:23,500
And of course, the answer is that there's much more to machine learning than just these two or three

175
00:13:23,500 --> 00:13:24,220
lines of code.

176
00:13:24,880 --> 00:13:29,140
When you use a deep learning library like Keras, you're not doing deep learning.

177
00:13:29,470 --> 00:13:31,720
You're using a deep learning interface.

178
00:13:31,730 --> 00:13:33,010
You're simply programming.

179
00:13:33,400 --> 00:13:39,610
It's more accurate to say I can program with deep learning libraries rather than saying I actually know

180
00:13:39,610 --> 00:13:40,210
deep learning.

181
00:13:44,860 --> 00:13:47,500
But some of you might say, well, that's all I really want to do.

182
00:13:47,710 --> 00:13:53,070
All I need is an API that gives me an answer; I don't care about how it works.

183
00:13:53,560 --> 00:13:55,460
But then here is where you run into trouble.

184
00:13:56,140 --> 00:13:58,780
Recall that in addition to the principle that all data is the same,

185
00:13:59,110 --> 00:14:01,930
we know that all machine learning interfaces are the same.

186
00:14:02,470 --> 00:14:05,520
So the question becomes, which API do you choose?

187
00:14:05,530 --> 00:14:10,420
If they're all the same, how can you make an informed choice about which one to use?

188
00:14:11,110 --> 00:14:15,920
Well, the reality is, if you don't know how these things work, then you can't answer this question.

189
00:14:16,540 --> 00:14:21,840
This is why some of these so-called practical courses can cover 20 algorithms in the same course.

190
00:14:22,120 --> 00:14:27,760
It's because they never actually talk about how the algorithms work since the API is the same every

191
00:14:27,760 --> 00:14:28,130
time.

192
00:14:28,270 --> 00:14:32,620
It's very easy to just do the same thing 20 times and then not bother with the details.

193
00:14:33,310 --> 00:14:37,570
So this gives people a lot of confidence that they were able to do the same thing 20 times.

194
00:14:38,050 --> 00:14:42,760
But it's dangerous thinking because you don't realize you really only learned one thing.

195
00:14:42,760 --> 00:14:44,500
You just repeated it 20 times.

196
00:14:49,500 --> 00:14:52,050
So how about when it comes to the professional world?

197
00:14:52,930 --> 00:14:58,420
Some people believe that in your profession, you'll be leaning more toward the practical

198
00:14:58,420 --> 00:15:02,860
end of the spectrum where you don't really need to know how a machine learning model works.

199
00:15:03,040 --> 00:15:04,570
You just have to know its API.

200
00:15:05,500 --> 00:15:08,750
But what it really comes down to is what your job actually is.

201
00:15:09,190 --> 00:15:13,900
If you're a regular everyday programmer and you just want to see how a machine learning model might

202
00:15:13,900 --> 00:15:17,380
perform on your data, then you're going to use a pre-built library.

203
00:15:17,380 --> 00:15:21,250
Most likely that's just the two or three lines of code I referenced earlier.

204
00:15:22,750 --> 00:15:28,210
However, that doesn't mean you're exempt from best practices. Make sure you're not

205
00:15:28,210 --> 00:15:30,070
mixing your test set with your training set.

206
00:15:30,460 --> 00:15:33,070
Make sure you're doing proper validation and so on.

207
00:15:33,520 --> 00:15:38,500
And when it comes to tuning your model, then you have to be able to dig down into the details and see

208
00:15:38,500 --> 00:15:42,840
which hyperparameters are possible to change and which are likely to have the most impact.

209
00:15:43,780 --> 00:15:48,270
So knowing only how to use an API will only get you past the first stage.

210
00:15:48,640 --> 00:15:52,630
You can plug in your data, and at least your code has the right syntax and doesn't crash.

211
00:15:52,960 --> 00:15:56,740
But of course, your code not crashing doesn't necessarily mean it's right.
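
The best practices mentioned here (keeping the test set out of training, validating properly, and tuning hyperparameters) can be sketched without any library. This is a minimal pure-Python illustration under stated assumptions: synthetic data y = 2x + noise, a one-parameter ridge regression as the model, and its regularization strength `lam` as the hyperparameter. All function names are hypothetical.

```python
import random


def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and cut them into disjoint train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression with no intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pick(ids, values):
    """Select the entries of `values` at positions `ids`."""
    return [values[i] for i in ids]


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(42)
X = [rng.uniform(-1, 1) for _ in range(100)]
Y = [2 * x + rng.gauss(0, 0.1) for x in X]

train, val, test = split_indices(len(X))
X_tr, Y_tr = pick(train, X), pick(train, Y)

# Tune the hyperparameter on the validation set only; the test set is not touched here.
best_lam = min(
    [0.0, 0.1, 1.0, 10.0],
    key=lambda lam: mse(fit_ridge_1d(X_tr, Y_tr, lam), pick(val, X), pick(val, Y)),
)
w = fit_ridge_1d(X_tr, Y_tr, best_lam)

# Evaluate on the held-out test set exactly once, after all choices are made.
test_mse = mse(w, pick(test, X), pick(test, Y))
```

If the test set had been used to pick `lam`, the reported `test_mse` would be optimistically biased. Keeping the three index lists disjoint is what prevents that, and the code would run without crashing either way.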

212
00:16:01,710 --> 00:16:06,660
Back on the other end of the spectrum, you have programmers and scientists whose sole job it is to

213
00:16:06,660 --> 00:16:07,560
do machine learning.

214
00:16:08,070 --> 00:16:09,950
These guys are machine learning experts.

215
00:16:10,350 --> 00:16:14,090
They know the math inside and out and have a full computer science education.

216
00:16:14,970 --> 00:16:20,400
And so to even be able to communicate your ideas effectively, you have to be comfortable with the theory

217
00:16:20,400 --> 00:16:23,430
and not just how to plug into an API with two lines of code.

218
00:16:24,330 --> 00:16:29,910
These scientists are building custom models, really digging down into the math to not only use but

219
00:16:29,910 --> 00:16:33,420
invent new things. And to be able to invent new things,

220
00:16:33,600 --> 00:16:35,600
you really need to know the math very well.

221
00:16:36,180 --> 00:16:40,290
And so if you're on a team like this, you can't just know how to write two lines of code.

222
00:16:40,560 --> 00:16:43,050
Your skill set has to be far beyond that.

223
00:16:43,950 --> 00:16:49,290
So your theoretical background has to be very solid if you want to be able to communicate effectively

224
00:16:49,410 --> 00:16:50,610
with other scientists.

225
00:16:55,740 --> 00:16:59,880
Now, a lot of people say, well, I don't ever want to be a scientist, so I don't really care how

226
00:16:59,880 --> 00:17:05,220
these algorithms work. Unfortunately, that's a very bad attitude to have in the workplace.

227
00:17:05,220 --> 00:17:11,310
Suppose, for example, I'm running a business, and an employee of mine wants to show off and

228
00:17:11,310 --> 00:17:13,170
say he's learning about machine learning.

229
00:17:13,650 --> 00:17:18,450
He says, oh, I'm taking this machine learning course online, therefore I will become a machine learning

230
00:17:18,450 --> 00:17:18,980
master.

231
00:17:19,620 --> 00:17:23,050
But his approach to machine learning is to learn as little as possible.

232
00:17:23,430 --> 00:17:25,080
He says he doesn't care about the math.

233
00:17:25,080 --> 00:17:26,610
He only wants the practical stuff.

234
00:17:26,970 --> 00:17:29,000
He only wants to know the bare minimum.

235
00:17:29,640 --> 00:17:31,560
He tells me he doesn't care how it works.

236
00:17:31,560 --> 00:17:36,210
He's just going to plug into some API because that's what his buddy told him he could do.

237
00:17:37,240 --> 00:17:43,420
As a business owner, I really don't want this guy working for me. I run a business that sells something

238
00:17:43,750 --> 00:17:49,810
and the life of the company itself and the livelihoods of the employees all depend on our products being

239
00:17:49,810 --> 00:17:51,430
the best they can possibly be.

240
00:17:51,940 --> 00:17:56,560
So if someone comes to me and says, I don't care about being the best, I just care about doing the

241
00:17:56,560 --> 00:18:00,140
bare minimum, my response is I don't want that guy on my team.

242
00:18:00,460 --> 00:18:01,960
In fact, he should be fired.

243
00:18:02,560 --> 00:18:05,090
So be careful if this is your approach to machine learning.

244
00:18:05,590 --> 00:18:10,720
No business owner wants to hear that your approach to the business is to do the bare minimum and to

245
00:18:10,720 --> 00:18:12,980
put off some math because you think it's too hard.

246
00:18:13,450 --> 00:18:15,610
That's a scary prospect for a business owner.

247
00:18:20,610 --> 00:18:25,050
Now, if we go back to our original question, maybe you've realized we haven't answered it yet.

248
00:18:25,650 --> 00:18:30,800
What kind of course is this? Well, you can think of it as a marriage between these approaches.

249
00:18:31,470 --> 00:18:35,330
We want to cover all the math and derivations because that's very important

250
00:18:35,640 --> 00:18:40,560
if you want to be able to communicate effectively with other scientists, to be able to debug your code

251
00:18:40,560 --> 00:18:43,590
effectively and to be able to invent new approaches.

252
00:18:44,310 --> 00:18:46,620
This is also what we mean by machine learning.

253
00:18:47,100 --> 00:18:51,200
We mean really knowing machine learning and not just programming with an API.

254
00:18:51,990 --> 00:18:54,360
So the theoretical background is very important.

255
00:18:54,810 --> 00:19:00,630
At the same time, we're not going to introduce 20 different algorithms in the same course because unlike

256
00:19:00,630 --> 00:19:06,870
a typical academic course, where you can pass just by understanding 25 percent of the material, this

257
00:19:06,870 --> 00:19:12,120
course is structured in a way that if you have any gaps, it's going to be impossible for

258
00:19:12,120 --> 00:19:13,710
you to move on to the next stage.

259
00:19:15,080 --> 00:19:18,150
But academic courses contain little coding, if any.

260
00:19:18,620 --> 00:19:22,880
So in this course, every algorithm we learn about will be implemented in code.

261
00:19:23,450 --> 00:19:27,090
My motto is, if you can't implement it, then you don't understand it.

262
00:19:27,830 --> 00:19:29,390
You can pretend to understand it.

263
00:19:29,600 --> 00:19:32,260
And when you talk about it, you might sound like you understand it.

264
00:19:32,510 --> 00:19:34,640
But implementation is the true test.

265
00:19:35,060 --> 00:19:36,170
It's like a math exam

266
00:19:36,170 --> 00:19:41,060
rather than an essay. When you write an essay, you might be able to fool the teacher into

267
00:19:41,060 --> 00:19:42,460
thinking you understand something.

268
00:19:42,830 --> 00:19:46,810
But in a math exam, either your answers are right or your answers are wrong.

269
00:19:47,330 --> 00:19:51,260
In this context, either your code works or it doesn't; it's very clear-cut.

270
00:19:55,910 --> 00:20:01,940
When it comes to practicality, we cover that too. We demonstrate our algorithms both on real-world

271
00:20:01,940 --> 00:20:07,400
data like text and images and on two-dimensional data sets so that you

272
00:20:07,400 --> 00:20:09,440
can visualize what an algorithm is doing.

273
00:20:10,160 --> 00:20:11,450
Both of these are important.

274
00:20:12,320 --> 00:20:15,490
Text and images are quite possibly the most practical data.

275
00:20:15,500 --> 00:20:20,690
Some billion-dollar companies exist and thrive because of those capabilities.

276
00:20:21,470 --> 00:20:26,170
2D data sets are critical as well because, as humans, that's the only kind of data we can see.

277
00:20:26,720 --> 00:20:31,490
But at the same time, we don't try to pull the wool over your eyes and pretend like you need 20 different

278
00:20:31,490 --> 00:20:32,790
data sets to practice on.

279
00:20:33,260 --> 00:20:38,270
Instead, we learn something much more valuable: how to use an algorithm on an infinite number of data

280
00:20:38,270 --> 00:20:42,620
sets, because we are able to abstract the idea that all data is the same.

281
00:20:43,250 --> 00:20:45,950
Remember that all the computer sees is a list of numbers.

282
00:20:45,950 --> 00:20:49,790
It doesn't know that those numbers represent height measurements or an image pixel.
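
To make the "list of numbers" point concrete, here is a small hypothetical sketch: a tiny grayscale image and one person's height/weight/age measurements are both turned into plain lists of floats, and the same scoring function consumes either one. The pixel values, field names, and weights are made up for illustration.

```python
def flatten_image(pixels):
    """Row-major flatten of a 2-D grid of pixel intensities into one list of floats."""
    return [float(p) for row in pixels for p in row]


def record_to_vector(record, keys):
    """Turn a dict of named measurements into a list of floats in a fixed key order."""
    return [float(record[k]) for k in keys]


def dot(weights, features):
    """A linear score: it only ever sees numbers, never what they represent."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical inputs: a 2x3 grayscale image and one person's measurements.
image_vec = flatten_image([[0, 128, 255], [64, 32, 16]])
person_vec = record_to_vector(
    {"height_cm": 172.0, "weight_kg": 68.5, "age_years": 34},
    ["height_cm", "weight_kg", "age_years"],
)

# The same function scores both; only the vector length differs.
image_score = dot([1.0] * len(image_vec), image_vec)
person_score = dot([0.0, 1.0, 0.0], person_vec)
```

Once the data is in this form, the model code has no way of telling pixels from body measurements, which is why one algorithm can be reused across many data sets.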

283
00:20:50,660 --> 00:20:56,070
And of course, experiencing both ends of the spectrum, the theory and the implementation,

284
00:20:56,300 --> 00:21:01,370
and then finally plugging data into your model, is going to make you very well-rounded professionally.

285
00:21:06,580 --> 00:21:11,910
To conclude this lecture, I want to pose a final question: how does all this knowledge help you?

286
00:21:12,640 --> 00:21:17,170
Why does understanding the approach of this course make you better at machine learning in general?

287
00:21:18,100 --> 00:21:22,360
Well, my hope is that this gives you a high-level, bird's-eye view of the landscape.

288
00:21:22,870 --> 00:21:26,170
You can see the different approaches of people from different backgrounds.

289
00:21:26,860 --> 00:21:30,850
Academics typically don't see what non-academic people are doing.

290
00:21:31,120 --> 00:21:35,500
And those people who hate math typically don't see what people who love math are doing.

291
00:21:35,920 --> 00:21:40,840
So with this bird's-eye view, you can see everybody. If you're learning machine learning, it helps

292
00:21:40,840 --> 00:21:46,540
you understand who you're competing with when you apply for a job, what kinds of jobs are out there, and

293
00:21:46,540 --> 00:21:48,190
what would be adequate preparation.

294
00:21:48,730 --> 00:21:53,320
If you're hiring for a machine learning team, it helps you understand the different backgrounds of

295
00:21:53,320 --> 00:21:54,100
individuals

296
00:21:54,100 --> 00:21:58,630
you have to compare with one another, and to understand their strengths and weaknesses and what they bring

297
00:21:58,630 --> 00:21:59,220
to the table.
