1
00:00:02,250 --> 00:00:04,380
Hey everyone, and welcome back to this class.

2
00:00:08,360 --> 00:00:14,090
This lecture is designed to answer some common questions I get about how difficult this course is and

3
00:00:14,090 --> 00:00:20,090
where it fits in terms of academia, the professional world, and its practical applicability.

4
00:00:20,090 --> 00:00:25,520
So rather than answering this question individually over and over again it seems like a better idea

5
00:00:25,520 --> 00:00:31,550
to make a lecture about it and answer everybody at the same time. A lot of people become confused because

6
00:00:31,550 --> 00:00:36,380
this is their first machine learning course and they just don't know where it fits in the grand scheme

7
00:00:36,380 --> 00:00:37,390
of things.

8
00:00:37,400 --> 00:00:42,190
So, is this course academic or practical? And is it for beginners or experts?

9
00:00:42,200 --> 00:00:49,100
Is it fast-paced or slow-paced?

10
00:00:49,140 --> 00:00:51,930
I'm going to answer the second question first.

11
00:00:51,930 --> 00:00:56,250
Is this course for beginners or experts? Interestingly,

12
00:00:56,280 --> 00:01:00,340
the problem with this question is a problem for machine learning too.

13
00:01:00,360 --> 00:01:03,970
Let me explain why this question is itself problematic.

14
00:01:04,020 --> 00:01:09,900
One of the challenges in natural language processing is that of ambiguity. When someone says some words,

15
00:01:09,900 --> 00:01:15,240
the problem is those words can mean multiple things. And it just so happens that I have some courses

16
00:01:15,240 --> 00:01:15,780
on it.

17
00:01:15,810 --> 00:01:18,590
So this is a problem I think about often.

18
00:01:18,630 --> 00:01:23,220
So how does that apply here? Well, when we use the words "beginner" and "expert",

19
00:01:23,360 --> 00:01:29,720
we have to ask: a beginner from whose perspective? An expert from whose perspective? For some of you, you

20
00:01:29,720 --> 00:01:34,000
might think that a beginner is a random person you take off the street who knows nothing.

21
00:01:34,190 --> 00:01:36,560
But even that is ambiguous.

22
00:01:36,620 --> 00:01:39,900
There are people out there who do not even really know how to use a computer

23
00:01:39,920 --> 00:01:44,600
beyond turning it on, checking their email, and browsing the web. There are people out there who are

24
00:01:44,600 --> 00:01:50,300
pros at Microsoft Office, so they know how to use a computer and make a Microsoft Word document or a

25
00:01:50,300 --> 00:01:50,870
spreadsheet.

26
00:01:51,170 --> 00:01:53,340
But they've never done any programming before.

27
00:01:55,100 --> 00:01:59,540
There are people out there who are very good developers and have lots of programming experience but

28
00:01:59,540 --> 00:02:05,630
they have no idea what machine learning is. So all of these people are beginners in some way, but each

29
00:02:05,630 --> 00:02:14,960
of them would require a vastly different amount of work to prepare for this course.

30
00:02:15,040 --> 00:02:17,740
It's the same situation with experts.

31
00:02:17,980 --> 00:02:23,590
Most people consider anyone who knows more than them to be an expert, but of course, how much you know

32
00:02:23,860 --> 00:02:26,760
and how much some other person knows is relative.

33
00:02:27,040 --> 00:02:30,580
If you know how to plug and play into scikit-learn and nothing else,

34
00:02:30,760 --> 00:02:35,980
you might seem like an expert to someone who only knows Excel, but you'll seem like a beginner to someone

35
00:02:35,980 --> 00:02:39,280
who's read Bishop's Pattern Recognition and Machine Learning book.

36
00:02:44,410 --> 00:02:49,380
So, just to finalize our answer to this question: is this course for beginners or experts?

37
00:02:49,660 --> 00:02:55,980
This question is misinformed to begin with, because "beginners" and "experts" are not well-defined terms.

38
00:02:55,990 --> 00:03:02,450
Instead, we have to go back to: what is the specific skill set you need before starting this course?

39
00:03:02,530 --> 00:03:07,450
And of course that's defined by the prerequisites which are listed in the course description.

40
00:03:07,450 --> 00:03:09,510
So precision is the key here.

41
00:03:09,580 --> 00:03:15,740
We want to know: what is the precise list of skills that I need before taking this course?

42
00:03:15,760 --> 00:03:18,910
This is much better than ambiguous terms like "beginner" and "expert".

43
00:03:24,020 --> 00:03:28,080
Another matter is that people are going to measure their own skills differently.

44
00:03:28,130 --> 00:03:30,530
That's a much tougher issue to deal with.

45
00:03:30,530 --> 00:03:35,980
Sure, I can say you need to know Python coding before starting this course, but you can be sure that there's

46
00:03:35,990 --> 00:03:41,330
a big difference between someone who just learned "Hello, World" and for loops versus someone working at

47
00:03:41,330 --> 00:03:42,800
a big financial institution.

48
00:03:47,920 --> 00:03:50,250
Now, even in those big financial institutions,

49
00:03:50,260 --> 00:03:53,920
there could be big gaps in performance. Even within your own team,

50
00:03:53,950 --> 00:03:56,190
there can be big gaps in performance.

51
00:03:56,260 --> 00:04:01,090
I've seen guys who've been working for 15 years do worse than guys who've been working less than one

52
00:04:01,090 --> 00:04:02,140
year.

53
00:04:02,140 --> 00:04:08,170
So even if I say you need to know Python coding, there can be a lot of variation in skill among those

54
00:04:08,170 --> 00:04:10,420
who consider themselves good at it.

55
00:04:10,420 --> 00:04:12,980
A similar situation happens with deep learning.

56
00:04:13,180 --> 00:04:17,680
I can say you need to know how neural networks work, but there's a big difference between someone who

57
00:04:17,680 --> 00:04:23,440
can code them from scratch and has experience using them versus someone who just watched some online animations

58
00:04:23,680 --> 00:04:25,260
for a few hours.

59
00:04:25,330 --> 00:04:27,870
If you had to bet on who knows neural networks better,

60
00:04:27,970 --> 00:04:33,650
who would you pick? So to really answer this question, we have to turn to the course itself.

61
00:04:33,770 --> 00:04:39,200
If I say you need to know Python coding, and you claim that you do, but you don't understand some Python

62
00:04:39,200 --> 00:04:45,070
code in this course, then simply put, your Python coding skills are not yet up to par.

63
00:04:45,230 --> 00:04:51,020
So you have to improve your Python coding skills by asking questions on the Q&A and taking the appropriate

64
00:04:51,020 --> 00:04:52,760
steps to cover your gaps.

65
00:04:57,940 --> 00:04:59,030
Quite often,

66
00:04:59,050 --> 00:05:04,690
I get people who are just too confident in their own abilities, when unfortunately the reality is there

67
00:05:04,690 --> 00:05:09,930
is a huge discrepancy between what they think they know and what they actually know.

68
00:05:10,030 --> 00:05:16,510
What's funny is these people sometimes say this course requires PhD math. I assure you there is absolutely

69
00:05:16,510 --> 00:05:18,660
no math in any of my courses

70
00:05:18,730 --> 00:05:25,600
that requires a PhD or that only a PhD can understand. My courses contain only undergraduate math at

71
00:05:25,600 --> 00:05:26,920
most.

72
00:05:26,920 --> 00:05:31,510
Sometimes people even claim to have a PhD and then claim that it's still too hard.

73
00:05:31,570 --> 00:05:38,110
Which is funny, because that essentially invalidates your entire PhD. The entire point of a PhD is to turn

74
00:05:38,110 --> 00:05:40,440
you into an independent researcher.

75
00:05:40,480 --> 00:05:45,040
And so if you're supposed to be at the level of doing independent research but you can't understand

76
00:05:45,040 --> 00:05:47,830
some undergraduate math, well, that's not very good.

77
00:05:53,030 --> 00:05:58,700
Essentially, what it boils down to is people are really trying to say: everything I don't understand is

78
00:05:58,700 --> 00:05:59,770
not important.

79
00:05:59,930 --> 00:06:05,720
Everything I do understand is important, and therefore everything you say that I don't understand, I don't

80
00:06:05,720 --> 00:06:06,840
need to know.

81
00:06:06,980 --> 00:06:09,560
You can see why this thinking is dangerous.

82
00:06:09,560 --> 00:06:14,450
If you truly believe that everything you do not know is not important, then you'll never learn anything

83
00:06:14,450 --> 00:06:19,920
new.

84
00:06:19,920 --> 00:06:24,790
Now this brings us to our third question: is this course fast-paced or slow-paced?

85
00:06:25,030 --> 00:06:27,380
And I think you can see where I'm going with this.

86
00:06:27,420 --> 00:06:30,160
This is yet another example of ambiguity.

87
00:06:30,240 --> 00:06:36,500
Fast-paced to whom? Whether it's fast-paced or not depends on which person is taking the course.

88
00:06:36,510 --> 00:06:39,830
So if someone is well-prepared then things will be just right.

89
00:06:40,590 --> 00:06:44,930
If someone already knows most of these topics then it will probably be too slow.

90
00:06:45,610 --> 00:06:50,950
If someone chose not to prepare for the course or they have an inadequate understanding of the prerequisites

91
00:06:51,310 --> 00:06:53,550
then it's going to be too fast.

92
00:06:53,560 --> 00:06:58,210
So if I expected you to know something and you didn't, then you have to look it up, which is going to

93
00:06:58,210 --> 00:07:00,250
make the course feel like it's going too fast.

94
00:07:05,390 --> 00:07:07,330
So here is the real test.

95
00:07:07,430 --> 00:07:13,130
If I say something depends on calculus and probability, and you claim to know calculus and probability,

96
00:07:13,520 --> 00:07:17,390
but you don't understand the thing that depends on calculus and probability,

97
00:07:17,480 --> 00:07:23,780
you have to ask yourself honestly: do I really understand calculus and probability, or did the instructor

98
00:07:24,140 --> 00:07:28,280
invent his own kind of calculus and probability that I don't yet understand?

99
00:07:28,280 --> 00:07:30,450
Well, of course, that's not very likely.

100
00:07:30,530 --> 00:07:35,320
It's most likely that you thought you understood calculus and probability, but you really don't.

101
00:07:40,430 --> 00:07:44,110
Now, on to our initial question: what kind of course is this?

102
00:07:44,120 --> 00:07:45,470
Is it academic?

103
00:07:45,470 --> 00:07:46,640
Is it for professionals?

104
00:07:46,640 --> 00:07:49,460
Is it practical? So to understand this,

105
00:07:49,460 --> 00:07:55,610
you have to understand where machine learning is usually taught and how it is usually taught. Firstly,

106
00:07:55,820 --> 00:08:01,130
machine learning is usually taught in upper-year computer science and engineering programs in university

107
00:08:01,130 --> 00:08:02,450
and college.

108
00:08:02,450 --> 00:08:08,210
So these are students who already took Calculus 1 to 3, linear algebra, differential equations, probability

109
00:08:08,210 --> 00:08:14,030
and statistics, discrete math, programming, and possibly more, all in their first two or three years of

110
00:08:14,030 --> 00:08:15,220
college.

111
00:08:15,230 --> 00:08:19,490
So by the time they get to third and fourth year, they know all this stuff and they're ready to apply

112
00:08:19,490 --> 00:08:19,610
it.

113
00:08:24,800 --> 00:08:29,820
So a typical academic course might cover a whole library of machine learning algorithms.

114
00:08:29,960 --> 00:08:36,110
So you might learn k-nearest neighbors, logistic regression, k-means clustering, PCA, and so on, all in the

115
00:08:36,110 --> 00:08:42,110
same course. Typically, you'd cover one algorithm per lecture, which is about two hours.

116
00:08:42,110 --> 00:08:46,300
Overall you might have something like 12 or maybe 20 of those lectures in a term.

117
00:08:46,790 --> 00:08:51,470
And by "cover", I mean mostly via math, geometry, and derivations.

118
00:08:51,470 --> 00:08:55,340
These courses don't typically have you at your computer doing programming work.

119
00:08:55,970 --> 00:09:01,190
If you do end up doing programming, it's usually part of a lab assignment, which is actually just a very

120
00:09:01,190 --> 00:09:04,330
small part of the course compared to the rest.

121
00:09:04,580 --> 00:09:10,730
So it's mostly theoretical: dealing with equations, solving equations, reasoning about equations. Those

122
00:09:10,730 --> 00:09:14,750
equations typically involve probabilities and information theory.

123
00:09:14,750 --> 00:09:19,760
So you have two hours to go through the theory for one algorithm, and then if there is coding involved,

124
00:09:20,060 --> 00:09:23,450
which there usually isn't, you do it on your own time.

125
00:09:23,610 --> 00:09:26,180
The lectures themselves do not involve any coding.

126
00:09:31,210 --> 00:09:34,330
But there is an even bigger problem with these academic courses.

127
00:09:34,330 --> 00:09:35,950
And what is that?

128
00:09:35,970 --> 00:09:42,440
What happens is that the grades turn out to be very low. Out of 100 percent,

129
00:09:42,470 --> 00:09:45,640
the average might be around 25 percent, or up to 40 percent.

130
00:09:46,230 --> 00:09:47,350
And you might think: wow,

131
00:09:47,360 --> 00:09:48,850
so everyone fails?

132
00:09:48,980 --> 00:09:53,920
And of course that's not the case, since everyone's grade gets shifted based on the average.

133
00:09:53,960 --> 00:09:56,200
So what is the real problem here?

134
00:09:56,210 --> 00:10:01,160
A lot of people ask me: well, why don't you cover every machine learning algorithm in one big massive

135
00:10:01,160 --> 00:10:04,090
course and just call it "machine learning"?

136
00:10:04,100 --> 00:10:09,950
Well, the answer is that if you try to do too much, you only end up understanding 25 percent of the

137
00:10:09,950 --> 00:10:13,450
material, which I think is clearly demonstrated here.

138
00:10:13,610 --> 00:10:15,910
And as they say, the proof is in the pudding.

139
00:10:16,100 --> 00:10:20,780
If you look at these courses where students are trying to digest 10 different algorithms at the same

140
00:10:20,780 --> 00:10:22,990
time, what is the result?

141
00:10:23,000 --> 00:10:25,490
The result is they don't understand any of them.

142
00:10:25,490 --> 00:10:29,150
If you try to understand too many things you end up understanding nothing.

143
00:10:30,140 --> 00:10:35,720
I take the perspective that if you're going to take a course, you should aim to understand 100 percent

144
00:10:35,840 --> 00:10:42,600
of the material. Each course is broken up into appropriate levels based on how difficult they are

145
00:10:42,630 --> 00:10:43,930
and by topic.

146
00:10:44,010 --> 00:10:50,490
So for example, you can typically take Calculus 1, which focuses on differential calculus, Calculus 2, which

147
00:10:50,490 --> 00:10:56,280
focuses on integral calculus, and Calculus 3, which focuses on multivariable calculus.

148
00:10:56,520 --> 00:11:01,350
You would never learn all of these in just one huge course called "math", because that's just too much

149
00:11:01,350 --> 00:11:04,100
stuff to understand all at once.

150
00:11:04,140 --> 00:11:10,820
I want you to understand 100 percent of the material. Even 50 percent or 75 percent is not good enough,

151
00:11:10,830 --> 00:11:12,240
in my opinion.

152
00:11:12,240 --> 00:11:18,480
If you only understand 75 percent of the material and the next course depends on the 25 percent that

153
00:11:18,480 --> 00:11:27,310
you didn't understand, well, then you're not going to make it to the next level.

154
00:11:27,310 --> 00:11:29,950
Now let's look at the opposite end of the spectrum:

155
00:11:30,130 --> 00:11:33,660
what most people might consider to be a practical course.

156
00:11:33,880 --> 00:11:38,800
I don't consider these to be practical courses, and I'll explain why later, but I use the word "practical"

157
00:11:38,800 --> 00:11:44,020
here in quotes, since that's what some of the more beginner students will refer to them as.

158
00:11:44,050 --> 00:11:46,640
So how does a "practical" course typically proceed?

159
00:11:48,080 --> 00:11:53,510
Well, first, you are never really taught how the algorithms work, other than at a very high level.

160
00:11:53,630 --> 00:11:57,410
You might learn how they work by way of analogies and metaphors.

161
00:11:57,410 --> 00:12:02,870
So for example, suppose you're learning about gradient descent. Well, the instructor might relate this

162
00:12:03,050 --> 00:12:05,920
to a ball rolling down a hill.

163
00:12:05,930 --> 00:12:10,940
Now, some beginner students might conclude: I understand a ball rolling down a hill,

164
00:12:10,940 --> 00:12:16,310
therefore I understand gradient descent. But they neglect to understand how to take derivatives.

165
00:12:16,310 --> 00:12:20,720
They neglect to understand how to choose a good learning rate, and most importantly, they neglect how

166
00:12:20,720 --> 00:12:27,410
to implement gradient descent in code. So it's totally wrong to assume that just because you can visualize

167
00:12:27,410 --> 00:12:34,720
a ball rolling down a hill, you understand gradient descent. Why? Well, the problem is everybody can

168
00:12:34,720 --> 00:12:36,760
imagine a ball rolling down a hill.

169
00:12:36,760 --> 00:12:41,920
Your grandparents who can't even turn on a computer can understand a ball rolling down a hill.

170
00:12:42,040 --> 00:12:51,800
So don't equate understanding an analogy to understanding the real topic.

171
00:12:51,850 --> 00:12:55,130
So what's the next thing these practical courses do?

172
00:12:55,330 --> 00:13:01,570
The next thing they usually teach you is how to plug into an API, or in other words, make use of a library

173
00:13:01,570 --> 00:13:02,880
someone else wrote.

174
00:13:03,040 --> 00:13:05,180
This is also not ideal.

175
00:13:05,260 --> 00:13:08,480
Typically this involves just two or three lines of code.

176
00:13:08,500 --> 00:13:13,540
You have to wonder: if machine learning just boils down to two or three lines of code,

177
00:13:13,540 --> 00:13:15,700
why is it such a big deal these days?

178
00:13:15,700 --> 00:13:19,170
Why did it take decades for people to realize its usefulness?

179
00:13:19,330 --> 00:13:23,500
And of course the answer is that there's much more to machine learning than just these two or three

180
00:13:23,500 --> 00:13:24,640
lines of code.

181
00:13:25,000 --> 00:13:29,550
When you use a deep learning library like Keras, you're not doing deep learning.

182
00:13:29,580 --> 00:13:36,070
You are using a deep learning interface. You're simply programming. It's more accurate to say I can program

183
00:13:36,100 --> 00:13:40,180
with deep learning libraries, rather than saying I actually know deep learning.

184
00:13:44,880 --> 00:13:47,760
But some of you might say: well, that's all I really want to do.

185
00:13:47,790 --> 00:13:51,450
All I need is an API and for the API to give me an answer.

186
00:13:51,630 --> 00:13:56,220
I don't care about how it works. But then here is where you run into trouble.

187
00:13:56,220 --> 00:13:59,080
Recall that, in addition to "all data is the same",

188
00:13:59,160 --> 00:14:02,510
we know that all machine learning interfaces are the same.

189
00:14:02,550 --> 00:14:05,520
So the question becomes: which API do you choose?

190
00:14:05,520 --> 00:14:11,180
If they're all the same, how can you make an informed choice about which ones to use?

191
00:14:11,190 --> 00:14:16,590
Well, the reality is, if you don't know how these things work, then you can't answer this question.

192
00:14:16,590 --> 00:14:22,130
This is why some of these so-called practical courses can cover 20 algorithms in the same course.

193
00:14:22,260 --> 00:14:27,750
It's because they never actually talk about how the algorithms work. Since the API is the same every

194
00:14:27,750 --> 00:14:28,380
time,

195
00:14:28,380 --> 00:14:33,280
it's very easy to just do the same thing 20 times and then not bother with the details.

196
00:14:33,390 --> 00:14:38,070
So this gives people a lot of confidence that they were able to do the same thing 20 times.

197
00:14:38,070 --> 00:14:43,560
But it's dangerous thinking, because you don't realize you really only learned one thing; you just repeated

198
00:14:43,560 --> 00:14:49,550
it 20 times.

199
00:14:49,600 --> 00:14:52,780
So how about when it comes to the professional world?

200
00:14:52,990 --> 00:14:58,420
Sometimes people tend to believe that in your profession you'll be leaning more toward the practical

201
00:14:58,420 --> 00:15:03,070
end of the spectrum where you don't really need to know how a machine learning model works.

202
00:15:03,100 --> 00:15:09,260
You just have to know its API. But what it really comes down to is what your job actually is.

203
00:15:09,310 --> 00:15:13,900
If you're a regular everyday programmer and you just want to see how a machine learning model might

204
00:15:13,900 --> 00:15:18,550
perform on your data, then you're going to use a pre-built library, most likely.

205
00:15:18,550 --> 00:15:21,270
That's just the two or three lines of code I referenced earlier.

206
00:15:22,850 --> 00:15:27,460
However, that does not mean you don't still have to conform to best practices.

207
00:15:27,560 --> 00:15:32,390
Make sure you're not mixing your test set with your train set, make sure you're doing proper validation,

208
00:15:32,420 --> 00:15:37,880
and so on. And when it comes to tuning your model, then you'll have to be able to dig down into the details

209
00:15:38,180 --> 00:15:43,750
and see which hyperparameters are possible to change and which are likely to have the most impact.

210
00:15:43,850 --> 00:15:48,610
So only knowing how to use an API is only going to get you past the first stage.

211
00:15:48,740 --> 00:15:52,840
You can plug in your data, and at least your code has the right syntax and doesn't crash.

212
00:15:53,000 --> 00:15:56,720
But of course, your code not crashing doesn't necessarily mean it's right.

213
00:16:01,760 --> 00:16:06,830
Back on the other end of the spectrum, you have programmers and scientists whose sole job it is to do

214
00:16:06,830 --> 00:16:08,090
machine learning.

215
00:16:08,090 --> 00:16:10,340
These guys are machine learning experts.

216
00:16:10,400 --> 00:16:16,340
They know the math inside and out and have a full computer science education. And so to even be able

217
00:16:16,340 --> 00:16:18,500
to communicate your ideas effectively,

218
00:16:18,650 --> 00:16:24,230
you have to be comfortable with the theory, and not just how to plug into an API with two lines of code.

219
00:16:24,350 --> 00:16:30,260
These scientists are building custom models, really digging down into the math to not only use but invent

220
00:16:30,260 --> 00:16:36,320
new things. And to be able to invent new things, you really need to know the math very well.

221
00:16:36,320 --> 00:16:41,410
And so if you're on a team like this, you can't just know how to write two lines of code. Your skill set

222
00:16:41,690 --> 00:16:46,740
has to be far beyond that. So your theoretical background has to be very solid

223
00:16:46,900 --> 00:16:50,620
if you want to be able to communicate effectively with other scientists.

224
00:16:55,860 --> 00:17:00,540
Now, a lot of people say: well, I don't ever want to be a scientist, so I don't really care how these algorithms

225
00:17:00,540 --> 00:17:01,680
work.

226
00:17:01,680 --> 00:17:05,910
Unfortunately that's a very bad attitude to have in the workplace.

227
00:17:05,910 --> 00:17:10,210
Suppose, for example, I am running a business. Now, an employee of mine

228
00:17:10,230 --> 00:17:13,700
wants to show off and say he's learning about machine learning.

229
00:17:13,710 --> 00:17:18,990
He says: oh, I'm taking this machine learning course online, therefore I'll become a machine learning master.

230
00:17:19,650 --> 00:17:23,490
But his approach to machine learning is to learn as little as possible.

231
00:17:23,490 --> 00:17:25,080
He says he doesn't care about the math.

232
00:17:25,110 --> 00:17:27,030
He only wants the practical stuff.

233
00:17:27,030 --> 00:17:29,760
He only wants to know the bare minimum.

234
00:17:29,760 --> 00:17:31,590
He tells me he doesn't care how it works.

235
00:17:31,590 --> 00:17:38,060
He's just going to plug into some API, because that's what his buddy told him he could do. As a business

236
00:17:38,120 --> 00:17:38,890
owner,

237
00:17:38,930 --> 00:17:41,480
I really don't want this guy working for me.

238
00:17:41,540 --> 00:17:47,150
I run a business that sells something and the life of the company itself and the livelihoods of the

239
00:17:47,150 --> 00:17:52,000
employees all depend on our products being the best they can possibly be.

240
00:17:52,010 --> 00:17:56,810
So if someone comes to me and says, I don't care about being the best, I just care about doing the bare

241
00:17:56,810 --> 00:17:57,710
minimum,

242
00:17:57,800 --> 00:18:00,580
my response is: I don't want that guy on my team.

243
00:18:00,590 --> 00:18:02,600
In fact he should be fired.

244
00:18:02,630 --> 00:18:07,670
So be careful if this is your approach to machine learning. No business owner wants to hear that your

245
00:18:07,670 --> 00:18:12,560
approach to the business is to do the bare minimum and to put off some math because you think it's too

246
00:18:12,560 --> 00:18:13,520
hard.

247
00:18:13,520 --> 00:18:20,700
That's a scary prospect for a business owner.

248
00:18:20,720 --> 00:18:25,720
Now, if we go back to our original question, maybe you've realized we haven't answered it yet.

249
00:18:25,730 --> 00:18:27,830
What kind of course is this?

250
00:18:27,830 --> 00:18:31,550
Well you can think of it as a marriage between these approaches.

251
00:18:31,550 --> 00:18:35,670
We want to cover all the math and derivations, because that's very important

252
00:18:35,750 --> 00:18:40,580
if you want to be able to communicate effectively with other scientists, to be able to debug your code

253
00:18:40,580 --> 00:18:44,090
effectively, and to be able to invent new approaches.

254
00:18:44,360 --> 00:18:47,210
This is also what we mean by machine learning.

255
00:18:47,210 --> 00:18:52,070
We mean really knowing machine learning and not just programming with an API.

256
00:18:52,070 --> 00:18:54,690
So the theoretical background is very important.

257
00:18:54,950 --> 00:19:00,650
At the same time, we're not going to introduce 20 different algorithms in the same course, because unlike

258
00:19:00,650 --> 00:19:06,860
a typical academic course, where you can pass just by understanding 25 percent of the material, this

259
00:19:06,860 --> 00:19:12,260
course is structured in a way that if you have any gaps, it's going to be impossible for you

260
00:19:12,260 --> 00:19:18,180
to move on to the next stage. But academic courses don't contain much coding, if any at all.

261
00:19:18,690 --> 00:19:23,550
So in this course every algorithm we learn about will be implemented in code.

262
00:19:23,550 --> 00:19:27,930
My motto is: if you can't implement it, then you don't understand it.

263
00:19:27,930 --> 00:19:32,760
You can pretend to understand it, and when you talk about it you might sound like you understand it, but

264
00:19:32,760 --> 00:19:35,160
implementation is the true test.

265
00:19:35,160 --> 00:19:37,790
It's like a math exam rather than writing an essay.

266
00:19:38,070 --> 00:19:42,680
When you write an essay you might be able to fool the teacher into thinking you understand something.

267
00:19:42,870 --> 00:19:48,330
But in a math exam, either your answers are right or your answers are wrong. In this context,

268
00:19:48,340 --> 00:19:50,190
either your code works or it doesn't.

269
00:19:50,280 --> 00:19:51,240
It's very clear-cut.

270
00:19:56,010 --> 00:19:57,590
When it comes to practicality,

271
00:19:57,600 --> 00:19:59,110
we cover that too.

272
00:19:59,130 --> 00:20:05,220
We demonstrate our algorithms both on real-world data, like text and images, and we demonstrate our algorithms

273
00:20:05,220 --> 00:20:09,990
on two-dimensional datasets, so that you can visualize what an algorithm is doing.

274
00:20:10,230 --> 00:20:15,510
Both of these are important. Text and images are quite possibly the most practical data

275
00:20:15,510 --> 00:20:22,440
there is. Some billion-dollar companies exist and thrive because of those capabilities. 2D datasets

276
00:20:22,440 --> 00:20:26,670
are critical as well, because as humans, that's the only thing we can see.

277
00:20:26,760 --> 00:20:31,500
But at the same time, we don't try to pull the wool over your eyes and pretend like you need 20 different

278
00:20:31,500 --> 00:20:33,370
datasets to practice on.

279
00:20:33,390 --> 00:20:38,700
Instead, we learn something much more valuable: how to use an algorithm on an infinite number of datasets,

280
00:20:38,760 --> 00:20:43,300
because we are able to abstract the idea that all data is the same.

281
00:20:43,350 --> 00:20:48,090
Remember that all the computer sees is a list of numbers. It doesn't know that those numbers represent

282
00:20:48,090 --> 00:20:50,460
height measurements or an image pixel.

283
00:20:50,790 --> 00:20:56,400
And of course, by experiencing both ends of the spectrum, both the theory and the implementation,

284
00:20:56,400 --> 00:21:01,380
and then finally plugging data into your model, that's going to make you very well-rounded professionally.

285
00:21:06,660 --> 00:21:08,030
To conclude this lecture,

286
00:21:08,100 --> 00:21:10,170
I want to pose a final question.

287
00:21:10,320 --> 00:21:12,680
How does all this knowledge help you?

288
00:21:12,720 --> 00:21:18,180
Why does understanding the approach of this course make you better at machine learning in general?

289
00:21:18,180 --> 00:21:22,890
Well, my hope is that this gives you a very high-level, bird's-eye view of the landscape.

290
00:21:22,980 --> 00:21:28,260
You can see the different approaches of people from different backgrounds. Academic people typically

291
00:21:28,260 --> 00:21:31,260
don't see what other non-academic people are doing.

292
00:21:31,260 --> 00:21:35,930
And those people who hate math typically don't see what people who love math are doing.

293
00:21:36,000 --> 00:21:40,980
So with this bird's-eye view, you can see everybody. If you're learning machine learning, it helps you

294
00:21:40,980 --> 00:21:46,350
understand: who am I competing with when I apply for a job, what kinds of jobs are out there,

295
00:21:46,380 --> 00:21:50,810
and what would be adequate preparation? If you're hiring for a machine learning team,

296
00:21:50,820 --> 00:21:55,590
it helps you understand the different backgrounds of individuals you have to compare to one another,

297
00:21:55,890 --> 00:21:59,250
to understand their strengths and weaknesses and what they bring to the table.
