1
00:00:02,100 --> 00:00:07,180
Once we've studied RNNs, we can go even deeper into how deep learning can be used on text.

2
00:00:07,590 --> 00:00:13,320
That's the next course, natural language processing with deep learning in Python or deep learning in

3
00:00:13,320 --> 00:00:14,520
Python, part six.

4
00:00:15,440 --> 00:00:20,840
In this course, we look at a very important concept called word embeddings. These allow us to turn

5
00:00:20,840 --> 00:00:26,400
words, which are categorical variables, into vectors, which are numbers that a neural network can read.

6
00:00:27,260 --> 00:00:32,000
This is why it's very important to have studied unsupervised learning before this, because finding

7
00:00:32,000 --> 00:00:34,670
word embeddings is actually an unsupervised task.

8
00:00:35,360 --> 00:00:40,400
Word embeddings also allow us to make use of pretraining, which was discussed in deep learning in Python

9
00:00:40,400 --> 00:00:41,030
part four.

10
00:00:42,020 --> 00:00:47,510
In this course, we also look at a very advanced model for doing sentiment analysis called a recursive

11
00:00:47,510 --> 00:00:49,430
neural network or a tree neural network.

12
00:00:50,180 --> 00:00:55,580
This is an example of a dynamic neural network because it changes its structure based on what input

13
00:00:55,580 --> 00:00:56,090
you give it.

14
00:00:56,900 --> 00:01:00,920
Most deep learning libraries are not equipped to handle dynamic neural networks.

15
00:01:00,930 --> 00:01:06,740
And I demonstrate what happens if you try to build one the naive way: you basically end up building a

16
00:01:06,740 --> 00:01:11,420
separate neural network for each of your training samples, and it's going to eat up all your RAM and

17
00:01:11,420 --> 00:01:13,350
make your computer slow down to a crawl.

18
00:01:14,150 --> 00:01:19,280
So you get a firsthand perspective of why working with dynamic neural networks is not easy.

19
00:01:20,000 --> 00:01:24,430
But luckily we are able to make use of our knowledge of recurrent neural networks

20
00:01:24,440 --> 00:01:30,860
(and remember, that's the prerequisite to this course) in order to convert a tree into a sequence which

21
00:01:30,860 --> 00:01:32,820
an RNN is capable of handling.

22
00:01:33,350 --> 00:01:36,350
So that's why RNNs are a prerequisite to this course.

23
00:01:38,250 --> 00:01:42,480
Now, you'll notice that this course, deep NLP, has no outgoing edges.

24
00:01:42,510 --> 00:01:45,830
This is because it's the most advanced course that I have on this path

25
00:01:45,840 --> 00:01:49,780
for now. I certainly plan on expanding in this direction in the future.

26
00:01:50,460 --> 00:01:54,180
However, this is not the last deep learning course in the series.

27
00:01:54,810 --> 00:01:59,910
After this, we have deep reinforcement learning, which is deep learning in Python part seven.

28
00:02:00,660 --> 00:02:05,430
With that said, I think now is a good time to go back and start exploring the reinforcement learning

29
00:02:05,430 --> 00:02:05,780
path.

30
00:02:07,020 --> 00:02:11,220
So you'll notice that I have Bayesian machine learning feeding into reinforcement learning.

31
00:02:12,550 --> 00:02:17,740
On the surface, these two courses might seem unrelated, but there is a very important concept you'll

32
00:02:17,740 --> 00:02:23,350
learn that applies to both, called the explore-exploit dilemma. In Bayesian machine learning,

33
00:02:23,380 --> 00:02:28,480
you learn this idea in the context of trying to optimize a click-through rate or a conversion rate,

34
00:02:28,960 --> 00:02:33,850
or in other words, the number of times people buy things from your website versus the number of times

35
00:02:33,850 --> 00:02:35,180
people visited your website.

36
00:02:35,650 --> 00:02:36,880
Very practical concept,

37
00:02:36,880 --> 00:02:43,000
I think, if you do anything related to e-commerce. Now, in reinforcement learning, we look at the explore-

38
00:02:43,000 --> 00:02:44,350
exploit dilemma again,

39
00:02:44,380 --> 00:02:50,020
but in the context of playing games. Reinforcement learning is like a third branch of machine learning,

40
00:02:50,020 --> 00:02:53,600
whereas the other two are supervised and unsupervised learning.

41
00:02:54,280 --> 00:03:00,010
The main difference is that supervised and unsupervised learning look at static data. In reinforcement

42
00:03:00,010 --> 00:03:00,360
learning,

43
00:03:00,370 --> 00:03:04,300
the idea is more like you have a robot living in the real world.

44
00:03:04,510 --> 00:03:10,350
It can take the experiences it had today and based on them, behave more intelligently tomorrow.

45
00:03:10,840 --> 00:03:14,030
So the learning paradigm in reinforcement learning is sequential.

46
00:03:14,650 --> 00:03:20,290
This is as opposed to supervised and unsupervised learning, where your dataset usually resides in some file

47
00:03:20,290 --> 00:03:21,220
on your hard drive.

48
00:03:22,370 --> 00:03:28,370
So in reinforcement learning part one, we get all the basics. As you might expect, this is a prerequisite

49
00:03:28,370 --> 00:03:33,890
to deep reinforcement learning, since deep reinforcement learning applies those concepts to more difficult

50
00:03:33,890 --> 00:03:34,490
games.

51
00:03:35,220 --> 00:03:39,290
But you'll notice that this is not the only prerequisite to deep reinforcement learning.

52
00:03:40,100 --> 00:03:41,440
As the title suggests,

53
00:03:41,450 --> 00:03:47,720
this is also dependent on knowledge of deep learning, in particular convolutional neural networks.

54
00:03:49,550 --> 00:03:54,920
But as we know, in order to build a CNN, we have to know how to build a regular neural network, which

55
00:03:54,920 --> 00:03:58,350
means we have to know what a neural network is and why it's useful and so on.

56
00:03:58,940 --> 00:04:02,330
So in deep reinforcement learning, these two paths converge.

57
00:04:02,750 --> 00:04:07,610
You combine your knowledge of both reinforcement learning and deep learning in this course.

58
00:04:08,890 --> 00:04:16,450
Now, the reason it depends on CNNs and not, say, RNNs, which is over here, is because we'll

59
00:04:16,450 --> 00:04:18,250
be learning to play visual games.

60
00:04:18,250 --> 00:04:23,800
So, for example, we can learn how to play a video game like Pong or Breakout, which are classic

61
00:04:23,800 --> 00:04:25,320
Atari games from the old days.

62
00:04:25,690 --> 00:04:29,830
And so those are images because they're basically screenshots from the screen.

63
00:04:30,760 --> 00:04:36,550
In the future, we might end up applying RNNs, in which case RNNs will become a prerequisite

64
00:04:36,550 --> 00:04:37,300
to that course.

65
00:04:39,050 --> 00:04:41,900
So that's the end of the reinforcement learning path for now.

66
00:04:42,110 --> 00:04:45,880
I'm very excited to bring you more updates in this area in the future.

67
00:04:48,280 --> 00:04:50,890
Let's now jump back to logistic regression,

68
00:04:52,000 --> 00:04:58,600
where we can see another outgoing edge to supervised machine learning. So why is this edge here?

69
00:04:59,490 --> 00:05:04,620
Well, you may recall that linear regression and logistic regression are both linear models that do

70
00:05:04,620 --> 00:05:06,870
regression and classification, respectively.

71
00:05:07,560 --> 00:05:09,690
These are both supervised learning tasks.

72
00:05:11,700 --> 00:05:17,130
And so it makes sense that now that you know one model for regression and one model for classification,

73
00:05:17,520 --> 00:05:20,080
it's time to dig deeper into supervised learning.

74
00:05:20,940 --> 00:05:25,730
The thing with linear regression and logistic regression is that they aren't really different models.

75
00:05:25,740 --> 00:05:29,130
They're both just a line. Because they do different tasks,

76
00:05:29,130 --> 00:05:32,040
the techniques and interpretations are slightly different too.

77
00:05:33,120 --> 00:05:37,410
And there are, of course, different models that are not linear models that can do these tasks, and

78
00:05:37,410 --> 00:05:39,390
that's what this course is all about.

79
00:05:40,140 --> 00:05:45,750
In this course, we look at classic supervised machine learning techniques like k-nearest neighbor, decision

80
00:05:45,750 --> 00:05:48,570
trees, the perceptron, and the Bayes classifier.

81
00:05:49,530 --> 00:05:54,990
Much like how logistic regression was the basic building block of the neural network, classic models

82
00:05:54,990 --> 00:05:58,800
like decision trees are the basic building block of ensemble methods.

83
00:05:59,780 --> 00:06:06,710
So that's why this course is a prerequisite to ensemble machine learning. Again, we use the same logo

84
00:06:06,710 --> 00:06:11,480
with a different color to signify that these two courses are very closely related.

85
00:06:12,440 --> 00:06:14,010
In ensemble machine learning,

86
00:06:14,030 --> 00:06:19,640
we learn how to combine multiple decision trees in different ways in order to make some very powerful

87
00:06:19,640 --> 00:06:20,530
classifiers.

88
00:06:21,260 --> 00:06:26,190
What's really remarkable about these methods is that they are very easy to plug and play on data.

89
00:06:26,660 --> 00:06:31,820
So if you're looking for a plug and play solution without having to learn a lot of theory, then deep

90
00:06:31,820 --> 00:06:33,830
learning is most likely not for you.

91
00:06:34,190 --> 00:06:36,140
But ensemble methods are a great fit.

92
00:06:37,090 --> 00:06:41,620
Deep learning is very dependent on hyperparameters, and if you choose incorrectly, your model will

93
00:06:41,620 --> 00:06:42,790
perform very poorly.

94
00:06:43,420 --> 00:06:47,360
Sometimes it requires immense computing power to find good hyperparameters.

95
00:06:47,890 --> 00:06:50,860
This is an active area of research that has not yet been solved.

96
00:06:51,670 --> 00:06:55,210
This is why you can implement what you see in a deep learning paper,

97
00:06:55,240 --> 00:07:00,760
but suppose the author left out some seemingly insignificant detail, so you end up having to make an

98
00:07:00,760 --> 00:07:03,400
assumption and then your results end up totally different.

99
00:07:03,850 --> 00:07:08,060
So deep learning is fragile, but luckily ensemble methods are not.

100
00:07:08,770 --> 00:07:13,560
We focus on two very famous ensemble methods, the random forest and AdaBoost.

101
00:07:14,380 --> 00:07:17,180
So that's everything on the supervised machine learning track

102
00:07:17,200 --> 00:07:17,800
for now.

103
00:07:19,060 --> 00:07:25,720
Next, we see an edge going from supervised machine learning to unsupervised machine learning, in particular

104
00:07:25,720 --> 00:07:26,870
cluster analysis.

105
00:07:27,730 --> 00:07:32,920
The reason we study supervised learning before unsupervised learning is because unsupervised learning

106
00:07:32,920 --> 00:07:34,360
is a little more abstract.

107
00:07:34,900 --> 00:07:40,300
It takes more effort on the student's part to realize why it's practical and what it can be used for.

108
00:07:41,290 --> 00:07:47,210
Cluster analysis shows us how to model data that does not come with targets. As you might guess,

109
00:07:47,230 --> 00:07:49,150
we do this in the form of clustering.

110
00:07:49,930 --> 00:07:52,060
The idea behind clustering is very simple.

111
00:07:52,630 --> 00:07:57,580
We want to know how many naturally occurring groups of data there are and what are the relationships

112
00:07:57,580 --> 00:07:59,390
between the data in these clusters.

113
00:07:59,800 --> 00:08:04,900
So, for example, if you were clustering books, you might find a book about Steve Jobs and a book

114
00:08:04,900 --> 00:08:06,800
about Elon Musk in the same cluster.

115
00:08:07,360 --> 00:08:10,660
This cluster is probably about tech companies in Silicon Valley.

116
00:08:12,640 --> 00:08:17,620
But you don't need a label in your data to tell you that. You can discover it yourself by looking at

117
00:08:17,920 --> 00:08:19,840
how the data naturally groups together.

118
00:08:21,190 --> 00:08:26,440
Now, as I mentioned earlier, I think once you learn about both supervised and unsupervised learning,

119
00:08:26,710 --> 00:08:29,290
you'll be ready to jump into reinforcement learning.

120
00:08:30,410 --> 00:08:35,780
I haven't made cluster analysis a prerequisite to reinforcement learning, since none of the material

121
00:08:35,780 --> 00:08:36,920
depends on this course.

122
00:08:37,250 --> 00:08:42,050
But it's good to know about these techniques so that you have a more mature and experienced view on

123
00:08:42,050 --> 00:08:42,740
machine learning.

124
00:08:46,320 --> 00:08:50,670
What is sort of a sequel to cluster analysis is hidden Markov models.

125
00:08:51,880 --> 00:08:55,330
The reason might not be clear at first, so let me give you two reasons.

126
00:08:56,500 --> 00:09:02,500
Number one, they are both unsupervised machine learning models, just that HMMs are harder, so it's

127
00:09:02,500 --> 00:09:04,540
natural to learn about clustering first.

128
00:09:05,020 --> 00:09:09,040
Clustering is also about static data, whereas HMMs are about sequences.

129
00:09:09,400 --> 00:09:11,950
So it's similar to the process we did in deep learning.

130
00:09:11,950 --> 00:09:16,180
We looked at static data like images, then sequential data like text.

131
00:09:17,760 --> 00:09:23,400
Reason number two, in cluster analysis, we learn about a technique called the Gaussian mixture model,

132
00:09:23,430 --> 00:09:25,800
which we make use of in the HMM course.

133
00:09:26,580 --> 00:09:31,910
One key point is they both learn by using the expectation maximization algorithm.

134
00:09:32,490 --> 00:09:35,640
So it's good to first see the algorithm on a simple model.

135
00:09:35,640 --> 00:09:41,610
And then when you see it again on a more complicated model like the HMM, it won't be as intimidating.

136
00:09:43,130 --> 00:09:49,400
One key concept you learn in HMMs is the Markov assumption. That just means the current state depends

137
00:09:49,400 --> 00:09:52,700
only on the previous state, but not any states before it.

138
00:09:53,480 --> 00:09:57,410
This is a simplifying assumption that usually makes the math easier to work with.

139
00:09:58,280 --> 00:10:03,140
You will also notice that we encounter the Markov assumption in reinforcement learning. However,

140
00:10:03,410 --> 00:10:05,390
it's not too hard to learn it from scratch.

141
00:10:05,660 --> 00:10:11,060
And so for that reason, I do not consider HMMs to be a prerequisite to reinforcement learning.

142
00:10:11,690 --> 00:10:14,630
The Markov assumption is really the only thing they have in common.

143
00:10:15,590 --> 00:10:22,400
There is also a slight connection between cluster analysis and unsupervised learning. I'm not going

144
00:10:22,400 --> 00:10:28,670
to draw the link right now, but I sometimes consider this to be unsupervised machine learning part one and

145
00:10:28,940 --> 00:10:31,400
this to be unsupervised machine learning, part two.

146
00:10:33,420 --> 00:10:38,340
We also see that HMMs feed into the RNN course, which is about deep learning.

147
00:10:38,580 --> 00:10:39,660
So why might that be?

148
00:10:40,560 --> 00:10:46,140
This is, of course, because both these models are models that can learn about sequences, in particular

149
00:10:46,590 --> 00:10:49,410
in both these courses we model text as sequences.

150
00:10:50,040 --> 00:10:56,190
But whereas the HMM makes use of the Markov assumption, the RNN does not, and hence the RNN is a

151
00:10:56,190 --> 00:10:57,300
more powerful model.

152
00:10:58,270 --> 00:11:03,280
And so this just goes along with the main theme that we always go from simple basic models to more complex

153
00:11:03,280 --> 00:11:03,790
models.

154
00:11:04,390 --> 00:11:06,730
This is also something you should do in your work as well.

155
00:11:07,330 --> 00:11:11,560
If you start with a simple model, you often find that it is faster and more robust.

156
00:11:11,920 --> 00:11:17,410
Complex models sometimes break down, and they are also more difficult to implement and might not even

157
00:11:17,410 --> 00:11:18,310
be fast enough.

158
00:11:18,670 --> 00:11:20,650
Of course, that's just a generalization.

159
00:11:20,660 --> 00:11:25,300
So you always want to analyze engineering tradeoffs individually for every problem you have.

160
00:11:27,350 --> 00:11:31,970
Now, there is one last link in this part of the graph here that I want to explain, and that's the

161
00:11:31,970 --> 00:11:34,780
first NLP course. You can see there

162
00:11:34,790 --> 00:11:38,840
it depends on supervised machine learning and feeds into deep NLP.

163
00:11:40,220 --> 00:11:46,030
The main purpose of this basic NLP course is to apply basic machine learning models to text.

164
00:11:46,520 --> 00:11:51,470
So the reason supervised machine learning comes before it is because that course was all about

165
00:11:51,680 --> 00:11:53,210
basic machine learning models.

166
00:11:54,110 --> 00:12:00,050
The important skill for NLP was not the implementation of those models, but rather a bigger picture

167
00:12:00,050 --> 00:12:02,500
perspective on how machine learning is used.

168
00:12:03,140 --> 00:12:05,650
What is the interface between the data and the model?

169
00:12:06,080 --> 00:12:07,190
What does the model do?

170
00:12:07,820 --> 00:12:09,400
What is its input and output?

171
00:12:09,560 --> 00:12:10,930
How is the output interpreted?

172
00:12:11,660 --> 00:12:14,780
And so we take those principles and we apply them to text.

173
00:12:15,320 --> 00:12:20,570
In this way we can see that text can be treated in such a way that you don't have to think about it

174
00:12:20,810 --> 00:12:23,030
any differently than any other data.

175
00:12:24,310 --> 00:12:29,920
This reinforces the principle that all data is the same. The machine learning model doesn't care what

176
00:12:29,920 --> 00:12:32,790
your data is. All it sees is a table of numbers.

177
00:12:32,800 --> 00:12:36,750
It doesn't care if it's text or images or radar signals from space.

178
00:12:37,270 --> 00:12:41,170
The model just does what it was designed to do on the numbers that you give it.

179
00:12:41,770 --> 00:12:46,930
So this course gives you a high level systems perspective on working with machine learning models in

180
00:12:46,930 --> 00:12:47,470
text.

181
00:12:48,280 --> 00:12:54,430
This easy NLP course also feeds into deep NLP, which is of course not so easy because it depends

182
00:12:54,430 --> 00:12:56,140
on a lot of background in deep learning.

183
00:12:57,170 --> 00:13:03,260
One of the main questions I get in the NLP course is how do I improve the results of these basic models?

184
00:13:03,830 --> 00:13:08,300
And a lot of the time the answer to that is, well, you have to use a more complex model.

185
00:13:08,310 --> 00:13:12,620
But of course, that necessitates learning how that complex model works.

186
00:13:13,460 --> 00:13:19,460
And deep NLP is an example of that, because we learn a state-of-the-art method for sentiment analysis,

187
00:13:19,460 --> 00:13:22,670
whereas in easy NLP we used only a linear model.

188
00:13:23,330 --> 00:13:28,580
So it's important to understand that while, yes, it's possible to improve the predictive ability on

189
00:13:28,580 --> 00:13:33,130
simple basic models, as you can see, it's not always an easy path to get there.

190
00:13:33,590 --> 00:13:35,540
So you have to make sure you're prepared.

191
00:13:36,200 --> 00:13:40,200
Case in point, just look at all the time spent just to get to deep NLP.

192
00:13:40,550 --> 00:13:41,810
It's not an easy task.

193
00:13:43,930 --> 00:13:50,260
Let's now scroll over to GANs and variational autoencoders. Just like how deep reinforcement learning

194
00:13:50,260 --> 00:13:55,550
is not related to deep NLP, this isn't really related to deep reinforcement learning either.

195
00:13:56,140 --> 00:13:59,650
This is deep learning Part eight by order of creation only.

196
00:14:00,160 --> 00:14:06,310
It is the spiritual sequel to Unsupervised Deep Learning, which was Deep Learning in Python Part four.

197
00:14:07,590 --> 00:14:14,730
OK, so just to reiterate, this is part six, part seven and part eight by order only, but they are

198
00:14:14,730 --> 00:14:20,130
not related to each other conceptually, although it's always nice to know all these things because

199
00:14:20,130 --> 00:14:22,840
more context makes future things easier to learn.

200
00:14:24,230 --> 00:14:32,710
So the reason this is linked to this is because GANs and variational autoencoders are also unsupervised

201
00:14:32,710 --> 00:14:38,720
deep learning models, but whereas unsupervised deep learning was all about how to improve supervised

202
00:14:38,720 --> 00:14:44,720
learning, GANs and variational autoencoders don't have any direct benefit to supervised learning at

203
00:14:44,720 --> 00:14:48,620
all, although we do make use of supervised learning within the course.

204
00:14:49,460 --> 00:14:52,640
In this course, the focus is on generating images.

205
00:14:53,420 --> 00:14:58,940
We've seen that GANs can create photorealistic images based on a dual neural network system.

206
00:14:59,630 --> 00:15:03,950
That's pretty cool because before GANs, we didn't have any kind of machine learning model that could

207
00:15:03,950 --> 00:15:05,960
generate real looking images.

208
00:15:06,680 --> 00:15:12,800
Nowadays, GANs are able to generate high resolution, high quality images of people that you can't

209
00:15:12,800 --> 00:15:14,350
even tell are not real people.

210
00:15:14,810 --> 00:15:18,140
It certainly makes the idea of the Matrix seem very possible.

211
00:15:19,800 --> 00:15:22,110
All right, so I hope you found this lecture helpful.

212
00:15:22,620 --> 00:15:27,030
We saw that these courses are related to each other in some pretty complicated ways.

213
00:15:27,840 --> 00:15:30,300
Learning machine learning is not exactly linear.

214
00:15:30,750 --> 00:15:34,140
Sometimes you have to take one course before you can take the next.

215
00:15:34,710 --> 00:15:39,930
Sometimes one course might just be related to another course by some key concept, but maybe in one

216
00:15:39,930 --> 00:15:42,150
context, it's a lot easier to understand.

217
00:15:42,900 --> 00:15:48,600
So remember to keep in mind that these arrows do not all indicate strong prerequisites. Sometimes

218
00:15:48,600 --> 00:15:51,440
there's just a relationship between the two courses.

219
00:15:52,110 --> 00:15:57,030
I hope that this lecture explained any nuances between what is a prerequisite and what is not.

220
00:15:57,360 --> 00:15:59,340
And I hope I did a good job of answering:

221
00:15:59,670 --> 00:16:02,160
which order should you take these courses in, and why?