1
00:00:02,130 --> 00:00:07,140
Once we've studied RNNs, we can go even deeper into how deep learning can be used on text.

2
00:00:07,590 --> 00:00:13,320
That's the next course, natural language processing with deep learning in Python, or deep learning in

3
00:00:13,320 --> 00:00:14,490
Python Part six.

4
00:00:15,470 --> 00:00:19,070
In this course, we look at a very important concept called word embeddings.

5
00:00:19,640 --> 00:00:25,280
These allow us to turn words, which are categorical variables, into vectors of numbers that a

6
00:00:25,280 --> 00:00:26,390
neural network can read.
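As a rough illustration of turning a categorical word into a vector a network can read, here is a minimal Python sketch. The vocabulary, the dimension `EMBED_DIM`, and the randomly initialized matrix are all hypothetical. In the course, the vectors would be learned from data rather than drawn at random. The sketch also shows that an embedding lookup gives the same result as multiplying a one-hot vector by the embedding matrix.

```python
import random

# Hypothetical toy vocabulary; a real model learns its vectors from data.
vocab = ["king", "queen", "apple"]
word_to_index = {w: i for i, w in enumerate(vocab)}

EMBED_DIM = 4  # assumed embedding size, chosen only for illustration
rng = random.Random(0)
# Embedding matrix: one row of numbers per word (random here, learned in practice).
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]

def embed(word):
    """Map a categorical word to its vector by looking up its row."""
    return embeddings[word_to_index[word]]

def one_hot(word):
    """The one-hot encoding of the same categorical variable."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

def one_hot_times_matrix(word):
    """Row vector (1 x V) times embedding matrix (V x D): selects the same row."""
    oh = one_hot(word)
    return [sum(oh[i] * embeddings[i][j] for i in range(len(vocab)))
            for j in range(EMBED_DIM)]

print(embed("queen"))
```

The lookup is just a cheaper way of doing the one-hot matrix product, which is why embedding layers are usually implemented as indexing.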

7
00:00:27,230 --> 00:00:31,990
This is why it's very important to have studied unsupervised learning before this, because finding

8
00:00:32,000 --> 00:00:34,640
word embeddings is actually an unsupervised task.

9
00:00:35,330 --> 00:00:39,950
Word embeddings also allow us to make use of pre-training, which was discussed in deep learning in

10
00:00:39,950 --> 00:00:41,000
Python Part four.

11
00:00:42,020 --> 00:00:47,480
In this course, we also look at a very advanced model for doing sentiment analysis called a recursive

12
00:00:47,480 --> 00:00:49,430
neural network or a tree neural network.

13
00:00:50,180 --> 00:00:55,550
This is an example of a dynamic neural network because it changes its structure based on what input

14
00:00:55,550 --> 00:00:56,060
you give it.

15
00:00:56,900 --> 00:01:01,820
Most deep learning libraries are not equipped to handle dynamic neural networks, and I demonstrate

16
00:01:02,240 --> 00:01:07,370
what happens if you try to build one the naive way: you basically end up building a separate neural

17
00:01:07,370 --> 00:01:12,200
network for each of your training samples, and it's going to eat up all your RAM and make your computer

18
00:01:12,200 --> 00:01:13,340
slow down to a crawl.

19
00:01:14,180 --> 00:01:19,280
So you get a firsthand perspective of why working with dynamic neural networks is not easy.

20
00:01:20,030 --> 00:01:24,430
But luckily, we are able to make use of our knowledge of recurrent neural networks.

21
00:01:24,440 --> 00:01:30,830
And remember, that's the prerequisite to this course. We use it to convert a tree into a sequence, which

22
00:01:30,830 --> 00:01:32,810
an RNN is capable of handling.

23
00:01:33,380 --> 00:01:36,320
So that's why RNNs are a prerequisite to this course.

24
00:01:38,280 --> 00:01:42,450
Now you'll notice that this course, deep NLP, has no outgoing edges.

25
00:01:42,480 --> 00:01:45,750
This is because it's the most advanced course that I have on this path

26
00:01:45,840 --> 00:01:49,740
for now. I certainly plan on expanding in this direction in the future.

27
00:01:50,490 --> 00:01:54,150
However, this is not the last deep learning course in the series.

28
00:01:54,810 --> 00:01:59,910
After this, we have deep reinforcement learning, which is deep learning in Python Part seven.

29
00:02:00,660 --> 00:02:05,400
With that said, I think now is a good time to go back and start exploring the reinforcement learning

30
00:02:05,400 --> 00:02:05,760
path.

31
00:02:06,990 --> 00:02:11,190
So you'll notice that I have Bayesian machine learning feeding into reinforcement learning.

32
00:02:12,610 --> 00:02:17,710
On the surface, these two courses might seem unrelated, but there is a very important concept you'll

33
00:02:17,710 --> 00:02:23,320
learn that applies to both, called the explore-exploit dilemma. In Bayesian machine learning,

34
00:02:23,330 --> 00:02:28,450
you learn this idea in the context of trying to optimize a click-through rate or a conversion rate,

35
00:02:28,990 --> 00:02:33,820
or, in other words, the number of times people buy things from your website versus the number of times

36
00:02:33,820 --> 00:02:35,170
people visit your website.
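One common Bayesian way to handle the explore-exploit dilemma for a conversion rate is Thompson sampling with Beta posteriors. The sketch below assumes that approach, since the lecture does not name a specific algorithm. The two "true" conversion rates are made-up numbers used only to simulate visitors.

```python
import random

def thompson_sampling(true_rates, n_visitors, seed=0):
    """Sketch of Bayesian explore-exploit for conversion rates.

    Each variant's conversion rate starts with a Beta(1, 1) (uniform) prior.
    For every visitor we draw one sample from each posterior and show the
    variant with the highest draw. Uncertain variants still get explored,
    while variants that look good get exploited more often.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)  # purchases observed per variant
    failures = [0] * len(true_rates)   # visits without a purchase
    for _ in range(n_visitors):
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        chosen = max(range(len(true_rates)), key=lambda i: draws[i])
        # Simulate whether this visitor converts (true rates are hypothetical).
        if rng.random() < true_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1
    visits = [s + f for s, f in zip(successes, failures)]
    return visits, successes

visits, purchases = thompson_sampling([0.02, 0.10], n_visitors=5000)
print(visits, purchases)
```

Over time, most visits should flow to the better-converting variant, while the weaker one is still shown occasionally in case the early estimates were wrong.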

37
00:02:35,650 --> 00:02:36,850
A very practical concept, I think,

38
00:02:36,850 --> 00:02:43,000
if you do anything related to e-commerce. Now, in reinforcement learning, we look at the

39
00:02:43,000 --> 00:02:44,370
explore-exploit dilemma again,

40
00:02:44,380 --> 00:02:49,990
but in the context of playing games. Reinforcement learning is like a third branch of machine learning,

41
00:02:49,990 --> 00:02:53,590
the other two being supervised and unsupervised learning.

42
00:02:54,220 --> 00:02:59,980
The main difference is that supervised and unsupervised learning look at static data, whereas in reinforcement

43
00:02:59,980 --> 00:03:00,340
learning,

44
00:03:00,350 --> 00:03:04,270
the idea is more like you have a robot living in the real world.

45
00:03:04,540 --> 00:03:10,330
It can take the experiences it had today and based on them, behave more intelligently tomorrow.

46
00:03:10,840 --> 00:03:14,020
So the learning paradigm in reinforcement learning is sequential.

47
00:03:14,590 --> 00:03:20,260
This is as opposed to supervised and unsupervised learning, where your dataset usually resides in some file

48
00:03:20,260 --> 00:03:21,190
on your hard drive.

49
00:03:22,370 --> 00:03:26,480
So in reinforcement learning, part one, we get all the basics, as you might expect.

50
00:03:27,020 --> 00:03:32,510
This is a prerequisite to deep reinforcement learning, since deep reinforcement learning applies those

51
00:03:32,510 --> 00:03:34,430
concepts to more difficult games.

52
00:03:35,210 --> 00:03:40,790
But you'll notice that this is not the only prerequisite to deep reinforcement learning. As the title

53
00:03:40,790 --> 00:03:41,400
suggests,

54
00:03:41,420 --> 00:03:47,690
this is also dependent on knowledge of deep learning, in particular convolutional neural networks.

55
00:03:49,550 --> 00:03:54,920
But as we know, in order to build a CNN, we have to know how to build a regular neural network, which

56
00:03:54,920 --> 00:03:58,370
means we have to know what a neural network is and why it's useful and so on.

57
00:03:58,970 --> 00:04:02,300
So in deep reinforcement learning, these two paths converge.

58
00:04:02,750 --> 00:04:07,580
You combine your knowledge of both reinforcement learning and deep learning in this course.

59
00:04:08,920 --> 00:04:16,540
Now, the reason it depends on CNNs and not, say, RNNs, which are over here, is that we'll be

60
00:04:16,540 --> 00:04:18,230
learning to play visual games.

61
00:04:18,250 --> 00:04:24,130
So, for example, we can learn how to play video games like Pong or Breakout, which are classic Atari

62
00:04:24,130 --> 00:04:25,300
games from the old days.

63
00:04:25,690 --> 00:04:29,830
And so the inputs are images, because they're basically screenshots of the game screen.

64
00:04:30,790 --> 00:04:36,550
In the future, we might end up applying RNNs, in which case RNNs will become a prerequisite

65
00:04:36,550 --> 00:04:37,270
to that course.

66
00:04:39,080 --> 00:04:41,900
So that's the end of the reinforcement learning path for now.

67
00:04:42,140 --> 00:04:45,860
I'm very excited to bring you more updates in this area in the future.

68
00:04:48,310 --> 00:04:50,850
Let's now jump back to logistic regression.

69
00:04:52,020 --> 00:04:56,070
Here we can see another outgoing edge to supervised machine learning.

70
00:04:57,270 --> 00:04:58,620
So why is this edge here?

71
00:04:59,490 --> 00:05:04,590
Well, you may recall that linear regression and logistic regression are both linear models that do

72
00:05:04,590 --> 00:05:06,870
regression and classification, respectively.

73
00:05:07,530 --> 00:05:09,660
These are both supervised learning tasks.

74
00:05:11,760 --> 00:05:17,100
And so it makes sense that, now that you know one model for regression and one model for classification,

75
00:05:17,580 --> 00:05:20,070
it's time to dig deeper into supervised learning.

76
00:05:20,910 --> 00:05:25,700
The thing with linear regression and logistic regression is that they aren't really different models.

77
00:05:25,710 --> 00:05:29,100
They are both just a line, but because they do different tasks,

78
00:05:29,100 --> 00:05:32,040
the techniques and interpretations are slightly different.
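To make the "both are just a line" point concrete, here is a minimal sketch that assumes a hypothetical weight vector `w` and bias `b`. Both models compute the same linear function. Logistic regression only adds a sigmoid so the output can be read as a class probability.

```python
import math

def linear(x, w, b):
    """The shared linear part of both models: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def linear_regression_predict(x, w, b):
    # Regression: the linear output itself is the prediction.
    return linear(x, w, b)

def logistic_regression_predict(x, w, b):
    # Classification: squash the same linear output into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear(x, w, b)))

w, b = [2.0, -1.0], 0.5  # hypothetical parameters
print(linear_regression_predict([1.0, 1.0], w, b))
print(logistic_regression_predict([1.0, 1.0], w, b))
```

Points where the linear part equals zero get probability exactly 0.5, so the classifier's decision boundary is the line itself.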

79
00:05:33,150 --> 00:05:37,380
And there are, of course, different models that are not linear models that can do these tasks, and

80
00:05:37,380 --> 00:05:39,360
that's what this course is all about.

81
00:05:40,140 --> 00:05:45,720
In this course, we look at classic supervised machine learning techniques like k-nearest neighbors, decision

82
00:05:45,720 --> 00:05:48,540
trees, the perceptron, and the Bayes classifier.

83
00:05:49,530 --> 00:05:53,730
Much like how logistic regression was the basic building block of the neural network,

84
00:05:54,210 --> 00:05:58,770
classic models like decision trees are the basic building blocks of ensemble methods.

85
00:05:59,810 --> 00:06:04,250
So that's why this course is a prerequisite to ensemble machine learning.

86
00:06:04,940 --> 00:06:10,970
Again, we use the same logo with a different color to signify that these two courses are very closely

87
00:06:10,970 --> 00:06:17,060
related. In ensemble machine learning, we learn how to combine multiple decision trees in different

88
00:06:17,060 --> 00:06:20,480
ways in order to make some very powerful classifiers.
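Two of the "different ways" to combine trees can be sketched as voting rules. This assumes the classic formulations: a random forest gives every tree one equal vote, and AdaBoost takes the sign of a weighted sum of ±1 predictions. The tree outputs and weights below are hypothetical numbers, not trained models.

```python
from collections import Counter

def majority_vote(predictions):
    """Random-forest-style combination: each tree gets one equal vote."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """AdaBoost-style combination for labels in {-1, +1}: sign of the weighted sum."""
    total = sum(w * p for w, p in zip(weights, predictions))
    return 1 if total >= 0 else -1

tree_predictions = [1, -1, -1]    # hypothetical outputs of three trees
tree_weights = [2.0, 0.5, 0.5]    # hypothetical boosting weights
print(majority_vote(tree_predictions))
print(weighted_vote(tree_predictions, tree_weights))
```

The same three predictions can produce different ensemble answers depending on how much each tree is trusted.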

89
00:06:21,260 --> 00:06:26,180
What's really remarkable about these methods is that they are very easy to plug and play on data.

90
00:06:26,660 --> 00:06:31,820
So if you're looking for a plug-and-play solution without having to learn a lot of theory, then deep

91
00:06:31,820 --> 00:06:33,800
learning is most likely not for you.

92
00:06:34,190 --> 00:06:36,110
But ensemble methods are a great fit.

93
00:06:37,060 --> 00:06:41,620
Deep learning is very dependent on hyperparameters, and if you choose incorrectly, your model will

94
00:06:41,620 --> 00:06:42,760
perform very poorly.

95
00:06:43,450 --> 00:06:47,350
Sometimes it requires immense computing power to find good hyperparameters.

96
00:06:47,890 --> 00:06:50,800
This is an active area of research and has not yet been solved.

97
00:06:51,670 --> 00:06:55,180
For example, suppose you implement what you see in a deep learning paper,

98
00:06:55,240 --> 00:07:00,730
but the author left out some seemingly insignificant detail, so you end up having to make an

99
00:07:00,730 --> 00:07:03,400
assumption and then your results end up totally different.

100
00:07:03,850 --> 00:07:08,050
So deep learning is fragile, but luckily ensemble methods are not.

101
00:07:08,770 --> 00:07:15,370
We focus on two very famous ensemble methods: random forests and AdaBoost. So that's everything

102
00:07:15,370 --> 00:07:17,770
on the supervised machine learning track for now.

103
00:07:19,060 --> 00:07:25,690
Next, we see an edge going from supervised machine learning to unsupervised machine learning, in particular

104
00:07:25,690 --> 00:07:26,830
cluster analysis.

105
00:07:27,700 --> 00:07:32,920
The reason we study supervised learning before unsupervised learning is that unsupervised learning

106
00:07:32,920 --> 00:07:34,300
is a little more abstract.

107
00:07:34,960 --> 00:07:40,270
It takes more effort on the student's part to realize why it's practical and what it can be used for.

108
00:07:41,290 --> 00:07:47,380
Cluster analysis shows us how to model data that does not come with targets. As you might guess, we

109
00:07:47,380 --> 00:07:49,120
do this in the form of clustering.

110
00:07:49,900 --> 00:07:52,060
The idea behind clustering is very simple.

111
00:07:52,600 --> 00:07:57,550
We want to know how many naturally occurring groups of data there are and what are the relationships

112
00:07:57,550 --> 00:07:59,350
between the data in these clusters.

113
00:07:59,800 --> 00:08:05,140
So for example, if you were clustering books, you might find a book about Steve Jobs and a book about

114
00:08:05,140 --> 00:08:06,790
Elon Musk in the same cluster.

115
00:08:07,360 --> 00:08:10,660
This cluster is probably about tech companies in Silicon Valley.

116
00:08:12,610 --> 00:08:17,590
But you don't need a label in your data to tell you that; you can discover it yourself by looking at

117
00:08:17,920 --> 00:08:19,810
how the data naturally groups together.
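The lecture does not say which clustering algorithm is used, so the sketch below assumes k-means, one of the simplest. Note that k-means needs the number of clusters `k` up front. The points are hypothetical 2-D data with two obvious groups, and no labels are used anywhere. The farthest-first initialization is a simple deterministic heuristic chosen here for reproducibility.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, n_iters=20):
    """Minimal k-means on 2-D points given as (x, y) tuples."""
    # Farthest-first initialization: start at the first point, then keep
    # adding the point farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    labels = []
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return centers, labels

# Hypothetical unlabeled data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```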

118
00:08:21,220 --> 00:08:26,410
Now, as I mentioned earlier, I think once you learn about both supervised and unsupervised learning,

119
00:08:26,710 --> 00:08:29,260
you'll be ready to jump into reinforcement learning.

120
00:08:30,440 --> 00:08:35,750
I haven't made cluster analysis a prerequisite to reinforcement learning since none of the material

121
00:08:35,750 --> 00:08:40,820
depends on this course, but it's good to know about these techniques so that you have a more mature

122
00:08:40,820 --> 00:08:42,740
and experienced view on machine learning.

123
00:08:46,320 --> 00:08:50,640
A sort of sequel to cluster analysis is hidden Markov models.

124
00:08:51,880 --> 00:08:55,300
The reason might not be clear at first, so let me give you two reasons.

125
00:08:56,530 --> 00:08:57,220
Number one.

126
00:08:57,400 --> 00:09:03,130
They're both unsupervised machine learning models, but HMMs are harder, so it's natural to

127
00:09:03,130 --> 00:09:04,510
learn about clustering first.

128
00:09:05,020 --> 00:09:10,810
Clustering is also about static data, whereas HMMs are about sequences, so it's similar to the progression

129
00:09:10,810 --> 00:09:11,830
we followed in deep learning:

130
00:09:11,950 --> 00:09:16,150
We looked at static data like images, then sequential data like text.

131
00:09:17,760 --> 00:09:23,310
Reason number two, in cluster analysis, we learn about a technique called the Gaussian mixture model,

132
00:09:23,400 --> 00:09:25,800
which we make use of in the HMM course.

133
00:09:26,550 --> 00:09:33,470
One key point is that they both learn using the expectation-maximization (EM) algorithm, so it's good to first

134
00:09:33,480 --> 00:09:35,600
see the EM algorithm on a simple model,

135
00:09:35,610 --> 00:09:41,610
and then when you see EM again on a more complicated model like the HMM, it won't be as intimidating.

136
00:09:43,130 --> 00:09:48,920
One key concept you learn in HMMs is the Markov assumption. That just means the current state

137
00:09:48,920 --> 00:09:52,700
depends only on the previous state, but not any states before it.

138
00:09:53,480 --> 00:09:57,380
This is a simplifying assumption that usually makes the math easier to work with.
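Here is why the Markov assumption simplifies the math. The probability of a whole sequence factors into one initial probability times a chain of one-step transition probabilities, p(s1, ..., sT) = p(s1) · Π p(s_t | s_{t-1}). The two-state weather model below is hypothetical, and its probabilities are made up for illustration.

```python
# Hypothetical two-state model; all probabilities are made up.
initial = {"sunny": 0.6, "rainy": 0.4}
transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}

def sequence_probability(states):
    """p(s1, ..., sT) = p(s1) * prod_t p(s_t | s_{t-1}) under the Markov assumption."""
    prob = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        # Only the immediately previous state matters, not the full history.
        prob *= transition[prev][curr]
    return prob

print(sequence_probability(["sunny", "sunny", "rainy"]))
```

Without the assumption, each factor would have to condition on the entire history, and the number of parameters would grow with sequence length.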

139
00:09:58,310 --> 00:10:02,520
You will also notice that we encounter the Markov assumption in reinforcement learning.

140
00:10:02,540 --> 00:10:05,360
However, it's not too hard to learn it from scratch.

141
00:10:05,690 --> 00:10:11,000
And so for that reason, I do not consider HMMs to be a prerequisite to reinforcement learning.

142
00:10:11,690 --> 00:10:14,570
The Markov assumption is really the only thing they have in common.

143
00:10:15,560 --> 00:10:22,190
There is also a slight connection between cluster analysis and unsupervised deep learning, but I'm not

144
00:10:22,190 --> 00:10:27,800
going to draw the link right now, but I sometimes consider this to be unsupervised machine learning

145
00:10:27,800 --> 00:10:31,370
part one and this to be unsupervised machine learning part two.

146
00:10:33,390 --> 00:10:38,340
We also see that HMMs feed into the RNN course, which is about deep learning.

147
00:10:38,610 --> 00:10:39,660
So why might that be?

148
00:10:40,560 --> 00:10:46,110
This is, of course, because both of these are models that can learn about sequences. In particular,

148
00:10:46,590 --> 00:10:49,350
in both of these courses, we model text as sequences.

150
00:10:50,040 --> 00:10:56,190
But whereas the HMM makes use of the Markov assumption, the RNN does not; hence the RNN is a

151
00:10:56,190 --> 00:10:57,300
more powerful model.

152
00:10:58,300 --> 00:11:02,740
And so this just goes along with the main theme that we always go from simple, basic models to more

153
00:11:02,740 --> 00:11:03,730
complex models.

154
00:11:04,390 --> 00:11:06,730
This is also something you should do in your own work.

155
00:11:07,330 --> 00:11:11,500
If you start with a simple model, you often find that it is faster and more robust.

156
00:11:11,890 --> 00:11:17,410
Complex models sometimes break down, and they are also more difficult to implement and might not even

157
00:11:17,410 --> 00:11:18,280
be fast enough.

158
00:11:18,700 --> 00:11:23,740
Of course, that's just a generalization, so you always want to analyze engineering tradeoffs individually

159
00:11:24,040 --> 00:11:25,260
for every problem you have.

160
00:11:27,410 --> 00:11:30,890
Now there is one last link in this part of the graph here that I want to explain.

161
00:11:31,310 --> 00:11:36,890
And that's the basic NLP course. You can see that it depends on supervised machine learning

162
00:11:36,890 --> 00:11:38,840
and feeds into deep NLP.

163
00:11:40,220 --> 00:11:45,980
The main purpose of this basic NLP course is to apply basic machine learning models to text.

164
00:11:46,520 --> 00:11:51,410
So the reason supervised machine learning comes before it is that this course was all about

165
00:11:51,680 --> 00:11:53,180
basic machine learning models.

166
00:11:54,110 --> 00:12:00,020
The important skill for NLP was not the implementation of those models, but rather a bigger-picture

167
00:12:00,020 --> 00:12:02,450
perspective on how machine learning is used.

168
00:12:03,140 --> 00:12:05,660
What is the interface between the data and the model?

169
00:12:06,080 --> 00:12:07,190
What does the model do?

170
00:12:07,790 --> 00:12:09,380
What is its input and output?

171
00:12:09,560 --> 00:12:10,910
How is the output interpreted?

172
00:12:11,720 --> 00:12:17,540
And so we take those principles and we apply them to text, and this way we can see that text can be

173
00:12:17,540 --> 00:12:23,000
treated in such a way that you don't have to think about it any differently than any other data.

174
00:12:24,310 --> 00:12:27,280
This reinforces the principle that all data is the same.

175
00:12:27,910 --> 00:12:30,730
The machine learning model doesn't care what your data is.

176
00:12:30,940 --> 00:12:32,770
All it sees is a table of numbers.

177
00:12:32,770 --> 00:12:36,730
It doesn't care if it's text or images or radio signals from space.

178
00:12:37,270 --> 00:12:41,140
The model just does what it was designed to do on the numbers that you give it.
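One simple way to turn text into "a table of numbers" is a bag-of-words count matrix. The sketch below assumes that representation, plus lowercasing and whitespace tokenization. The two example sentences are hypothetical. Each row is a document and each column is a vocabulary word, which is exactly the kind of table any of the basic models can consume.

```python
def bag_of_words(documents):
    """Turn raw text into a table of word counts (rows = documents, columns = vocabulary)."""
    vocab = sorted({word for doc in documents for word in doc.lower().split()})
    index = {word: j for j, word in enumerate(vocab)}
    table = []
    for doc in documents:
        row = [0] * len(vocab)
        for word in doc.lower().split():
            row[index[word]] += 1  # count each occurrence of the word
        table.append(row)
    return vocab, table

vocab, X = bag_of_words(["the movie was great", "the movie was not great at all"])
print(vocab)
for row in X:
    print(row)
```

Once the text is in this form, a logistic regression or decision tree treats it no differently than it would treat pixel values or sensor readings.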

179
00:12:41,800 --> 00:12:46,900
So this course gives you a high-level systems perspective on working with machine learning models in

180
00:12:46,900 --> 00:12:47,440
text.

181
00:12:48,280 --> 00:12:54,400
This easy NLP course also feeds into deep NLP, which is, of course, not so easy because it depends

182
00:12:54,400 --> 00:12:56,110
on a lot of background in deep learning.

183
00:12:57,200 --> 00:13:03,230
One of the main questions I get in the NLP course is how do I improve the results of these basic models?

184
00:13:03,860 --> 00:13:08,300
And a lot of the time the answer to that is, well, you have to use a more complex model.

185
00:13:08,300 --> 00:13:12,590
But of course, that necessitates learning how that complex model works.

186
00:13:13,490 --> 00:13:19,430
And deep NLP is an example of that, because we learn a state-of-the-art method for sentiment analysis,

187
00:13:19,430 --> 00:13:22,670
whereas in easy NLP, we used only a linear model.

188
00:13:23,360 --> 00:13:28,550
So it's important to understand that while, yes, it's possible to improve on the predictive ability of

189
00:13:28,550 --> 00:13:33,140
simple, basic models, as you can see, it's not always an easy path to get there.

190
00:13:33,620 --> 00:13:35,510
So you have to make sure you're prepared.

191
00:13:36,200 --> 00:13:40,370
Case in point, just look at all the time spent just to get to deep NLP.

192
00:13:40,550 --> 00:13:41,780
It's not an easy task.

193
00:13:43,930 --> 00:13:47,500
Let's now scroll over to GANs and variational autoencoders.

194
00:13:48,280 --> 00:13:54,040
Just like how deep reinforcement learning is not related to deep NLP, this isn't really related to

195
00:13:54,040 --> 00:13:55,510
deep reinforcement learning either.

196
00:13:56,140 --> 00:13:59,620
This is deep learning part eight by order of creation only.

197
00:14:00,190 --> 00:14:06,280
It is the spiritual sequel to Unsupervised Deep Learning, which was deep learning in Python Part four.

198
00:14:07,620 --> 00:14:12,960
OK, so just to reiterate, these are parts six, seven, and eight

199
00:13:13,110 --> 00:13:19,140
by order only, but they are not related to each other conceptually, although it's always nice to know

200
00:14:19,140 --> 00:14:22,830
all these things because more context makes future things easier to learn.

201
00:14:24,230 --> 00:14:32,660
So the reason this is linked to this is that GANs and variational autoencoders are also unsupervised

202
00:14:32,660 --> 00:14:33,650
deep learning models.

203
00:14:34,520 --> 00:14:40,820
But whereas unsupervised deep learning was all about how to improve supervised learning, GANs and variational

204
00:14:40,820 --> 00:14:44,990
autoencoders don't have any direct benefit to supervised learning at all.

205
00:14:45,470 --> 00:14:51,350
Although we do make use of supervised learning within the course, the focus is on

206
00:14:51,350 --> 00:14:52,640
generating images.

207
00:14:53,420 --> 00:14:58,940
We've seen that GANs can create photorealistic images based on a dual neural network system.

208
00:14:59,630 --> 00:15:03,920
That's pretty cool because before GANs, we didn't have any kind of machine learning model that could

209
00:15:03,920 --> 00:15:05,930
generate real looking images.

210
00:15:06,680 --> 00:15:12,800
Nowadays, GANs are able to generate high-resolution, high-quality images of people that you can't

211
00:15:12,800 --> 00:15:14,330
even tell are not real people.

212
00:15:14,870 --> 00:15:18,110
It certainly makes the idea of the Matrix seem very possible.

213
00:15:19,830 --> 00:15:22,080
All right, so I hope you found this lecture helpful.

214
00:15:22,620 --> 00:15:27,000
We saw that these courses are related to each other in some pretty complicated ways.

215
00:15:27,840 --> 00:15:30,270
Learning machine learning is not exactly linear.

216
00:15:30,780 --> 00:15:34,110
Sometimes you have to take one course before you can take the next.

217
00:15:34,740 --> 00:15:38,760
Sometimes one course might just be related to another course by some key concepts.

218
00:15:39,120 --> 00:15:42,120
But maybe in one context, it's a lot easier to understand.

219
00:15:42,900 --> 00:15:48,570
So keep in mind that not all of these arrows indicate strong prerequisites; sometimes

220
00:15:48,570 --> 00:15:51,390
there's just a relationship between the two courses.

221
00:15:52,140 --> 00:15:57,030
I hope that this lecture clarified the nuances between what is a prerequisite and what is not.

222
00:15:57,420 --> 00:16:02,130
And I hope I did a good job of answering which order you should take these courses in, and why.

