1
00:00:02,200 --> 00:00:07,990
Once we've studied RNNs, we can go even deeper into how deep learning can be used on text in the

2
00:00:07,990 --> 00:00:13,770
next course, Natural Language Processing with Deep Learning in Python, or Deep Learning in Python

3
00:00:13,770 --> 00:00:16,180
Part 6. In this course,

4
00:00:16,190 --> 00:00:19,550
we look at a very important concept called word embeddings.

5
00:00:19,670 --> 00:00:25,580
These allow us to turn words, which are categorical variables, into vectors, which are numbers that a neural

6
00:00:25,580 --> 00:00:27,290
network can read.

7
00:00:27,290 --> 00:00:31,040
This is why it's very important to have studied unsupervised learning before this.

8
00:00:31,040 --> 00:00:35,190
Because finding word embeddings is actually an unsupervised task.

9
00:00:35,430 --> 00:00:40,400
Word embeddings also allow us to make use of pretraining, which was discussed in Deep Learning in Python

10
00:00:40,400 --> 00:00:47,500
Part 4. In this course, we also look at a very advanced model for doing sentiment analysis called a recursive

11
00:00:47,500 --> 00:00:50,250
neural network or a tree neural network.

12
00:00:50,260 --> 00:00:55,570
This is an example of a dynamic neural network because it changes its structure based on what input

13
00:00:55,570 --> 00:00:56,950
you give it.

14
00:00:56,950 --> 00:01:02,470
Most deep learning libraries are not equipped to handle dynamic neural networks, and I demonstrate what

15
00:01:02,470 --> 00:01:05,150
happens if you try to build one the naive way.

16
00:01:05,290 --> 00:01:09,760
You basically end up building a separate neural network for each of your training samples and it's going

17
00:01:09,760 --> 00:01:14,230
to eat up all your RAM and make your computer slow down to a crawl.

18
00:01:14,230 --> 00:01:19,570
So you get a firsthand perspective of why working with dynamic neural networks is not easy.

19
00:01:20,080 --> 00:01:25,240
But luckily, we are able to make use of our knowledge of recurrent neural networks (and remember, that's

20
00:01:25,240 --> 00:01:32,110
the prerequisite to this course) in order to convert a tree into a sequence, which an RNN is capable

21
00:01:32,110 --> 00:01:32,850
of handling.

22
00:01:33,430 --> 00:01:41,170
So that's why RNNs are a prerequisite to this course. Now you'll notice that this course, Deep NLP, has

23
00:01:41,170 --> 00:01:42,520
no outgoing edges.

24
00:01:42,520 --> 00:01:45,860
This is because it's the most advanced course that I have on this path

25
00:01:45,880 --> 00:01:50,530
for now. I certainly plan on expanding in this direction in the future.

26
00:01:50,530 --> 00:01:54,910
However this is not the last deep learning course in the series.

27
00:01:54,910 --> 00:02:00,760
After this we have deep reinforcement learning which is deep learning in Python part 7.

28
00:02:00,760 --> 00:02:05,410
With that said I think now is a good time to go back and start exploring the reinforcement learning

29
00:02:05,410 --> 00:02:12,880
path. So you'll notice that I have Bayesian machine learning feeding into reinforcement learning. On the

30
00:02:12,880 --> 00:02:13,450
surface,

31
00:02:13,450 --> 00:02:18,670
these two courses might seem unrelated, but there is a very important concept you'll learn that applies

32
00:02:18,670 --> 00:02:23,340
to both, called the explore-exploit dilemma. In Bayesian machine learning,

33
00:02:23,350 --> 00:02:28,450
you'll learn this idea in the context of trying to optimize a click-through rate or conversion rate,

34
00:02:29,050 --> 00:02:33,820
or in other words the number of times people buy things from your website versus the number of times

35
00:02:33,820 --> 00:02:35,700
people visit your website.

36
00:02:35,710 --> 00:02:36,880
Very practical concept,

37
00:02:36,880 --> 00:02:43,020
I think, if you do anything related to e-commerce. Now in reinforcement learning, we look at the explore-

38
00:02:43,020 --> 00:02:44,300
exploit dilemma again,

39
00:02:44,410 --> 00:02:49,960
but in the context of playing games. Reinforcement learning is like a third branch of machine learning,

40
00:02:50,020 --> 00:02:54,310
whereas the other two are supervised and unsupervised learning.

41
00:02:54,310 --> 00:03:00,010
The main difference is that supervised and unsupervised learning look at static data. In reinforcement

42
00:03:00,010 --> 00:03:00,370
learning,

43
00:03:00,370 --> 00:03:04,480
the idea is more like you have a robot living in the real world.

44
00:03:04,600 --> 00:03:10,900
It can take the experiences it had today and based on them behave more intelligently tomorrow.

45
00:03:10,900 --> 00:03:14,620
So the learning paradigm in reinforcement learning is sequential.

46
00:03:14,650 --> 00:03:20,290
This is opposed to supervised and unsupervised learning where your dataset usually resides in some file

47
00:03:20,290 --> 00:03:27,010
on your hard drive. So in Reinforcement Learning Part 1, we get all the basics, as you might expect.

48
00:03:27,060 --> 00:03:32,520
This is a prerequisite to deep reinforcement learning since deep reinforcement learning applies those

49
00:03:32,520 --> 00:03:38,850
concepts to more difficult games. But you'll notice that this is not the only prerequisite to deep reinforcement

50
00:03:38,850 --> 00:03:41,480
learning. As the title suggests,

51
00:03:41,490 --> 00:03:49,780
this is also dependent on knowledge of deep learning, in particular convolutional neural networks. But

52
00:03:49,780 --> 00:03:55,240
as we know in order to build a CNN we have to know how to build a regular neural network which means

53
00:03:55,240 --> 00:03:59,020
we have to know what a neural network is and why it's useful and so on.

54
00:03:59,020 --> 00:04:05,530
So in deep reinforcement learning, these two paths converge: you combine your knowledge of both reinforcement

55
00:04:05,530 --> 00:04:08,970
learning and deep learning in this course.

56
00:04:08,990 --> 00:04:16,880
Now the reason it depends on CNNs and not, say, RNNs, which are over here, is because we'll be learning

57
00:04:16,880 --> 00:04:18,290
to play visual games.

58
00:04:18,290 --> 00:04:24,440
So for example, we can learn how to play a video game like Pong or Breakout, which are classic Atari games

59
00:04:24,440 --> 00:04:25,700
from the old days.

60
00:04:25,820 --> 00:04:30,380
And so those are images because they're basically screenshots from the screen.

61
00:04:30,860 --> 00:04:36,530
In the future we might end up applying RNNs, in which case RNNs will become a prerequisite

62
00:04:36,530 --> 00:04:41,390
to that course. So that's the end of the reinforcement learning path,

63
00:04:41,390 --> 00:04:49,570
for now. I'm very excited to bring you more updates in this area in the future. Let's now jump back to

64
00:04:49,640 --> 00:04:57,070
logistic regression, where we can see another outgoing edge to supervised machine learning.

65
00:04:57,340 --> 00:04:58,690
So why is this edge here?

66
00:04:59,590 --> 00:05:05,140
Well you may recall that linear regression and logistic regression are both linear models that do regression

67
00:05:05,140 --> 00:05:07,600
and classification respectively.

68
00:05:07,600 --> 00:05:11,610
These are both supervised learning tasks.

69
00:05:11,830 --> 00:05:17,140
And so it makes sense that now that you know one model for regression and one model for classification

70
00:05:17,650 --> 00:05:20,910
it's time to dig deeper into supervised learning.

71
00:05:20,980 --> 00:05:25,720
The thing with linear regression and logistic regression is that they aren't really different models.

72
00:05:25,720 --> 00:05:31,450
They are both just a line. Because they do different tasks, the techniques and interpretations are slightly

73
00:05:31,450 --> 00:05:36,750
different, though. And there are of course different models that are not linear models that can do these

74
00:05:36,750 --> 00:05:37,210
tasks.

75
00:05:37,230 --> 00:05:40,880
And that's what this course is all about. In this course,

76
00:05:40,890 --> 00:05:47,190
we look at classic supervised machine learning techniques like k-nearest neighbors, decision trees, the perceptron,

77
00:05:47,190 --> 00:05:53,240
and the Bayes classifier. Much like how logistic regression was the basic building block of the neural

78
00:05:53,240 --> 00:05:54,090
network,

79
00:05:54,230 --> 00:06:00,650
classic models like decision trees are the basic building block of ensemble methods. So that's why this

80
00:06:00,650 --> 00:06:05,020
course is a prerequisite to ensemble machine learning.

81
00:06:05,030 --> 00:06:10,970
Again we use the same logo with a different color to signify that these two courses are very closely

82
00:06:10,970 --> 00:06:14,100
related. In ensemble machine learning,

83
00:06:14,120 --> 00:06:19,640
we learn how to combine multiple decision trees in different ways in order to make some very powerful

84
00:06:19,640 --> 00:06:21,320
classifiers.

85
00:06:21,350 --> 00:06:26,660
What's really remarkable about these methods is that they are very easy to plug and play on data.

86
00:06:26,720 --> 00:06:31,820
So if you're looking for a plug and play solution without having to learn a lot of theory then deep

87
00:06:31,820 --> 00:06:38,500
learning is most likely not for you, but ensemble methods are a great fit. Deep learning is very dependent

88
00:06:38,500 --> 00:06:39,630
on hyperparameters,

89
00:06:39,640 --> 00:06:42,790
and if you choose incorrectly, your model will perform very poorly.

90
00:06:43,480 --> 00:06:47,950
Sometimes it requires immense computing power to find good hyperparameters.

91
00:06:47,950 --> 00:06:49,750
This is an active area of research.

92
00:06:49,750 --> 00:06:51,700
It has not yet been solved.

93
00:06:51,700 --> 00:06:55,280
This is why you can implement what you see in a deep learning paper.

94
00:06:55,300 --> 00:06:59,230
But suppose the author left out some seemingly insignificant detail.

95
00:06:59,560 --> 00:07:03,910
So you end up having to make an assumption and then your results end up totally different.

96
00:07:03,910 --> 00:07:05,920
So deep learning is fragile.

97
00:07:05,920 --> 00:07:08,550
But luckily ensemble methods are not.

98
00:07:08,830 --> 00:07:14,440
We focus on two very famous ensemble methods: random forest and AdaBoost.

99
00:07:14,440 --> 00:07:17,220
So that's everything on the supervised machine learning track,

100
00:07:17,230 --> 00:07:24,590
for now. Next, we see an edge going from supervised machine learning to unsupervised machine learning,

101
00:07:25,010 --> 00:07:27,720
in particular cluster analysis.

102
00:07:27,770 --> 00:07:32,900
The reason we study supervised learning before unsupervised learning is because unsupervised learning

103
00:07:32,960 --> 00:07:34,970
is a little more abstract.

104
00:07:35,030 --> 00:07:40,280
It takes more effort on the student's part to realize why it's practical and what it can be used for.

105
00:07:41,350 --> 00:07:47,230
Cluster analysis shows us how to model data that does not come with targets. As you might guess,

106
00:07:47,230 --> 00:07:52,540
we do this in the form of clustering. The idea behind clustering is very simple.

107
00:07:52,690 --> 00:07:57,550
We want to know how many naturally occurring groups of data there are and what are the relationships

108
00:07:57,550 --> 00:07:59,830
between the data in these clusters.

109
00:07:59,860 --> 00:08:05,170
So for example if you were clustering books you might find a book about Steve Jobs and a book about

110
00:08:05,170 --> 00:08:07,170
Elon Musk in the same cluster.

111
00:08:07,420 --> 00:08:14,480
This cluster is probably about tech companies in Silicon Valley but you don't need a label in your data

112
00:08:14,480 --> 00:08:21,440
to tell you that. You can discover it yourself by looking at how the data naturally groups together. Now,

113
00:08:21,440 --> 00:08:26,900
as I mentioned earlier, I think once you learn about both supervised and unsupervised learning, you'll

114
00:08:26,900 --> 00:08:30,460
be ready to jump into reinforcement learning.

115
00:08:30,520 --> 00:08:35,770
I haven't made cluster analysis a prerequisite to reinforcement learning since none of the material

116
00:08:35,770 --> 00:08:37,140
depends on this course.

117
00:08:37,270 --> 00:08:42,040
But it's good to know about these techniques so that you have a more mature and experienced view on

118
00:08:42,040 --> 00:08:42,780
machine learning.

119
00:08:46,390 --> 00:08:51,920
What is sort of a sequel to cluster analysis is Hidden Markov models.

120
00:08:51,930 --> 00:08:53,620
The reason might not be clear at first.

121
00:08:53,640 --> 00:08:56,590
So let me give you two reasons.

122
00:08:56,600 --> 00:09:01,910
Number one: they're both unsupervised machine learning models; it's just that HMMs are harder,

123
00:09:02,180 --> 00:09:07,720
so it's natural to learn about clustering first. Clustering is also about static data, whereas

124
00:09:07,730 --> 00:09:09,470
HMMs are about sequences.

125
00:09:09,470 --> 00:09:11,960
So it's similar to the progression we followed in deep learning:

126
00:09:12,050 --> 00:09:20,550
we looked at static data like images, then sequential data like text. Reason number two: in cluster analysis,

127
00:09:20,550 --> 00:09:26,390
we learn about a technique called the Gaussian mixture model, which we make use of in the HMM course.

128
00:09:26,640 --> 00:09:32,550
One key point is they both learn by using the expectation maximization algorithm.

129
00:09:32,550 --> 00:09:37,770
So it's good to first see the EM algorithm on a simple model, and then when you see it again on a

130
00:09:37,770 --> 00:09:44,850
more complicated model like the HMM, it won't be as intimidating. One key concept you learn in

131
00:09:44,860 --> 00:09:50,720
HMMs is the Markov assumption. That just means the current state depends only on the previous state

132
00:09:50,750 --> 00:09:52,850
but not any state before.

133
00:09:53,540 --> 00:09:59,400
This is a simplifying assumption that usually makes the math easier to work with. You will also notice

134
00:09:59,400 --> 00:10:02,520
that we encounter the Markov assumption in reinforcement learning.

135
00:10:02,580 --> 00:10:05,760
However it's not too hard to learn it from scratch.

136
00:10:05,760 --> 00:10:11,850
And so for that reason, I do not consider HMMs to be a prerequisite to reinforcement learning. The

137
00:10:11,850 --> 00:10:15,620
Markov assumption is really the only thing they have in common.

138
00:10:15,620 --> 00:10:21,110
There is also a slight connection between cluster analysis and unsupervised deep learning.

139
00:10:21,110 --> 00:10:27,800
So I'm not going to draw the link right now, but I sometimes consider this to be unsupervised machine learning

140
00:10:27,800 --> 00:10:33,400
part 1 and this to be unsupervised machine learning part 2.

141
00:10:33,490 --> 00:10:39,400
We also see that HMMs feed into the RNN course, which is about deep learning. So why might that

142
00:10:39,400 --> 00:10:39,700
be?

143
00:10:40,630 --> 00:10:46,120
This is of course because both these models are models that can learn about sequences. In particular,

144
00:10:46,690 --> 00:10:53,050
in both these courses we model text as sequences. But whereas the HMM makes use of the Markov assumption,

145
00:10:53,260 --> 00:11:00,240
the RNN does not. Hence, the RNN is a more powerful model. And so this just goes along with the main theme

146
00:11:00,240 --> 00:11:04,470
that we always go from simple basic models to more complex models.

147
00:11:04,470 --> 00:11:07,280
This is also something you should do in your work as well.

148
00:11:07,410 --> 00:11:13,410
If you start with a simple model, you often find that it is faster and more robust. Complex models sometimes

149
00:11:13,720 --> 00:11:18,740
break down, and they are also more difficult to implement and might not even be fast enough.

150
00:11:18,780 --> 00:11:20,660
Of course that's just a generalization.

151
00:11:20,670 --> 00:11:25,420
So you always want to analyze engineering tradeoffs individually for every problem you have.

152
00:11:27,320 --> 00:11:31,260
But there is one last link in this part of the graph here that I want to explain.

153
00:11:31,410 --> 00:11:37,910
And that's the first NLP course, because you can see that it depends on supervised machine learning and feeds into

154
00:11:38,010 --> 00:11:40,140
Deep NLP.

155
00:11:40,280 --> 00:11:46,580
The main purpose of this basic NLP course is to apply basic machine learning models to text.

156
00:11:46,580 --> 00:11:51,440
So that's why supervised machine learning comes before it: because that course was all about

157
00:11:51,740 --> 00:11:54,000
basic machine learning models.

158
00:11:54,170 --> 00:12:00,770
The important skill for NLP was not the implementation of those models but rather a bigger-picture perspective

159
00:12:00,800 --> 00:12:03,230
on how machine learning is used.

160
00:12:03,230 --> 00:12:05,980
What is the interface between the data and the model?

161
00:12:06,170 --> 00:12:07,870
What does the model do?

162
00:12:07,880 --> 00:12:09,610
What is its input and output?

163
00:12:09,620 --> 00:12:11,730
How is the output interpreted?

164
00:12:11,810 --> 00:12:17,960
And so we take those principles and we apply them to text. In this way we can see that text can be treated

165
00:12:17,960 --> 00:12:25,190
in such a way that you don't have to think about it any differently than any other data. This reinforces

166
00:12:25,190 --> 00:12:27,940
the principle that all data is the same.

167
00:12:27,940 --> 00:12:30,950
The machine learning model doesn't care what your data is.

168
00:12:31,010 --> 00:12:32,800
All it sees is a table of numbers.

169
00:12:32,810 --> 00:12:37,240
It doesn't care if it's text or images or radar signals from space.

170
00:12:37,340 --> 00:12:41,840
The model just does what it was designed to do on the numbers that you give it.

171
00:12:41,840 --> 00:12:46,910
So this course gives you a high level systems perspective on working with machine learning models in

172
00:12:46,910 --> 00:12:48,350
text.

173
00:12:48,350 --> 00:12:54,410
This Easy NLP course also feeds into Deep NLP, which is of course not so easy because it depends

174
00:12:54,410 --> 00:13:00,930
on a lot of background in deep learning. One of the main questions I get in the NLP course is: how do

175
00:13:00,930 --> 00:13:03,650
I improve the results of these basic models?

176
00:13:03,940 --> 00:13:08,980
And a lot of the time the answer to that is: well, you have to use a more complex model. But of course

177
00:13:09,310 --> 00:13:16,120
that necessitates learning how that complex model works, and Deep NLP is an example of that because we

178
00:13:16,120 --> 00:13:23,220
learn a state-of-the-art method for sentiment analysis, whereas in Easy NLP we used only a linear model.

179
00:13:23,410 --> 00:13:28,930
So it's important to understand that while yes it's possible to improve the predictive ability on simple

180
00:13:28,930 --> 00:13:33,670
basic models, as you can see, it's not always an easy path to get there.

181
00:13:33,670 --> 00:13:36,220
So you have to make sure you're prepared.

182
00:13:36,250 --> 00:13:37,000
Case in point.

183
00:13:37,000 --> 00:13:40,530
Just look at all the time spent just to get to Deep NLP.

184
00:13:40,570 --> 00:13:49,340
It's not an easy task. Let's now scroll over to GANs and variational autoencoders. Just like how deep

185
00:13:49,340 --> 00:13:52,600
reinforcement learning is not related to Deep NLP,

186
00:13:52,760 --> 00:13:56,180
this isn't really related to deep reinforcement learning either.

187
00:13:56,180 --> 00:14:03,020
This is Deep Learning Part 8 by order of creation only. It is the spiritual sequel to unsupervised

188
00:14:03,020 --> 00:14:07,760
deep learning which was deep learning in Python part 4.

189
00:14:07,810 --> 00:14:13,180
So just to reiterate, this is part 6, part 7, and part 8

190
00:14:13,210 --> 00:14:19,150
by order only, but they are not related to each other conceptually, although it's always nice to know

191
00:14:19,150 --> 00:14:26,930
all these things because more context makes future things easier to learn. So the reason this is linked

192
00:14:26,930 --> 00:14:33,680
to this is because GANs and variational autoencoders are also unsupervised deep learning models.

193
00:14:34,520 --> 00:14:40,760
But whereas unsupervised deep learning was all about how to improve supervised learning, GANs and variational

194
00:14:40,760 --> 00:14:46,730
autoencoders don't have any direct benefit to supervised learning at all, although we do make use

195
00:14:46,730 --> 00:14:50,440
of supervised learning within the course. In this course,

196
00:14:50,480 --> 00:14:57,450
the focus is on generating images. We've seen that a GAN can create photorealistic images based on a

197
00:14:57,450 --> 00:14:59,690
dual neural network system.

198
00:14:59,690 --> 00:15:03,950
That's pretty cool because before GANs we didn't have any kind of machine learning model that could

199
00:15:03,950 --> 00:15:06,410
generate real looking images.

200
00:15:06,770 --> 00:15:13,010
Nowadays GANs are able to generate high-resolution, high-quality images of people that you can't even

201
00:15:13,010 --> 00:15:14,910
tell are not real people.

202
00:15:14,960 --> 00:15:19,740
It certainly makes the idea of The Matrix seem very possible.

203
00:15:19,890 --> 00:15:22,690
All right so I hope you found this lecture helpful.

204
00:15:22,690 --> 00:15:28,570
We saw that these courses are related to each other in some pretty complicated ways. Learning machine

205
00:15:28,570 --> 00:15:30,790
learning is not exactly linear.

206
00:15:30,820 --> 00:15:36,280
Sometimes you have to take one course before you can take the next. Sometimes one course might just

207
00:15:36,280 --> 00:15:42,190
be related to another course by some key concept but maybe in one context it's a lot easier to understand.

208
00:15:42,970 --> 00:15:48,810
So remember to keep in mind that these arrows do not all indicate strong prerequisites; sometimes there's

209
00:15:48,820 --> 00:15:52,170
just a relationship between the two courses.

210
00:15:52,210 --> 00:15:57,700
I hope that this lecture explained any nuances between what is a prerequisite and what is not, and I

211
00:15:57,700 --> 00:16:02,140
hope I did a good job of answering which order you should take these courses in and why.