1
00:00:02,120 --> 00:00:04,220
Hello everyone, and welcome back to this class.

2
00:00:05,370 --> 00:00:10,050
In this lecture, I'm going to answer a question that I get pretty much multiple times per day.

3
00:00:10,470 --> 00:00:13,200
And that is: "What order should I take your courses in?"

4
00:00:14,100 --> 00:00:17,970
This is a great question and it's something that even I have to think about sometimes.

5
00:00:18,090 --> 00:00:19,890
What is the best way to learn all this stuff?

6
00:00:20,580 --> 00:00:25,860
So I think that what you'll find is this ordering makes a lot of sense in that we're always building

7
00:00:25,860 --> 00:00:27,210
on skills we learned before.

8
00:00:27,780 --> 00:00:29,430
So it's just like learning calculus.

9
00:00:29,880 --> 00:00:34,860
First, you learn Calculus one, then you learn Calculus two, which is kind of the opposite of Calculus

10
00:00:34,860 --> 00:00:35,220
one.

11
00:00:35,580 --> 00:00:40,050
Then you learn Calculus three, which expands on the ideas from both Calculus one and two.

12
00:00:40,290 --> 00:00:43,830
And then from there, you can go on to even more advanced topics like probability.

13
00:00:44,250 --> 00:00:47,100
So this is a process based on skill building.

14
00:00:49,040 --> 00:00:54,980
Now, it's important to keep in mind that this lecture is not just about my courses and how to navigate

15
00:00:54,980 --> 00:00:59,840
them, although that is the question I'm answering, since that's the question I get multiple times

16
00:00:59,840 --> 00:01:00,350
per day.

17
00:01:01,100 --> 00:01:06,770
But this is also a lecture on how different machine learning topics are related and what depends on

18
00:01:06,770 --> 00:01:07,310
what else.

19
00:01:07,790 --> 00:01:13,520
So even if you end up not taking any of my courses, it doesn't change the fact that these topics are

20
00:01:13,520 --> 00:01:16,160
still related to each other in these ways.

21
00:01:17,150 --> 00:01:20,480
So in some sense, this lecture isn't really about my courses at all.

22
00:01:20,900 --> 00:01:25,910
It's a general lecture about how different types of machine learning models are related and in what

23
00:01:25,910 --> 00:01:29,330
order you might want to learn them from a skill building perspective.

24
00:01:32,310 --> 00:01:34,650
Now, I want to show you this stuff in a visual way.

25
00:01:34,890 --> 00:01:39,330
I've actually set up a webpage where you can go and look at the chart I made for yourself.

26
00:01:39,720 --> 00:01:42,030
So what I want you to do is go to this website.

27
00:01:42,390 --> 00:01:43,350
deeplearningcourses

28
00:01:43,350 --> 00:01:43,980
.com

29
00:01:45,250 --> 00:01:47,860
Now I want you to click on the catalog link.

30
00:01:49,130 --> 00:01:52,190
And from there, you want to click on the "Click Here" link.

31
00:01:54,740 --> 00:01:57,440
So this brings us to the course order webpage.

32
00:01:59,040 --> 00:02:04,710
So at the top, you have sort of this linear chart, right? It's just one course after the other,

33
00:02:04,710 --> 00:02:05,370
after the other.

34
00:02:06,850 --> 00:02:09,610
But this is not really what I want to look at in this lecture.

35
00:02:10,210 --> 00:02:13,540
This linear chart gives you a basic overview, but it's not ideal.

36
00:02:14,260 --> 00:02:18,940
The reason is the relationships between courses are way more complex than this.

37
00:02:19,390 --> 00:02:22,960
Sometimes one course gives you the skills you need for two other courses.

38
00:02:23,260 --> 00:02:28,000
Sometimes there will be a course that's so complex that it depends on multiple different courses.

39
00:02:28,570 --> 00:02:31,270
So this is the perfect opportunity to use a graph.

40
00:02:33,120 --> 00:02:37,560
So if you scroll up to the top, you can use this link to just jump down to the bottom.

41
00:02:38,160 --> 00:02:42,360
And we have this graph where you can see the dependencies between each course.

42
00:02:45,580 --> 00:02:50,860
So what I'm going to do is I'm going to go through each link in this graph and explain to you why it

43
00:02:50,860 --> 00:02:52,420
exists and why it's important.

44
00:02:53,290 --> 00:02:54,460
So let's look at the SQL

45
00:02:54,460 --> 00:02:59,710
course first, since that has no links. It might show up in the middle, so you can just drag it

46
00:02:59,710 --> 00:03:00,490
out to the side.

47
00:03:01,660 --> 00:03:05,590
That's the coolest thing about this graph: you can just pick stuff up and drag it around.

48
00:03:05,680 --> 00:03:09,010
It's very interactive. So SQL has no links.

49
00:03:09,640 --> 00:03:14,530
This isn't a prerequisite to any other course, nor is any other course a prerequisite to it.

50
00:03:15,100 --> 00:03:15,990
It is standalone.

51
00:03:16,000 --> 00:03:17,380
You can take it at any time.

52
00:03:20,230 --> 00:03:24,460
Next, you'll notice that there are courses that have no links going into them.

53
00:03:25,060 --> 00:03:31,210
So, for example, the NumPy course. This is because none of my other courses are a prerequisite to

54
00:03:31,210 --> 00:03:31,900
this course.

55
00:03:32,410 --> 00:03:37,990
This course was just designed to give you a basic understanding of the syntax and tools we use in data

56
00:03:37,990 --> 00:03:39,220
science and machine learning.

57
00:03:40,370 --> 00:03:43,550
The goal is actually to not do any machine learning in this course.

58
00:03:43,580 --> 00:03:49,040
Believe me, it was actually quite difficult to accomplish that because it's so easy to use something

59
00:03:49,040 --> 00:03:50,580
in machine learning as an example.

60
00:03:50,600 --> 00:03:53,120
It was very hard to avoid talking about it.

61
00:03:53,810 --> 00:03:57,500
You'll also notice that there are some courses with no outgoing edges.

62
00:03:57,890 --> 00:04:00,290
You can think of these as the most advanced courses.

63
00:04:00,290 --> 00:04:01,790
They are at the top of the ladder.

64
00:04:02,450 --> 00:04:07,820
So, for example, this one right here. Of course, that could change in the future as I create more

65
00:04:07,820 --> 00:04:10,520
courses, in which case I will update this lecture.

66
00:04:13,020 --> 00:04:15,600
So let's start with what's coming out of NumPy.

67
00:04:16,820 --> 00:04:23,420
You'll see that there are two links: one goes to linear regression and one goes to Bayesian machine

68
00:04:23,420 --> 00:04:26,560
learning: A/B testing. And by the way, you can zoom into this graph.

69
00:04:26,570 --> 00:04:28,340
So that's what I'm going to do right now.

70
00:04:31,150 --> 00:04:33,790
So here's linear regression, and here's Bayesian machine learning.

71
00:04:35,170 --> 00:04:41,170
So while NumPy goes to both of these courses, I actually strongly recommend linear regression as the

72
00:04:41,170 --> 00:04:41,770
next step.

73
00:04:42,580 --> 00:04:47,920
Bayesian machine learning doesn't depend on any other courses per se, but it's still more advanced.

74
00:04:47,950 --> 00:04:52,990
It requires a more advanced understanding of probability and of real-world engineering problems like

75
00:04:52,990 --> 00:04:55,900
optimizing the click-through rate of a link on your website.

76
00:04:56,740 --> 00:05:02,140
So linear regression would be my number one recommendation for your foray into machine learning.

77
00:05:03,150 --> 00:05:06,810
Now, after linear regression, we have logistic regression.

78
00:05:08,570 --> 00:05:11,600
So why do we go from linear regression to logistic regression?

79
00:05:12,170 --> 00:05:15,380
Well, the basic idea is both of these are linear models.

80
00:05:15,560 --> 00:05:20,750
In other words, the model is of the form y = mx + b, which is a line.

81
00:05:21,710 --> 00:05:26,960
The difference is that linear regression does regression while logistic regression does classification.

82
00:05:27,500 --> 00:05:32,420
We want to learn classification because that's one of the essential tasks of a neural network, which

83
00:05:32,420 --> 00:05:33,410
is what comes after.
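
As a rough sketch of that idea (not code from any of the courses), both models below compute the same line, mx + b. The logistic version then passes the result through a sigmoid so the output can be read as a class probability. The function names and the values of m and b are made up for illustration.

```python
import math


def linear_regression_predict(x, m, b):
    """Regression: the prediction is the value of the line itself."""
    return m * x + b


def logistic_regression_predict(x, m, b):
    """Classification: squash the same line through a sigmoid to get a probability."""
    z = m * x + b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, chosen only for this example.
m, b = 2.0, -1.0

y_hat = linear_regression_predict(3.0, m, b)   # a real number on the line
p = logistic_regression_predict(3.0, m, b)     # a probability between 0 and 1
label = int(p >= 0.5)                          # threshold to get a class

print(y_hat, p, label)
```

The only difference between the two functions is the sigmoid at the end, which is why one course leads naturally into the other.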

84
00:05:34,730 --> 00:05:40,730
And you might notice that linear regression also has two outgoing links, so there's one to logistic

85
00:05:40,730 --> 00:05:43,130
regression and one to reinforcement learning.

86
00:05:44,970 --> 00:05:50,310
This is because my reinforcement learning class makes use of linear regression, so naturally, if you

87
00:05:50,310 --> 00:05:53,190
don't understand linear regression, you won't understand that part.

88
00:05:53,850 --> 00:05:58,710
But again, this is a much weaker link than going from linear regression to logistic regression.

89
00:05:59,830 --> 00:06:03,520
This is because reinforcement learning is actually conceptually more difficult.

90
00:06:04,180 --> 00:06:09,040
It doesn't depend on the content of any other courses directly other than what we've shown.

91
00:06:09,370 --> 00:06:11,530
But the ideas behind it are more advanced.

92
00:06:12,040 --> 00:06:17,230
You really want to have some experience with both supervised and unsupervised learning before jumping

93
00:06:17,230 --> 00:06:18,520
into reinforcement learning.

94
00:06:18,820 --> 00:06:21,520
It helps to build the proper perspective on the subject.

95
00:06:22,060 --> 00:06:26,440
We will get back to reinforcement learning later, but for now, let's go along the deep learning path.
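
To make the graph idea concrete, here is a minimal sketch that stores the links described so far as a prerequisite map and asks Python's standard-library `graphlib` (Python 3.9+) for one valid order to take the courses in. The dictionary is an assumption built only from the links mentioned above, not the full chart on the website.

```python
from graphlib import TopologicalSorter

# Each course maps to the set of courses it depends on.
# Only the links mentioned so far in this lecture are included.
prerequisites = {
    "SQL": set(),
    "NumPy": set(),
    "Linear Regression": {"NumPy"},
    "Bayesian ML: A/B Testing": {"NumPy"},
    "Logistic Regression": {"Linear Regression"},
    "Reinforcement Learning": {"Linear Regression"},
}

# One valid ordering in which every prerequisite comes first.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)

# Courses with no incoming and no outgoing links are standalone.
has_dependents = set().union(*prerequisites.values())
standalone = [
    course
    for course, deps in prerequisites.items()
    if not deps and course not in has_dependents
]
print(standalone)
```

A topological order is not unique, which matches the point that a single linear chart cannot capture every dependency.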

96
00:06:27,880 --> 00:06:31,490
So in the deep learning path, we last discussed logistic regression.

97
00:06:31,510 --> 00:06:33,280
This is also known as a neuron.

98
00:06:33,910 --> 00:06:38,620
This is very important because in order to build a neural network, we must first know how to build

99
00:06:38,620 --> 00:06:39,280
a neuron.

100
00:06:39,880 --> 00:06:42,850
A neural network is just a bunch of neurons stuck together.
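
Here is a toy illustration of "neurons stuck together", assuming each neuron is a logistic regression unit (a weighted sum followed by a sigmoid). The weights are arbitrary numbers picked for the example, and there is no training step here.

```python
import math


def neuron(inputs, weights, bias):
    """A single logistic unit: weighted sum of the inputs, then a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def tiny_network(inputs, hidden_params, output_params):
    """Two layers: several hidden neurons feed one output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_params]
    w_out, b_out = output_params
    return neuron(hidden, w_out, b_out)


# Hypothetical weights: two hidden neurons, one output neuron.
hidden_params = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_params = ([1.0, -1.0], 0.0)

p = tiny_network([1.0, 1.0], hidden_params, output_params)
print(p)
```

The output neuron is itself just another logistic unit whose inputs happen to be the outputs of the hidden neurons.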

101
00:06:43,480 --> 00:06:49,660
So this leads us to deep learning in Python part one, which is the bright green logo.

102
00:06:50,750 --> 00:06:56,840
This course teaches you how neural networks work and basic fundamental operations like backpropagation.

103
00:06:57,720 --> 00:07:03,510
Once you know the basics of deep learning, you can move on to deep learning part two or modern deep

104
00:07:03,510 --> 00:07:04,410
learning in Python.

105
00:07:05,580 --> 00:07:11,730
This is the same image, but in a darker color to signify that it's a continuation of the previous course.

106
00:07:12,420 --> 00:07:16,980
As the title suggests, this course is all about modern developments in deep learning.

107
00:07:17,430 --> 00:07:23,400
So we go over ways we've improved basic backpropagation, like momentum and adaptive learning rate techniques

108
00:07:23,400 --> 00:07:24,930
like RMSprop and Adam.

109
00:07:25,470 --> 00:07:31,770
We talk about modern regularization techniques like dropout and batch normalization, and most importantly,

110
00:07:31,770 --> 00:07:35,100
we look at modern deep learning libraries like Theano and TensorFlow.

111
00:07:35,760 --> 00:07:39,990
We also take a look at Keras, PyTorch, MXNet, and CNTK.

112
00:07:40,990 --> 00:07:45,460
You'll see that once you know the fundamentals, things don't change that much from one library to the

113
00:07:45,460 --> 00:07:47,380
next. It's all the same concept.

114
00:07:48,960 --> 00:07:53,850
Now, learning about these modern libraries sets the stage for the next level, which is applying deep

115
00:07:53,850 --> 00:08:00,330
learning to certain special data formats. The two fundamental types of data are text and images.

116
00:08:00,960 --> 00:08:05,700
These are special because they have unique structural properties, and we can make use of our knowledge

117
00:08:05,700 --> 00:08:09,090
of those properties in order to build better neural networks.

118
00:08:09,720 --> 00:08:16,560
For example, we know that text is made up of sentences and sentences are sequences of words, so sequential

119
00:08:16,560 --> 00:08:17,610
modeling is important.

120
00:08:18,420 --> 00:08:24,480
We know that images are two-dimensional objects, but importantly, each pixel is very likely to be

121
00:08:24,480 --> 00:08:26,100
the same as nearby pixels.

122
00:08:26,490 --> 00:08:32,909
For example, if you pick a random pixel on an image of a red car, if that pixel is red, then most

123
00:08:32,909 --> 00:08:35,760
likely the pixels around it are also going to be red.

124
00:08:36,390 --> 00:08:40,590
There is also no question about the practical applicability of text and images.

125
00:08:41,070 --> 00:08:46,380
The internet is made up of text, so by doing machine learning on text, you're learning techniques

126
00:08:46,380 --> 00:08:50,250
that essentially allow you to write models of the entire world's knowledge.

127
00:08:50,940 --> 00:08:52,440
Images are also everywhere.

128
00:08:52,770 --> 00:08:59,490
Snapchat, Instagram, Facebook, self-driving cars, home security systems. Knowing how to deal with images

129
00:08:59,490 --> 00:09:03,570
is super important, and billion-dollar companies exist because they are good at it.

130
00:09:04,290 --> 00:09:09,750
So that brings us to the next course in the deep learning series: convolutional neural networks, or Deep

131
00:09:09,750 --> 00:09:11,250
Learning in Python Part three.

132
00:09:11,880 --> 00:09:17,040
This is all about how to incorporate convolution into neural networks and what convolution is in the

133
00:09:17,040 --> 00:09:21,000
first place, and how that helps us make use of the structure of images.

134
00:09:23,270 --> 00:09:24,260
Let's scroll up a bit.

135
00:09:25,610 --> 00:09:32,510
So after convolutional neural networks, we take a detour into unsupervised deep learning. Unsupervised

136
00:09:32,510 --> 00:09:35,450
deep learning isn't very popular with beginners, but it should be.

137
00:09:35,960 --> 00:09:42,080
If you look at the experts of deep learning like Yann LeCun, Yoshua Bengio and Geoffrey Hinton, they

138
00:09:42,080 --> 00:09:45,440
all talk about how important unsupervised deep learning is.

139
00:09:46,100 --> 00:09:48,680
This course introduces us to several new ideas.

140
00:09:49,370 --> 00:09:52,730
First, the idea of latent variables or latent causes.

141
00:09:53,360 --> 00:09:58,160
These are factors that you don't directly observe in your data, but that you can learn about by studying

142
00:09:58,160 --> 00:09:59,480
the structure of your data.

143
00:10:00,450 --> 00:10:06,410
Second, modern techniques for data visualization like t-SNE. This will be very important since,

144
00:10:06,410 --> 00:10:11,180
as I discussed earlier, along with images, text is a really important data format.

145
00:10:11,810 --> 00:10:15,920
In order to visualize text in later courses, we will be making use of t-SNE.

146
00:10:16,280 --> 00:10:18,470
So it's good to know what it is and why it's useful.

147
00:10:19,340 --> 00:10:26,450
Third, unsupervised pre-training. Unsupervised pre-training is the foundation of modern ideas like transfer

148
00:10:26,450 --> 00:10:31,490
learning, and we make use of it both in NLP and with convolutional neural networks.

149
00:10:32,490 --> 00:10:37,740
Fourth, the vanishing gradient problem. We demonstrate the vanishing gradient problem directly, so

150
00:10:37,740 --> 00:10:43,200
you understand where it comes from. Vanishing gradients are especially important in the context of

151
00:10:43,200 --> 00:10:44,430
recurrent neural networks.

152
00:10:45,600 --> 00:10:50,190
And it just so happens that recurrent neural networks is the next course in the deep learning series:

153
00:10:50,460 --> 00:10:52,170
Deep Learning in Python Part five.

154
00:10:53,100 --> 00:10:57,630
So recall that our two fundamentally interesting data types are text and images.

155
00:10:58,140 --> 00:11:01,410
We've looked at images, and in this course, we start to look at text.

156
00:11:02,010 --> 00:11:02,970
Why might that be?

157
00:11:03,540 --> 00:11:08,700
Well, recurrent neural networks specialize in modeling sequences, and text is just a sequence of

158
00:11:08,700 --> 00:11:09,150
words.

159
00:11:09,450 --> 00:11:13,890
So it's the best type of data to use to demonstrate the principles of RNNs.