1
00:00:02,090 --> 00:00:04,280
Hey everyone, and welcome back to this class.

2
00:00:05,370 --> 00:00:10,740
In this lecture, I'm going to answer a question that I get pretty much multiple times per day, and

3
00:00:10,740 --> 00:00:13,240
that is: what order should I take your courses in?

4
00:00:14,130 --> 00:00:18,110
This is a great question and it's something that even I have to think about sometimes.

5
00:00:18,120 --> 00:00:19,950
What is the best way to learn all this stuff?

6
00:00:20,560 --> 00:00:25,860
So I think that what you'll find is this ordering makes a lot of sense in that we're always building

7
00:00:25,860 --> 00:00:27,210
on skills we learned before.

8
00:00:27,760 --> 00:00:29,450
So it's just like learning calculus.

9
00:00:29,880 --> 00:00:34,890
First you learn calculus one, then you learn calculus two, which is kind of the opposite of calculus

10
00:00:34,890 --> 00:00:35,250
one.

11
00:00:35,610 --> 00:00:40,050
Then you learn Calculus three, which expands on the ideas from both Calculus one and two.

12
00:00:40,260 --> 00:00:43,840
And then from there you can go on to even more advanced topics like probability.

13
00:00:44,220 --> 00:00:47,130
So this is a process based on skill building.

14
00:00:49,010 --> 00:00:54,980
Now, it's important to keep in mind that this lecture is not just about my courses and how to navigate

15
00:00:54,980 --> 00:01:00,050
them, although that is the question I'm answering, since that's a question I get multiple times per

16
00:01:00,050 --> 00:01:00,370
day.

17
00:01:01,100 --> 00:01:06,770
But this is also a lecture on how different machine learning topics are related and what depends on

18
00:01:06,770 --> 00:01:07,320
what else.

19
00:01:07,760 --> 00:01:13,520
So even if you end up not taking any of my courses, it doesn't change the fact that these topics are

20
00:01:13,520 --> 00:01:16,190
still related to each other in these ways.

21
00:01:17,150 --> 00:01:20,490
So in some sense, this lecture isn't really about my courses at all.

22
00:01:20,870 --> 00:01:25,910
It's a general lecture about how different types of machine learning models are related and in what

23
00:01:25,910 --> 00:01:29,390
order you might want to learn them from a skill building perspective.

24
00:01:32,280 --> 00:01:34,660
Now, I want to show you this stuff in a visual way.

25
00:01:34,860 --> 00:01:39,360
I've actually set up a webpage you can go to and look at the chart I made for yourself.

26
00:01:39,720 --> 00:01:44,010
So what I want you to do is go to this website, deeplearningcourses.com.

27
00:01:45,250 --> 00:01:47,920
Now, I want you to click on the catalog link.

28
00:01:49,130 --> 00:01:52,220
And from there, you want to click on the Click Here link.

29
00:01:54,530 --> 00:01:57,470
And so this brings us to the course order webpage.

30
00:01:59,010 --> 00:02:04,740
So at the top, you have sort of this linear chart, right, it's just one course after the other,

31
00:02:04,740 --> 00:02:05,400
after the other.

32
00:02:06,850 --> 00:02:12,100
But this is not really what I want to look at in this lecture. This linear chart gives you a basic

33
00:02:12,100 --> 00:02:13,510
overview, but it's not ideal.

34
00:02:14,290 --> 00:02:18,970
The reason is the relationships between courses are way more complex than this.

35
00:02:19,390 --> 00:02:22,990
Sometimes one course gives you the skills you need for two other courses.

36
00:02:23,240 --> 00:02:28,010
Sometimes there will be a course that's so complex that it depends on multiple different courses.

37
00:02:28,570 --> 00:02:31,300
So this is the perfect opportunity to use a graph.

38
00:02:33,090 --> 00:02:39,060
So if you scroll up to the top, you can use this link to just jump down to the bottom and we have this

39
00:02:39,060 --> 00:02:42,390
graph where you can see the dependencies between each course.

40
00:02:45,550 --> 00:02:50,890
So what I'm going to do is I'm going to go through each link in this graph and explain to you why it

41
00:02:50,890 --> 00:02:52,460
exists and why it's important.

42
00:02:53,260 --> 00:02:56,710
So let's look at the SQL course first, since that has no links.

43
00:02:57,310 --> 00:03:00,520
It may show up in the middle, so you can just drag it out to the side.

44
00:03:01,630 --> 00:03:05,660
That's the coolest thing about this graph: you can just pick stuff up and drag it around.

45
00:03:05,680 --> 00:03:06,940
It's very interactive.

46
00:03:07,360 --> 00:03:09,040
So SQL has no links.

47
00:03:09,640 --> 00:03:14,580
This isn't a prerequisite to any other course, nor is any other course a prerequisite to it.

48
00:03:15,070 --> 00:03:16,020
It is standalone.

49
00:03:16,030 --> 00:03:17,400
You can take it at any time.

50
00:03:20,230 --> 00:03:26,020
Next, you'll notice that there are courses that have no links going into them, so, for example,

51
00:03:26,020 --> 00:03:31,910
the NumPy course. This is because none of my other courses are a prerequisite to this course.

52
00:03:32,410 --> 00:03:37,690
This course was just designed to give you a basic understanding of the syntax and tools we use

53
00:03:37,690 --> 00:03:39,220
in data science and machine learning.

54
00:03:40,370 --> 00:03:43,580
The goal is actually to not do any machine learning in this course.

55
00:03:43,610 --> 00:03:49,070
Believe me, it was actually quite difficult to accomplish that because it's so easy to use something

56
00:03:49,070 --> 00:03:50,600
in machine learning as an example.

57
00:03:50,660 --> 00:03:53,120
It was very hard to avoid talking about it.

58
00:03:53,810 --> 00:03:57,530
You'll also notice that there are some courses with no outgoing edges.

59
00:03:57,860 --> 00:04:00,320
You can think of these as the most advanced courses.

60
00:04:00,320 --> 00:04:01,800
They are at the top of the ladder.

61
00:04:02,450 --> 00:04:07,820
So, for example, this one right here. Of course, that could change in the future as I create more

62
00:04:07,820 --> 00:04:10,580
courses, in which case I will update this lecture.

63
00:04:12,990 --> 00:04:15,630
So let's start with what's coming out of NumPy.

64
00:04:16,790 --> 00:04:23,420
You'll see that there are two links: one goes to linear regression and one goes to Bayesian machine

65
00:04:23,420 --> 00:04:24,700
learning: A/B testing.

66
00:04:24,710 --> 00:04:26,570
And by the way, you can zoom into this graph.

67
00:04:26,580 --> 00:04:28,360
So that's what I'm going to do right now.

68
00:04:31,180 --> 00:04:33,820
So here's linear regression, and here's Bayesian A/B testing.

69
00:04:35,170 --> 00:04:41,200
So while NumPy goes to both of these courses, I actually strongly recommend linear regression as the

70
00:04:41,200 --> 00:04:41,830
next step.

71
00:04:42,580 --> 00:04:47,970
Bayesian machine learning doesn't depend on any other courses per se, but it's still more advanced.

72
00:04:48,010 --> 00:04:53,020
It requires a more advanced understanding of probability and of real world engineering problems, like

73
00:04:53,020 --> 00:04:55,920
optimizing the click-through rate of a link on your website.

74
00:04:56,710 --> 00:05:02,200
So linear regression would be my number one recommendation for your foray into machine learning.

75
00:05:03,150 --> 00:05:06,840
Now, after linear regression, we have logistic regression.

76
00:05:08,540 --> 00:05:11,600
So why do we go from linear regression to logistic regression?

77
00:05:12,200 --> 00:05:15,540
Well, the basic idea is both of these are linear models.

78
00:05:15,560 --> 00:05:20,730
In other words, the model is of the form y equals mx plus b, which is a line.

79
00:05:21,670 --> 00:05:26,970
The difference is that linear regression does regression while logistic regression does classification.

80
00:05:27,500 --> 00:05:32,450
We want to learn classification because that's one of the central tasks of a neural network, which

81
00:05:32,450 --> 00:05:33,440
is what comes after.

82
00:05:34,670 --> 00:05:40,760
Now, you might notice that linear regression also has two outgoing links, so there's one to logistic

83
00:05:40,760 --> 00:05:43,160
regression and one to reinforcement learning.

84
00:05:44,970 --> 00:05:48,930
This is because my reinforcement learning class makes use of linear regression.

85
00:05:48,960 --> 00:05:53,220
So naturally, if you don't understand linear regression, you won't understand that part.

86
00:05:53,820 --> 00:05:58,740
But again, this is a much weaker link than going from linear regression to logistic regression.

87
00:05:59,830 --> 00:06:03,530
This is because reinforcement learning is actually conceptually more difficult.

88
00:06:04,120 --> 00:06:09,760
It doesn't depend on the content of any other courses directly other than what we've shown, but the

89
00:06:09,760 --> 00:06:11,560
ideas behind it are more advanced.

90
00:06:12,040 --> 00:06:17,230
You really want to have some experience with both supervised and unsupervised learning before jumping

91
00:06:17,230 --> 00:06:18,550
into reinforcement learning.

92
00:06:18,790 --> 00:06:21,530
It helps to build the proper perspective on the subject.

93
00:06:22,060 --> 00:06:24,160
We will get back to reinforcement learning later.

94
00:06:24,160 --> 00:06:26,470
But for now, let's go along the deep learning path.

95
00:06:27,850 --> 00:06:31,490
So on the deep learning path, we last discussed logistic regression.

96
00:06:31,540 --> 00:06:33,270
This is also known as a neuron.

97
00:06:33,940 --> 00:06:38,650
This is very important because in order to build a neural network, we must first know how to build

98
00:06:38,650 --> 00:06:39,310
a neuron.

99
00:06:39,850 --> 00:06:42,880
A neural network is just a bunch of neurons stuck together.

100
00:06:43,480 --> 00:06:49,690
So this leads us to deep learning in Python, part one, which is the bright green logo.

101
00:06:50,780 --> 00:06:56,840
This course teaches you how neural networks work and basic fundamental operations like backpropagation.

102
00:06:57,750 --> 00:07:03,510
Once you know the basics of deep learning, you can move on to deep learning Part two or modern deep

103
00:07:03,510 --> 00:07:04,440
learning in Python.

104
00:07:05,610 --> 00:07:11,250
This is the same image, but in a darker color, to signify that it's a continuation of the previous

105
00:07:11,250 --> 00:07:16,990
course. As the title suggests, this course is all about modern developments in deep learning.

106
00:07:17,400 --> 00:07:18,450
So we go over ways

107
00:07:18,450 --> 00:07:24,090
we've improved basic backpropagation, like momentum and adaptive learning rate techniques like RMSprop

108
00:07:24,090 --> 00:07:24,950
and Adam.

109
00:07:25,530 --> 00:07:29,970
We talk about modern regularization techniques like dropout and batch normalization.

110
00:07:30,510 --> 00:07:35,110
And most importantly, we look at modern deep learning libraries like Theano and TensorFlow.

111
00:07:35,790 --> 00:07:40,020
We also take a look at Keras, PyTorch, MXNet, and CNTK.

112
00:07:40,990 --> 00:07:45,460
You'll see that once you know the fundamentals, things don't change that much from one library to the

113
00:07:45,460 --> 00:07:47,380
next. It's all the same concept.

114
00:07:48,930 --> 00:07:53,880
Now, learning about these modern libraries sets the stage for the next level, which is applying deep

115
00:07:53,880 --> 00:08:00,390
learning to certain special data formats. The two fundamental types of data are text and images.

116
00:08:00,990 --> 00:08:05,730
These are special because they have unique structural properties and we can make use of our knowledge

117
00:08:05,730 --> 00:08:09,100
of those properties in order to build better neural networks.

118
00:08:09,690 --> 00:08:15,390
For example, we know that text is made up of sentences, and sentences are sequences of words.

119
00:08:15,720 --> 00:08:17,620
So sequential modeling is important.

120
00:08:18,420 --> 00:08:24,510
We know that images are two dimensional objects, but importantly, each pixel is very likely to be

121
00:08:24,510 --> 00:08:26,130
the same as nearby pixels.

122
00:08:26,470 --> 00:08:30,490
For example, suppose you pick a random pixel on an image of a red car.

123
00:08:30,660 --> 00:08:35,760
If that pixel is red, then most likely the pixels around it are also going to be red.

124
00:08:36,390 --> 00:08:40,620
There is also no question about the practical applicability of text and images.

125
00:08:41,100 --> 00:08:42,840
The Internet is made up of text.

126
00:08:42,870 --> 00:08:48,300
So by doing machine learning on text, you're learning techniques that essentially allow you to write

127
00:08:48,300 --> 00:08:50,280
models of the entire world's knowledge.

128
00:08:50,910 --> 00:08:52,460
Images are also everywhere.

129
00:08:52,770 --> 00:08:57,570
Snapchat, Instagram, Facebook, self-driving cars, home security systems.

130
00:08:58,080 --> 00:09:03,000
Knowing how to deal with images is super important, and billion dollar companies exist because they

131
00:09:03,000 --> 00:09:03,600
are good at it.

132
00:09:04,260 --> 00:09:05,970
So that brings us to the next course

133
00:09:05,970 --> 00:09:07,050
in the deep learning series:

134
00:09:07,440 --> 00:09:13,710
Convolutional Neural Networks, or Deep Learning in Python Part three. This is all about how to incorporate

135
00:09:13,710 --> 00:09:19,290
convolution into neural networks and what convolution is in the first place and how that helps us make

136
00:09:19,290 --> 00:09:21,030
use of the structure of images.

137
00:09:23,210 --> 00:09:24,320
Let's scroll up a bit.

138
00:09:25,580 --> 00:09:32,520
So after convolutional neural networks, we take a detour into unsupervised deep learning. Unsupervised

139
00:09:32,520 --> 00:09:35,470
deep learning isn't very popular with beginners, but it should be.

140
00:09:35,900 --> 00:09:41,840
If you look at the experts of deep learning, like Yann LeCun, Yoshua Bengio, and Geoffrey Hinton,

141
00:09:41,870 --> 00:09:45,470
they all talk about how important unsupervised deep learning is.

142
00:09:46,100 --> 00:09:48,720
This course introduces us to several new ideas.

143
00:09:49,370 --> 00:09:52,790
First, the idea of latent variables or latent causes.

144
00:09:53,360 --> 00:09:58,190
These are factors that you don't directly observe in your data, but that you can learn about by studying

145
00:09:58,190 --> 00:09:59,450
the structure of your data.

146
00:10:00,410 --> 00:10:04,090
Second, modern techniques for data visualization like t-SNE.

147
00:10:04,850 --> 00:10:10,490
This will be very important since, as I discussed earlier, along with images, text is a really important

148
00:10:10,490 --> 00:10:11,240
data format.

149
00:10:11,780 --> 00:10:15,950
In order to visualize text in later courses, we will be making use of t-SNE.

150
00:10:16,250 --> 00:10:18,470
So it's good to know what it is and why it's useful.

151
00:10:19,340 --> 00:10:22,010
Third, unsupervised pre-training.

152
00:10:22,670 --> 00:10:28,040
Unsupervised pre-training is the foundation of modern ideas like transfer learning, and we make use of

153
00:10:28,040 --> 00:10:31,520
it both in NLP and with convolutional neural networks.

154
00:10:32,490 --> 00:10:37,560
Fourth, the vanishing gradient problem. We demonstrate the vanishing gradient problem directly,

155
00:10:37,570 --> 00:10:43,080
so you understand where it comes from. Vanishing gradients are especially important in the context

156
00:10:43,080 --> 00:10:44,460
of recurrent neural networks.

157
00:10:45,600 --> 00:10:50,250
And it just so happens that recurrent neural networks is the next course in the deep learning series,

158
00:10:50,490 --> 00:10:52,210
Deep Learning in Python Part five.

159
00:10:53,070 --> 00:10:57,660
So recall, the two fundamentally interesting data types are text and images.

160
00:10:58,140 --> 00:11:01,470
We've looked at images, and in this course we start to look at text.

161
00:11:02,040 --> 00:11:02,950
Why might that be?

162
00:11:03,540 --> 00:11:08,730
Well, recurrent neural networks specialize in modeling sequences, and text is just a sequence of

163
00:11:08,730 --> 00:11:09,180
words.

164
00:11:09,450 --> 00:11:13,920
So it's the best type of data to use to demonstrate the principles of RNNs.