1
00:00:02,220 --> 00:00:06,270
Hey everyone, and welcome back to this class. In this lecture,

2
00:00:06,300 --> 00:00:11,940
I'm going to answer a question that I get pretty much multiple times per day, and that is: what order

3
00:00:11,940 --> 00:00:14,110
should I take your courses in?

4
00:00:14,160 --> 00:00:18,130
This is a great question and it's something that even I have to think about sometimes.

5
00:00:18,210 --> 00:00:20,540
What is the best way to learn all this stuff?

6
00:00:20,640 --> 00:00:25,830
So I think that what you'll find is this ordering makes a lot of sense in that we're always building

7
00:00:25,950 --> 00:00:27,640
on skills we learned before.

8
00:00:27,810 --> 00:00:33,510
So it's just like learning calculus: first you learn Calculus 1, then you learn Calculus 2, which is

9
00:00:33,510 --> 00:00:39,360
kind of the opposite of Calculus 1, then you learn Calculus 3, which expands on the ideas from both Calculus

10
00:00:39,360 --> 00:00:44,270
1 and 2, and then from there you can go on to even more advanced topics like probability.

11
00:00:44,280 --> 00:00:49,070
So this is a process based on skill building.

12
00:00:49,100 --> 00:00:54,980
Now, it's important to keep in mind that this lecture is not just about my courses and how to navigate

13
00:00:54,980 --> 00:01:00,960
them, although that is the question I'm answering, since it's one I get multiple times per day.

14
00:01:01,160 --> 00:01:06,770
But this is also a lecture on how different machine learning topics are related and what depends on

15
00:01:06,770 --> 00:01:07,820
what else.

16
00:01:07,820 --> 00:01:13,520
So even if you end up not taking any of my courses, it doesn't change the fact that these topics are

17
00:01:13,520 --> 00:01:20,090
still related to each other in these ways. So in some sense, this lecture isn't really about my courses

18
00:01:20,090 --> 00:01:20,960
at all.

19
00:01:20,960 --> 00:01:25,910
It's a general lecture about how different types of machine learning models are related and in what

20
00:01:25,910 --> 00:01:32,370
order you might want to learn them from a skill building perspective.

21
00:01:32,380 --> 00:01:34,930
Now I want to show you this stuff in a visual way.

22
00:01:34,960 --> 00:01:39,370
I've actually set up a web page where you can go and look at the chart I made for yourself.

23
00:01:39,760 --> 00:01:42,500
So what I want you to do is go to this website:

24
00:01:42,520 --> 00:01:45,300
deeplearningcourses.com.

25
00:01:45,340 --> 00:01:54,650
Now I want you to click on the "Catalog" link, and from there you want to click on the "click here" link.

26
00:01:54,760 --> 00:02:03,030
So this brings us to the course order web page. At the top you have sort of this linear chart, right?

27
00:02:03,040 --> 00:02:08,980
It's just one course after the other after the other, but this is not really what I want to look at in

28
00:02:08,980 --> 00:02:09,950
this lecture.

29
00:02:10,270 --> 00:02:14,340
This linear chart gives you a basic overview, but it's not ideal.

30
00:02:14,350 --> 00:02:19,230
The reason is the relationships between courses are way more complex than this.

31
00:02:19,450 --> 00:02:23,320
Sometimes one course gives you the skills you need for two other courses.

32
00:02:23,320 --> 00:02:28,620
Sometimes there will be a course that's so complex that it depends on multiple different courses.

33
00:02:28,630 --> 00:02:36,140
So this is the perfect opportunity to use a graph. If you scroll up to the top, you can use this link

34
00:02:36,170 --> 00:02:41,900
to just jump down to the bottom, where we have this graph showing the dependencies between each

35
00:02:41,900 --> 00:02:45,610
course.

36
00:02:45,610 --> 00:02:50,890
So what I'm going to do is I'm going to go through each link in this graph and explain to you why it

37
00:02:50,890 --> 00:02:53,350
exists and why it's important.

38
00:02:53,350 --> 00:02:58,630
So let's look at the SQL course first, since that has no links. It might show up in the middle, so you

39
00:02:58,630 --> 00:03:04,260
can just drag it out to the side. That's the coolest thing about this graph: you can just pick

40
00:03:04,260 --> 00:03:05,730
stuff up and drag it around.

41
00:03:05,730 --> 00:03:09,690
It's very interactive. So SQL has no links.

42
00:03:09,690 --> 00:03:15,050
It isn't a prerequisite to any other course, nor is any other course a prerequisite to it.

43
00:03:15,180 --> 00:03:16,010
It is standalone.

44
00:03:16,020 --> 00:03:17,400
You can take it at any time.

45
00:03:20,300 --> 00:03:20,990
Next,

46
00:03:21,020 --> 00:03:25,100
you'll notice that there are courses that have no links going into them.

47
00:03:25,100 --> 00:03:27,540
So, for example, the NumPy course.

48
00:03:27,710 --> 00:03:32,480
This is because none of my other courses are a prerequisite to this course.

49
00:03:32,480 --> 00:03:38,000
This course was just designed to give you a basic understanding of the syntax and tools we use in data

50
00:03:38,000 --> 00:03:40,410
science and machine learning.

51
00:03:40,440 --> 00:03:43,620
The goal is actually to not do any machine learning in this course.

52
00:03:43,620 --> 00:03:49,050
Believe me, it was actually quite difficult to accomplish that, because it's so easy to use something

53
00:03:49,050 --> 00:03:50,670
in machine learning as an example;

54
00:03:50,670 --> 00:03:53,850
it was very hard to avoid talking about it.

55
00:03:53,880 --> 00:03:57,850
You'll also notice that there are some courses with no outgoing edges.

56
00:03:57,960 --> 00:04:02,420
You can think of these as the most advanced courses; they are at the top of the ladder.

57
00:04:02,520 --> 00:04:08,370
So, for example, this one right here. Of course, that could change in the future as I create more courses,

58
00:04:08,430 --> 00:04:17,190
in which case I will update this lecture. So let's start with what's coming out of NumPy. You'll see

59
00:04:17,190 --> 00:04:24,030
that there are two links: one goes to linear regression and one goes to Bayesian machine learning A/B

60
00:04:24,030 --> 00:04:24,710
testing.

61
00:04:24,810 --> 00:04:26,640
And by the way, you can zoom into this graph.

62
00:04:26,660 --> 00:04:28,510
So what I would do right now.

63
00:04:31,160 --> 00:04:32,440
So here's linear regression.

64
00:04:32,450 --> 00:04:37,980
Here's Bayesian machine learning. So while NumPy goes to both of these courses,

65
00:04:38,000 --> 00:04:44,630
I actually strongly recommend linear regression as the next step. Bayesian machine learning doesn't depend

66
00:04:44,630 --> 00:04:49,760
on any other courses per se, but it's still more advanced: it requires a deeper understanding

67
00:04:49,760 --> 00:04:55,190
of probability and of real-world engineering problems, like optimizing the click-through rate of a link

68
00:04:55,190 --> 00:04:55,910
on your website.

69
00:04:56,780 --> 00:05:03,370
So linear regression would be my number one recommendation for your foray into machine learning. Now,

70
00:05:03,380 --> 00:05:05,120
after linear regression,

71
00:05:05,150 --> 00:05:12,260
we have logistic regression. So why do we go from linear regression to logistic regression?

72
00:05:12,260 --> 00:05:15,600
Well, the basic idea is that both of these are linear models.

73
00:05:15,650 --> 00:05:21,590
In other words, the model is of the form y = mx + b, which is a line.

74
00:05:21,770 --> 00:05:27,340
The difference is that linear regression does regression, while logistic regression does classification.

75
00:05:27,560 --> 00:05:32,540
We want to learn classification because that's one of the central tasks of a neural network, which is

76
00:05:32,540 --> 00:05:38,650
what comes after. You might notice that linear regression also has two outgoing links.

77
00:05:38,690 --> 00:05:45,000
So there's one to logistic regression and one to reinforcement learning.

78
00:05:45,010 --> 00:05:48,930
This is because my reinforcement learning class makes use of linear regression.

79
00:05:48,940 --> 00:05:53,230
So naturally, if you don't understand linear regression, you won't understand that part.

80
00:05:53,890 --> 00:06:00,130
But again, this is a much weaker link than going from linear regression to logistic regression. This is

81
00:06:00,130 --> 00:06:03,550
because reinforcement learning is actually conceptually more difficult.

82
00:06:04,270 --> 00:06:10,150
It doesn't depend on the content of any other courses directly, other than what we've shown, but the ideas

83
00:06:10,150 --> 00:06:12,060
behind it are more advanced.

84
00:06:12,130 --> 00:06:17,230
You really want to have some experience with both supervised and unsupervised learning before jumping

85
00:06:17,230 --> 00:06:18,910
into reinforcement learning.

86
00:06:18,910 --> 00:06:22,060
It helps to build the proper perspective on the subject.

87
00:06:22,120 --> 00:06:28,110
We will get back to reinforcement learning later, but for now let's go along the deep learning path. So

88
00:06:28,110 --> 00:06:31,500
on the deep learning path, we last discussed logistic regression.

89
00:06:31,590 --> 00:06:33,900
This is also known as a neuron.

90
00:06:33,990 --> 00:06:38,760
This is very important because, in order to build a neural network, we must first know how to build a

91
00:06:38,760 --> 00:06:43,490
neuron. A neural network is just a bunch of neurons stuck together.

92
00:06:43,530 --> 00:06:51,650
So this leads us to Deep Learning in Python Part 1, which is the bright green logo. This course teaches

93
00:06:51,650 --> 00:06:58,440
you how neural networks work and fundamental operations like backpropagation. Once you know the

94
00:06:58,440 --> 00:06:59,870
basics of deep learning,

95
00:06:59,910 --> 00:07:05,600
you can move on to Deep Learning Part 2, or Modern Deep Learning in Python.

96
00:07:05,640 --> 00:07:12,250
This is the same image but in a darker color to signify that it's a continuation of the previous course.

97
00:07:12,510 --> 00:07:17,460
As the title suggests, this course is all about modern developments in deep learning.

98
00:07:17,460 --> 00:07:23,430
So we go over ways we've improved basic backpropagation, like momentum and adaptive learning rate techniques

99
00:07:23,430 --> 00:07:25,560
like RMSprop and Adam.

100
00:07:25,590 --> 00:07:31,740
We talk about modern regularization techniques like dropout and batch normalization, and most importantly,

101
00:07:31,770 --> 00:07:35,850
we look at modern deep learning libraries like Theano and TensorFlow.

102
00:07:35,850 --> 00:07:43,230
We also take a look at Keras, PyTorch, MXNet, and CNTK. You'll see that once you know the fundamentals,

103
00:07:43,290 --> 00:07:47,390
things don't change that much from one library to the next; it's all the same concepts.

104
00:07:49,010 --> 00:07:54,180
Now, learning about these modern libraries sets the stage for the next level, which is applying deep learning

105
00:07:54,180 --> 00:07:56,890
to certain special data formats.

106
00:07:57,000 --> 00:08:00,890
The two fundamental types of data are text and images.

107
00:08:01,020 --> 00:08:05,700
These are special because they have unique structural properties and we can make use of our knowledge

108
00:08:05,760 --> 00:08:09,690
of those properties in order to build better neural networks.

109
00:08:09,750 --> 00:08:16,590
For example, we know that text is made up of sentences, and sentences are sequences of words, so sequential

110
00:08:16,590 --> 00:08:18,510
modeling is important.

111
00:08:18,510 --> 00:08:24,600
We know that images are two-dimensional objects, but importantly, each pixel is very likely to be the

112
00:08:24,600 --> 00:08:26,500
same as nearby pixels.

113
00:08:26,520 --> 00:08:30,510
For example, pick a random pixel on an image of a red car.

114
00:08:30,780 --> 00:08:36,450
If that pixel is red, then most likely the pixels around it are also going to be red.

115
00:08:36,450 --> 00:08:41,100
There is also no question about the practical applicability of text and images.

116
00:08:41,130 --> 00:08:42,880
The internet is made up of text.

117
00:08:42,930 --> 00:08:48,300
So by doing machine learning on text, you're learning techniques that essentially allow you to write

118
00:08:48,300 --> 00:08:51,030
models of the entire world's knowledge.

119
00:08:51,030 --> 00:08:52,770
Images are also everywhere.

120
00:08:52,830 --> 00:08:57,870
Snapchat, Instagram, Facebook, self-driving cars, home security systems.

121
00:08:58,140 --> 00:09:03,090
Knowing how to deal with images is super important, and billion-dollar companies exist because they are

122
00:09:03,090 --> 00:09:03,750
good at it.

123
00:09:04,320 --> 00:09:09,540
So that brings us to the next course in the deep learning series: convolutional neural networks, or

124
00:09:09,540 --> 00:09:11,900
Deep Learning in Python Part 3.

125
00:09:11,940 --> 00:09:17,040
This is all about how to incorporate convolution into neural networks, what convolution is in the

126
00:09:17,040 --> 00:09:24,280
first place, and how that helps us make use of the structure of images. Let's scroll up a bit.

127
00:09:25,680 --> 00:09:32,520
So after convolutional neural networks, we take a detour into unsupervised deep learning. Unsupervised

128
00:09:32,520 --> 00:09:35,920
deep learning isn't very popular with beginners, but it should be.

129
00:09:36,030 --> 00:09:42,090
If you look at the experts of deep learning, like Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, they

130
00:09:42,090 --> 00:09:46,160
all talk about how important unsupervised deep learning is.

131
00:09:46,170 --> 00:09:49,380
This course introduces us to several new ideas.

132
00:09:49,410 --> 00:09:53,110
First, the idea of latent variables or latent causes.

133
00:09:53,400 --> 00:09:58,200
These are factors that you don't directly observe in your data, but that you can learn about by studying

134
00:09:58,200 --> 00:10:01,030
the structure of your data.

135
00:10:01,300 --> 00:10:04,630
Second, modern techniques for data visualization like t-SNE.

136
00:10:04,900 --> 00:10:10,480
This will be very important since, as I discussed earlier, along with images, text is a really important

137
00:10:10,480 --> 00:10:11,770
data format.

138
00:10:11,890 --> 00:10:16,110
In order to visualize text in later courses, we will be making use of t-SNE.

139
00:10:16,330 --> 00:10:19,360
So it's good to know what it is and why it's useful.

140
00:10:19,360 --> 00:10:26,470
Third, unsupervised pretraining. Unsupervised pretraining is the foundation of modern ideas like transfer

141
00:10:26,470 --> 00:10:32,500
learning, and we make use of it both in NLP and with convolutional neural networks.

142
00:10:32,510 --> 00:10:38,540
Fourth, the vanishing gradient problem. We demonstrate the vanishing gradient problem directly so you understand

143
00:10:38,570 --> 00:10:43,880
where it comes from. Vanishing gradients are especially important in the context of recurrent neural

144
00:10:43,880 --> 00:10:49,680
networks, and it just so happens that recurrent neural networks is the next course in the deep learning

145
00:10:49,680 --> 00:10:50,430
series.

146
00:10:50,520 --> 00:10:52,990
Deep Learning in Python Part 5.

147
00:10:53,160 --> 00:10:58,230
So recall that the two fundamentally interesting data types are text and images.

148
00:10:58,230 --> 00:11:02,030
We've looked at images, and in this course we start to look at text.

149
00:11:02,130 --> 00:11:08,140
Why might that be? Well, recurrent neural networks specialize in modeling sequences, and text is just

150
00:11:08,140 --> 00:11:09,450
a sequence of words.

151
00:11:09,450 --> 00:11:13,910
So it's the best type of data to use to demonstrate the principles of RNNs.