1
00:00:11,550 --> 00:00:15,990
In this lecture I'm going to give you an outline for the rest of this course.

2
00:00:15,990 --> 00:00:19,830
First, I want to give you a brief summary of what PyTorch is all about.

3
00:00:19,830 --> 00:00:24,600
In case you've had previous experience with TensorFlow or other deep learning libraries such as

4
00:00:24,600 --> 00:00:25,860
Theano.

5
00:00:25,860 --> 00:00:31,950
So in the beginning, there was Theano. Theano was a significant improvement over what was being done previously

6
00:00:32,190 --> 00:00:34,060
for two major reasons.

7
00:00:34,110 --> 00:00:39,210
Number one, for reasons you'll learn about later, writing neural networks from scratch involves doing

8
00:00:39,240 --> 00:00:46,010
a lot of matrix calculus by hand and then copying those equations into code. For those of you who took

9
00:00:46,010 --> 00:00:47,720
my first deep learning course,

10
00:00:47,840 --> 00:00:49,850
you know how difficult this can be.

11
00:00:49,940 --> 00:00:55,880
The Theano library was the first to innovate in this area using automatic differentiation, or autodiff

12
00:00:55,910 --> 00:01:01,670
for short. What that means is you don't have to write down calculus equations, since the computer will

13
00:01:01,670 --> 00:01:03,040
do that for you.

14
00:01:03,530 --> 00:01:07,330
And number two, there's a lot of math that has to happen in a neural network.

15
00:01:07,340 --> 00:01:13,040
And this takes a lot of time, meaning you'd have to wait hours or even days or weeks to train your neural

16
00:01:13,040 --> 00:01:20,180
network. The Theano library was the first to innovate in this area by making use of GPUs, which were originally

17
00:01:20,180 --> 00:01:23,960
designed to improve the performance of PC games.

18
00:01:23,960 --> 00:01:29,680
One of the downsides of older libraries such as Theano is that you have to build everything by yourself.

19
00:01:29,690 --> 00:01:34,720
This can really slow you down, especially if it's your first time writing certain components.

20
00:01:35,060 --> 00:01:40,330
You not only have to write each component on your own, you also have to worry about them being right.

21
00:01:40,580 --> 00:01:45,910
If any single component you wrote is wrong, then it could make your entire program fail.

22
00:01:46,070 --> 00:01:56,710
On top of that, in previous years it was announced that the Theano library would no longer be maintained.

23
00:01:56,730 --> 00:02:01,800
What about TensorFlow? TensorFlow is a pretty popular library thanks to the fact that it's backed by

24
00:02:01,800 --> 00:02:07,040
Google, who at this point kind of run the Internet. TensorFlow used to be really messy.

25
00:02:07,080 --> 00:02:14,730
In fact, probably more so than Theano, but now in version 2.0 it uses the Keras API, which is the total opposite.

26
00:02:14,910 --> 00:02:20,710
It's very high level and very simple, but there is a downside to very high-level and very simple

27
00:02:20,710 --> 00:02:21,350
APIs.

28
00:02:21,600 --> 00:02:26,130
They make it very easy to do common things but hard to do uncommon things.

29
00:02:31,290 --> 00:02:36,420
PyTorch, on the other hand, has been slowly gaining adoption in the field of deep learning thanks to

30
00:02:36,420 --> 00:02:42,370
the fact that it's relatively easy to do common things and still easy to do uncommon things.

31
00:02:42,600 --> 00:02:48,630
For this reason, it's been extremely popular in the research community, who by definition do lots of uncommon

32
00:02:48,630 --> 00:02:49,330
things.

33
00:02:49,380 --> 00:02:51,190
That's their job after all.

34
00:02:51,450 --> 00:02:57,660
PyTorch is developed mainly by another Internet giant, Facebook, specifically the Facebook AI Research

35
00:02:57,660 --> 00:02:58,680
lab.

36
00:02:58,680 --> 00:03:05,610
As with all deep learning libraries, PyTorch supports GPU acceleration and has automatic differentiation,

37
00:03:05,910 --> 00:03:12,390
so there's no need in this course to use tools like calculus and linear algebra to derive backpropagation.

38
00:03:12,390 --> 00:03:15,190
If you hate math then you should feel very lucky.

39
00:03:15,210 --> 00:03:20,100
In fact, we won't discuss backpropagation at all, despite the fact that it's really the main ingredient

40
00:03:20,100 --> 00:03:22,220
that makes everything work.

41
00:03:22,230 --> 00:03:27,090
This, combined with the fact that a lot of the basic building blocks are already built for you, is one

42
00:03:27,090 --> 00:03:31,750
of the major advantages which are going to allow us to blast through each section very quickly.

43
00:03:32,010 --> 00:03:38,070
In the past, it was necessary to focus on each architecture one at a time, with RNNs being the most complex

44
00:03:38,070 --> 00:03:44,720
to implement. These days, ANNs, CNNs, and RNNs can be implemented in just a few lines of code.

45
00:03:49,930 --> 00:03:52,730
All right, so how is this course structured?

46
00:03:52,730 --> 00:03:58,490
First, before we even begin discussing deep learning or any type of statistical modeling, we are going

47
00:03:58,490 --> 00:04:02,790
to look at a new coding environment which I really like, called Google Colab.

48
00:04:03,080 --> 00:04:09,640
Google Colab is basically a Jupyter notebook hosted by Google, but with a lot more bells and whistles.

49
00:04:09,650 --> 00:04:14,720
Personally, I have never been a fan of Jupyter Notebook, since it seemed to have more disadvantages than

50
00:04:14,720 --> 00:04:15,550
advantages.

51
00:04:15,920 --> 00:04:19,020
However, Google Colab is a different beast.

52
00:04:19,100 --> 00:04:23,620
It's hosted by Google so you don't have to use your own computing resources.

53
00:04:23,750 --> 00:04:28,100
It's free so it doesn't matter if you have a slow computer or a fast computer.

54
00:04:28,100 --> 00:04:35,180
Everyone has the same access. It gives you free access to a GPU and TPU for orders of magnitude

55
00:04:35,270 --> 00:04:41,690
faster training and inference. And finally, most of what we need is already installed, so you don't have

56
00:04:41,690 --> 00:04:44,470
to waste any time installing libraries yourself.

57
00:04:49,580 --> 00:04:55,250
Once you've done that, we are going to go through the fundamental architectures involved in deep learning.

58
00:04:55,670 --> 00:05:00,950
Believe it or not, this all starts with linear regression, the line of best fit you learned about in high

59
00:05:00,950 --> 00:05:02,300
school physics.

60
00:05:02,540 --> 00:05:07,940
We'll see that with just one little change, adding the logistic function on top of a linear model,

61
00:05:08,090 --> 00:05:15,340
we will get a neuron. This covers the two major types of supervised learning: classification and regression.

62
00:05:15,350 --> 00:05:19,400
Once you know the basics, it's time to get started with deep learning.

63
00:05:19,400 --> 00:05:24,080
The first deep learning architecture you'll learn about is the artificial neural network, also known

64
00:05:24,080 --> 00:05:26,330
these days as the deep neural network.

65
00:05:27,050 --> 00:05:33,500
Although neither of these names really does a good job of differentiating it from other architectures.

66
00:05:34,070 --> 00:05:38,710
Next, we'll dive into image processing with convolutional neural networks.

67
00:05:38,750 --> 00:05:44,030
You'll learn about convolution and how this works to create neural networks that are specially designed

68
00:05:44,330 --> 00:05:50,660
to achieve better performance on images and other physical signals. Next, you'll learn about recurrent

69
00:05:50,660 --> 00:05:54,080
neural networks, which specialize in working with sequence data.

70
00:05:54,950 --> 00:06:00,110
Unlike my previous courses, where we used RNNs mostly for natural language processing, we are going

71
00:06:00,110 --> 00:06:02,670
to start with time series forecasting.

72
00:06:02,750 --> 00:06:07,940
This is going to let us cover a lot of ground that many other courses simply skip over entirely.

73
00:06:13,090 --> 00:06:14,110
In particular,

74
00:06:14,200 --> 00:06:18,610
we're going to cover the difference between the wrong way to do a time series forecast and the right

75
00:06:18,610 --> 00:06:24,910
way. We're going to look at several time series that simple models like linear regression can and can't

76
00:06:24,910 --> 00:06:31,270
solve, as well as several time series that slightly more complex models like the simple RNN can and

77
00:06:31,270 --> 00:06:38,110
can't solve. For the more difficult problems, we'll see how the LSTM achieves superior performance,

78
00:06:38,650 --> 00:06:44,740
and so you'll see firsthand why LSTMs are useful, rather than me just telling you and you accepting

79
00:06:44,740 --> 00:06:47,490
it as fact without observing it for yourself.

80
00:06:49,650 --> 00:06:55,080
The next example is one of my favorites in the course, which is on stock prediction with LSTMs and

81
00:06:55,090 --> 00:06:56,180
RNNs.

82
00:06:56,190 --> 00:06:59,990
I think most of you will find the contents of these lectures very surprising.

83
00:07:00,630 --> 00:07:06,360
If you've ever learned about stock prediction with LSTMs in the past, be warned: most other instructors

84
00:07:06,360 --> 00:07:08,130
are doing this the wrong way.

85
00:07:08,310 --> 00:07:14,250
In this course, I'm going to teach you why it's wrong, how to correct it, and what some of the real obstacles

86
00:07:14,250 --> 00:07:20,810
are in stock prediction.

87
00:07:20,850 --> 00:07:25,920
The first major part of this course is about the fundamental architectures in deep learning, while the

88
00:07:25,920 --> 00:07:30,230
second major part of this course is focused on applications.

89
00:07:30,300 --> 00:07:35,250
As a side note, if you want to just skip the rest of this lecture and move on with the rest of the course,

90
00:07:35,310 --> 00:07:36,620
that's fine too.

91
00:07:37,810 --> 00:07:41,160
The first application we will look at is natural language processing.

92
00:07:41,170 --> 00:07:47,800
Specifically, with text documents, you'll see how we can use deep learning for text classification, which

93
00:07:47,800 --> 00:07:53,620
is the type of task you would use for spam detection, sentiment analysis, and so on.

94
00:07:53,620 --> 00:07:58,480
We'll also look at embeddings, which are deep learning's method of dealing with categorical data.

95
00:08:03,550 --> 00:08:08,220
Embeddings will lead us to our next application, recommender systems.

96
00:08:08,230 --> 00:08:13,810
Now it might seem weird to think that natural language processing would be somehow related to recommender

97
00:08:13,810 --> 00:08:19,000
systems, but hopefully in this course you will learn about the hidden connection between these two different

98
00:08:19,000 --> 00:08:25,600
fields. Recommender systems are all about how to optimize the products or items that you recommend to

99
00:08:25,600 --> 00:08:27,010
your users.

100
00:08:27,010 --> 00:08:33,120
Facebook, Amazon, and Google have been using recommender systems to improve their profits by the billions.

101
00:08:33,340 --> 00:08:38,350
When you go to Amazon.com, it is full of recommended products for you.

102
00:08:38,350 --> 00:08:43,090
Your Facebook news feed and your Instagram feed are both recommendation lists.

103
00:08:43,090 --> 00:08:50,110
Google's search results, advertisements, and the YouTube sidebar are all examples of how Google uses recommendations.

104
00:08:51,050 --> 00:08:57,130
In short, recommender systems are one of the most practical business applications of deep learning, and

105
00:08:57,130 --> 00:09:01,960
one that can be used in a pretty straightforward manner to improve your profits.

106
00:09:01,960 --> 00:09:08,050
This is in contrast to, say, natural language, where it's not exactly clear how that would directly tie

107
00:09:08,050 --> 00:09:10,150
in to how much money your business makes.

108
00:09:10,150 --> 00:09:13,740
Although it's already a fundamental part of our day-to-day lives.

109
00:09:13,780 --> 00:09:15,160
So don't discount it just yet.

110
00:09:20,260 --> 00:09:23,770
The next application we'll discuss was a game changer for deep learning.

111
00:09:24,370 --> 00:09:29,080
If you've taken any of my deep learning courses in the past, you know that training a deep neural network

112
00:09:29,080 --> 00:09:31,290
takes time, lots of time.

113
00:09:31,420 --> 00:09:35,570
You might end up waiting hours or even days or possibly even weeks.

114
00:09:35,590 --> 00:09:40,660
Luckily, machine learning engineers have found new ways of allowing us to build on top of the work of

115
00:09:40,660 --> 00:09:42,350
others.

116
00:09:42,370 --> 00:09:47,980
The idea is that companies like Google or Facebook or university research groups will train large neural

117
00:09:47,980 --> 00:09:53,620
networks on humongous datasets such as ImageNet, which contains over a million images.

118
00:09:53,650 --> 00:09:58,870
That's not the kind of thing you can feasibly do on your home computer. Using transfer learning, what

119
00:09:58,870 --> 00:10:04,960
we can do is take just a part of their neural network and combine it with our own neural network designed

120
00:10:04,960 --> 00:10:07,260
for a specific task that we want to do.

121
00:10:08,420 --> 00:10:10,480
Results have shown that this is an easy

122
00:10:10,580 --> 00:10:15,070
and, more importantly, fast method of building state-of-the-art deep learning models.

123
00:10:15,200 --> 00:10:26,230
In just a few seconds, you can integrate the power of bleeding-edge neural networks into your own applications.

124
00:10:26,230 --> 00:10:30,230
Next, we have GANs, or generative adversarial networks.

125
00:10:30,250 --> 00:10:35,150
This was named the most interesting invention in machine learning in the past 10 years

126
00:10:35,260 --> 00:10:41,600
by Yann LeCun, a deep learning pioneer who is now the chief AI scientist at Facebook.

127
00:10:41,770 --> 00:10:48,550
GANs are all about generating images using a system of two neural networks: one that generates the images

128
00:10:48,820 --> 00:10:53,430
and one whose job it is to discriminate between real and fake images.

129
00:10:53,560 --> 00:10:59,560
Using this pair of neural networks, they both help each other improve, so the discriminator becomes better

130
00:10:59,560 --> 00:11:01,650
and better at recognizing fakes,

131
00:11:01,810 --> 00:11:07,270
while the generator becomes better and better at creating realistic images that cannot be differentiated

132
00:11:07,270 --> 00:11:14,260
from real ones. When people think about deep learning, they don't often think about how neural networks can

133
00:11:14,260 --> 00:11:19,630
be used to generate things, but rather just how neural networks can make predictions on things.

134
00:11:19,630 --> 00:11:24,080
Predictions are nice, but generating things opens up a whole new world.

135
00:11:24,100 --> 00:11:30,160
Recently, Google announced a conversational AI where you could have a personal assistant do something

136
00:11:30,160 --> 00:11:32,740
like call a restaurant and make a reservation for you.

137
00:11:33,460 --> 00:11:36,000
But that personal assistant is not a person.

138
00:11:36,010 --> 00:11:37,320
It is a robot.

139
00:11:37,390 --> 00:11:39,820
All of its speech is generated by a computer.

140
00:11:44,960 --> 00:11:50,900
Another exciting application of deep learning is in reinforcement learning. Usually, deep learning is

141
00:11:50,900 --> 00:11:54,590
used to create models with a static input and output.

142
00:11:54,620 --> 00:12:00,950
For example, you put in an image and it tells you what this image is of, like a car or a truck, or you

143
00:12:00,950 --> 00:12:05,360
put in an email and your neural network tells you if it's spam or not spam.

144
00:12:05,570 --> 00:12:11,420
But some tasks require multiple steps, which requires having some long-term strategy and keeping track

145
00:12:11,420 --> 00:12:12,610
of state.

146
00:12:12,740 --> 00:12:18,200
For example, if you are playing a game like Super Mario, you can't just look at a still image of the game

147
00:12:18,200 --> 00:12:19,810
and decide what to do.

148
00:12:19,850 --> 00:12:22,820
Instead, you need to have a long-term strategy.

149
00:12:22,940 --> 00:12:27,270
You have to know that you should walk forward to reach the flag for that level.

150
00:12:27,320 --> 00:12:32,160
You have to know that if an enemy is walking towards you, you have to avoid it or attack it.

151
00:12:32,330 --> 00:12:35,540
And also you have to have proper timing in doing so.

152
00:12:35,540 --> 00:12:37,690
This is what reinforcement learning is all about.

153
00:12:42,750 --> 00:12:45,720
To summarize, the outline of the course is like this.

154
00:12:45,870 --> 00:12:51,120
We can split it into two parts: Part 1, architectures, and Part 2, applications.

155
00:12:51,330 --> 00:12:56,060
In Part 1, we discuss the fundamental architectures such as linear models, ANNs,

156
00:12:56,070 --> 00:12:59,080
CNNs, and RNNs. In Part 2,

157
00:12:59,190 --> 00:13:05,400
we discuss applications such as NLP, recommender systems, GANs, transfer learning, reinforcement learning,

158
00:13:05,430 --> 00:13:07,150
and possibly more.

159
00:13:07,170 --> 00:13:09,300
Thanks for listening and I'll see you in the next lecture.