1
00:00:11,610 --> 00:00:17,520
To end and summarize this section, I want to discuss at a high level what it takes to learn reinforcement

2
00:00:17,520 --> 00:00:18,570
learning for real.

3
00:00:19,470 --> 00:00:24,450
Personally, I think it would be quite difficult to grasp reinforcement learning and understand it on

4
00:00:24,450 --> 00:00:28,530
an abstract level from just a single section of a single course.

5
00:00:33,680 --> 00:00:39,320
Let's start by recognizing that reinforcement learning is very different from supervised and unsupervised

6
00:00:39,320 --> 00:00:39,740
learning.

7
00:00:40,790 --> 00:00:46,760
Nowadays, when you're studying supervised and unsupervised learning, many beginner courses take the approach

8
00:00:47,030 --> 00:00:51,230
of avoiding how to implement the models.

9
00:00:51,230 --> 00:00:57,070
This is not a good approach because it doesn't prepare you for understanding real machine learning.

10
00:00:57,110 --> 00:01:02,600
Let's suppose the structure of the course is like this: first, you discuss the intuition behind how the

11
00:01:02,600 --> 00:01:03,600
model works.

12
00:01:03,740 --> 00:01:06,920
Just some pictures and some descriptive ideas.

13
00:01:07,050 --> 00:01:09,400
Then you learn how to use that model in scikit-learn,

14
00:01:09,410 --> 00:01:13,160
not including loading in the data or looking at the results.

15
00:01:14,620 --> 00:01:16,480
So what is the problem with this approach?

16
00:01:17,260 --> 00:01:22,660
Well, it not only allows you to make many mistakes and have serious misunderstandings about the content,

17
00:01:23,290 --> 00:01:29,230
but perhaps the biggest problem with this approach is that you will never know what those misunderstandings

18
00:01:29,230 --> 00:01:30,470
are.

19
00:01:30,490 --> 00:01:34,520
In other words, you are in a place where you don't know what you don't know.

20
00:01:34,720 --> 00:01:38,230
You don't know your mistakes and hence you can't fix them.

21
00:01:38,290 --> 00:01:44,500
You're confident that you've learned something, but this is nothing but blissful ignorance. And again,

22
00:01:44,530 --> 00:01:46,860
these are just my observations of other students.

23
00:01:46,870 --> 00:01:55,030
So don't take it as a personal judgement of you.

24
00:01:55,110 --> 00:01:56,450
This is not so much a problem

25
00:01:56,460 --> 00:02:02,130
if that's how far you want to go. If you only ever plan on using APIs and you never want to become

26
00:02:02,130 --> 00:02:08,400
an actual professional, that's totally fine and nobody is judging you. But it becomes a problem when it's

27
00:02:08,400 --> 00:02:10,690
time to stop using APIs.

28
00:02:11,010 --> 00:02:15,920
This turning point appears when you want to learn how to do reinforcement learning.

29
00:02:16,020 --> 00:02:19,380
There is no API for reinforcement learning, at least not yet.

30
00:02:20,190 --> 00:02:22,200
So what is one to do?

31
00:02:22,200 --> 00:02:25,340
Well, you have to implement those algorithms yourself.

32
00:02:25,350 --> 00:02:31,320
The problem is, if you have no experience implementing machine learning algorithms, reinforcement learning

33
00:02:31,500 --> 00:02:32,910
is not a good place to start.

34
00:02:38,050 --> 00:02:43,900
I also want to comment on the many blogs, tutorials, and so forth that exist out there attempting to

35
00:02:43,900 --> 00:02:46,840
teach you about reinforcement learning.

36
00:02:46,870 --> 00:02:52,390
Consider how long it took us in this section to go from nothing to deep Q-learning, at a pace that I

37
00:02:52,390 --> 00:02:55,430
would consider as fast as possible.

38
00:02:55,450 --> 00:02:59,140
Compare that to the length of a typical blog tutorial.

39
00:02:59,140 --> 00:03:04,810
It should be clear that it would be impossible for a short blog post to include enough detail to get

40
00:03:04,810 --> 00:03:08,650
you to the point where you can actually implement reinforcement learning yourself.

41
00:03:13,920 --> 00:03:18,840
The best advice I can give you if you want to learn reinforcement learning for real is this.

42
00:03:19,500 --> 00:03:25,040
I would recommend taking a full course or multiple courses about reinforcement learning.

43
00:03:25,380 --> 00:03:31,380
Learn about basic tabular reinforcement learning before you move on to approximation methods. That will

44
00:03:31,380 --> 00:03:37,450
make understanding the concepts much easier. In this section, we spent most of our time discussing the

45
00:03:37,450 --> 00:03:43,240
concepts under the assumption that we were using a Q-table and not a neural network.

46
00:03:43,300 --> 00:03:48,190
You want to learn about the basic approaches to reinforcement learning, which can be categorized into

47
00:03:48,490 --> 00:03:51,370
number one, dynamic programming methods.

48
00:03:51,370 --> 00:03:53,580
Number two, Monte Carlo methods.

49
00:03:53,710 --> 00:03:54,850
And number three,

50
00:03:54,850 --> 00:03:56,620
temporal difference methods.

51
00:03:56,830 --> 00:03:59,140
And make sure to spend some time on implementation.

52
00:04:04,240 --> 00:04:05,150
After that,

53
00:04:05,230 --> 00:04:08,770
move on to approximation methods with linear models.

54
00:04:08,830 --> 00:04:13,660
Once you've done that and you've implemented reinforcement learning algorithms with linear models,

55
00:04:13,930 --> 00:04:20,660
move on to using deep learning for function approximation. At this point, you should be very comfortable

56
00:04:20,660 --> 00:04:25,660
with the concepts, but implementation is still another matter entirely.

57
00:04:26,210 --> 00:04:32,030
Unlike supervised and unsupervised learning, reinforcement learning is very hard to get right, even if

58
00:04:32,030 --> 00:04:34,370
you are an expert programmer.

59
00:04:34,370 --> 00:04:39,860
It is very easy to introduce subtle bugs that are difficult or nearly impossible to track down.

60
00:04:45,060 --> 00:04:47,810
And of course, you don't have to take my word for it.

61
00:04:47,810 --> 00:04:53,550
Here is a post by Andrej Karpathy, probably one of the most famous reinforcement learning researchers,

62
00:04:53,850 --> 00:05:00,360
talking about just how difficult it can be. In this comment, another user talks about how they spent

63
00:05:00,390 --> 00:05:06,420
an entire year trying to get Q-learning to work with neural networks for a particular game.

64
00:05:06,480 --> 00:05:12,200
Many students who take my courses get frustrated when an exercise takes more than a few minutes.

65
00:05:12,210 --> 00:05:18,990
Now imagine working on something for one year. Andrej responds to this comment and says it took me six

66
00:05:18,990 --> 00:05:25,290
weeks to get policy gradients working. And keep in mind, he's one of the world's leading researchers in

67
00:05:25,290 --> 00:05:26,590
reinforcement learning.

68
00:05:26,610 --> 00:05:32,160
He has all the state-of-the-art technology he needs on demand, and he's surrounded by other researchers

69
00:05:32,400 --> 00:05:37,390
who are just as smart as he is, so he can ask them for advice or to check his code at any time.

70
00:05:38,710 --> 00:05:44,310
So the key idea I want you to keep in mind is this: as you learn reinforcement learning, realize that

71
00:05:44,310 --> 00:05:48,490
implementation is very non-trivial, even for experts.

72
00:05:48,510 --> 00:05:52,100
You should expect to spend a lot of time and effort trying to get things right.