1
00:00:00,420 --> 00:00:07,320
Reinforcement learning is an area of machine learning. It is about taking suitable actions

2
00:00:07,320 --> 00:00:15,030
to maximize reward in a particular situation. It is employed by various software and machines to find

3
00:00:15,030 --> 00:00:20,520
the best possible behavior or path to take in a specific situation.

4
00:00:20,700 --> 00:00:28,410
We can also say that reinforcement learning differs from supervised learning in that, in supervised

5
00:00:28,410 --> 00:00:29,040
learning,

6
00:00:29,280 --> 00:00:37,680
the training data comes with an answer key, so the model is trained on the correct answers.

7
00:00:38,250 --> 00:00:41,820
But in reinforcement learning, there is no answer key.

8
00:00:41,820 --> 00:00:50,190
The reinforcement agent decides what to do to perform the given task, meaning the machine itself finds a way

9
00:00:50,370 --> 00:00:54,210
to solve a particular problem by receiving rewards.

10
00:00:55,170 --> 00:00:59,670
Here we have the environment and we have an agent. First,

11
00:00:59,940 --> 00:01:06,790
this agent needs to take some action to observe the area. If it just goes straight,

12
00:01:06,810 --> 00:01:08,030
there is no reward!

13
00:01:08,220 --> 00:01:11,470
So first comes observation. After it has observed,

14
00:01:11,820 --> 00:01:18,690
there is some internal state, and based on this learning process, it will take some action. If

15
00:01:18,690 --> 00:01:22,920
it goes to the right side in this example, again there is no reward.

16
00:01:23,250 --> 00:01:31,000
But if it goes backward and then turns left, then it will receive a reward.

17
00:01:31,260 --> 00:01:33,240
Meaning this is the correct path.

18
00:01:33,540 --> 00:01:39,690
It will keep going this way until it finally satisfies the system and achieves the goal.
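
To make the loop described above concrete (observe, act, receive a reward or not, try again), here is a minimal sketch in Python. It is not taken from the video. It assumes a tiny made-up environment in which only going backward and then left is rewarded, and it uses tabular Q-learning with purely random exploration as one possible learning rule. The names `step`, `train`, `greedy_path`, the state labels, and all parameter values are hypothetical.

```python
import random

# Hypothetical action set, loosely matching the moves mentioned in the example.
ACTIONS = ["straight", "right", "backward", "left"]


def step(state, action):
    """Assumed toy environment: return (next_state, reward, done)."""
    if state == "start" and action == "backward":
        return "behind", 0.0, False   # no reward yet, but the agent is somewhere new
    if state == "behind" and action == "left":
        return "goal", 1.0, True      # the only rewarded move
    return state, 0.0, True           # every other move ends the episode with no reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error (tabular Q-learning, random exploration)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ("start", "behind") for a in ACTIONS}
    for _ in range(episodes):
        state, done = "start", False
        while not done:
            action = rng.choice(ACTIONS)                     # explore at random
            next_state, reward, done = step(state, action)   # act, then observe the result
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_path(q, max_steps=5):
    """Follow the highest-valued action from the start until the episode ends."""
    path, state, done = [], "start", False
    while not done and len(path) < max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        path.append(action)
        state, _, done = step(state, action)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

There is no answer key anywhere in this sketch. The only feedback is the reward returned by `step`, and the values of moves that never lead to a reward simply stay at zero.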
