1
00:00:00,000 --> 00:00:02,040
Last week, we looked at doing

2
00:00:02,040 --> 00:00:04,290
classification using
text and trying to

3
00:00:04,290 --> 00:00:05,639
train and understand

4
00:00:05,639 --> 00:00:08,685
positive and negative sentiment
in movie reviews.

5
00:00:08,685 --> 00:00:12,135
We finished by looking at
the effect of tokenizing words,

6
00:00:12,135 --> 00:00:14,100
and saw that
our classifier failed

7
00:00:14,100 --> 00:00:16,200
to get any meaningful results.

8
00:00:16,200 --> 00:00:18,570
The main reason for this
was that the context

9
00:00:18,570 --> 00:00:20,820
of words was hard to follow when

10
00:00:20,820 --> 00:00:22,290
the words were broken down into

11
00:00:22,290 --> 00:00:24,690
sub-words and the
sequence in which

12
00:00:24,690 --> 00:00:26,790
the tokens for
the sub-words appear

13
00:00:26,790 --> 00:00:29,550
becomes very important in
understanding their meaning.

14
00:00:29,550 --> 00:00:31,395
Let's take a look at that now.

15
00:00:31,395 --> 00:00:33,960
A neural network is like

16
00:00:33,960 --> 00:00:37,170
a function: when you
feed in data and labels,

17
00:00:37,170 --> 00:00:39,015
it infers the rules from these,

18
00:00:39,015 --> 00:00:41,055
and then you can use those rules.

19
00:00:41,055 --> 00:00:44,245
So it could be seen as a
function a little bit like this,

20
00:00:44,245 --> 00:00:46,615
you take the data and
you take the labels,

21
00:00:46,615 --> 00:00:47,900
and you get the rules.

22
00:00:47,900 --> 00:00:51,695
But this doesn't take any kind
of sequence into account.

23
00:00:51,695 --> 00:00:54,710
To understand why sequences
can be important,

24
00:00:54,710 --> 00:00:56,690
consider this set of numbers.

25
00:00:56,690 --> 00:00:58,130
If you've never seen them before,

26
00:00:58,130 --> 00:01:00,350
they're called the
Fibonacci sequence.

27
00:01:00,350 --> 00:01:02,330
So let's replace
the actual values

28
00:01:02,330 --> 00:01:04,100
with variables such as n_0,

29
00:01:04,100 --> 00:01:07,265
n_1, n_2, etc.,
to denote them.

30
00:01:07,265 --> 00:01:10,670
Then the sequence
itself can be derived

31
00:01:10,670 --> 00:01:13,955
where a number is the sum of
the two numbers before it.

32
00:01:13,955 --> 00:01:16,040
So 3 is 2 plus 1,

33
00:01:16,040 --> 00:01:17,915
5 is 2 plus 3,

34
00:01:17,915 --> 00:01:20,180
8 is 3 plus 5, etc.

35
00:01:20,180 --> 00:01:23,165
Or n_x equals n_(x-1)

36
00:01:23,165 --> 00:01:24,725
plus n_(x-2),

37
00:01:24,725 --> 00:01:27,665
where x is the position
in the sequence.
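
The recurrence above can be sketched in a few lines of Python. This is only an illustration: the function name `sequence` and the starting values 1 and 2 are assumptions, chosen to match the "3 is 2 plus 1" example.

```python
def sequence(count, n_0=1, n_1=2):
    """Return the first `count` values where n_x = n_(x-1) + n_(x-2)."""
    # Start from the two seed values (truncated if fewer are requested).
    values = [n_0, n_1][:count]
    while len(values) < count:
        # Each new value is the sum of the two values before it.
        values.append(values[-1] + values[-2])
    return values


print(sequence(6))  # [1, 2, 3, 5, 8, 13]
```

Each value depends on the two before it, which is why the order of the numbers matters.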

38
00:01:27,665 --> 00:01:31,430
Visualized, it might
also look like this,

39
00:01:31,430 --> 00:01:32,900
one and two feed into

40
00:01:32,900 --> 00:01:35,200
the first function
and three comes out.

41
00:01:35,200 --> 00:01:37,280
Two gets carried
over to the next,

42
00:01:37,280 --> 00:01:40,625
where it's fed in along with
the three to give us a five.

43
00:01:40,625 --> 00:01:43,520
The three is carried on to
the next where it's fed into

44
00:01:43,520 --> 00:01:45,020
the function along
with the five to

45
00:01:45,020 --> 00:01:46,985
get an eight and so on.

46
00:01:46,985 --> 00:01:49,760
This is similar to
the basic idea of

47
00:01:49,760 --> 00:01:52,760
a recurrent neural
network or RNN,

48
00:01:52,760 --> 00:01:55,565
which is often drawn
a little like this.

49
00:01:55,565 --> 00:01:59,045
You have your x as an input
and your y as an output.

50
00:01:59,045 --> 00:02:01,145
But there's also
an element that's fed

51
00:02:01,145 --> 00:02:04,265
into the function from
a previous function.

52
00:02:04,265 --> 00:02:06,740
That becomes a little
more clear when

53
00:02:06,740 --> 00:02:09,295
you chain them
together like this,

54
00:02:09,295 --> 00:02:12,875
x_0 is fed into
the function returning y_0.

55
00:02:12,875 --> 00:02:14,780
An output from the function is

56
00:02:14,780 --> 00:02:16,820
then fed into the next function
along with x_1,

57
00:02:16,820 --> 00:02:18,890
whose output gets fed into
the following function along

58
00:02:18,890 --> 00:02:21,635
with x_2 to get y_2,

59
00:02:21,635 --> 00:02:25,100
producing an output and
continuing the sequence.

60
00:02:25,100 --> 00:02:26,480
As you can see,

61
00:02:26,480 --> 00:02:28,250
there's an element of x_0

62
00:02:28,250 --> 00:02:30,800
fed all the way
through the network,

63
00:02:30,800 --> 00:02:33,835
and similarly with x_1, x_2, etc.
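
As a rough sketch of this chaining, here is a toy recurrent step in plain Python. It is not how any particular library implements an RNN: the names `rnn_step` and `run_rnn`, the scalar weights `w_x` and `w_h`, and the use of `tanh` are all assumptions made for illustration.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """One recurrent step: mix the current input with the carried-over state."""
    h = math.tanh(w_x * x + w_h * h_prev)
    y = h  # in this toy version the output is simply the new state
    return y, h


def run_rnn(xs, h0=0.0):
    """Chain the step over a sequence, passing the state from one step to the next."""
    h = h0
    ys = []
    for x in xs:
        y, h = rnn_step(x, h)
        ys.append(y)
    return ys


ys_a = run_rnn([1.0, 0.0, 0.0])  # non-zero x_0, then zeros
ys_b = run_rnn([0.0, 0.0, 0.0])  # all zeros
print(ys_a)
print(ys_b)
```

The two runs differ only in x_0, yet the later outputs of the first run are still non-zero, because part of x_0 is carried forward through the state.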

64
00:02:33,835 --> 00:02:36,350
This forms the basis of

65
00:02:36,350 --> 00:02:38,915
the recurrent neural
network or RNN.

66
00:02:38,915 --> 00:02:41,540
I'm not going to go into
detail on how they work,

67
00:02:41,540 --> 00:02:43,010
but you can learn much more about

68
00:02:43,010 --> 00:02:45,750
them in this course from Andrew.