1
00:00:00,000 --> 00:00:03,225
So let's start with
this basic neural network.

2
00:00:03,225 --> 00:00:06,315
It has an embedding
taking my vocab size,

3
00:00:06,315 --> 00:00:09,360
embedding dimensions, and
input length as usual.

4
00:00:09,360 --> 00:00:11,820
The output from
the embedding is flattened,

5
00:00:11,820 --> 00:00:15,135
averaged, and then fed into
a dense neural network.

6
00:00:15,135 --> 00:00:16,770
But we can experiment with

7
00:00:16,770 --> 00:00:18,735
the layers that
bridge the embedding

8
00:00:18,735 --> 00:00:20,280
and the dense by removing

9
00:00:20,280 --> 00:00:22,590
the flatten and pooling from here,

10
00:00:22,590 --> 00:00:25,730
and replacing them with
an LSTM like this.
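
The swap described above might look roughly like the following Keras sketch. It is not the code shown in the video: the values of `vocab_size`, `embedding_dim`, and `max_length`, the layer widths, the use of `GlobalAveragePooling1D` as the pooling bridge, and the helper name `build_model` are all assumptions chosen for illustration.

```python
import tensorflow as tf

# Assumed hyperparameters; the video does not state the actual values.
vocab_size = 10000
embedding_dim = 16
max_length = 120


def build_model(use_lstm: bool) -> tf.keras.Model:
    """Build a binary text classifier with either a pooling or an LSTM bridge."""
    layers = [
        tf.keras.Input(shape=(max_length,)),
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
    ]
    if use_lstm:
        # The LSTM reads the embedded sequence and returns its final state.
        layers.append(tf.keras.layers.LSTM(32))
    else:
        # Average the embedding vectors across the sequence dimension.
        layers.append(tf.keras.layers.GlobalAveragePooling1D())
    layers += [
        tf.keras.layers.Dense(24, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
    model = tf.keras.Sequential(layers)
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


pooling_model = build_model(use_lstm=False)
lstm_model = build_model(use_lstm=True)
```

Only the bridging layer differs between the two models, which makes it easier to attribute any change in the training curves to the LSTM itself.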

11
00:00:25,730 --> 00:00:29,554
When I trained using the sarcasm
dataset with these,

12
00:00:29,554 --> 00:00:32,420
when just using the pooling
and flattening,

13
00:00:32,420 --> 00:00:35,164
I quickly got close to
85 percent accuracy

14
00:00:35,164 --> 00:00:36,710
and then it flattened out there.

15
00:00:36,710 --> 00:00:39,410
The validation set was
a little less accurate,

16
00:00:39,410 --> 00:00:41,495
but the curves were
quite in sync.

17
00:00:41,495 --> 00:00:44,625
On the other hand,
when using LSTM,

18
00:00:44,625 --> 00:00:46,820
I reached 85 percent accuracy

19
00:00:46,820 --> 00:00:49,220
really quickly and
continued climbing

20
00:00:49,220 --> 00:00:53,975
towards about 97.5 percent
accuracy within 50 epochs.

21
00:00:53,975 --> 00:00:56,720
The validation set
dropped slowly,

22
00:00:56,720 --> 00:00:57,950
but it was still close to

23
00:00:57,950 --> 00:01:01,175
the same value as
the non-LSTM version.

24
00:01:01,175 --> 00:01:03,230
Still, the drop indicates that

25
00:01:03,230 --> 00:01:05,180
there's some overfitting
going on here.

26
00:01:05,180 --> 00:01:08,755
So a bit of tweaking to
the LSTM should help fix that.

27
00:01:08,755 --> 00:01:12,990
Similarly, the loss values
from my non-LSTM

28
00:01:12,990 --> 00:01:14,600
one got to a healthy state

29
00:01:14,600 --> 00:01:17,045
quite quickly and
then flattened out.

30
00:01:17,045 --> 00:01:18,905
Whereas with the LSTM,

31
00:01:18,905 --> 00:01:20,779
the training loss dropped nicely,

32
00:01:20,779 --> 00:01:24,440
but the validation one increased
as I continued training.

33
00:01:24,440 --> 00:01:28,580
Again, this shows some
overfitting in the LSTM network.

34
00:01:28,580 --> 00:01:31,475
While the accuracy of
the prediction increased,

35
00:01:31,475 --> 00:01:34,130
the confidence in it decreased.

36
00:01:34,130 --> 00:01:36,380
So you should be
careful to adjust

37
00:01:36,380 --> 00:01:38,090
your training parameters when

38
00:01:38,090 --> 00:01:40,040
you use different network types.

39
00:01:40,040 --> 00:01:43,650
It's not just a straight
drop-in like I did here.
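
One simple way to act on the overfitting signal described above is to stop training once the validation loss has stopped improving. The sketch below shows that idea in plain Python. The function name `best_epoch`, the `patience` value, and the loss numbers are all hypothetical and are not taken from the video.

```python
def best_epoch(val_losses, patience=3):
    """Return (index of the lowest validation loss seen, index where we stop).

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when the list runs out.
    """
    best_loss = float("inf")
    best_idx = 0
    waited = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                return best_idx, i
    return best_idx, len(val_losses) - 1


# Made-up validation losses for illustration only (not results from the video).
illustrative_val_loss = [0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.63]
best_idx, stop_idx = best_epoch(illustrative_val_loss, patience=3)
print(best_idx, stop_idx)
```

In Keras, the `tf.keras.callbacks.EarlyStopping` callback applies the same kind of rule during `model.fit`. Adding dropout or reducing the LSTM size are other adjustments that could be tried.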