1
00:00:00,000 --> 00:00:02,070
Okay so let's take
a look at the code,

2
00:00:02,070 --> 00:00:03,720
that we're going to use
in this environment.

3
00:00:03,720 --> 00:00:05,550
So if you're going to
use this workbook,

4
00:00:05,550 --> 00:00:08,565
please make sure you're using
a Python 3 environment.

5
00:00:08,565 --> 00:00:09,990
So we're going to
go in here and say

6
00:00:09,990 --> 00:00:11,010
Change runtime type,

7
00:00:11,010 --> 00:00:12,600
make sure it's Python 3,

8
00:00:12,600 --> 00:00:14,010
and I'm just leaving it as a GPU

9
00:00:14,010 --> 00:00:16,080
accelerator to speed things up.

10
00:00:16,080 --> 00:00:18,420
If you import TensorFlow as tf,

11
00:00:18,420 --> 00:00:19,620
and print out the tf version

12
00:00:19,620 --> 00:00:21,120
you'll see the
TensorFlow version.

13
00:00:21,120 --> 00:00:23,010
If you're doing this
course at a later date,

14
00:00:23,010 --> 00:00:24,450
this may be 2.0.

15
00:00:24,450 --> 00:00:26,460
If it is 2.0 then you do not

16
00:00:26,460 --> 00:00:28,425
need to enable eager execution,

17
00:00:28,425 --> 00:00:29,700
but because it's 1.13,

18
00:00:29,700 --> 00:00:32,160
I'm going to turn
on eager execution.

19
00:00:32,160 --> 00:00:35,760
Also, if you don't have
TensorFlow datasets installed,

20
00:00:35,760 --> 00:00:37,140
this line will install it for you

21
00:00:37,140 --> 00:00:39,390
but right now I do
have it installed.

22
00:00:39,390 --> 00:00:41,930
So the next thing I'm
going to do is just import

23
00:00:41,930 --> 00:00:44,300
TensorFlow Datasets as tfds.

24
00:00:44,300 --> 00:00:47,315
Then tfds.load of imdb_reviews

25
00:00:47,315 --> 00:00:50,315
will give me an IMDb
set and an info set.

26
00:00:50,315 --> 00:00:52,670
I'm going to use the info
set in another video,

27
00:00:52,670 --> 00:00:53,960
we're not going to use it here.

28
00:00:53,960 --> 00:00:55,370
But the IMDb set is

29
00:00:55,370 --> 00:00:57,690
what we're going to
be looking at here.

30
00:00:57,760 --> 00:01:02,510
So next up is the code
where I'm going to get

31
00:01:02,510 --> 00:01:03,890
my IMDb training and

32
00:01:03,890 --> 00:01:05,180
my IMDb testing and

33
00:01:05,180 --> 00:01:06,890
load them into training
and test data.

34
00:01:06,890 --> 00:01:08,390
I'm going to create lists of

35
00:01:08,390 --> 00:01:09,920
training sentences and labels,

36
00:01:09,920 --> 00:01:11,570
testing sentences and labels,

37
00:01:11,570 --> 00:01:13,595
and I'm going to
copy the contents of

38
00:01:13,595 --> 00:01:15,935
the tensors into these

39
00:01:15,935 --> 00:01:18,080
so that I can encode
and pad them later.

40
00:01:18,080 --> 00:01:20,825
So this code will just do that.

41
00:01:20,825 --> 00:01:23,150
Secondly also for training,

42
00:01:23,150 --> 00:01:25,460
I need numpy arrays instead
of just straight arrays.

43
00:01:25,460 --> 00:01:27,590
So I'm going to convert
my training labels

44
00:01:27,590 --> 00:01:30,410
that I created into
a numpy array like this.

45
00:01:30,410 --> 00:01:32,570
So that's all done.

46
00:01:32,570 --> 00:01:37,235
So next up is where I'm going
to do my sentence encoding.

47
00:01:37,235 --> 00:01:40,400
So I've decided I'm going to
do a 10,000-word vocab size.

48
00:01:40,400 --> 00:01:41,735
Of course you can change that.

49
00:01:41,735 --> 00:01:43,700
My embedding dimensions,
which you'll see in

50
00:01:43,700 --> 00:01:45,725
a moment will be 16 dimensions.

51
00:01:45,725 --> 00:01:47,690
I'm going to make sure
all of my reviews

52
00:01:47,690 --> 00:01:49,820
are 120 words long.

53
00:01:49,820 --> 00:01:52,655
So if they are shorter than
that they'll be padded.

54
00:01:52,655 --> 00:01:55,025
If they're longer than that
they'll be truncated.

55
00:01:55,025 --> 00:01:57,380
I'm setting the truncation
type to be post,

56
00:01:57,380 --> 00:01:58,460
so we'll cut off the back

57
00:01:58,460 --> 00:02:00,005
of the review and not the front,

58
00:02:00,005 --> 00:02:04,200
and then my out-of-vocabulary
token will be OOV like this.

59
00:02:05,570 --> 00:02:07,770
So now if I import

60
00:02:07,770 --> 00:02:10,550
my TensorFlow Keras
preprocessing Tokenizer,

61
00:02:10,550 --> 00:02:13,870
and pad_sequences is
in sequence as before.

62
00:02:13,870 --> 00:02:16,015
I'll instantiate my tokenizer

63
00:02:16,015 --> 00:02:18,250
passing it my vocabulary size,

64
00:02:18,250 --> 00:02:20,755
which as I said here
earlier on was 10,000.

65
00:02:20,755 --> 00:02:22,165
Of course you can change that.

66
00:02:22,165 --> 00:02:24,475
Then my out-of-vocabulary token,

67
00:02:24,475 --> 00:02:27,850
the tokenizer will then be fit
on the training sentences,

68
00:02:27,850 --> 00:02:30,670
not the testing sentences
just the training ones.

69
00:02:30,670 --> 00:02:33,429
If I want to look at the word
index for the tokenizer,

70
00:02:33,429 --> 00:02:36,440
all I have to do is say
tokenizer.word_index.

71
00:02:36,440 --> 00:02:38,890
I will then convert

72
00:02:38,890 --> 00:02:42,190
my sentences into
sequences of numbers,

73
00:02:42,190 --> 00:02:43,690
with the number being the value

74
00:02:43,690 --> 00:02:45,370
and the word being the key that

75
00:02:45,370 --> 00:02:47,410
were taken out of
the training sentences

76
00:02:47,410 --> 00:02:48,950
when I did fit_on_texts,

77
00:02:48,950 --> 00:02:50,720
and that will give me now my list

78
00:02:50,720 --> 00:02:52,865
of integers per sentence.

79
00:02:52,865 --> 00:02:55,070
If I want to pad
or truncate them I

80
00:02:55,070 --> 00:02:57,140
use pad_sequences to do that.

81
00:02:57,140 --> 00:03:01,040
So each of my sentences
is now a list of numbers.

82
00:03:01,040 --> 00:03:04,370
Again, those numbers are
the values in a key value pair,

83
00:03:04,370 --> 00:03:06,980
where the key of
course is the word.

84
00:03:06,980 --> 00:03:09,050
Then pad_sequences will

85
00:03:09,050 --> 00:03:10,460
ensure that they're
all the same length,

86
00:03:10,460 --> 00:03:15,305
which in this case is 120
words or 120 numbers.

87
00:03:15,305 --> 00:03:17,795
They'll be padded out
or truncated to suit.

88
00:03:17,795 --> 00:03:20,330
I'm then going to do the same
with the testing sequences,

89
00:03:20,330 --> 00:03:22,025
and I'm going to pad
them in the same way.

90
00:03:22,025 --> 00:03:25,730
Do note that the testing
sequences are tokenized based

91
00:03:25,730 --> 00:03:27,410
on the word index that was

92
00:03:27,410 --> 00:03:29,870
learned from the training words.

93
00:03:29,870 --> 00:03:32,810
So you may find a lot more OOVs,

94
00:03:32,810 --> 00:03:34,070
in the testing sequence

95
00:03:34,070 --> 00:03:35,450
than you would have in
the training sequence,

96
00:03:35,450 --> 00:03:36,560
because there'll
be a lot of words

97
00:03:36,560 --> 00:03:37,805
that it hasn't encountered.

98
00:03:37,805 --> 00:03:39,530
But that's what makes
it a good test,

99
00:03:39,530 --> 00:03:41,690
because later if you're
going to try out a review,

100
00:03:41,690 --> 00:03:44,960
you want to be able to do it
to see how it will do with

101
00:03:44,960 --> 00:03:46,519
words that the tokenizer

102
00:03:46,519 --> 00:03:49,500
or the neural network
hasn't previously seen.

103
00:03:49,790 --> 00:03:53,085
So I'll now run this code.

104
00:03:53,085 --> 00:03:56,340
I'll create my sequences,

105
00:03:56,340 --> 00:03:57,600
my padded sequences, my

106
00:03:57,600 --> 00:03:59,465
testing sequences,
my testing padded sequences.

107
00:03:59,465 --> 00:04:01,680
This will take a few moments.

108
00:04:02,470 --> 00:04:04,880
So I can now explore
what this looks

109
00:04:04,880 --> 00:04:06,420
like by running
this block of code.

110
00:04:06,420 --> 00:04:08,270
So for example here
you can see I've just

111
00:04:08,270 --> 00:04:10,460
taken my reverse word index,

112
00:04:10,460 --> 00:04:13,550
and I can decode
my review by taking

113
00:04:13,550 --> 00:04:15,230
a look at the numbers in

114
00:04:15,230 --> 00:04:17,210
that review and reversing
that into a word.

115
00:04:17,210 --> 00:04:19,265
So taking the key for that value

116
00:04:19,265 --> 00:04:21,635
and reverse word index
flips the key value pair.

117
00:04:21,635 --> 00:04:25,010
So we can see here that
the decoded review,

118
00:04:25,010 --> 00:04:26,210
this is what would be fed in.

119
00:04:26,210 --> 00:04:27,710
I saw this film
on true movies,

120
00:04:27,710 --> 00:04:29,630
which automatically made me

121
00:04:29,630 --> 00:04:32,120
out of vocabulary and
the original text.

122
00:04:32,120 --> 00:04:33,935
You can see it's capitalized,

123
00:04:33,935 --> 00:04:36,870
there is punctuation like
brackets and commas in there,

124
00:04:36,870 --> 00:04:38,810
and the word skeptical ended up

125
00:04:38,810 --> 00:04:41,210
being not one of the top
10,000 that were used.

126
00:04:41,210 --> 00:04:43,550
So this just gives us a nice way
of looking at the type of

127
00:04:43,550 --> 00:04:44,570
data that we're going to be

128
00:04:44,570 --> 00:04:46,705
feeding into the neural network.

129
00:04:46,705 --> 00:04:50,090
So let's now take a look at
the neural network itself,

130
00:04:50,090 --> 00:04:51,410
and it's very simple.

131
00:04:51,410 --> 00:04:53,825
It's just a sequential.

132
00:04:53,825 --> 00:04:56,270
The top layer of this is
going to be an embedding,

133
00:04:56,270 --> 00:04:58,220
the embedding is going
to be my vocab size,

134
00:04:58,220 --> 00:05:00,040
the embedding dimensions
that I wanted to use,

135
00:05:00,040 --> 00:05:01,625
I had specified 16.

136
00:05:01,625 --> 00:05:03,845
My input length for that is 120,

137
00:05:03,845 --> 00:05:06,080
which is the maximum length
of the reviews.

138
00:05:06,080 --> 00:05:09,005
So the output of the embedding
will then be flattened,

139
00:05:09,005 --> 00:05:11,000
that will then be passed
into a dense layer,

140
00:05:11,000 --> 00:05:13,610
which is six neurons and
then that will be passed to

141
00:05:13,610 --> 00:05:15,890
a final layer with
a sigmoid activation

142
00:05:15,890 --> 00:05:17,180
and only one neuron,

143
00:05:17,180 --> 00:05:18,950
because I know I've
only got two classes.

144
00:05:18,950 --> 00:05:21,075
I'm just going to do
one neuron instead of two.

145
00:05:21,075 --> 00:05:22,485
I didn't need to one-hot encode.

146
00:05:22,485 --> 00:05:23,840
I'll just do one neuron

147
00:05:23,840 --> 00:05:25,760
and my activation function
being a sigmoid,

148
00:05:25,760 --> 00:05:28,570
it will push it to zero
or one respectively.

149
00:05:28,570 --> 00:05:31,850
I can then compile that and
take a look at the summary.

150
00:05:31,850 --> 00:05:35,645
Here's the summary,
it all looks good.

151
00:05:35,645 --> 00:05:39,590
Again, each of my sentences
is 120 words,

152
00:05:39,590 --> 00:05:41,410
my embedding has 16,

153
00:05:41,410 --> 00:05:43,395
and out of that

154
00:05:43,395 --> 00:05:46,010
the flattened thing
will have 1,920 values.

155
00:05:46,010 --> 00:05:47,150
They get fed into the dense.

156
00:05:47,150 --> 00:05:49,220
They get fed into
the output layer.

157
00:05:49,220 --> 00:05:52,160
So let's train it. So I'm

158
00:05:52,160 --> 00:05:54,970
going to set it for just 10 epochs
and I'm going to fit it.

159
00:05:54,970 --> 00:05:56,700
So it's training, correctly, on

160
00:05:56,700 --> 00:06:01,020
25,000 samples and validating
on 25,000 samples.

161
00:06:01,020 --> 00:06:02,820
Let's see it training.

162
00:06:02,820 --> 00:06:04,350
Our accuracy starts at

163
00:06:04,350 --> 00:06:08,630
73 percent on the training set,
85 on validation.

164
00:06:08,630 --> 00:06:11,330
Training over time is
going to go up nicely.

165
00:06:11,330 --> 00:06:14,915
We're most likely
overfitting on this,

166
00:06:14,915 --> 00:06:17,270
because our accuracy is so high,

167
00:06:17,270 --> 00:06:19,070
but even at that, our validation
accuracy is not

168
00:06:19,070 --> 00:06:21,870
bad. It's in the 80s.

169
00:06:37,420 --> 00:06:40,130
So we can see even by epoch

170
00:06:40,130 --> 00:06:42,365
seven our accuracy is up to one.

171
00:06:42,365 --> 00:06:44,150
Our validation
accuracy is still in

172
00:06:44,150 --> 00:06:46,325
the low 80s, 81, 82 percent.

173
00:06:46,325 --> 00:06:49,800
Pretty good, but
there's clear overfitting going on.

174
00:06:50,570 --> 00:06:53,360
So by the time I've
reached my final epoch,

175
00:06:53,360 --> 00:06:55,700
my training accuracy
was 100 percent,

176
00:06:55,700 --> 00:06:58,655
my validation accuracy
at 82.35 percent.

177
00:06:58,655 --> 00:07:01,735
It's quite healthy but I'm
sure we could do better.

178
00:07:01,735 --> 00:07:05,090
So now let's take
a look at what we'll

179
00:07:05,090 --> 00:07:08,225
do to view this in
the embedding projector.

180
00:07:08,225 --> 00:07:10,310
So first of all,
I'm going to take

181
00:07:10,310 --> 00:07:12,335
the output of my embedding,

182
00:07:12,335 --> 00:07:13,820
which was model.layers[0],

183
00:07:13,820 --> 00:07:15,815
and we can see that there were

184
00:07:15,815 --> 00:07:19,690
10,000 possible words
and I had 16 dimensions.

185
00:07:19,690 --> 00:07:22,370
Here is where I'm going
to iterate through

186
00:07:22,370 --> 00:07:26,165
that array to pull out
the 16 dimensions,

187
00:07:26,165 --> 00:07:28,010
the values for
the 16 dimensions per

188
00:07:28,010 --> 00:07:30,485
word and write that to out_v,

189
00:07:30,485 --> 00:07:33,055
which is my vecs.tsv.

190
00:07:33,055 --> 00:07:35,105
Then the actual word associated

191
00:07:35,105 --> 00:07:37,415
with that will be
written to out_m,

192
00:07:37,415 --> 00:07:39,830
which is my meta.tsv.

193
00:07:39,830 --> 00:07:42,400
So if I run that,

194
00:07:42,400 --> 00:07:44,190
it'll do its trick
and if you're running in

195
00:07:44,190 --> 00:07:45,870
Colab this piece of code,

196
00:07:45,870 --> 00:07:49,160
will then allow me to just
download those files.

197
00:07:49,160 --> 00:07:51,840
So it'll take a moment and
they'll get downloaded.

198
00:07:51,840 --> 00:07:54,835
There they are,
vecs.tsv and meta.tsv.

199
00:07:54,835 --> 00:07:57,200
So if I now come over to
the embedding projector,

200
00:07:57,200 --> 00:08:00,140
we see it's showing right
now the Word2Vec 10K.

201
00:08:00,140 --> 00:08:04,060
So if I scroll down
here and say load data,

202
00:08:04,060 --> 00:08:08,385
I'll choose file, I'll
take the vecs.tsv.

203
00:08:08,385 --> 00:08:10,575
I'll choose file.

204
00:08:10,575 --> 00:08:14,165
I'll take the
meta.tsv, then load.

205
00:08:14,165 --> 00:08:17,045
I click outside and
now I see this.

206
00:08:17,045 --> 00:08:19,340
But if I sphereize the data,

207
00:08:19,340 --> 00:08:20,660
you can see it's
clustered like this.

208
00:08:20,660 --> 00:08:22,100
We do need to improve it a little

209
00:08:22,100 --> 00:08:23,570
bit but we can begin to see that

210
00:08:23,570 --> 00:08:24,770
the words have been clustered

211
00:08:24,770 --> 00:08:26,690
in both the positive
and negative.

212
00:08:26,690 --> 00:08:29,390
So for example if I search
for the word boring,

213
00:08:29,390 --> 00:08:31,760
we can see like
the nearest neighbors

214
00:08:31,760 --> 00:08:33,905
for boring are things
like stink or unlikeable,

215
00:08:33,905 --> 00:08:36,380
prom, unrealistic, wooden,

216
00:08:36,380 --> 00:08:38,630
devoid, unwatchable,
and proverbial.

217
00:08:38,630 --> 00:08:40,520
So if I come over here we can see.

218
00:08:40,520 --> 00:08:41,840
These are bad words.

219
00:08:41,840 --> 00:08:46,380
These are words showing
a negative looking review.

220
00:08:49,520 --> 00:08:51,990
I can see, there's
lots of words that

221
00:08:51,990 --> 00:08:53,660
have fun in them, some positive,

222
00:08:53,660 --> 00:08:55,670
some negative, like
unfunny, dysfunction,

223
00:08:55,670 --> 00:08:57,800
funeral are quite negative.

224
00:08:57,800 --> 00:09:02,075
So let's try exciting.

225
00:09:02,075 --> 00:09:05,120
So now if I come over here
we're beginning to see,

226
00:09:05,120 --> 00:09:08,285
hey there's a lot of words
clustered around positive,

227
00:09:08,285 --> 00:09:11,860
movies matching
exciting, that type of thing.

228
00:09:11,860 --> 00:09:13,160
So we can see them over

229
00:09:13,160 --> 00:09:15,830
really on this left-hand side
of the diagram.

230
00:09:15,830 --> 00:09:19,765
Again, maybe if
I search for Oscar,

231
00:09:19,765 --> 00:09:22,040
nothing really associated with

232
00:09:22,040 --> 00:09:24,575
Oscar because it's
such a unique word.

233
00:09:24,575 --> 00:09:27,185
We could just have all kinds
of fun with it like that.

234
00:09:27,185 --> 00:09:29,485
What if I search for brilliant?

235
00:09:29,485 --> 00:09:31,850
Again we can begin to see like

236
00:09:31,850 --> 00:09:34,040
words clustering
over on this side,

237
00:09:34,040 --> 00:09:35,570
but there's not
a lot of words that

238
00:09:35,570 --> 00:09:37,550
became close to brilliant.

239
00:09:37,550 --> 00:09:40,190
In this case, guardian,
Jeffrey, Kidman, Gershwin.

240
00:09:40,190 --> 00:09:43,010
So these are brilliant
being used as an adjective.

241
00:09:43,010 --> 00:09:45,110
Some good stuff in there though.

242
00:09:45,110 --> 00:09:47,900
So hopefully this is
a good example of how you

243
00:09:47,900 --> 00:09:50,570
can start mapping words
into vector spaces,

244
00:09:50,570 --> 00:09:51,800
and how you can start looking at

245
00:09:51,800 --> 00:09:53,510
sentiment and even
visualizing how

246
00:09:53,510 --> 00:09:55,970
your model has learned

247
00:09:55,970 --> 00:09:58,070
sentiment from
these sets of words.

248
00:09:58,070 --> 00:09:59,660
In the next video, we're going to

249
00:09:59,660 --> 00:10:01,220
look at a simpler
version of doing

250
00:10:01,220 --> 00:10:02,660
IMDb than this one

251
00:10:02,660 --> 00:10:04,200
where we're
writing a lot less code,

252
00:10:04,200 --> 00:10:05,930
we're taking advantage
of stuff that was

253
00:10:05,930 --> 00:10:09,330
available to us in
TensorFlow Data Services.