1
00:00:11,120 --> 00:00:16,790
So in this lecture, we'll be introducing the next section of this course, which is all about convolutional

2
00:00:16,790 --> 00:00:17,780
neural networks.

3
00:00:18,230 --> 00:00:23,330
We also call these CNNs. Essentially, this section is made up of two parts.

4
00:00:23,870 --> 00:00:27,560
The first part will discuss the theory behind CNNs and how they work.

5
00:00:28,130 --> 00:00:32,210
This will make up a majority of this section, since the reasoning goes pretty deep.

6
00:00:33,080 --> 00:00:36,320
The second part will look at how to apply CNNs to text.

7
00:00:37,790 --> 00:00:43,760
Now, I want to make it explicit that the first part on the theory of CNNs is optional. If you prefer

8
00:00:43,760 --> 00:00:45,830
to skip straight to the code preparation,

9
00:00:46,160 --> 00:00:50,600
you can feel free to do so and opt out of understanding how CNNs work.

10
00:00:51,200 --> 00:00:57,050
There is a lot of math, so if you don't like math, then this is not for you. Or, if you already know

11
00:00:57,050 --> 00:00:57,980
how CNNs work,

12
00:00:58,340 --> 00:01:00,590
then again, feel free to skip this part.

13
00:01:05,349 --> 00:01:11,110
So clearly, this idea of convolutional neural networks has something to do with convolution.

14
00:01:11,680 --> 00:01:17,140
In fact, a convolutional neural network is nothing but a neural network with convolution in it.

15
00:01:17,740 --> 00:01:23,470
So a majority of this section will be devoted to simply helping you understand what convolution

16
00:01:23,470 --> 00:01:25,300
even is and why it works.

17
00:01:26,290 --> 00:01:30,250
Now, it turns out that CNNs were originally invented for images.

18
00:01:30,790 --> 00:01:36,910
Understanding how convolution works for images is much more intuitive, simply because we can see images

19
00:01:37,210 --> 00:01:39,400
and visualize the effects of convolution.

20
00:01:40,240 --> 00:01:45,700
As such, our discussion on convolution will start with images, even though this is a course about

21
00:01:45,700 --> 00:01:46,300
NLP.

22
00:01:46,990 --> 00:01:48,520
It simply makes more sense.

23
00:01:49,270 --> 00:01:54,100
What you'll come to see is that an image is merely a two dimensional continuous signal.

24
00:01:54,790 --> 00:01:59,410
On the other hand, something like a time series is a one dimensional continuous signal.

25
00:02:00,370 --> 00:02:06,340
Furthermore, when we convert text into vectors, this is basically also a time series, but with multiple

26
00:02:06,340 --> 00:02:12,790
features, and thus using a CNN for NLP is the same as using a CNN for a time series.

27
00:02:17,470 --> 00:02:23,230
So I want to expand a bit more on this idea that a sequence of word vectors is just a time series.

28
00:02:24,100 --> 00:02:27,490
Firstly, I think it's helpful to think of an actual time series.

29
00:02:28,120 --> 00:02:30,850
Consider a sequence of stock returns from the S&P.

30
00:02:31,570 --> 00:02:34,570
Clearly, you and I can agree that this is a time series.

31
00:02:35,170 --> 00:02:41,110
If we were to store these values inside a NumPy array, we would have a one dimensional sequence of length

32
00:02:41,110 --> 00:02:44,740
T. To turn this into a machine learning problem,

33
00:02:44,740 --> 00:02:48,970
suppose we want to predict whether tomorrow's stock price will go up or down.

34
00:02:49,900 --> 00:02:55,480
Now, of course, we might want to include different inputs to our model, like past volume and perhaps

35
00:02:55,480 --> 00:02:57,010
some technical indicators.

36
00:02:57,730 --> 00:03:01,210
Clearly, each of these also forms a time series of length T.

37
00:03:02,530 --> 00:03:09,040
Suppose we have D such time series. Then in total, if we were to put all these together, we would have

38
00:03:09,280 --> 00:03:11,170
D time series, each of length T.

39
00:03:11,200 --> 00:03:16,180
But if we combine them into a single array, we could store this in a matrix of size T by D.
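
As a rough sketch of the stacking idea just described: assuming three made-up series (returns, volume, and one indicator) of length T, combining them column by column gives an array with T rows and one column per series. The values and the names `returns`, `volume`, and `indicator` are placeholders, not real market data.

```python
import numpy as np

# Hypothetical sizes: T time steps, D separate input series.
T, D = 6, 3
rng = np.random.default_rng(0)

# Three made-up one dimensional series, each of length T.
returns = rng.normal(size=T)
volume = rng.normal(size=T)
indicator = rng.normal(size=T)

# Stack the series as columns: rows index time, columns index features.
X = np.stack([returns, volume, indicator], axis=1)
print(X.shape)  # (6, 3), i.e. T by D
```

Each row of `X` is then one time step, and each column is one of the original series.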

40
00:03:17,500 --> 00:03:22,240
Of course, this is also exactly what we have when we convert text into vectors.

41
00:03:23,050 --> 00:03:24,310
Suppose we have a sentence.

42
00:03:24,310 --> 00:03:25,630
I like eggs and bacon.

43
00:03:26,140 --> 00:03:27,280
This is a sequence of length

44
00:03:27,280 --> 00:03:27,820
five,

45
00:03:27,850 --> 00:03:28,390
so T is

46
00:03:28,390 --> 00:03:28,990
five.

47
00:03:29,710 --> 00:03:34,840
Suppose we then pass these through an embedding layer such that we have the vector for "I", the vector

48
00:03:34,840 --> 00:03:37,330
for "like", the vector for "eggs", and so forth.

49
00:03:37,990 --> 00:03:41,260
The result is that we get T vectors, each of size D.
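
Here is a minimal sketch of that embedding step, assuming a toy vocabulary built from the sentence itself and a randomly initialized embedding matrix with a hypothetical size D = 4. A real embedding layer would learn these values; this only illustrates the shapes.

```python
import numpy as np

sentence = "I like eggs and bacon".split()  # T = 5 tokens

# Toy vocabulary (an assumption for illustration): one index per word.
vocab = {word: idx for idx, word in enumerate(sentence)}

D = 4  # hypothetical embedding size
rng = np.random.default_rng(0)

# One row of length D per vocabulary word; random stand-in for learned weights.
embedding_matrix = rng.normal(size=(len(vocab), D))

# Look up each token's row: this is all an embedding layer does at its core.
token_ids = np.array([vocab[word] for word in sentence])
X = embedding_matrix[token_ids]
print(X.shape)  # (5, 4): one vector of size D for each of the T words
```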

50
00:03:42,520 --> 00:03:46,930
In effect, we have a T by D matrix or, in other words, D time series,

51
00:03:47,200 --> 00:03:53,920
each of sequence length T. Therefore, a time series of word embeddings is no different than a time series

52
00:03:53,920 --> 00:03:58,450
of stock returns or technical indicators. As a side note,

53
00:03:58,480 --> 00:04:03,760
although this might be a bit confusing, we do consider both of these time series to be one dimensional,

54
00:04:04,090 --> 00:04:09,760
even though they occupy a two dimensional array. This is because we consider things like time and space

55
00:04:09,760 --> 00:04:14,380
to be indices over which we do convolution, but not different features.
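
To make that side note a little more concrete, here is a small sketch, assuming a single filter that spans K time steps and all D features. The filter slides along the time axis only, which is why the operation counts as one dimensional. The helper `conv1d_valid` is a hypothetical name, and, as is common in deep learning code, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv1d_valid(X, W):
    """Slide a (K, D) filter over a (T, D) input along the time axis only."""
    T, D = X.shape
    K, D_w = W.shape
    assert D == D_w, "filter must cover every feature"
    out = np.empty(T - K + 1)
    for t in range(T - K + 1):
        # Each output value mixes K time steps and all D features.
        out[t] = np.sum(X[t:t + K] * W)
    return out

T, D, K = 5, 4, 3  # hypothetical sizes
X = np.arange(T * D, dtype=float).reshape(T, D)
W = np.ones((K, D))

y = conv1d_valid(X, W)
print(y.shape)  # (3,): one index (time) remains, the feature axis is summed out
```

Notice that the output has only a time index left; the D features were never something the filter moved across.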

56
00:04:15,010 --> 00:04:19,870
Don't worry if this doesn't make perfect sense now, but be prepared to consider this thought as you

57
00:04:19,870 --> 00:04:21,390
go through the CNN lectures.

58
00:04:23,160 --> 00:04:28,530
As you'll see, images are considered two dimensional, even though they are stored in three dimensional

59
00:04:28,530 --> 00:04:29,250
arrays.
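
As a quick illustration of that last point, assuming a small image stored in a height by width by channels layout (the channels-last ordering and the sizes here are assumptions), the first two axes are the spatial indices and the last axis holds the features at each pixel.

```python
import numpy as np

# Hypothetical image: 4 pixels high, 6 pixels wide, 3 color channels.
H, W, C = 4, 6, 3
image = np.zeros((H, W, C))

spatial_shape = image.shape[:2]  # the two dimensions we would convolve over
num_features = image.shape[2]    # features per pixel, not a spatial dimension

print(image.ndim)     # 3: stored in a three dimensional array
print(spatial_shape)  # (4, 6): but only two spatial dimensions
```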