1
00:00:00,360 --> 00:00:01,510
Hi and welcome back.

2
00:00:01,530 --> 00:00:07,260
In this section of the course, we'll talk about autoencoders and how they can perform representational

3
00:00:07,260 --> 00:00:07,650
learning.

4
00:00:07,980 --> 00:00:09,130
So let's get started.

5
00:00:09,150 --> 00:00:11,610
So firstly, what are autoencoders?

6
00:00:11,970 --> 00:00:14,280
Well, it's an unsupervised learning technique.

7
00:00:14,400 --> 00:00:17,310
Like I said, that is used for representational learning.

8
00:00:17,700 --> 00:00:19,050
But what exactly is that?

9
00:00:19,080 --> 00:00:25,500
Well, firstly, do you recall what our CNN filters actually learned when you trained a CNN on an image

10
00:00:25,500 --> 00:00:26,040
dataset?

11
00:00:26,430 --> 00:00:32,370
Well, they learn things like high-level patterns, as well as low-level features such as edges and blobs.

12
00:00:32,730 --> 00:00:39,720
So what if we could exploit what a CNN learns about a dataset so that it acts as a method of compression?

13
00:00:39,930 --> 00:00:41,040
So think about something.

14
00:00:41,370 --> 00:00:48,000
When you train a CNN on an image dataset, it learns so many things about the dataset that there must

15
00:00:48,000 --> 00:00:54,540
be some way we can use those features, or combinations of those features, to generate

16
00:00:54,540 --> 00:00:55,140
an output.

17
00:00:55,560 --> 00:01:02,010
Well, that's exactly what autoencoders do: they learn to compress data based on the correlations between

18
00:01:02,010 --> 00:01:02,910
input features.

19
00:01:03,300 --> 00:01:09,570
So some applications of this include things like denoising images or even audio, image inpainting,

20
00:01:09,570 --> 00:01:16,290
which is fixing areas that are obscured or missing in an image, as well as information retrieval,

21
00:01:16,290 --> 00:01:19,590
anomaly detection and obviously compression.

22
00:01:20,250 --> 00:01:23,470
So now, let's take a look at the autoencoder architecture.

23
00:01:23,910 --> 00:01:25,680
Notice there's a bottleneck layer.

24
00:01:25,800 --> 00:01:32,190
Now, the whole objective of an autoencoder is that we want to shrink the dimensionality that

25
00:01:32,190 --> 00:01:33,210
represents data.

26
00:01:33,750 --> 00:01:39,840
What that means is, if there's an image of a number one or a number two, you know that a lot of features and

27
00:01:39,840 --> 00:01:45,720
a lot of patterns in that number correlate with each other, meaning that a lot of local pixels

28
00:01:46,110 --> 00:01:48,000
are grouped together, so they do correlate.

29
00:01:48,480 --> 00:01:49,290
So

30
00:01:49,290 --> 00:01:51,390
that's what the bottleneck architecture does.

31
00:01:51,900 --> 00:01:54,070
It seeks to find those correlations.

32
00:01:54,090 --> 00:02:00,270
That's the embedded learning that's being done in that layer, so that we can now represent it in a much

33
00:02:00,270 --> 00:02:04,530
smaller vector dimensionality so that we can actually compress the data.

34
00:02:04,980 --> 00:02:07,050
So autoencoders do very well

35
00:02:07,050 --> 00:02:13,410
on data that has correlated input features, which images do, in case you didn't

36
00:02:13,410 --> 00:02:13,590
know.

37
00:02:14,010 --> 00:02:16,020
So let's talk a bit about this bottleneck layer.

38
00:02:16,530 --> 00:02:21,060
The bottleneck constrains the amount of information that is able to traverse the full network.

39
00:02:21,370 --> 00:02:22,560
It's kind of obvious here.

40
00:02:23,460 --> 00:02:29,370
So this enables the hidden bottleneck layers to learn a compressed representation of the input data.

41
00:02:29,940 --> 00:02:34,860
So that's basically all this is; it's quite simple when you look at it at a high level.

42
00:02:35,520 --> 00:02:41,970
Basically, all we're trying to do is learn the input data while using fewer values here to store that

43
00:02:41,970 --> 00:02:42,510
information.

44
00:02:42,630 --> 00:02:45,030
So that's why we use the correlated features.

45
00:02:46,020 --> 00:02:48,750
So what does the ideal autoencoder look like?

46
00:02:49,260 --> 00:02:55,980
Well, the ideal autoencoder is sensitive enough to accurately reconstruct the image, meaning that

47
00:02:56,580 --> 00:03:03,420
it is sensitive enough that it can take different inputs and use them to generate the correct output.

48
00:03:04,170 --> 00:03:10,410
However, it's insensitive enough to inputs that the model doesn't overfit on the training data, so

49
00:03:10,410 --> 00:03:11,390
it's a tricky balance.

50
00:03:11,430 --> 00:03:18,360
So we'll take a look at this when we train our own autoencoders in PyTorch and Keras afterward.

51
00:03:18,870 --> 00:03:19,650
So let's take

52
00:03:19,670 --> 00:03:22,260
a deeper look at the autoencoder architecture.

53
00:03:22,800 --> 00:03:28,500
A simple autoencoder architecture is one that bottlenecks, or constrains, the number of nodes in

54
00:03:28,500 --> 00:03:29,190
the middle layer.

55
00:03:29,280 --> 00:03:30,120
It's quite simple.

56
00:03:30,810 --> 00:03:33,990
Now notice the input and output match in dimensionality.

57
00:03:34,110 --> 00:03:37,620
They have to be the same size because we're trying to compress, let's

58
00:03:37,630 --> 00:03:39,540
say, this 28 by 28 image here.

59
00:03:39,960 --> 00:03:46,830
We're trying to compress it into a smaller representation, but then express it back as the full 28 by

60
00:03:46,830 --> 00:03:47,520
28 image.

61
00:03:48,390 --> 00:03:51,390
That's because we're reconstructing the input at the output.

62
00:03:52,110 --> 00:03:55,620
So our loss function here penalizes reconstruction error.

63
00:03:56,370 --> 00:04:01,200
So this allows a model to learn the most important features needed to reconstruct the image.
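
To make the bottleneck idea concrete, here is a minimal sketch in plain Python rather than the Keras or PyTorch code from the later lessons. It assumes a toy dataset with two correlated features and a purely linear encoder and decoder, so the "image" is just two numbers squeezed through a single bottleneck value. The names `enc`, `dec`, `encode`, `decode`, `reconstruction_loss` and `train_epoch` are made up for illustration, and the learning rate and epoch count are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Toy data: two strongly correlated features (the second is about twice the first).
data = []
for _ in range(200):
    a = random.uniform(-1.0, 1.0)
    data.append((a, 2.0 * a + random.gauss(0.0, 0.05)))

# Encoder: 2 inputs -> 1 bottleneck value. Decoder: 1 bottleneck value -> 2 outputs.
enc = [random.uniform(-0.5, 0.5) for _ in range(2)]
dec = [random.uniform(-0.5, 0.5) for _ in range(2)]


def encode(x):
    """Compress a 2-value input into a single bottleneck value."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(z):
    """Rebuild a 2-value output from the bottleneck value."""
    return (dec[0] * z, dec[1] * z)


def reconstruction_loss(points):
    """Mean squared difference between each input and its reconstruction."""
    total = 0.0
    for x in points:
        x_hat = decode(encode(x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(points)


def train_epoch(points, lr):
    """One full-batch gradient descent step on the reconstruction loss."""
    grad_enc = [0.0, 0.0]
    grad_dec = [0.0, 0.0]
    for x in points:
        z = encode(x)
        x_hat = decode(z)
        err = (x_hat[0] - x[0], x_hat[1] - x[1])
        dz = 2.0 * (err[0] * dec[0] + err[1] * dec[1])  # d(loss)/d(z)
        for i in range(2):
            grad_dec[i] += 2.0 * err[i] * z
            grad_enc[i] += dz * x[i]
    for i in range(2):
        dec[i] -= lr * grad_dec[i] / len(points)
        enc[i] -= lr * grad_enc[i] / len(points)


initial_loss = reconstruction_loss(data)
for _ in range(500):
    train_epoch(data, lr=0.1)  # the target is the input itself
final_loss = reconstruction_loss(data)

print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

Because the two features are correlated, a single bottleneck value is enough to rebuild both of them with only a small error. With uncorrelated features the same bottleneck would lose much more information.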

64
00:04:01,650 --> 00:04:06,450
Now, many of you who have studied data science and machine learning may think that autoencoders

65
00:04:06,450 --> 00:04:10,590
are quite similar in principle to principal component analysis.

66
00:04:10,590 --> 00:04:17,370
And in that case, you are right. Principal component analysis is trying to learn the features with

67
00:04:17,370 --> 00:04:22,770
the highest variance so that we can represent the data in a much more compressed state.

68
00:04:23,280 --> 00:04:26,610
That is technically similar to what autoencoders are trying to do.

69
00:04:26,640 --> 00:04:30,570
This is just a deep learning way of doing that same thing.

70
00:04:31,650 --> 00:04:35,190
So let's take a look at the CNN autoencoder architecture now.

71
00:04:35,190 --> 00:04:40,980
Given that our inputs in computer vision are images, using a CNN makes total sense because the

72
00:04:40,980 --> 00:04:46,860
conv layers provide much better performance because they're able to compress that image structure and

73
00:04:46,860 --> 00:04:48,570
linear correlations within that image.

74
00:04:49,140 --> 00:04:50,790
So that's why... sorry about that.

75
00:04:51,150 --> 00:04:56,760
That's why convolutional layers are ideal for constructing an autoencoder for images.

76
00:04:57,480 --> 00:04:59,930
So let's take a look at how we train an autoencoder.

77
00:05:00,520 --> 00:05:02,650
So the training process is quite simple.

78
00:05:02,680 --> 00:05:06,250
However, there are a few differences, so let's take a note of those differences here.

79
00:05:06,730 --> 00:05:09,970
So the target data is the same as the training data.

80
00:05:10,390 --> 00:05:11,010
Look at this here.

81
00:05:11,020 --> 00:05:14,410
This is the autoencoder training setup: x_train.

82
00:05:14,560 --> 00:05:16,480
Usually we have x_train and y_train.

83
00:05:16,480 --> 00:05:18,580
Here we have x_train and x_train.

84
00:05:18,700 --> 00:05:20,830
That's why the target is the same as the training data.

85
00:05:20,860 --> 00:05:23,290
So what does that mean for the rest of it?

86
00:05:23,770 --> 00:05:25,660
Well, it means also for validation.

87
00:05:26,500 --> 00:05:28,410
x_test is also used as its own target.
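
As a rough illustration of that pairing, the sketch below scores a model against its own inputs, first on the training set and then on the validation set. It assumes a fixed, hand-written stand-in for the encoder and decoder (averaging neighbouring values, then repeating them) instead of a trained network. The names `encode`, `decode` and `reconstruction_error` are hypothetical, and the tiny lists only stand in for real image data.

```python
def encode(signal):
    """Stand-in encoder: average each pair of neighbours (length n -> n / 2).

    Assumes the input length is even.
    """
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def decode(code):
    """Stand-in decoder: repeat each code value twice (length n / 2 -> n)."""
    output = []
    for value in code:
        output.extend([value, value])
    return output


def reconstruction_error(inputs, targets):
    """Mean squared error between the reconstructions of `inputs` and `targets`."""
    total = 0.0
    count = 0
    for x, y in zip(inputs, targets):
        x_hat = decode(encode(x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, y))
        count += len(y)
    return total / count


# Toy "datasets": neighbouring values match in x_train but not in x_test.
x_train = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
x_test = [[0.0, 1.0, 0.0, 1.0]]

# The target is the input itself, for training and for validation alike.
train_error = reconstruction_error(x_train, x_train)
val_error = reconstruction_error(x_test, x_test)

print("training reconstruction error:  ", train_error)
print("validation reconstruction error:", val_error)
```

The stand-in rebuilds the training examples well because their neighbouring values are correlated, and it does worse on the validation example, where they are not. That is the kind of gap the validation pair is there to reveal.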

88
00:05:29,020 --> 00:05:30,070
And why is that?

89
00:05:30,130 --> 00:05:35,740
Well, that's because we're trying to test how well our encoder-decoder model works.

90
00:05:36,550 --> 00:05:42,400
So when I say encoder-decoder model, what we're doing with an autoencoder is that we're taking

91
00:05:42,760 --> 00:05:49,990
an input dataset, shrinking the dimensionality or the representational size of the dataset and basically

92
00:05:49,990 --> 00:05:52,030
storing the information of that data.

93
00:05:52,030 --> 00:05:55,630
So we're encoding, let's say, the number seven in a smaller vector

94
00:05:55,630 --> 00:05:55,930
now.

95
00:05:56,320 --> 00:06:03,460
And then we can use the same encoder-decoder model to now rebuild the input from that smaller representation.

96
00:06:03,790 --> 00:06:05,470
That's what an autoencoder is doing.

97
00:06:06,160 --> 00:06:12,040
So the loss function we can use for this can be something like binary cross-entropy or even mean

98
00:06:12,040 --> 00:06:12,580
squared error.
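
For reference, here is a small sketch of those two losses written out by hand in plain Python. It assumes pixel values scaled to the range 0 to 1, which binary cross-entropy needs, and it averages over the pixels of a single flattened image. The function names and the `eps` clipping value are illustrative choices, not taken from Keras or PyTorch.

```python
import math


def mean_squared_error(target, output):
    """Average squared difference between target and reconstructed pixels."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def binary_cross_entropy(target, output, eps=1e-7):
    """Average binary cross-entropy; assumes all values lie in [0, 1]."""
    total = 0.0
    for t, o in zip(target, output):
        o = min(max(o, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))
    return total / len(target)


# A tiny flattened "image" and two made-up reconstructions of it.
pixels = [0.0, 1.0, 1.0, 0.0]
close_reconstruction = [0.1, 0.9, 0.8, 0.2]
poor_reconstruction = [0.9, 0.1, 0.3, 0.6]

for name, output in [("close", close_reconstruction), ("poor", poor_reconstruction)]:
    mse = mean_squared_error(pixels, output)
    bce = binary_cross_entropy(pixels, output)
    print(f"{name}: mse={mse:.4f} bce={bce:.4f}")
```

Both losses get smaller as the reconstruction gets closer to the input, which is all the training process needs from them.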

99
00:06:13,240 --> 00:06:19,450
So let's take a look at the limitations of autoencoders now. Autoencoders are lossy, meaning that

100
00:06:19,450 --> 00:06:24,640
decompressed outputs are degraded a bit compared with the original, mainly because we have less space to store

101
00:06:24,640 --> 00:06:25,270
that data.

102
00:06:25,300 --> 00:06:26,680
That's basically what lossy means.

103
00:06:27,550 --> 00:06:34,370
They're also data-specific, meaning that an autoencoder learns a representation only of a specific domain, so

104
00:06:34,390 --> 00:06:40,850
we can't apply an autoencoder that we have trained on handwritten digits to letters.

105
00:06:40,870 --> 00:06:41,740
It's not going to work.

106
00:06:42,130 --> 00:06:44,490
So it's data-domain specific.

107
00:06:44,530 --> 00:06:51,940
So next, we'll take a look at building autoencoders in both Keras and PyTorch, so I'll see you

108
00:06:51,940 --> 00:06:52,720
in the next lesson.