1
00:00:00,150 --> 00:00:05,820
Hi, and welcome back to the course. In this lesson, we'll take a look at building our very own super

2
00:00:05,820 --> 00:00:09,120
resolution GAN, or SRGAN, in Keras.

3
00:00:09,300 --> 00:00:10,380
So let's get started.

4
00:00:10,560 --> 00:00:14,820
So open the notebook like I just did, and we'll begin the lesson.

5
00:00:15,360 --> 00:00:21,330
So firstly, the tutorial inspiration and credit are attributed to this author here, and this is the

6
00:00:21,330 --> 00:00:25,170
link for the actual material on the official Keras examples site.

7
00:00:25,590 --> 00:00:31,140
They have so many good tutorials there. However, a lot of them are a bit hard for beginners

8
00:00:31,140 --> 00:00:31,950
to understand.

9
00:00:32,460 --> 00:00:38,370
So what we'll do in this course is go through maybe a handful of these tutorials, because these

10
00:00:38,370 --> 00:00:44,280
are research paper implementations of some pretty state-of-the-art networks.

11
00:00:44,790 --> 00:00:47,070
So let's get started with the lesson.

12
00:00:47,490 --> 00:00:51,750
So the SRGAN we're going to recreate here is called the...

13
00:00:51,750 --> 00:00:57,800
It's called the Efficient Sub-Pixel CNN, and it was proposed by Shi in 2016.

14
00:00:57,810 --> 00:01:01,200
This is the original paper here, by the authors named

15
00:01:01,200 --> 00:01:01,770
also here.

16
00:01:02,340 --> 00:01:08,070
It came out in 2016, and you can browse this paper if you want to get a good, deeper understanding

17
00:01:08,070 --> 00:01:10,110
of how this SRGAN works.

18
00:01:10,650 --> 00:01:12,360
So let's continue with the lesson.

19
00:01:13,020 --> 00:01:19,570
So we're going to try and implement this model on a small dataset called BSDS500.

20
00:01:20,070 --> 00:01:22,680
That's from Berkeley's Computer Vision Group.

21
00:01:23,070 --> 00:01:26,310
And you can take a look at some of the dataset information here.

22
00:01:28,230 --> 00:01:30,120
And now we'll go back to the lesson.

23
00:01:30,360 --> 00:01:36,240
So firstly, let's set up and load our libraries and dataset, but we're not actually loading the

24
00:01:36,240 --> 00:01:37,610
dataset here just yet.

25
00:01:37,630 --> 00:01:38,910
We're getting the dataset over here.

26
00:01:39,930 --> 00:01:44,820
So let's download our dataset while these libraries are loaded.

27
00:01:52,760 --> 00:01:59,810
This will take roughly 20 seconds, so just be patient and we'll have our data downloaded shortly.

28
00:02:01,370 --> 00:02:02,000
OK, great.

29
00:02:02,030 --> 00:02:04,850
It's complete now; it took roughly 24 seconds.

30
00:02:05,450 --> 00:02:12,320
So now we have to split that dataset into separate training and validation sets.

31
00:02:12,500 --> 00:02:17,030
So we'll do that here, and we have some parameters defined here.

32
00:02:17,030 --> 00:02:22,670
So we're setting the crop size to 300, setting an upscale factor, as well as an input size,

33
00:02:22,670 --> 00:02:26,510
which is the crop size divided by the upscale factor, as well as a batch size.
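As a rough sketch of how these parameters relate: the crop size of 300 comes from the lesson, while the upscale factor of 3 and the batch size of 8 are assumptions, since the narration does not state them.

```python
# Parameters used when preparing the dataset.
crop_size = 300      # stated in the lesson
upscale_factor = 3   # assumed value; not stated in the narration
batch_size = 8       # assumed value; not stated in the narration

# The low-resolution input size is the crop size divided by the upscale factor.
input_size = crop_size // upscale_factor
print(input_size)
```

With these assumed values, each 300 x 300 crop would be paired with a low-resolution input that is a third of that size on each side.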

34
00:02:26,690 --> 00:02:32,480
So we'll use image_dataset_from_directory here to get our dataset, and we just set the validation

35
00:02:32,480 --> 00:02:36,410
split to 0.2, and there we go. So let's create that.

36
00:02:37,280 --> 00:02:42,590
And that's done, and you can actually see the summary below of what we get from using

37
00:02:42,590 --> 00:02:43,360
image_dataset_from_directory.

38
00:02:43,370 --> 00:02:50,150
So we have 500 files, all belonging to one class, using 400 for training and 100 for validation.

39
00:02:50,990 --> 00:02:56,000
So now we'll create a function to rescale the data to between zero and one.

40
00:02:56,510 --> 00:03:02,780
So then we just apply it to the datasets that we've created here, the validation dataset and the training

41
00:03:02,780 --> 00:03:06,890
dataset, so we can scale our dataset accordingly.
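A minimal sketch of that rescaling step, written in plain Python so it runs on its own. The function name `rescale` and the assumption that pixel values start in the range 0 to 255 are illustrative, not taken from the notebook.

```python
def rescale(pixels, max_value=255.0):
    """Map pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]

# A tiny row of 8-bit pixel values (hypothetical data).
row = [0, 51, 255]
print(rescale(row))
```

In the notebook, the same idea would be applied to every image in both the training and validation datasets.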

42
00:03:07,760 --> 00:03:11,330
So then now let's visualize some batches from that data.

43
00:03:16,200 --> 00:03:21,780
So we can see we picked some batches. Here is a weird-looking animal; it looks like some sort of monkey,

44
00:03:21,780 --> 00:03:25,080
but I doubt that's a monkey. There's a lizard.

45
00:03:25,270 --> 00:03:28,200
And there's a dog with some people on snow.

46
00:03:28,950 --> 00:03:33,750
So you can see there's a very wide variety of images in this dataset, which is good.

47
00:03:34,080 --> 00:03:38,400
That means our model should generalize well to new images. Next,

48
00:03:38,550 --> 00:03:42,960
all we do here is get the test image paths using these functions.

49
00:03:42,960 --> 00:03:48,280
So we get all of the image names, well, all the image paths for the test images right there.

50
00:03:48,300 --> 00:03:49,500
So let's run that.

51
00:03:50,130 --> 00:03:56,700
Next, we'll convert our images to the YUV color space, which is closer to the human way of perceiving

52
00:03:56,730 --> 00:03:57,240
images.

53
00:03:58,140 --> 00:04:00,680
And it also has some advantages over RGB.

54
00:04:00,690 --> 00:04:06,510
So we'll be using that for training this network, and you can train on any sort of color space.

55
00:04:06,510 --> 00:04:07,960
You can train on RGB.

56
00:04:07,980 --> 00:04:09,420
Typically, we always do that.

57
00:04:09,810 --> 00:04:13,050
However, you can train on HSV or YUV as well.

58
00:04:13,590 --> 00:04:19,110
Next, we'll create some helper functions here to process the input. That means it's

59
00:04:19,110 --> 00:04:21,090
converted into a YUV image here.

60
00:04:21,540 --> 00:04:25,800
And then we also have process_target as well, which does something similar, but only returns the

61
00:04:25,800 --> 00:04:28,200
Y channel, because that's the only channel we're looking at here.
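To make the "only the Y channel" idea concrete, here is a small sketch in plain Python. It assumes the standard BT.601 luma weights and RGB values already scaled to the range 0 to 1; the helper names `rgb_to_y` and `process_target_sketch` are hypothetical and are not the notebook's own functions.

```python
def rgb_to_y(r, g, b):
    """Luma (Y) of one RGB pixel with channels in [0, 1], using BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def process_target_sketch(image):
    """Keep only the Y channel of an image given as rows of (r, g, b) tuples."""
    return [[rgb_to_y(*pixel) for pixel in row] for row in image]

# A hypothetical 2 x 2 image: white, black, pure red, pure green.
image = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
print(process_target_sketch(image))
```

The result has one value per pixel instead of three, which is what the network is asked to predict at the higher resolution.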

62
00:04:28,890 --> 00:04:35,130
And then we use a TensorFlow prefetch function to create some data loaders that allow us to load data

63
00:04:35,130 --> 00:04:37,140
quite quickly during the training process.

64
00:04:37,860 --> 00:04:43,680
Next, we'll take a look at some of our training and target data. You can see these

65
00:04:43,680 --> 00:04:44,790
first images

66
00:04:44,790 --> 00:04:49,080
here are low-resolution images that we want to upscale to these.

67
00:04:49,080 --> 00:04:50,670
These are the ground truth images here.

68
00:04:50,880 --> 00:04:53,250
So the network is going to learn that with a CNN.

69
00:04:53,850 --> 00:04:56,100
So, when I said this is a GAN:

70
00:04:56,490 --> 00:04:59,070
GANs have been used for super resolution.

71
00:04:59,100 --> 00:05:03,990
However, this specific network isn't really based on generative adversarial networks.

72
00:05:03,990 --> 00:05:06,450
It's based on convolutional networks.

73
00:05:06,450 --> 00:05:08,800
So it's more of an SR CNN.

74
00:05:09,750 --> 00:05:15,900
But nevertheless, I put it in the GAN section just because it's commonly associated with GANs.

75
00:05:16,680 --> 00:05:20,010
So now we can move on to building the model, so you can run this.

76
00:05:20,010 --> 00:05:22,980
I've already run all this code, so I don't need to do it again.

77
00:05:23,280 --> 00:05:24,810
I don't want to mess things up too much.

78
00:05:25,260 --> 00:05:26,880
So I'll just go through the code here.

79
00:05:26,880 --> 00:05:32,240
But remember to run every cell block as you go along with this lesson. And you can see the CNN; it's

80
00:05:32,250 --> 00:05:34,840
quite simple.

81
00:05:35,130 --> 00:05:35,670
Conv on

82
00:05:35,670 --> 00:05:42,840
conv on top of conv, and then you can see we have different arguments here, where we have ReLU activation,

83
00:05:43,290 --> 00:05:49,560
an orthogonal kernel initializer, which is a different type of initialization, and we use padding

84
00:05:49,710 --> 00:05:55,170
for this, "same" padding, so we don't lose the feature map size as we progress through the network.

85
00:05:56,490 --> 00:05:58,740
Next, we define some more utility functions.

86
00:05:58,740 --> 00:06:03,570
This one plots the results, and this one gets the low-res image here.

87
00:06:03,930 --> 00:06:10,920
This one upscales the image, and now we can create some callbacks to monitor training.

88
00:06:11,370 --> 00:06:15,540
So we just create a callback class here, and we just store the PSNR value.

89
00:06:15,810 --> 00:06:22,270
That's peak signal-to-noise ratio, which is a metric we use to evaluate super-resolution type networks

90
00:06:22,270 --> 00:06:23,370
for their performance.
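For reference, PSNR can be computed from the mean squared error between two images. The sketch below is plain Python and assumes pixel values in the range 0 to 1; the function name `psnr` and the sample values are illustrative, not the notebook's callback code.

```python
import math

def psnr(reference, predicted, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical pixel values: a reference and a slightly noisy prediction.
reference = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.9]
print(round(psnr(reference, predicted), 2))
```

A higher PSNR means the prediction is closer to the reference, which is why this value is expected to rise as training progresses.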

91
00:06:24,120 --> 00:06:30,090
And then we just have on_epoch_end and on_test_batch_end, so we can print and get different training accuracy

92
00:06:30,090 --> 00:06:34,210
or training performance metrics during the training process.

93
00:06:35,130 --> 00:06:42,840
Then we can just finally create the model here and get the summary and then define everything here with

94
00:06:42,850 --> 00:06:48,150
the optimizer and loss function, as well as setting up the callbacks and the checkpoints.

95
00:06:48,630 --> 00:06:49,710
So we can see the model

96
00:06:49,710 --> 00:06:51,150
summary. It's quite tiny.

97
00:06:51,150 --> 00:06:55,770
It's 59,000 trainable parameters, which is very small because it's only conv filters.

98
00:06:55,770 --> 00:07:01,110
That's what we actually train, the filters; we're not training for accuracy in these types of networks.

99
00:07:02,550 --> 00:07:04,440
So now we're ready to train the model.

100
00:07:04,950 --> 00:07:07,620
So this model takes a little while to train.

101
00:07:08,070 --> 00:07:12,000
So this is the training process, the results of my previous training run.

102
00:07:12,540 --> 00:07:14,430
And you can see this is a prediction here.

103
00:07:14,430 --> 00:07:19,860
And this plotting function that we defined above allows you to zoom into an area and expand it here.

104
00:07:19,950 --> 00:07:26,820
So that's quite cool, and you can see this is the loss function; as training progresses it gets smaller and

105
00:07:26,820 --> 00:07:27,390
smaller.

106
00:07:27,390 --> 00:07:33,730
And you can see the mean PSNR for each epoch, and this should get larger and larger.

107
00:07:49,420 --> 00:07:54,340
And this should get larger and larger as we progress through training. It doesn't change that much,

108
00:07:54,340 --> 00:07:56,910
it seems after a while, but that's OK.

109
00:07:57,370 --> 00:07:58,960
And let's look at this.

110
00:07:59,650 --> 00:08:01,330
These are all the results; there's quite a bit.

111
00:08:01,840 --> 00:08:06,250
Now we can look at displaying the model prediction as well as plotting the results here.

112
00:08:06,820 --> 00:08:12,340
So you can see the PSNR of the low-resolution image and the high-resolution image is 29.85.

113
00:08:12,340 --> 00:08:15,830
And for the PSNR of the predicted high-resolution image,

114
00:08:15,830 --> 00:08:16,690
That's 230.

115
00:08:16,720 --> 00:08:17,680
So that's interesting.

116
00:08:18,190 --> 00:08:24,400
So you can see now this is the low-resolution image here versus the high-resolution image versus the

117
00:08:24,400 --> 00:08:24,970
prediction.

118
00:08:25,060 --> 00:08:27,250
So let's take a look at all three of them together.

119
00:08:27,730 --> 00:08:33,790
You can see in the high resolution, some would say it's a bit of a defined logo with the individual

120
00:08:33,790 --> 00:08:35,410
looks that grips on this model.

121
00:08:35,840 --> 00:08:37,410
It's all a blur here.

122
00:08:37,420 --> 00:08:38,920
It's still a bit of a blur.

123
00:08:39,430 --> 00:08:44,440
However, it's slightly sharper, like this line is slightly sharper than this line.

124
00:08:44,980 --> 00:08:46,240
So it's a bit cleaned up.

125
00:08:46,240 --> 00:08:48,190
A bit more defined, but not that great.

126
00:08:48,690 --> 00:08:50,620
Then we can look at the images here.

127
00:08:50,770 --> 00:08:57,580
This one is a woman holding a wine glass, and you can see in the prediction version everything looks

128
00:08:57,580 --> 00:09:03,730
slightly sharper. You may not be able to see it on your video stream, but it's a little bit sharper on my end.

129
00:09:04,630 --> 00:09:08,110
But you do have the source notebook for this, so you can analyze your own results.

130
00:09:08,620 --> 00:09:14,080
So that's it for this lesson on super-resolution networks. This one doesn't make a huge difference.

131
00:09:14,500 --> 00:09:21,240
I may record another video lesson with a better, more advanced super-resolution GAN.

132
00:09:21,760 --> 00:09:24,430
So for now, this is it for this lesson.

133
00:09:24,430 --> 00:09:29,770
And if you want me to record that video, just let me know. I'll put the notebook up in the course

134
00:09:32,670 --> 00:09:33,420
resources,

135
00:09:34,020 --> 00:09:36,190
so you guys can have access to that as well.

136
00:09:36,240 --> 00:09:37,620
So thank you for watching.

137
00:09:38,040 --> 00:09:40,380
And that's it for our super-resolution GAN

138
00:09:40,380 --> 00:09:41,520
lesson. Thank you.