1
00:00:00,060 --> 00:00:06,810
Hi and welcome back to Section 19, where we take a look at histograms and K-means clustering for the

2
00:00:06,810 --> 00:00:11,940
dominant colors of an image. So open that notebook and scroll up to the top.

3
00:00:12,570 --> 00:00:15,150
And I'll tell you about this lesson before we dive in.

4
00:00:15,300 --> 00:00:18,900
But first, let's just look at our images and the functions and libraries.

5
00:00:19,710 --> 00:00:24,210
So what we're going to talk about is histograms and what a histogram is.

6
00:00:24,780 --> 00:00:27,720
It's basically a graph, a bar chart.

7
00:00:27,720 --> 00:00:31,290
It looks similar to a bar chart, basically, but it doesn't have to be a bar chart; it

8
00:00:31,290 --> 00:00:32,670
can be a line graph as well.

9
00:00:33,390 --> 00:00:37,770
And a histogram gives us basically the distribution of something.

10
00:00:38,100 --> 00:00:38,490
All right.

11
00:00:38,490 --> 00:00:44,250
So in the case of an image, we're going to go through the distribution of the colors.

12
00:00:44,250 --> 00:00:50,730
So the amount of pixels at each intensity value, that's what it's going to show.

13
00:00:50,910 --> 00:00:56,720
So maybe a verbal explanation is not the best; it may be clearer if we see it.

14
00:00:56,760 --> 00:00:59,100
So let's go down and run this code.

15
00:00:59,430 --> 00:01:01,880
So what we're going to do, we're going to load our input image.

16
00:01:01,890 --> 00:01:08,100
We're going to show the original image here, which is this, and then we're going to use the plotting

17
00:01:08,100 --> 00:01:12,810
function from matplotlib, which we imported right here, to plot a histogram.

18
00:01:12,930 --> 00:01:14,860
We use ravel.

19
00:01:14,880 --> 00:01:16,770
This basically flattens our image array.

20
00:01:16,860 --> 00:01:19,980
So we just have it as one big one-dimensional array.

21
00:01:21,180 --> 00:01:24,770
And then we just specify how many bins we want.

22
00:01:24,780 --> 00:01:30,210
This is the range, and we just do plt.show(), which actually should be on another line.
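A minimal sketch of the flatten-and-bin step described above. It uses NumPy's np.histogram on a small synthetic array as a stand-in (an assumption; the notebook loads a real photo and plots with matplotlib). The variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a loaded image (assumption: the notebook reads a
# real photo instead). Shape is (height, width, channels).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# ravel() flattens the 3-D array into one long 1-D array of intensities.
flat = image.ravel()

# 256 bins over the range [0, 256): one bin per possible 8-bit value.
counts, bin_edges = np.histogram(flat, bins=256, range=(0, 256))

print(flat.shape)    # (60,)
print(counts.sum())  # 60
```

In a notebook, plt.hist(image.ravel(), bins=256, range=(0, 256)) followed by plt.show() would draw the same counts as a plot.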

23
00:01:30,600 --> 00:01:35,670
I don't actually like programming like that; it's a Java or C++ habit.

24
00:01:36,480 --> 00:01:39,180
And then what we do is separate the color channels here.

25
00:01:39,210 --> 00:01:45,690
This is just a tuple telling us the order is BGR, so that when we enumerate over it, which means

26
00:01:45,690 --> 00:01:46,590
looping through it.

27
00:01:46,590 --> 00:01:52,620
Here we can just specify the labels, so we can just set color equal to the channel color, and we can actually

28
00:01:52,620 --> 00:01:55,650
specify the line colors here, which is pretty cool to do.

29
00:01:56,220 --> 00:02:03,330
And again, we set the limits here, and then we use cv2.calcHist here to plot the individual

30
00:02:04,980 --> 00:02:05,610
lines here.
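A rough sketch of the per-channel loop, assuming a tiny synthetic BGR image. NumPy's np.histogram stands in for cv2.calcHist here so the example runs without OpenCV; the names channel_names and histograms are hypothetical.

```python
import numpy as np

# Tiny synthetic BGR image (assumption): every pixel is pure red, which in
# OpenCV's channel order means B=0, G=0, R=255.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[:, :, 2] = 255

channel_names = ("b", "g", "r")
histograms = {}
for i, name in enumerate(channel_names):
    # One 256-bin histogram per channel, using the same binning as
    # cv2.calcHist([image], [i], None, [256], [0, 256]).
    counts, _ = np.histogram(image[:, :, i], bins=256, range=(0, 256))
    histograms[name] = counts

print(histograms["r"][255])  # 6
print(histograms["b"][0])    # 6
```

In the notebook, each channel's counts would be drawn with plt.plot(..., color=name), which is why the tuple holds the matplotlib color letters.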

31
00:02:05,940 --> 00:02:08,010
So we're actually doing two histograms in this code.

32
00:02:08,640 --> 00:02:16,430
We're doing one which just shows the overall histogram of the image, which is basically a brightness

33
00:02:16,440 --> 00:02:17,550
distribution.

34
00:02:18,210 --> 00:02:20,370
So you can see this image is a fairly dark image.

35
00:02:21,300 --> 00:02:27,330
So now, before I move down to the plot, imagine the x-axis, the bottom axis.

36
00:02:27,900 --> 00:02:32,400
Imagine it covers a range from 0 to 255.

37
00:02:33,060 --> 00:02:39,270
And it's basically telling us that if you see a spike at the low end like this, it means there are a lot

38
00:02:39,270 --> 00:02:43,440
of dark pixels, or if there's a spike at the high end, a lot of bright pixels.

39
00:02:43,920 --> 00:02:46,230
So what do you anticipate this image's histogram will look like?

40
00:02:47,100 --> 00:02:47,670
There it is.

41
00:02:48,090 --> 00:02:50,310
You can see these are the brightness levels.

42
00:02:51,060 --> 00:02:54,640
How much of each brightness was in the image, each pixel brightness.

43
00:02:55,110 --> 00:03:01,830
So you can see a lot of the pixels in this image are basically under like 60 in terms of the brightness

44
00:03:01,830 --> 00:03:07,220
and intensity, almost 120,000 of them at this intensity here.

45
00:03:07,470 --> 00:03:11,790
Remember, this is a fairly large image, so you can expect to have that amount of pixels.

46
00:03:12,450 --> 00:03:17,910
This is probably close to a million pixels, so you can see this is the distribution of it here.

47
00:03:18,240 --> 00:03:22,020
So what if we wanted to break this down by color?

48
00:03:22,320 --> 00:03:23,570
That's what this loop does.

49
00:03:23,580 --> 00:03:25,740
That's what we did in this part of the code here.

50
00:03:26,700 --> 00:03:31,890
So I may have commented this out just because I wasn't using it before, but it's just to show you that

51
00:03:31,890 --> 00:03:39,090
we can use cv2.calcHist to actually get the histogram, the individual histogram of an image.

52
00:03:39,510 --> 00:03:44,970
You put the image into a list, just like this, and the channel is also in a list as well.

53
00:03:45,570 --> 00:03:50,590
And then we specify the range and the bins we want, bins basically meaning what spacing we want here.

54
00:03:50,620 --> 00:03:52,770
So we're doing it for every pixel value.

55
00:03:53,340 --> 00:03:58,260
And this is the colored plot, so you can see now, this image definitely does seem to have a warmish

56
00:03:58,260 --> 00:03:59,850
tint to it.

57
00:04:00,630 --> 00:04:02,280
That's why there is more red in it than blue.

58
00:04:03,000 --> 00:04:07,040
And you can see this is probably the least amount of blue in terms of dark.

59
00:04:07,620 --> 00:04:10,980
Blue is the least visible, as it's more dark than the green.

60
00:04:11,220 --> 00:04:13,440
So this is a pretty cool way to analyze images.

61
00:04:14,070 --> 00:04:16,100
So what what?

62
00:04:16,130 --> 00:04:17,010
What do we do with this?

63
00:04:17,040 --> 00:04:20,880
Well, one cool thing we can do with this, but before I go on:

64
00:04:20,880 --> 00:04:24,530
These are just the parameters that go into cv2.calcHist.

65
00:04:24,540 --> 00:04:28,860
If you wanted to see it for yourself, we just kind of went through it briefly before in code.

66
00:04:29,340 --> 00:04:29,750
So what?

67
00:04:29,760 --> 00:04:31,380
It's for your information, it's right here.

68
00:04:32,070 --> 00:04:33,690
So back to this.

69
00:04:33,840 --> 00:04:39,210
So what we're going to do, we're going to take another image before I move on to a pretty cool exercise,

70
00:04:40,110 --> 00:04:43,690
and we're going to compare the histogram of this image to the other one.

71
00:04:43,710 --> 00:04:49,350
You can see this one has a nice spike at the end, which is the sky pixels most likely.

72
00:04:49,860 --> 00:04:54,060
And you can see a gradual darkening toward the beginning.

73
00:04:54,240 --> 00:04:58,740
There are a lot of blacks, probably, in the image as well, which is what this little spike in the beginning is

74
00:04:58,890 --> 00:04:59,310
for.

75
00:04:59,940 --> 00:05:04,380
And then again, it kind of goes smoothly down with a little spike at the end, probably just the fringes of

76
00:05:04,380 --> 00:05:04,980
the mountains.

77
00:05:06,000 --> 00:05:07,080
You can see the color

78
00:05:07,080 --> 00:05:11,640
distribution is very different, and you can see that again it's a warm-looking image.

79
00:05:12,180 --> 00:05:14,640
There's a lot of red in the image, spiking at the end.

80
00:05:14,790 --> 00:05:17,160
Much more than blue; there's actually no blue spike at the end.

81
00:05:18,000 --> 00:05:19,890
So it's an interesting way to analyze images.

82
00:05:20,370 --> 00:05:25,920
So what if now we wanted to break down this image into its individual color components?

83
00:05:26,400 --> 00:05:31,170
Let's say we wanted to get the five most dominant colors out of this image.

84
00:05:31,710 --> 00:05:32,830
Well, that's what we're going to do.

85
00:05:32,850 --> 00:05:35,190
We're going to use K-means clustering to do that.

86
00:05:35,760 --> 00:05:38,670
Now, I won't go into detail on what K-means clustering is.

87
00:05:39,090 --> 00:05:44,970
It's basically a clustering algorithm that groups pixels of similar value together.

88
00:05:45,270 --> 00:05:46,650
That's essentially what it does.
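For readers who want a little more than the one-line description, here is a simplified sketch of the idea in plain NumPy. It is not scikit-learn's implementation, which the lesson actually uses; the function name kmeans and its parameters are assumptions made for illustration.

```python
import numpy as np

def kmeans(points, k, n_iters=10, seed=0):
    """Basic Lloyd's algorithm: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    points = points.astype(np.float64)
    # Start from k distinct points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distances from every point to every center: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Four "pixels": two dark and two bright.
pixels = np.array([[10, 10, 10], [12, 12, 12], [240, 240, 240], [250, 250, 250]])
centers, labels = kmeans(pixels, k=2)
```

The two dark pixels end up sharing one label and the two bright pixels the other, with each center sitting at the average color of its group.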

89
00:05:47,460 --> 00:05:51,030
So we have some helper functions here, which we'll go through shortly.

90
00:05:51,600 --> 00:05:55,740
But what we do, we import KMeans from sklearn.cluster.

91
00:05:56,840 --> 00:05:57,060
Hmm.

92
00:05:58,050 --> 00:06:03,350
And then we load the image and convert the image to RGB.

93
00:06:04,080 --> 00:06:05,780
Then we reshape the image accordingly.

94
00:06:05,790 --> 00:06:11,820
And the reason we reshape this image is because we do need it to be in a certain format for the clt.fit

95
00:06:11,820 --> 00:06:12,630
function.

96
00:06:13,140 --> 00:06:15,540
So we create the KMeans clusters.

97
00:06:15,540 --> 00:06:17,640
We're going to go with five clusters here.

98
00:06:18,060 --> 00:06:20,940
So we're going to get the five most dominant colors out of this image.

99
00:06:21,600 --> 00:06:24,200
And then we run the K-means clustering model on this.

100
00:06:24,210 --> 00:06:28,080
So we just do clt.fit, as we created a clt

101
00:06:28,090 --> 00:06:30,000
object, which is a KMeans clustering object.

102
00:06:30,570 --> 00:06:35,130
And then we pass the input to it, which has been reshaped into this format here.

103
00:06:35,580 --> 00:06:38,790
If you do want to inspect what the shape is, you can.

104
00:06:39,300 --> 00:06:45,210
And actually, let's do that right now so we can see the difference in shape so we can do image dot

105
00:06:45,510 --> 00:06:52,740
shape and then we can do it again here to see what changes after it.

106
00:06:53,160 --> 00:06:54,240
So let's print this.

107
00:06:57,310 --> 00:07:00,160
So you can see this is the shape here. So what do you think happened?

108
00:07:00,700 --> 00:07:03,970
Basically, what happened is that it flattened this axis here.

109
00:07:04,390 --> 00:07:09,790
So instead of having a three-dimensional image, it has a two-dimensional array.

110
00:07:09,800 --> 00:07:13,570
Yes, it flattened the axes here for each color component.

111
00:07:14,470 --> 00:07:16,270
And that's why it still has three, a depth of three.
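A small sketch of that reshape, assuming a tiny 4 by 5 image in place of the lesson's photo. The height and width axes are merged into one axis of pixels while the three color components per pixel are kept.

```python
import numpy as np

# Assumed stand-in image with shape (height, width, channels).
image = np.arange(4 * 5 * 3, dtype=np.uint8).reshape((4, 5, 3))
print(image.shape)   # (4, 5, 3)

# Merge the two spatial axes: one row per pixel, three values per row.
pixels = image.reshape((image.shape[0] * image.shape[1], 3))
print(pixels.shape)  # (20, 3)

# The pixel at row r, column c lands at index r * width + c.
r, c = 2, 3
print(np.array_equal(pixels[r * 5 + c], image[r, c]))  # True
```

This two-dimensional (n_pixels, 3) layout is the samples-by-features shape that a clustering fit call expects.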

112
00:07:18,010 --> 00:07:23,590
So you can see we got an error here, and that's basically because we didn't run this block of code

113
00:07:23,590 --> 00:07:29,110
before, which is the helper functions that we need to complete this exercise, so we'll come back

114
00:07:29,110 --> 00:07:29,590
to this.

115
00:07:29,800 --> 00:07:38,400
You can see after we get the clt object, in this cell here, we pass it to the centroid

116
00:07:38,410 --> 00:07:39,460
histogram function.

117
00:07:40,420 --> 00:07:41,410
So what does that do?

118
00:07:42,040 --> 00:07:47,680
Centroid histogram creates a histogram for the clusters based on the pixels of each cluster.

119
00:07:47,710 --> 00:07:53,230
So after we do K-means clustering on it, we now need to actually group the clusters together.

120
00:07:54,100 --> 00:07:59,080
So this is what this function does, and it returns the histogram right here.
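A hedged sketch of what such a helper might look like. The signature below, which takes a plain label array and a cluster count, is an assumption; the notebook's version receives the fitted clustering object, whose per-pixel labels are available as labels_ in scikit-learn.

```python
import numpy as np

def centroid_histogram(labels, n_clusters):
    """Return the fraction of pixels assigned to each cluster."""
    # Count how many pixels carry each cluster label 0..n_clusters-1.
    counts = np.bincount(labels, minlength=n_clusters).astype(np.float64)
    # Normalize so the fractions sum to 1.
    return counts / counts.sum()

# Hypothetical labels for eight pixels split across three clusters.
labels = np.array([0, 0, 1, 2, 2, 2, 2, 1])
hist = centroid_histogram(labels, 3)
print(hist.tolist())  # [0.25, 0.25, 0.5]
```

Normalizing to fractions means the result describes the share of the image each dominant color covers, independent of the image size.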

121
00:07:59,890 --> 00:08:01,240
Next, let's move on.

122
00:08:01,600 --> 00:08:02,810
We do the plot colors.

123
00:08:02,830 --> 00:08:09,130
This is the one that takes the output of this, the hist, and it takes the centroids here, and then just

124
00:08:09,130 --> 00:08:15,730
plots a bar graph where we actually do a distribution again. Well, it's not really a bar graph,

125
00:08:15,730 --> 00:08:21,190
it's just a bar-shaped plot, and it gives us the distribution of the colors

126
00:08:21,190 --> 00:08:21,490
here.
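One possible way to build that bar, sketched with NumPy slicing only. The function signature and the default width and height are assumptions, not the notebook's exact helper.

```python
import numpy as np

def plot_colors(hist, centroids, width=300, height=50):
    """Build a horizontal bar where each cluster color fills a share of the
    width equal to its fraction in hist."""
    bar = np.zeros((height, width, 3), dtype=np.uint8)
    start = 0.0
    for fraction, color in zip(hist, centroids):
        end = start + fraction * width
        # Paint the columns for this cluster with its centroid color.
        bar[:, int(start):int(end)] = np.asarray(color, dtype=np.uint8)
        start = end
    return bar

# Two hypothetical clusters: 25% pure red and 75% pure blue (RGB order).
bar = plot_colors([0.25, 0.75], [[255, 0, 0], [0, 0, 255]], width=8, height=2)
print(bar.shape)         # (2, 8, 3)
print(bar[0, 0].tolist())  # [255, 0, 0]
print(bar[0, 7].tolist())  # [0, 0, 255]
```

In the notebook, the returned array would typically be shown with plt.imshow(bar).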

127
00:08:21,520 --> 00:08:27,370
So you can see in this image here, there's probably kind of a brown-looking color.

128
00:08:27,760 --> 00:08:28,990
Then there's some dark green.

129
00:08:28,990 --> 00:08:32,620
There are maybe some yellows, some light colors over here.

130
00:08:33,100 --> 00:08:35,830
So let's take a look and see what K-means did with the colors.

131
00:08:36,640 --> 00:08:43,120
So let's run this code now, and you can see this code will take some time to run because it has to

132
00:08:43,120 --> 00:08:46,370
perform the K-means clustering on roughly

133
00:08:46,390 --> 00:08:51,190
a million to two million pixels, so it does take some time to run.

134
00:08:52,760 --> 00:08:55,560
OK, so there we go, and now we can see the dominant colors here.

135
00:08:55,580 --> 00:08:58,340
You can see there's a sky color as the dominant color.

136
00:08:58,340 --> 00:09:03,500
This is an olive green color, which is probably the average color of these top mountains here.

137
00:09:03,920 --> 00:09:09,770
Then this dark, kind of mossy-looking green, which is probably the dark areas in the valleys

138
00:09:09,770 --> 00:09:13,400
here. Then there's a light olive green, which is probably this area here.

139
00:09:13,850 --> 00:09:17,930
And then there's a brown, which is probably these areas here, as well as some of the flowers here.

140
00:09:18,710 --> 00:09:23,360
So it's a really cool way to do this so we can now break up images into color components like that.

141
00:09:23,370 --> 00:09:27,350
And if you wanted, you can do more clusters in this, which would be interesting as well.

142
00:09:27,950 --> 00:09:30,080
So now let's run it on a different image.

143
00:09:30,110 --> 00:09:34,940
This is a picture of me playing beach volleyball in Barbados, actually with some friends.

144
00:09:36,500 --> 00:09:38,180
So you can see this.

145
00:09:38,180 --> 00:09:46,640
This mistake is in the code where I renamed the functions to camel case, and I did not update that properly,

146
00:09:47,090 --> 00:09:48,380
so we'll fix that.

147
00:09:49,010 --> 00:09:53,060
And now we run this, and it should run faster because this is a much smaller image.

148
00:09:54,980 --> 00:09:55,430
There we go.

149
00:09:55,520 --> 00:09:56,750
So you can see this makes sense.

150
00:09:57,170 --> 00:09:59,150
There's this blue for the water.

151
00:09:59,480 --> 00:10:04,190
There's black, which is my hair and beard, as well as a bit of the background, perhaps.

152
00:10:04,550 --> 00:10:07,190
And then this is brown, which is matching the skin tone.

153
00:10:07,520 --> 00:10:09,590
So this is pretty nifty and pretty cool.

154
00:10:10,340 --> 00:10:12,350
So that concludes this lesson.

155
00:10:12,890 --> 00:10:13,270
Oh, good.

156
00:10:13,340 --> 00:10:19,160
Now we move on to the next lesson, which is comparing images, so we can actually see the differences

157
00:10:19,160 --> 00:10:20,210
between images.

158
00:10:20,600 --> 00:10:22,400
We'll see how much they change.

159
00:10:23,000 --> 00:10:25,880
And then we'll take a look at the structural similarity of images.

160
00:10:26,420 --> 00:10:27,950
So I'll see you in the next lesson.

161
00:10:28,070 --> 00:10:28,520
Thank you.
