1
00:00:00,420 --> 00:00:00,880
Hi, guys.

2
00:00:00,900 --> 00:00:07,350
Welcome back to the course. In this section, we'll take a look at using metric learning to do image

3
00:00:07,350 --> 00:00:13,890
similarity searches, which is a very important field in computer vision, or in anything, to be honest, because

4
00:00:14,370 --> 00:00:18,060
finding similar images is something that's quite useful.

5
00:00:18,780 --> 00:00:23,760
Think of search functions, think of things like signature matching, fingerprint searches,

6
00:00:24,150 --> 00:00:25,040
that type of stuff.

7
00:00:25,050 --> 00:00:27,330
This is a very, very useful technique.

8
00:00:27,450 --> 00:00:29,490
So let's get started.

9
00:00:29,580 --> 00:00:35,170
So open Notebook 64, and we'll begin the lesson and scroll up to the top.

10
00:00:35,190 --> 00:00:40,700
I just want to point out this notebook again comes from the Keras tutorial site,

11
00:00:40,710 --> 00:00:44,130
the official site, and the author is Mat Kelcey.

12
00:00:44,130 --> 00:00:46,830
So credit goes to him for what we're doing.

13
00:00:47,280 --> 00:00:51,990
We're going to use similarity metric learning on CIFAR-10 images.

14
00:00:52,440 --> 00:01:00,900
So in a nutshell, what metric learning aims to do is train a model that can embed the input, the image,

15
00:01:01,260 --> 00:01:08,700
into a high-dimensional space, such that similar inputs or similar images are defined by basically

16
00:01:08,700 --> 00:01:09,990
being close to each other.

17
00:01:10,530 --> 00:01:12,660
So you can imagine how useful this is.

18
00:01:12,810 --> 00:01:15,390
So if you want to learn more, you can check out these links here.

19
00:01:16,140 --> 00:01:21,750
So let's begin the lesson. Firstly, we load all the libraries that we'll be using. Next, we load

20
00:01:22,410 --> 00:01:24,260
the CIFAR-10 dataset.

21
00:01:24,270 --> 00:01:30,810
And now we're just going to display 25 random images from the dataset, and you can see an assortment

22
00:01:30,810 --> 00:01:34,500
of dog and cat, blah blah blah, frogs as well.

23
00:01:35,160 --> 00:01:43,830
So what we do in metric learning is provide the training data not as explicit (X, y) pairs, but

24
00:01:43,830 --> 00:01:49,980
instead use multiple instances that are related in the way we want to express similarity.

25
00:01:50,490 --> 00:01:57,720
So in our example, we're expressing similarity by using the same class of image.

26
00:01:58,200 --> 00:01:59,670
So that's how we're going to do it.

27
00:01:59,850 --> 00:02:01,300
And so let's begin.

28
00:02:01,320 --> 00:02:03,720
So we just run this block of code here.

29
00:02:04,080 --> 00:02:11,760
Next, what we're going to do is create our anchor-positive pairs, because what's happening here is that

30
00:02:11,760 --> 00:02:16,800
in training, we're trying to create a batch as input that consists of the anchor and the

31
00:02:16,800 --> 00:02:17,910
positive pair.

32
00:02:18,390 --> 00:02:24,930
So the goal of the network in training is to basically move the anchor positive pairs closer together

33
00:02:25,380 --> 00:02:27,750
and further away from other instances in the batch.

34
00:02:27,870 --> 00:02:30,480
So you can understand how that's going to work.
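
Here is a minimal sketch of how one batch of anchor-positive pairs could be sampled, one pair per class. It uses plain Python on toy labels rather than the notebook's actual code, and the names `build_class_index` and `sample_anchor_positive_batch` are hypothetical, made up just for this illustration.

```python
import random
from collections import defaultdict


def build_class_index(labels):
    """Map each class label to the list of example indices with that label."""
    class_to_indices = defaultdict(list)
    for idx, label in enumerate(labels):
        class_to_indices[label].append(idx)
    return class_to_indices


def sample_anchor_positive_batch(class_to_indices, rng):
    """Return one (anchor_idx, positive_idx) pair per class.

    Both indices come from the same class but point at different examples.
    Assumes every class has at least two examples.
    """
    batch = []
    for label in sorted(class_to_indices):
        anchor_idx, positive_idx = rng.sample(class_to_indices[label], 2)
        batch.append((anchor_idx, positive_idx))
    return batch


# Toy labels standing in for the CIFAR-10 training labels (hypothetical data).
labels = [0, 1, 2, 0, 1, 2, 0, 1, 2]
class_to_indices = build_class_index(labels)
batch = sample_anchor_positive_batch(class_to_indices, random.Random(0))
for anchor_idx, positive_idx in batch:
    print("class", labels[anchor_idx], "anchor", anchor_idx, "positive", positive_idx)
```

With CIFAR-10 the same idea would give a batch of 10 pairs, one for each class, and the images at those indices would be what gets fed to the model.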

35
00:02:30,840 --> 00:02:32,460
So let's continue.

36
00:02:32,940 --> 00:02:35,790
So you can see these are the anchor-positive pairs here.

37
00:02:36,360 --> 00:02:37,590
You can see the pairs.

38
00:02:37,590 --> 00:02:39,720
I believe they're shown like this here, vertically.

39
00:02:40,200 --> 00:02:48,270
You can see the same class in each pair: cat, deer, dog, frog, horse, boat and truck.

40
00:02:49,410 --> 00:02:55,200
So now we're going to define a custom model with a custom training step, a training step

41
00:02:55,650 --> 00:03:03,780
that first embeds both anchors and positives and then uses their pairwise dot products as logits for the

42
00:03:03,780 --> 00:03:04,560
softmax layer.

43
00:03:04,680 --> 00:03:10,740
So that's a bit of a mouthful, but you can take a look at this code here. So here's the training

44
00:03:10,740 --> 00:03:11,640
step here.

45
00:03:12,450 --> 00:03:19,500
We get the anchor embeddings there, and then we just get the gradients using the gradient tape to

46
00:03:19,500 --> 00:03:21,240
optimize with these gradients after.

47
00:03:21,930 --> 00:03:27,360
And then we update and return metrics, specifically the one for loss value here.
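
To make the loss in that training step a bit more concrete, here is a rough sketch in plain Python. It is not the notebook's TensorFlow code, and the function name `pairwise_softmax_loss` and the default temperature of 0.2 are assumptions for illustration. The idea is that anchor i should score highest against positive i, so the "correct class" for row i of the dot-product matrix is column i.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_softmax_loss(anchor_embeddings, positive_embeddings, temperature=0.2):
    """Mean softmax cross-entropy where row i should pick column i.

    logits[j] for anchor i is the dot product of anchor i with positive j,
    divided by the temperature (an assumed value here).
    """
    losses = []
    for i, anchor in enumerate(anchor_embeddings):
        logits = [dot(anchor, positive) / temperature for positive in positive_embeddings]
        # Log-sum-exp with the max subtracted for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        # Cross-entropy for the correct column i.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy unit-length embeddings (hypothetical data): two classes in 2-D.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[1.0, 0.0], [0.0, 1.0]]     # each positive matches its anchor
misaligned_positives = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped

aligned_loss = pairwise_softmax_loss(anchors, aligned_positives)
misaligned_loss = pairwise_softmax_loss(anchors, misaligned_positives)
print(aligned_loss, misaligned_loss)
```

The loss is small when each anchor is closest to its own positive and large when it is closer to another class's positive, which is what pushes the pairs together and the other instances in the batch apart.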

48
00:03:28,650 --> 00:03:31,860
Next, this is our embedding model here.

49
00:03:32,310 --> 00:03:35,390
So we create this embedding model based on these inputs here.

50
00:03:36,520 --> 00:03:39,000
And now we can train this model.

51
00:03:39,240 --> 00:03:46,410
So we just compile the model, then fit it with our anchor-positive pairs as inputs. Two epochs we train for,

52
00:03:46,890 --> 00:03:48,780
and you can see the loss at the end.

53
00:03:48,990 --> 00:03:54,150
It has gone down steadily, and if we trained for longer, it most likely would have gone down even lower.

54
00:03:54,570 --> 00:03:59,370
So now we can take a look and see how it performs so we can run a test.

55
00:03:59,400 --> 00:04:08,820
So let's look at the 10 nearest neighbors per example here, and we can see this is a collage of similar images here.

56
00:04:09,360 --> 00:04:14,820
So you can see this is the top ten here, so you can see the most similar images to this first class

57
00:04:14,820 --> 00:04:16,200
here, which looks like a cat.

58
00:04:16,860 --> 00:04:22,320
All these are the images here, which are mostly cats and dogs, but you can see they all look somewhat

59
00:04:22,320 --> 00:04:28,830
similar in a sense. Similarly for the boats here as well, and for the frogs we can definitely see

60
00:04:28,920 --> 00:04:29,970
it as well.

61
00:04:30,060 --> 00:04:30,810
Quite similar.

62
00:04:31,080 --> 00:04:35,580
So it depends on what you use as a similarity metric.

63
00:04:35,580 --> 00:04:40,740
We just use classes here, and there is a lot of variety within the classes.

64
00:04:40,980 --> 00:04:43,830
So how do you know if this is performing well?

65
00:04:44,400 --> 00:04:50,670
Well, if you want a quantified view of the performance, of how correct the matches

66
00:04:50,670 --> 00:04:58,500
are, we can take 10 samples from each class and then consider their near neighbors as a form of prediction

67
00:04:58,650 --> 00:04:59,370
so that

68
00:04:59,860 --> 00:05:03,070
you can see if the nearest neighbor is of the same class, effectively.

69
00:05:03,190 --> 00:05:09,250
And we can create a confusion matrix out of it here, and you can see in the end the confusion matrix

70
00:05:09,250 --> 00:05:15,060
is supposed to have a high diagonal here, and in that sense you can see it's doing OK.

71
00:05:15,070 --> 00:05:20,020
It's not great, but that's basically just after training for two epochs.

72
00:05:20,020 --> 00:05:20,890
Not very much.
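
As a rough sketch of that evaluation, the snippet below treats each example's nearest neighbors as predictions and counts them into a confusion matrix. It is plain Python on toy 2-D embeddings, not the notebook's code, and it uses every example as a query instead of 10 per class. The names `nearest_neighbours` and `neighbour_confusion_matrix` are hypothetical.

```python
def nearest_neighbours(embeddings, query_idx, k):
    """Indices of the k embeddings most similar to the query, by dot product."""
    query = embeddings[query_idx]
    scores = [
        (sum(a * b for a, b in zip(query, other)), idx)
        for idx, other in enumerate(embeddings)
        if idx != query_idx  # skip the query itself
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scores[:k]]


def neighbour_confusion_matrix(embeddings, labels, num_classes, k):
    """confusion[t][p] counts neighbors of class p found for queries of class t."""
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for query_idx, true_label in enumerate(labels):
        for neighbour_idx in nearest_neighbours(embeddings, query_idx, k):
            confusion[true_label][labels[neighbour_idx]] += 1
    return confusion


# Toy embeddings (hypothetical data): two well-separated classes.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]

confusion = neighbour_confusion_matrix(embeddings, labels, num_classes=2, k=1)
for row in confusion:
    print(row)
```

A well-trained embedding puts most of the counts on the diagonal, like in this toy case. Off-diagonal counts show which classes get mistaken for each other, for example cats and dogs.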

73
00:05:21,470 --> 00:05:27,190
So hopefully you enjoyed that lesson on image similarity search using metric learning.

74
00:05:27,870 --> 00:05:33,850
I think it's pretty valuable, and you can build a lot upon this for your internal projects or whatever

75
00:05:33,850 --> 00:05:34,570
you want to do.

76
00:05:35,140 --> 00:05:40,720
I'll stop there now, and in the next lesson, we'll take a look at image captioning, which is a very

77
00:05:40,720 --> 00:05:41,530
cool lesson.

78
00:05:42,010 --> 00:05:47,320
We'll take a look at using Keras to implement an image captioning model, so I'll see you in the next

79
00:05:47,320 --> 00:05:47,650
lesson.

80
00:05:47,680 --> 00:05:48,520
Thank you for watching.
