1
00:00:00,150 --> 00:00:06,780
Hi, and welcome back. In this lesson, we'll take a look at a very cool project, which is using a pre-trained

2
00:00:06,780 --> 00:00:12,270
VGGFace Siamese network and using it in Keras to recognize faces.

3
00:00:12,690 --> 00:00:15,960
So let's get started. We do a number of things in this lesson.

4
00:00:15,960 --> 00:00:22,560
We don't just recognize faces; we also do facial similarity, as well as taking a clip from the Friends

5
00:00:22,560 --> 00:00:25,170
TV show and recognizing the characters on screen.

6
00:00:25,800 --> 00:00:26,770
So it's pretty cool.

7
00:00:26,790 --> 00:00:32,940
So let's get started, and just for your reference, we're using the VGGFace model.

8
00:00:33,210 --> 00:00:38,050
This is the model that was published by these Oxford researchers in 2015.

9
00:00:38,070 --> 00:00:41,790
So if you want to take a look at the paper, you can take a look and learn more about this model.

10
00:00:42,390 --> 00:00:47,280
So firstly, to get started with this lesson, we have to download the test images that we'll be using

11
00:00:47,280 --> 00:00:48,270
for this experiment.

12
00:00:48,780 --> 00:00:50,540
So just run this block here.

13
00:00:51,000 --> 00:00:55,110
Run it and import our libraries, which you will do as well.

14
00:00:55,620 --> 00:01:01,260
And then what we do here is define the VGGFace model architecture.

15
00:01:01,710 --> 00:01:07,420
And you can see this looks quite similar to the VGG architecture, and that's why it's called

16
00:01:07,440 --> 00:01:08,010
VGGFace.

17
00:01:08,490 --> 00:01:13,350
It basically is a copy of that network with all the stacked convolutional layers here.

18
00:01:14,010 --> 00:01:18,270
However, we don't output to the classes that we want to classify.

19
00:01:18,280 --> 00:01:23,850
We output a 1 by 2622 vector, and that's the face embedding.

20
00:01:24,450 --> 00:01:31,260
So when we pass a face through this network, it produces that output embedding vector.

21
00:01:31,860 --> 00:01:33,820
But how do we get that output

22
00:01:33,890 --> 00:01:36,870
embedding? How was this trained? Well,

23
00:01:36,870 --> 00:01:41,940
this network was previously trained on those 2.6 million images, which you saw in the paper and in

24
00:01:41,940 --> 00:01:42,900
the slides I mentioned.

25
00:01:43,470 --> 00:01:44,260
So what do we do?

26
00:01:44,280 --> 00:01:45,330
We download the weights.

27
00:01:45,840 --> 00:01:47,820
So now you have downloaded the weights.

28
00:01:48,180 --> 00:01:51,990
You can now load those weights onto the model we defined above here.

29
00:01:52,470 --> 00:01:57,780
So we just do model.load_weights and enter the path to the weights we downloaded.

30
00:01:57,780 --> 00:01:59,850
And that's the name of the file that we downloaded here.

31
00:01:59,970 --> 00:02:05,400
[inaudible] Just run that, and it attaches the weights to the network.

32
00:02:06,180 --> 00:02:11,940
Next, we're going to define a cosine distance function as well as a preprocess image function.

33
00:02:11,940 --> 00:02:15,510
This function isn't too important; it's just a simple function that takes the image,

34
00:02:15,510 --> 00:02:22,170
resizes it, and converts it into an array so that we can preprocess it, or process it, with our CNN.

35
00:02:22,890 --> 00:02:26,400
So what we have here is a cosine similarity function.

36
00:02:26,880 --> 00:02:29,940
What this does is take two of those vectors.

37
00:02:29,940 --> 00:02:33,470
Remember the 2622-size vector?

38
00:02:33,480 --> 00:02:35,010
That's the embedding for the face.

39
00:02:35,040 --> 00:02:41,250
It takes two inputs, two faces that have been converted into that embedding, and

40
00:02:41,250 --> 00:02:44,160
outputs the cosine similarity score between them.
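A minimal sketch of such a function, under stated assumptions: the name `cosine_distance` is hypothetical, it is written in plain Python rather than with the NumPy calls a notebook would likely use, and it returns a distance (1 minus the cosine similarity), which fits the rule used later in the lesson that lower scores mean more similar faces.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two flat embedding vectors.

    Assumes both inputs are flat sequences of numbers of equal length
    (for example, a 2622-value face embedding that has been flattened).
    Raises ZeroDivisionError if either vector is all zeros.
    """
    a = [float(x) for x in a]
    b = [float(x) for x in b]
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Vectors pointing in the same direction give a value near 0, and orthogonal vectors give a value near 1.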

41
00:02:44,910 --> 00:02:47,070
And finally, we just have a little model.

42
00:02:47,320 --> 00:02:52,860
You'll find this is a final model that outputs that vector, so you can see how we create a model here

43
00:02:52,860 --> 00:02:59,970
with specified inputs here and the outputs, and just use this Model function, that's a Keras/TensorFlow

44
00:02:59,970 --> 00:03:02,130
function, to combine everything together.

45
00:03:02,640 --> 00:03:07,200
And we get our final model that produces a VGGFace descriptor.

46
00:03:08,100 --> 00:03:14,590
So now what we're going to do in the first experiment is implement facial similarity.

47
00:03:14,640 --> 00:03:17,130
So we're going to find the similarity between faces.

48
00:03:17,670 --> 00:03:19,320
So let's take a look at this function.

49
00:03:20,160 --> 00:03:22,650
So firstly, epsilon is our threshold value.

50
00:03:23,100 --> 00:03:28,050
Whether the faces are similar or not depends on whether the score is more or less than this 0.4.

51
00:03:28,530 --> 00:03:30,510
You can see that in the bottom of the code here.

52
00:03:31,050 --> 00:03:32,780
But for now, this is quite simple.

53
00:03:32,790 --> 00:03:34,020
Let's take a look at this network.

54
00:03:34,050 --> 00:03:37,710
So we have two images as the input arguments to this function.

55
00:03:38,340 --> 00:03:45,810
Then we have the VGGFace descriptor model, and so we do that model.predict and we just pass

56
00:03:46,020 --> 00:03:47,140
some images to it.

57
00:03:47,160 --> 00:03:52,410
So we have image one specified right here, and this is just a path that we're attaching it

58
00:03:52,410 --> 00:03:52,710
to.

59
00:03:53,340 --> 00:03:56,310
So we get the image one representation from that.

60
00:03:56,760 --> 00:03:59,610
Secondly, we get the second image here.

61
00:03:59,610 --> 00:04:00,240
Image two.

62
00:04:00,720 --> 00:04:03,030
We get its image representation.

63
00:04:03,030 --> 00:04:05,940
That's the embedding vector for it right there.

64
00:04:06,510 --> 00:04:12,870
Then we can take these two representations and pass them to our cosine similarity function to get the cosine

65
00:04:12,870 --> 00:04:13,980
similarity score.

66
00:04:14,760 --> 00:04:20,220
And then what we do is just create some subplots here so we can plot the images side by

67
00:04:20,220 --> 00:04:20,580
side.

68
00:04:20,880 --> 00:04:22,590
So let's run that function.

69
00:04:23,270 --> 00:04:24,750
And now let's run a few tests.

70
00:04:24,990 --> 00:04:26,700
So these are two pictures of my wife.

71
00:04:27,360 --> 00:04:33,480
And let's see if it tells us if it's the same person. Oops, the descriptor is not defined.

72
00:04:33,510 --> 00:04:36,510
That's because we didn't run this block of code.

73
00:04:36,630 --> 00:04:39,240
Let's go back to it now.

74
00:04:39,240 --> 00:04:40,080
Let's run this.

75
00:04:42,940 --> 00:04:48,580
And you can see it gives us a very good, very low similarity score, meaning that it's the same

76
00:04:48,580 --> 00:04:51,790
person. So let's zoom in slightly and you can see the resemblance.

77
00:04:51,790 --> 00:04:55,270
Yes, obviously, because they are the same person.

78
00:04:55,720 --> 00:04:57,770
Now, let's try some more images of her.

79
00:04:57,790 --> 00:04:59,500
So these are two of the images here.

80
00:05:00,250 --> 00:05:01,150
Now look at this one.

81
00:05:01,660 --> 00:05:04,360
These are two images of my wife, same person.

82
00:05:04,930 --> 00:05:07,480
However, the similarity score is 0.5.

83
00:05:07,570 --> 00:05:09,250
So it says they're not the same person.

84
00:05:09,670 --> 00:05:13,750
Remember, if it's less than 0.4, it's considered the same person.

85
00:05:13,750 --> 00:05:15,370
And if not, it's not the same person.
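The decision rule just described can be sketched as follows. The names `EPSILON` and `is_same_person` are hypothetical; only the 0.4 threshold and the "less than means same person" rule come from the lesson.

```python
EPSILON = 0.40  # threshold value stated in the lesson

def is_same_person(score, epsilon=EPSILON):
    """Treat two faces as the same person when the score is below epsilon."""
    return score < epsilon

# The pair that scored 0.5 is rejected, even though both photos show
# the same person, because 0.5 is not below the 0.4 threshold.
print(is_same_person(0.5))
```

A fixed threshold like this trades false rejections against false matches, which is why two quite different photos of one person can fail the check.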

86
00:05:15,790 --> 00:05:19,930
So these images here are apparently too different.

87
00:05:19,930 --> 00:05:27,220
She looks too different in them for our VGGFace similarity score to actually

88
00:05:27,220 --> 00:05:27,730
hold up.

89
00:05:28,510 --> 00:05:32,740
So let's try to compare her to J.Lo, or Jennifer Lopez.

90
00:05:34,850 --> 00:05:37,330
And we can see, as expected, they're not the same person.

91
00:05:37,420 --> 00:05:41,560
And the similarity score is slightly lower than this one, when it was the same person.

92
00:05:42,280 --> 00:05:48,110
Now let's try J.Lo and Lady Gaga, and rightfully so,

93
00:05:48,130 --> 00:05:49,240
they're not the same person.

94
00:05:49,960 --> 00:05:51,550
So we'll stop there for now.

95
00:05:51,550 --> 00:05:58,120
And then in the next section, the next video lecture, we'll take a look at facial recognition with one-shot learning.

96
00:05:58,510 --> 00:06:04,300
So in that lesson, what we do is have a database of pictures,

97
00:06:04,300 --> 00:06:08,620
and then match the faces according to that database set that we have.

98
00:06:09,580 --> 00:06:10,600
So stay tuned for that.

99
00:06:10,720 --> 00:06:11,200
Thank you.