1
00:00:00,390 --> 00:00:08,130
Hi, and welcome to the lecture on facial recognition. In this section, we'll do an overview of facial recognition

2
00:00:08,130 --> 00:00:11,790
and a quick intro to VGGFace and FaceNet.

3
00:00:12,270 --> 00:00:13,460
So let's get started.

4
00:00:13,890 --> 00:00:16,740
So firstly, what is facial recognition?

5
00:00:17,310 --> 00:00:23,760
Well, it's the ability to automatically attach an individual's identity to a face, and that's something

6
00:00:23,970 --> 00:00:26,220
us humans are quite good at.

7
00:00:26,250 --> 00:00:29,460
In fact, we're incredible at it, although me, not so much.

8
00:00:29,460 --> 00:00:33,180
I do have problems recognizing faces, but that's another story.

9
00:00:33,960 --> 00:00:36,500
Even some animals can recognize faces.

10
00:00:36,510 --> 00:00:40,170
It's proven that dogs, crows, sheep, and I believe elephants can do it.

11
00:00:40,800 --> 00:00:42,270
So what about machines?

12
00:00:42,420 --> 00:00:43,560
Can machines do it?

13
00:00:44,010 --> 00:00:44,880
Yes, they can.

14
00:00:45,420 --> 00:00:51,540
They can recognize faces, and this is what the output of a facial recognition algorithm looks like.

15
00:00:51,540 --> 00:00:58,980
They localize the face, identify a bounding box for the face, and then attach an identity, based on a

16
00:00:58,980 --> 00:01:02,740
database of faces, to the person captured here.

17
00:01:02,790 --> 00:01:07,110
So you can see this is my wife and me being recognized by a facial recognition algorithm.

18
00:01:07,500 --> 00:01:08,730
Quite cool, isn't it?

19
00:01:10,020 --> 00:01:14,580
So let's get into the history of facial recognition, and let's start with the early days.

20
00:01:15,000 --> 00:01:21,000
Well, OpenCV actually has three of these historical facial recognition algorithms built in,

21
00:01:21,450 --> 00:01:28,110
all of which operate similarly in a way, because they all take a dataset of labeled faces and compute

22
00:01:28,380 --> 00:01:33,930
features to represent the images, after which a classifier is built to utilize these features to

23
00:01:33,930 --> 00:01:36,420
classify. Quite simple in concept.

24
00:01:36,420 --> 00:01:40,720
And actually, Siamese networks do something similar as well.

25
00:01:40,740 --> 00:01:46,560
However, we use deep learning to extract a representation of the faces, but we'll get there shortly.

26
00:01:47,250 --> 00:01:53,160
So the three algorithms, starting with Eigenfaces, which was developed in 1987, and this is how you call

27
00:01:53,160 --> 00:01:54,150
it in OpenCV.

28
00:01:54,780 --> 00:02:02,180
Fisherfaces was developed in 1997, so you can see this is quite old, and Local Binary Pattern

29
00:02:02,190 --> 00:02:05,100
Histograms was developed in 1996.

30
00:02:05,130 --> 00:02:13,320
Again, these algorithms, they worked fairly well when you had a low number of faces and strong variation

31
00:02:13,320 --> 00:02:14,250
between images.

32
00:02:14,820 --> 00:02:20,880
However, they weren't all that good, and this is an illustration of how Local Binary Pattern

33
00:02:20,880 --> 00:02:25,890
Histograms are used to basically extract meaningful features of faces here.
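As a rough illustration of the local binary pattern idea, here is a minimal pure-Python sketch, not OpenCV's implementation. The function names `lbp_code` and `lbp_histogram` are hypothetical, and the image is assumed to be a grayscale grid stored as a list of lists. Each interior pixel is compared with its 8 neighbors to form an 8-bit code, and the codes are counted into a 256-bin histogram that serves as the feature vector.

```python
def lbp_code(image, r, c):
    """Return the 8-bit local binary pattern code for the pixel at (r, c)."""
    center = image[r][c]
    # Neighbors visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit to 1.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count the LBP codes of all interior pixels into a 256-bin histogram."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


if __name__ == "__main__":
    tiny = [[5, 5, 5],
            [5, 9, 5],
            [5, 5, 5]]
    hist = lbp_histogram(tiny)
    print(hist[0])  # the single interior pixel is brighter than all neighbors
```

A full LBPH recognizer would typically split the face into a grid of cells, compute one histogram per cell, and concatenate them before comparing faces. That step is left out here to keep the sketch short.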

34
00:02:26,670 --> 00:02:32,580
So now let's take a look at some modern approaches to facial recognition that utilize deep learning.

35
00:02:33,240 --> 00:02:39,270
So you've seen in the previous section that one of the many applications of Siamese networks is facial

36
00:02:39,270 --> 00:02:39,900
recognition.

37
00:02:40,290 --> 00:02:42,480
And that's what we'll take a look at in this section.

38
00:02:42,810 --> 00:02:49,320
We'll take a look at two very popular deep learning facial recognition networks, and those are VGGFace

39
00:02:49,380 --> 00:02:50,880
and FaceNet.

40
00:02:51,450 --> 00:02:53,710
So let's get started with FaceNet.

41
00:02:54,150 --> 00:03:01,710
So FaceNet was introduced by Google in 2015, and it transforms a face into a simple 128-dimensional

42
00:03:02,070 --> 00:03:04,080
vector, a Euclidean space embedding.

43
00:03:04,620 --> 00:03:08,910
And it uses the triplet loss function to train this network.

44
00:03:09,360 --> 00:03:13,290
So you can see this is an overview of it here from the official paper.
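To make the triplet loss concrete, here is a minimal pure-Python sketch. It assumes embeddings are plain lists of floats. The function names and the margin value of 0.2 are illustrative choices, not taken from the lecture. The loss pushes the anchor closer to a positive (same person) than to a negative (different person) by at least the margin:

$L = \max(\lVert a - p \rVert^2 - \lVert a - n \rVert^2 + \alpha,\ 0)$

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (in squared distance).
    """
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + margin, 0.0)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    same_person = [0.1, 0.0]
    other_person = [1.0, 0.0]
    print(triplet_loss(anchor, same_person, other_person))
```

In training, this value would be averaged over many triplets and minimized with respect to the network weights. The sketch only evaluates the loss for fixed embeddings.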

45
00:03:14,280 --> 00:03:19,980
And here's another view that actually represents it better because we're more familiar with this Siamese

46
00:03:20,130 --> 00:03:25,050
type architecture, where we have two similar sister networks.

47
00:03:25,410 --> 00:03:27,450
Those are the FaceNet convolutional networks here.

48
00:03:27,990 --> 00:03:34,560
They create the embedding, that's the 1-by-128 vector, and then we just find the Euclidean distance between

49
00:03:34,560 --> 00:03:34,860
them.

50
00:03:35,400 --> 00:03:37,210
And that's pretty much it.

51
00:03:37,230 --> 00:03:39,240
That's how the FaceNet algorithm works.
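The comparison step described above can be sketched as follows. This is a minimal pure-Python example, assuming the two embeddings have already been produced by the network and are given as lists of floats. The `is_same_person` helper and its threshold of 1.0 are hypothetical and would need tuning on real data.

```python
import math


def euclidean_distance(u, v):
    """Euclidean (L2) distance between two embeddings of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_same_person(embedding_a, embedding_b, threshold=1.0):
    """Treat two faces as the same identity if their embeddings are close."""
    return euclidean_distance(embedding_a, embedding_b) < threshold


if __name__ == "__main__":
    # Real FaceNet embeddings have 128 values; short vectors keep this readable.
    face_a = [0.10, 0.20, 0.30]
    face_b = [0.12, 0.18, 0.33]
    print(euclidean_distance(face_a, face_b), is_same_person(face_a, face_b))
```

A smaller distance means the faces are more similar, so the threshold trades off false matches against missed matches.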

52
00:03:40,380 --> 00:03:44,100
Now, you may have noticed something called one-shot learning using FaceNet.

53
00:03:44,760 --> 00:03:47,190
Well, I don't have a slide on one shot learning.

54
00:03:47,190 --> 00:03:48,810
However, I should have created one.

55
00:03:49,410 --> 00:03:57,540
Basically, one-shot learning is the ability to add just one new data input into the database and have

56
00:03:57,540 --> 00:04:00,210
the network be able to recognize that face in the future.

57
00:04:00,750 --> 00:04:02,880
You don't need hundreds of samples of this data.

58
00:04:02,880 --> 00:04:08,520
You just need one instance of that face, and you can add it to your database, and now the classifier

59
00:04:08,580 --> 00:04:11,730
can figure out if a face belongs to that person.

60
00:04:12,120 --> 00:04:14,150
That's effectively what one shot learning is.

61
00:04:14,520 --> 00:04:17,250
Few-shot learning is technically the same thing; however,

62
00:04:17,250 --> 00:04:23,610
it uses a few more faces, but the whole point of "few" is that you don't need hundreds of images.

63
00:04:23,610 --> 00:04:24,720
You just need a handful.

64
00:04:25,470 --> 00:04:32,340
So that's the advantage of using Siamese networks, because they learn to produce embeddings

65
00:04:32,700 --> 00:04:38,190
for the dataset they've been trained on, which is faces, and you can now add new faces accordingly.

66
00:04:38,640 --> 00:04:42,660
You just create the embeddings here and find the Euclidean distance between them.

67
00:04:43,200 --> 00:04:47,220
And that gives you a similarity score that you can then use to classify.

68
00:04:47,250 --> 00:04:51,840
So you can basically take a face and then look it up with a similarity

69
00:04:51,840 --> 00:04:58,620
search over all the faces in your database, and the face that's most similar is the face that most likely

70
00:04:58,620 --> 00:04:59,910
matches that person.
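The one-shot lookup described above can be sketched like this. It is a minimal pure-Python example, assuming a database that maps each name to a single stored embedding. The `identify` function, the sample names, and the threshold of 1.0 are all hypothetical. Enrolling a new person only requires adding one entry to the dictionary, with no retraining.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(query, database, threshold=1.0):
    """Return (name, distance) of the closest stored face.

    The name is None when even the best match is farther away than
    `threshold`, meaning the face is treated as unknown.
    """
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        dist = euclidean_distance(query, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


if __name__ == "__main__":
    # One stored embedding per person is enough for a lookup.
    database = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
    database["carol"] = [2.0, 0.0]  # enrolling a new face is one insert
    print(identify([0.1, 0.0], database))
```

For a large database, the linear scan would usually be replaced by an approximate nearest-neighbor index, but the matching logic stays the same.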

71
00:04:59,980 --> 00:05:04,290
So let's take a look now at VGGFace.

72
00:05:04,650 --> 00:05:10,490
Well, VGGFace was introduced by the same research group at Oxford University that produced the VGG

73
00:05:10,500 --> 00:05:11,080
network.

74
00:05:11,130 --> 00:05:13,260
It's not officially called VGGFace.

75
00:05:13,680 --> 00:05:19,800
The paper, published in 2015, was called Deep Face Recognition, and they had a few follow-up papers

76
00:05:19,800 --> 00:05:20,340
after that.

77
00:05:20,880 --> 00:05:25,740
They again use triplet loss and an embedding vector that was much bigger than FaceNet's.

78
00:05:25,740 --> 00:05:30,750
It's 2,622 or 1,024, depending on the configuration.

79
00:05:30,930 --> 00:05:37,410
It had different network architectures and network configurations, with the input image size being 224 by

80
00:05:37,410 --> 00:05:39,510
224, which is fairly large, actually.

81
00:05:40,800 --> 00:05:43,230
So this is the VGGFace architecture.

82
00:05:43,230 --> 00:05:48,900
You can see there are a number of convolution layers here that output to this one-dimensional vector here,

83
00:05:48,900 --> 00:05:53,250
that's 1 by 2,622, and you can see some of the results here.

84
00:05:54,090 --> 00:05:56,280
Actually, these aren't the results yet; here you can see the dataset.

85
00:05:56,790 --> 00:06:02,340
The one they trained on had 2,622 identities.

86
00:06:02,790 --> 00:06:08,520
So it looks like they matched the output size to this number here, and it consisted of 2.6 million images.

87
00:06:08,520 --> 00:06:15,570
So you had a lot of instances of each face, roughly a thousand images per subject.

88
00:06:15,750 --> 00:06:18,330
So here's a summary of VGGFace's performance.

89
00:06:18,330 --> 00:06:22,140
You can see how well it performs on their dataset.

90
00:06:22,740 --> 00:06:25,890
So you can see it got 98.95 percent accuracy.

91
00:06:26,340 --> 00:06:33,420
And when comparing it on a different dataset, you can see the accuracies here, where

92
00:06:33,420 --> 00:06:39,180
they have a parameter K equal to 100, plus embedding learning, among different network configurations, and

93
00:06:39,180 --> 00:06:42,270
you can see it got 97.3 percent accuracy.

94
00:06:42,810 --> 00:06:44,250
That's pretty impressive, isn't it?

95
00:06:44,280 --> 00:06:47,820
And LFW, by the way, stands for Labeled Faces in the Wild.

96
00:06:48,240 --> 00:06:54,420
It's a very popular facial recognition dataset that exists out there for computer vision researchers

97
00:06:54,420 --> 00:06:59,760
to use to train and compare models. So we'll stop there for now.

98
00:07:00,240 --> 00:07:06,120
And then in the next section, we'll go back to our lab notebooks and we'll start looking at facial similarity

99
00:07:06,120 --> 00:07:12,930
using VGGFace, and then do some facial recognition with it, and then look at using FaceNet in PyTorch,

100
00:07:13,500 --> 00:07:15,950
as well as some other facial recognition libraries

101
00:07:15,970 --> 00:07:17,250
we'll take a look at afterward.

102
00:07:17,670 --> 00:07:18,420
So stay tuned.

103
00:07:18,420 --> 00:07:21,150
We've got a lot of facial recognition experiments coming up.

104
00:07:21,390 --> 00:07:21,810
Thank you.
