1
00:00:11,700 --> 00:00:17,370
In this lecture we are going to begin looking at our facial recognition code with Siamese networks.

2
00:00:17,490 --> 00:00:22,170
As usual you can look at the title of the notebook to determine what notebook we are currently looking

3
00:00:22,170 --> 00:00:23,340
at.

4
00:00:23,430 --> 00:00:28,560
Since this is a large piece of code it's going to span over several lectures with each lecture focused

5
00:00:28,560 --> 00:00:29,840
on a single theme.

6
00:00:30,090 --> 00:00:33,420
The theme of this lecture will be loading in the data.

7
00:00:33,420 --> 00:00:38,330
As I mentioned previously the actual theory and model behind this project are pretty simple.

8
00:00:38,520 --> 00:00:42,510
And most of the work will go into loading in the data and processing it correctly.

9
00:00:48,810 --> 00:00:52,680
So the first thing you'll notice in this script is the strange imports.

10
00:00:52,680 --> 00:00:54,030
This course is about PyTorch.

11
00:00:54,060 --> 00:00:56,480
But I'm importing TensorFlow and Keras.

12
00:00:56,550 --> 00:00:58,200
Why am I doing that?

13
00:00:58,200 --> 00:01:02,750
Well first let's start by recognizing that these are both just Python libraries.

14
00:01:02,790 --> 00:01:06,400
There's no reason you can't use multiple Python libraries simultaneously.

15
00:01:06,450 --> 00:01:11,790
We often do; for example, we use NumPy and Matplotlib side by side.

16
00:01:11,790 --> 00:01:17,430
So even though PyTorch and TensorFlow overlap in what they do, in the sense that they're both ways

17
00:01:17,430 --> 00:01:22,290
of building neural networks it's totally acceptable to use a different library that has some useful

18
00:01:22,290 --> 00:01:26,610
functionality, even if the neural network parts of these libraries are incompatible.

19
00:01:33,190 --> 00:01:35,200
So first let's download our dataset.

20
00:01:35,500 --> 00:01:37,640
We'll be using the Yale face dataset.

21
00:01:37,690 --> 00:01:42,300
It's just a bunch of different faces that have different emotions, like happy, sad, and so forth.

22
00:01:46,160 --> 00:01:46,710
Next,

23
00:01:46,730 --> 00:01:51,020
since this is a zip file, we're going to unzip it using the unzip command.

24
00:01:56,800 --> 00:02:03,220
Next we're going to get the file paths of all the images using the glob function. You'll notice that

25
00:02:03,250 --> 00:02:09,700
while all the images are GIFs, they don't end in the file extension .gif. They have file names

26
00:02:09,700 --> 00:02:12,540
like subject01.happy,

27
00:02:12,580 --> 00:02:14,330
subject01.sad, and so forth.

28
00:02:14,770 --> 00:02:18,280
So we are looking for any file that starts with the string subject.

29
00:02:21,100 --> 00:02:26,410
Next we are going to shuffle the file paths, which will be important later when we create our train and

30
00:02:26,410 --> 00:02:30,740
test sets.

31
00:02:30,750 --> 00:02:37,920
Next we're going to set the number of files to a variable called N. We have 166

32
00:02:37,920 --> 00:02:43,530
images.
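
The file-listing steps described so far can be sketched roughly as follows. This is only an illustration under assumptions: the temporary folder and the handful of empty files are hypothetical stand-ins for the unzipped dataset, and the fixed seed is there purely to make the sketch repeatable.

```python
import glob
import os
import random
import tempfile

# Hypothetical stand-in for the unzipped dataset folder: a few empty files
# named the way the lecture describes (no .gif extension).
data_dir = tempfile.mkdtemp()
for name in ["subject01.happy", "subject01.sad", "subject02.happy", "Readme.txt"]:
    open(os.path.join(data_dir, name), "w").close()

# Match every file whose name starts with the string "subject".
files = glob.glob(os.path.join(data_dir, "subject*"))

# Shuffle in place so a later train/test split is not ordered by subject.
random.seed(0)  # fixed seed only so this sketch is repeatable
random.shuffle(files)

# Number of image files found.
N = len(files)
print(N)
```

With the real dataset, N would be the 166 mentioned in the lecture rather than the 3 files created here.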

33
00:02:43,620 --> 00:02:50,230
Next we have a function that loads in the data from a file path and returns a NumPy array. There are multiple

34
00:02:50,230 --> 00:02:51,210
things going on here.

35
00:02:51,240 --> 00:02:53,330
So let's break it down.

36
00:02:53,390 --> 00:02:59,270
First we call image.load_img to load in the file as an image object.

37
00:02:59,270 --> 00:03:03,060
This also allows us to resize the image to 60 by 80,

38
00:03:03,080 --> 00:03:09,000
which I found leaves a smaller memory footprint and doesn't prevent us from getting good results.

39
00:03:09,020 --> 00:03:16,190
The original image size was 243 by 320 which was unnecessarily large.

40
00:03:16,190 --> 00:03:21,560
Next we call img_to_array to convert the image object into a NumPy array.

41
00:03:22,610 --> 00:03:29,030
And finally we convert it to uint8, which takes up only 8 bits per pixel value rather than 32 or

42
00:03:29,110 --> 00:03:30,830
64 for a float or double.
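
To make the memory argument concrete, here is a small sketch comparing the storage needed for one 60 by 80 by 3 image at different dtypes. It uses a blank NumPy array as a stand-in for a loaded image; the actual loading with Keras is not reproduced here.

```python
import numpy as np

H, W = 60, 80  # resized image height and width from the lecture

# Blank stand-in for a loaded image with 3 color channels.
img_float64 = np.zeros((H, W, 3), dtype=np.float64)  # "double": 64 bits per value
img_float32 = img_float64.astype(np.float32)         # "float": 32 bits per value
img_uint8 = img_float64.astype(np.uint8)             # 8 bits per value

# nbytes reports the size of the array data in bytes.
print(img_uint8.nbytes, img_float32.nbytes, img_float64.nbytes)
```

The uint8 copy is a quarter of the size of the float32 one and an eighth of the float64 one.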

43
00:03:34,360 --> 00:03:34,850
Next,

44
00:03:34,870 --> 00:03:39,140
as I always like to do, I'm going to plot a random image from the data,

45
00:03:39,340 --> 00:03:40,960
just for fun.

46
00:03:40,960 --> 00:03:43,600
This always helps me get into the flow of things when I'm coding.

47
00:03:52,230 --> 00:03:53,790
If we check the shape of our image,

48
00:03:53,820 --> 00:03:58,020
you might expect to see 60 by 80 since, well, that's what we specified

49
00:03:58,020 --> 00:04:04,230
we want the size to be. But something weird is happening here. If you look closely, the actual image shape

50
00:04:04,260 --> 00:04:06,530
is 60 by 80 by 3.

51
00:04:06,690 --> 00:04:12,020
That should seem weird to you, because if we look at these images they are clearly grayscale images, yet

52
00:04:12,050 --> 00:04:16,590
they are stored as color images with three color channels. As an exercise,

53
00:04:16,590 --> 00:04:21,480
I would recommend verifying that each of these channels actually has the same values, and so they are

54
00:04:21,480 --> 00:04:22,630
redundant.

55
00:04:22,860 --> 00:04:28,220
Therefore we lose nothing by converting these technically color images into grayscale.

56
00:04:28,560 --> 00:04:31,890
We can do that by taking the mean along the last axis.
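
The suggested exercise and the grayscale conversion might look something like this. It is a sketch on a synthetic image: the random array below is an assumption standing in for a real face image whose three channels are copies of each other.

```python
import numpy as np

# Synthetic stand-in: one random grayscale plane copied into 3 channels,
# mimicking a grayscale image stored as a color image.
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(60, 80), dtype=np.uint8)
img = np.stack([gray, gray, gray], axis=-1)  # shape (60, 80, 3)

# Exercise: check that all three channels hold the same values.
channels_equal = (
    np.array_equal(img[..., 0], img[..., 1])
    and np.array_equal(img[..., 1], img[..., 2])
)

# Convert to grayscale by averaging over the last (channel) axis.
img_gray = img.mean(axis=-1)  # shape (60, 80)
print(channels_equal, img_gray.shape)
```

When the channels are identical, the mean simply recovers the single shared channel, so no information is lost.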

57
00:04:37,490 --> 00:04:42,890
Next we're going to load the images into a single NumPy array, so it's N by whatever the image

58
00:04:42,890 --> 00:04:44,640
shape is, 60 by 80.

59
00:04:45,960 --> 00:04:51,480
So all we do here is loop through each of the image file paths, call the load_img function which

60
00:04:51,480 --> 00:05:02,000
we just defined, and then store the image in our big array of images.
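
The loading loop can be sketched as below. To keep the example runnable without the dataset, fake_load_img is a hypothetical stand-in for the loader defined in the lecture, and the two file paths (including the yalefaces folder name) are made up for illustration.

```python
import numpy as np

H, W = 60, 80  # resized image height and width from the lecture

def fake_load_img(path):
    """Hypothetical stand-in for the lecture's loader: returns an H x W uint8 array."""
    return np.full((H, W), len(path) % 256, dtype=np.uint8)

# Made-up file paths for illustration only.
files = ["yalefaces/subject01.happy", "yalefaces/subject02.sad"]
N = len(files)

# One big array of shape N x H x W, filled one image at a time.
images = np.zeros((N, H, W))
for i, f in enumerate(files):
    images[i] = fake_load_img(f)

print(images.shape)
```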

61
00:05:02,010 --> 00:05:08,180
Next we're going to create an array of labels of length N. These are not our binary labels.

62
00:05:08,260 --> 00:05:13,240
These are just labels telling us which subject the corresponding image is of.

63
00:05:13,270 --> 00:05:18,610
Basically this is just a bunch of manual string parsing, since I don't like to play with regex.

64
00:05:18,820 --> 00:05:24,250
So first we split out the file name by splitting on the last forward slash.

65
00:05:24,250 --> 00:05:28,720
Next we split out the subject part of the file name by splitting on the first dot.

66
00:05:29,470 --> 00:05:34,980
So now we have a string that's like subject01, subject02, subject03, and so forth.

67
00:05:36,620 --> 00:05:42,470
Finally we just remove the subject string so we're left with 01, 02, 03, and so on.

68
00:05:42,710 --> 00:05:47,390
And since the ID starts from one, I just subtract one so that it starts from zero.
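
The string parsing just described could be written roughly like this. The example file paths (and the yalefaces folder name) are assumptions for illustration; the steps follow the order given in the lecture.

```python
import numpy as np

# Made-up file paths for illustration only.
files = [
    "yalefaces/subject01.happy",
    "yalefaces/subject02.sad",
    "yalefaces/subject15.wink",
]
N = len(files)

labels = np.zeros(N, dtype=int)
for i, f in enumerate(files):
    # Split on the last forward slash to get the file name.
    filename = f.rsplit("/", 1)[-1]
    # Split on the first dot to get the subject part, e.g. "subject01".
    subject = filename.split(".", 1)[0]
    # Remove the string "subject", leaving e.g. "01".
    subject_num = subject.replace("subject", "")
    # IDs start from one, so subtract one to start from zero.
    labels[i] = int(subject_num) - 1

print(labels)
```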
