1
00:00:11,680 --> 00:00:14,060
In this lecture, we are going to build our model,

2
00:00:14,080 --> 00:00:17,250
the loss function, and train our Siamese network.

3
00:00:17,530 --> 00:00:23,140
To start, we're going to define our network, which itself contains an entire neural network.

4
00:00:23,200 --> 00:00:28,260
As usual, feel free to play around with these hyperparameters and choose your own architecture.

5
00:00:28,270 --> 00:00:33,160
This CNN is just what I happened to copy from the Fashion MNIST example earlier in the course.

6
00:00:37,690 --> 00:00:42,910
Notice that instead of a final softmax, we end with a dense layer that has no activation,

7
00:00:42,910 --> 00:00:44,740
since we are not doing classification.

8
00:00:48,780 --> 00:00:51,210
The forward function is very simple.

9
00:00:51,330 --> 00:00:54,110
First, as input, we accept two batches of images,

10
00:00:54,120 --> 00:00:55,850
im1 and im2.

11
00:00:56,070 --> 00:01:01,860
We pass both of these through the same CNN to get feat1 and feat2, the embeddings for both sets of

12
00:01:01,860 --> 00:01:02,490
images.

13
00:01:03,210 --> 00:01:08,530
Finally, we calculate the Euclidean distance between feat1 and feat2 for each row.

14
00:01:08,700 --> 00:01:15,120
So if we have N input pairs, we'll have N output distances.

15
00:01:15,250 --> 00:01:21,540
Next we create an instance of our model and set the feature dimension to be 50.

16
00:01:21,620 --> 00:01:23,420
Next, we move the model to the GPU.

17
00:01:27,460 --> 00:01:29,400
Next, we create our custom loss function,

18
00:01:29,410 --> 00:01:35,410
the contrastive loss. The first argument is the prediction y and the second argument is the target

19
00:01:35,410 --> 00:01:37,810
t. Inside this function,

20
00:01:37,810 --> 00:01:42,080
we start by calculating the squared distance for the non-matching pairs.

21
00:01:42,220 --> 00:01:47,140
This is that funny calculation you saw in the theory lecture where we subtract y from the margin and

22
00:01:47,140 --> 00:01:47,630
so forth.

23
00:01:48,550 --> 00:01:51,890
So here you see we're using a margin value of 1.

24
00:01:51,940 --> 00:01:56,820
This means that for non-matching pairs, we want the distance to be at least 1.

25
00:01:56,980 --> 00:02:00,220
But as long as it's bigger than that we don't really care how big it is.

26
00:02:02,330 --> 00:02:08,780
Now, from the formula, you know that we want to take the max between the margin minus y and 0, but you'll

27
00:02:08,780 --> 00:02:11,900
recognize that this is exactly the ReLU function.

28
00:02:11,900 --> 00:02:16,290
So we can just use the built-in ReLU in PyTorch.

29
00:02:16,400 --> 00:02:21,680
Next, we pass this calculation into the rest of the loss, which is relatively simple, and take the mean

30
00:02:21,680 --> 00:02:22,550
over each sample.

31
00:02:26,820 --> 00:02:28,860
Next, we create our optimizer.

32
00:02:28,860 --> 00:02:32,630
Usually I say loss and optimizer, but that is not the case in this script,

33
00:02:32,730 --> 00:02:34,950
since our loss isn't one of the built-in losses.

34
00:02:38,880 --> 00:02:40,320
Next, we have our training function.

35
00:02:40,920 --> 00:02:43,160
So what's different about this?

36
00:02:43,350 --> 00:02:48,810
First, you'll notice that in addition to the train generator and the test generator, we also pass in the

37
00:02:48,810 --> 00:02:51,720
values for the number of steps per epoch.

38
00:02:51,720 --> 00:02:56,940
This is essential when you use a generator, because if you don't know when an epoch is finished, you'll

39
00:02:56,940 --> 00:02:59,970
just end up in an infinite loop.

40
00:02:59,970 --> 00:03:03,620
Recall that our generators have infinite while loops inside of them.

41
00:03:03,630 --> 00:03:05,780
This carries over from TensorFlow and Keras.

42
00:03:05,790 --> 00:03:08,760
So if you've ever used these libraries, it should seem familiar.

43
00:03:14,080 --> 00:03:14,830
Inside the loop,

44
00:03:14,830 --> 00:03:20,950
it's mostly the usual stuff. We loop over the train generator, and on each iteration we get x1, x2, and

45
00:03:20,950 --> 00:03:24,560
the targets, since that's what is yielded by our generator.

46
00:03:24,670 --> 00:03:28,760
We move those to the GPU, and so on and so forth.

47
00:03:28,780 --> 00:03:33,430
One minor difference is that I have this variable called steps that counts the number of steps I've

48
00:03:33,430 --> 00:03:34,660
completed so far.

49
00:03:37,300 --> 00:03:40,590
So at the end of our inner loop, I increment steps by 1.

50
00:03:40,840 --> 00:03:44,950
And if it's greater than or equal to the steps per epoch, I break out of the inner loop.

51
00:03:48,490 --> 00:03:52,510
It's the same basic idea for the test generator, except there is no gradient descent.

52
00:03:57,760 --> 00:04:00,420
Okay, so next we call our training function.

53
00:04:00,850 --> 00:04:04,270
As you can see, this trains quite fast.

54
00:04:04,270 --> 00:04:05,470
Pretty good for a CNN.

55
00:04:09,250 --> 00:04:12,050
Next, we plot the loss per iteration, which looks OK.

56
00:04:17,210 --> 00:04:23,220
And that's it for now. In the next lecture, we'll focus on how to evaluate our trained Siamese network.