1
00:00:00,120 --> 00:00:00,550
Hi, guys.

2
00:00:00,570 --> 00:00:01,200
Welcome back.

3
00:00:01,290 --> 00:00:08,820
So in this section, we'll take a look at using Big Transfer, which is a transfer learning paradigm

4
00:00:08,820 --> 00:00:13,830
or recipe, for classifying images.

5
00:00:14,010 --> 00:00:20,310
So it's a very, very useful technique, I'd say, for creating your own classifiers that get excellent performance.

6
00:00:20,850 --> 00:00:22,320
So let's get started.

7
00:00:22,350 --> 00:00:25,980
So open notebook 62 and let's begin the lesson.

8
00:00:26,490 --> 00:00:32,970
So once again, I'll just point out that this came from the official Keras tutorial site, and the

9
00:00:32,970 --> 00:00:36,010
author is Sayan Nath. Let's begin.

10
00:00:36,060 --> 00:00:43,530
So firstly, let me just explain to you what Big Transfer is. It's known as BiT, similar to how Vision Transformers

11
00:00:43,530 --> 00:00:44,100
are known as ViT.

12
00:00:45,420 --> 00:00:51,090
But basically, it's just a way of scaling up pre-training on big datasets.

13
00:00:51,090 --> 00:00:57,540
And these pre-trained networks are things like a ResNet-50 or a ResNet-152x4, which is a pretty deep and

14
00:00:57,540 --> 00:00:58,470
wide ResNet.

15
00:00:59,280 --> 00:01:06,640
And basically, it allows us to just load that network as a feature extractor and then train it on our

16
00:01:06,640 --> 00:01:08,940
own data set and get some very good results.

17
00:01:09,300 --> 00:01:18,900
And you can see BiT, which is in blue, has gotten better results than the best state-of-the-art networks

18
00:01:18,900 --> 00:01:20,910
here, at least the generalist

19
00:01:21,270 --> 00:01:22,350
state-of-the-art networks.

20
00:01:22,350 --> 00:01:28,950
Not the fine-grained classifiers, which might give better results, but you can see we're getting very good

21
00:01:28,950 --> 00:01:30,240
results using this paradigm.

22
00:01:30,750 --> 00:01:32,370
So let's get started.

23
00:01:32,370 --> 00:01:34,110
So let's run the setup here.

24
00:01:34,530 --> 00:01:37,020
This takes about three seconds right now.

25
00:01:37,500 --> 00:01:44,800
Then we load the flowers dataset, which takes about eight seconds, and we can visualize

26
00:01:44,960 --> 00:01:49,350
the results, seeing some images of the flowers in the dataset.

27
00:01:50,040 --> 00:01:55,680
And the flowers dataset consists of 17 image classes, by the way, just in case you didn't know.

28
00:01:56,460 --> 00:02:01,410
Next, we need to set some parameters. So we resize the images to 384, which is actually fairly

29
00:02:01,410 --> 00:02:07,110
large, but then we crop to 224 afterwards and feed the images in at that size anyway.
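
The resize-then-crop step can be sketched in plain Python by working out where a 224x224 window may sit inside a 384x384 resized image. This is a minimal illustration, assuming square images, a uniformly random offset for training, and a centred window as one common choice for evaluation; the function names are hypothetical, and the notebook itself presumably does this with TensorFlow image ops.

```python
import random

RESIZE_TO = 384  # side length after resizing (value mentioned in the lesson)
CROP_TO = 224    # side length of the crop fed to the network


def random_crop_window(size=RESIZE_TO, crop_size=CROP_TO, rng=random):
    """Return (top, left) of a crop_size x crop_size window placed uniformly at random."""
    if crop_size > size:
        raise ValueError("crop_size must not exceed the resized image size")
    # randint is inclusive on both ends, so the window always fits.
    top = rng.randint(0, size - crop_size)
    left = rng.randint(0, size - crop_size)
    return top, left


def center_crop_window(size=RESIZE_TO, crop_size=CROP_TO):
    """Return (top, left) of a crop_size x crop_size window centred in the image."""
    offset = (size - crop_size) // 2
    return offset, offset


def apply_crop(image, top, left, crop_size=CROP_TO):
    """Crop a 2-D image stored as a list of rows."""
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]
```

For the sizes above, `center_crop_window()` returns `(80, 80)`, and a random window can start anywhere from 0 to 160 pixels in on each axis, which is what gives the training crops their variety.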

30
00:02:07,620 --> 00:02:09,990
And then we just set these parameters here.

31
00:02:10,590 --> 00:02:13,070
Actually, this is a different flowers dataset, sorry.

32
00:02:13,470 --> 00:02:17,050
So this one has five classes, not 17 like the original one.

33
00:02:17,140 --> 00:02:23,610
Maybe they just made this one as a subset of that one. And we also set a schedule length as well

34
00:02:23,610 --> 00:02:24,960
as schedule boundaries.
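
How schedule length and boundaries scale with dataset size can be sketched as a small lookup. This follows the BiT-HyperRule as we understand it from the BiT paper and Google's blog post: small datasets (under 20k examples) get 500 steps with learning-rate drops at steps 200, 300 and 400 and no MixUp, while larger ones get longer schedules and MixUp. Treat the thresholds and step counts for the larger tiers as assumptions to verify against the paper; `bit_hyperrule` is a hypothetical helper, not part of the notebook.

```python
from typing import List, NamedTuple


class Schedule(NamedTuple):
    length: int            # total number of training steps
    boundaries: List[int]  # steps at which the learning rate is divided by 10
    mixup_alpha: float     # 0.0 means MixUp is disabled


def bit_hyperrule(dataset_size: int) -> Schedule:
    """Pick schedule length, LR boundaries and MixUp strength from dataset size.

    Hypothetical helper; tiers are assumed from the BiT-HyperRule description.
    """
    if dataset_size < 20_000:
        return Schedule(500, [200, 300, 400], 0.0)
    if dataset_size < 500_000:
        return Schedule(10_000, [3_000, 6_000, 9_000], 0.1)
    return Schedule(20_000, [6_000, 12_000, 18_000], 0.1)
```

A flowers-sized dataset of a few thousand images lands in the first tier, so `bit_hyperrule(3_000)` gives a 500-step schedule with boundaries `[200, 300, 400]` and no MixUp, matching the values used in this lesson.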

35
00:02:26,640 --> 00:02:31,380
So as for what these settings really are, you can take a look at them here.

36
00:02:31,380 --> 00:02:38,010
They're hyperparameters that Google describes in their publication and blog post.

37
00:02:39,000 --> 00:02:43,140
It also depends on augmentation: MixUp is a very nice data augmentation technique.

38
00:02:43,140 --> 00:02:47,770
However, you don't always need it, depending on these parameters here:

39
00:02:47,780 --> 00:02:49,770
dataset size and schedule length.
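
MixUp blends pairs of training examples and their labels using a weight drawn from a Beta(alpha, alpha) distribution. A minimal pure-Python sketch, assuming inputs are flat lists of floats and labels are one-hot lists; `mixup_pair` is a hypothetical name, not a function from the notebook.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.1, rng=None):
    """Mix two (input, one-hot label) pairs; returns (mixed_x, mixed_y, lam)."""
    rng = rng or random.Random()
    # alpha <= 0 is treated as "MixUp disabled": keep the first example unchanged.
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y, lam
```

Because the same `lam` weights both inputs and labels, the mixed label is still a valid probability distribution. A small alpha such as 0.1 makes `lam` usually land near 0 or 1, so most mixed examples stay close to one of the originals.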

40
00:02:50,250 --> 00:02:53,220
So you can set these according to what you want.

41
00:02:53,400 --> 00:02:57,030
So it's 500 right now, and the boundaries are 200, 300 and 400.

42
00:02:58,260 --> 00:03:03,750
So we just pass those parameters here, and we create a preprocessing function.

43
00:03:03,750 --> 00:03:10,200
We create a preprocess-test function for when we're running inference on images, and then we just compute

44
00:03:10,200 --> 00:03:13,920
a repeat count using some of the schedule length parameters.
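
One way to think about the repeat count: the training split has to be repeated enough times that the schedule never runs out of batches. A minimal sketch of that arithmetic, assuming a fixed batch size and a known number of training examples; the notebook's exact formula may differ (for example by adding a safety margin), and `repeat_count` here is a hypothetical helper.

```python
import math


def repeat_count(schedule_length: int, batch_size: int, num_train_examples: int) -> int:
    """Passes over the training set needed to supply every step of the schedule.

    Each step consumes `batch_size` examples, so the whole schedule needs
    schedule_length * batch_size examples in total.
    """
    examples_needed = schedule_length * batch_size
    return math.ceil(examples_needed / num_train_examples)
```

For example, a 500-step schedule with batches of 64 needs 32,000 examples; with 3,000 training images that is `repeat_count(500, 64, 3000) == 11` passes over the data.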

45
00:03:14,580 --> 00:03:17,070
Next, we define our data pipeline.

46
00:03:17,080 --> 00:03:18,900
So let's run that there.

47
00:03:19,950 --> 00:03:23,220
And now we can just visualize some of the training samples here.

48
00:03:23,550 --> 00:03:28,680
Let's take a look at our preprocessing pipeline just to make sure it works.

49
00:03:29,430 --> 00:03:31,160
And you can see it does.

50
00:03:31,470 --> 00:03:35,220
So now we can move on to loading our network.

51
00:03:35,230 --> 00:03:40,470
So we're going to load this ResNet-50 BiT model here.

52
00:03:40,470 --> 00:03:43,620
So let's do that, using it as a feature extractor.

53
00:03:45,930 --> 00:03:47,340
Oh, sorry, I've already done this.

54
00:03:47,340 --> 00:03:48,150
I believe so.

55
00:03:48,150 --> 00:03:49,500
That's why it's running here.

56
00:03:50,130 --> 00:03:51,240
So let's keep going.

57
00:03:51,240 --> 00:03:54,840
So now we actually create our BiT class here.

58
00:03:55,530 --> 00:04:01,710
So this is where we just pass in the number of classes, create our own dense layer on top of it, and then

59
00:04:02,220 --> 00:04:06,890
pass in the BiT model we loaded at the beginning.

60
00:04:06,900 --> 00:04:12,720
So we get the features out of it and then put a new head on the network.
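
The "frozen feature extractor plus new head" idea can be shown without any framework: the backbone maps an image to a fixed-length embedding, and the only new part is a dense layer that turns that embedding into one score per class. In the notebook the backbone is the pre-trained BiT ResNet-50; here it is replaced by a hypothetical stand-in function, and the class names and zero-initialised head are illustrative choices, not the notebook's code.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class DenseHead:
    """A dense (fully connected) layer mapping an embedding to class logits."""

    def __init__(self, embedding_dim: int, num_classes: int) -> None:
        # Zero initialisation: every class starts with the same score.
        self.weights = [[0.0] * embedding_dim for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def __call__(self, embedding: Sequence[float]) -> Vector:
        return [
            sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]


class BiTClassifier:
    """Frozen backbone followed by a trainable head (hypothetical names)."""

    def __init__(self, backbone: Callable[[object], Vector], head: DenseHead) -> None:
        self.backbone = backbone  # pre-trained, not updated during training
        self.head = head          # the only new, trainable part

    def __call__(self, image: object) -> Vector:
        return self.head(self.backbone(image))


def fake_backbone(image: object) -> Vector:
    """Stand-in for the pre-trained network: always returns a 4-D embedding."""
    return [1.0, 2.0, 3.0, 4.0]
```

`BiTClassifier(fake_backbone, DenseHead(4, 5))` produces five logits per image; before any training they are all zero, and only the head's weights and bias would change as the model learns the five flower classes.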

61
00:04:13,590 --> 00:04:20,880
Next, we need to define the optimizer and loss, and you can see we use the schedule boundaries as the parameter

62
00:04:20,880 --> 00:04:26,790
for the boundaries here, and we're using something called PiecewiseConstantDecay, which is not

63
00:04:28,230 --> 00:04:30,690
an optimizer that we have come across before.

64
00:04:31,230 --> 00:04:37,380
This is probably one that's specific to using schedule-boundary type parameters, or, sorry,

65
00:04:37,420 --> 00:04:38,430
this is actually a learning rate schedule.

66
00:04:38,430 --> 00:04:42,100
So we're still using SGD as a regular optimizer here.

67
00:04:42,100 --> 00:04:43,140
Apologies for that.

68
00:04:44,160 --> 00:04:46,260
And the learning rate, you can see here as well.

69
00:04:46,260 --> 00:04:51,720
It's defined per schedule boundary as well, because the learning rate is divided by 10

70
00:04:51,720 --> 00:04:54,150
at each schedule boundary.
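
The schedule's behaviour is easy to reproduce in plain Python. Keras documents `PiecewiseConstantDecay` as returning `values[0]` while the step is at or below the first boundary, `values[1]` up to the second boundary, and so on. The sketch below mirrors that rule with the boundaries from this lesson; the base rate of 0.003 and the factor-of-10 drops are assumed values, not read from the notebook.

```python
from bisect import bisect_left
from typing import Sequence

BOUNDARIES = [200, 300, 400]  # schedule boundaries from the lesson
BASE_LR = 0.003               # assumed base learning rate
# Assumed: divide the learning rate by 10 at each boundary.
VALUES = [BASE_LR * 0.1 ** i for i in range(len(BOUNDARIES) + 1)]


def piecewise_constant(step: int,
                       boundaries: Sequence[int] = BOUNDARIES,
                       values: Sequence[float] = VALUES) -> float:
    """values[i] where boundaries[i-1] < step <= boundaries[i]."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    # bisect_left counts boundaries strictly below `step`, which is the value index.
    return values[bisect_left(boundaries, step)]
```

With these settings the rate stays at 0.003 up to step 200, drops to 0.0003 for steps 201 to 300, and so on. The optimizer (plain SGD here) simply queries the schedule at every step.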

71
00:04:55,140 --> 00:04:59,460
So now we're ready to compile the model and create our callbacks, such as early stopping.

72
00:05:00,060 --> 00:05:05,460
And now we train the model, and you can see, even though we set a large number of epochs to train, it stops

73
00:05:05,460 --> 00:05:06,360
quite quickly.

74
00:05:07,140 --> 00:05:09,960
And that's because we have the early stopping parameters set.
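
Early stopping watches a validation metric and halts once it has failed to improve for a set number of epochs (the patience). A minimal sketch of that logic, assuming we monitor validation accuracy where higher is better; the patience of 2 and the accuracy values in the usage note are made up for illustration, and `early_stopping_epoch` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_scores: Sequence[float], patience: int = 2) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based, for a higher-is-better metric.

    Training stops after `patience` consecutive epochs without improvement;
    best_epoch is the epoch whose weights would be kept.
    """
    if not val_scores:
        raise ValueError("need at least one epoch of validation scores")
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_scores) - 1, best_epoch
```

With scores `[0.80, 0.93, 0.96, 0.95, 0.96, 0.97]` and patience 2, training stops at epoch 4 and keeps epoch 2, so the later 0.97 is never reached. That is why a run with a large epoch budget can finish after only a handful of epochs.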

75
00:05:10,770 --> 00:05:16,500
And you can see we get 97% accuracy, which is really, really good, to be honest.

76
00:05:16,770 --> 00:05:18,750
And now we can just plot the history here.

77
00:05:18,750 --> 00:05:22,710
You can see we quickly got some good results out of it.

78
00:05:23,100 --> 00:05:29,490
And that's because it's pre-trained on ImageNet-21k, which is very applicable to the flowers dataset

79
00:05:29,490 --> 00:05:31,860
because of all the plant categories inside of it.

80
00:05:32,580 --> 00:05:36,570
And we evaluate the model, scoring 97.45%.
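
The reported figure is top-1 accuracy: the fraction of test images whose highest-scoring class matches the label. A minimal sketch, assuming the model outputs one list of logits per image and the labels are integer class indices; `top1_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose arg-max index equals the integer label."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == label
        for row, label in zip(logits, labels)
    )
    return correct / len(labels)
```

For instance, if three of four predictions pick the right class, the function returns 0.75; a score of 97.45% means roughly 97 of every 100 test images were classified correctly.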

81
00:05:37,800 --> 00:05:44,610
And that's it for this lesson, and you can see the conclusion here that BiT actually performs better

82
00:05:44,610 --> 00:05:50,970
on a lot of common datasets like CIFAR-10.

83
00:05:51,190 --> 00:05:52,350
And basically, it's quite good.

84
00:05:52,660 --> 00:05:53,820
You should consider using it.

85
00:05:54,420 --> 00:05:56,770
So I would encourage you to experiment.

86
00:05:56,790 --> 00:06:02,460
You can see the generalist state of the art on Flowers is 97.7%.

87
00:06:03,000 --> 00:06:05,160
They were able to get 99.63%.

88
00:06:05,640 --> 00:06:07,590
That isn't with this model, though.

89
00:06:07,590 --> 00:06:10,640
This model gives us 97.45%, which is pretty good.

90
00:06:10,650 --> 00:06:17,430
So it's very comparable to the state of the art in normal CNN model training from scratch.

91
00:06:18,030 --> 00:06:20,910
I assume that's from scratch, though it could be pre-trained.

92
00:06:20,910 --> 00:06:29,820
But this Big Transfer framework, or recipe as you can call it, allows us to get very, very high accuracy

93
00:06:29,820 --> 00:06:30,600
very easily.

94
00:06:31,110 --> 00:06:34,410
So that's it for this lesson, and I'll see you in the next one.

95
00:06:34,740 --> 00:06:35,570
Thank you for watching.