1
00:00:01,110 --> 00:00:02,190
Hey, welcome back.

2
00:00:02,910 --> 00:00:08,790
Remember, we had a section on callbacks a while back. Callbacks are methods we can run during the training

3
00:00:08,790 --> 00:00:14,310
process, so we can print different things, save checkpoints, implement early stopping

4
00:00:14,820 --> 00:00:15,870
and a few other things.

5
00:00:16,260 --> 00:00:21,450
So we'll take a look and see how we can use them and how we can implement them in PyTorch Lightning.

6
00:00:21,990 --> 00:00:26,070
So this is just an illustration of early stopping, a graph that I've used before.

7
00:00:26,700 --> 00:00:30,960
So let's take a look at model checkpointing in PyTorch Lightning.

8
00:00:31,440 --> 00:00:36,600
So a model checkpoint callback is used to save a model at some interval, which we can

9
00:00:36,600 --> 00:00:43,590
define, so that the model weights can be loaded later for continuing training, or just used

10
00:00:43,590 --> 00:00:47,370
to run some inferences and evaluate the model at that point.

11
00:00:47,820 --> 00:00:54,090
So all you have to do is just import EarlyStopping here, from PyTorch Lightning's

12
00:00:54,090 --> 00:00:57,900
callbacks.early_stopping module, and then call the EarlyStopping function.

13
00:00:57,900 --> 00:01:03,450
Here we just specify what we want to monitor; the patience, which is how many epochs we want to wait

14
00:01:03,780 --> 00:01:08,730
before stopping; and strict and verbose, which are other options that we can use.

15
00:01:09,120 --> 00:01:11,220
Verbose just adjusts the output, and strict is an option

16
00:01:11,220 --> 00:01:14,540
for whether we want it to strictly do this or not.

17
00:01:14,550 --> 00:01:19,380
I'm not entirely sure how it's configured, but you can dig into the documentation and take a look.

18
00:01:19,950 --> 00:01:27,000
Mode is set to min, which means that we're looking for a minimum loss rather than a

19
00:01:27,000 --> 00:01:27,690
maximum loss.

20
00:01:27,690 --> 00:01:32,010
So you just specify it; for example, if you're looking at accuracy, this would need to be set to max.
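
To make monitor, patience and mode concrete, here is a minimal plain-Python sketch of the bookkeeping an early-stopping callback does after each validation run. It is an illustration, not Lightning's implementation: the class name `PatienceTracker` is hypothetical and it ignores extras such as a minimum improvement threshold. In Lightning itself you would construct something like `EarlyStopping(monitor="val_loss", patience=3, mode="min")`, and `strict=True` makes training raise an error if the monitored metric is never logged.

```python
class PatienceTracker:
    """Hypothetical helper: counts epochs without improvement in a monitored metric."""

    def __init__(self, patience: int = 3, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.mode = mode
        self.best = None
        self.wait = 0  # epochs since the last improvement

    def improved(self, value: float) -> bool:
        if self.best is None:
            return True
        # "min" suits a loss, "max" suits something like accuracy.
        return value < self.best if self.mode == "min" else value > self.best

    def step(self, value: float) -> bool:
        """Record one epoch's metric; return True if training should stop."""
        if self.improved(value):
            self.best = value
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Validation losses that stop improving after epoch 2.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
tracker = PatienceTracker(patience=3, mode="min")
stopped_epoch = None
for epoch, loss in enumerate(val_losses):
    if tracker.step(loss):
        stopped_epoch = epoch
        break
print(f"stopped at epoch {stopped_epoch}, best val_loss={tracker.best}")
```

Note that the late improvement to 0.50 is never seen: with a patience of 3, training halts after three epochs without a new best.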

21
00:01:32,160 --> 00:01:33,870
So let's run this block.

22
00:01:35,220 --> 00:01:35,700
There we go.

23
00:01:36,300 --> 00:01:38,460
And now let's look at the model checkpoint one.

24
00:01:38,580 --> 00:01:43,140
So again, from the PyTorch Lightning callbacks, we import ModelCheckpoint.

25
00:01:43,590 --> 00:01:46,560
We again specify we want to monitor validation loss.

26
00:01:47,400 --> 00:01:53,760
We set the path where we want to save our models, we set a file name, and notice that we can actually put

27
00:01:53,760 --> 00:01:54,830
variables here as well.

28
00:01:54,830 --> 00:02:01,860
These curly brackets indicate we're putting in the epoch here as two digits, and the validation loss to two decimal places as

29
00:02:01,860 --> 00:02:02,250
well.

30
00:02:02,250 --> 00:02:07,830
So you get a pretty descriptive file name when you're saving

31
00:02:07,830 --> 00:02:08,520
these checkpoints.
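
As a rough sketch of how such a templated file name gets filled in, the snippet below uses Python's own `str.format`. The helper name `format_checkpoint_name` and the template `"sample-mnist-{epoch:02d}-{val_loss:.2f}"` are hypothetical examples. Lightning's `ModelCheckpoint` also inserts the metric name in front of each value by default (its `auto_insert_metric_name` option), which the helper imitates; treat this as an approximation rather than Lightning's exact code.

```python
import re


def format_checkpoint_name(template: str, **metrics) -> str:
    """Hypothetical helper approximating how a checkpoint filename template is filled in.

    Each "{name" becomes "name={name" so the metric name appears in the output,
    then str.format substitutes the values using the given format specs.
    """
    with_names = re.sub(r"\{(\w+)", r"\1={\1", template)
    return with_names.format(**metrics) + ".ckpt"


# {epoch:02d} pads the epoch to two digits; {val_loss:.2f} rounds to two decimals.
print(format_checkpoint_name("sample-mnist-{epoch:02d}-{val_loss:.2f}", epoch=3, val_loss=0.2517))
```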

32
00:02:09,060 --> 00:02:12,660
And we can save the top three models if we want to.

33
00:02:13,080 --> 00:02:16,770
And again, we set the mode to min because we're looking at a loss; we want a minimum loss, not

34
00:02:16,770 --> 00:02:17,310
a maximum.

35
00:02:18,000 --> 00:02:21,180
Again, if this was the validation accuracy, you would use max here.
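
Here is a minimal plain-Python sketch of the "keep the top three" policy combined with the mode setting. `TopKCheckpoints` is a hypothetical helper that only tracks which file names would survive; a real checkpoint callback writes and deletes files on disk and can skip saving a checkpoint that would not make the cut. Its `best_model_path` property simply mirrors the name of the attribute used later to reload the best model.

```python
class TopKCheckpoints:
    """Hypothetical helper: tracks which checkpoint files a save-top-k policy keeps."""

    def __init__(self, k: int = 3, mode: str = "min"):
        self.k = k
        self.mode = mode
        self.kept = {}  # checkpoint path -> monitored score

    def _worst(self) -> str:
        # For "min" (e.g. a loss) the worst is the largest score; for "max" the smallest.
        pick = max if self.mode == "min" else min
        return pick(self.kept, key=self.kept.get)

    def update(self, path: str, score: float):
        """Offer a new checkpoint; return the path that falls out of the top k, if any."""
        self.kept[path] = score
        if len(self.kept) <= self.k:
            return None
        worst = self._worst()
        del self.kept[worst]
        return worst

    @property
    def best_model_path(self) -> str:
        pick = min if self.mode == "min" else max
        return pick(self.kept, key=self.kept.get)


tracker = TopKCheckpoints(k=3, mode="min")
val_losses = [0.90, 0.70, 0.80, 0.60, 0.75]
dropped = [tracker.update(f"epoch={e:02d}.ckpt", loss) for e, loss in enumerate(val_losses)]
print(sorted(tracker.kept), tracker.best_model_path)
```

With a validation accuracy you would pass `mode="max"`, and the same code would keep the three highest scores instead.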

36
00:02:21,510 --> 00:02:27,150
So let's run this and create our checkpoint callback and our early stop callback.

37
00:02:27,750 --> 00:02:28,170
Now.

38
00:02:28,320 --> 00:02:29,430
What do we need to do here?

39
00:02:30,090 --> 00:02:35,130
Now, what we're actually doing right now is just creating a printing callback, which just allows us

40
00:02:35,130 --> 00:02:40,820
to print outputs of different things during training, and we can simply do it through this class here.

41
00:02:40,890 --> 00:02:45,780
So this inherits from the Callback class here, and basically, on init start,

42
00:02:45,780 --> 00:02:49,710
we're just printing this here; on init end,

43
00:02:50,160 --> 00:02:55,660
we are printing this; and then on train end as well, we're printing this here.

44
00:02:55,680 --> 00:03:00,660
So these are just different functions that we can use to print different things at different times.

45
00:03:00,660 --> 00:03:03,870
And all we do is just print here; however,

46
00:03:04,650 --> 00:03:11,110
you can have any kind of code you want here to run on startup, at the end of initialization, or at the training

47
00:03:11,160 --> 00:03:11,400
end.
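
The printing callback works because the trainer calls named hook methods on every registered callback at fixed points. The toy, plain-Python sketch below shows that dispatch pattern; `Callback`, `PrintingCallback` and `ToyTrainer` are hypothetical stand-ins, not Lightning classes. The hook names echo the ones used in Lightning 1.x (`on_init_start`, `on_init_end`, `on_train_end`); newer Lightning releases have removed the two init hooks, so check the documentation for your version.

```python
class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""

    def on_init_start(self, trainer):
        pass

    def on_init_end(self, trainer):
        pass

    def on_train_end(self, trainer):
        pass


class PrintingCallback(Callback):
    def __init__(self):
        self.messages = []  # kept so the printed output can be checked afterwards

    def _log(self, msg):
        print(msg)
        self.messages.append(msg)

    def on_init_start(self, trainer):
        self._log("Starting to initialize trainer!")

    def on_init_end(self, trainer):
        self._log("Trainer is initialized now.")

    def on_train_end(self, trainer):
        self._log("Training is done.")


class ToyTrainer:
    """Hypothetical minimal trainer that fires each hook on every registered callback."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])
        self._call("on_init_start")
        # A real trainer would set up devices, loggers, etc. here.
        self._call("on_init_end")

    def _call(self, hook):
        for cb in self.callbacks:
            getattr(cb, hook)(self)

    def fit(self, num_epochs=2):
        for _ in range(num_epochs):
            pass  # training and validation steps would run here
        self._call("on_train_end")


cb = PrintingCallback()
ToyTrainer(callbacks=[cb]).fit()
```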

48
00:03:11,880 --> 00:03:14,340
So let's create this printing callback.

49
00:03:15,450 --> 00:03:20,940
Note that this one is a class here that we'll be passing in. Now let's see how we train with

50
00:03:20,940 --> 00:03:21,510
callbacks.

51
00:03:21,990 --> 00:03:24,390
So again, we specify a batch size and learning rate,

52
00:03:24,390 --> 00:03:26,300
even though we won't be using those values.

53
00:03:26,310 --> 00:03:28,440
Those are just values that we need to send in,

54
00:03:29,250 --> 00:03:34,230
because we're using the automatically selected batch size and automatically selected learning rate.

55
00:03:35,220 --> 00:03:35,760
Next,

56
00:03:36,270 --> 00:03:40,650
we create a trainer, and we just do pl.Trainer.

57
00:03:41,100 --> 00:03:44,640
We specify that we're using one GPU, since one is all we have available.

58
00:03:45,090 --> 00:03:51,780
We set our maximum number of epochs, set our progress bar refresh rate, and then specify all the callbacks

59
00:03:51,780 --> 00:03:55,920
that we want to use right here in this list, just like we did in Keras.

60
00:03:56,430 --> 00:04:00,960
And then, once this trainer object has been configured with those parameters, we can call

61
00:04:00,960 --> 00:04:03,800
trainer.fit(model), and that's about it.

62
00:04:03,810 --> 00:04:04,920
And then it starts training.
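
Passing several callbacks in one list works because each callback is called in turn and they cooperate through shared trainer state. In Lightning, for example, the early-stopping callback sets `trainer.should_stop`, and the trainer checks that flag. The toy loop below is a hypothetical, plain-Python illustration of that interplay; the class names are invented, and the fake per-epoch losses stand in for real training and validation. In Lightning 1.x the real call looked roughly like `pl.Trainer(gpus=1, max_epochs=..., progress_bar_refresh_rate=..., callbacks=[...])`; newer versions use `accelerator="gpu", devices=1` instead of `gpus`.

```python
class EarlyStopOnPlateau:
    """Hypothetical early-stopping callback: flags the trainer to stop."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def on_validation_end(self, trainer, metrics):
        loss = metrics["val_loss"]
        if loss < self.best:
            self.best, self.wait = loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                trainer.should_stop = True


class BestEpochRecorder:
    """Hypothetical checkpoint-like callback: remembers the best epoch seen."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None

    def on_validation_end(self, trainer, metrics):
        if metrics["val_loss"] < self.best_loss:
            self.best_loss = metrics["val_loss"]
            self.best_epoch = trainer.current_epoch


class ToyTrainer:
    def __init__(self, max_epochs, callbacks):
        self.max_epochs = max_epochs
        self.callbacks = callbacks
        self.should_stop = False
        self.current_epoch = 0

    def fit(self, val_losses):
        """val_losses stands in for running training plus validation each epoch."""
        for epoch in range(self.max_epochs):
            self.current_epoch = epoch
            metrics = {"val_loss": val_losses[epoch]}
            for cb in self.callbacks:  # every callback in the list sees the metrics
                cb.on_validation_end(self, metrics)
            if self.should_stop:
                break
        return self.current_epoch


recorder = BestEpochRecorder()
trainer = ToyTrainer(max_epochs=10, callbacks=[EarlyStopOnPlateau(patience=2), recorder])
last_epoch = trainer.fit([0.8, 0.6, 0.5, 0.55, 0.52, 0.4, 0.3, 0.3, 0.3, 0.3])
print(last_epoch, recorder.best_epoch, recorder.best_loss)
```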

63
00:04:04,920 --> 00:04:10,620
It prints out the model summary here and you can see it's training right now, so we can just wait for

64
00:04:10,620 --> 00:04:12,050
this as well.

65
00:04:12,090 --> 00:04:13,890
Can we observe anything?

66
00:04:32,280 --> 00:04:37,530
All right, so it's running validation on the data, and then we move on to the next epoch, just like we

67
00:04:37,530 --> 00:04:38,160
did before.

68
00:04:38,490 --> 00:04:39,540
So this is pretty cool.

69
00:04:39,900 --> 00:04:46,350
And you can see here it printed out the things that we wanted our printing callback to print at the start.

70
00:04:46,810 --> 00:04:49,980
It's starting to initialize the trainer, and the trainer is now initialized.

71
00:04:50,790 --> 00:04:54,090
And you can see those results here, so that's pretty cool.

72
00:04:55,290 --> 00:04:56,130
So that's it

73
00:04:56,130 --> 00:05:00,160
for now. You can actually take a look at this in TensorBoard after it's finished.

74
00:05:00,480 --> 00:05:05,820
Unfortunately, we can't run it live like you would on a local machine, where you can run TensorBoard

75
00:05:05,820 --> 00:05:09,730
separately and observe logs while it trains.

76
00:05:09,750 --> 00:05:14,460
It doesn't work in Colab; although there's a way to make it work, it doesn't work as naturally

77
00:05:14,460 --> 00:05:15,750
as you'd expect here.

78
00:05:17,460 --> 00:05:19,020
So now let's take a look at this.

79
00:05:19,290 --> 00:05:24,960
This is how we can restore from a checkpoint: we simply take checkpoint_callback, which is the object

80
00:05:24,960 --> 00:05:32,010
we created, and use .best_model_path, which returns the path of the best model. It's quite convenient, so

81
00:05:32,010 --> 00:05:37,290
now we can load that model here, just by passing the best model path to this function when we want

82
00:05:37,290 --> 00:05:38,490
to run some inferences.

83
00:05:38,970 --> 00:05:42,450
So this is how we do it: we call our Lightning model's

84
00:05:42,960 --> 00:05:49,650
.load_from_checkpoint, specifying the batch size, learning rate and a path, which is given by this line

85
00:05:49,650 --> 00:05:50,070
here.

86
00:05:51,120 --> 00:05:58,140
And then we get the model, so we can change it to evaluation mode and then

87
00:05:58,140 --> 00:06:00,300
freeze it and then we can run it as well.
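
Conceptually, a checkpoint is a saved bundle of weights plus the hyperparameters needed to rebuild the model, and Lightning's `load_from_checkpoint` accepts keyword arguments that override the stored hyperparameters. The sketch below imitates that round trip with `pickle` so it runs without PyTorch. `TinyModel` and its methods are hypothetical stand-ins that borrow Lightning's method names (`load_from_checkpoint`, `eval`, `freeze`) for readability; real Lightning checkpoints are written with `torch.save`.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a LightningModule: weights plus hyperparameters."""

    def __init__(self, batch_size, learning_rate, weights=None):
        self.hparams = {"batch_size": batch_size, "learning_rate": learning_rate}
        self.weights = weights if weights is not None else [0.0, 0.0]
        self.training = True
        self.frozen = False

    def save_checkpoint(self, path):
        with open(path, "wb") as f:
            pickle.dump({"state_dict": self.weights, "hyper_parameters": self.hparams}, f)

    @classmethod
    def load_from_checkpoint(cls, path, **overrides):
        # Stored hyperparameters are the defaults; keyword arguments override them.
        with open(path, "rb") as f:
            ckpt = pickle.load(f)
        hparams = {**ckpt["hyper_parameters"], **overrides}
        return cls(weights=ckpt["state_dict"], **hparams)

    def eval(self):
        self.training = False  # switch to inference behaviour
        return self

    def freeze(self):
        # Mirrors the idea of freezing: stop updates and switch to eval mode.
        self.eval()
        self.frozen = True


with tempfile.TemporaryDirectory() as tmp:
    best_model_path = os.path.join(tmp, "epoch=03-val_loss=0.25.ckpt")
    TinyModel(batch_size=64, learning_rate=1e-3, weights=[0.5, -1.2]).save_checkpoint(best_model_path)
    model = TinyModel.load_from_checkpoint(best_model_path, batch_size=32)
    model.freeze()
print(model.weights, model.hparams, model.training)
```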

88
00:06:01,140 --> 00:06:03,660
So now let's take a look at number nine here.

89
00:06:04,530 --> 00:06:10,620
This is how we save our model for production deployments, so we can export our models here to a

90
00:06:10,620 --> 00:06:15,030
TorchScript model, which can be used in different inference engines.

91
00:06:15,390 --> 00:06:17,880
We can probably use it on mobile as well.

92
00:06:17,890 --> 00:06:19,920
You can use it on different servers.

93
00:06:19,920 --> 00:06:23,510
You can convert it to an NVIDIA DeepStream model as well.

94
00:06:23,530 --> 00:06:27,870
There are a number of things you can do with TorchScript models and ONNX models.

95
00:06:28,320 --> 00:06:32,960
Both of those are model formats that can be run on different systems.

96
00:06:32,970 --> 00:06:38,010
There isn't a full standard overall; ONNX is supposed to be the standard for deep learning models.

97
00:06:38,520 --> 00:06:43,680
But these things change quite quickly, and it is what it is.

98
00:06:43,680 --> 00:06:48,120
It's not a very easy field to be in because of how fast things are changing.

99
00:06:48,570 --> 00:06:54,780
But for now, PyTorch has TorchScript models, which are very versatile and can be used in a number

100
00:06:54,780 --> 00:06:56,040
of inference engines.

101
00:06:56,460 --> 00:07:02,520
So to convert it, all you have to do is load a model here, then call model.to_torchscript(),

102
00:07:02,730 --> 00:07:04,980
which gives you the scripted model in a variable called script.

103
00:07:05,670 --> 00:07:08,760
And then we just save that using torch.jit.save.

104
00:07:09,240 --> 00:07:11,880
And we just specify the file name here and the path.

105
00:07:12,370 --> 00:07:13,320
And that's basically it.
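
The export step can be sketched with plain PyTorch. A LightningModule's `to_torchscript()` uses scripting by default, which corresponds to `torch.jit.script`; the example below scripts a small hypothetical `TinyClassifier` (standing in for the trained model), saves it with `torch.jit.save`, reloads it with `torch.jit.load`, and checks that the reloaded module gives the same outputs. The file name `model.pt` is an assumption for illustration.

```python
import os
import tempfile

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical small model standing in for the trained Lightning model."""

    def __init__(self, in_features: int = 4, num_classes: int = 3):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


model = TinyClassifier().eval()
# Compile the module to TorchScript (the scripting route to_torchscript() takes by default).
script = torch.jit.script(model)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    torch.jit.save(script, path)
    # The saved file can be loaded without the original Python class definition.
    loaded = torch.jit.load(path)
    x = torch.randn(2, 4)
    with torch.no_grad():
        same = torch.allclose(model(x), loaded(x))
print(same)
```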

106
00:07:14,430 --> 00:07:17,010
So now let's take a look at how we can use that model

107
00:07:17,010 --> 00:07:19,860
we saved there to run inference.

108
00:07:20,310 --> 00:07:23,330
So we just import torchvision and pyplot

109
00:07:23,340 --> 00:07:29,190
again, even though these may have been imported previously. We set our device to GPU

110
00:07:29,190 --> 00:07:32,370
if it's available, GPU meaning CUDA in this case.

111
00:07:33,090 --> 00:07:34,350
And then what do we do?

112
00:07:34,560 --> 00:07:36,300
We just take the validation set.

113
00:07:36,310 --> 00:07:42,990
We take 32 images from the validation data set by converting it to an iterator and then getting the next

114
00:07:43,020 --> 00:07:44,430
batch out of it.

115
00:07:44,910 --> 00:07:49,770
We take the images here, not the labels, because we don't need the labels right now, and we

116
00:07:49,770 --> 00:07:53,280
put those images onto CUDA, onto the GPU.

117
00:07:53,910 --> 00:07:59,910
Then we just create our plot, where we're going to use tight_layout because we want it to be nice

118
00:07:59,910 --> 00:08:00,570
and compact.

119
00:08:01,290 --> 00:08:04,350
We don't want to show the results just yet.

120
00:08:05,160 --> 00:08:10,350
Then we take the trained model that we loaded up here.

121
00:08:11,010 --> 00:08:17,550
And then we just do the regular stuff, where we get the max class probability outputs and

122
00:08:17,760 --> 00:08:18,270
our predictions.

123
00:08:18,900 --> 00:08:24,810
Then we just loop through it here, and we have a lookup dictionary here,

124
00:08:24,810 --> 00:08:30,540
so when we're actually plotting it, we can see the class name itself.

125
00:08:31,170 --> 00:08:34,980
It's all quite simple and we can just visualize our predictions right there.
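
The prediction step boils down to "take the index of the highest score per image, then map it to a name". In PyTorch this is commonly written as `_, preds = torch.max(outputs, 1)`. The sketch below shows the same idea with plain lists so it runs without a GPU or PyTorch. The helper `predict_class_names`, the label map and the scores are all hypothetical examples, not the course's data.

```python
def predict_class_names(probabilities, idx_to_class):
    """Return the class name with the highest score for each row of a batch."""
    names = []
    for row in probabilities:
        best_idx = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        names.append(idx_to_class[best_idx])
    return names


idx_to_class = {0: "cat", 1: "dog", 2: "bird"}  # hypothetical lookup dictionary
batch_scores = [
    [0.10, 0.85, 0.05],
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
]
print(predict_class_names(batch_scores, idx_to_class))
```

Each name would then be used as the title of its subplot when the batch of images is drawn.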

126
00:08:35,850 --> 00:08:36,870
So that's pretty cool.

127
00:08:36,930 --> 00:08:38,610
So I'll stop there for now.

128
00:08:39,030 --> 00:08:44,130
And in the next section, we'll take a look at how we can use PyTorch Lightning to do multi-GPU

129
00:08:44,220 --> 00:08:44,670
training.

130
00:08:44,960 --> 00:08:47,340
So that's a pretty cool feature that it offers.

131
00:08:47,880 --> 00:08:48,810
So I'll see you there.

132
00:08:48,870 --> 00:08:49,290
Thank you.