1
00:00:11,640 --> 00:00:17,710
In this lecture, we are going to do an example of classification in code using PyTorch.

2
00:00:18,360 --> 00:00:24,930
Our problem for this lecture will be to predict whether a patient's diagnosis of breast tissue is malignant

3
00:00:24,960 --> 00:00:26,460
or benign.

4
00:00:26,460 --> 00:00:31,410
This is a classic application of machine learning, and deep learning especially has made great progress

5
00:00:31,410 --> 00:00:32,910
in this field.

6
00:00:32,910 --> 00:00:36,620
This lecture is going to walk you through a prepared Colab notebook.

7
00:00:36,810 --> 00:00:42,750
Although a very good exercise, which I always recommend, is, once you know how this is done, to try and

8
00:00:42,750 --> 00:00:51,280
recreate it yourself with as few references as possible. As usual, you can look at the title of the notebook

9
00:00:51,550 --> 00:00:53,830
to determine what notebook we are currently looking at.

10
00:00:56,670 --> 00:01:02,570
To start, we're going to have all our usual imports: torch, torch.nn, NumPy, and

11
00:01:02,580 --> 00:01:10,370
Matplotlib.

12
00:01:10,420 --> 00:01:12,740
Next we are going to load in the data.

13
00:01:12,950 --> 00:01:14,570
This dataset is part of

14
00:01:14,590 --> 00:01:20,710
scikit-learn, so we're going to import the function load_breast_cancer from the module

15
00:01:20,720 --> 00:01:25,660
sklearn.datasets.

16
00:01:25,680 --> 00:01:35,330
Next we're going to call this function and assign it to a variable called data.

17
00:01:35,330 --> 00:01:40,230
Next we're going to check the type of the data variable since it's not clear what it returns.

18
00:01:40,340 --> 00:01:43,190
We can see that it returns an object of type Bunch.

19
00:01:47,360 --> 00:01:49,390
Luckily, I've already inspected this object.

20
00:01:49,420 --> 00:01:55,460
So I know that it acts kind of like a dictionary. In that way, we can call the function keys, and it's

21
00:01:55,460 --> 00:02:03,500
going to show us all the keys that we can access from this object. So we see data, target, target_names,

22
00:02:03,770 --> 00:02:12,290
DESCR, feature_names, and filename. So the first two keys, data and target, refer to the input data

23
00:02:12,320 --> 00:02:15,200
X and the targets y respectively.

24
00:02:15,290 --> 00:02:19,670
So I'm going to call data.data.shape to check the shape of X.

25
00:02:23,320 --> 00:02:23,590
Okay.

26
00:02:23,600 --> 00:02:26,020
So we see that it's 569 by 30.

27
00:02:26,240 --> 00:02:30,200
So that means we have 569 samples and 30 input features.

28
00:02:33,370 --> 00:02:33,660
Next,

29
00:02:33,670 --> 00:02:41,540
I'm going to print out the targets here. So as you can see, it returns a one-dimensional array of zeros

30
00:02:41,570 --> 00:02:43,010
and ones, as expected.

31
00:02:46,280 --> 00:02:52,100
Now luckily, even though the targets are just zeros and ones, we can still determine what they mean by

32
00:02:52,100 --> 00:02:53,270
printing out

33
00:02:53,330 --> 00:02:56,560
data.target_names. As promised,

34
00:02:56,590 --> 00:03:00,970
they stand for malignant and benign.

35
00:03:00,990 --> 00:03:06,360
Next, we can confirm that the length of y is equal to the length of X: 569.

36
00:03:12,300 --> 00:03:13,140
Next,

37
00:03:13,170 --> 00:03:20,300
note that we can also print out the input feature names, so we can see things like mean radius,

38
00:03:20,360 --> 00:03:23,740
mean texture, mean perimeter, and so forth.
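The inspection steps so far can be sketched as follows, assuming scikit-learn is installed. The variable name data follows the lecture; X and y are names chosen here for illustration.
```python
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()            # returns a dictionary-like Bunch object
X, y = data.data, data.target          # inputs and 0/1 targets as NumPy arrays
print(type(data).__name__)             # Bunch
print(X.shape, y.shape)                # (569, 30) (569,)
print(data.target_names.tolist())      # ['malignant', 'benign']
print(data.feature_names[:3].tolist()) # ['mean radius', 'mean texture', 'mean perimeter']
```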

39
00:03:23,780 --> 00:03:28,760
Of course, you're free to look up this dataset online if you want to learn more about it, but remember

40
00:03:28,760 --> 00:03:38,320
that we would just like to treat this as a geometry problem.

41
00:03:38,320 --> 00:03:41,730
Next we're going to split the data into train and test.

42
00:03:41,770 --> 00:03:45,450
You'll notice that I'm importing things as we go along since that helps tell the story.

43
00:03:45,880 --> 00:03:51,740
But in general you'll find that it's more conventional to put all the imports at the top.

44
00:03:51,810 --> 00:03:57,930
So here we're going to import the function train_test_split from the module sklearn.model_selection.

45
00:03:58,770 --> 00:04:05,430
Then we're going to call this function and make the test set 33 percent of the original dataset.

46
00:04:05,490 --> 00:04:13,080
This returns X_train, X_test, y_train, and y_test. Then we're going to assign the shape of X_train to the

47
00:04:13,080 --> 00:04:16,680
variables N and D, which may or may not be useful later.

48
00:04:21,300 --> 00:04:21,890
Next,

49
00:04:21,900 --> 00:04:24,190
and this is important in deep learning,

50
00:04:24,300 --> 00:04:30,840
we want to scale our data. The basic idea behind this is that, because the output is a linear combination

51
00:04:30,840 --> 00:04:37,230
of the inputs, you don't want one input to have a very large range, say one million, and another input to

52
00:04:37,230 --> 00:04:41,340
have a very small range, like, say, 0.001.

53
00:04:41,400 --> 00:04:47,250
If this happens, then the weights will be too sensitive when the input has a large range and not sensitive

54
00:04:47,250 --> 00:04:50,400
enough when the input has a small range.

55
00:04:50,460 --> 00:04:55,400
So the typical way to deal with this is normalization, or standardization.

56
00:04:55,560 --> 00:05:03,080
Basically, this just means subtracting the mean and dividing by the standard deviation. Luckily, scikit-learn

57
00:05:03,090 --> 00:05:10,260
already comes with a class for this called StandardScaler. So we call fit_transform on X_train, and then

58
00:05:10,260 --> 00:05:12,560
we call transform on X_test.

59
00:05:12,600 --> 00:05:17,360
Remember that it's important not to expose any of our test data to the training pipeline.
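A minimal NumPy sketch of what standardization does, assuming small made-up arrays in place of the real train and test sets. scikit-learn's StandardScaler performs the equivalent computation through fit_transform and transform.
```python
import numpy as np
# Hypothetical toy data: column 0 has a huge range, column 1 a tiny one.
X_train = np.array([[1_000_000.0, 0.001], [2_000_000.0, 0.003], [3_000_000.0, 0.002]])
X_test = np.array([[1_500_000.0, 0.004]])
# "Fit" on the training set only: per-feature mean and standard deviation.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
# Apply the training statistics to both sets, so no test data leaks into the fit.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
print(X_train_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_train_scaled.std(axis=0))   # approximately 1 for each feature
```
Note that the scaled test set will generally not have mean 0 and standard deviation 1, since it reuses the statistics computed from the train set.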

60
00:05:24,290 --> 00:05:24,710
Next,

61
00:05:24,710 --> 00:05:28,020
it's time to do our PyTorch work. As you saw earlier,

62
00:05:28,040 --> 00:05:34,730
we first need to build a model object, which is now a sequence of modules. It takes in a list of two modules,

63
00:05:34,820 --> 00:05:40,710
Linear and Sigmoid. The linear layer has an input size D and an output size 1.

64
00:05:40,880 --> 00:05:51,290
As mentioned, we want to apply the sigmoid activation so that the outputs are in the range between 0 and 1.

65
00:05:51,310 --> 00:05:53,760
Next we are going to prepare to train our model.

66
00:05:53,950 --> 00:05:59,740
So we'll need a loss and an optimizer. The loss we're going to use is the binary cross-entropy loss, or

67
00:05:59,740 --> 00:06:04,630
BCELoss, which is the appropriate loss for binary classification.

68
00:06:04,630 --> 00:06:09,930
The optimizer we'll use is called Adam, which is a common default chosen by deep learning practitioners.

69
00:06:15,400 --> 00:06:21,400
Next, we're going to convert our data into torch tensors before we enter our training loop. As you've

70
00:06:21,400 --> 00:06:22,430
seen before,

71
00:06:22,450 --> 00:06:25,910
this involves calling the function torch.from_numpy.

72
00:06:25,960 --> 00:06:31,710
In addition, we still need to convert all the data to float32. One detail you may have missed,

73
00:06:31,720 --> 00:06:37,810
if you're not accustomed to PyTorch, is that we want to reshape the targets to be a 2D array of size

74
00:06:37,810 --> 00:06:44,230
N by 1 instead of just a 1D array of length N. In other libraries, such as scikit-learn, Keras,

75
00:06:44,340 --> 00:06:48,310
and TensorFlow, a 1D array is appropriate, but that's not the case

76
00:06:48,320 --> 00:06:54,760
in PyTorch.
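A short sketch of that conversion, assuming a small made-up target array in place of the real targets. The name y_tensor is chosen here for illustration.
```python
import numpy as np
import torch
y_train = np.array([0, 1, 1, 0])  # hypothetical 1D targets of length N = 4
# Cast to float32 and reshape to N x 1 before wrapping the array as a tensor.
y_tensor = torch.from_numpy(y_train.astype(np.float32).reshape(-1, 1))
print(y_tensor.shape)  # torch.Size([4, 1])
print(y_tensor.dtype)  # torch.float32
```
The N x 1 shape matches the output shape of a linear layer with output size 1, which is what the loss compares the targets against.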

77
00:06:54,780 --> 00:07:00,330
Next, we have our main training loop. As promised, we're not going to spend much time explaining this because

78
00:07:00,330 --> 00:07:03,050
it's pretty much exactly the same as before.

79
00:07:03,080 --> 00:07:08,210
There is one small additional detail here, but it's one that doesn't require any new knowledge.

80
00:07:08,280 --> 00:07:14,340
Now that we have both a train and test set, we want to be clear about which dataset we use where.

81
00:07:14,340 --> 00:07:19,740
So when we do the training portion, meaning a step of gradient descent, we'll want to use the train set,

82
00:07:20,010 --> 00:07:21,680
X_train and y_train.

83
00:07:21,690 --> 00:07:26,880
This involves doing a forward pass, calculating the loss, calling loss.backward, and then calling the

84
00:07:26,880 --> 00:07:30,850
optimizer step. So you can see all these here.

85
00:07:31,000 --> 00:07:37,690
So we pass X_train into model to get outputs, we pass y_train and outputs to get the loss, we call

86
00:07:37,690 --> 00:07:39,100
loss.backward, and then optimizer.step.

87
00:07:39,110 --> 00:07:46,830
But on top of that, we would still like to know the test loss per iteration.

88
00:07:46,880 --> 00:07:49,880
This will tell us whether or not our model is overfitting.

89
00:07:49,880 --> 00:07:54,830
For example, if we see the test loss increasing during training, then we know that we are overfitting

90
00:07:54,830 --> 00:07:57,580
our model. To calculate the test loss,

91
00:07:57,590 --> 00:08:02,180
we do some of the same steps, but with the test data instead of the train data.

92
00:08:02,480 --> 00:08:07,230
So we do the forward pass, but we pass in X_test instead of X_train.

93
00:08:07,730 --> 00:08:12,310
Then we calculate the criterion with the test outputs against y_test.

94
00:08:12,320 --> 00:08:26,410
Finally, we save both the train loss and the test loss in our arrays of losses.
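The model, loss, optimizer, and loop described above can be sketched as follows. This is not the notebook's exact code: the data is synthetic, the sizes and the number of iterations are made up, and the torch.no_grad block is a choice made here to skip gradient tracking for the test loss.
```python
import torch
import torch.nn as nn
torch.manual_seed(0)
# Hypothetical stand-in data: 100 train and 50 test samples with D = 5 features.
N_train, N_test, D = 100, 50, 5
X_train, X_test = torch.randn(N_train, D), torch.randn(N_test, D)
true_w = torch.randn(D, 1)                # made-up weights used only to label the data
y_train = (X_train @ true_w > 0).float()  # N x 1 targets of 0s and 1s
y_test = (X_test @ true_w > 0).float()
model = nn.Sequential(nn.Linear(D, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())
n_iterations = 200
train_losses, test_losses = [], []
for it in range(n_iterations):
    optimizer.zero_grad()               # clear gradients from the previous step
    outputs = model(X_train)            # forward pass on the train set
    loss = criterion(outputs, y_train)
    loss.backward()                     # backpropagate
    optimizer.step()                    # one optimizer update
    with torch.no_grad():               # test loss only; no gradients needed
        test_loss = criterion(model(X_test), y_test)
    train_losses.append(loss.item())
    test_losses.append(test_loss.item())
print(train_losses[0], train_losses[-1])
```
Plotting train_losses and test_losses on the same axes gives the loss-per-iteration curves used to check for overfitting.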

95
00:08:26,410 --> 00:08:28,270
Next, we check our loss per iteration.

96
00:08:32,610 --> 00:08:37,320
Everything appears to look good, and the train loss is not too different from the test loss, which means

97
00:08:37,320 --> 00:08:38,640
our model is not overfitting.

98
00:08:46,010 --> 00:08:46,360
Next,

99
00:08:46,420 --> 00:08:48,840
we check the accuracy of our model.

100
00:08:49,000 --> 00:08:53,830
Remember that, unlike with regression, the metric we care about is very different from the loss function

101
00:08:53,830 --> 00:08:56,680
we minimized in order to train our network.

102
00:08:56,740 --> 00:09:02,020
Accuracy is simply the number we got right divided by the total number of predictions.

103
00:09:02,230 --> 00:09:08,110
In order to get the train predictions, we first pass X_train into the model, which gives us p_train.

104
00:09:08,290 --> 00:09:14,620
We want to convert the output, which is a torch tensor, into a NumPy array by calling the numpy function.

105
00:09:14,710 --> 00:09:19,990
These outputs are still probabilities, so we want to call np.round in order to convert them all

106
00:09:19,990 --> 00:09:21,760
to zeros and ones.

107
00:09:21,940 --> 00:09:27,790
Once we have our predictions, we can calculate the accuracy by doing a pointwise comparison with

108
00:09:27,790 --> 00:09:29,910
y_train and taking the mean.

109
00:09:29,950 --> 00:09:33,640
Now this calculation often confuses beginners.

110
00:09:33,640 --> 00:09:36,100
How does this calculate the accuracy?

111
00:09:36,100 --> 00:09:40,060
I would encourage you to do a few simple examples in the console on your own

112
00:09:40,060 --> 00:09:45,000
if you don't get it right away. Remember that == returns True or False.

113
00:09:45,190 --> 00:09:51,340
You'll get True if two items are equal and False if not. This operation happens element-wise.

114
00:09:51,340 --> 00:09:57,580
So if y_train and p_train are arrays of length N, then doing == will give us an array of length

115
00:09:57,580 --> 00:10:05,570
N that contains all Trues and Falses. So what happens when you take the mean of an array containing

116
00:10:05,600 --> 00:10:06,840
Trues and Falses?

117
00:10:06,860 --> 00:10:08,680
What does that even mean?

118
00:10:08,690 --> 00:10:14,870
Usually, it only makes sense to calculate the mean for numbers, and clearly True and False are not numbers.

119
00:10:14,870 --> 00:10:19,700
However, in Python, True is treated like 1 and False is treated like 0.

120
00:10:19,760 --> 00:10:23,830
So if you try 1 == True in the console, that returns True.

121
00:10:23,910 --> 00:10:28,650
And if you try 0 == False in the console, that also returns True.

122
00:10:28,790 --> 00:10:32,820
In other words, this is really like an array of zeros and ones.

123
00:10:33,090 --> 00:10:37,540
We have a 1 whenever we got the correct answer and a 0 whenever we did not.

124
00:10:37,760 --> 00:10:43,080
You should convince yourself that calculating the accuracy is the same as just adding up all the ones

125
00:10:43,110 --> 00:10:47,120
and dividing by N, which is also the same as taking the mean.
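A small worked example of this calculation, assuming made-up labels and probabilities in place of the real y_train and p_train:
```python
import numpy as np
y_train = np.array([0, 1, 1, 0, 1])            # hypothetical true labels
p_train = np.array([0.2, 0.9, 0.4, 0.1, 0.8])  # hypothetical predicted probabilities
predictions = np.round(p_train)                # rounds to [0., 1., 0., 0., 1.]
correct = predictions == y_train               # element-wise: [True, True, False, True, True]
print(correct.sum() / len(correct))            # 0.8, i.e. 4 correct out of 5
print(np.mean(correct))                        # 0.8, the same number
print(1 == True, 0 == False)                   # True True
```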

126
00:10:47,250 --> 00:10:52,530
If you don't get it right away, write it down on paper.

127
00:10:52,580 --> 00:10:56,610
Next, we do the same thing for the test set to get the test accuracy.

128
00:10:56,750 --> 00:11:02,260
The result is that we get about 98 percent accuracy on both the train and test set, which is very good.

129
00:11:07,290 --> 00:11:08,360
As an exercise,

130
00:11:08,390 --> 00:11:13,290
here is a nice thing to try. In TensorFlow, when you do a training loop,

131
00:11:13,410 --> 00:11:16,250
you will actually get two arrays of data back.

132
00:11:16,260 --> 00:11:21,630
One is the loss per iteration for both the train and test set, and one is the accuracy per iteration

133
00:11:21,630 --> 00:11:23,660
for both the train and test set.

134
00:11:23,700 --> 00:11:27,310
You'll notice that we did not plot the accuracy per iteration here.

135
00:11:27,510 --> 00:11:33,540
So for the exercise, you should try to modify this code so that, during the training loop, we not only

136
00:11:33,540 --> 00:11:39,840
collect the loss at each iteration but also the accuracy at each iteration. When the loop is done,

137
00:11:39,870 --> 00:11:45,420
you should be able to plot the accuracy per iteration. You should see that, as the loss goes down, the

138
00:11:45,420 --> 00:11:46,990
accuracy goes up.

139
00:11:47,010 --> 00:11:52,680
This is usually the case. You might also want to think about what the downsides are of calculating the

140
00:11:52,680 --> 00:11:55,110
accuracy at each iteration of the loop.