1
00:00:01,560 --> 00:00:04,950
Hi, and welcome to the lesson on how to train a CNN.

2
00:00:05,370 --> 00:00:10,200
Now, this section is just a high-level overview of the training process, and we go into much greater

3
00:00:10,200 --> 00:00:12,840
detail in the upcoming sections.

4
00:00:13,350 --> 00:00:20,160
So firstly, let's get started and try to understand what conv filters learn, because our conv filters

5
00:00:20,160 --> 00:00:25,830
are basically the feature detectors that detect the features that make up objects in an image

6
00:00:26,190 --> 00:00:28,140
so that we can tell what's in the image.

7
00:00:28,710 --> 00:00:30,180
So let's take a look at this.

8
00:00:30,630 --> 00:00:36,390
Typically, we have the early layers of our CNN learning the low-level features.

9
00:00:36,570 --> 00:00:38,160
Those are things like edges.

10
00:00:38,160 --> 00:00:42,380
You can see them here: little lines, or little blobs as well.

11
00:00:42,390 --> 00:00:46,290
Colors too, but it's just edges, edges at different angles, mostly.

12
00:00:46,770 --> 00:00:52,530
So these are the features that the low-level conv filters learn and detect.

13
00:00:53,250 --> 00:00:54,870
Then we have the mid-level features.

14
00:00:54,900 --> 00:00:58,260
Remember, we have consecutive layers of conv filters.

15
00:00:58,620 --> 00:01:02,370
And as those layers go on, they typically get bigger.

16
00:01:02,910 --> 00:01:06,810
Well, those bigger filters allow us to learn more complicated patterns.

17
00:01:06,840 --> 00:01:09,060
You can see the mid-level features.

18
00:01:09,060 --> 00:01:13,020
They're a bit more detailed and you can see something that looks like an eye.

19
00:01:13,500 --> 00:01:17,010
This one sort of looks like the coronavirus, if I'm not mistaken.

20
00:01:18,150 --> 00:01:23,040
These are all lines too, but these are groups of lines, so you can see they're learning more complex

21
00:01:23,040 --> 00:01:23,420
patterns.

22
00:01:23,430 --> 00:01:25,260
This one is like a mesh screen, as you can see here.

23
00:01:26,190 --> 00:01:32,970
And then finally, the high-level features: those last filters before we reach the final output,

24
00:01:33,300 --> 00:01:34,290
the fully connected layers.

25
00:01:34,830 --> 00:01:37,650
They are able to detect much more high-level features.

26
00:01:38,040 --> 00:01:38,980
So look at this one.

27
00:01:38,980 --> 00:01:39,870
This is actually better.

28
00:01:40,070 --> 00:01:40,680
You can see it here.

29
00:01:40,680 --> 00:01:42,240
This one is like a honeycomb pattern.

30
00:01:42,660 --> 00:01:44,490
This one looks like a branch, a tree branch.

31
00:01:44,970 --> 00:01:47,550
So these are what our conv filters actually learn.

32
00:01:47,700 --> 00:01:49,440
It's a pretty cool visualization, isn't it?

33
00:01:50,280 --> 00:01:55,790
So you can just go ahead; this slide basically just tells you where they are, just tells you what

34
00:01:55,800 --> 00:01:56,580
I just talked about.

35
00:01:58,050 --> 00:02:00,240
So what happens during the training process?

36
00:02:00,750 --> 00:02:07,140
Well, firstly, remember all of the parameters that we have to learn, all of those weights and biases.

37
00:02:07,650 --> 00:02:12,660
Well, we initialize them at the beginning of the training process with random values.

38
00:02:13,110 --> 00:02:17,430
So because of that, the neural network, the CNN, at that point doesn't do anything.

39
00:02:17,440 --> 00:02:20,400
It's just random values. And what do we do?

40
00:02:20,730 --> 00:02:22,260
We just forward propagate.

41
00:02:22,440 --> 00:02:25,510
That means you just pass an image to it.

42
00:02:25,530 --> 00:02:31,560
So we just run the neural network, the convolutional neural net, on one image or a batch of images,

43
00:02:32,520 --> 00:02:34,470
and we get the total error.

44
00:02:34,620 --> 00:02:41,070
Total error means that we will get some output even from those random values after passing the image

45
00:02:41,070 --> 00:02:41,250
in.

46
00:02:41,820 --> 00:02:43,630
It's obviously going to be wrong in the beginning.

47
00:02:43,650 --> 00:02:48,300
It's going to be random, but at least we can find a way to calculate the total error.

48
00:02:48,870 --> 00:02:56,220
And by calculating the total error, we can use a technique called backpropagation to compute the gradients and update our weights

49
00:02:56,700 --> 00:02:57,780
via gradient descent.

50
00:02:59,310 --> 00:03:01,380
And then we continuously do this process.

51
00:03:01,710 --> 00:03:07,560
We propagate more images, or another batch of images, once again until all the images have been propagated,

52
00:03:07,890 --> 00:03:10,650
and "all the images" meaning all the images in our dataset.

53
00:03:10,980 --> 00:03:14,670
And that is basically described as one epoch.

54
00:03:16,280 --> 00:03:20,330
And then lastly, we just keep doing more epochs; we keep doing this over and over.

55
00:03:20,570 --> 00:03:27,470
That's why the training process for CNNs or neural networks is so long, because we keep having to pass

56
00:03:27,470 --> 00:03:32,180
these images over and over and over into the neural net to update the weights.

57
00:03:32,450 --> 00:03:38,480
It's a very computationally heavy task, which is why we frequently use GPUs for this.

58
00:03:39,380 --> 00:03:41,450
So let's illustrate this concept here.

59
00:03:41,990 --> 00:03:44,000
These are the images that we're going to input into it.

60
00:03:44,480 --> 00:03:46,970
We have a six, a five, a six, an eight, an eight and a one.

61
00:03:47,450 --> 00:03:53,450
Now imagine this is a CNN that has just been randomly initialized, with weights of random values.

62
00:03:55,160 --> 00:04:00,230
So we input these, that's the truth, and then we get the output here.

63
00:04:00,650 --> 00:04:04,700
Now the output is going to be random values as well, random probabilities.

64
00:04:05,000 --> 00:04:06,140
Well, these aren't the probabilities

65
00:04:06,140 --> 00:04:08,240
yet; these are just the output

66
00:04:08,240 --> 00:04:13,880
scores. Actually, I probably should have put probabilities here, but nevertheless, these are just random,

67
00:04:13,880 --> 00:04:18,530
unscaled, un-softmaxed scores, and you can see it thinks it's a five.

68
00:04:18,680 --> 00:04:22,150
Fortunately, that's a good sign... although, actually, it isn't.

69
00:04:22,190 --> 00:04:22,550
Sorry.

70
00:04:22,610 --> 00:04:23,240
Not a good sign.

71
00:04:23,720 --> 00:04:27,260
So we're actually getting a five here when the input is a six, sorry.

72
00:04:28,160 --> 00:04:32,060
So, yeah, so then we input another image.

73
00:04:32,390 --> 00:04:32,990
Let's take a look.

74
00:04:34,670 --> 00:04:36,350
We input all the batches in this case here.

75
00:04:36,680 --> 00:04:40,550
So after all the image batches, each of these images has been input.

76
00:04:40,850 --> 00:04:46,550
These are the outputs here, and you can see these are just the random outputs that we get from doing this.

77
00:04:47,120 --> 00:04:55,350
Now, the point of this is that we need to find a way to figure out how the neural network is

78
00:04:55,370 --> 00:04:58,070
performing based on the error we get here.

79
00:04:58,160 --> 00:05:03,380
Remember, all of these are not going to be correct. Oh, by the way, I didn't explain this to you

80
00:05:03,380 --> 00:05:03,830
properly.

81
00:05:04,400 --> 00:05:09,200
These here, these are the score values for what it thinks each one is.

82
00:05:09,470 --> 00:05:10,490
So hopefully, you remember.

83
00:05:11,780 --> 00:05:14,330
So let's take a look at something here.

84
00:05:15,440 --> 00:05:18,020
Remember, I said, we need to learn from our results.

85
00:05:18,440 --> 00:05:21,920
We need to find a way to quantify how correct our results were.

86
00:05:22,310 --> 00:05:24,410
So going back to this slide, oops.

87
00:05:25,070 --> 00:05:26,450
Sorry for the animations.

88
00:05:27,410 --> 00:05:27,700
Yeah.

89
00:05:28,070 --> 00:05:34,190
Going back to the slide, we need to find out... remember, it gave the highest score for

90
00:05:34,190 --> 00:05:35,570
the six being a five.

91
00:05:36,170 --> 00:05:37,850
That's what this means. The six means that we

92
00:05:38,030 --> 00:05:42,170
used six as the input, and the output with the highest

93
00:05:42,170 --> 00:05:42,740
score was five.

94
00:05:43,220 --> 00:05:43,910
That's wrong.

95
00:05:44,090 --> 00:05:49,820
But we need to find a way to quantify how wrong that score is, because the correct value is six.

96
00:05:50,720 --> 00:05:51,590
So let's move on.

97
00:05:54,480 --> 00:06:00,810
So, yes, as I said, we need to quantify how correct our results are and then we need to find a way

98
00:06:00,810 --> 00:06:02,220
to tell the model to do better.

99
00:06:02,760 --> 00:06:08,280
Now, you may have remembered that way is through backpropagation, which we'll get into shortly.

100
00:06:08,790 --> 00:06:15,060
However, the first thing we need to do is create a loss function, and that loss function basically

101
00:06:15,060 --> 00:06:18,540
is a measure of the error produced by our neural network.

102
00:06:18,960 --> 00:06:24,720
So that's where we stop now, and I'll talk about loss functions and their importance in the next section.

103
00:06:25,170 --> 00:06:25,610
Thank you.
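
To make the steps from this lesson a bit more concrete, here is a minimal sketch of the training loop: random initialization, forward propagation, measuring the total error, backpropagation, a gradient descent update, and repeating over many epochs. Note that this is not a CNN. It assumes a tiny linear classifier on made-up toy data, and it assumes softmax with cross-entropy as the loss, which the lesson has not introduced yet. The names `train`, `forward`, `predict` and `toy_data` are hypothetical, chosen only for illustration.

```python
import math
import random

def softmax(scores):
    # Turn raw, unscaled scores into probabilities that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, biases, x):
    # Forward propagation for one input: one raw score per class.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def train(data, num_classes, epochs=200, batch_size=2, lr=0.5, seed=0):
    rng = random.Random(seed)
    num_features = len(data[0][0])
    # Step 1: initialize all weights and biases with random values.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
               for _ in range(num_classes)]
    biases = [rng.uniform(-0.1, 0.1) for _ in range(num_classes)]
    history = []  # average loss per epoch
    for _ in range(epochs):  # one epoch = one full pass over the dataset
        total_loss = 0.0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = [[0.0] * num_features for _ in range(num_classes)]
            grad_b = [0.0] * num_classes
            for x, label in batch:
                # Step 2: forward propagate and measure the error
                # (assumed loss: cross-entropy on softmax probabilities).
                probs = softmax(forward(weights, biases, x))
                total_loss += -math.log(probs[label])
                # Step 3: backpropagate. For softmax + cross-entropy, the
                # gradient w.r.t. each score is (probability - target).
                for c in range(num_classes):
                    delta = probs[c] - (1.0 if c == label else 0.0)
                    grad_b[c] += delta
                    for j in range(num_features):
                        grad_w[c][j] += delta * x[j]
            # Step 4: gradient descent update using the batch average.
            for c in range(num_classes):
                biases[c] -= lr * grad_b[c] / len(batch)
                for j in range(num_features):
                    weights[c][j] -= lr * grad_w[c][j] / len(batch)
        history.append(total_loss / len(data))
    return weights, biases, history

def predict(weights, biases, x):
    # The predicted class is the one with the highest raw score.
    scores = forward(weights, biases, x)
    return scores.index(max(scores))

# Made-up toy dataset of (features, label) pairs, for illustration only.
toy_data = [([1.0, 0.0], 0), ([0.9, 0.1], 0),
            ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
weights, biases, history = train(toy_data, num_classes=2)
```

In a real CNN the forward and backward passes run through convolutional and fully connected layers, and a deep learning framework would normally compute the gradients for you, but the overall loop has the same shape as this sketch.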