1
00:00:00,570 --> 00:00:02,860
Welcome back to the course. In this lecture,

2
00:00:02,880 --> 00:00:06,390
we'll take a closer look at how YOLO actually works.

3
00:00:06,960 --> 00:00:07,770
So let's dive in.

4
00:00:07,980 --> 00:00:12,840
So firstly, YOLO splits the image into a seven by seven grid.

5
00:00:13,470 --> 00:00:16,150
This is important and this is also configurable.

6
00:00:16,170 --> 00:00:19,500
But for now, we'll use seven by seven for teaching purposes.

7
00:00:20,130 --> 00:00:22,530
So now we have a seven by seven grid.

8
00:00:22,710 --> 00:00:23,550
What does that do?

9
00:00:23,610 --> 00:00:26,510
Well, that splits the image up into different cells.

10
00:00:26,520 --> 00:00:29,760
We have 49 different cells here, and what do we get?

11
00:00:30,120 --> 00:00:37,710
We run a model, a neural network, over it, and each cell predicts B bounding boxes.

12
00:00:38,160 --> 00:00:44,640
And we also get the confidence, or the probability score, for each bounding box here having an object.

13
00:00:45,120 --> 00:00:51,780
So for each cell here, we get B bounding boxes, and you can see the bounding boxes are given by this

14
00:00:51,780 --> 00:00:54,600
center coordinate, x, y, and then their width and their height.

15
00:00:55,260 --> 00:00:58,380
And then we get the confidence for each box having an object.

16
00:00:58,380 --> 00:01:03,090
We don't get the confidence for the object being a dog or a bicycle or a car or a tree yet.

17
00:01:03,090 --> 00:01:05,100
We just get the confidence

18
00:01:05,100 --> 00:01:07,290
of whether an object is there or not there.

19
00:01:07,320 --> 00:01:08,370
It's P of object.
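
To make the box format concrete, here is a minimal sketch of turning one cell's raw prediction (x, y, width, height, confidence) into pixel coordinates. The function name `decode_cell_box`, the 448 pixel image size, and the convention that x and y are offsets inside the cell while width and height are fractions of the whole image are all assumptions for illustration; the lecture does not spell them out.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float           # box center x, in image pixels
    y: float           # box center y, in image pixels
    w: float           # box width, in image pixels
    h: float           # box height, in image pixels
    confidence: float  # P(object): is anything there at all?


def decode_cell_box(row, col, pred, grid_size=7, image_size=448):
    """Convert one raw (x, y, w, h, confidence) prediction to pixel units.

    Assumed convention (not stated in the lecture): x and y are offsets
    in [0, 1] inside the cell, w and h are fractions of the whole image.
    """
    x_off, y_off, w_frac, h_frac, conf = pred
    cell = image_size / grid_size  # side length of one grid cell
    return Box(
        x=(col + x_off) * cell,
        y=(row + y_off) * cell,
        w=w_frac * image_size,
        h=h_frac * image_size,
        confidence=conf,
    )


# A box centered in the cell at row 3, column 2 of the 7 by 7 grid.
box = decode_cell_box(row=3, col=2, pred=(0.5, 0.5, 0.25, 0.5, 0.9))
print(box)
```

Note that the confidence here says nothing about the class; it only says how likely it is that the box contains some object.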

20
00:01:09,510 --> 00:01:15,390
Next, we generate all the bounding boxes, so you can see we will have tons of bounding boxes for an image

21
00:01:15,390 --> 00:01:19,590
like this, and we also get the probabilities for each box having an object.

22
00:01:20,640 --> 00:01:21,450
So that's fine.

23
00:01:22,080 --> 00:01:25,950
Next, we take a look at getting the class probability per cell.

24
00:01:26,640 --> 00:01:32,210
So for each cell, we run the classifier and we get a probability score out of it.

25
00:01:32,250 --> 00:01:37,590
So as you can see for an image like this, we're going to get bicycle, bicycle, bicycle in this pink

26
00:01:37,590 --> 00:01:38,900
region here. In the green region,

27
00:01:38,910 --> 00:01:42,270
we're going to get a dog because you can see a dog was in that area there.

28
00:01:42,960 --> 00:01:47,040
Then over here, you're going to get car, and over here, you're going to get dining table.

29
00:01:47,460 --> 00:01:50,340
Apparently, I don't know why, but I guess it's because of the wood.

30
00:01:51,570 --> 00:01:53,670
So moving on, you can see how these,

31
00:01:53,670 --> 00:01:58,050
the class probability scores for each cell, actually look.

32
00:01:58,830 --> 00:02:00,780
But what do we have to do next?

33
00:02:00,900 --> 00:02:07,380
Well, remember, before, we had the probability for the bounding boxes that we proposed for each

34
00:02:07,380 --> 00:02:07,770
region.

35
00:02:08,250 --> 00:02:10,980
We have the probability of it having an object or not.

36
00:02:11,490 --> 00:02:15,300
And then we also have the probability, the class probability per cell.

37
00:02:15,960 --> 00:02:20,670
So now we can basically combine them to get our class predictions.

38
00:02:20,670 --> 00:02:24,060
So we can also filter out predictions that are low.

39
00:02:24,060 --> 00:02:29,010
So if you had a low object score or low confidence score for that image,

40
00:02:29,400 --> 00:02:32,790
you can drop that box, so you can immediately clean up some boxes here.
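
The combine-and-filter step can be sketched roughly as follows: multiply each box's P(object) by the class probabilities of its cell, take the best class, and drop the box if that score is too low. The helper names, the example numbers, and the 0.2 threshold are hypothetical and only there to illustrate the idea.

```python
def class_scores(p_object, class_probs):
    """Per-class score for one box: P(object) times each class probability."""
    return {name: p_object * p for name, p in class_probs.items()}


def filter_boxes(boxes, class_probs, threshold=0.2):
    """Keep boxes whose best class score reaches the threshold.

    boxes: list of (box_id, p_object) pairs that share one cell.
    Returns a list of (box_id, label, score) tuples.
    The 0.2 threshold is an illustrative value, not one from the lecture.
    """
    kept = []
    for box_id, p_object in boxes:
        scores = class_scores(p_object, class_probs)
        label = max(scores, key=scores.get)  # most likely class for this box
        if scores[label] >= threshold:
            kept.append((box_id, label, scores[label]))
    return kept


# Class probabilities for one cell, plus two boxes predicted by that cell.
cell_probs = {"dog": 0.8, "bicycle": 0.1, "car": 0.1}
boxes = [("box_a", 0.9), ("box_b", 0.1)]

kept = filter_boxes(boxes, cell_probs)
print(kept)  # box_b is dropped because its object confidence is low
```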

41
00:02:32,830 --> 00:02:35,550
However, this is actually a cleaned up version right here.

42
00:02:35,580 --> 00:02:38,040
This isn't all the bounding boxes that we got initially.

43
00:02:39,210 --> 00:02:45,480
So what we have to do next is that we have to actually use non-maximum suppression and also threshold

44
00:02:45,490 --> 00:02:47,310
the detections, which I mentioned before.

45
00:02:47,760 --> 00:02:51,480
So actually, we didn't actually threshold these predictions here yet.

46
00:02:51,480 --> 00:02:56,340
Actually, you can go back and see these are the exact same bounding boxes we proposed initially,

47
00:02:57,180 --> 00:02:59,620
and now we have the class probabilities combined with them.

48
00:03:00,000 --> 00:03:03,950
But then we drop the ones that are below a threshold. After we drop them,

49
00:03:03,960 --> 00:03:04,980
we calculate this here.

50
00:03:05,610 --> 00:03:11,340
And we also simultaneously use non-maximum suppression, which you've seen before, which aims to basically

51
00:03:11,340 --> 00:03:13,590
get the final bounding box positions here.
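
As a reminder of what non-maximum suppression does, here is a minimal pure Python sketch: sort the detections by score, keep the best one, and throw away any remaining box that overlaps it too much. It assumes boxes in (x1, y1, x2, y2) corner format and a 0.5 IoU threshold, and it ignores class labels for brevity; real pipelines usually run this per class.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score) pairs.

    Returns the kept detections, highest score first.
    The 0.5 IoU threshold is an illustrative default.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest scoring box still in play
        kept.append(best)
        # Drop every box that overlaps the kept box too much.
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) < iou_threshold]
    return kept


detections = [
    ((10, 10, 110, 110), 0.9),
    ((20, 20, 120, 120), 0.8),    # heavy overlap with the first box
    ((200, 200, 300, 300), 0.7),  # separate object
]
final = non_max_suppression(detections)
print(final)
```

In this example the second box is suppressed because it covers almost the same area as the higher scoring first box, while the third box survives.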

52
00:03:14,070 --> 00:03:16,050
So this is what the final output looks like.

53
00:03:17,010 --> 00:03:23,250
So actually, this is what the real output looks like here because we have a seven by seven grid.

54
00:03:23,640 --> 00:03:25,980
We also have the probability of an object here.

55
00:03:26,070 --> 00:03:27,330
That's the P of object.

56
00:03:27,840 --> 00:03:33,900
We have x, y, width, height, and we have that for all the different bounding boxes here.

57
00:03:34,350 --> 00:03:36,990
So you can see this is what the output looks like.

58
00:03:37,200 --> 00:03:38,850
S by S is seven by seven.

59
00:03:39,390 --> 00:03:45,540
So we can take a look at the data and understand more deeply what the YOLO output is giving us.
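
To get a feel for the size of that output block, here is a small sketch of the arithmetic. The grid size of 7 comes from the lecture; B = 2 boxes per cell, C = 20 classes, and the per-cell layout (all boxes first, then class probabilities) are assumptions chosen for illustration, and `split_cell` is a hypothetical helper.

```python
S = 7   # grid size, from the lecture
B = 2   # boxes per cell (assumed value)
C = 20  # number of classes (assumed value)

# Each box carries x, y, width, height and P(object): 5 numbers.
values_per_cell = B * 5 + C
output_size = S * S * values_per_cell


def split_cell(cell_vector, num_boxes=B, num_classes=C):
    """Split one cell's flat vector into boxes and class probabilities.

    Assumed layout: [box_0 (x, y, w, h, conf), box_1 ..., class probs].
    """
    boxes = [tuple(cell_vector[i * 5:(i + 1) * 5])
             for i in range(num_boxes)]
    start = num_boxes * 5
    class_probs = list(cell_vector[start:start + num_classes])
    return boxes, class_probs


cell_vector = [0.0] * values_per_cell  # placeholder values for one cell
boxes, class_probs = split_cell(cell_vector)
print(S * S, values_per_cell, output_size)
```

Under these assumptions each of the 49 cells holds 30 numbers, so the whole output block has 1470 values.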

60
00:03:48,180 --> 00:03:54,430
And this here is the inference overview. You can see how it goes: the image comes in here.

61
00:03:54,450 --> 00:03:57,240
It takes on different sizes as it passes through the network.

62
00:03:57,720 --> 00:04:00,570
And this is the output block we get right here.

63
00:04:01,800 --> 00:04:07,800
So that's it for this quick overview of how YOLO works, which will help you understand the next

64
00:04:07,800 --> 00:04:11,160
section where we talk about training our own YOLO models.

65
00:04:11,580 --> 00:04:13,020
So stay tuned for that lesson.

66
00:04:13,140 --> 00:04:14,370
Thank you very much for watching.