1
00:00:00,330 --> 00:00:06,780
Hi, and welcome to this section, where we take a look at using YOLOv3 in OpenCV.

2
00:00:06,900 --> 00:00:11,370
So let's open this notebook and load our libraries and functions.

3
00:00:11,370 --> 00:00:17,160
And before we get started, we have too many sessions open, so let's close a bunch of them that we're

4
00:00:17,160 --> 00:00:23,460
no longer using, which I expect you'll have to do if you're doing these lessons consecutively.

5
00:00:24,270 --> 00:00:27,000
So let's take a look at what YOLO is.

6
00:00:27,690 --> 00:00:35,730
YOLO is very cool because it's so good at object detection. It's an object detector,

7
00:00:36,090 --> 00:00:37,290
and it stands for You

8
00:00:37,290 --> 00:00:40,890
Only Look Once, which means it's a single-stage

9
00:00:40,890 --> 00:00:47,040
object detector. We'll cover object detectors much later on in this course, because it's quite

10
00:00:47,040 --> 00:00:51,360
a deep topic, and there are many different object detectors out right now.

11
00:00:51,390 --> 00:00:53,300
But YOLO is perhaps the best.

12
00:00:53,310 --> 00:00:54,600
It's my favorite one to use.

13
00:00:54,870 --> 00:00:57,990
Gives me the best performance almost every time I use it.

14
00:00:58,740 --> 00:01:06,660
And well, what we have here now is just a very easy way to load a YOLO model in OpenCV.

15
00:01:07,110 --> 00:01:11,850
Now, I believe right now OpenCV only supports up to YOLO version 3 models.

16
00:01:12,330 --> 00:01:18,180
However, I do believe that the guys at Ultralytics have made some changes or have made some contributions

17
00:01:18,180 --> 00:01:21,000
to the OpenCV library that allow it to load

18
00:01:21,090 --> 00:01:23,100
YOLOv5 models, which would be amazing.

19
00:01:24,060 --> 00:01:27,810
I don't think it's ready yet, but it might be by the time you're starting this course.

20
00:01:28,710 --> 00:01:33,750
So let's take a look at how we load a YOLO model in OpenCV.

21
00:01:33,930 --> 00:01:39,500
So to do that, I have some steps involved here which should be in order.

22
00:01:40,860 --> 00:01:42,780
It's not really that complicated here.

23
00:01:44,570 --> 00:01:49,050
This is the same here, so it's already completed, fully complete, actually.

24
00:01:49,090 --> 00:01:52,080
It's only because this line here it runs.

25
00:01:53,040 --> 00:01:54,050
Just an example could.

26
00:01:55,230 --> 00:01:57,330
Why didn't it find the model now?

27
00:01:59,490 --> 00:02:00,200
Take a look.

28
00:02:02,690 --> 00:02:06,970
Ah, we did not run this. We didn't run that cell.

29
00:02:07,310 --> 00:02:14,330
So that's the YOLO model, and you can see it should unzip everything there.

30
00:02:14,990 --> 00:02:16,100
Shouldn't take too long.

31
00:02:16,150 --> 00:02:19,240
Well, it's a few hundred megs, I believe, and yeah, there we go.

32
00:02:19,250 --> 00:02:22,280
So we have the YOLO model and images.

33
00:02:22,400 --> 00:02:23,870
So now this should work.

34
00:02:26,030 --> 00:02:26,600
So there we go.

35
00:02:27,020 --> 00:02:32,330
So what this code illustrates here is just us loading the model and inspecting the model.

36
00:02:32,960 --> 00:02:37,520
So we load the labels, pointing to this coco.names file.

37
00:02:37,520 --> 00:02:41,090
Now, COCO is a common-objects dataset (Common Objects in Context).

38
00:02:41,540 --> 00:02:43,830
It's one of the three, actually.

39
00:02:43,840 --> 00:02:47,960
Actually, there's also ImageNet. You don't know what ImageNet is yet; I'll explain that to you in

40
00:02:47,960 --> 00:02:55,160
later slides, but ImageNet and COCO are both vast datasets of images.

41
00:02:55,670 --> 00:02:58,880
COCO is the object detection dataset.

42
00:02:59,300 --> 00:03:05,570
ImageNet is a classification dataset, which has proven invaluable for computer vision research.

43
00:03:06,560 --> 00:03:07,310
It's quite big.

44
00:03:07,310 --> 00:03:12,320
It's over a million images, so it took quite a number of people quite a while to annotate the data.

45
00:03:12,950 --> 00:03:18,080
COCO similarly has thousands and thousands of images with annotated objects.

46
00:03:18,080 --> 00:03:19,400
I believe in this one.

47
00:03:19,400 --> 00:03:26,480
COCO also comes as COCO 122 and COCO 80; this one has 80 object classes, so you can take a look at this COCO

48
00:03:26,480 --> 00:03:27,120
names file.

49
00:03:27,140 --> 00:03:28,280
Let's take a look at that file.

50
00:03:28,580 --> 00:03:29,300
So let's find it.

51
00:03:31,070 --> 00:03:34,160
We can see it here, and you can actually preview the file.

52
00:03:34,160 --> 00:03:36,930
So these are the 80 object classes

53
00:03:36,980 --> 00:03:45,620
that the object detector can detect: things like birds, sheep, horses, elephants, people and motorbikes.

54
00:03:45,980 --> 00:03:47,960
It's quite a number of common classes.

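[Editor's sketch] A minimal way to read a class-names file like the one previewed above, assuming it holds one label per line. The function name `load_class_names` and the path in the usage note are hypothetical, not taken from the notebook.

```python
from pathlib import Path


def load_class_names(names_path):
    """Return the class labels from a text file with one label per line."""
    text = Path(names_path).read_text(encoding="utf-8")
    # Skip blank lines so a trailing newline does not create an empty label.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Usage would look like `labels = load_class_names("coco.names")`, after which `len(labels)` should match the number of classes the model was trained on.
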
55
00:03:48,210 --> 00:03:50,810
Yeah, so it's quite good.

56
00:03:51,020 --> 00:03:57,690
So we also have two other files here: the cfg, which is the model structure, and you can see it here.

57
00:03:57,690 --> 00:03:58,640
It's quite long.

58
00:03:58,670 --> 00:04:04,970
This defines the whole model structure for YOLO, as coded for YOLO version 3, and the weights,

59
00:04:05,450 --> 00:04:07,280
which is a 250 meg file,

60
00:04:07,280 --> 00:04:12,950
I believe, that stores all the weight values for that big model here.

61
00:04:13,940 --> 00:04:15,320
And then we use this function.

62
00:04:16,100 --> 00:04:20,090
This is cv2.dnn.readNetFromDarknet, to load the model.

63
00:04:20,090 --> 00:04:22,700
So we pass in the cfg path as well as the weights.

64
00:04:23,330 --> 00:04:25,170
And then we just set the backend to use the

65
00:04:25,190 --> 00:04:26,780
OpenCV backend for this one.

66
00:04:27,320 --> 00:04:31,070
Depending on the hardware you're using, there are different backends you can use.

67
00:04:31,080 --> 00:04:39,860
But for now, let's just use the standard OpenCV backend, and we use net.getLayerNames() just

68
00:04:39,860 --> 00:04:43,920
to illustrate the names of the 254 different layers in this model.

69
00:04:44,390 --> 00:04:49,190
And you can print this big, long list here if you want to just take a look and inspect

70
00:04:49,670 --> 00:04:53,440
the depth of this model. There's no real need to,

71
00:04:53,450 --> 00:04:56,010
but just for curiosity's sake,

72
00:04:56,030 --> 00:04:56,570
we can do it.

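[Editor's sketch] The loading steps described above could look roughly like this. The function name `load_yolo` and its arguments are hypothetical; the OpenCV calls (`cv2.dnn.readNetFromDarknet`, `setPreferableBackend`, `getLayerNames`, `getUnconnectedOutLayersNames`) are real, but the exact code in the notebook may differ.

```python
def load_yolo(cfg_path, weights_path):
    """Load a Darknet YOLO model with OpenCV's dnn module and list its layers."""
    # Imported inside the function so this file can be loaded without OpenCV.
    import cv2

    # The .cfg file describes the architecture; the .weights file holds the values.
    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    # Use the standard OpenCV backend; other backends depend on your hardware.
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)

    layer_names = net.getLayerNames()  # every layer in the network
    output_names = net.getUnconnectedOutLayersNames()  # layers to request in forward()
    return net, layer_names, output_names
```

Calling `len(layer_names)` on the result is one way to check the layer count mentioned in the lesson.
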
73
00:04:56,990 --> 00:04:59,960
So let's take a look at the function that we're going to use here.

74
00:05:00,170 --> 00:05:03,590
So we have cv2.dnn.blobFromImage.

75
00:05:03,950 --> 00:05:08,240
This transforms the image we loaded into an input blob.

76
00:05:08,480 --> 00:05:13,430
It preprocesses that image so that we can now feed it into our YOLO model detector.

77
00:05:14,480 --> 00:05:21,140
And then we have this other function here that takes that blob and calls it blobFromImage.

78
00:05:22,490 --> 00:05:22,720
Sorry.

79
00:05:22,760 --> 00:05:23,630
It's the same function here.

80
00:05:23,630 --> 00:05:29,390
Sorry, it's just an example of the parameters that we can use here, so we can define a size, scaling

81
00:05:29,390 --> 00:05:34,850
factors, whether we swap R and B, and whether we crop the center or not.

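[Editor's sketch] To make the blob idea concrete, here is a NumPy-only approximation of what `cv2.dnn.blobFromImage(image, 1/255.0, (416, 416), swapRB=True, crop=False)` produces, assuming the image is already at the target size. It leaves out the resize and mean subtraction that the real function also handles, and the name `to_blob` is hypothetical.

```python
import numpy as np


def to_blob(image_bgr, scale=1 / 255.0, swap_rb=True):
    """Turn an H x W x 3 uint8 image into a 1 x 3 x H x W float32 blob."""
    img = image_bgr.astype(np.float32) * scale  # scale pixel values to [0, 1]
    if swap_rb:
        img = img[:, :, ::-1]  # OpenCV loads BGR; the model expects RGB
    chw = np.transpose(img, (2, 0, 1))  # height-width-channel -> channel-height-width
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add the batch dimension
```

The result has the batch-first, channel-first layout that `net.setInput` expects.
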
82
00:05:35,930 --> 00:05:41,550
So let's move on to actually using YOLO in OpenCV.

83
00:05:41,630 --> 00:05:44,110
So we're going to set the path.

84
00:05:44,120 --> 00:05:49,730
We have a bunch of images here, and we just get those images out by using this list comprehension here.

85
00:05:49,810 --> 00:05:55,220
This list gives you all the file names in this directory, and we loop through each file name.

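[Editor's sketch] The list comprehension mentioned above might be written like this. The function name and the extension filter are assumptions added for illustration; the notebook may simply list every file in the folder.

```python
from os import listdir
from os.path import isfile, join


def list_image_files(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the sorted names of image files directly inside `folder`."""
    return sorted(
        f for f in listdir(folder)
        if isfile(join(folder, f)) and f.lower().endswith(extensions)
    )
```

Each returned name can then be joined with the folder path and passed to `cv2.imread` inside the loop.
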
86
00:05:55,580 --> 00:06:01,430
We load the image, get the height and width, convert it into a blob.

87
00:06:02,090 --> 00:06:09,260
Then we pass the blob to our net here and then do a forward pass, net.forward, with the output layers.

88
00:06:09,650 --> 00:06:16,460
So actually, this is why we got the layer names, because that's what we need to pass in. And then from that,

89
00:06:16,460 --> 00:06:17,840
we just create these arrays here.

90
00:06:18,230 --> 00:06:20,840
This is where we store the bounding boxes, confidences and class IDs.

91
00:06:21,380 --> 00:06:28,760
So for the outputs of that, the layer outputs, we loop and iterate through them. We get the

92
00:06:28,760 --> 00:06:34,670
detections from there, as well as confidence scores and class IDs, and anything that has a confidence

93
00:06:34,670 --> 00:06:37,610
score above 0.75,

94
00:06:38,060 --> 00:06:42,140
we draw it onto the image and copy it out as well,

95
00:06:42,140 --> 00:06:50,390
if we wanted to store those images. And then we apply something called non-maximum suppression, NMS,

96
00:06:50,780 --> 00:06:55,280
to basically get rid of overlapping bounding boxes so that it looks a bit cleaner.

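[Editor's sketch] The filtering loop described above, written against plain NumPy arrays. It assumes the usual YOLOv3 output layout, where each row is `[center_x, center_y, width, height, objectness, class scores...]` with coordinates normalized to the image size. The function name `extract_boxes` is hypothetical.

```python
import numpy as np


def extract_boxes(layer_outputs, img_w, img_h, conf_threshold=0.75):
    """Collect boxes, confidences and class IDs above a confidence threshold."""
    boxes, confidences, class_ids = [], [], []
    for output in layer_outputs:  # one array per YOLO output layer
        for detection in output:  # one row per candidate box
            scores = detection[5:]  # per-class scores
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                # Scale the normalized box back to pixel coordinates.
                cx, cy, w, h = detection[:4] * np.array([img_w, img_h, img_w, img_h])
                x = int(round(cx - w / 2))  # top-left corner
                y = int(round(cy - h / 2))
                boxes.append([x, y, int(round(w)), int(round(h))])
                confidences.append(confidence)
                class_ids.append(class_id)
    return boxes, confidences, class_ids
```

Here `layer_outputs` would be what `net.forward(output_names)` returns, and the three lists feed directly into non-maximum suppression.
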
97
00:06:56,000 --> 00:06:58,880
And this is where we actually draw the bounding boxes here.

98
00:06:58,940 --> 00:06:59,480
Sorry, so it.

99
00:06:59,480 --> 00:07:01,550
Enjoy this or it's going to disappear quickly.

100
00:07:02,110 --> 00:07:06,700
But this is where we actually just get the bounding boxes as well.

101
00:07:06,880 --> 00:07:07,210
All right.

102
00:07:08,110 --> 00:07:09,940
So we just stored the boxes there.

103
00:07:09,940 --> 00:07:17,790
We apply NMS to it, and then we just draw the final boxes from here onto the image.

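[Editor's sketch] OpenCV provides `cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)` for this step. To show the idea behind it, here is a plain-Python greedy version. It illustrates the technique rather than OpenCV's implementation, and the 0.3 overlap threshold is an assumed value.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, confidences, iou_threshold=0.3):
    """Return indices of boxes to keep, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Only the boxes whose indices are returned get drawn, which is what removes the overlapping duplicates.
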
104
00:07:18,460 --> 00:07:19,330
So let's run this.

105
00:07:19,330 --> 00:07:25,310
There's a lot going on there, but it's actually quite fast at inference using YOLO.

106
00:07:25,360 --> 00:07:26,290
It's what it's famous for.

107
00:07:26,830 --> 00:07:28,310
So you can see here, this is great.

108
00:07:28,310 --> 00:07:29,830
It's got the cup, my cup.

109
00:07:29,830 --> 00:07:34,150
That was my little vacation I went to as well.

110
00:07:34,630 --> 00:07:37,180
So got the dog correct.

111
00:07:37,660 --> 00:07:40,330
Got this bottle of rum and a cup behind it.

112
00:07:40,340 --> 00:07:40,810
Correct?

113
00:07:42,010 --> 00:07:43,240
It got the bus.

114
00:07:43,270 --> 00:07:47,830
People, it's detecting a few people, some cars here, so it's quite good.

115
00:07:48,760 --> 00:07:50,710
This one gets all the cups as well.

116
00:07:51,850 --> 00:07:55,090
And it got me and the ball. It's a bit hard to see,

117
00:07:55,090 --> 00:07:58,660
but this is person and this is sports ball, I believe.

118
00:08:00,070 --> 00:08:02,190
This one got the truck. This one,

119
00:08:02,200 --> 00:08:05,260
it got all the potted plants and the labels here.

120
00:08:05,680 --> 00:08:07,290
This one says ruins.

121
00:08:07,300 --> 00:08:12,490
I think it's a bit hard to see, but you can inspect the strings directly or you can even increase the

122
00:08:12,490 --> 00:08:13,540
size if you wanted to.

123
00:08:14,410 --> 00:08:17,080
This one, this cup, there's a small element, so it's a big.

124
00:08:17,080 --> 00:08:17,850
That's why it looks a big.

125
00:08:17,860 --> 00:08:18,850
It's so much bigger.

126
00:08:19,540 --> 00:08:22,780
So that's it for using YOLO in OpenCV.

127
00:08:22,780 --> 00:08:28,480
A good exercise for you to do is to run YOLO on a video.

128
00:08:28,650 --> 00:08:34,240
You'll remember all the previous examples we had with videos in Colab, where we used a video

129
00:08:34,240 --> 00:08:34,810
writer.

130
00:08:34,930 --> 00:08:37,030
And then we just played it back in Colab.

131
00:08:37,910 --> 00:08:44,350
Get a sample video, download something from YouTube or wherever you want, and run YOLO on it.

132
00:08:44,500 --> 00:08:49,600
It's a good exercise for you to get used to using YOLO on videos, so thank you.

133
00:08:49,810 --> 00:08:56,770
And now let's move on to the next lesson, which is implementing neural style transfer with OpenCV.

134
00:08:57,460 --> 00:08:58,720
So stay tuned for that.

135
00:08:58,840 --> 00:08:59,230
Thank you.
