1
00:00:00,600 --> 00:00:04,230
Now, let's start our deep segmentation lesson.

2
00:00:04,680 --> 00:00:11,390
And in this lesson, we will be using Keras to load a U-Net and a SegNet model. We'll train them

3
00:00:11,400 --> 00:00:14,460
on one of the datasets, and we'll generate some predictions.

4
00:00:14,700 --> 00:00:16,170
So now let's get started.

5
00:00:16,290 --> 00:00:20,310
So open notebook 53 and we'll begin the lesson.

6
00:00:21,240 --> 00:00:27,240
So in this lesson, what we're going to do, as I said before, is that we're going to load U-Net and

7
00:00:27,240 --> 00:00:32,580
SegNet and train them on this dataset, and then we'll run some predictions.

8
00:00:32,730 --> 00:00:35,370
So to do that, we have to clone this repo here.

9
00:00:35,940 --> 00:00:41,730
So initially, this is the original repo, but I had to make some slight changes for it to work.

10
00:00:41,730 --> 00:00:43,690
So it is cloned from my repo here.

11
00:00:43,690 --> 00:00:47,790
I just left it to credit the original author of this work.

12
00:00:48,360 --> 00:00:53,220
So I've already run this notebook just to make sure everything was working fine before the lesson started.

13
00:00:53,880 --> 00:00:56,100
It takes about one second to execute.

14
00:00:56,880 --> 00:01:03,090
Next, we need to install some of the packages in that repo, so we just cd into the directory

15
00:01:03,090 --> 00:01:03,870
where it's installed.

16
00:01:04,470 --> 00:01:05,310
You can see it here.

17
00:01:05,490 --> 00:01:06,630
Image segmentation.

18
00:01:08,520 --> 00:01:09,720
And then just run.

19
00:01:09,720 --> 00:01:16,110
python setup.py install, so that installs all of the packages we need.

20
00:01:16,110 --> 00:01:16,560
It takes.

21
00:01:16,890 --> 00:01:18,990
It's pretty quick, takes about four or five seconds.

22
00:01:19,740 --> 00:01:21,660
Next, we need to download the dataset.

23
00:01:22,080 --> 00:01:23,040
So the dataset.

24
00:01:23,040 --> 00:01:25,410
I'll explore this dataset with you shortly.

25
00:01:25,440 --> 00:01:30,720
It's basically like a street view type dataset where you're looking at traffic pictures here.

26
00:01:30,720 --> 00:01:32,250
So let's take a look at it here.

27
00:01:33,290 --> 00:01:38,580
Let's take a look at some of the test images, since these will load faster because there are fewer images

28
00:01:38,580 --> 00:01:41,580
in the directory and all that's widened.

29
00:01:41,760 --> 00:01:48,150
So you can see these are all taken basically from a dashcam or a camera mounted on top of a car.

30
00:01:49,740 --> 00:01:55,800
And then it's annotated afterward, which we can explore the annotations so you can see there's a directory

31
00:01:55,800 --> 00:01:58,200
for images and there's a directory for annotations.

32
00:01:58,860 --> 00:02:04,830
However, these annotations are encoded in a way where it's not easy to visualize, even though it's

33
00:02:04,830 --> 00:02:05,730
a PNG file.

34
00:02:07,080 --> 00:02:08,710
You can see it looks like a black screen.

35
00:02:08,730 --> 00:02:14,920
However, if you look closely, you can just about see the outline of some people here, some objects.

36
00:02:14,920 --> 00:02:15,570
So it is.

37
00:02:16,050 --> 00:02:16,850
It is labeled.

38
00:02:16,860 --> 00:02:20,710
However, it's probably encoded as class values like 0, 1, 2, 3, 4.

39
00:02:20,730 --> 00:02:23,610
So when you represent that as a color, it's almost all black.

40
00:02:24,150 --> 00:02:30,990
Unfortunately, there are quite a few other formats for labeling segmentation data.

41
00:02:31,020 --> 00:02:36,600
What I would encourage you to do if you want to see or if you want to label your own data is just go

42
00:02:36,600 --> 00:02:37,140
to Roboflow.

43
00:02:37,140 --> 00:02:44,610
Roboflow has amazing tools that are free to use for small image datasets, and you can

44
00:02:44,850 --> 00:02:52,650
quickly label and generate your own dataset for segmentation in almost all of the popular formats they

45
00:02:52,650 --> 00:02:53,100
offer.

46
00:02:53,220 --> 00:02:59,580
So it's quite good, quite a useful tool, and it improves your workflow tremendously as a computer

47
00:02:59,580 --> 00:03:01,110
vision practitioner.

48
00:03:02,310 --> 00:03:04,320
So next, we have to initialize the model.

49
00:03:04,320 --> 00:03:08,070
Basically, this downloads the pre-trained model for U-Net.

50
00:03:08,160 --> 00:03:13,620
So we're getting the VGG U-Net model here. That takes about 10 seconds to get.

51
00:03:14,520 --> 00:03:20,490
Then we start training this model. You can see we just have to point it to

52
00:03:20,490 --> 00:03:24,180
the training images directory here and the training annotations directory.

53
00:03:24,660 --> 00:03:27,510
This is a Keras format and it automatically loads it.

54
00:03:27,720 --> 00:03:34,260
We also store some checkpoints here in this temporary directory, and we're only going to do five epochs

55
00:03:34,410 --> 00:03:37,050
in this, just to illustrate how it works.

56
00:03:37,530 --> 00:03:41,070
But we actually do get some pretty good results with just five epochs, to be fair.

57
00:03:41,640 --> 00:03:48,120
You can see the accuracy starts at zero point seven six and then progresses all the way up to zero point nine

58
00:03:48,120 --> 00:03:49,620
one, which is actually pretty good.

59
00:03:50,130 --> 00:03:52,650
Now, what is accuracy measuring?

60
00:03:53,310 --> 00:03:58,800
Remember, segmentation is per-pixel class prediction: every pixel is assigned a class.

61
00:03:59,310 --> 00:04:06,270
So this 91 percent might look quite good, but in reality, it can still look quite messy because

62
00:04:06,270 --> 00:04:08,700
about 10 percent of the pixels are being misclassified.

63
00:04:09,150 --> 00:04:11,400
Either way, you can take a look and see how it looks.

64
00:04:11,550 --> 00:04:14,790
So we're going to predict this image here.

65
00:04:15,450 --> 00:04:17,190
This is the directory and the name of the image.

66
00:04:17,190 --> 00:04:23,190
And then we're going to generate the output here, in the temp directory, actually. Let's see

67
00:04:23,190 --> 00:04:24,030
where it is.

68
00:04:25,560 --> 00:04:26,730
It's actually really not in this.

69
00:04:26,770 --> 00:04:28,500
It's probably the actual temp directory.

70
00:04:28,500 --> 00:04:30,480
So I'm not going to navigate to it anyway.

71
00:04:31,410 --> 00:04:32,670
So let's run.

72
00:04:32,670 --> 00:04:33,660
I've run this already.

73
00:04:33,930 --> 00:04:37,380
Now let's import matplotlib and take a look at the output.

74
00:04:37,590 --> 00:04:40,020
So this is pretty good.

75
00:04:40,230 --> 00:04:46,740
You can see some people. A very common problem in segmentation models is that, as you can see,

76
00:04:46,740 --> 00:04:47,790
well, this is a person.

77
00:04:47,850 --> 00:04:51,690
there are sometimes other classes that it predicts around that.

78
00:04:51,690 --> 00:04:58,110
So there are quite a few models now, like Detectron's Mask R-CNN, that work quite well.

79
00:04:58,860 --> 00:04:59,770
However, the U-Net.

80
00:04:59,970 --> 00:05:06,090
U-Net does work fairly well, though, in my opinion, and to a lesser extent SegNet. U-Net is better.

81
00:05:06,750 --> 00:05:11,220
But either way, you can see it's pretty good. It got the road and the cars pretty well.

82
00:05:11,730 --> 00:05:16,980
These are building classes, but there's this weird patch that's probably something in the building that it thinks

83
00:05:16,980 --> 00:05:19,770
is, I'm not even sure, maybe a tree or something.

84
00:05:20,880 --> 00:05:28,860
Next, we can see this is a different version, a different color mapping of it, just to illustrate

85
00:05:28,860 --> 00:05:29,610
what's going on.

86
00:05:30,540 --> 00:05:37,230
Now we can display it with a legend so you can actually see what the classes are: sky, building,

87
00:05:37,230 --> 00:05:42,690
pole, road, pavement, tree, sign symbol, fence, car, pedestrian and bicyclist.

88
00:05:42,780 --> 00:05:47,580
And you can see it with the classes labeled here overlaid onto the original image.

89
00:05:47,730 --> 00:05:53,880
So this is quite a good way to visualize what you're looking at, and you can explore and see where

90
00:05:53,880 --> 00:05:56,530
your model is having issues.

91
00:05:56,910 --> 00:06:01,320
So you can see road is correct, pavement correct, building mostly correct.

92
00:06:01,330 --> 00:06:05,820
However, there's this little patch, and the traffic light, actually, this is correct: the sign symbol.

93
00:06:06,690 --> 00:06:11,640
This is probably a mistake, something like a store shop sign. We have the original image;

94
00:06:11,640 --> 00:06:14,790
I can display it afterwards, but we can see it a bit here.

95
00:06:16,080 --> 00:06:21,900
So, yeah, pedestrians and bicyclists. It got a piece of the bicycle.

96
00:06:22,170 --> 00:06:23,400
That's a bit unfortunate.

97
00:06:23,910 --> 00:06:28,800
And this guy tends to merge into a car, maybe because of the color of his jacket.

98
00:06:29,580 --> 00:06:32,940
Either way, now let's load and train a SegNet model.

99
00:06:33,630 --> 00:06:35,820
So the SegNet model trains a bit quicker.

100
00:06:35,830 --> 00:06:42,330
You can see it takes about 57 or 55 seconds per epoch, whereas

101
00:06:42,330 --> 00:06:47,640
the U-Net takes about 64 or 65, so slightly faster.

102
00:06:47,970 --> 00:06:48,750
But you can see it.

103
00:06:48,750 --> 00:06:51,720
Definitely accuracy is not as good after five epochs.

104
00:06:52,560 --> 00:06:58,170
However, I haven't experimented much with these two models on this dataset, so perhaps they are.

105
00:06:58,350 --> 00:07:02,370
If you train the SegNet for longer, maybe you can get better results.

106
00:07:02,400 --> 00:07:02,880
Who knows?

107
00:07:03,360 --> 00:07:06,240
It's kind of doubtful, though, because U-Net is generally better.

108
00:07:07,020 --> 00:07:09,090
But now we can take a look at the U-Net.

109
00:07:09,140 --> 00:07:13,440
I'm sorry, the SegNet predictions, and well, it looks a lot cleaner.

110
00:07:14,130 --> 00:07:18,450
It definitely makes more mistakes, because you can see the road merges into the sidewalk.

111
00:07:19,080 --> 00:07:21,820
You can see no idea what this is.

112
00:07:21,840 --> 00:07:24,120
It's probably a building class mixed with other things.

113
00:07:24,750 --> 00:07:30,210
So you can see it's a lot more cartoony, but it's a lot cleaner, less noise in this one.

114
00:07:30,930 --> 00:07:37,370
So that concludes this lesson on using Keras to load and train image...

115
00:07:38,460 --> 00:07:46,470
I mean to say, load and train U-Net and SegNet. In the next section, we'll take a look at loading

116
00:07:46,470 --> 00:07:48,720
DeepLabv3 in PyTorch.

117
00:07:48,930 --> 00:07:50,340
So stay tuned for that lesson.

118
00:07:50,520 --> 00:07:50,910
Thank you.