1
00:00:00,780 --> 00:00:02,430
Hi, welcome back to the course.

2
00:00:02,580 --> 00:00:08,760
We're going to take a look at OCR, which stands for optical character recognition, using the pytesseract

3
00:00:08,760 --> 00:00:09,270
library.

4
00:00:09,600 --> 00:00:10,750
So let's take a look.

5
00:00:10,770 --> 00:00:12,270
Open this notebook and let's take a look.

6
00:00:12,780 --> 00:00:18,990
So pytesseract is an open-source Python library (a wrapper around the Tesseract OCR engine), and Tesseract supports over 100 languages.

7
00:00:19,560 --> 00:00:26,130
It takes an image as input, detects and recognizes the text, and returns it as a string.

8
00:00:26,760 --> 00:00:29,630
This is a high-level overview of how that API works.

9
00:00:29,650 --> 00:00:36,420
The pytesseract API takes an input image and does some processing on it using Leptonica, an open

10
00:00:36,420 --> 00:00:44,400
source image processing library; then the OCR engine uses trained data to implement the recognition algorithm

11
00:00:44,400 --> 00:00:44,640
here.

12
00:00:45,270 --> 00:00:50,220
Then it goes to a post-processor, which probably cleans up the text a bit, adds spaces,

13
00:00:50,220 --> 00:00:54,930
removes errors as well, and then provides the text as the output.

14
00:00:55,500 --> 00:00:59,040
You can take a look at this blog post, which explains it in more detail if you want.

15
00:00:59,730 --> 00:01:06,690
But now let's move on to actually using the Tesseract library. To use it, we first have to install

16
00:01:06,690 --> 00:01:06,900
it.

17
00:01:07,830 --> 00:01:08,760
We install the Tesseract OCR

18
00:01:08,760 --> 00:01:14,580
library files first, and then do a pip install of pytesseract.

19
00:01:15,060 --> 00:01:16,520
So we've already done it here.

20
00:01:16,530 --> 00:01:17,970
It takes about 12 seconds to run.

21
00:01:18,480 --> 00:01:19,350
Not that long.

22
00:01:19,890 --> 00:01:21,420
So let's go ahead and do that.

23
00:01:22,140 --> 00:01:23,910
And now, here's what we do.

24
00:01:24,480 --> 00:01:31,120
We import pytesseract. Remember that previously we just imported NumPy and

25
00:01:31,140 --> 00:01:37,080
Matplotlib. Now we're importing pytesseract, just like we imported dlib, and we create the pointer

26
00:01:37,080 --> 00:01:37,950
to it here.

27
00:01:38,730 --> 00:01:41,790
This is the pytesseract.pytesseract.tesseract_cmd setting.

28
00:01:42,330 --> 00:01:46,050
So we just point to where the Tesseract library was installed.

29
00:01:46,080 --> 00:01:51,060
That's the Tesseract OCR package we installed in the first line of the code above.

30
00:01:51,480 --> 00:01:55,440
This is how we're able to use it within Python, or at least within Colab.

31
00:01:55,950 --> 00:01:58,780
On your local system,

32
00:01:58,780 --> 00:01:59,220
it depends.

33
00:01:59,220 --> 00:02:03,780
If you're working on Windows or Mac, you may not need this line of code, or the path may be different.
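A minimal sketch of that pointer step follows. The helper names are hypothetical and the default path /usr/bin/tesseract is an assumption matching a typical Colab/Ubuntu install; `pytesseract.pytesseract.tesseract_cmd` is the real setting pytesseract reads. Falling back to a PATH lookup covers systems where the binary is already on PATH.

```python
import os
import shutil


def resolve_tesseract_cmd(preferred_path=None):
    """Return a path to the tesseract binary, or None if none is found."""
    # An explicit location (e.g. "/usr/bin/tesseract" on Colab) wins if it exists
    if preferred_path and os.path.isfile(preferred_path):
        return preferred_path
    # Otherwise search PATH, which is often enough on Linux and macOS installs
    return shutil.which("tesseract")


def configure_pytesseract(preferred_path="/usr/bin/tesseract"):
    """Point pytesseract at the tesseract binary (hypothetical helper)."""
    import pytesseract  # imported lazily so resolve_tesseract_cmd works without it

    cmd = resolve_tesseract_cmd(preferred_path)
    if cmd is None:
        raise FileNotFoundError("tesseract binary not found; install tesseract-ocr first")
    # pytesseract runs whatever executable this attribute names
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Call `configure_pytesseract()` once, before the first OCR call; on Windows you would pass the location of `tesseract.exe` instead.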

34
00:02:04,470 --> 00:02:07,740
This is our imshow function, and these are our test images here.

35
00:02:08,250 --> 00:02:10,620
So let's load all our libraries.

36
00:02:11,580 --> 00:02:12,060
There we go.

37
00:02:12,540 --> 00:02:15,390
So now let's try our first OCR trial.

38
00:02:15,990 --> 00:02:23,510
The image we're testing here says "Welcome to OCR", and you can see how simple it is.

39
00:02:24,060 --> 00:02:30,710
Once it's loaded, you just take an image that you load using OpenCV here, and that's the

40
00:02:30,720 --> 00:02:31,980
img variable right here.

41
00:02:32,610 --> 00:02:38,840
You just pass that img variable to pytesseract's image_to_string function, and that's it.

42
00:02:38,850 --> 00:02:43,950
You get the output text, so you can print the text here, and there we go; you can see the pytesseract

43
00:02:43,950 --> 00:02:44,820
extracted

44
00:02:44,820 --> 00:02:48,260
text is "Welcome to OCR", and that's what was in the image.
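The load-and-recognize step described above might look like this. The function names are hypothetical; `cv2.imread`, `cv2.cvtColor` and `pytesseract.image_to_string` are the real calls, and the cleanup assumes Tesseract's usual habit of ending its output with a form-feed page separator.

```python
def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output: drop form feeds and blank lines, trim each line."""
    # Tesseract output typically ends with a form-feed page separator ("\x0c")
    lines = raw.replace("\x0c", "").splitlines()
    return "\n".join(line.strip() for line in lines if line.strip())


def ocr_image_file(path: str) -> str:
    """Load an image with OpenCV and run pytesseract's image_to_string on it."""
    import cv2  # lazy imports: only needed when actually running OCR
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    # OpenCV loads images as BGR; convert to RGB before handing it to pytesseract
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return clean_ocr_text(pytesseract.image_to_string(rgb))
```

For example, `print(ocr_image_file("welcome.png"))` (a hypothetical file name) would print the recognized text.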

45
00:02:48,270 --> 00:02:49,140
This is an image here.

46
00:02:49,320 --> 00:02:52,530
You can't actually highlight the text here, so that looks pretty good.

47
00:02:52,540 --> 00:02:53,580
Does it work

48
00:02:53,580 --> 00:02:56,310
on white text on a black background?

49
00:02:56,310 --> 00:03:00,720
We actually ran this one separately, just so we have it.

50
00:03:01,320 --> 00:03:06,030
And this is the one with white text on a black background, and it works

51
00:03:06,030 --> 00:03:09,990
as well. As you can see, Tesseract's performance is still pretty versatile.

52
00:03:10,590 --> 00:03:16,160
But what about messier backgrounds, like this one with a bunch of colors in between?

53
00:03:16,740 --> 00:03:17,490
Let's see.

54
00:03:17,490 --> 00:03:18,360
Let's see if it works.

55
00:03:19,080 --> 00:03:20,730
It didn't work, unfortunately.

56
00:03:21,210 --> 00:03:23,040
So pytesseract is only so good.

57
00:03:23,910 --> 00:03:26,490
There are more advanced libraries,

58
00:03:26,490 --> 00:03:32,940
like EasyOCR, which works quite well, but pytesseract failed on this example here.

59
00:03:33,720 --> 00:03:36,090
So let's try it on a real-world scan, though.

60
00:03:36,240 --> 00:03:41,220
Let's use this image here, which is a real-world scan from one of my textbooks.

61
00:03:43,650 --> 00:03:52,410
And you can see it worked okay-ish. It wasn't that accurate and the text looks weirdly formatted, but

62
00:03:52,470 --> 00:03:54,120
it worked to an extent.

63
00:03:54,960 --> 00:03:56,220
Now let's see something.

64
00:03:56,230 --> 00:03:58,140
Let's see what happens if we clean up this image.

65
00:03:58,500 --> 00:04:03,750
So we're going to perform some thresholding here using the image filters instead of the OpenCV

66
00:04:03,750 --> 00:04:06,970
one, because this one tends to look better for scanned text.

67
00:04:06,990 --> 00:04:11,970
You can see the output here, and let's take a look at how pytesseract performed.

68
00:04:15,060 --> 00:04:18,520
So you can immediately see that it looks a lot better.

69
00:04:18,540 --> 00:04:21,000
It's not perfect, but it did look a lot better.
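The notebook's exact filter call isn't spelled out here, so as a swapped-in stand-in, this is a minimal NumPy sketch of global Otsu thresholding, which picks the single gray level that best separates dark text from light paper. It assumes an 8-bit grayscale array; the function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the global Otsu threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w_bg = np.cumsum(hist)            # pixel count at or below each level
    w_fg = hist.sum() - w_bg          # pixel count above each level
    cum = np.cumsum(hist * levels)    # running sum of intensities
    # Guard against division by zero where one class is empty
    mean_bg = cum / np.maximum(w_bg, 1)
    mean_fg = (cum[-1] - cum) / np.maximum(w_fg, 1)
    # Otsu picks the level that maximizes the between-class variance
    between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

The binarized array can go straight to `pytesseract.image_to_string`. A single global threshold can struggle with shadows or uneven lighting, which is where local (adaptive) thresholding usually does better.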

70
00:04:21,930 --> 00:04:27,300
So thresholding helps a lot. Other processes we could try include blurring to clean up the image.

71
00:04:27,540 --> 00:04:31,200
Definitely try deskewing, because this image is quite skewed.

72
00:04:31,410 --> 00:04:33,000
It has this kind of curve in the middle here.

73
00:04:34,170 --> 00:04:39,570
You could also try dilation, erosion, opening and closing, as well as other noise

74
00:04:39,570 --> 00:04:39,990
removal.
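Deskewing is the step singled out above. One common approach, assumed here and not necessarily what the notebook does, is a projection-profile search: try a range of angles and keep the one where the row histogram of text pixels has the sharpest peaks. The ±10° range, the 0.5° step and the function name are assumptions.

```python
import numpy as np


def estimate_skew(binary: np.ndarray, max_angle: float = 10.0, step: float = 0.5) -> float:
    """Estimate text skew in degrees with a projection-profile search.

    `binary` marks text pixels as nonzero. For each candidate angle the pixel
    coordinates are rotated and histogrammed by row; level text lines give a
    profile with tall, narrow peaks, so the sum of squared bin counts peaks.
    """
    rows, cols = np.nonzero(binary)
    if rows.size == 0:
        return 0.0  # no text pixels, nothing to measure
    rows = rows.astype(np.float64)
    cols = cols.astype(np.float64)
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # Row coordinate of each text pixel after undoing a skew of `angle`
        y = rows * np.cos(theta) - cols * np.sin(theta)
        profile = np.bincount(np.round(y - y.min()).astype(np.int64))
        score = float(np.sum(profile.astype(np.float64) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

A positive result means the lines slope downward to the right in image coordinates. You would then rotate by that angle, for example with OpenCV's `getRotationMatrix2D` and `warpAffine`, checking the sign on a test image first. A single rotation will not straighten a curved page like this one, though.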

75
00:04:40,260 --> 00:04:42,810
So let's try OCR on another image.

76
00:04:43,230 --> 00:04:50,280
Let's try it on a receipt, a Woolworths receipt (Woolworths is a supermarket chain), so we load

77
00:04:50,280 --> 00:04:50,880
the receipt.

78
00:04:51,330 --> 00:04:54,150
We're going to do some preprocessing here.

79
00:04:54,570 --> 00:05:00,030
We're going to resize the image here, and then we're going to threshold it differently.

80
00:05:00,660 --> 00:05:02,760
We're using a scale factor for this here.

81
00:05:03,540 --> 00:05:05,640
And then we just get the output here.

82
00:05:05,640 --> 00:05:10,740
So we get a nicely thresholded image here, and we run it through pytesseract at the end.
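The resize-then-threshold pipeline can be sketched with OpenCV as below. This is an assumption about the notebook's steps rather than a copy of them: it uses OpenCV's adaptive threshold in place of whatever filter the notebook applies, and `scale`, `block_size` (must be odd) and `offset` are illustrative values to tune per image. The function name is hypothetical.

```python
def ocr_receipt(path: str, scale: float = 2.0, block_size: int = 31, offset: int = 10) -> str:
    """Upscale, binarize and OCR a receipt photo (a sketch; parameters are guesses)."""
    import cv2  # lazy imports so the function can be defined without them installed
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Enlarging small receipt text tends to help Tesseract; cubic keeps edges smooth
    gray = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    # A local (adaptive) threshold copes with shadows and uneven lighting on paper
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, block_size, offset
    )
    return pytesseract.image_to_string(binary)
```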

83
00:05:11,370 --> 00:05:18,150
And you can see... actually, let's not run this again; we're just looking at the saved output, and you can see it

84
00:05:18,150 --> 00:05:19,590
worked almost perfectly.

85
00:05:20,430 --> 00:05:21,390
That's quite impressive.

86
00:05:21,420 --> 00:05:29,100
So, you know, you can develop your own receipt scanner, get the data out, and run some post-processing

87
00:05:29,100 --> 00:05:29,610
on it.

88
00:05:29,640 --> 00:05:32,280
Do some data analysis using receipts.

89
00:05:32,430 --> 00:05:33,000
It's pretty cool.
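As an example of that kind of post-processing, here is a sketch that pulls item and price pairs out of receipt text. The line format (a description, whitespace, then a price with two decimals) is an assumption about a typical receipt, not the Woolworths one in the notebook; real OCR output will need more forgiving rules. The names are hypothetical.

```python
import re
from decimal import Decimal

# Assumed line shape: a description, some whitespace, then a price such as 3.49
# (an optional leading "$" is tolerated). Real receipts vary a lot.
PRICE_LINE = re.compile(r"^(?P<item>.*?\S)\s+\$?(?P<price>\d+\.\d{2})$")


def parse_receipt_lines(text):
    """Return (item, price) pairs for OCR lines that end in a price."""
    items = []
    for line in text.splitlines():
        match = PRICE_LINE.match(line.strip())
        if match:
            # Decimal avoids binary floating-point rounding on currency amounts
            items.append((match.group("item"), Decimal(match.group("price"))))
    return items
```

Filtering out summary rows such as TOTAL, or checking that the item prices add up to it, is the kind of analysis you might layer on top.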

90
00:05:34,170 --> 00:05:36,420
So let's take a look at something here.

91
00:05:36,780 --> 00:05:38,940
Let's take a look at the output for this here.

92
00:05:39,750 --> 00:05:46,080
So pytesseract's image_to_data gives this output as a dictionary of keys right here.

93
00:05:46,200 --> 00:05:47,850
So you can use these keys now.

94
00:05:48,780 --> 00:05:54,780
The reason I'm showing you this is that we can actually use this to get the exact bounding boxes in an image

95
00:05:55,140 --> 00:05:59,370
where pytesseract detected text.

96
00:06:00,030 --> 00:06:01,740
So let's run this here.

97
00:06:02,220 --> 00:06:03,480
We're looping through all the boxes

98
00:06:03,480 --> 00:06:04,230
it returns here.

99
00:06:04,830 --> 00:06:05,910
So you can see this here.

100
00:06:05,910 --> 00:06:08,740
Actually, these keys here give you level, page_num,

101
00:06:08,740 --> 00:06:09,240
block_num,

102
00:06:10,290 --> 00:06:16,140
par_num (the paragraph number), line_num, word_num, and left, top, width, height, and conf, which

103
00:06:16,470 --> 00:06:20,370
are basically the bounding box of the text and the confidence of the detection.
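Those keys come back as parallel lists, one entry per detected element. Here is a sketch of turning them into word boxes, assuming the dictionary came from `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)`. The default 60% threshold and the function name are assumptions; lowering `min_conf` keeps more boxes.

```python
def word_boxes(data: dict, min_conf: float = 60.0):
    """Return (text, left, top, width, height, conf) for confident, non-empty words."""
    boxes = []
    for i, text in enumerate(data["text"]):
        # conf may arrive as str, int or float depending on the pytesseract version;
        # page, block, paragraph and line rows carry conf -1 and empty text
        conf = float(data["conf"][i])
        if conf < min_conf or not text.strip():
            continue
        boxes.append((text, int(data["left"][i]), int(data["top"][i]),
                      int(data["width"][i]), int(data["height"][i]), conf))
    return boxes
```

Each box can then be drawn with `cv2.rectangle(img, (left, top), (left + width, top + height), (0, 255, 0), 2)`.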

104
00:06:20,850 --> 00:06:23,910
So let's plot this and take a look at something very cool.

105
00:06:24,480 --> 00:06:27,060
Oops, we didn't actually load this.

106
00:06:30,130 --> 00:06:36,550
So now we've loaded it, and you can see all of the bounding boxes where pytesseract detected

107
00:06:36,670 --> 00:06:41,170
text. You can see it also drew some boxes here, unfortunately, where there wasn't any text.

108
00:06:41,200 --> 00:06:45,400
And it leaves out Woolworths here, as well as these two W's and some numbers here and there.

109
00:06:46,030 --> 00:06:48,880
But overall, it's performing pretty well.

110
00:06:48,880 --> 00:06:53,730
If you were to set a lower confidence threshold, you could probably get more boxes.

111
00:06:54,280 --> 00:06:56,320
So that's it for pytesseract.

112
00:06:56,350 --> 00:07:03,310
So now, let's move on to using EasyOCR to do the same text detection and optical character recognition.

113
00:07:03,940 --> 00:07:08,110
EasyOCR is a much better library than pytesseract.

114
00:07:08,320 --> 00:07:13,870
However, it's a bit slow when running on the CPU, but luckily it can run on GPU systems.

115
00:07:13,960 --> 00:07:20,080
If you were to switch to a GPU by going to the settings and changing the runtime type to GPU,

116
00:07:20,080 --> 00:07:23,080
it would work much quicker, but you'd have to reload the images.

117
00:07:23,080 --> 00:07:24,190
So just remember that.

118
00:07:24,760 --> 00:07:26,830
But for now, I'm going to leave it on CPU for you guys.

119
00:07:27,460 --> 00:07:30,550
So to install EasyOCR,

120
00:07:30,580 --> 00:07:31,480
just run this.

121
00:07:31,780 --> 00:07:37,630
Click this arrow or just run this code block here, and it will download and install EasyOCR, and it

122
00:07:37,630 --> 00:07:40,500
will ask you to restart runtime.

123
00:07:40,510 --> 00:07:43,630
Actually, let's just reinstall this anyway.

124
00:07:47,430 --> 00:07:48,720
It shouldn't take that long for me.

125
00:07:49,650 --> 00:07:49,980
Oh, OK.

126
00:07:50,100 --> 00:07:55,350
It didn't ask me to restart the runtime because I already did. However, restart your runtime when the prompt

127
00:07:55,350 --> 00:07:58,950
comes up and click Yes, and that's it.

128
00:07:59,100 --> 00:08:01,330
You will have to reload your packages after that.

129
00:08:01,330 --> 00:08:04,320
So that's why we import these packages again here.

130
00:08:04,590 --> 00:08:12,090
So first, let's display our test image, and we're going to run EasyOCR to detect the text.

131
00:08:12,690 --> 00:08:13,060
Oops.

132
00:08:13,080 --> 00:08:21,270
The imshow function isn't defined, so let's copy it from here and place it up here.

133
00:08:23,430 --> 00:08:24,480
So this is.

134
00:08:26,300 --> 00:08:28,340
Well, probably the name is different.

135
00:08:29,480 --> 00:08:30,380
I'm not sure why.

136
00:08:33,390 --> 00:08:35,100
So this is the input image here.

137
00:08:35,700 --> 00:08:38,820
This is just a WhatsApp conversation with my friend Sophia.

138
00:08:39,420 --> 00:08:45,420
First, it tells you CUDA is unavailable, so it's using the CPU, and then it tells you

139
00:08:45,420 --> 00:08:53,650
it's running OCR on the text; this took around 14 seconds.

140
00:08:53,670 --> 00:09:00,240
You can see that it took quite a while. The output, however, is stored in results, and we actually don't

141
00:09:00,450 --> 00:09:01,560
display our results.

142
00:09:02,100 --> 00:09:05,250
So let's just print the results here and take a look at them.

143
00:09:05,970 --> 00:09:07,580
And you can see these are the results here.

144
00:09:07,590 --> 00:09:08,490
So what is this?

145
00:09:09,090 --> 00:09:13,500
Remember that with pytesseract we got bounding boxes,

146
00:09:13,860 --> 00:09:15,930
confidence scores, and text.

147
00:09:16,380 --> 00:09:17,990
Well, this is all of it here.

148
00:09:18,000 --> 00:09:23,370
So we're going to process this afterward and put the text boxes on the image, just like we did with the

149
00:09:23,370 --> 00:09:25,460
pytesseract output for the Woolworths scan.

150
00:09:26,520 --> 00:09:30,090
But for now, you can see this is how we run easy OCR.

151
00:09:30,210 --> 00:09:30,960
It's quite simple.

152
00:09:31,590 --> 00:09:33,600
We import the Reader class here.

153
00:09:33,960 --> 00:09:38,070
We load the image, and we just declare that we're going to look for English here.

154
00:09:38,640 --> 00:09:42,900
I have gpu set to True, but I'm not actually using a GPU, so that's going to give an error, which

155
00:09:42,900 --> 00:09:43,620
is what we saw.

156
00:09:43,620 --> 00:09:44,530
Well, it wasn't an error;

157
00:09:44,530 --> 00:09:45,120
it was a warning.

158
00:09:45,690 --> 00:09:52,380
So just set this to False or True, depending on whether you're using a GPU. Inside a list here, you can see

159
00:09:52,380 --> 00:09:58,170
we tell it to look for English-language text, and then we just have a timer here just

160
00:09:58,170 --> 00:10:00,750
to get the time it takes to process this.

161
00:10:01,440 --> 00:10:03,170
And that's not really necessary.

162
00:10:03,240 --> 00:10:05,250
So this is just for statistical purposes.

163
00:10:05,250 --> 00:10:11,460
If you want to monitor the efficiency of your code. Then we use the readtext method, which comes

164
00:10:11,850 --> 00:10:12,810
from the Reader here.

165
00:10:13,620 --> 00:10:18,640
Sorry, we import Reader here, and then this creates the reader object here.

166
00:10:18,640 --> 00:10:24,360
And then we call reader.readtext on the input image, and it returns all the output results

167
00:10:24,360 --> 00:10:25,350
that we saw here.
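Put together, the EasyOCR steps look roughly like this. The helper names are hypothetical; `easyocr.Reader(lang_list, gpu=...)`, `reader.readtext` and `torch.cuda.is_available` are real calls. Choosing `gpu` from CUDA availability is a convenience assumption, and the timer wraps only `readtext`, because building the Reader also loads the models.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def run_easyocr(image_path: str, languages=("en",), use_gpu=None):
    """Read text with EasyOCR and return (results, seconds). A sketch, not the notebook's code."""
    import easyocr  # lazy import: EasyOCR pulls in PyTorch

    if use_gpu is None:
        import torch
        use_gpu = torch.cuda.is_available()  # use the GPU only when PyTorch can see CUDA
    reader = easyocr.Reader(list(languages), gpu=use_gpu)
    # readtext returns a list of (bounding_box, text, confidence) tuples
    return timed(reader.readtext, image_path)
```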

168
00:10:26,400 --> 00:10:29,700
So let's take a look at displaying the text on the image.

169
00:10:30,120 --> 00:10:31,890
So we're going to use this function, or rather

170
00:10:31,890 --> 00:10:37,470
this block of code here, which basically goes through the results, gets the bounding box, text, and probability

171
00:10:37,470 --> 00:10:42,840
scores, and extracts the bounding box coordinates.

172
00:10:43,350 --> 00:10:48,510
Right here you can see this is top-left, top-right, bottom-right, bottom-left, and then we

173
00:10:48,510 --> 00:10:55,050
just clean up the text using this little line of code here, and we just put the bounding boxes

174
00:10:55,050 --> 00:10:56,580
and the text onto the image.
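The box-and-label loop just described can be sketched as follows. This mirrors the described flow but is not the notebook's exact code; the helper names and the 0.5 probability cutoff are assumptions. Using only the top-left and bottom-right corners gives an axis-aligned box, which is approximate for rotated text.

```python
def to_rectangle(bbox):
    """Convert EasyOCR's four corner points into integer (top_left, bottom_right).

    Corners come ordered top-left, top-right, bottom-right, bottom-left and may
    be floats, while OpenCV drawing functions want integer pixel coordinates.
    """
    tl, _tr, br, _bl = bbox
    return (int(tl[0]), int(tl[1])), (int(br[0]), int(br[1]))


def cleanup_text(text: str) -> str:
    """Drop non-ASCII characters, since OpenCV's Hershey fonts only cover ASCII."""
    return "".join(ch for ch in text if ord(ch) < 128).strip()


def annotate(image, results, min_prob=0.5):
    """Draw boxes and labels for EasyOCR results (hypothetical helper)."""
    import cv2

    out = image.copy()
    for bbox, text, prob in results:
        if prob < min_prob:
            continue  # skip low-confidence detections
        top_left, bottom_right = to_rectangle(bbox)
        cv2.rectangle(out, top_left, bottom_right, (0, 255, 0), 2)
        label_pos = (top_left[0], max(top_left[1] - 10, 0))  # keep label on canvas
        cv2.putText(out, cleanup_text(text), label_pos,
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    return out
```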

175
00:10:56,640 --> 00:10:57,690
So let's do that now.

176
00:10:59,720 --> 00:11:02,820
So again, just an image.

177
00:11:05,550 --> 00:11:06,010
There we go.

178
00:11:06,030 --> 00:11:10,890
So you can see this worked almost perfectly; it pretty much got all the text.

179
00:11:10,890 --> 00:11:14,700
I mean, this is text from a screenshot, so it should work fairly well.

180
00:11:15,210 --> 00:11:17,700
But it's actually pretty good.

181
00:11:17,910 --> 00:11:20,490
So that's it for easy OCR.

182
00:11:20,550 --> 00:11:20,880
Now

183
00:11:20,880 --> 00:11:26,790
let's take a look at how EasyOCR compares to pytesseract on the same Woolworths image.

184
00:11:27,210 --> 00:11:31,320
So this block of code takes a little while to run because it's a lot of text.

185
00:11:31,770 --> 00:11:38,100
So I won't run it again in this video, but you can see the output here.

186
00:11:38,730 --> 00:11:40,380
It is far better than pytesseract.

187
00:11:40,390 --> 00:11:45,480
It got all the text, and it even found some little hidden text in the background here, but we don't really

188
00:11:45,480 --> 00:11:46,110
want this.

189
00:11:46,970 --> 00:11:48,150
It's probably gibberish.

190
00:11:48,160 --> 00:11:54,120
So just ignore that for now, but you can see it got almost everything here, and it looks like it got

191
00:11:54,120 --> 00:12:01,140
all the text correct. It's a bit hard to see unless we zoom in, but it looks like everything seems to

192
00:12:01,140 --> 00:12:03,450
be perfectly correct.

193
00:12:03,940 --> 00:12:09,180
Well, it was essentially correct on this image, so this is quite good performance here.

194
00:12:09,990 --> 00:12:11,730
I'm fairly impressed with this.
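Judging accuracy by eye gets hard on a dense image. One way to make the comparison objective, assuming you have a hand-typed ground-truth transcript (not part of the notebook), is character error rate: the edit distance between the OCR output and the reference, divided by the reference length. The function names are hypothetical, and whitespace is collapsed first because the two libraries break lines differently.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length; 0.0 is a perfect transcript."""
    # Collapse whitespace so differing line breaks between libraries don't count
    ref = " ".join(reference.split())
    hyp = " ".join(hypothesis.split())
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

Running both libraries' output through `character_error_rate` against the same reference gives two numbers you can compare directly.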

195
00:12:12,330 --> 00:12:19,470
So if you were to build any OCR applications, I would encourage you to use EasyOCR as opposed to

196
00:12:19,470 --> 00:12:20,250
pytesseract.

197
00:12:20,310 --> 00:12:26,250
However, if speed is a concern and you don't have a GPU available, pytesseract might be a better solution.

198
00:12:26,940 --> 00:12:28,320
So that's it for this lesson.

199
00:12:28,320 --> 00:12:29,070
In the next lesson,

200
00:12:29,490 --> 00:12:31,980
we're going to take a look at barcode generation and reading.

201
00:12:32,430 --> 00:12:34,260
This is actually barcodes and QR codes.

202
00:12:34,530 --> 00:12:37,920
So let me just rename this appropriately.

203
00:12:39,570 --> 00:12:43,110
So stay tuned and I'll see you in the next section.

204
00:12:43,260 --> 00:12:43,830
Thank you.
