WEBVTT

0
00:00.180 --> 00:02.460
Now that we've seen how we can get started

1
00:02.460 --> 00:06.510
using Selenium WebDriver to automate our browser,

2
00:06.510 --> 00:10.380
the next step is to figure out how to use Selenium to find

3
00:10.380 --> 00:14.973
and locate specific HTML elements on the webpage.

4
00:16.170 --> 00:19.170
Just between where we quit our driver

5
00:19.170 --> 00:22.350
and where we get hold of a particular page,

6
00:22.350 --> 00:24.840
I'm going to show you how easy it is to complete

7
00:24.840 --> 00:29.840
the previous task in day 47 where we tried to get hold

8
00:30.300 --> 00:33.843
of the price of a particular item on amazon.com.

9
00:34.950 --> 00:37.050
This was the item that we were interested in.

10
00:37.050 --> 00:40.470
This was the Instant Pot, and this is the price.

11
00:40.470 --> 00:43.200
So if I go ahead and take this URL

12
00:43.200 --> 00:48.060
and replace the previous plain amazon.com URL

13
00:48.060 --> 00:51.090
with the URL that takes us to this page,

14
00:51.090 --> 00:55.590
then we can start inspecting this particular element.

15
00:55.590 --> 00:58.590
Now, if we take a look at the HTML,

16
00:58.590 --> 01:01.830
we can see that we've got the price showing.

17
01:01.830 --> 01:03.690
And at the time of recording,

18
01:03.690 --> 01:07.200
the price is split into two parts.

19
01:07.200 --> 01:11.160
So you've got a class called, "a-price-whole"

20
01:11.160 --> 01:13.230
which is the number of dollars

21
01:13.230 --> 01:18.230
and a class called, "a-price-fraction", which is the cents.

22
01:18.840 --> 01:23.070
So Amazon can obviously change their website at any time.

23
01:23.070 --> 01:24.690
Previously, for example, the price

24
01:24.690 --> 01:29.690
used to be inside an element with the id, "priceblock_ourprice".

25
01:30.150 --> 01:33.840
So you have to look inside which element the price is in

26
01:33.840 --> 01:36.690
by right-clicking, Inspect element

27
01:36.690 --> 01:40.804
and then checking to see what the ID is called

28
01:40.804 --> 01:44.130
because Amazon can change this at any time.

29
01:44.130 --> 01:45.660
Now, once you've noted down

30
01:45.660 --> 01:49.500
whether you want a particular class or a particular ID,

31
01:49.500 --> 01:53.730
take a look at the Selenium Locator strategies.

32
01:53.730 --> 01:56.370
You can see here in the Selenium Docs,

33
01:56.370 --> 01:59.820
there are many ways that we can find different elements

34
01:59.820 --> 02:02.160
on a page using Selenium.

35
02:02.160 --> 02:04.110
You can see that Selenium provides support

36
02:04.110 --> 02:07.110
for these eight traditional location strategies.

37
02:07.110 --> 02:09.390
In our case, we're looking for the price element

38
02:09.390 --> 02:14.250
on Amazon's page by selecting on a particular class name.

39
02:14.250 --> 02:15.900
Now the Selenium docs are pretty good

40
02:15.900 --> 02:18.120
and include handy little code snippets

41
02:18.120 --> 02:19.740
for everything you want to do.

42
02:19.740 --> 02:22.770
So for example here, you can see how to find an element

43
02:22.770 --> 02:27.300
by class name, and that's exactly what we need to do.

44
02:27.300 --> 02:31.230
And you can do that by using the find_element method.

45
02:31.230 --> 02:34.080
Depending on whether you're looking to find an element

46
02:34.080 --> 02:39.080
by class name or by ID, we need to use a class called, "By".

47
02:39.420 --> 02:43.800
And it has helpers such as By.CLASS_NAME.

48
02:43.800 --> 02:45.990
And then all we need to do is just to specify

49
02:45.990 --> 02:49.500
the name of the class that we're interested in finding.

50
02:49.500 --> 02:54.500
So the first thing I'll do is to import the By class

51
02:54.630 --> 02:59.630
and that's just from selenium.webdriver.common.by.

52
03:00.720 --> 03:02.253
And then import By.

53
03:04.110 --> 03:07.770
And then we can create a variable called price_dollar

54
03:07.770 --> 03:10.140
to hold onto the dollar price element

55
03:10.140 --> 03:11.553
from the Amazon page.

56
03:16.260 --> 03:19.200
For the value, we'll grab the class name of the span

57
03:19.200 --> 03:20.790
from the Amazon page.

58
03:20.790 --> 03:23.940
So the class is called, "a-price-whole"

59
03:23.940 --> 03:26.880
and we're going to paste it into our code.

60
03:26.880 --> 03:29.610
And it's probably easier just so you don't make any typos

61
03:29.610 --> 03:31.440
to just copy the name of that class

62
03:31.440 --> 03:33.033
and paste it into our code.

63
03:33.900 --> 03:35.340
Next, let's do the same thing

64
03:35.340 --> 03:38.133
for the second part of the price, the cents on the dollar.

65
03:38.133 --> 03:41.850
So we'll create a variable called, price_cents,

66
03:41.850 --> 03:46.470
and we'll use driver.find_element(By.CLASS_NAME...

67
03:46.470 --> 03:49.200
and then set the class name value to equal

68
03:49.200 --> 03:51.723
to "a-price-fraction".

69
03:53.220 --> 03:54.840
Again, these things can change.

70
03:54.840 --> 03:56.963
At the time of recording, it's called a-price-fraction

71
03:56.963 --> 04:01.020
but be sure to check what the live situation is

72
04:01.020 --> 04:02.823
when you are doing this exercise.

73
04:03.750 --> 04:07.620
Now let's print out the full price as an F-string.

74
04:07.620 --> 04:11.520
We'll write print(f... and then ...The price is...

75
04:11.520 --> 04:13.413
and we'll insert the ...{price_dollar}...

76
04:16.920 --> 04:19.353
dot, and then the ...{price_cents}.

77
04:22.230 --> 04:24.060
Now what's going to happen here

78
04:24.060 --> 04:28.320
if you try to run this code is we've actually found

79
04:28.320 --> 04:31.560
the element by class name, but right now,

80
04:31.560 --> 04:35.280
price_dollar and price_cents are actually HTML elements.

81
04:35.280 --> 04:39.660
So if we want to have the text that's inside those elements,

82
04:39.660 --> 04:42.270
we have to access the text content.

83
04:42.270 --> 04:44.580
And you can see here in the Selenium Docs,

84
04:44.580 --> 04:46.620
to get the text content,

85
04:46.620 --> 04:49.860
all we need to do is just write .text

86
04:49.860 --> 04:52.140
after we found the element.

87
04:52.140 --> 04:56.820
So we can add .text to the price_dollar and price_cents.

88
04:56.820 --> 04:59.280
So now we can actually get hold

89
04:59.280 --> 05:03.060
of the content that's inside those HTML elements.
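
NOTE
A minimal sketch of the price lookup described above, assuming Selenium 4
with Chrome and that the class names a-price-whole and a-price-fraction are
still live on the page. The function names and PRODUCT_URL are illustrative
placeholders, and the whitespace and trailing-dot cleanup is a defensive
assumption rather than something shown in the lesson.
```python
def format_price(dollars: str, cents: str) -> str:
    """Join the two scraped text fragments into one price string."""
    # Assumption: tolerate stray whitespace or a trailing "." in the dollars text.
    return f"{dollars.strip().rstrip('.')}.{cents.strip()}"
def fetch_price(url: str) -> str:
    """Open url in Chrome and read the price from the two class names."""
    # Imported here so format_price works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        price_dollar = driver.find_element(By.CLASS_NAME, "a-price-whole")
        price_cents = driver.find_element(By.CLASS_NAME, "a-price-fraction")
        # .text turns each located element into the string it displays.
        return format_price(price_dollar.text, price_cents.text)
    finally:
        # Always close the browser so Chrome instances do not pile up.
        driver.quit()
# Usage, with PRODUCT_URL standing in for the product page address:
# print(f"The price is {fetch_price(PRODUCT_URL)}")
```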

90
05:03.060 --> 05:06.960
Now all we need to do is uncomment our driver.quit(),

91
05:06.960 --> 05:08.880
because otherwise, we're going to have a new instance

92
05:08.880 --> 05:11.380
of Chrome running every time I hit the Run button.

93
05:12.360 --> 05:16.980
And now if I hit Run, you'll see our browser open up briefly

94
05:16.980 --> 05:19.719
to the page with the Instant Pot, and then it'll close down

95
05:19.719 --> 05:23.520
and you can see the price printed right here.

96
05:23.520 --> 05:27.840
Now the reason why this is so much shorter than what you did

97
05:27.840 --> 05:31.950
in the last project is because we're driving a browser.

98
05:31.950 --> 05:35.310
So the browser is already sending all of those headers,

99
05:35.310 --> 05:38.550
all of the information that Amazon would expect

100
05:38.550 --> 05:43.500
from an actual user, instead of using a GET request,

101
05:43.500 --> 05:45.750
using the requests package.

102
05:45.750 --> 05:47.550
In only three lines,

103
05:47.550 --> 05:51.570
we can do so much more than we could do with Beautiful Soup.

104
05:51.570 --> 05:53.190
Essentially, if you think about it

105
05:53.190 --> 05:56.730
Beautiful Soup was really good for just grabbing

106
05:56.730 --> 06:00.720
and scraping pieces of data from an HTML website

107
06:00.720 --> 06:03.567
but it gets stuck when that website is being rendered

108
06:03.567 --> 06:07.200
using JavaScript or Angular or React

109
06:07.200 --> 06:12.200
and the content that was HTML was taking time to load

110
06:13.680 --> 06:18.120
or it requires certain conditions to load.

111
06:18.120 --> 06:19.260
Whereas in this case,

112
06:19.260 --> 06:22.470
we're effectively doing exactly the same thing

113
06:22.470 --> 06:24.570
as we would as a human,

114
06:24.570 --> 06:26.580
going to this particular URL

115
06:26.580 --> 06:30.430
and then looking at this element and its value.

116
06:31.740 --> 06:33.120
I wanted to show you a couple

117
06:33.120 --> 06:36.210
of other ways that you could find elements.

118
06:36.210 --> 06:38.550
You can see there's a whole bunch of ways

119
06:38.550 --> 06:41.970
of finding elements and there's a lot more methods

120
06:41.970 --> 06:44.790
than you have with Beautiful Soup.

121
06:44.790 --> 06:47.760
One that I want to show you that's quite useful

122
06:47.760 --> 06:50.580
is find element by NAME.

123
06:50.580 --> 06:54.000
If we go to python.org, you'll remember

124
06:54.000 --> 06:56.850
from your lessons on HTML and CSS

125
06:56.850 --> 06:59.880
that most forms, these input forms

126
06:59.880 --> 07:02.850
will have a name attribute.

127
07:02.850 --> 07:06.180
So if I go ahead and inspect on this search bar,

128
07:06.180 --> 07:08.617
you can see it's got an id, it's got a name,

129
07:08.617 --> 07:10.987
it's got a type, it's got a role, it's got a class,

130
07:10.987 --> 07:13.620
it's got a whole bunch of attributes.

131
07:13.620 --> 07:17.493
Now I'm going to search for it by this name, which is "q".

132
07:19.410 --> 07:23.790
So all I need to do here is call the find_element() method.

133
07:23.790 --> 07:26.220
And for our By helper class,

134
07:26.220 --> 07:29.310
I specify that I want to search by NAME,

135
07:29.310 --> 07:31.560
so the name of the element

136
07:31.560 --> 07:35.943
and then I provide the name of the search bar, which is "q".

137
07:38.160 --> 07:41.010
Now of course, I'll have to change this URL

138
07:41.010 --> 07:43.743
to actually python.org.

139
07:45.060 --> 07:49.320
And now, once we've found our element by name,

140
07:49.320 --> 07:51.273
we'll call it the search_bar.

141
07:54.240 --> 07:58.053
And then we'll go ahead and print out this search_bar.

142
08:00.600 --> 08:02.010
And then we'll comment out our code

143
08:02.010 --> 08:04.653
that we used for Amazon and then hit Run.

144
08:06.330 --> 08:09.270
When Selenium locates a particular element,

145
08:09.270 --> 08:12.660
it won't actually print out the actual HTML,

146
08:12.660 --> 08:17.370
it'll give it to you as a Selenium element like this.

147
08:17.370 --> 08:18.420
Now if you want to tap

148
08:18.420 --> 08:23.420
into its various attributes or text or tag name,

149
08:23.520 --> 08:26.790
then you'll actually have to do that using a dot.

150
08:26.790 --> 08:29.310
So we can say search_bar.tag_name

151
08:29.310 --> 08:34.310
and it will tell us that it is an input tag.

152
08:34.350 --> 08:36.520
Or we could say search_bar.get_attribute

153
08:37.830 --> 08:41.790
and we can specify the attribute that we want the value for.

154
08:41.790 --> 08:42.690
So for example,

155
08:42.690 --> 08:45.630
if I wanted to know what the placeholder value was,

156
08:45.630 --> 08:49.320
then all I have to do is put get_attribute("placeholder")

157
08:49.320 --> 08:51.240
and then we can print that out

158
08:51.240 --> 08:55.110
and it will tell us that the placeholder says, Search.

159
08:55.110 --> 08:57.990
So that is something that you'll find people commonly use

160
08:57.990 --> 09:00.450
with Selenium, to find element by name.

161
09:00.450 --> 09:04.290
And it's really useful when it comes to filling in web forms

162
09:04.290 --> 09:08.010
because most forms will have elements that are organized

163
09:08.010 --> 09:10.920
by name because when the form is submitted,

164
09:10.920 --> 09:15.423
that name is carried along with the value of the inputs.
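
NOTE
A rough sketch of the lookup by name, assuming Selenium 4 with Chrome and
that the python.org search bar still has name="q". The helper names
describe_element and describe_search_bar are made up for illustration.
```python
def describe_element(element) -> dict:
    """Collect the tag name and placeholder of an already located element."""
    return {
        "tag_name": element.tag_name,
        "placeholder": element.get_attribute("placeholder"),
    }
def describe_search_bar() -> dict:
    """Locate the python.org search bar by its name attribute."""
    # Imported here so describe_element works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.python.org")
        search_bar = driver.find_element(By.NAME, "q")
        return describe_element(search_bar)
    finally:
        driver.quit()
# Usage:
# print(describe_search_bar())
```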

165
09:16.800 --> 09:19.980
In addition to find element by name and class name,

166
09:19.980 --> 09:22.620
you can of course find an element by id.

167
09:22.620 --> 09:24.630
So for example, the submit button

168
09:24.630 --> 09:28.421
next to the search bar has the id of submit.

169
09:28.421 --> 09:31.500
And once again, we need to use the find element method

170
09:31.500 --> 09:32.850
but this time, we specify

171
09:32.850 --> 09:35.913
that we want to find the element by ID.

172
09:36.810 --> 09:40.653
And we can add in the value of the ID, which is submit.

173
09:41.970 --> 09:43.530
So the code looks like this,

174
09:43.530 --> 09:47.880
button = driver.find_element(By.ID...

175
09:47.880 --> 09:52.560
and then we insert the value of that ID, which is "submit".

176
09:52.560 --> 09:54.960
And similar to before, once we've found the element,

177
09:54.960 --> 09:57.090
we can get hold of various properties.

178
09:57.090 --> 10:00.360
For example, we can even print out the size of the element.

179
10:00.360 --> 10:01.530
Let's see what it is.

180
10:01.530 --> 10:05.010
Let's go ahead and print the button element that we found,

181
10:05.010 --> 10:08.280
and then we use .size to see what's the size.

182
10:08.280 --> 10:09.870
And if we print it out, we can see

183
10:09.870 --> 10:13.950
that it has a height of 40 and a width of 46.
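
NOTE
A small sketch of the lookup by ID, assuming Selenium 4 with Chrome and
that the button next to the search bar still has the id "submit". The
.size property returns a dictionary with "height" and "width" keys;
format_size and submit_button_size are illustrative names.
```python
def format_size(size: dict) -> str:
    """Render the dictionary from element.size as 'width x height'."""
    return f"{size['width']} x {size['height']}"
def submit_button_size() -> dict:
    """Locate the submit button by ID and return its size dictionary."""
    # Imported here so format_size works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.python.org")
        button = driver.find_element(By.ID, "submit")
        return button.size
    finally:
        driver.quit()
# Usage:
# print(format_size(submit_button_size()))
```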

184
10:13.950 --> 10:16.740
So Selenium is really powerful

185
10:16.740 --> 10:20.940
and the ability to use their helper methods to find element

186
10:20.940 --> 10:25.200
by ID, name, or class name, allows us to pretty much reach

187
10:25.200 --> 10:29.130
into any website and get hold of any element that we want.

188
10:29.130 --> 10:30.990
And then by using these properties

189
10:30.990 --> 10:34.770
such as .size or .text, we can access whichever parts

190
10:34.770 --> 10:36.543
of the element we're interested in.

191
10:38.670 --> 10:41.160
So you can see that we can do a lot more

192
10:41.160 --> 10:44.043
with Selenium than we could with Beautiful Soup.

193
10:45.600 --> 10:50.013
So that's find_element by ID, by name, by class name,

194
10:50.910 --> 10:55.910
but there's also of course, find_element by CSS selector.

195
10:56.190 --> 10:58.620
And this is probably one of the easiest ways

196
10:58.620 --> 11:01.353
of narrowing down on a particular element.

197
11:02.520 --> 11:04.050
Let's say that I wanted to get hold

198
11:04.050 --> 11:06.390
of this link to the Docs.

199
11:06.390 --> 11:07.950
Let's go ahead and inspect it

200
11:07.950 --> 11:10.410
and it lives inside an anchor tag.

201
11:10.410 --> 11:12.900
But you can see that this anchor tag has no class,

202
11:12.900 --> 11:17.370
it has no ID, has no easy way of identification.

203
11:17.370 --> 11:19.200
So how do we get hold of it?

204
11:19.200 --> 11:22.860
Well, we can see that it lives in a paragraph element.

205
11:22.860 --> 11:24.090
And then going further up,

206
11:24.090 --> 11:27.930
it's inside a div with this particular class.

207
11:27.930 --> 11:31.680
So this has a class of "documentation-widget".

208
11:31.680 --> 11:33.270
So that's pretty specific.

209
11:33.270 --> 11:34.650
So if we wanted to get hold

210
11:34.650 --> 11:36.900
of that anchor tag, we could just say,

211
11:36.900 --> 11:41.010
well, inside this div of "documentation-widget",

212
11:41.010 --> 11:43.593
let's go ahead and find any anchor tags.

213
11:44.460 --> 11:47.280
And to express that as a CSS selector

214
11:47.280 --> 11:50.460
we would specify the class, documentation-widget.

215
11:50.460 --> 11:52.830
And then inside the element with that class,

216
11:52.830 --> 11:55.110
we're looking for an anchor tag.

217
11:55.110 --> 11:59.013
So this would get us our documentation_link.

218
12:00.510 --> 12:01.890
And now if I go ahead

219
12:01.890 --> 12:06.750
and print the documentation_link .text, you can see

220
12:06.750 --> 12:11.750
that it's actually getting hold of this particular link.

221
12:12.480 --> 12:14.760
And this is done even though the actual link

222
12:14.760 --> 12:19.760
doesn't have any easily identifiable name or class or ID.
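
NOTE
A sketch of the CSS selector approach, assuming Selenium 4 with Chrome and
that the div on python.org still carries the class documentation-widget.
A leading dot selects by class, and a space means "any descendant", so the
selector reads as "any anchor tag inside that div". The helper names are
made up for illustration.
```python
def descendant_selector(class_name: str, tag: str) -> str:
    """Build a CSS selector for `tag` elements inside a given class."""
    return f".{class_name} {tag}"
DOCS_LINK_SELECTOR = descendant_selector("documentation-widget", "a")
def documentation_link_text() -> str:
    """Return the text of the first anchor inside the documentation widget."""
    # Imported here so the selector helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.python.org")
        documentation_link = driver.find_element(By.CSS_SELECTOR, DOCS_LINK_SELECTOR)
        return documentation_link.text
    finally:
        driver.quit()
# Usage:
# print(documentation_link_text())
```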

223
12:21.450 --> 12:24.870
Now, in a similar vein, sometimes it's extremely

224
12:24.870 --> 12:28.980
hard to even find an element even by CSS selector.

225
12:28.980 --> 12:30.810
So if all else fails,

226
12:30.810 --> 12:34.410
one that will always work is the XPath.

227
12:34.410 --> 12:39.410
The XPath is a way of locating a specific HTML element

228
12:40.080 --> 12:42.420
by a path structure.

229
12:42.420 --> 12:45.210
So you've seen what paths look like, right?

230
12:45.210 --> 12:47.966
Users/angela/development, etc.

231
12:47.966 --> 12:51.480
Well, we can also express the navigation

232
12:51.480 --> 12:55.380
to a particular element, drilling down from the top

233
12:55.380 --> 12:59.253
of the tree to a particular node, using the XPath.

234
13:00.240 --> 13:01.620
Let's say I want to get hold

235
13:01.620 --> 13:04.500
of this link at the bottom, Submit Website Bug.

236
13:04.500 --> 13:06.900
So let's go ahead and inspect on it.

237
13:06.900 --> 13:10.230
And you can see it's not particularly unique

238
13:10.230 --> 13:14.490
in terms of having an ID or a class or a name,

239
13:14.490 --> 13:17.010
and even in terms of its structure.

240
13:17.010 --> 13:20.133
It's sort of in this ul and then in li.

241
13:20.970 --> 13:23.970
And it might not be that easy to get to it.

242
13:23.970 --> 13:27.150
But here's where our XPath comes in handy.

243
13:27.150 --> 13:28.448
Let's go ahead and right-click on it

244
13:28.448 --> 13:31.533
and then go to Copy, Copy XPath.

245
13:32.520 --> 13:34.170
And now, we can go back

246
13:34.170 --> 13:38.520
to our code and use it to tap into that element.

247
13:38.520 --> 13:40.020
So we'll say driver

248
13:40.020 --> 13:42.757
and then we'll say find_element(By.XPATH...

249
13:43.710 --> 13:46.653
And then inside here, we're going to paste our XPath.

250
13:49.140 --> 13:51.570
Now, notice that the XPath itself

251
13:51.570 --> 13:53.910
actually has some double quotes.

252
13:53.910 --> 13:57.780
So we have to change our double quotes into single quotes

253
13:57.780 --> 14:01.800
so that it doesn't clash with the inner double quotes.

254
14:01.800 --> 14:03.240
So like this.

255
14:03.240 --> 14:06.270
Now, once we found our bug_link,

256
14:06.270 --> 14:09.843
then we're going to print out the link,

257
14:10.770 --> 14:14.073
and I'm going to get hold of the href here.

258
14:15.030 --> 14:18.540
So now I'm going to print my bug_link.text

259
14:18.540 --> 14:20.610
just to prove that this worked.

260
14:20.610 --> 14:23.430
And if I hit Run, you can see it opens up

261
14:23.430 --> 14:26.730
our browser and you can see it's managed to get hold

262
14:26.730 --> 14:31.650
of that particular link using this very specific XPath,

263
14:31.650 --> 14:33.360
which looks very much

264
14:33.360 --> 14:36.510
like a file path that we've been using.

265
14:36.510 --> 14:40.440
But it's essentially, the way that we would navigate down

266
14:40.440 --> 14:43.080
through the divs, the uls, the lis,

267
14:43.080 --> 14:46.113
all the way down to that particular anchor tag.
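
NOTE
To illustrate how an XPath drills down through the tree without needing a
browser, this sketch swaps in Python's built-in xml.etree.ElementTree, which
supports only a limited subset of XPath and needs a relative path starting
with a dot, unlike the path Chrome copies. The markup and the XPath are
made up for illustration and are not python.org's real structure.
selenium_link_text shows the equivalent Selenium call, given a driver.
```python
import xml.etree.ElementTree as ET
# Made-up markup standing in for a page footer.
SAMPLE_HTML = (
    '<html><body><div id="footer-links"><ul>'
    '<li><a href="/help">Help</a></li>'
    '<li><a href="/bugs">Submit Website Bug</a></li>'
    '</ul></div></body></html>'
)
# Single quotes outside, because the XPath itself contains double quotes.
SAMPLE_XPATH = './/*[@id="footer-links"]/ul/li[2]/a'
def link_text_at(xpath: str, markup: str = SAMPLE_HTML) -> str:
    """Walk down the tree with ElementTree's limited XPath support."""
    root = ET.fromstring(markup)
    return root.find(xpath).text
def selenium_link_text(driver, xpath: str) -> str:
    """The same kind of lookup against a live page, given a Selenium driver."""
    from selenium.webdriver.common.by import By
    bug_link = driver.find_element(By.XPATH, xpath)
    return bug_link.text
# Usage:
# print(link_text_at(SAMPLE_XPATH))
```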

268
14:47.190 --> 14:49.500
And the best part of all is of course,

269
14:49.500 --> 14:54.500
you can use this button here in Chrome to locate any element,

270
14:54.630 --> 14:57.810
and then that would select its location in the code,

271
14:57.810 --> 14:59.250
and then you can right-click

272
14:59.250 --> 15:02.100
and then copy its XPath to get hold

273
15:02.100 --> 15:04.053
of that particular thing.

274
15:05.430 --> 15:06.300
Now I'm going to link

275
15:06.300 --> 15:10.920
to the documentation on XPath from W3Schools.

276
15:10.920 --> 15:13.080
It goes into a little bit more detail

277
15:13.080 --> 15:15.870
about how XPaths are constructed

278
15:15.870 --> 15:19.050
but you don't need to know this in order to work with them.

279
15:19.050 --> 15:22.770
As I've shown, you can simply use the Chrome Developer Tools,

280
15:22.770 --> 15:24.090
and it will help you get

281
15:24.090 --> 15:27.063
to the particular item you're interested in.

282
15:28.860 --> 15:33.860
Now, we've been looking at finding elements using Selenium

283
15:33.960 --> 15:35.910
but notice how we've always stuck

284
15:35.910 --> 15:40.910
to the singular version of that method, find_element().

285
15:40.920 --> 15:43.702
Now, for every single method that I've shown you,

286
15:43.702 --> 15:48.702
there's also a counterpart, which is the find_elements().

287
15:48.810 --> 15:52.770
So for example, find_elements(By.CSS_SELECTOR).

288
15:52.770 --> 15:55.500
And of course, this is just going to find you everything

289
15:55.500 --> 15:58.020
on the screen that matches your criteria

290
15:58.020 --> 16:00.000
and give it to you in a list.
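
NOTE
A brief sketch of the plural method, assuming Selenium 4 with Chrome.
find_elements() returns a list of every match, and an empty list when
nothing matches. The helper names and the example selector in the usage
line are illustrative only.
```python
def element_texts(elements) -> list:
    """Turn a list of located elements into a list of their text."""
    return [element.text for element in elements]
def all_matching_texts(url: str, css_selector: str) -> list:
    """Return the text of every element on the page matching the selector."""
    # Imported here so element_texts works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        matches = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return element_texts(matches)
    finally:
        driver.quit()
# Usage:
# print(all_matching_texts("https://www.python.org", ".documentation-widget a"))
```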

291
16:00.000 --> 16:01.966
And there is a version of this

292
16:01.966 --> 16:06.960
for every single find method that we've shown so far.

293
16:06.960 --> 16:10.470
In the next lesson, I've got a quick challenge for you

294
16:10.470 --> 16:12.870
to put into practice what you've learned

295
16:12.870 --> 16:17.340
and to be able to locate any given element on a webpage.

296
16:17.340 --> 16:19.380
So for all of that and more,

297
16:19.380 --> 16:20.980
I'll see you in the next lesson.