WEBVTT

00:00.270 --> 00:01.440
-: Alright, so in this video,

00:01.440 --> 00:02.610
what we're gonna have a look at is,

00:02.610 --> 00:06.360
how you can analyze landing pages, your landing page,

00:06.360 --> 00:09.720
and competitor's landing pages using GPT Vision.

00:09.720 --> 00:11.880
We're gonna use something called pyppeteer

00:11.880 --> 00:15.270
which is a puppeteer port for Python,

00:15.270 --> 00:18.270
and this allows us to run puppeteer browsers

00:18.270 --> 00:21.930
and to execute some commands on some headless browsers.

00:21.930 --> 00:23.220
And so there's a couple of different types

00:23.220 --> 00:24.743
of installations that we're running in,

00:24.743 --> 00:27.330
there's pyppeteer stealth, pyppeteer,

00:27.330 --> 00:29.760
and we're also gonna use langchain as well.

00:29.760 --> 00:32.430
We are also gonna add in some user agents.

00:32.430 --> 00:35.820
Now, user agents are a way for us to potentially spoof

00:35.820 --> 00:37.560
to other websites and pretend

00:37.560 --> 00:39.780
that we're actually a real browser.

00:39.780 --> 00:41.280
And then what you're gonna need to do is,

00:41.280 --> 00:42.450
when you run this script,

00:42.450 --> 00:45.900
make sure that you put your domain in here

00:45.900 --> 00:47.310
'cause this is what's gonna be

00:47.310 --> 00:48.750
inside of the prompt later on.

00:48.750 --> 00:53.220
So I'm gonna be using the homepage for analysis,

00:53.220 --> 00:55.200
but the homepage of my website

00:55.200 --> 00:57.690
is still under this specific domain.

00:57.690 --> 01:00.120
Now, once you've changed that domain,

01:00.120 --> 01:01.530
we have this function here,

01:01.530 --> 01:05.070
which is called screenshot full page.

01:05.070 --> 01:07.350
And what it does is, it loads a headless browser,

01:07.350 --> 01:11.250
it creates a page and then if the device type is mobile,

01:11.250 --> 01:13.800
it emulates a different view port

01:13.800 --> 01:16.560
and sets the user agent for the mobile user agent

01:16.560 --> 01:18.090
and some other little bits.

01:18.090 --> 01:21.570
But if it's a desktop that's entered into this specific type

01:21.570 --> 01:24.840
of function as a parameter, then we set the view port

01:24.840 --> 01:27.810
to be slightly larger and we also emulate

01:27.810 --> 01:30.030
the view port should be that size

01:30.030 --> 01:32.550
and also updating the user agent here.

01:32.550 --> 01:34.980
And then we've also got a function which is meant

01:34.980 --> 01:38.370
to improve the stealthiness of your browsers

01:38.370 --> 01:40.530
and that comes from puppeteer stealth.

01:40.530 --> 01:44.070
And then what we've got is a bit that says we're gonna wait

01:44.070 --> 01:46.440
until the network is idle.

01:46.440 --> 01:49.800
And this means that if the browser is downloading resources

01:49.800 --> 01:53.520
after it's visited the page, then what's gonna happen is,

01:53.520 --> 01:55.350
you can specifically wait

01:55.350 --> 01:57.390
until all those resource is downloaded.

01:57.390 --> 01:59.190
And then what we do is, we scroll to the bottom

01:59.190 --> 02:02.250
to ensure that all the lazy loaded images are loaded.

02:02.250 --> 02:05.460
And then after that we wait around two seconds

02:05.460 --> 02:08.160
and then we scroll all the way back to the top.

02:08.160 --> 02:10.500
And then what we do is, we take a screenshot

02:10.500 --> 02:13.920
of the entire page by changing the view port

02:13.920 --> 02:17.550
so that the width is equal to the body.scroll width,

02:17.550 --> 02:20.310
and the height is equal to the the scroll height

02:20.310 --> 02:21.840
of the document of body.

02:21.840 --> 02:25.323
We take a single screenshot and we save that screenshot.

02:26.280 --> 02:29.340
And so now, what you do is, you provide your list of URLs

02:29.340 --> 02:31.170
and you can see here what I'm gonna be analyzing

02:31.170 --> 02:34.110
is the homepage and then I'm gonna be comparing

02:34.110 --> 02:36.630
it against these two data engineering pages.

02:36.630 --> 02:40.320
Now, if your domain wasn't inside of this first one,

02:40.320 --> 02:43.500
so you can see my domain is, understandingdata.com,

02:43.500 --> 02:45.600
then it'll raise an error because you do need to update

02:45.600 --> 02:49.320
that domain that's at the top of this script.

02:49.320 --> 02:51.180
And then we get some clean names

02:51.180 --> 02:54.960
and then we basically say, let's create a screenshot.

02:54.960 --> 02:58.110
And then we set up an asynchronous function

02:58.110 --> 03:01.350
which allows us to loop over every URL

03:01.350 --> 03:04.620
for both desktop and mobile devices

03:04.620 --> 03:07.110
and then you'll see we get this kind of output here

03:07.110 --> 03:09.960
where it says, taking a screenshot of understanding data

03:09.960 --> 03:13.380
and we've also got this other one, dufrain.co.uk

03:13.380 --> 03:15.300
and fdmgroup.com.

03:15.300 --> 03:18.150
And then what we do is, we then decide,

03:18.150 --> 03:19.950
okay, we've got all these individual screenshots

03:19.950 --> 03:22.470
and I can show you a couple of examples of those.

03:22.470 --> 03:25.860
This is the desktop, this is the mobile, this is the desktop

03:25.860 --> 03:27.870
for dufrain, this is the mobile for dufrain,

03:27.870 --> 03:29.850
it's quite large as you can see.

03:29.850 --> 03:33.120
And then we've also got the desktop for fdmgroup

03:33.120 --> 03:35.010
and the mobile for fdmgroup.

03:35.010 --> 03:35.843
So they're pretty good.

03:35.843 --> 03:37.290
There's a couple of different edge cases

03:37.290 --> 03:39.120
that we'll wanna talk about in terms

03:39.120 --> 03:42.690
of how you'd wanna improve the script in the future.

03:42.690 --> 03:45.870
But for now, we'll say that these images are good enough.

03:45.870 --> 03:47.460
And so what we then do is,

03:47.460 --> 03:50.100
we need to convert these images to Base64,

03:50.100 --> 03:52.260
and we've got all the images in this folder.

03:52.260 --> 03:53.910
They've been written out to this folder.

03:53.910 --> 03:55.620
And so we're gonna use glob,

03:55.620 --> 03:57.780
which is gonna allow us on the left-hand side,

03:57.780 --> 04:01.650
to go and access all these individual PNG files.

04:01.650 --> 04:03.540
And so, we basically go through 'em,

04:03.540 --> 04:05.940
we'll go and get all the the screenshot files

04:05.940 --> 04:09.810
and we'll encode those into Base64 encoded images.

04:09.810 --> 04:10.980
And then after that,

04:10.980 --> 04:15.090
then we set up a langchain pydantic model.

04:15.090 --> 04:17.040
And the idea behind this is that we want

04:17.040 --> 04:19.950
to not only get the vision model to respond to us,

04:19.950 --> 04:21.540
but we want the vision model to respond

04:21.540 --> 04:23.730
to us in a standardized way

04:23.730 --> 04:25.770
that we could include in a application.

04:25.770 --> 04:28.170
So this could be the backend of the application

04:28.170 --> 04:31.950
and we could provide a standardized output by json.

04:31.950 --> 04:33.650
And so what we've got is a feedback aspect.

04:33.650 --> 04:35.520
So what the aspect is the description

04:35.520 --> 04:37.140
and the recommendations.

04:37.140 --> 04:39.360
And then you've got a website URL,

04:39.360 --> 04:42.480
you've got them some strengths, some areas for improvement,

04:42.480 --> 04:44.880
which are a list of feedback aspects,

04:44.880 --> 04:46.350
and then we've got some general feedback.

04:46.350 --> 04:48.480
There's optional and also some additional comments

04:48.480 --> 04:49.680
as optional too.

04:49.680 --> 04:51.120
And so what happens is,

04:51.120 --> 04:54.570
we then set up a interesting approach where we say,

04:54.570 --> 04:55.595
act as a marketing user researcher.

04:55.595 --> 05:00.030
You'll receive a set of screenshots for my website

05:00.030 --> 05:02.610
and there'll be some different websites.

05:02.610 --> 05:05.190
And then please provide brief analysis of the screenshots

05:05.190 --> 05:08.130
and we must provide a json schema.

05:08.130 --> 05:09.870
And this is this output parser

05:09.870 --> 05:11.880
with the get format instructions allows us

05:11.880 --> 05:15.450
to convert the vision model's response

05:15.450 --> 05:16.860
to a standardized output.

05:16.860 --> 05:20.160
Now, the vision model doesn't currently support json mode

05:20.160 --> 05:21.780
and it doesn't support function calling.

05:21.780 --> 05:22.860
So this is the only way

05:22.860 --> 05:24.690
that we can actually get json data out

05:24.690 --> 05:26.310
of GPT Vision at the moment.

05:26.310 --> 05:27.330
But you'll see what we do is,

05:27.330 --> 05:31.080
we also then say a message saying, here's my website pages

05:31.080 --> 05:33.630
and then we pretend to be the AI

05:33.630 --> 05:36.630
and say thanks for providing your webpage in both desktop

05:36.630 --> 05:37.950
and mobile versions.

05:37.950 --> 05:39.630
Before analyzing them on each research

05:39.630 --> 05:41.610
to different websites to understand the competition,

05:41.610 --> 05:43.080
can you please provide some information

05:43.080 --> 05:44.370
on the different websites?

05:44.370 --> 05:46.770
And then we provide it all of the ones

05:46.770 --> 05:50.700
that aren't our domain in that specific screenshot,

05:50.700 --> 05:52.860
Base64's dictionary.

05:52.860 --> 05:55.200
And so then we get a response from the vision model.

05:55.200 --> 05:57.990
And you can see the vision model gives us a range

05:57.990 --> 06:00.450
of strengths and some areas for improvement.

06:00.450 --> 06:02.970
And so that's basically what we've got at the moment.

06:02.970 --> 06:04.530
You've got this chat model

06:04.530 --> 06:06.360
and that takes the prompt into the model,

06:06.360 --> 06:10.320
into the output passer, we invoke the chain

06:10.320 --> 06:11.790
And then once we get that back,

06:11.790 --> 06:13.560
then you've got this pydantic model.

06:13.560 --> 06:15.270
Now, I think just to talk

06:15.270 --> 06:16.590
on a couple of different edge cases

06:16.590 --> 06:18.660
that have propped up whilst making this,

06:18.660 --> 06:22.380
I think the next steps would be, can we automatically click

06:22.380 --> 06:24.600
on cookie banners and can we also click

06:24.600 --> 06:26.880
on any pop-ups, banner ads?

06:26.880 --> 06:29.400
And can we also create an XY coordinate grid

06:29.400 --> 06:34.110
that will allow us to completely control the vision model

06:34.110 --> 06:36.030
because that's also gonna be important

06:36.030 --> 06:37.620
from, like these two ones,

06:37.620 --> 06:38.820
what we're really trying to do is,

06:38.820 --> 06:42.750
figure out can we reliably generate XY coordinates

06:42.750 --> 06:44.340
to automate actions?

06:44.340 --> 06:47.100
But the other way we could do cookies is by,

06:47.100 --> 06:50.610
the pop-ups is by specifically looking at all the HTML data

06:50.610 --> 06:52.590
and getting rid of any JavaScript

06:52.590 --> 06:54.570
and just looking at the pure DOM

06:54.570 --> 06:58.140
and then asking ChatGPT, can you see any DOM elements

06:58.140 --> 07:01.320
that look worthy of being clicked on pop-up elements?

07:01.320 --> 07:04.800
But I suspect that an XY coordinate grid will allow us

07:04.800 --> 07:07.050
to completely control the vision model

07:07.050 --> 07:09.210
and drive that to be a lot more autonomous

07:09.210 --> 07:11.400
in terms of interacting with the website

07:11.400 --> 07:14.550
and also clicking away banners before it takes screenshots.

07:14.550 --> 07:16.380
So hopefully, you found this useful,

07:16.380 --> 07:17.700
yet there's a lot to go through.

07:17.700 --> 07:21.420
But the main premise is, we are getting lots

07:21.420 --> 07:24.840
of images asynchronously from using a browser

07:24.840 --> 07:26.370
on your local machine.

07:26.370 --> 07:28.470
And then after that, then what you can do is,

07:28.470 --> 07:29.700
you can use all those images

07:29.700 --> 07:33.363
to perform a competitor analysis on multiple landing pages.
