WEBVTT

00:01.650 --> 00:03.090
-: Okay, I'm gonna run you

00:03.090 --> 00:07.260
through how to do analysis of a publication.

00:07.260 --> 00:11.610
The publication I'm looking at today is Lenny's blog.

00:11.610 --> 00:12.443
As you can see,

00:12.443 --> 00:14.970
here are the top posts on Lenny's blog

00:14.970 --> 00:16.890
and I wanted to scrape all of these

00:16.890 --> 00:19.860
and basically summarize what are the sorts of things

00:19.860 --> 00:21.840
that this blog is talking about.

00:21.840 --> 00:23.520
What are some insights I can get?

00:23.520 --> 00:27.030
And if I was pitching this blog or similar publication,

00:27.030 --> 00:28.770
what sort of titles

00:28.770 --> 00:31.200
should I pitch them that would be interesting?

00:31.200 --> 00:33.540
Or potentially what templates are working well for them

00:33.540 --> 00:36.120
that I can use for my own blog?

00:36.120 --> 00:39.993
Okay, so let me show you how this works.

00:41.490 --> 00:44.010
First, I'm gonna install Anthropic,

00:44.010 --> 00:45.660
so I have that running

00:45.660 --> 00:50.070
and going to load the environment,

00:50.070 --> 00:52.950
and that's where I have my OpenAI API key.

00:52.950 --> 00:54.570
I'm just gonna do that in a second.

00:54.570 --> 00:58.080
I'm just gonna pull in here the blog posts.

00:58.080 --> 01:01.410
So this is what you need to get from scraping

01:01.410 --> 01:03.480
and this isn't the scraping tutorial.

01:03.480 --> 01:05.400
This is more on the analysis

01:05.400 --> 01:06.390
and to be honest,

01:06.390 --> 01:08.490
I don't really know that much about web scraping.

01:08.490 --> 01:11.010
I literally just asked ChatGPT

01:11.010 --> 01:13.630
and this is the code that it gave me

01:15.420 --> 01:17.100
to scrape the blog.

01:17.100 --> 01:18.810
And if we go,

01:18.810 --> 01:23.730
let's see, if we pop open this blog again,

01:23.730 --> 01:25.480
let's see if this code still works.

01:26.790 --> 01:28.473
I'm just gonna paste into,

01:30.720 --> 01:33.010
here we go, data copied to clipboard

01:34.020 --> 01:37.420
and basically what this has done

01:39.510 --> 01:43.740
is pasted all the different blog posts and titles.

01:43.740 --> 01:48.573
And what I've then done is put that into this CSV,

01:50.070 --> 01:51.030
paste it into here,

01:51.030 --> 01:52.350
and it's got all the titles,

01:52.350 --> 01:53.640
it's got when it was published,

01:53.640 --> 01:55.020
what the author is, et cetera.

01:55.020 --> 01:56.970
That is what I did for scraping.

01:56.970 --> 02:00.000
I literally just asked ChatGPT to write that code for me,

02:00.000 --> 02:03.060
so I can't teach you that much about web scraping.

02:03.060 --> 02:05.790
But once you have all the posts in here,

02:05.790 --> 02:08.700
then you can download them

02:08.700 --> 02:10.413
and I've done that down here.

02:12.930 --> 02:17.930
So you can see it downloaded all the texts from each post.

02:22.020 --> 02:25.290
Yeah, and the way I've done that just to show you

02:25.290 --> 02:30.290
is if you load the links

02:30.540 --> 02:32.880
and then you get a list of the URLs,

02:32.880 --> 02:35.580
and this is going, just to show you,

02:35.580 --> 02:37.060
it's going through each URL

02:37.920 --> 02:39.810
and it's getting the link

02:39.810 --> 02:41.580
which is the second part,

02:41.580 --> 02:44.160
and then it's finding the file name,

02:44.160 --> 02:49.160
and then it's just reading the text using Beautiful Soup

02:49.860 --> 02:52.620
and then just saving that text locally.

02:52.620 --> 02:54.750
So that's how I downloaded everything

02:54.750 --> 02:56.460
and I just sleep for three seconds here.

02:56.460 --> 02:57.780
It depends on the website.

02:57.780 --> 02:59.220
Again, I'm not a web scrap pro.

02:59.220 --> 03:00.870
I'm not particularly good at this.

03:00.870 --> 03:03.060
This isn't a web scraping tutorial.

03:03.060 --> 03:06.840
Yeah, basically this was also written by ChatGPT,

03:06.840 --> 03:08.580
but if I run into an error,

03:08.580 --> 03:10.257
I just ask ChatGPT

03:10.257 --> 03:11.340
and it will fix it.

03:11.340 --> 03:13.260
So if that's how that works,

03:13.260 --> 03:15.750
so that will download the different URLs.

03:15.750 --> 03:17.640
You'd have them all the text in here.

03:17.640 --> 03:19.800
So that's a prerequisite.

03:19.800 --> 03:21.550
I'm just going to interrupt that

03:22.890 --> 03:24.270
'cause I've already done that part

03:24.270 --> 03:27.300
and it took I think a maybe about an hour

03:27.300 --> 03:29.160
to download this stuff.

03:29.160 --> 03:31.320
All right, so now onto the actual tutorial though

03:31.320 --> 03:36.180
which is how do we summarize this text?

03:36.180 --> 03:39.423
So just gonna load Anthropic first.

03:41.070 --> 03:42.360
Make sure this is working.

03:42.360 --> 03:43.193
Yeah, here we go.

03:43.193 --> 03:45.330
So the way you call Anthropic,

03:45.330 --> 03:48.270
this is Clause Three is you set up the client,

03:48.270 --> 03:51.603
it gets the API key that you determined.

03:53.070 --> 03:55.170
If I just close this,

03:55.170 --> 03:56.820
in your .env file.

03:56.820 --> 03:57.750
I won't show you mine,

03:57.750 --> 04:01.560
but you just need to have something

04:01.560 --> 04:05.310
called Anthropic_API_KEY and then equals

04:05.310 --> 04:07.890
and then the actual key that you get from Anthropic.

04:07.890 --> 04:09.270
But assuming you have that,

04:09.270 --> 04:10.770
then it will load Anthropic

04:10.770 --> 04:14.430
and then you can get, it's very similar to OpenAI API.

04:14.430 --> 04:16.500
You can just specify the model

04:16.500 --> 04:17.640
and then you send the messages.

04:17.640 --> 04:18.600
And I just said, "Hello,"

04:18.600 --> 04:20.520
and it said, "Hello, how can I assist you?"

04:20.520 --> 04:22.170
That's how I know it's working.

04:22.170 --> 04:25.830
Now what I need to do to analyze this

04:25.830 --> 04:27.420
is to just loop through

04:27.420 --> 04:31.920
and pull out different texts from this folder

04:31.920 --> 04:35.460
and then place that into the prompt

04:35.460 --> 04:36.930
and then here, I'm just asking you

04:36.930 --> 04:39.510
to pull out one non-obvious insight from this text.

04:39.510 --> 04:41.520
So it's basically reading the text

04:41.520 --> 04:44.130
and then coming back to me with something interesting.

04:44.130 --> 04:46.260
So let's see how this works.

04:46.260 --> 04:50.010
Okay, so that took 46 seconds for five URLs.

04:50.010 --> 04:51.150
Obviously we could do it for everything,

04:51.150 --> 04:53.280
but here we go.

04:53.280 --> 04:55.260
From this blog post,

04:55.260 --> 04:57.180
it said that borrowing successful features

04:57.180 --> 04:58.410
from other products requires

04:58.410 --> 05:00.690
carefully adapting them to your specific context

05:00.690 --> 05:02.820
and it has a quote here which is pretty cool.

05:02.820 --> 05:04.500
This one, "How the biggest consumer apps

05:04.500 --> 05:06.480
got the first 1,000 users,"

05:06.480 --> 05:09.300
they found the early users from just a single strategy.

05:09.300 --> 05:11.610
That's pretty surprising, right?

05:11.610 --> 05:15.000
Another one here is that ChatGPT

05:15.000 --> 05:16.560
is good for non-engineering teams,

05:16.560 --> 05:18.030
not just engineering teams.

05:18.030 --> 05:19.800
So summarizing customer interviews,

05:19.800 --> 05:21.930
identifying regarding pain points,

05:21.930 --> 05:24.420
finding compelling customer testimonials,

05:24.420 --> 05:26.220
generating action items.

05:26.220 --> 05:27.870
I think that's pretty interesting,

05:27.870 --> 05:29.430
but this is a great way to decide

05:29.430 --> 05:31.590
whether you wanna read the post in the first place

05:31.590 --> 05:33.300
and also to start to notice patterns

05:33.300 --> 05:34.560
about what types of things

05:34.560 --> 05:36.810
are being written about on this blog.

05:36.810 --> 05:38.550
That's quite cool, I find that quite useful.

05:38.550 --> 05:40.950
You could also change the prompt to say whatever you want.

05:40.950 --> 05:42.900
So you could ask it to give you a summary

05:42.900 --> 05:46.650
or to highlight what the quote is about

05:46.650 --> 05:48.090
or whatever it is.

05:48.090 --> 05:49.350
Pull out bullet points

05:49.350 --> 05:51.030
or whatever you would like to do.

05:51.030 --> 05:54.120
But where it gets really interesting is the Claude

05:54.120 --> 05:58.020
has a 200,000 token limit in terms of the context windows.

05:58.020 --> 05:58.853
So you can fit

05:58.853 --> 06:01.860
pretty much an entire Harry Potter book in there

06:01.860 --> 06:03.900
and therefore you can just pull

06:03.900 --> 06:06.240
a bunch of these texts together

06:06.240 --> 06:08.250
and start to look for patterns.

06:08.250 --> 06:11.160
What I did here is if you run this,

06:11.160 --> 06:13.020
it just pulls out the first 50 URLs

06:13.020 --> 06:17.610
and then it appends the text into tags here,

06:17.610 --> 06:21.390
and then it's just telling me how many words basically,

06:21.390 --> 06:22.680
how many tokens is an estimate.

06:22.680 --> 06:24.870
One token is roughly four characters,

06:24.870 --> 06:27.900
so just counted here.

06:27.900 --> 06:30.510
So I'm just under 200,000 which is great.

06:30.510 --> 06:32.283
If I look at that one,

06:35.220 --> 06:39.210
all texts, you can see that it starts with article

06:39.210 --> 06:41.490
and then it has all the texts.

06:41.490 --> 06:43.230
I could probably strip out some of this white space

06:43.230 --> 06:46.650
and then it divides the articles with a line like this.

06:46.650 --> 06:48.840
And the reason I'm using these tags like this,

06:48.840 --> 06:50.610
it's almost like HTML,

06:50.610 --> 06:55.320
is that Claude was trained on tags

06:55.320 --> 06:57.060
and responds really well for them.

06:57.060 --> 07:00.000
So that's just a specific Claude thing

07:00.000 --> 07:02.160
that I recommend you do.

07:02.160 --> 07:03.210
You just get better results

07:03.210 --> 07:05.640
'cause it delineates the specific things

07:05.640 --> 07:06.480
that you're talking about.

07:06.480 --> 07:08.520
In this case, the articles.

07:08.520 --> 07:10.050
So now what we're gonna do

07:10.050 --> 07:13.080
is we've got all of our text here,

07:13.080 --> 07:15.120
so we're reading all text

07:15.120 --> 07:16.770
and then we're sending to Claude Opus.

07:16.770 --> 07:18.300
We're saying summarize these articles,

07:18.300 --> 07:20.670
analyze the word choice, identify any patterns,

07:20.670 --> 07:23.970
and tell me about any non-obvious insights publications.

07:23.970 --> 07:26.370
So now we're passing in all of those articles,

07:26.370 --> 07:29.130
50 articles, the full text of each,

07:29.130 --> 07:31.440
all in one prompt which is pretty cool

07:31.440 --> 07:34.080
and we'll just see what it comes back with.

07:34.080 --> 07:35.250
Okay, we've got the response.

07:35.250 --> 07:36.570
It took about two minutes

07:36.570 --> 07:38.760
because this is a smart model

07:38.760 --> 07:40.800
and therefore it's relatively slower,

07:40.800 --> 07:43.290
but also we're passing it a ton of text, right?

07:43.290 --> 07:45.000
So the longer the prompt,

07:45.000 --> 07:48.660
the longer the time it takes to get back to you.

07:48.660 --> 07:50.040
So analyze these articles.

07:50.040 --> 07:52.350
It did hallucinate, it thinks there's 40 articles,

07:52.350 --> 07:53.730
but there are actually 50.

07:53.730 --> 07:55.230
It's not very good at accounting,

07:55.230 --> 07:58.560
and it summarizes, so it says they cover

07:58.560 --> 08:00.360
a wide range of topics related to product management,

08:00.360 --> 08:02.490
growth, career development, entrepreneurship

08:02.490 --> 08:04.140
and then it gives some common themes,

08:04.140 --> 08:06.990
tactical advice and frameworks, inside stories.

08:06.990 --> 08:08.370
Interviews with experienced leaders,

08:08.370 --> 08:09.780
guidance for early-stage startups,

08:09.780 --> 08:11.550
and personal stories of career journeys.

08:11.550 --> 08:15.630
So already I know if I wanted to do a guest post for Lenny

08:15.630 --> 08:17.310
and I was coming up with new ideas,

08:17.310 --> 08:18.750
I could take a look here.

08:18.750 --> 08:21.630
The content's highly practical, example-driven,

08:21.630 --> 08:23.640
and grounded in real-world experience

08:23.640 --> 08:25.410
so that's important to know.

08:25.410 --> 08:27.660
They go deep on specific topics.

08:27.660 --> 08:29.400
The word style is conversational, direct,

08:29.400 --> 08:32.190
and easy to follow whilst covering complex topics.

08:32.190 --> 08:34.950
There's also frequent characteristics of the writing,

08:34.950 --> 08:38.370
so this is gonna help you get past the editing process

08:38.370 --> 08:40.620
if you are writing a post for this publication.

08:40.620 --> 08:43.200
But again, this could be used for any publication

08:43.200 --> 08:45.090
or even to analyze your own blog

08:45.090 --> 08:47.250
to understand what's interesting about it.

08:47.250 --> 08:48.810
Frequent use the second person "you"

08:48.810 --> 08:50.520
to speak directly to the reader.

08:50.520 --> 08:52.470
A liberal use of bolded numbered lists,

08:52.470 --> 08:53.760
bolding of key phrases,

08:53.760 --> 08:56.520
relatively simple vocabulary, et cetera.

08:56.520 --> 08:58.560
And then the writing is more casual

08:58.560 --> 09:02.220
and bloggy in style compared to traditional business writing

09:02.220 --> 09:05.790
and the there's also more information here.

09:05.790 --> 09:09.120
Kept going, it figured out the reading level.

09:09.120 --> 09:12.480
Some non-obvious patterns, his own writing accounts

09:12.480 --> 09:14.940
for less than half the articles, that's pretty interesting.

09:14.940 --> 09:17.670
And there many compendium-style posts like best-ofs.

09:17.670 --> 09:21.270
There is also failure stories as much as success stories,

09:21.270 --> 09:23.520
organic, product-led growth over a paid acquisition.

09:23.520 --> 09:26.220
Very little mention of ads, that's useful to know.

09:26.220 --> 09:29.010
The design is very rarely mentioned.

09:29.010 --> 09:30.360
That's actually surprising to me

09:30.360 --> 09:33.450
considering it's a product management-focused blog.

09:33.450 --> 09:35.070
Analytics and metrics are important

09:35.070 --> 09:38.400
and there's a bias towards VC-backed startups.

09:38.400 --> 09:40.380
So that's really interesting to know

09:40.380 --> 09:41.460
and maybe some of these things

09:41.460 --> 09:44.340
Lenny himself might not have realized.

09:44.340 --> 09:46.770
Well, now the other thing I wanted to do

09:46.770 --> 09:48.540
is just get all the titles

09:48.540 --> 09:51.270
and think about what a successful title might look like

09:51.270 --> 09:55.590
if I was pitching Lenny to join his blog as a guest speaker.

09:55.590 --> 09:57.960
This is a prompt I've used a bunch of times in the past.

09:57.960 --> 10:00.810
It's based on this thing called the Blank.

10:00.810 --> 10:03.420
This is from a Tim Ferriss interview with James Clear,

10:03.420 --> 10:05.460
the author of "Atomic Habits."

10:05.460 --> 10:07.470
He found when he looked

10:07.470 --> 10:12.470
at all the most successful titles of business books,

10:12.810 --> 10:15.127
that this was a successful template.

10:15.127 --> 10:16.447
"The War of Art,"

10:16.447 --> 10:17.677
"The Psychology of Money,"

10:17.677 --> 10:20.040
"The Subtle Art of Not Giving a --."

10:20.040 --> 10:21.450
Basically you take two things

10:21.450 --> 10:23.460
that are not normally associated

10:23.460 --> 10:24.870
and then put them together,

10:24.870 --> 10:26.497
the blank of blank.

10:26.497 --> 10:28.890
"Odd Topic" is the template he chose for his book,

10:28.890 --> 10:31.440
so habits are not normally atomic.

10:31.440 --> 10:34.140
This doesn't have the blank of blank.

10:34.140 --> 10:36.667
Instead it's just a odd combination.

10:36.667 --> 10:39.067
"Extreme Ownership," ownership's not normally extreme.

10:39.067 --> 10:41.220
"Deep Work," work is not usually deep.

10:41.220 --> 10:43.410
So that's what James Clear choose

10:43.410 --> 10:45.813
and then I just added this one as well.

10:46.770 --> 10:47.850
I was using this prompt

10:47.850 --> 10:49.920
to come up with the title of my O'Reilly book

10:49.920 --> 10:54.300
and this is an example of the type of O'Reilly books.

10:54.300 --> 10:56.160
So if you're not getting good success,

10:56.160 --> 10:57.570
by the way, with this template,

10:57.570 --> 11:01.980
you can just add multiple new templates into here

11:01.980 --> 11:04.680
that cover the types of things you wanna see

11:04.680 --> 11:06.150
and then it will do a much better job

11:06.150 --> 11:08.160
for that specific publication.

11:08.160 --> 11:09.150
All right, what we're trying to do,

11:09.150 --> 11:10.440
we're passing it all the titles

11:10.440 --> 11:14.850
and we just wanna see what types of templates come back

11:14.850 --> 11:17.400
in terms of what are the types of titles

11:17.400 --> 11:19.980
that appear on Lenny's blog.

11:19.980 --> 11:23.280
So it's gonna come up with its own templates like this

11:23.280 --> 11:24.930
if it works well.

11:24.930 --> 11:27.817
Okay, here we go, have some templates.

11:27.817 --> 11:29.130
"How X does Y"

11:29.130 --> 11:30.510
and giving some examples.

11:30.510 --> 11:32.190
How Duolingo reignited user growth,

11:32.190 --> 11:33.210
how Linear builds product,

11:33.210 --> 11:34.680
how Shopify builds product.

11:34.680 --> 11:36.480
That's pretty interesting to me.

11:36.480 --> 11:39.810
Tells me if I work for a successful company like Duolingo,

11:39.810 --> 11:41.503
I could do something similar to that.

11:41.503 --> 11:43.020
I just choose a specific topic

11:43.020 --> 11:45.667
and they say how this company does X.

11:45.667 --> 11:47.427
"How to X," how to use ChatGPT,

11:47.427 --> 11:49.170
how to validate your B@B startup idea,

11:49.170 --> 11:51.270
how to pass a first-round interview.

11:51.270 --> 11:52.860
So that's pretty interesting,

11:52.860 --> 11:56.280
and "The X of Y" or "The X Y."

11:56.280 --> 11:58.200
So if the first word is an adjective

11:58.200 --> 12:00.720
or noun describing an important aspect

12:00.720 --> 12:02.790
and then that's related to the main topic.

12:02.790 --> 12:05.070
So the 10 commandments of salary negotiation,

12:05.070 --> 12:06.840
the ultimate guide to willingness-to-pay,

12:06.840 --> 12:09.300
the unconventional Palantir principles.

12:09.300 --> 12:10.620
And then "X for Y."

12:10.620 --> 12:13.350
So good strategy, bad strategy, okay.

12:13.350 --> 12:15.330
There's some interesting templates in here

12:15.330 --> 12:16.980
and you'll get slightly different ones

12:16.980 --> 12:18.690
if you run it again and again,

12:18.690 --> 12:22.410
but this is a pretty useful prompt I found

12:22.410 --> 12:25.641
just to get really good templates out of titles

12:25.641 --> 12:27.000
and it gives you a good sense

12:27.000 --> 12:30.540
of what are the types of things you should start looking at

12:30.540 --> 12:33.600
over time in order to hitch this blog

12:33.600 --> 12:36.780
or maybe just learn from what works for this blog

12:36.780 --> 12:37.680
because if they're using

12:37.680 --> 12:39.390
the same sort of template again and again,

12:39.390 --> 12:41.370
it's because it's working.

12:41.370 --> 12:42.630
The editor thinks it's working.

12:42.630 --> 12:45.300
Either they'd know and they're doing it consciously

12:45.300 --> 12:48.000
and they're going out finding other examples

12:48.000 --> 12:49.577
or they don't know,

12:49.577 --> 12:52.260
and it just happens to be successful

12:52.260 --> 12:54.720
and it's something you can learn from as well.

12:54.720 --> 12:57.870
So if you're pitching a publication to guest posts

12:57.870 --> 12:59.010
or if you're just trying to understand

12:59.010 --> 13:01.050
what makes a publication successful,

13:01.050 --> 13:02.850
these combination of prompts

13:02.850 --> 13:06.093
combined with some web scraping is really helpful.
