WEBVTT

00:00.690 --> 00:03.150
-: Okay, say you're designing a coffee shop,

00:03.150 --> 00:06.630
and you want to figure out what are the typical interiors

00:06.630 --> 00:07.800
for your competitors.

00:07.800 --> 00:09.120
So, you wanna understand what type

00:09.120 --> 00:12.030
of furniture they're using, whether they have plants,

00:12.030 --> 00:14.670
or whether there's a chalkboard on the wall

00:14.670 --> 00:16.590
for the menu, like this,

00:16.590 --> 00:18.270
and you want to do that analysis

00:18.270 --> 00:21.930
and see how often certain things come up versus not.

00:21.930 --> 00:24.930
So, one interesting way to do this

00:24.930 --> 00:28.350
is instead of manually looking through a bunch of images,

00:28.350 --> 00:31.260
you could tag them automatically with AI,

00:31.260 --> 00:35.070
because GPT Vision is actually pretty good at this now.

00:35.070 --> 00:38.880
What you need is you have to scrape Google Images,

00:38.880 --> 00:40.710
which is actually the hard part, (chuckles)

00:40.710 --> 00:43.370
but thankfully, there's a tool that does this for you,

00:43.370 --> 00:46.290
so if you just search for Google-Image-Scraper,

00:46.290 --> 00:49.590
yeah, literally just search for Google-Image-Scraper,

00:49.590 --> 00:51.780
and that was the first result,

00:51.780 --> 00:54.660
but however you get the data, it doesn't really matter,

00:54.660 --> 00:56.400
but I assure you it works.

00:56.400 --> 01:00.990
I just, this is in the main.py,

01:00.990 --> 01:04.380
so, when you download this to your computer from GitHub,

01:04.380 --> 01:06.780
then you have this code.

01:06.780 --> 01:10.260
You just put in here what are the different keywords

01:10.260 --> 01:11.130
that you wanna search for?

01:11.130 --> 01:13.500
I downloaded a bunch, and you can see a bunch

01:13.500 --> 01:18.500
of coffee shops in Beijing, coffee shops in Rome,

01:18.540 --> 01:20.700
and then also New York.

01:20.700 --> 01:22.650
We're gonna compare them, potentially.

01:22.650 --> 01:26.430
So, I'm gonna just run an abbreviated part of this analysis.

01:26.430 --> 01:29.550
I would say, by the way, it takes a while to download them,

01:29.550 --> 01:32.850
so, here, I downloaded a bunch, and it doesn't always work.

01:32.850 --> 01:37.260
So, you can see, you get 18 every time, 36 in Sydney,

01:37.260 --> 01:40.680
but once you literally just run Python main.py,

01:40.680 --> 01:43.680
you should see these images start appearing,

01:43.680 --> 01:45.210
but that's not the part I'm gonna teach you,

01:45.210 --> 01:47.850
because that is someone else's code,

01:47.850 --> 01:51.060
and I'll let them teach you that in the README,

01:51.060 --> 01:52.500
but what I'm gonna teach you is what to do

01:52.500 --> 01:54.030
when you have the data, right?

01:54.030 --> 01:58.380
This is a script for the GPT Vision API,

01:58.380 --> 02:00.990
and you just need to run these libraries here,

02:00.990 --> 02:02.940
but I have all the code written here

02:02.940 --> 02:06.417
to pull in what are the different coffee shops, right?

02:06.417 --> 02:08.850
You can see how many photos you've got,

02:08.850 --> 02:10.950
I just looked at the first five here.

02:10.950 --> 02:13.350
See, I've got 45 images from New York,

02:13.350 --> 02:15.420
got a bunch of Sydney, et cetera,

02:15.420 --> 02:18.840
so, what I'm gonna do is I'm just gonna

02:18.840 --> 02:21.360
just little bit just to, wait.

02:21.360 --> 02:24.750
Let's just do Sydney just for testing,

02:24.750 --> 02:27.540
because Vision API is expensive.

02:27.540 --> 02:29.430
If you're running it a bunch of times,

02:29.430 --> 02:30.877
it also takes a little bit of time,

02:30.877 --> 02:32.850
so I'm just gonna write it on these 33 images

02:32.850 --> 02:35.070
in Sydney, right?

02:35.070 --> 02:36.660
And there's two parts to this.

02:36.660 --> 02:38.100
One is inductive coding,

02:38.100 --> 02:40.260
and then the second is deductive coding.

02:40.260 --> 02:42.510
Inductive coding is when you're looking at these,

02:42.510 --> 02:43.343
and you look for patterns,

02:43.343 --> 02:46.860
and you compare them, you go, "Okay, that one has a coffee."

02:46.860 --> 02:50.010
You know, that one has a chalkboard for the menu,

02:50.010 --> 02:52.200
and then there's other ones that have a chalkboard

02:52.200 --> 02:53.490
for menus, as well,

02:53.490 --> 02:55.260
so, maybe that's something that...

02:55.260 --> 02:56.850
Oh, here we go, there's another chalkboard,

02:56.850 --> 02:58.380
so, I've noticed that pattern,

02:58.380 --> 03:01.140
and now that I know there's a pattern,

03:01.140 --> 03:02.430
I wanna find that everywhere.

03:02.430 --> 03:04.980
So, there's another chalkboard,

03:04.980 --> 03:07.440
there's another chalkboard, so, that's deductive coding.

03:07.440 --> 03:10.740
Inductive coding is coming up with the patterns,

03:10.740 --> 03:13.200
comparing and contrasting to find what's different

03:13.200 --> 03:14.580
or what's similar,

03:14.580 --> 03:16.770
and then deductive coding is then going

03:16.770 --> 03:19.830
and finding those patterns everywhere else.

03:19.830 --> 03:24.630
So, we're gonna do those two things here with GPT Vision.

03:24.630 --> 03:25.620
All right, so, the first thing we're doing

03:25.620 --> 03:26.730
is inductive coding,

03:26.730 --> 03:29.970
and we need to be able to send an image to OpenAI,

03:29.970 --> 03:32.700
so, the way that you send an image

03:32.700 --> 03:35.100
is with the Base64 encoding.

03:35.100 --> 03:37.110
This literally just takes an image,

03:37.110 --> 03:39.090
and then turns it into some text

03:39.090 --> 03:41.493
that you can send along with your API call.

03:42.360 --> 03:44.910
And all this code does is it just looks in the folder,

03:44.910 --> 03:47.460
and then it just chooses two images at random

03:47.460 --> 03:49.620
from the different folders,

03:49.620 --> 03:51.750
so, we just have the one folder right now,

03:51.750 --> 03:53.370
but if we have multiple folders,

03:53.370 --> 03:55.290
it would look at all of them.

03:55.290 --> 03:56.970
Oh, I did mention, I forget to mention one thing.

03:56.970 --> 03:59.160
When you're using a Google-Image-Scraper,

03:59.160 --> 04:00.870
you'll probably need to download,

04:00.870 --> 04:04.980
I had to download this mac-x64 Chrome for testing,

04:04.980 --> 04:09.060
but when it's creeping these images, you need that in there.

04:09.060 --> 04:11.220
Here, I'll just put us in a folder called webdriver,

04:11.220 --> 04:13.519
and then it opens that.

04:13.519 --> 04:17.010
I'm just mentioning it because I ran into it.

04:17.010 --> 04:19.350
All right, so, back to the inductive coding.

04:19.350 --> 04:20.550
So, it's choosing an image pair,

04:20.550 --> 04:22.620
it's choosing two images at random,

04:22.620 --> 04:25.050
and then it's comparing them.

04:25.050 --> 04:27.000
So, it's getting the Base64 string,

04:27.000 --> 04:29.257
and then it's using this prompt here.

04:29.257 --> 04:32.190
"Compare these two coffee shops using inductive coding.

04:32.190 --> 04:34.410
What features, attributes, or elements of design

04:34.410 --> 04:35.550
are similar or different?"

04:35.550 --> 04:38.250
And then I'm asking for a structured response here.

04:38.250 --> 04:41.010
That's just how you pass it into GPT Vision, by the way.

04:41.010 --> 04:44.010
You just client.chat.completions,

04:44.010 --> 04:45.780
and then you pass the prompt,

04:45.780 --> 04:48.570
but then you also pass in the two images, as well,

04:48.570 --> 04:50.850
in the same content string.

04:50.850 --> 04:52.590
I'm just using the user prompt here,

04:52.590 --> 04:55.770
but you could also have a system prompt at the top.

04:55.770 --> 04:58.290
So, just run this, see what we've got.

04:58.290 --> 05:00.810
So, it's selected two of these images at random,

05:00.810 --> 05:03.720
number 25 and number 31,

05:03.720 --> 05:06.030
and you can see it takes a little bit of time

05:06.030 --> 05:07.830
for the image response,

05:07.830 --> 05:10.410
as this is OpenAI's biggest model.

05:10.410 --> 05:12.960
Here we go, so now, we've got some labels

05:12.960 --> 05:14.820
that there's some things that they share.

05:14.820 --> 05:18.960
So, coffee shop A and B both have modern designs,

05:18.960 --> 05:20.970
they both have wooden elements.

05:20.970 --> 05:23.790
Only Coffee Shop B has natural light,

05:23.790 --> 05:26.730
and only coffee shop A has spacious interior, right?

05:26.730 --> 05:28.290
So, that's really interesting.

05:28.290 --> 05:30.897
Now, we have some labels, we have a dictionary of labels,

05:30.897 --> 05:33.510
and we can run that and then save it.

05:33.510 --> 05:38.510
So, if we look at, we should have here some cleaned labels.

05:39.180 --> 05:42.510
So, you can see, yeah, there's a bunch of labels here

05:42.510 --> 05:44.370
for these different things, and it's rewritten them.

05:44.370 --> 05:47.490
We've got the actual kind of folder that they're in,

05:47.490 --> 05:49.593
the label, and then the font.

05:50.430 --> 05:51.870
Cool, so now, you have your label dictionary,

05:51.870 --> 05:53.820
and that's useful in itself,

05:53.820 --> 05:55.890
because you could run that multiple times

05:55.890 --> 05:57.630
across multiple images,

05:57.630 --> 05:59.220
so, we just compared two of those images.

05:59.220 --> 06:01.620
You can compare every single image against each other.

06:01.620 --> 06:02.790
Just take a little bit of time.

06:02.790 --> 06:04.110
That, you can run again, as well,

06:04.110 --> 06:06.870
and get a bigger dictionary of data.

06:06.870 --> 06:10.410
Now, it comes to the deductive coding,

06:10.410 --> 06:13.080
and this is the one that's more time consuming,

06:13.080 --> 06:14.160
more expensive.

06:14.160 --> 06:17.774
The reason why I've set this up as asynchronous

06:17.774 --> 06:20.040
is 'cause otherwise it takes a very long time to run,

06:20.040 --> 06:22.140
'cause you're going through every single image, and saying,

06:22.140 --> 06:23.797
and every single label, and going,

06:23.797 --> 06:25.620
"Is this label in this image?

06:25.620 --> 06:27.660
Does this coffee shop have wooden elements?

06:27.660 --> 06:29.820
Does this coffee shop have wooden elements?

06:29.820 --> 06:31.800
Does this coffee shop have wooden elements," right?

06:31.800 --> 06:33.750
And then you're going through all the other labels as well.

06:33.750 --> 06:36.210
Does this coffee shop have suspended lighting in it?

06:36.210 --> 06:38.490
Does this coffee shop have seating areas?

06:38.490 --> 06:41.310
There are different tricks for doing this faster or slower,

06:41.310 --> 06:44.280
but for me, I found if I do asynchronous,

06:44.280 --> 06:48.870
means I can get all the labels at once for a single image,

06:48.870 --> 06:51.450
and they all, you know, run quite quickly,

06:51.450 --> 06:53.310
and so, it runs a little bit faster.

06:53.310 --> 06:56.070
So, again, here, we're just passing one image in,

06:56.070 --> 06:57.600
and we're just passing the user prompt,

06:57.600 --> 06:58.740
which is much simpler.

06:58.740 --> 07:01.230
If the label applies to this image, return one,

07:01.230 --> 07:04.320
otherwise return zero, only output one or zero.

07:04.320 --> 07:05.670
All right, if we get a one,

07:05.670 --> 07:08.400
then we are gonna return the label,

07:08.400 --> 07:11.760
and if we don't, then we're not, we're gonna return none.

07:11.760 --> 07:15.210
So, this what this does, this is the asynchronous code.

07:15.210 --> 07:16.800
We just set up a bunch of tasks,

07:16.800 --> 07:18.480
so, we run it for every image,

07:18.480 --> 07:20.430
and then we wait until all the tasks

07:20.430 --> 07:21.450
are done for that image,

07:21.450 --> 07:24.570
it's processing all the labels at the same time,

07:24.570 --> 07:28.080
and OpenAI can handle multiple API calls at the same time.

07:28.080 --> 07:30.210
What will happen is instead of waiting

07:30.210 --> 07:31.590
for each one to come back,

07:31.590 --> 07:33.510
it can process all of them at the same time.

07:33.510 --> 07:36.840
We're gonna get through, say, if we're doing 10 labels,

07:36.840 --> 07:38.790
we'll get through all 10 at the same time

07:38.790 --> 07:40.260
rather than waiting.

07:40.260 --> 07:42.990
If one takes a minute, we're not waiting 10 minutes,

07:42.990 --> 07:44.700
we're getting all of them in one minute,

07:44.700 --> 07:47.220
so, that's the benefit of asynchronous.

07:47.220 --> 07:49.170
Cool, and then we're just gonna return the labels

07:49.170 --> 07:52.320
if the, you know, if the label is in there, right?

07:52.320 --> 07:54.360
And then actually running deductive code,

07:54.360 --> 07:57.167
we just pull open the cleaned_labels.json

07:57.167 --> 07:58.110
that we had before,

07:58.110 --> 07:59.940
and then we have the cleaned labels

07:59.940 --> 08:03.300
which loads the JSON from that dictionary we had,

08:03.300 --> 08:05.040
and then it finds the values.

08:05.040 --> 08:07.950
I've also limited it to just five labels per coffee shop.

08:07.950 --> 08:10.560
Otherwise, it'd take a bit of time.

08:10.560 --> 08:13.110
Cool, all right, so, let me just run this,

08:13.110 --> 08:14.970
and we're processing coffeeshopsydney.

08:14.970 --> 08:16.230
You can see that all three of those

08:16.230 --> 08:18.150
came back at the same time,

08:18.150 --> 08:20.640
but some of them are taking a little bit longer,

08:20.640 --> 08:22.770
but we're not waiting for one to finish

08:22.770 --> 08:24.300
before we get the other.

08:24.300 --> 08:25.133
Cool, so, here we go.

08:25.133 --> 08:27.150
So now, we're processing coffeeshopsydney,

08:29.280 --> 08:32.010
and we'll see how quickly they come back.

08:32.010 --> 08:33.520
Yeah, this is Sydney30.

08:33.520 --> 08:36.330
Okay, now, we're processing Sydney10,

08:36.330 --> 08:37.950
we're getting hanging light fixtures.

08:37.950 --> 08:40.470
So, this is only coming back as positive

08:40.470 --> 08:43.473
if it does have that label, if that makes sense.

08:45.660 --> 08:49.950
Cool, that's where we'll move into analysis but with Vision.

08:49.950 --> 08:52.260
It's really powerful once you figure

08:52.260 --> 08:54.240
how you can apply this to pretty much anything,

08:54.240 --> 08:56.190
like, anytime you have a lot of images,

08:56.190 --> 08:59.520
like maybe in your ads, or on your website,

08:59.520 --> 09:02.640
you can do some analysis to figure out what labels,

09:02.640 --> 09:04.560
and you can define the labels, as well.

09:04.560 --> 09:07.680
You could tell it what sorts of things you're looking for,

09:07.680 --> 09:09.390
give it a list of things.

09:09.390 --> 09:11.100
However you want to do it, that's fine, too.

09:11.100 --> 09:12.420
You could just add new stuff

09:12.420 --> 09:14.220
to this cleaned labels thing, right?

09:14.220 --> 09:17.790
You could just add in some new ideas, right?

09:17.790 --> 09:19.860
Like, I wanna find...

09:19.860 --> 09:21.480
So, this is customers_queuing, right?

09:21.480 --> 09:23.820
I want to find out if there's a barista

09:23.820 --> 09:25.050
that has a tattoo, right?

09:25.050 --> 09:27.960
Like, you could just add that as another option.

09:27.960 --> 09:32.063
Copy, this one has tattooed barista, right?

09:35.160 --> 09:37.350
And it is really funny if the barista's present in this,

09:37.350 --> 09:39.150
but then we're gonna see if they have tattoos,

09:39.150 --> 09:40.260
visible tattoos.

09:40.260 --> 09:41.460
That's how this works,

09:41.460 --> 09:43.110
and then what you're gonna get back

09:43.110 --> 09:45.510
is you're gonna get a file name like this.

09:45.510 --> 09:49.770
This is just a big list of all the different labels, right?

09:49.770 --> 09:53.100
So, it says, coffeeshopnewyork42

09:53.100 --> 09:55.080
has the label, Street View.

09:55.080 --> 09:58.410
This one has the label, Patrons, present,

09:58.410 --> 10:01.110
and then you can do some analysis after this.

10:01.110 --> 10:03.690
Basically, figure out, okay, how many times,

10:03.690 --> 10:06.840
yeah, how many labels does this apply to?

10:06.840 --> 10:09.600
So, how many of the coffee shops has this label?

10:09.600 --> 10:10.860
I'll give you an example.

10:10.860 --> 10:12.240
I'll show you an example here,

10:12.240 --> 10:14.160
because that's still processing,

10:14.160 --> 10:16.080
because it's going through 33 images.

10:16.080 --> 10:19.193
Here's an example from New York that I ran earlier.

10:19.193 --> 10:23.340
So, you can get a list of all the labels into a data frame,

10:23.340 --> 10:25.380
this is like an Excel spreadsheet,

10:25.380 --> 10:27.810
and then you can print out what percentage

10:27.810 --> 10:30.510
of the total images have these different labels.

10:30.510 --> 10:35.130
So here, we're finding that 41% of the patrons,

10:35.130 --> 10:39.330
40% of the coffee shop images show patrons present.

10:39.330 --> 10:41.760
So, to me, you wanna really pay attention

10:41.760 --> 10:44.100
to the ones that are on almost everything,

10:44.100 --> 10:46.200
'cause that's almost like a rule of the game.

10:46.200 --> 10:48.900
If you don't have patrons present in your coffee shop image,

10:48.900 --> 10:50.160
that's probably really bad,

10:50.160 --> 10:53.070
and you definitely want to at least have a stance on that.

10:53.070 --> 10:54.300
Like, maybe have a good reason

10:54.300 --> 10:56.400
for not having patrons present in the image

10:56.400 --> 10:59.160
of your coffee shop when you're uploading that image

10:59.160 --> 11:01.620
to, you know, Google my business,

11:01.620 --> 11:03.390
but you should at least address that.

11:03.390 --> 11:05.880
Have a strategic choice here.

11:05.880 --> 11:09.150
Similarly, for things that are not very popular,

11:09.150 --> 11:10.680
you also might want to think about these

11:10.680 --> 11:11.760
as a differentiator.

11:11.760 --> 11:15.120
So, exposed brick interior, you could argue

11:15.120 --> 11:17.190
that that's not very fashionable anymore,

11:17.190 --> 11:19.530
and that's why only 10% of people have had it,

11:19.530 --> 11:22.590
or you could say, "This is gonna be fashionable.

11:22.590 --> 11:24.390
I can see that this is a trend,

11:24.390 --> 11:27.090
so, I'm gonna start using this," right?

11:27.090 --> 11:30.000
That's how you do this type of analysis.

11:30.000 --> 11:31.590
Obviously, what it doesn't tell you is, like,

11:31.590 --> 11:34.950
which choices to make and what designs to have,

11:34.950 --> 11:37.587
but if you're, you know, not a designer like me,

11:37.587 --> 11:40.830
and you want to build a bit more data to make your decision,

11:40.830 --> 11:42.270
this can help inform your decision,

11:42.270 --> 11:45.540
give you a map of what are the different creative decisions

11:45.540 --> 11:48.900
you need to make when you're doing something designed

11:48.900 --> 11:50.490
or visual, right?

11:50.490 --> 11:51.750
Hopefully that's useful.

11:51.750 --> 11:54.330
This is my little hack for being a data guy,

11:54.330 --> 11:56.850
but also playing in the world of branding.

11:56.850 --> 11:58.550
Hopefully, you find it useful too.
