WEBVTT

00:00.030 --> 00:01.440
Earlier on in the ChatGPT section,

00:01.440 --> 00:03.720
we had a look at how we can easily generate images

00:03.720 --> 00:07.050
from prompts and also by using reference images.

00:07.050 --> 00:10.770
Now, I want to point out here that ChatGPT Vision

00:10.770 --> 00:12.930
also has reasoning capabilities.

00:12.930 --> 00:14.130
So for example, we can put in

00:14.130 --> 00:16.050
the original portrait photo of me

00:16.050 --> 00:17.520
and we can ask a question like,

00:17.520 --> 00:20.610
how many fingers am I holding up?

00:20.610 --> 00:24.000
So ChatGPT can not only generate images from other images,

00:24.000 --> 00:25.350
but it can also reason about

00:25.350 --> 00:28.350
what is actually in a specific image.

00:28.350 --> 00:30.180
So now ChatGPT is analyzing

00:30.180 --> 00:31.920
the image, and it tells us

00:31.920 --> 00:35.130
that I'm holding up two fingers in a classic peace sign.

00:35.130 --> 00:36.330
Let's also try an image from Google.

00:36.330 --> 00:37.290
So I'm gonna go and get

00:37.290 --> 00:39.180
an image of several men working

00:39.180 --> 00:41.160
on the road and see if it can identify

00:41.160 --> 00:44.400
both how many people are in this image and their gender.

00:44.400 --> 00:45.930
So then I'm gonna go back to ChatGPT.

00:45.930 --> 00:48.930
I'm gonna click on the plus sign, then Upload from computer,

00:48.930 --> 00:50.580
go into my Downloads folder,

00:50.580 --> 00:52.960
and then I'm gonna ask, how many people

00:52.960 --> 00:57.873
are in this image, and what gender are they?

01:01.403 --> 01:03.210
Okay, so ChatGPT has worked out

01:03.210 --> 01:05.070
that there are two people in the image.

01:05.070 --> 01:07.500
It can't reliably identify their gender,

01:07.500 --> 01:10.470
but it has been able to count the two people.

01:10.470 --> 01:12.690
We could also ask ChatGPT to give us this

01:12.690 --> 01:14.310
in a more structured data format.

01:14.310 --> 01:17.640
So I could say, could you output this

01:17.640 --> 01:21.660
as a CSV with probabilities for male

01:21.660 --> 01:25.490
versus female and the number of people identified?

01:27.660 --> 01:29.700
And then we get structured data directly

01:29.700 --> 01:32.700
out of ChatGPT, just by having it reason

01:32.700 --> 01:33.780
over those images.

01:33.780 --> 01:36.060
So again, you can see here that it's estimated

01:36.060 --> 01:38.190
the probability of each person being male

01:38.190 --> 01:41.520
versus the probability of them being female.

01:41.520 --> 01:42.810
And then it generates

01:42.810 --> 01:44.430
a CSV file and gives us

01:44.430 --> 01:47.250
a link to download that CSV here.

01:47.250 --> 01:48.900
So now I've got some structured data.

01:48.900 --> 01:50.640
There's a Person ID for each person,

01:50.640 --> 01:52.278
the estimated gender is listed as unknown,

01:52.278 --> 01:55.800
but the probability of male is quite high.

01:55.800 --> 01:56.790
And there you go.

01:56.790 --> 01:58.920
Okay, so I'd recommend having a play around with this.

01:58.920 --> 02:00.690
Maybe add in a couple of images

02:00.690 --> 02:04.080
and try to get some structured data out of these images.

02:04.080 --> 02:06.780
That could include identifying different types of objects,

02:06.780 --> 02:09.360
people, places, or emotions.

02:09.360 --> 02:11.110
Cool, I'll see you in the next one.
