WEBVTT

00:00.000 --> 00:03.139
So attached to this video will be a link to a CSV file.

00:03.139 --> 00:05.310
Make sure to download that file.

00:05.310 --> 00:07.470
It's called students.csv.

00:07.470 --> 00:08.640
If we have a look at this,

00:08.640 --> 00:10.510
and I'm gonna go and open this up,

00:10.510 --> 00:12.240
you can see we have some data about students.

00:12.240 --> 00:14.700
So for example, the first name, the last name,

00:14.700 --> 00:17.670
the gender of a student, the age, the grade level,

00:17.670 --> 00:20.580
and some other things such as what type of degree they took,

00:20.580 --> 00:21.810
their attendance rates,

00:21.810 --> 00:24.360
their extracurricular activities, et cetera.

00:24.360 --> 00:26.640
We're gonna use ChatGPT to analyze this dataset

00:26.640 --> 00:28.200
rather than having to use Excel.

00:28.200 --> 00:29.760
Cool, so the first thing is once you've got

00:29.760 --> 00:31.170
that downloaded, go to ChatGPT.

00:31.170 --> 00:33.030
I want you to click on this plus sign,

00:33.030 --> 00:36.375
click Upload from computer, go to students.csv.

00:36.375 --> 00:40.650
You can also drag the CSV or any file directly

00:40.650 --> 00:42.360
into the chat box as well.

00:42.360 --> 00:43.740
That's also possible.

00:43.740 --> 00:44.910
Then we're gonna say,

00:44.910 --> 00:48.510
can you find out how many students

00:48.510 --> 00:50.380
there are in this dataset?

00:56.187 --> 00:57.330
And what it will do

00:57.330 --> 01:00.450
is ChatGPT will analyze and write code

01:00.450 --> 01:02.310
and you'll see if you click on this Analyzing,

01:02.310 --> 01:03.840
you can see that it's actually going

01:03.840 --> 01:06.570
and generating some code in Python

01:06.570 --> 01:08.310
and it's returning a result.

01:08.310 --> 01:11.310
And then it shows you there are 50 students in the dataset.

01:11.310 --> 01:14.970
So ChatGPT has a way of being able to run code for you

01:14.970 --> 01:17.160
in Python, execute that code,

01:17.160 --> 01:19.290
and then return the answer of that to you

01:19.290 --> 01:20.490
in natural language.
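
To make that concrete, here is a minimal sketch of the kind of code that could sit behind a "how many students" answer. It is not the code ChatGPT actually generated. The sample rows and column names are hypothetical stand-ins for students.csv, and it uses Python's built-in csv module so it runs without installing anything.

```python
import csv
import io

# Hypothetical stand-in for students.csv; the real columns and rows may differ.
SAMPLE_CSV = """first_name,last_name,gender,age,major,attendance_rate
Ava,Nguyen,Female,19,Math,96.5
Liam,Patel,Male,20,Physics,88.0
Noah,Kim,Male,21,Math,91.5
Mia,Lopez,Female,19,History,84.0
"""

def count_students(csv_text: str) -> int:
    """Return the number of data rows (one per student), excluding the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for _ in reader)

print(f"There are {count_students(SAMPLE_CSV)} students in the dataset.")
```

With the real file you would open students.csv instead of the in-memory sample, and the count would come from however many rows that file holds.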

01:20.490 --> 01:24.720
We could say how many males versus females

01:24.720 --> 01:26.823
are there in the CSV?

01:29.220 --> 01:31.290
And again, it's running some Python code.

01:31.290 --> 01:33.150
You don't always have to show the details of this.

01:33.150 --> 01:35.790
You can hide it, but I quite like to see exactly

01:35.790 --> 01:36.870
what's been happening.

01:36.870 --> 01:38.970
So you'll see here it actually ran into an error.

01:38.970 --> 01:41.220
See, there's this KeyError: 'Gender',

01:41.220 --> 01:43.554
and then it's decided to print out the columns

01:43.554 --> 01:46.170
so that it can specifically figure out

01:46.170 --> 01:47.820
which columns it needs to query.

01:47.820 --> 01:50.250
It then creates a new specific piece of code.

01:50.250 --> 01:51.083
And you can see here,

01:51.083 --> 01:54.090
it is doing a .value_counts() on the gender column,

01:54.090 --> 01:55.320
which allows it to find out

01:55.320 --> 01:58.410
that there are 25 females and 25 males.
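
The recover-from-a-KeyError pattern described here can be sketched roughly like this. It is an illustration under assumptions: the column is assumed to be a lower-case "gender" (the transcript never shows the real name), the sample rows are made up, and collections.Counter stands in for the value count so the sketch needs only the standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the column is "gender" in lower case, not "Gender".
SAMPLE_CSV = """first_name,gender
Ava,Female
Liam,Male
Noah,Male
Mia,Female
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

def count_values(rows, column):
    """Tally how often each value appears in the given column."""
    return Counter(row[column] for row in rows)

try:
    counts = count_values(rows, "Gender")  # first guess at the column name
except KeyError:
    # The guess was wrong, so list the real column names and pick the match.
    columns = list(rows[0].keys())
    print("Available columns:", columns)
    actual = next(c for c in columns if c.lower() == "gender")
    counts = count_values(rows, actual)

print(dict(counts))
```

The useful habit to take from this is the middle step: when a column lookup fails, print the actual column names before guessing again.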

01:58.410 --> 02:01.323
If we scroll up and have a look at the dataset,

02:02.640 --> 02:06.060
we could also say which major

02:06.060 --> 02:09.423
has the highest attendance rate?

02:12.210 --> 02:13.500
So now it's deciding to group it

02:13.500 --> 02:17.160
by the major and it's using the attendance rate

02:17.160 --> 02:18.120
and getting the mean of that.

02:18.120 --> 02:20.850
And then it finds the one that has the highest average.
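
As a rough sketch of that group-then-average step, assuming hypothetical column names "major" and "attendance_rate" and made-up sample rows, the logic could look like this using only the standard library:

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample; real column names and values may differ.
SAMPLE_CSV = """major,attendance_rate
Math,96.5
Physics,88.0
Math,91.5
History,84.0
"""

def mean_attendance_by_major(csv_text):
    """Group attendance rates by major and return each major's mean."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["major"]].append(float(row["attendance_rate"]))
    return {major: mean(rates) for major, rates in groups.items()}

averages = mean_attendance_by_major(SAMPLE_CSV)
# Pick the major whose mean attendance rate is largest.
top_major = max(averages, key=averages.get)
print(top_major, averages[top_major])
```

The order matters: the rates are averaged within each major first, and only then is the largest of those averages picked out.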

02:20.850 --> 02:23.100
So math had the highest attendance rate,

02:23.100 --> 02:23.940
and so you can see

02:23.940 --> 02:26.520
that we don't necessarily need to put this data

02:26.520 --> 02:28.110
directly into Microsoft Excel.

02:28.110 --> 02:30.600
We can just put a CSV directly

02:30.600 --> 02:32.730
inside of ChatGPT and start asking it

02:32.730 --> 02:34.080
natural language questions.

02:34.080 --> 02:35.400
We can even plot bar charts.

02:35.400 --> 02:36.750
So I can say, can you plot

02:36.750 --> 02:41.750
a bar chart of attendance rates, broken down by major?

02:46.320 --> 02:48.750
And now it's decided to use a plotting library.

02:48.750 --> 02:49.980
So this is just a piece of code

02:49.980 --> 02:52.953
that allows it to generate graphs dynamically.

02:54.420 --> 02:55.830
So we'll just wait for that to load.

02:55.830 --> 02:57.690
And here you go, here's your bar chart,

02:57.690 --> 03:00.210
and you can easily download this or change this.

03:00.210 --> 03:01.950
You can click on this Download chart

03:01.950 --> 03:04.080
and then you can download that directly to your desktop.

03:04.080 --> 03:05.580
You can also right-click,

03:05.580 --> 03:08.100
and then Save Image As if you're on Mac.

03:08.100 --> 03:10.890
So it's possible for you to not only ask questions

03:10.890 --> 03:11.820
in natural language,

03:11.820 --> 03:13.800
but get automatic data visualizations

03:13.800 --> 03:16.200
made by ChatGPT.

03:16.200 --> 03:17.033
As an exercise,

03:17.033 --> 03:19.200
I want you to spend two minutes asking ChatGPT

03:19.200 --> 03:21.540
some follow-up questions on this dataset,

03:21.540 --> 03:22.860
and also playing around with

03:22.860 --> 03:25.980
what kind of data visualizations you can get it to create.

03:25.980 --> 03:27.730
Cool, I'll see you in the next one.
