WEBVTT

00:00.030 --> 00:00.863
-: Okay, so great.

00:00.863 --> 00:03.480
So you've got billing enabled, you have an API key,

00:03.480 --> 00:06.600
and you're ready to get started using OpenAI's API.

00:06.600 --> 00:08.880
What I want you to do is make sure you've downloaded

00:08.880 --> 00:10.800
and cloned to the GitHub repository.

00:10.800 --> 00:11.633
Then I want you to go

00:11.633 --> 00:15.270
to the openai_features_and_functionality folder.

00:15.270 --> 00:18.300
Then click on core_features_walkthrough.

00:18.300 --> 00:20.250
We're gonna go through and just walk through a couple

00:20.250 --> 00:21.360
of the core features.

00:21.360 --> 00:22.860
I'm not gonna write any code,

00:22.860 --> 00:24.420
but what we're gonna do is just go through

00:24.420 --> 00:26.100
each of the core services

00:26.100 --> 00:28.440
and look at what's happening roughly,

00:28.440 --> 00:29.820
and then in the future lessons,

00:29.820 --> 00:32.160
we'll dive into each individual service

00:32.160 --> 00:33.480
in a bit more detail.

00:33.480 --> 00:35.370
So installing a bunch of packages,

00:35.370 --> 00:39.270
we're then importing the os module and OpenAI module.

00:39.270 --> 00:42.390
Then we are creating a global variable for the model.

00:42.390 --> 00:46.500
In our case, we're gonna be using gpt-4.1-mini

00:46.500 --> 00:49.050
for most of the core features.

00:49.050 --> 00:51.180
Now, you will need to put your API key in here,

00:51.180 --> 00:53.610
so I'll leave that up to you to do that.

00:53.610 --> 00:55.200
The first thing we're gonna have a look at is,

00:55.200 --> 00:59.010
how do you generate text directly from OpenAI's API

00:59.010 --> 01:01.200
using the responses API?

01:01.200 --> 01:02.640
You'll see we have a prompt.

01:02.640 --> 01:06.270
So, "Write a one-sentence bedtime story about a unicorn."

01:06.270 --> 01:09.060
We then create our client.responses.create,

01:09.060 --> 01:11.880
we put in the model that we want to use and the input.

01:11.880 --> 01:15.420
And what you'll get back from that is this .output_text.

01:15.420 --> 01:18.840
You can also as well control the model's parameters

01:18.840 --> 01:21.180
by adding things like a temperature.

01:21.180 --> 01:24.450
So the lower will give you more deterministic outputs

01:24.450 --> 01:26.910
or very similar outputs again and again

01:26.910 --> 01:30.090
and a higher temperature will give you more random outputs.

01:30.090 --> 01:31.920
And you can go and run this cell

01:31.920 --> 01:34.080
and you'll see you get two responses here.

01:34.080 --> 01:36.540
The prompt for each one and the response.

01:36.540 --> 01:40.200
And all we're doing is using that response.output_text

01:40.200 --> 01:42.030
and that will give you some text

01:42.030 --> 01:45.000
specifically from the responses API.

01:45.000 --> 01:46.980
If we're having a look at the second example,

01:46.980 --> 01:49.350
we're using something called structured outputs.

01:49.350 --> 01:54.300
So you'll see we have the same client.responses.create,

01:54.300 --> 01:57.900
we have our model, the input is we have a system message,

01:57.900 --> 02:01.140
which remember is an instructional developer message

02:01.140 --> 02:05.580
that you can put into the start of a conversation

02:05.580 --> 02:08.130
and we're telling it that it's a generator AI,

02:08.130 --> 02:10.500
convert the user's input into a UI,

02:10.500 --> 02:13.080
and we're saying make a user profile form.

02:13.080 --> 02:16.230
Now, we've got here is something called JSON Schema.

02:16.230 --> 02:19.170
JSON Schema is a way for us to tell the model

02:19.170 --> 02:21.600
what we want the structure of that data to be

02:21.600 --> 02:24.480
when it comes back from the model's response.

02:24.480 --> 02:26.790
You'll see we have this type of JSON Schema,

02:26.790 --> 02:30.270
the name, the description, and we have the schema itself,

02:30.270 --> 02:31.770
which is the JSON schema

02:31.770 --> 02:34.080
from this bit all the way down to here.

02:34.080 --> 02:36.930
And we also have this strict is True.

02:36.930 --> 02:39.030
Now, if we go and run this,

02:39.030 --> 02:41.760
then what we get back comes from a string.

02:41.760 --> 02:43.710
So this output text is a string,

02:43.710 --> 02:45.360
and we're then loading that in

02:45.360 --> 02:47.970
directly from the json.loads package.

02:47.970 --> 02:50.460
And you'll now see that we have some structured data

02:50.460 --> 02:52.830
that's come out from OpenAI's API.

02:52.830 --> 02:54.390
This can be also done using

02:54.390 --> 02:57.300
a data validation package called pydantic.

02:57.300 --> 02:59.070
So we can import a base model,

02:59.070 --> 03:01.950
we can set up various models such as the step

03:01.950 --> 03:03.270
where we have an explanation

03:03.270 --> 03:06.660
and an output which have data types of string.

03:06.660 --> 03:09.480
We have some math reasoning which has a list of steps

03:09.480 --> 03:10.890
and a final answer.

03:10.890 --> 03:12.270
And then we can use, for example,

03:12.270 --> 03:14.280
the chat completions endpoint

03:14.280 --> 03:18.000
with the client.beta.chat.completions.parse.

03:18.000 --> 03:21.150
We pass in the model, the messages, and the model.

03:21.150 --> 03:24.063
And then we also put the response format as well.

03:25.380 --> 03:26.640
And what you'll see if you run this

03:26.640 --> 03:29.400
is you get a structured pydantic object.

03:29.400 --> 03:30.960
If I do math reasoning,

03:30.960 --> 03:35.490
what you can see is we can access this .parse

03:35.490 --> 03:37.710
and then we have a maths reasoning object

03:37.710 --> 03:39.480
where we can do the .steps

03:39.480 --> 03:41.610
and we can see here are all the different steps

03:41.610 --> 03:43.260
that the model decided to take

03:43.260 --> 03:46.530
when reasoning over that type of math problem.

03:46.530 --> 03:49.320
We can go in and have a look at the first one, for example.

03:49.320 --> 03:51.540
It's also possible to get ChatGPT

03:51.540 --> 03:53.490
to put an image in here.

03:53.490 --> 03:56.970
And then we will use the responses API with this input.

03:56.970 --> 03:59.370
And we're putting in a role of user

03:59.370 --> 04:01.177
and we're saying the content is,

04:01.177 --> 04:02.520
"What is in this image,"

04:02.520 --> 04:05.730
and we have this type input_image and the image_url.

04:05.730 --> 04:07.740
And then when you run that block of code,

04:07.740 --> 04:11.460
what you'll see is ChatGPT will analyze in real time

04:11.460 --> 04:13.830
what is inside of that image.

04:13.830 --> 04:15.450
You also have text to speech.

04:15.450 --> 04:16.283
So we have this

04:16.283 --> 04:19.950
client.audio.speech.with_streaming_response.create.

04:19.950 --> 04:24.950
We have a different model for this called gpt-4o-mini-tts.

04:24.960 --> 04:27.330
We choose a voice that we're interested in

04:27.330 --> 04:29.100
and what text we would like

04:29.100 --> 04:31.770
to be converted into an MP3 file.

04:31.770 --> 04:34.110
And then what we do is have some instructions

04:34.110 --> 04:37.470
that help to standardize how that voice sounds.

04:37.470 --> 04:40.230
Then we use this response.stream_to_file

04:41.790 --> 04:44.790
and we provide a file path of speech.mp3.

04:44.790 --> 04:46.650
You'll then see if you look on the right,

04:46.650 --> 04:49.500
you then have a speech file which you can then play.

04:49.500 --> 04:52.110
AI: It's a wonderful day to build something people love.

04:52.110 --> 04:52.943
-: That's awesome.

04:52.943 --> 04:57.420
And we also have the ability when we have an MP3

04:57.420 --> 05:01.260
to convert that MP three directly back into transcription.

05:01.260 --> 05:03.540
So you'll see that we're just using the file

05:03.540 --> 05:05.010
that we just already made.

05:05.010 --> 05:06.360
And then we can take that file

05:06.360 --> 05:09.510
and you can see we can easily convert that back into text.

05:09.510 --> 05:12.420
The next concept is this idea of function calling.

05:12.420 --> 05:14.820
So what you can have is a function.

05:14.820 --> 05:16.800
You can set up a bunch of tools.

05:16.800 --> 05:19.350
So in this scenario we only have a Python list

05:19.350 --> 05:20.640
with one tool.

05:20.640 --> 05:24.300
We then create an input message, which is a role of user

05:24.300 --> 05:26.460
and has the content of, "What's the weather

05:26.460 --> 05:27.630
like in Paris today?"

05:27.630 --> 05:32.250
And we can tell ChatGPT that it has access to these tools.

05:32.250 --> 05:34.740
And what you'll see is ChatGPT,

05:34.740 --> 05:37.923
if we look at the response.output,

05:39.510 --> 05:42.930
you'll see that it decided it wants to call a tool,

05:42.930 --> 05:45.240
and then you as the developer will get access

05:45.240 --> 05:46.740
to the tool call,

05:46.740 --> 05:49.200
the arguments that it wants to call it with,

05:49.200 --> 05:51.420
which will be under the .arguments

05:51.420 --> 05:54.120
under this output[0],

05:54.120 --> 05:56.130
and you can see here these are the arguments

05:56.130 --> 05:58.950
that it wants to use, the latitude and longitude.

05:58.950 --> 06:00.660
And then we call the function.

06:00.660 --> 06:02.490
And after calling the function,

06:02.490 --> 06:04.830
we add onto the input messages

06:04.830 --> 06:07.110
the original tool call that it wanted to make,

06:07.110 --> 06:09.570
and also the result of that tool call.

06:09.570 --> 06:11.973
Then we can call ChatGPT again.

06:14.460 --> 06:17.137
After calling it again, it will then respond in text,

06:17.137 --> 06:20.310
"The current weather in Paris is 16 degrees Celsius."

06:20.310 --> 06:23.190
So this basically allows your AI models

06:23.190 --> 06:24.870
to have access to functions

06:24.870 --> 06:28.410
so that they can do work outside of textual environments.

06:28.410 --> 06:30.450
The next thing we have is reasoning models.

06:30.450 --> 06:33.030
So you can use a more powerful model

06:33.030 --> 06:35.400
if you're working on a very complicated task.

06:35.400 --> 06:36.450
The only difference here is

06:36.450 --> 06:38.100
that we've just switched the model name

06:38.100 --> 06:39.930
and we're using o4-mini here.

06:39.930 --> 06:41.970
Now, I will say this will take a little bit longer

06:41.970 --> 06:45.120
to come back, and you won't get immediate responses.

06:45.120 --> 06:48.720
So depending upon how fast you need to respond to your users

06:48.720 --> 06:50.790
and what cost you need to do,

06:50.790 --> 06:53.100
you might want to have a look at the trade-offs

06:53.100 --> 06:54.720
between using reasoning models

06:54.720 --> 06:56.730
versus using standard chat models.

06:56.730 --> 07:00.690
You can see this took around 15.5 seconds to come back.

07:00.690 --> 07:03.000
We also have this idea called embeddings,

07:03.000 --> 07:05.850
which is basically a numerical representation

07:05.850 --> 07:07.200
of what text is.

07:07.200 --> 07:09.150
You can see we have a bunch of words

07:09.150 --> 07:12.930
and then we can have a client.embeddings.create,

07:12.930 --> 07:15.540
choosing a embedding model that we want to pick,

07:15.540 --> 07:19.590
and the words that we want to turn into numerical numbers.

07:19.590 --> 07:21.360
We then extract the embeddings

07:21.360 --> 07:23.400
for each individual piece of embedding.

07:23.400 --> 07:26.610
We then have a function to calculate the .product,

07:26.610 --> 07:29.580
and we basically then create a similarity matrix

07:29.580 --> 07:32.100
for all of the .products between all of the numbers

07:32.100 --> 07:33.630
that get created.

07:33.630 --> 07:35.790
And if you run this, for example,

07:35.790 --> 07:38.580
you will see exactly what's happening here,

07:38.580 --> 07:41.550
where we have a similarity matrix using the .product,

07:41.550 --> 07:44.040
which is the cosine similarity,

07:44.040 --> 07:49.040
and each embedding has around 3,072 dimensions.

07:49.530 --> 07:50.820
There are 10 embeddings,

07:50.820 --> 07:53.640
and this is the similarity matrix between all of them.

07:53.640 --> 07:54.840
You can see, for example,

07:54.840 --> 07:57.810
so king and king is obviously completely the same,

07:57.810 --> 08:01.680
king and man has a 0.41 similarity.

08:01.680 --> 08:06.680
You'll see if we look at, for example, banana versus orange,

08:06.690 --> 08:09.030
that has a 45 similarity.

08:09.030 --> 08:11.910
It has quite a good one with apple as well,

08:11.910 --> 08:13.110
orange and apple,

08:13.110 --> 08:16.050
but it has less similarity against men and women,

08:16.050 --> 08:19.080
and therefore you can use this to see what type

08:19.080 --> 08:22.800
of similar words, you know, are similar to other phrases.

08:22.800 --> 08:27.210
So you can see king and queen are similar with 0.55

08:27.210 --> 08:30.450
versus apple and banana that has a lower similarity.

08:30.450 --> 08:33.690
King and apple even, again, has a lower similarity.

08:33.690 --> 08:35.190
So embeddings are useful

08:35.190 --> 08:37.560
for when we're doing different types of work

08:37.560 --> 08:40.710
where we need to go beyond just using the text.

08:40.710 --> 08:44.040
This is where we will turn the text into a numerical number

08:44.040 --> 08:46.500
and then when we use that for downstream tasks.

08:46.500 --> 08:48.630
Cool, I know this is just a whistle-stop tour

08:48.630 --> 08:51.390
and we're gonna go into more of the features in detail,

08:51.390 --> 08:53.460
but I just thought I have one notebook

08:53.460 --> 08:55.770
that kind of summarizes all of the core functionality

08:55.770 --> 08:57.450
that I would recommend looking at

08:57.450 --> 08:59.433
when using OpenAI's platform.
