WEBVTT

00:00.060 --> 00:00.893
-: Right, welcome back.

00:00.893 --> 00:02.130
So in this video, you're gonna have a look

00:02.130 --> 00:03.660
at how you can count tokens

00:03.660 --> 00:06.180
when you're using the OpenAI API,

00:06.180 --> 00:09.270
and we're gonna be using a package called tiktoken.

00:09.270 --> 00:11.700
In your GitHub repository, I want you to open

00:11.700 --> 00:15.120
the how_to_count_tokens_using_tiktoken notebook.

00:15.120 --> 00:16.770
And we're gonna just go through

00:16.770 --> 00:17.910
and run a couple of the cells

00:17.910 --> 00:19.680
and get familiar with

00:19.680 --> 00:21.270
what this Python package is.

00:21.270 --> 00:23.070
And there's a lot more information if you wanna go

00:23.070 --> 00:26.010
into the depths of how tiktoken actually works.

00:26.010 --> 00:28.920
But essentially, tiktoken is a Python package

00:28.920 --> 00:32.310
that we can use and it is a tokenizer

00:32.310 --> 00:35.460
that will take larger pieces of text, such as sentences

00:35.460 --> 00:38.280
and documents and convert those into tokens.

00:38.280 --> 00:40.080
The first thing you're gonna need to do is you're going

00:40.080 --> 00:43.470
to need to install the tiktoken and openai packages.

00:43.470 --> 00:44.880
And then what we're gonna do

00:44.880 --> 00:48.270
is we're gonna import the tiktoken package.

00:48.270 --> 00:50.100
After you've imported tiktoken,

00:50.100 --> 00:53.250
then you can use a specific type of encoding.

00:53.250 --> 00:54.780
And there's different types of encoding

00:54.780 --> 00:56.580
for different types of models.

00:56.580 --> 01:00.343
The one that you'll find is often used is the cl100k_base.

01:02.370 --> 01:05.643
You can then use the tiktoken.encoding_for_model,

01:06.810 --> 01:09.660
and if you'll see here, you're then able

01:09.660 --> 01:11.970
to do a .encode

01:11.970 --> 01:14.640
and you'll see how many tokens this piece

01:14.640 --> 01:18.750
of text becomes for gpt-4o-mini.

01:18.750 --> 01:22.500
So we end up having 1, 2, 3, 4, 5, 6 tokens

01:22.500 --> 01:24.570
for that "tiktoken is great!" string.

01:24.570 --> 01:28.680
So you can use this tiktoken.encoding_for_model

01:28.680 --> 01:30.240
by giving it the model name.

01:30.240 --> 01:33.270
And then you will use the .encode method

01:33.270 --> 01:35.910
to figure out exactly what those tokens are.

01:35.910 --> 01:38.520
You can also then have a helper function

01:38.520 --> 01:41.490
where you take in a string and an encoding_name

01:41.490 --> 01:42.840
and you do something like this,

01:42.840 --> 01:44.700
and then you're just counting the tokens

01:44.700 --> 01:46.830
that get returned from that .encode.

01:46.830 --> 01:49.350
So you can see here, we're gonna just call this here

01:49.350 --> 01:51.603
and you've got six tokens.
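
NOTE
A minimal sketch of the helper described above, not the notebook's exact code.
The optional `encoding` argument is an assumption added here so the function
can be tried with a stand-in encoder; without it, tiktoken is loaded by name.
```python
def num_tokens_from_string(string, encoding_name="cl100k_base", encoding=None):
    """Return the number of tokens in `string`."""
    if encoding is None:
        # Lazy import; install first with `pip install tiktoken`.
        import tiktoken
        encoding = tiktoken.get_encoding(encoding_name)
    # .encode returns a list of integer token ids, so its length is the count.
    return len(encoding.encode(string))
```
With tiktoken installed, num_tokens_from_string("tiktoken is great!", "cl100k_base")
should give the six tokens counted in the video.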

01:52.710 --> 01:56.790
You can also turn tokens back into words

01:56.790 --> 01:59.850
by decoding them using that same encoder.

01:59.850 --> 02:02.940
And as well as that, you can also get the bytes

02:02.940 --> 02:04.440
for each single token.

02:04.440 --> 02:07.080
And you can see here what that specific token is

02:07.080 --> 02:08.583
after the b prefix.
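
NOTE
A small sketch of the decoding step, assuming `encoding` is a tiktoken encoding
such as tiktoken.get_encoding("cl100k_base"). The function name `show_tokens`
is hypothetical; encode, decode and decode_single_token_bytes are tiktoken methods.
```python
def show_tokens(text, encoding):
    """Return (token ids, round-tripped text, bytes of each token)."""
    ids = encoding.encode(text)
    # decode turns the whole list of ids back into a string.
    round_trip = encoding.decode(ids)
    # decode_single_token_bytes returns the raw bytes behind one token id,
    # which Python prints with a leading b, for example b'great'.
    pieces = [encoding.decode_single_token_bytes(i) for i in ids]
    return ids, round_trip, pieces
```
Looking at the bytes per token is often safer than decoding tokens one by one
as text, because a single token may not be a complete UTF-8 character.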

02:09.540 --> 02:11.610
We can look at different types of encoding

02:11.610 --> 02:14.640
and you can see this is how you could compare encodings.
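
NOTE
One possible way to compare encodings, as a sketch. `compare_encodings` is a
hypothetical name, and `encodings` is assumed to be a dict of name to encoder,
for example {n: tiktoken.get_encoding(n) for n in ("cl100k_base", "o200k_base")}.
```python
def compare_encodings(text, encodings):
    """Map each encoding name to (token count, token ids) for `text`."""
    results = {}
    for name, enc in encodings.items():
        ids = enc.encode(text)
        results[name] = (len(ids), ids)
    return results
```
The same text can split into a different number of tokens under each encoding,
which is why the count depends on the model you target.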

02:14.640 --> 02:15.570
We're gonna now look

02:15.570 --> 02:19.080
at how you could count tokens for a chat completions call

02:19.080 --> 02:21.830
using a helper function called num_tokens_from_messages

02:22.890 --> 02:26.040
where we pass in some messages, we pass in a model name,

02:26.040 --> 02:28.380
we try and get that encoding_for_model,

02:28.380 --> 02:31.290
and if there is an error, we just fall back

02:31.290 --> 02:33.750
to o200k_base.

02:33.750 --> 02:35.910
And then we look and see which model it is.

02:35.910 --> 02:37.560
And then we basically use

02:37.560 --> 02:40.140
the num_tokens_from_messages function

02:40.140 --> 02:43.140
to calculate how many tokens there are.

02:43.140 --> 02:45.450
We also add on some additional tokens

02:45.450 --> 02:48.553
because every reply is primed with this:

02:48.553 --> 02:51.753
&lt;|start|>assistant&lt;|message|> at the front.
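
NOTE
A simplified sketch of num_tokens_from_messages as described above, not the
notebook's full version. The per-message overhead of 3 tokens and the 1 extra
token per name are assumptions that hold for recent chat models and may change;
the optional `encoding` argument is added here only for experimentation.
```python
def num_tokens_from_messages(messages, model="gpt-4o-mini", encoding=None,
                             tokens_per_message=3, tokens_per_name=1):
    """Estimate the prompt tokens used by a list of chat messages."""
    if encoding is None:
        import tiktoken  # lazy import; pip install tiktoken
        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model name: fall back to the o200k_base encoding.
            encoding = tiktoken.get_encoding("o200k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    # Every reply is primed with the assistant start tokens.
    num_tokens += 3
    return num_tokens
```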

02:52.620 --> 02:55.740
Then what I want you to do is import OpenAI

02:55.740 --> 02:57.900
and you're gonna need to set up your API key here.

02:57.900 --> 02:58.733
If you scroll down,

02:58.733 --> 03:00.690
okay, so we have a bunch of example messages,

03:00.690 --> 03:03.840
one system message, then some user and assistant messages

03:03.840 --> 03:07.710
and user and assistant, finally finishing with a user message.

03:07.710 --> 03:10.020
Then what we're gonna do is loop over all of these models

03:10.020 --> 03:12.750
and we're gonna print the num_messages.

03:12.750 --> 03:14.430
We're gonna print the output that we get back

03:14.430 --> 03:17.010
from this num_tokens_from_messages function.

03:17.010 --> 03:20.610
We're then gonna run a client.chat.completions.create,

03:20.610 --> 03:22.470
passing in the model=model

03:22.470 --> 03:25.140
and the messages=example_messages.

03:25.140 --> 03:26.250
And we're then also going

03:26.250 --> 03:29.760
to print out the .usage.prompt_tokens as well.

03:29.760 --> 03:30.960
So that'll be the number of tokens

03:30.960 --> 03:33.270
that OpenAI has calculated for usage.

03:33.270 --> 03:36.240
And you can see here these token estimations match up

03:36.240 --> 03:39.060
with what we were actually charged by OpenAI.
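
NOTE
A sketch of the comparison loop, under assumptions: `compare_counts` is a
hypothetical name, `client` is an openai.OpenAI() instance with an API key
configured, and `count_fn` is a counting helper that takes (messages, model).
```python
def compare_counts(client, models, messages, count_fn):
    """Return (model, estimated tokens, prompt tokens reported by the API)."""
    rows = []
    for model in models:
        estimated = count_fn(messages, model)
        # Real calls are billed; usage.prompt_tokens is the API's own count.
        response = client.chat.completions.create(model=model, messages=messages)
        rows.append((model, estimated, response.usage.prompt_tokens))
    return rows
```
If the estimate and the reported count differ for a model, the per-message
overhead assumed by the counting helper is the first thing to check.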

03:39.060 --> 03:41.280
So this can be a really good approach if you're running

03:41.280 --> 03:43.620
into token limits or you need to know exactly

03:43.620 --> 03:45.450
how to count the number of tokens.

03:45.450 --> 03:48.600
Then you can use the tiktoken package,

03:48.600 --> 03:51.150
which has a built-in encoder allowing you

03:51.150 --> 03:53.220
to get out the number of tokens

03:53.220 --> 03:55.170
for your OpenAI messages.

03:55.170 --> 03:57.030
It's also possible to do more advanced things,

03:57.030 --> 03:58.890
like calculate the number of tokens for tool calls.

03:58.890 --> 04:01.440
So if you are using agents and tools,

04:01.440 --> 04:03.870
feel free to have a look at this specific section.

04:03.870 --> 04:04.830
To keep things simple,

04:04.830 --> 04:07.080
we're gonna move along for now, but have a look at that

04:07.080 --> 04:07.980
if you're really interested

04:07.980 --> 04:11.190
in how you can calculate tool token costs for your agents.

04:11.190 --> 04:12.270
The next thing that we're gonna look at

04:12.270 --> 04:15.240
is how we can manage the message history if we want

04:15.240 --> 04:16.680
to only have a certain number

04:16.680 --> 04:19.503
of tokens using the tiktoken package.
