WEBVTT

1
00:00:00.090 --> 00:00:02.730
<v ->Hey, in this video, we'll have a look at what tokens are,</v>

2
00:00:02.730 --> 00:00:05.070
and also we'll look at something called token limits

3
00:00:05.070 --> 00:00:07.410
and how that relates to large language models.

4
00:00:07.410 --> 00:00:11.580
Firstly, tokenization is the process of taking sentences

5
00:00:11.580 --> 00:00:15.930
or words and breaking them into smaller units called tokens.

6
00:00:15.930 --> 00:00:17.580
On the right here, you can see we've taken

7
00:00:17.580 --> 00:00:21.270
a bunch of sentences and we slowly break them down

8
00:00:21.270 --> 00:00:24.570
by a process called tokenization into tokens.

9
00:00:24.570 --> 00:00:26.700
Now, a token doesn't always have to be a word.

10
00:00:26.700 --> 00:00:29.040
It can also be a smaller unit,

11
00:00:29.040 --> 00:00:31.650
such as a character or two letters,

12
00:00:31.650 --> 00:00:35.490
but the idea is that ChatGPT takes a series of words

13
00:00:35.490 --> 00:00:37.470
and converts that into tokens.
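
NOTE
A rough sketch of the idea of splitting text into sub-word tokens.
This is a toy greedy longest-match splitter, not the tokenizer ChatGPT
actually uses. TOY_VOCAB and toy_tokenize are made-up names for illustration.
```python
# Toy greedy longest-match tokenizer. TOY_VOCAB is a made-up vocabulary for
# illustration only; real models learn a far larger one from data.
TOY_VOCAB = {"token", "ization", "un", "break", "able", " ", "s"}
def toy_tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for end in range(len(text), i, -1):
            if text[i:end] in TOY_VOCAB:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # Nothing matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens
print(toy_tokenize("tokenization"))        # ['token', 'ization']
print(toy_tokenize("unbreakable tokens"))  # ['un', 'break', 'able', ' ', 'token', 's']
```
Notice that one word can become several tokens, and a space can be a token too.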

14
00:00:37.470 --> 00:00:40.020
Each model has its own limits and prices for input tokens

15
00:00:40.020 --> 00:00:42.480
and output tokens that can be used.

16
00:00:42.480 --> 00:00:47.480
For example, GPT-5.2 costs $1.75 per million input tokens

17
00:00:47.880 --> 00:00:50.370
and $14 per million output tokens.

18
00:00:50.370 --> 00:00:52.530
If we have a look at a different model,

19
00:00:52.530 --> 00:00:54.330
like for example, GPT-5 mini,

20
00:00:54.330 --> 00:00:58.290
you can see that input tokens cost $0.25 per million

21
00:00:58.290 --> 00:01:00.180
and output tokens cost $2 per million.
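
NOTE
A minimal sketch of how those prices turn into a cost per request.
It assumes the quoted figures are US dollars per one million tokens,
which is how the pricing page lists them. The dictionary keys and the
estimate_cost helper are illustrative names, not an official API.
```python
# Prices in US dollars per one million tokens, as quoted in the video.
# Prices change over time, so treat these numbers as a snapshot.
PRICES_PER_MILLION = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "gpt-5-mini": {"input": 0.25, "output": 2.00},
}
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
# Example request: 10,000 input tokens and 2,000 output tokens.
for name in PRICES_PER_MILLION:
    print(name, round(estimate_cost(name, 10_000, 2_000), 6))
```
For that example request the sketch gives about $0.0455 on GPT-5.2
and about $0.0065 on GPT-5 mini, so the output tokens dominate the bill.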

22
00:01:00.180 --> 00:01:02.490
Now, both of these have different context windows,

23
00:01:02.490 --> 00:01:04.080
and obviously as the models change,

24
00:01:04.080 --> 00:01:07.110
these context windows will grow over time

25
00:01:07.110 --> 00:01:08.850
and the prices will also get cheaper

26
00:01:08.850 --> 00:01:10.560
for input and output tokens.

27
00:01:10.560 --> 00:01:12.210
For example, in GPT-5 mini,

28
00:01:12.210 --> 00:01:16.080
you can see that this has a 400,000-token context window

29
00:01:16.080 --> 00:01:20.430
and it can produce up to 128,000 output tokens.

30
00:01:20.430 --> 00:01:24.000
If we have a look and go back to see GPT-5.2,

31
00:01:24.000 --> 00:01:26.760
you can see that this has the same context window.

32
00:01:26.760 --> 00:01:28.800
So different models have different prices,

33
00:01:28.800 --> 00:01:30.330
and sometimes they share the same context window.

34
00:01:30.330 --> 00:01:31.230
Different types of models

35
00:01:31.230 --> 00:01:33.330
can have different lengths of context windows,

36
00:01:33.330 --> 00:01:35.580
and also they could potentially generate

37
00:01:35.580 --> 00:01:37.290
different numbers of output tokens.

38
00:01:37.290 --> 00:01:39.480
They also differ in latency.

39
00:01:39.480 --> 00:01:43.080
So you can see GPT-5 nano is very fast,

40
00:01:43.080 --> 00:01:46.260
but it has average reasoning, so use that on smaller tasks

41
00:01:46.260 --> 00:01:49.110
that don't need a heavier model.

42
00:01:49.110 --> 00:01:51.480
Whereas if we go back and have a look at GPT-5.2,

43
00:01:51.480 --> 00:01:53.160
you can see it's got the highest reasoning.

44
00:01:53.160 --> 00:01:54.300
Its speed is medium,

45
00:01:54.300 --> 00:01:56.430
and obviously it's their flagship model.

46
00:01:56.430 --> 00:01:58.770
It costs quite a lot of money to run that at scale.

47
00:01:58.770 --> 00:01:59.880
There's a really good page

48
00:01:59.880 --> 00:02:01.590
that I'll put a link to in this video

49
00:02:01.590 --> 00:02:03.870
where you can easily find lots of different types of models

50
00:02:03.870 --> 00:02:06.810
that are on OpenAI and all of their varying trade-offs

51
00:02:06.810 --> 00:02:09.690
in terms of the input tokens versus output tokens.

52
00:02:09.690 --> 00:02:11.880
You'll find that if the document is too large

53
00:02:11.880 --> 00:02:15.510
in comparison to how many input tokens you can use,

54
00:02:15.510 --> 00:02:17.610
then that basically will cause the model

55
00:02:17.610 --> 00:02:18.900
to reject that request

56
00:02:18.900 --> 00:02:20.400
and you won't be able to see anything.

57
00:02:20.400 --> 00:02:22.050
So I've just produced a load of text

58
00:02:22.050 --> 00:02:23.340
just to show you what this is like,

59
00:02:23.340 --> 00:02:24.510
and you'll see that actually

60
00:02:24.510 --> 00:02:26.190
I can't even submit this message

61
00:02:26.190 --> 00:02:28.020
because the message is too long.

62
00:02:28.020 --> 00:02:31.380
So every large language model has a restriction

63
00:02:31.380 --> 00:02:33.540
on how many tokens can go in

64
00:02:33.540 --> 00:02:37.050
and how many tokens it can produce as the response.
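
NOTE
A small sketch of checking whether a prompt fits before you send it.
It uses the GPT-5 mini limits quoted earlier and assumes that input and
output tokens share one context window. fits_in_context is a
hypothetical helper name.
```python
# Limits quoted in the video for GPT-5 mini; check the model page for current values.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT_TOKENS = 128_000
def fits_in_context(input_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    # Assumption: the input plus the space reserved for the reply
    # must fit inside the same context window.
    return input_tokens + reserved_output <= CONTEXT_WINDOW
print(fits_in_context(250_000))  # True: 250,000 + 128,000 is under 400,000
print(fits_in_context(300_000))  # False: 300,000 + 128,000 is over 400,000
```
If you expect a short reply, pass a smaller reserved_output to leave more room for input.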

65
00:02:37.050 --> 00:02:38.430
Okay, so the next thing I want you to do

66
00:02:38.430 --> 00:02:39.690
is just a quick exercise.

67
00:02:39.690 --> 00:02:42.300
I want you to go to your browser like Google Chrome.

68
00:02:42.300 --> 00:02:44.820
I want you to type in OpenAI pricing.

69
00:02:44.820 --> 00:02:46.830
I want you to click on the first result,

70
00:02:46.830 --> 00:02:48.450
and we're just gonna have a look at the different types

71
00:02:48.450 --> 00:02:51.360
of pricing for different types of models.

72
00:02:51.360 --> 00:02:53.310
So I'm just gonna go

73
00:02:53.310 --> 00:02:56.400
to a browser, search for OpenAI pricing in Google,

74
00:02:56.400 --> 00:02:57.233
and then you're gonna click this one here,

75
00:02:57.233 --> 00:02:59.340
the openai.com one.

76
00:02:59.340 --> 00:03:01.200
And what you'll see is you've got a range

77
00:03:01.200 --> 00:03:04.320
of different pricing for different types of models.

78
00:03:04.320 --> 00:03:05.880
So go and have a read of this, have a look.

79
00:03:05.880 --> 00:03:07.380
There's some other different types of prices

80
00:03:07.380 --> 00:03:08.340
that you can see here,

81
00:03:08.340 --> 00:03:09.960
but the main one I wanted you to look at

82
00:03:09.960 --> 00:03:12.990
is, for example, o1 versus o3-mini,

83
00:03:12.990 --> 00:03:15.360
and it will also show you the context length

84
00:03:15.360 --> 00:03:16.890
of these models too.

85
00:03:16.890 --> 00:03:18.600
The input price

86
00:03:18.600 --> 00:03:20.490
and the output price are the ones

87
00:03:20.490 --> 00:03:23.243
that you'll want to be concerned with if you're a developer.

88
00:03:24.270 --> 00:03:25.650
Now, let's say you've got some text

89
00:03:25.650 --> 00:03:26.700
and you're looking to find out

90
00:03:26.700 --> 00:03:28.800
how many tokens are inside that text.

91
00:03:28.800 --> 00:03:32.130
Well, OpenAI have built a really easy-to-use tool

92
00:03:32.130 --> 00:03:35.280
called a Tokenizer, which you can access

93
00:03:35.280 --> 00:03:38.520
at platform.openai.com/tokenizer.

94
00:03:38.520 --> 00:03:40.290
If you go and click on this link,

95
00:03:40.290 --> 00:03:43.080
basically it will take you to a webpage,

96
00:03:43.080 --> 00:03:45.360
and you can see different types of models.

97
00:03:45.360 --> 00:03:46.920
You can put in some text here,

98
00:03:46.920 --> 00:03:49.860
and it will show you how many tokens that text has,

99
00:03:49.860 --> 00:03:51.840
and if you look at the colors, you can see

100
00:03:51.840 --> 00:03:55.200
the separate tokens. You can also click on Show example,

101
00:03:55.200 --> 00:03:56.640
and then you can see how many characters

102
00:03:56.640 --> 00:03:57.900
and tokens those have.

103
00:03:57.900 --> 00:03:59.430
The tokens also have IDs,

104
00:03:59.430 --> 00:04:03.660
so each token gets translated to a numeric value,

105
00:04:03.660 --> 00:04:05.970
which allows the large language model

106
00:04:05.970 --> 00:04:08.370
to understand what that token is in real time.
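
NOTE
A toy sketch of tokens mapping to numeric IDs and back again.
The three-entry vocabulary and the encode and decode helpers are
hypothetical; a real tokenizer has a far larger vocabulary and
different ID values.
```python
# Hypothetical vocabulary: each token string gets one integer ID.
TOKEN_TO_ID = {"token": 0, "ization": 1, " ": 2}
ID_TO_TOKEN = {token_id: token for token, token_id in TOKEN_TO_ID.items()}
def encode(tokens: list[str]) -> list[int]:
    # The model never sees the text itself, only these integers.
    return [TOKEN_TO_ID[t] for t in tokens]
def decode(ids: list[int]) -> str:
    return "".join(ID_TO_TOKEN[i] for i in ids)
ids = encode(["token", "ization"])
print(ids)          # [0, 1]
print(decode(ids))  # tokenization
```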

107
00:04:08.370 --> 00:04:12.300
It's also possible to get the input token count programmatically

108
00:04:12.300 --> 00:04:15.900
from text by using a Python package called tiktoken.

109
00:04:15.900 --> 00:04:18.960
This allows you to take any model that OpenAI offers,

110
00:04:18.960 --> 00:04:20.100
put the model name in,

111
00:04:20.100 --> 00:04:22.800
and figure out what the token count is

112
00:04:22.800 --> 00:04:23.970
for a specific message.
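
NOTE
A hedged sketch of counting tokens with tiktoken. It uses
tiktoken.encoding_for_model and falls back to the o200k_base encoding
when the installed version does not recognize the model name. The
count_tokens helper, the default model name, and the last-resort
estimate of about 4 characters per token are assumptions for this
example, and the count covers plain text only, not chat message overhead.
```python
def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        import tiktoken  # third-party package: pip install tiktoken
        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # The installed tiktoken version does not know this model name.
            encoding = tiktoken.get_encoding("o200k_base")
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is missing or its vocabulary could not be loaded.
        # Fall back to a rough estimate of about 4 characters per token.
        return max(1, len(text) // 4)
print(count_tokens("Tokenization breaks text into smaller units."))
```
The fallback only keeps the sketch runnable offline. For billing or limit checks, rely on the real encoder.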

113
00:04:23.970 --> 00:04:26.070
There are lots of different techniques that you can use

114
00:04:26.070 --> 00:04:27.960
to overcome the token limit.

115
00:04:27.960 --> 00:04:31.260
The first one is if you are struggling with the input size

116
00:04:31.260 --> 00:04:32.790
for what you're working with,

117
00:04:32.790 --> 00:04:35.070
use a model with a bigger context window.

118
00:04:35.070 --> 00:04:36.570
You could also shorten your prompt

119
00:04:36.570 --> 00:04:39.450
or the context that you're passing into these models.

120
00:04:39.450 --> 00:04:41.820
You can also chunk, which basically means

121
00:04:41.820 --> 00:04:44.550
rather than putting all of the context in one go,

122
00:04:44.550 --> 00:04:46.710
you could split it across several messages.

123
00:04:46.710 --> 00:04:48.930
You can also do something called windowed chunks,

124
00:04:48.930 --> 00:04:51.510
where you have a sliding window of chunks,

125
00:04:51.510 --> 00:04:53.550
and we'll talk about that later in the course.
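
NOTE
One possible reading of windowed chunks, sketched under the assumption
that it means fixed-size chunks that overlap their neighbors. The course
may define it differently later. windowed_chunks is a hypothetical helper
and the numbers stand in for tokens.
```python
def windowed_chunks(tokens: list, size: int, overlap: int) -> list[list]:
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window slides each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end
    return chunks
print(windowed_chunks(list(range(10)), size=4, overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```
With overlap=0 this is plain chunking. The overlap keeps some context shared between neighboring chunks.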

126
00:04:53.550 --> 00:04:55.380
And you can also summarize the chunks.

127
00:04:55.380 --> 00:04:57.720
So when the message stream gets too long

128
00:04:57.720 --> 00:04:59.670
or the context is too long,

129
00:04:59.670 --> 00:05:02.100
you ask for shorter summaries of the information

130
00:05:02.100 --> 00:05:04.440
so that you can put those summaries in instead.
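
NOTE
A rough sketch of swapping older messages for a summary once the
history gets too long. compress_history is a hypothetical helper. The
summarize argument is any function you supply, normally one that asks a
language model. Character counts stand in for token counts to keep the
example free of dependencies.
```python
from typing import Callable
def compress_history(messages: list[str], summarize: Callable[[str], str],
                     max_chars: int, keep_recent: int = 2) -> list[str]:
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(len(m) for m in messages)
    if total <= max_chars or len(messages) <= keep_recent:
        return messages  # short enough already, or nothing older to summarize
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Replace all older messages with one summary and keep the latest ones verbatim.
    return [summarize("\n".join(older))] + recent
# Stand-in summarizer for the demo; a real one would call a language model.
demo = compress_history(["a" * 50, "b" * 50, "c" * 10, "d" * 10],
                        summarize=lambda text: "summary of " + str(len(text)) + " chars",
                        max_chars=60)
print(demo)  # ['summary of 101 chars', 'cccccccccc', 'dddddddddd']
```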

131
00:05:04.440 --> 00:05:06.143
Cool, I'll see you in the next one.

