WEBVTT

00:00.040 --> 00:04.320
Hey, in this video, we're going to talk about the difference between chat models and reasoning models.

00:04.320 --> 00:07.880
We'll learn about why you should use different types of models depending upon the task.

00:07.920 --> 00:09.560
So firstly chat models.

00:09.560 --> 00:13.520
If you've ever used something like ChatGPT, then you've used a chat model.

00:13.520 --> 00:18.640
A chat model is basically a large language model that uses a sequence of messages in a message history

00:18.640 --> 00:21.600
as inputs, and returns messages as outputs.

00:21.640 --> 00:26.320
On the left here you can see we've got what is digital marketing and that is a user message.

00:26.760 --> 00:31.680
You can also see that we've got a response from OpenAI that is an AI message.

00:31.800 --> 00:35.160
There are three different types of messages that generally chat models expose.

00:35.160 --> 00:37.040
There's the developer system message.

00:37.200 --> 00:40.600
There's the user messages which are the messages that you would generally type.

00:40.600 --> 00:43.840
And then there's the AI messages that come back from a chat model.

00:43.840 --> 00:46.640
There are lots of large language model providers to choose from.

00:46.640 --> 00:49.880
For example, you have OpenAI with their GPT series.

00:49.880 --> 00:53.720
We also have anthropic, which offers Claude Sonnet and Claude Opus models.

00:53.720 --> 00:58.280
There is also meta with their free open source models, which are the llama models.

00:58.520 --> 01:00.280
There is also Google Gemini.

01:00.320 --> 01:04.480
They have both a Google Gemini Pro and a Google Gemini flash series of models.

01:04.480 --> 01:06.280
There's Mistral and many more.

01:06.280 --> 01:10.220
Some of the characteristics that you'll find with chat models is that they are actually very quick.

01:10.260 --> 01:11.780
They have low latency.

01:11.780 --> 01:16.140
These models respond quickly and often stream the tokens to you word by word.

01:16.380 --> 01:18.900
They're very good in terms of their reasoning capabilities.

01:18.940 --> 01:23.700
For example, Claude or GPT four are able to solve easy to medium difficulty tasks.

01:23.700 --> 01:28.540
Reasoning models are a newer type of model that uses something called test time compute scaling.

01:28.580 --> 01:34.220
And basically you can think of test time compute scaling as basically allocating more time to thinking

01:34.220 --> 01:37.700
and reasoning before returning the result or the response.

01:37.780 --> 01:43.860
So in general, these models take a lot longer because they're actually allocating time at execution

01:43.860 --> 01:48.180
to think through the problem rationally and then coming up with an overall solution.

01:48.660 --> 01:51.500
Let's have a look at this so we can see that it's got multiple turns.

01:51.540 --> 01:53.340
Turn one has an input.

01:53.340 --> 01:54.540
It has a bit of reasoning.

01:54.660 --> 01:58.300
That output then goes into the second turn which then has some input.

01:58.300 --> 02:01.420
And then that goes into the reasoning which then produces an output.

02:01.420 --> 02:04.340
So you can see it's a multi turn experience.

02:04.340 --> 02:07.060
And basically that means that it's got more thinking time.

02:07.060 --> 02:13.140
So when you ask a question we'll allocate more time before actually responding to you or your query.

02:13.180 --> 02:18.120
So the kind of characteristics you'll get when you're using a reasoning model is increased latency.

02:18.160 --> 02:22.920
Now OpenAI, which have released GPT five, have two modes for using this.

02:23.040 --> 02:25.960
There is the fast mode and there is the thinking mode.

02:26.000 --> 02:29.960
You'll find that if you use the thinking mode, it has additional latency.

02:30.000 --> 02:33.680
It takes longer to come back and longer for you to get that response.

02:33.720 --> 02:40.200
Now, due to this test time compute reasoning, these models will definitely outperform chat models.

02:40.200 --> 02:47.120
If you have a very hard problem or you're not sure, or you don't mind suffering with some additional

02:47.120 --> 02:48.720
time for the response to come back.

02:48.760 --> 02:53.720
If you want to make sure and it's a very important piece of work, definitely use a reasoning model.

02:53.840 --> 02:58.840
They have more time that they spend on the problem and they will give you more accurate results.

02:58.880 --> 03:03.960
Realize that there's always a trade off here, and you are spending longer waiting for the AI model

03:04.000 --> 03:04.920
to get back to you.

03:04.960 --> 03:08.600
Here are some examples of when you should use a chat model versus a reasoning model.

03:08.640 --> 03:13.320
So for example, when we're doing content, quick emails and posts, but when we might use a reasoning

03:13.320 --> 03:16.240
model is technical documentation or research analysis.

03:16.560 --> 03:20.200
For example, if we look at problem solving, basic calculations or simple Q&amp;A.

03:20.440 --> 03:23.910
But if we're using reasoning model, maybe we'd use that for multi-step debugging.

03:23.950 --> 03:27.230
Okay, so you now know the difference between the chat model and a reasoning model.

03:27.230 --> 03:29.510
Let's have a look at how we can do this inside of OpenAI.

03:29.550 --> 03:30.430
Go to ChatGPT.

03:31.150 --> 03:37.470
And then if you go and click in the top left, you can see that we have the option to use GPT five with

03:37.470 --> 03:43.390
either auto fast thinking or if you even upgrade even further, you can get a Pro, which is research

03:43.390 --> 03:44.430
grade intelligence.

03:44.710 --> 03:50.070
You can see that auto will basically do model routing, where it look at your query and decide is it

03:50.070 --> 03:52.510
going to be using a fast inference model.

03:52.510 --> 03:53.430
So a chat model.

03:53.670 --> 03:55.550
And also we have the option to do thinking.

03:55.550 --> 04:00.550
So if I say thinking and then what we're going to type is what is data science.

04:01.390 --> 04:08.630
And then what you'll see is when we then submit our query, you can see that ChatGPT is thinking if

04:08.630 --> 04:14.310
you click on the thinking, you'll see that you can actually see the thoughts and the reasoning tokens

04:14.310 --> 04:15.990
the ChatGPT decided to do.

04:16.350 --> 04:17.750
Then we get our response.

04:17.750 --> 04:24.430
So we had to wait seven seconds for the thinking model to come back with a response for us.

04:24.670 --> 04:30.070
If we contrast this with fast and I go and load a fast model, and then I'm going to keep in the same

04:30.070 --> 04:33.170
conversation and say, what is data science?

04:35.330 --> 04:38.130
You'll now see that we don't have any thinking tokens.

04:38.130 --> 04:40.250
We basically get an immediate response.

04:40.450 --> 04:47.210
However, the model isn't thinking about how best to service our query, so that is how you can switch

04:47.210 --> 04:47.370
now.

04:47.410 --> 04:53.330
Obviously, if you want, you can stick on auto mode, which will look at how complicated your query

04:53.330 --> 04:58.490
is, and it will decide whether to use a fast model or whether to use a thinking based reasoning model

04:58.490 --> 04:59.690
for best practices.

04:59.890 --> 05:04.450
I would recommend for tasks that require increased accuracy, or you must make sure that the error is

05:04.450 --> 05:04.930
less.

05:05.050 --> 05:09.410
Use reasoning models for tasks such as content generation or copywriting.

05:09.410 --> 05:11.130
You can use a smaller chat model.

05:11.410 --> 05:17.010
Reasoning models also take time to respond, so you can also have a reasoning model working on a task

05:17.010 --> 05:22.290
for a couple of minutes whilst you're also using Claude or GPT four alongside that.

05:22.290 --> 05:25.010
So don't wait 2 to 3 minutes for the task to complete.

05:25.210 --> 05:29.090
Go and work on another task whilst you're waiting for the reasoning model to get back to you.

05:29.090 --> 05:33.570
In the next section of the course, we'll look at how you can use standard practices such as list generation,

05:33.610 --> 05:38.370
sentiment analysis and writing clean and clear instructions to make sure that you get the most out of

05:38.370 --> 05:39.010
these models.
