WEBVTT

00:00.680 --> 00:06.050
All right, let me walk you through Grau, which is a cloud provider for LMS.

00:06.050 --> 00:12.680
And the reason why it's really useful is that it hosts all of the open source models, all the really

00:12.680 --> 00:15.410
useful ones anyway, and it's blazing fast.

00:15.410 --> 00:18.440
So say, for example, here I'm using llama three.

00:18.470 --> 00:22.430
Be it hit submit and boom, it's very fast.

00:22.520 --> 00:29.420
You can see here we've got 1200 tokens per second, which is insanely fast, right?

00:29.510 --> 00:32.180
Even for some of the bigger models as well.

00:32.210 --> 00:39.020
If I go to maybe the 70 B one, just hit submit.

00:44.300 --> 00:44.810
Really fast.

00:44.840 --> 00:47.480
It's still 300 something tokens per second.

00:47.660 --> 00:49.280
Why would you use this?

00:49.310 --> 00:55.580
It's a really quick and easy way to use some of the open source models like you saw, but primarily

00:55.580 --> 00:57.200
I use it for the API.

00:57.230 --> 01:02.810
So rather than trying to figure out how to host all these open source models myself and get it as fast

01:02.810 --> 01:06.050
as as what these guys can do instead, I use this.

01:06.140 --> 01:13.130
So if you click view code, you can see that it follows basically the same thing as the OpenAI API,

01:13.130 --> 01:19.550
which is really helpful in terms of making it easy to to swap in and out depending on what you're using.

01:19.580 --> 01:22.370
I'm just going to bring up a script here.

01:22.820 --> 01:24.800
I'm just going to walk you through this.

01:25.130 --> 01:27.200
That's the that's the script.

01:27.290 --> 01:33.500
And you can see that all we have to do here is pip install grok and grok with a Q.

01:33.500 --> 01:41.240
By the way, don't confuse this with grok with a k, which is the Xe.com or Twitter version of Python.

01:41.270 --> 01:43.160
So a little bit confusing naming there.

01:43.160 --> 01:48.080
But but yeah basically it's grok with a Q and it's a hosting service and model hosting service.

01:48.080 --> 01:53.810
So if you pip install that then all you have to do is load the environment.

01:53.810 --> 01:57.980
In particular it needs to be a grok API key variable.

01:58.100 --> 01:59.000
That's the main thing.

01:59.000 --> 02:00.590
And then you want to set your model.

02:00.740 --> 02:03.440
I've just created this function called grok.

02:03.470 --> 02:05.330
You can call it anywhere you want really.

02:05.330 --> 02:09.410
But the important thing is that you follow the OpenAI naming conventions here.

02:09.530 --> 02:15.130
So you create, you create a client and then you do a chart completion and then create.

02:15.130 --> 02:18.340
And then you put in your system message and your user message.

02:18.370 --> 02:23.620
And then in order to get the text out, it's the same thing as opening choices.

02:23.620 --> 02:27.370
And you get the first choice and then you get the message content.

02:27.400 --> 02:34.180
So here I've put in the system message which is which is basically getting it to tell jokes.

02:34.180 --> 02:38.740
And then I have a user prompt here, which is just the topic of the joke.

02:38.770 --> 02:40.840
So here it's telling a joke about cats.

02:40.960 --> 02:43.900
You can say, there you go working remotely.

02:46.090 --> 02:47.260
And tell a joke about it.

02:50.740 --> 02:51.160
There we go.

02:51.160 --> 02:53.110
So the jokes are actually pretty good.

02:53.140 --> 02:58.630
And this is I think one of the big benefits is that because you're using the open source models, I

02:58.630 --> 03:02.800
find that their performance can be really different from what you get from OpenAI.

03:03.460 --> 03:05.140
And this is really fast.

03:05.260 --> 03:10.030
So just to show you, one of the benefits you can get here is you can very quickly iterate through,

03:10.060 --> 03:11.620
just like all the models.

03:11.830 --> 03:17.890
It took four seconds to run it with one, two, three, four, five, six seven different models, and

03:17.890 --> 03:19.150
some of them were fairly big, right?

03:19.180 --> 03:21.580
Like the 3.2 text preview.

03:21.610 --> 03:24.940
They tend to get these models online pretty quickly.

03:24.970 --> 03:30.880
They also have audio for transcription and they have vision models as well for visual.

03:30.910 --> 03:31.630
Transcription.

03:31.660 --> 03:36.460
You can see here you can compare the different models and see how how good they are relative to these

03:36.460 --> 03:36.850
things.

03:36.880 --> 03:37.090
Yeah.

03:37.120 --> 03:39.580
This is this jerk better than this one.

03:39.640 --> 03:46.090
You can you can start to see whether it's worth it to do a bigger level model, for example, which

03:46.210 --> 03:48.490
shows some trade offs in terms of cost and latency.

03:48.490 --> 03:50.800
But it might be worth it for some tasks.

03:50.830 --> 03:55.870
And here you can see this is a common problem I find with the mixture model is that instead of giving

03:55.870 --> 04:00.310
you the joke, it just gives me some preamble first, and then it gives me the joke, which can be a

04:00.310 --> 04:00.880
bit of a pain.

04:00.910 --> 04:04.570
You can start iterating through these things and see how they work.

04:04.570 --> 04:05.530
So really simple.

04:05.530 --> 04:08.830
It's just as as simple as calling a OpenAI.

04:08.860 --> 04:13.300
But you're getting a really fast response and you're getting to use the latest open source models.

04:13.450 --> 04:14.500
Hopefully you guys check it out.

04:14.500 --> 04:18.970
This is my go to provider when I do use open source models.