WEBVTT

00:00.630 --> 00:04.200
-: Now let's explore how you can use LangChain locally

00:04.200 --> 00:07.440
with Llama 3 using LM Studio.

00:07.440 --> 00:09.000
Now, the first thing you're gonna need to do

00:09.000 --> 00:13.110
is download LM Studio onto your local machine.

00:13.110 --> 00:15.660
You can get this from lmstudio.ai.

00:15.660 --> 00:18.630
We'll also put the link inside of this ChatGPT lesson.

00:18.630 --> 00:20.730
After you've installed LM Studio,

00:20.730 --> 00:23.520
you can search for a variety of different specific models

00:23.520 --> 00:24.450
that you're interested in.

00:24.450 --> 00:27.090
So for example, I could be interested in this model.

00:27.090 --> 00:28.500
Then what you could do is then

00:28.500 --> 00:31.050
after you've done that, I would also suggest

00:31.050 --> 00:32.580
that you should probably download

00:32.580 --> 00:34.620
the one with the most downloads.

00:34.620 --> 00:38.430
And you can see someone here has put a Llama 3 instruct

00:38.430 --> 00:41.340
and they've done a 64K token limit.

00:41.340 --> 00:43.170
So that one's been quite interesting.

00:43.170 --> 00:47.360
For my sake as well, I've already got the Llama 3 model

00:47.360 --> 00:50.820
and you can get this specifically from Meta AI.

00:50.820 --> 00:52.500
Let me just show you the one

00:52.500 --> 00:53.910
that I would recommend getting for,

00:53.910 --> 00:55.770
which is this Meta AI one.

00:55.770 --> 00:58.890
So you'll want to download this one here.

00:58.890 --> 01:00.150
So there's the Llama Studio,

01:00.150 --> 01:02.130
and you can see I've already downloaded this

01:02.130 --> 01:05.700
at the Llama 3 8 billion instruct.

01:05.700 --> 01:09.237
Now depending upon what level of quantization you go for,

01:09.237 --> 01:12.840
and quantization is basically how the numbers become.

01:12.840 --> 01:16.790
If it's Q4 and Q5 and Q6 and Q8,

01:16.790 --> 01:19.350
you've got different levels of quantization.

01:19.350 --> 01:20.940
So this is eight bit quantization,

01:20.940 --> 01:23.430
you've got six bit K quantization.

01:23.430 --> 01:25.680
So generally the lower the bit number,

01:25.680 --> 01:28.590
you can see the smaller size of the file.

01:28.590 --> 01:29.730
But you should generally go

01:29.730 --> 01:32.160
for a larger model if you can,

01:32.160 --> 01:35.103
if the full GPU offload is possible.

01:35.103 --> 01:38.820
And then that basically means that this model will be fit

01:38.820 --> 01:41.130
inside of the graphics processing unit.

01:41.130 --> 01:43.184
So definitely go for the larger model,

01:43.184 --> 01:47.460
go for a higher version of quantization, pause the video,

01:47.460 --> 01:49.440
download the model, and then come back.

01:49.440 --> 01:51.360
And once you've downloaded that model,

01:51.360 --> 01:52.650
now that the model is downloaded,

01:52.650 --> 01:56.280
go over to the left hand side and click on AI Chat.

01:56.280 --> 01:58.050
And then what you can do is click on a model

01:58.050 --> 01:59.310
and select a model to load.

01:59.310 --> 02:00.143
So I'm for example,

02:00.143 --> 02:02.490
interested in this Meta Llama 3 Instruct.

02:02.490 --> 02:05.850
So then I'm gonna say the configuration being applied

02:05.850 --> 02:07.320
contains different system prompts.

02:07.320 --> 02:09.480
So I'm gonna say accept a new system prompt.

02:09.480 --> 02:13.620
And then what's happening right now is the Llama 3 model

02:13.620 --> 02:18.510
is being loaded locally into your graphics card into RAM

02:18.510 --> 02:21.210
so that you can interact with Llama locally.

02:21.210 --> 02:24.192
So let's say create a really nice story

02:24.192 --> 02:29.192
that talks about how children are having a good time.

02:29.340 --> 02:30.173
And here you go.

02:30.173 --> 02:33.450
So now we've got Llama 3 that's now talking to us

02:33.450 --> 02:35.640
inside of the LM Studio, which is great.

02:35.640 --> 02:37.950
So you've got a nice little story coming back here.

02:37.950 --> 02:38.970
Now this is all great,

02:38.970 --> 02:42.450
but what we wanna do is also take it one step further.

02:42.450 --> 02:44.730
So what I would also recommend you do

02:44.730 --> 02:46.290
is have a look at this.

02:46.290 --> 02:48.960
So let's stop generating your story.

02:48.960 --> 02:51.540
And then if you go over, you've got the playground,

02:51.540 --> 02:53.970
which allows you to do multi-model session.

02:53.970 --> 02:56.234
And you've got this thing called the local server.

02:56.234 --> 03:00.150
And the local server allows us to load a model.

03:00.150 --> 03:02.730
So I am gonna say, accept this model,

03:02.730 --> 03:04.290
that's gonna load in the model.

03:04.290 --> 03:07.230
And then what we can do is we can,

03:07.230 --> 03:09.960
you can see here, I can start a server.

03:09.960 --> 03:11.970
So when I click Start Server,

03:11.970 --> 03:15.483
that actually provides the same API endpoints

03:15.483 --> 03:18.014
that basically the OpenAI provides,

03:18.014 --> 03:21.090
which means that LangChain can now work locally

03:21.090 --> 03:24.553
against the Llama 3 model.

03:24.553 --> 03:26.340
The next step that you're gonna need

03:26.340 --> 03:29.340
to do is download any Jupyter notebooks that are inside

03:29.340 --> 03:32.161
of this specific lesson and install Python

03:32.161 --> 03:34.350
if you haven't got that already on your computer.

03:34.350 --> 03:35.640
So you'll need to install Python

03:35.640 --> 03:37.560
and the various packages installed.

03:37.560 --> 03:39.480
Now I'm gonna show you some of the notebooks.

03:39.480 --> 03:44.070
So the notebook you'll need is this llama3.ipynb.

03:44.070 --> 03:46.110
So once you've got that loaded up,

03:46.110 --> 03:48.462
basically what you're then gonna do is import

03:48.462 --> 03:52.440
LangChain output parsers and some various packages.

03:52.440 --> 03:53.490
And if you haven't installed those,

03:53.490 --> 03:55.050
you'll need to install those.

03:55.050 --> 03:58.260
And then after that, then you can set up some models

03:58.260 --> 04:00.150
that we want to parse back from Llama 3.

04:00.150 --> 04:04.170
So I want, for example, my articles to have sections,

04:04.170 --> 04:07.638
and I want my article outline to contain a title,

04:07.638 --> 04:09.780
which is a type of string.

04:09.780 --> 04:13.620
And I also want it to have a list of article

04:13.620 --> 04:14.820
that section outlines

04:14.820 --> 04:16.770
and all of the sections of the article.

04:17.707 --> 04:20.580
Now I set up my Pydantic output parser

04:20.580 --> 04:24.330
and I get my format instructions from my output parser.

04:24.330 --> 04:29.226
But instead of talking directly to ChatGPT on OpenAI's API,

04:29.226 --> 04:33.528
we can talk to the LM Studio's local server,

04:33.528 --> 04:37.770
which is at local host 1234 and v1.

04:37.770 --> 04:40.290
And then you can see I can set API key not needed.

04:40.290 --> 04:43.440
And I've also set these extra arguments here

04:43.440 --> 04:46.590
with a stop position on triple back top ticks.

04:46.590 --> 04:48.090
And the reason why we've done that is that

04:48.090 --> 04:50.100
because we want it to produce JSON,

04:50.100 --> 04:53.640
we're telling LangChain actors and SEO specialists,

04:53.640 --> 04:56.010
it's very great at generating articles,

04:56.010 --> 04:58.470
but it should only respond in JSON format.

04:58.470 --> 05:00.750
We're parsing in those format instructions,

05:00.750 --> 05:05.190
but we told it we must finish with this format here.

05:05.190 --> 05:07.170
So it has to finish with the end at this.

05:07.170 --> 05:08.610
And when we hit this,

05:08.610 --> 05:11.370
that's when our stop token is gonna be triggered.

05:11.370 --> 05:12.870
And then we have a human message,

05:12.870 --> 05:14.820
I want you to generate an effective article plan

05:14.820 --> 05:16.680
for me on digital marketing.

05:16.680 --> 05:19.230
And we provided the first set of back ticks.

05:19.230 --> 05:23.430
So then what we can see is if we go back to LM Studio now,

05:23.430 --> 05:25.320
so if I go back to LM Studio,

05:25.320 --> 05:28.110
you'll see that the tokens are starting to stream in

05:28.110 --> 05:30.638
and also see that we've got the title

05:30.638 --> 05:32.899
and now it's doing the description.

05:32.899 --> 05:36.390
So it's doing the description key.

05:36.390 --> 05:39.630
And so what we've got here is a JSON object

05:39.630 --> 05:42.300
that is being built up in real time

05:42.300 --> 05:44.850
and it's using the output parsers

05:44.850 --> 05:46.770
that we defined inside of LangChain.

05:46.770 --> 05:49.302
And again, we are hitting that chat completions endpoint

05:49.302 --> 05:51.675
but we're doing this locally.

05:51.675 --> 05:53.790
And now we've finished, let's have a look

05:53.790 --> 05:57.090
and see if we can parse that into a Pydantic object.

05:57.090 --> 06:00.330
So we've now got our outline, so let's have a look.

06:00.330 --> 06:01.830
So we've got the title,

06:01.830 --> 06:04.470
which is the Digital Marketing Article Plan,

06:04.470 --> 06:08.343
and we should also have a dot sections here as well.

06:09.210 --> 06:13.140
And each section has the title of that article.

06:13.140 --> 06:16.440
So we've managed to specifically look at

06:16.440 --> 06:18.293
how do you download LM Studio

06:18.293 --> 06:21.960
as well as installing the Llama 3 instruct model.

06:21.960 --> 06:23.250
And then after doing that,

06:23.250 --> 06:26.430
then what we've done is we've loaded a local server.

06:26.430 --> 06:28.620
After playing around with Llama 3 in LM Studio,

06:28.620 --> 06:30.330
we've loaded that local server,

06:30.330 --> 06:32.160
we've loaded a Jupyter notebook,

06:32.160 --> 06:35.100
and we've changed the base URL specifically

06:35.100 --> 06:37.470
to target local host 1234.

06:37.470 --> 06:40.500
And we've also put in that nice stop model argument,

06:40.500 --> 06:43.680
which basically stops the model as it finishes JSON.

06:43.680 --> 06:46.470
Because what I found is Llama 3 can specifically

06:46.470 --> 06:48.930
start hallucinating more and more JSON.

06:48.930 --> 06:52.080
So it's good to have a stop sequence attached to the model.

06:52.080 --> 06:52.913
Cool.

06:52.913 --> 06:55.020
Hopefully this gives you a bit of an introduction into

06:55.020 --> 06:56.520
how you can start to use LM Studio

06:56.520 --> 06:58.623
to do automations on your local computer.