WEBVTT

00:00.120 --> 00:01.280
Hey there, Eden here.

00:01.280 --> 00:07.840
And in this video, we're going to switch from using OpenAI's GPT-5 to using an open-weights model:

00:07.880 --> 00:09.320
Gemma 3 by Google.

00:09.360 --> 00:12.200
Running locally on our machine using Ollama.

00:12.480 --> 00:17.680
Now, this is one of LangChain's strengths, and one of the reasons it became so popular when it

00:17.680 --> 00:23.720
came out, because it gives us the ability to swap the models, the LLMs that we're

00:23.720 --> 00:24.360
using.

00:24.360 --> 00:28.760
And I like to say that we can switch LLMs in LangChain like we switch our socks.

00:28.800 --> 00:30.400
And the process is very simple.

00:30.400 --> 00:35.800
And it really boils down to one line of code that we need to change with the relevant client that we

00:35.800 --> 00:37.800
need to initialize within LangChain.

00:37.800 --> 00:42.440
So the beautiful thing here is that the interface for the entire code stays the same.

00:42.640 --> 00:46.160
All we need to do is change the chat model that we're using.

00:46.400 --> 00:52.480
And in this video, I'll be showing you the option of hosting the open-weights model yourself on your machine.

00:52.800 --> 00:58.200
And there are also cloud-based providers like Groq, for example, where we can access those kinds of

00:58.200 --> 01:03.240
models from the cloud by generating an API key and creating the relevant client.

01:03.440 --> 01:04.740
So let's go to the code.

01:05.500 --> 01:10.660
But before we go and change our LangChain code, we need to make sure that Ollama is installed on our

01:10.660 --> 01:15.180
machine and that we've downloaded Gemma 3 onto our local system.

01:15.340 --> 01:17.380
So let me go and show you how to do that.

01:17.460 --> 01:24.260
So in Ollama, let's go and click Download, and let's download for macOS.

01:24.300 --> 01:27.700
Of course you should download for your own operating system.

01:28.060 --> 01:34.540
Now, once it finishes downloading, I'm going to install it as I would any software on Mac.

01:34.980 --> 01:38.540
And if you're on Windows, simply follow the installation wizard.

01:38.900 --> 01:41.580
And let me go and drag it to the Applications folder.

01:41.580 --> 01:46.580
And I'm going to replace this because I already have it installed, and now it's installed.

01:46.740 --> 01:49.900
So let me go and open up Ollama.

01:49.940 --> 01:56.220
I'm going to type ollama in the terminal, and we can now see the CLI here and the available commands.

01:56.220 --> 02:02.300
The gist of it is that we can pull models by writing ollama pull and then the model's full name.

02:02.580 --> 02:07.710
And if we want to try it in the terminal and talk to that model,

02:07.710 --> 02:11.150
we can use run on the model we just pulled.

02:11.350 --> 02:14.790
So let me go now to Ollama, to the models section here.

02:14.790 --> 02:17.910
And here, for example, we can see the new gpt-oss.

02:18.430 --> 02:22.550
And we can see now the size of each variant of this model.

02:22.590 --> 02:24.310
How many parameters it has.

02:24.470 --> 02:29.910
And this is a very good model, which is going to be more than enough for this course.

02:30.110 --> 02:35.470
And it also supports function calling and agentic tasks, which we're also going to be doing

02:35.470 --> 02:36.350
in this course.

02:36.350 --> 02:38.670
So I also recommend you use it.

02:38.710 --> 02:44.350
However, I'm not going to download it because, look at the size, it's simply massive and I can't fit it

02:44.350 --> 02:45.270
on my computer.

02:45.870 --> 02:51.470
So we can take a look at Gemma 3, for example, which offers lighter alternatives.

02:51.470 --> 02:58.030
We have many variants of many sizes we can choose from, and for this demo I'm going to choose the 270

02:58.030 --> 03:02.230
million-parameter variant because this is going to be the lightest and fastest.

03:02.390 --> 03:08.510
So let me copy the full name here, and let me go to Ollama and write ollama list here, where we can see

03:08.510 --> 03:11.650
all the downloaded models; we don't have any right now.

03:11.850 --> 03:17.770
And now let's write ollama pull, and we want to pull that model.

03:17.770 --> 03:19.970
So we want to give the full name here.

03:20.130 --> 03:22.610
And now we can see we're downloading it.

03:22.610 --> 03:25.810
So let me just fast forward this download for a bit.

03:32.090 --> 03:34.490
And we can see it's finished downloading.

03:34.530 --> 03:36.690
Now let's write ollama list.

03:36.690 --> 03:39.170
And we can now see the new model that we downloaded.

03:39.690 --> 03:43.050
And let me now open the Ollama manual with ollama.

03:44.090 --> 03:47.410
And I want to use the run command.

03:47.410 --> 03:49.810
So ollama run and the name of the model.

03:50.370 --> 03:54.730
And now it's going to spin up an instance where we can talk to this model.

03:54.730 --> 03:57.330
So it's going to be a CLI interface.

03:59.170 --> 04:00.730
So it's going to fire it up.

04:00.730 --> 04:04.730
Let's go and wait for a second and let me write here.

04:04.730 --> 04:05.970
Hello, for example.

04:06.330 --> 04:08.090
And we get the answer right back:

04:08.090 --> 04:09.690
Hello, how can I help you today?

04:10.410 --> 04:13.310
So this is us using it with Ollama.

04:13.550 --> 04:21.950
So now we want to use the LangChain Ollama integration to use this locally running open-weights model,

04:22.390 --> 04:28.550
Gemma 3 with 270 million parameters, and we want to use it with our code.

04:29.670 --> 04:32.390
So we want to create an llm variable here.

04:32.830 --> 04:35.910
And let me paste this line here.

04:36.270 --> 04:43.550
And it's going to be an object of the class ChatOllama, where the temperature is going to be zero and

04:43.590 --> 04:48.670
the model is going to be Gemma 3, 270 million parameters.

04:48.910 --> 04:51.990
And I remind you, we already have this model on our machine.

04:52.390 --> 04:55.910
Now we need to import the ChatOllama class.

04:55.910 --> 04:59.790
So let's go now to the top of the file, where all of our imports are.

05:00.790 --> 05:04.830
And let me go and import from langchain_ollama.

05:04.830 --> 05:09.430
We already have it installed, and we want to import the ChatOllama class.

05:09.950 --> 05:10.670
That's it.

05:10.830 --> 05:11.870
Let's go and run it.

05:15.510 --> 05:20.240
We can see we got to this breakpoint, and we can now see the response.

05:20.840 --> 05:24.880
Notice how fast it was to run it because it's running locally on our machine.

05:24.880 --> 05:26.800
And this is a super light model.

05:27.080 --> 05:32.400
Now, if we also examine the response, we can see we have the summary of Elon Musk.

05:32.400 --> 05:37.080
But we don't have a separate section for the interesting facts.

05:37.280 --> 05:43.360
So while this model was super fast, we can see it really didn't follow everything we asked it to do.

05:43.600 --> 05:49.680
And this is the trade-off when using lightweight open-weights models, which are faster and cheaper.

05:49.920 --> 05:55.200
The quality of the answers we get is probably going to be lower than with the top-tier models.

05:57.240 --> 05:57.720
Cool.

05:57.720 --> 06:00.920
So we saw how easy it is to switch a model in LangChain.

06:00.960 --> 06:06.480
If you want to use an open-weights model for this course instead of OpenAI or Google Gemini, you can

06:06.480 --> 06:13.280
do that, and I highly recommend using gpt-oss, because this is a model with deep reasoning.

06:13.480 --> 06:19.080
It supports function calling and it's suited for agentic workloads, which we're going to be implementing

06:19.080 --> 06:20.000
in this course.