WEBVTT

1
00:00:00.480 --> 00:00:02.250
<v Instructor>So we got LM Studio installed</v>

2
00:00:02.250 --> 00:00:06.570
and set up, how can we now use an open model?

3
00:00:06.570 --> 00:00:09.990
Now, as you learned in the first course section,

4
00:00:09.990 --> 00:00:12.000
you could of course visit Hugging Face

5
00:00:12.000 --> 00:00:14.520
and there, the models catalog and take a look

6
00:00:14.520 --> 00:00:18.690
at which model you personally find interesting.

7
00:00:18.690 --> 00:00:23.100
It's worth noting though that not all models are supported

8
00:00:23.100 --> 00:00:25.710
by LM Studio, necessarily,

9
00:00:25.710 --> 00:00:29.790
but you can find out by going to the model card

10
00:00:29.790 --> 00:00:32.250
of a model you might want to use.

11
00:00:32.250 --> 00:00:35.040
And then there on Hugging Face, you will find this

12
00:00:35.040 --> 00:00:38.160
"Use this model" dropdown here.

13
00:00:38.160 --> 00:00:39.150
And if you expand this,

14
00:00:39.150 --> 00:00:41.940
you see the options you have for running it locally.

15
00:00:41.940 --> 00:00:43.230
You might need to configure

16
00:00:43.230 --> 00:00:45.690
the local apps you have installed in your system though.

17
00:00:45.690 --> 00:00:47.790
And for that, you might need to log in

18
00:00:47.790 --> 00:00:50.010
with your Hugging Face account.

19
00:00:50.010 --> 00:00:51.960
By the way, this will not be the only way

20
00:00:51.960 --> 00:00:54.750
of finding new models, so if you don't have that

21
00:00:54.750 --> 00:00:56.190
and if you don't want that, that's fine.

22
00:00:56.190 --> 00:00:58.950
I'll show you a different, more straightforward way

23
00:00:58.950 --> 00:01:00.330
of finding models in a second.

24
00:01:00.330 --> 00:01:01.770
But that would be one way.

25
00:01:01.770 --> 00:01:04.245
So if you, for example, configure that you have Ollama

26
00:01:04.245 --> 00:01:06.180
and LM Studio on your system,

27
00:01:06.180 --> 00:01:07.620
you should see these options here

28
00:01:07.620 --> 00:01:10.920
if you can run this model, like this with LM Studio.

29
00:01:10.920 --> 00:01:12.960
Now here, I see nothing

30
00:01:12.960 --> 00:01:17.880
because indeed, Gemma 3 is supported by LM Studio,

31
00:01:17.880 --> 00:01:20.100
but not the raw version of it.

32
00:01:20.100 --> 00:01:23.070
Instead, as you learned in the quantization

33
00:01:23.070 --> 00:01:25.410
and hardware requirements section,

34
00:01:25.410 --> 00:01:29.010
typically when running models locally, you run a quantized,

35
00:01:29.010 --> 00:01:31.650
a compressed version of that model.

36
00:01:31.650 --> 00:01:34.620
So on Hugging Face, we would scroll down to this list here

37
00:01:34.620 --> 00:01:36.240
to find the quantizations

38
00:01:36.240 --> 00:01:38.700
that have been created based on this model.

39
00:01:38.700 --> 00:01:41.610
And if we then take a look at some of them there,

40
00:01:41.610 --> 00:01:45.280
eventually we will find some that do support LM Studio

41
00:01:46.260 --> 00:01:50.400
and you could then click on this to actually open LM Studio

42
00:01:50.400 --> 00:01:52.203
and start loading that model.

43
00:01:53.310 --> 00:01:55.890
But that's not the only way of finding models.

44
00:01:55.890 --> 00:01:57.390
The more straightforward

45
00:01:57.390 --> 00:02:00.480
and convenient way of finding models that are supported

46
00:02:00.480 --> 00:02:04.290
by LM Studio is to go back to LM Studio

47
00:02:04.290 --> 00:02:09.290
and then go to the settings and there to model search.

48
00:02:09.450 --> 00:02:12.300
Alternatively, you could use the shortcut that's shown here,

49
00:02:12.300 --> 00:02:13.560
which might differ for you,

50
00:02:13.560 --> 00:02:15.960
but you can use whatever is shown there for you.

51
00:02:17.190 --> 00:02:19.320
And here, you'll find a list of models

52
00:02:19.320 --> 00:02:21.570
that are supported by LM Studio.

53
00:02:21.570 --> 00:02:24.660
So for example, here, I could use Gemma 3

54
00:02:24.660 --> 00:02:28.500
and then I can choose which flavor, which version I want.

55
00:02:28.500 --> 00:02:31.830
Do I want the big one with 27 billion parameters,

56
00:02:31.830 --> 00:02:35.010
which is bigger to load and requires more RAM?

57
00:02:35.010 --> 00:02:38.340
Or do I want the very small one with one billion parameters,

58
00:02:38.340 --> 00:02:42.123
which runs on low performance machines as well?

59
00:02:43.309 --> 00:02:45.570
And here, I'll go for the middle ground

60
00:02:45.570 --> 00:02:47.670
and choose the 12 billion parameter model,

61
00:02:47.670 --> 00:02:48.870
but it really doesn't matter.

62
00:02:48.870 --> 00:02:50.730
You can go with the very small one too,

63
00:02:50.730 --> 00:02:52.830
and you could also use a different model.

64
00:02:52.830 --> 00:02:55.500
This does not matter. It's just an example here.

65
00:02:55.500 --> 00:02:56.970
It's also worth noting

66
00:02:56.970 --> 00:02:59.490
that all these models you're loading

67
00:02:59.490 --> 00:03:03.090
through LM Studio, through that model search

68
00:03:03.090 --> 00:03:06.060
will normally be quantized.

69
00:03:06.060 --> 00:03:09.780
For example, this here is a Q4 quantization,

70
00:03:09.780 --> 00:03:13.620
which means it uses 4-bit integers

71
00:03:13.620 --> 00:03:16.770
to reduce the amount of memory this model needs

72
00:03:16.770 --> 00:03:18.690
for running it locally.

73
00:03:18.690 --> 00:03:20.820
So no matter if you're using this model

74
00:03:20.820 --> 00:03:24.600
or some other model when loading them into LM Studio,

75
00:03:24.600 --> 00:03:27.990
you typically use quantized versions simply

76
00:03:27.990 --> 00:03:31.290
because they are faster, they take up less space.

77
00:03:31.290 --> 00:03:35.040
And there is no real reason to use the non-quantized one

78
00:03:35.040 --> 00:03:37.350
as explained in that quantization

79
00:03:37.350 --> 00:03:39.630
and hardware requirements section.

80
00:03:39.630 --> 00:03:42.390
So you can select whichever model you are interested in.

81
00:03:42.390 --> 00:03:44.880
And then here, right inside of LM Studio,

82
00:03:44.880 --> 00:03:46.890
you can learn a bit more about it.

83
00:03:46.890 --> 00:03:50.970
You learn about the number of parameters used by that model,

84
00:03:50.970 --> 00:03:55.200
the number of downloads, when it was updated, and so on.

85
00:03:55.200 --> 00:03:58.590
You find a link to the model card on Hugging Face,

86
00:03:58.590 --> 00:04:00.840
which you can click, to be taken

87
00:04:00.840 --> 00:04:04.800
to that respective Hugging Face page of that model.

88
00:04:04.800 --> 00:04:07.260
And you also find the model README down there

89
00:04:07.260 --> 00:04:09.963
with more details about the model.

90
00:04:10.920 --> 00:04:13.260
You can also choose from download options,

91
00:04:13.260 --> 00:04:15.720
though for this model here, I only have one.

92
00:04:15.720 --> 00:04:19.260
And the default option in general is what you want.

93
00:04:19.260 --> 00:04:22.590
Though for some models like for example,

94
00:04:22.590 --> 00:04:25.110
these different versions of the Gemma model,

95
00:04:25.110 --> 00:04:26.520
you can actually choose

96
00:04:26.520 --> 00:04:29.640
from different quantization options here.

97
00:04:29.640 --> 00:04:32.550
And the higher this number here is,

98
00:04:32.550 --> 00:04:35.550
the more space and memory that model will take up.

99
00:04:35.550 --> 00:04:37.770
And of course that's not just available for Gemma.

100
00:04:37.770 --> 00:04:39.690
If you take a look at some other models,

101
00:04:39.690 --> 00:04:42.540
like the DeepSeek models here, though,

102
00:04:42.540 --> 00:04:44.880
these are slimmed down versions

103
00:04:44.880 --> 00:04:47.760
of DeepSeek R1, not the large one.

104
00:04:47.760 --> 00:04:49.560
But if you take a look at models like these,

105
00:04:49.560 --> 00:04:52.593
you also find different quantization options here.

106
00:04:53.520 --> 00:04:56.370
These Gemma models here are a bit special

107
00:04:56.370 --> 00:05:00.450
because this QAT model, which I'm using here in this demo,

108
00:05:00.450 --> 00:05:04.050
has actually been already trained with quantization in mind,

109
00:05:04.050 --> 00:05:06.720
which is why there are no different options to choose from,

110
00:05:06.720 --> 00:05:08.400
but for other models, that differs.

111
00:05:08.400 --> 00:05:10.953
And you can choose exactly the version you want.

112
00:05:11.790 --> 00:05:14.520
You also see the size of the download here,

113
00:05:14.520 --> 00:05:17.190
so that will also be the size occupied on your disk,

114
00:05:17.190 --> 00:05:18.840
once the model has been downloaded

115
00:05:18.840 --> 00:05:20.910
because it will be stored on your disk

116
00:05:20.910 --> 00:05:24.270
so that LM Studio can use it now, tomorrow, and so on.

117
00:05:24.270 --> 00:05:27.900
Of course, you can also delete models from your disk though.

118
00:05:27.900 --> 00:05:29.670
So here, I wanna use this model,

119
00:05:29.670 --> 00:05:31.200
I picked it, I'm happy with it.

120
00:05:31.200 --> 00:05:32.760
And now I can click download here

121
00:05:32.760 --> 00:05:35.670
and this will download it onto your machine,

122
00:05:35.670 --> 00:05:37.830
store it in a specific folder,

123
00:05:37.830 --> 00:05:41.790
and then make it available for use here in LM Studio.

124
00:05:41.790 --> 00:05:44.823
And I'll be back once that download finished for me here.

125
00:05:46.050 --> 00:05:48.390
So here for me, the download finished

126
00:05:48.390 --> 00:05:52.650
and I could now click load model to load it into LM Studio

127
00:05:52.650 --> 00:05:54.000
because when you download it,

128
00:05:54.000 --> 00:05:57.300
it's simply stored on your hard drive on your system,

129
00:05:57.300 --> 00:06:00.210
but you then still need to load it to activate it,

130
00:06:00.210 --> 00:06:03.030
so to say, to be able to use it in LM Studio.

131
00:06:03.030 --> 00:06:05.550
Now alternatively, you can also do that through

132
00:06:05.550 --> 00:06:07.890
that model picker here at the top

133
00:06:07.890 --> 00:06:10.470
because once you have at least one model downloaded,

134
00:06:10.470 --> 00:06:13.800
if you open that picker, you should see that model here.

135
00:06:13.800 --> 00:06:15.330
And of course, if you have more models,

136
00:06:15.330 --> 00:06:17.820
you'll see more models here

137
00:06:17.820 --> 00:06:21.630
and you can simply click on a model here to load it.

138
00:06:21.630 --> 00:06:23.760
By the way, if you're not in user mode,

139
00:06:23.760 --> 00:06:24.870
if you're in power user

140
00:06:24.870 --> 00:06:29.220
or developer mode, you will see more options here.

141
00:06:29.220 --> 00:06:31.620
So if I am in power user mode,

142
00:06:31.620 --> 00:06:35.040
I don't just get this downloads window here,

143
00:06:35.040 --> 00:06:38.550
but I also get more configuration options

144
00:06:38.550 --> 00:06:40.260
when I load a model.

145
00:06:40.260 --> 00:06:42.990
So if I click on a model here, which is the one

146
00:06:42.990 --> 00:06:44.220
that's currently already loaded,

147
00:06:44.220 --> 00:06:46.440
but I could reload it, if I click on it here,

148
00:06:46.440 --> 00:06:48.150
I get more options.

149
00:06:48.150 --> 00:06:50.250
For example, I can set the context length,

150
00:06:50.250 --> 00:06:53.010
the context window size that's available,

151
00:06:53.010 --> 00:06:55.440
and we could even add more advanced options as well.

152
00:06:55.440 --> 00:06:57.570
But I'll get back to those later.

153
00:06:57.570 --> 00:06:59.280
For now, I'll go back to the user mode

154
00:06:59.280 --> 00:07:03.180
and just load the model as is with the default options.

155
00:07:03.180 --> 00:07:04.560
Now in the bottom right corner,

156
00:07:04.560 --> 00:07:07.680
you'll also see your system resources usage.

157
00:07:07.680 --> 00:07:09.360
For example, how much RAM

158
00:07:09.360 --> 00:07:11.610
or video RAM the model is taking up

159
00:07:11.610 --> 00:07:15.240
and how much work your CPU has to do.

160
00:07:15.240 --> 00:07:19.320
You can also click on this here to unload all models.

161
00:07:19.320 --> 00:07:21.930
If you click on that model picker, you'll also see

162
00:07:21.930 --> 00:07:25.140
how much memory this model is consuming.

163
00:07:25.140 --> 00:07:28.860
Now here, I am on an M1 Mac and therefore,

164
00:07:28.860 --> 00:07:31.500
I have something that's called unified memory.

165
00:07:31.500 --> 00:07:35.640
So there is no difference between regular RAM and VRAM,

166
00:07:35.640 --> 00:07:37.710
the RAM made available by the graphics card.

167
00:07:37.710 --> 00:07:39.510
It's just one RAM here,

168
00:07:39.510 --> 00:07:41.760
which is why I have one RAM number here

169
00:07:41.760 --> 00:07:44.970
and then how much of that RAM is taken up by that model.

170
00:07:44.970 --> 00:07:46.530
If you're on a different system,

171
00:07:46.530 --> 00:07:49.110
it may differ whether the model was loaded

172
00:07:49.110 --> 00:07:50.640
into your regular RAM,

173
00:07:50.640 --> 00:07:53.700
if you don't have a graphics card with dedicated video RAM,

174
00:07:53.700 --> 00:07:56.310
or a graphics card that's not supported,

175
00:07:56.310 --> 00:07:58.920
or whether it's been loaded into the RAM

176
00:07:58.920 --> 00:08:02.280
made available by that graphics card, the video RAM.

177
00:08:02.280 --> 00:08:04.470
It will simply depend on your system.

178
00:08:04.470 --> 00:08:07.890
In general, if the model is loaded into your regular RAM,

179
00:08:07.890 --> 00:08:11.700
not the video RAM, using it will be quite a bit slower.

180
00:08:11.700 --> 00:08:14.970
So generating the response will take a bit longer

181
00:08:14.970 --> 00:08:18.510
for the model, but it will still work.

182
00:08:18.510 --> 00:08:20.820
Okay, so here, I got this model selected now.

183
00:08:20.820 --> 00:08:23.530
Now of course I can click on create a new chat

184
00:08:24.450 --> 00:08:27.780
and then start chatting with that model.

185
00:08:27.780 --> 00:08:29.460
And just to make it really clear

186
00:08:29.460 --> 00:08:32.250
that this is running locally on our system,

187
00:08:32.250 --> 00:08:35.490
I will disable my Wi-Fi so

188
00:08:35.490 --> 00:08:37.890
that I got no internet connection,

189
00:08:37.890 --> 00:08:41.820
and still, I can ask "What can you do for me?"

190
00:08:41.820 --> 00:08:45.180
and send this message to this locally running model.

191
00:08:45.180 --> 00:08:48.600
And then here in LM Studio, you'll get back a response

192
00:08:48.600 --> 00:08:51.990
that's automatically parsed and formatted as Markdown.

193
00:08:51.990 --> 00:08:55.293
And that's therefore quite nice to read.

194
00:08:56.430 --> 00:08:59.370
Now I'll re-enable my Wi-Fi here,

195
00:08:59.370 --> 00:09:02.193
but we don't need it to use this model, of course.

