WEBVTT

1
00:00:00.360 --> 00:00:02.670
<v Maximilian>Now there is more you can control here.</v>

2
00:00:02.670 --> 00:00:05.640
Most importantly, structured output is interesting.

3
00:00:05.640 --> 00:00:07.650
But before we get there,

4
00:00:07.650 --> 00:00:11.070
I wanna dive into some other important settings

5
00:00:11.070 --> 00:00:13.980
because of course we also got our main app settings here

6
00:00:13.980 --> 00:00:15.450
in the bottom right corner.

7
00:00:15.450 --> 00:00:19.590
Now here you can control the appearance of this app,

8
00:00:19.590 --> 00:00:24.590
but you can also tweak the runtime and hardware settings.

9
00:00:24.930 --> 00:00:29.850
Runtime simply means which underlying tool is being used

10
00:00:29.850 --> 00:00:32.340
for loading and running these models.

11
00:00:32.340 --> 00:00:34.770
And as I mentioned way earlier in the course,

12
00:00:34.770 --> 00:00:39.150
LM Studio, like Ollama, typically uses llama.cpp.

13
00:00:39.150 --> 00:00:41.790
Now if you are on a macOS device

14
00:00:41.790 --> 00:00:45.210
specifically a modern Apple Silicon Mac,

15
00:00:45.210 --> 00:00:48.870
you could also use MLX as an alternative engine.

16
00:00:48.870 --> 00:00:52.740
And you see that I got both engines installed here.

17
00:00:52.740 --> 00:00:54.840
And which engine will be used depends

18
00:00:54.840 --> 00:00:57.960
on which model file is loaded.

19
00:00:57.960 --> 00:01:02.960
In my case here, this is a GGUF file

20
00:01:03.180 --> 00:01:08.180
and therefore it will actually use the llama.cpp runtime.

21
00:01:08.970 --> 00:01:10.620
And I can change this here,

22
00:01:10.620 --> 00:01:13.320
but if I had downloaded an MLX file

23
00:01:13.320 --> 00:01:15.990
and some models exist in this format,

24
00:01:15.990 --> 00:01:18.480
then this MLX runtime would be used.

25
00:01:18.480 --> 00:01:20.403
So that's just nice to know.

26
00:01:21.270 --> 00:01:24.180
You can theoretically also manage the versions of

27
00:01:24.180 --> 00:01:26.850
that runtime if you knew that you wanted to switch

28
00:01:26.850 --> 00:01:28.470
to a very specific version,

29
00:01:28.470 --> 00:01:31.740
but in reality, it's unlikely that you would want

30
00:01:31.740 --> 00:01:33.270
to change something there.

31
00:01:33.270 --> 00:01:34.770
Still interesting to see

32
00:01:34.770 --> 00:01:37.230
that multiple runtimes may exist depending

33
00:01:37.230 --> 00:01:40.440
on your operating system and underlying hardware.

34
00:01:40.440 --> 00:01:42.900
Speaking of hardware, that's also something

35
00:01:42.900 --> 00:01:45.120
that can be inspected here in the settings.

36
00:01:45.120 --> 00:01:47.490
Specifically, if you had multiple GPUs,

37
00:01:47.490 --> 00:01:51.900
you could control which GPUs are used

38
00:01:51.900 --> 00:01:53.280
for running the model.

39
00:01:53.280 --> 00:01:56.430
In my case, I only got my one Apple chip,

40
00:01:56.430 --> 00:02:00.060
which acts as the GPU, so that's the one that is being used.

41
00:02:00.060 --> 00:02:01.740
And if I had disabled that,

42
00:02:01.740 --> 00:02:05.250
then my models would be loaded onto the CPU instead,

43
00:02:05.250 --> 00:02:07.560
which is less efficient than using the GPU.

44
00:02:07.560 --> 00:02:10.410
So you typically wanna enable your GPU

45
00:02:10.410 --> 00:02:13.980
and make sure that the models get executed there.

46
00:02:13.980 --> 00:02:17.250
Here you also see the available VRAM capacity.

47
00:02:17.250 --> 00:02:18.510
As mentioned earlier,

48
00:02:18.510 --> 00:02:21.840
those Apple M-series chips have something called unified memory,

49
00:02:21.840 --> 00:02:25.470
which means almost the entire RAM can be used as VRAM.

50
00:02:25.470 --> 00:02:26.820
You also see that down here,

51
00:02:26.820 --> 00:02:31.410
and of course you will see the values for your system there.

52
00:02:31.410 --> 00:02:34.710
Now this guardrails part is also interesting.

53
00:02:34.710 --> 00:02:36.090
The idea here is

54
00:02:36.090 --> 00:02:39.810
that you can control whether LM Studio allows you

55
00:02:39.810 --> 00:02:43.290
to load models into your memory

56
00:02:43.290 --> 00:02:46.110
that are too large for it.

57
00:02:46.110 --> 00:02:47.730
The default is strict

58
00:02:47.730 --> 00:02:51.960
and that is generally what you should keep enabled.

59
00:02:51.960 --> 00:02:54.390
But if you knew that you wanted

60
00:02:54.390 --> 00:02:57.900
to load a pretty large model into memory

61
00:02:57.900 --> 00:02:59.730
and you knew that it would work

62
00:02:59.730 --> 00:03:02.970
and that you would only use it in a way that causes no problems,

63
00:03:02.970 --> 00:03:04.680
you could set this to off

64
00:03:04.680 --> 00:03:07.740
or to a more relaxed level so that you are able

65
00:03:07.740 --> 00:03:12.740
to squeeze bigger models into your limited resources.

66
00:03:12.930 --> 00:03:15.390
That is something you should do with care though,

67
00:03:15.390 --> 00:03:18.240
because of course you can overload your system here

68
00:03:18.240 --> 00:03:20.430
and potentially crash it.

69
00:03:20.430 --> 00:03:23.490
So I do recommend sticking to strict here,

70
00:03:23.490 --> 00:03:25.260
but if you had the knowledge

71
00:03:25.260 --> 00:03:28.290
and you had an advanced use case where you needed

72
00:03:28.290 --> 00:03:31.260
to basically go beyond the limits,

73
00:03:31.260 --> 00:03:36.180
you could disable or relax those guardrails.

74
00:03:36.180 --> 00:03:38.610
You can also choose custom to control

75
00:03:38.610 --> 00:03:42.240
how much memory may be used for the model to be loaded.

76
00:03:42.240 --> 00:03:45.570
So that also allows you to go in the other direction.

77
00:03:45.570 --> 00:03:48.900
You might know that of the available memory,

78
00:03:48.900 --> 00:03:51.480
48 gigabytes of VRAM in my case,

79
00:03:51.480 --> 00:03:53.850
you only want to use, let's say,

80
00:03:53.850 --> 00:03:56.520
10 gigabytes at most for a model.

81
00:03:56.520 --> 00:04:00.480
You could then set a custom guardrail to limit that,

82
00:04:00.480 --> 00:04:03.210
and any model that would require more memory would be

83
00:04:03.210 --> 00:04:05.490
blocked from being loaded.

84
00:04:05.490 --> 00:04:06.930
And that can of course be useful

85
00:04:06.930 --> 00:04:10.350
if you also have other processes running on your system

86
00:04:10.350 --> 00:04:12.993
that need a significant amount of VRAM.

87
00:04:13.860 --> 00:04:15.723
Now here, I'll go back to strict.