WEBVTT

1
00:00:00.270 --> 00:00:02.370
<v Instructor>So how can you access</v>

2
00:00:02.370 --> 00:00:04.800
locally running Large Language Models

3
00:00:04.800 --> 00:00:07.923
programmatically with help of LM Studio?

4
00:00:08.910 --> 00:00:12.240
Well, as a first step in LM Studio,

5
00:00:12.240 --> 00:00:15.150
you should go to power user or developer

6
00:00:15.150 --> 00:00:18.450
since now it is all about development.

7
00:00:18.450 --> 00:00:21.150
And then there to this developer view,

8
00:00:21.150 --> 00:00:23.013
this developer area here.

9
00:00:24.150 --> 00:00:27.420
And there, you must switch this status

10
00:00:27.420 --> 00:00:30.120
of the LM Studio server,

11
00:00:30.120 --> 00:00:32.400
which is the thing you will be able to access

12
00:00:32.400 --> 00:00:36.183
programmatically from stopped to running.

13
00:00:37.320 --> 00:00:38.820
As a result of doing that,

14
00:00:38.820 --> 00:00:41.160
you'll get a log output down there,

15
00:00:41.160 --> 00:00:43.920
which shows you log messages from requests

16
00:00:43.920 --> 00:00:46.230
that were sent to that server.

17
00:00:46.230 --> 00:00:49.260
You can clear that and you'll see any new messages

18
00:00:49.260 --> 00:00:52.983
as they arrive or as new requests hit the server.

19
00:00:53.940 --> 00:00:57.960
You see the API endpoints, so the URLs you could say,

20
00:00:57.960 --> 00:01:01.860
to which you can send requests to, for example,

21
00:01:01.860 --> 00:01:04.020
get a list of available models

22
00:01:04.020 --> 00:01:07.443
or to generate a new response from a model.

23
00:01:08.970 --> 00:01:11.970
And you see the domain to which you have

24
00:01:11.970 --> 00:01:13.530
to send these requests.

25
00:01:13.530 --> 00:01:15.810
So you send them to a combination of that domain

26
00:01:15.810 --> 00:01:18.450
and then this endpoint.

27
00:01:18.450 --> 00:01:21.690
Now, if you don't know anything about writing code,

28
00:01:21.690 --> 00:01:23.670
then this lecture

29
00:01:23.670 --> 00:01:26.640
and this part of this section won't be for you

30
00:01:26.640 --> 00:01:28.500
because of course, programmatically

31
00:01:28.500 --> 00:01:32.340
using Large Language Models only is interesting

32
00:01:32.340 --> 00:01:34.680
if you know how to write code

33
00:01:34.680 --> 00:01:37.410
or if you at least have a basic understanding of

34
00:01:37.410 --> 00:01:39.270
how programming works.

35
00:01:39.270 --> 00:01:41.460
Of course you can use a Large Language Model

36
00:01:41.460 --> 00:01:42.870
to generate code for you,

37
00:01:42.870 --> 00:01:45.660
but you still need to have a general understanding

38
00:01:45.660 --> 00:01:47.280
of what an API is

39
00:01:47.280 --> 00:01:51.300
and what it means to send a request to an API.

40
00:01:51.300 --> 00:01:54.510
But if you got that, then LM Studio has you covered,

41
00:01:54.510 --> 00:01:57.870
because now we got a running server running locally

42
00:01:57.870 --> 00:02:02.040
on our system only accessible by us here on our machine.

43
00:02:02.040 --> 00:02:04.920
So, not available to the worldwide web

44
00:02:04.920 --> 00:02:06.705
unless you configured your machine

45
00:02:06.705 --> 00:02:09.840
to be reachable from everyone else out there, of course,

46
00:02:09.840 --> 00:02:12.270
which by default won't be the case.

47
00:02:12.270 --> 00:02:15.210
And you then also get some settings you can tweak here

48
00:02:15.210 --> 00:02:16.470
about that server.

49
00:02:16.470 --> 00:02:19.680
Now, for example, the port, which is also reflected here,

50
00:02:19.680 --> 00:02:21.633
to which the requests have to be sent,

51
00:02:22.560 --> 00:02:25.170
you can configure just in time model loading,

52
00:02:25.170 --> 00:02:28.860
which means if a request would be sent for a model

53
00:02:28.860 --> 00:02:31.620
that's not currently loaded in LM Studio,

54
00:02:31.620 --> 00:02:34.590
it would then be loaded on demand.

55
00:02:34.590 --> 00:02:36.540
You can turn this off if you don't want this,

56
00:02:36.540 --> 00:02:37.920
but if you turn it on,

57
00:02:37.920 --> 00:02:39.570
the advantage is that you don't have to

58
00:02:39.570 --> 00:02:41.190
load models in advance.

59
00:02:41.190 --> 00:02:44.430
Of course, requests for models that aren't loaded

60
00:02:44.430 --> 00:02:45.780
yet will take a bit longer

61
00:02:45.780 --> 00:02:48.003
because the model has to be loaded first.

62
00:02:48.990 --> 00:02:50.250
And you can also control

63
00:02:50.250 --> 00:02:53.550
that a model should be auto unloaded if it hasn't been used

64
00:02:53.550 --> 00:02:57.213
for a while, for 60 minutes, for example, as a default.

65
00:02:58.470 --> 00:03:00.990
So these are some settings you can tweak.

66
00:03:00.990 --> 00:03:03.560
But with that, the server is up and running.

67
00:03:03.560 --> 00:03:06.060
And the question then of course is,

68
00:03:06.060 --> 00:03:09.270
how can you use that server?

69
00:03:09.270 --> 00:03:13.470
And the answer, as you also find in the official docs is,

70
00:03:13.470 --> 00:03:18.470
for example, by using the OpenAI SDK, this one here.

71
00:03:21.450 --> 00:03:22.980
And that might be a bit weird

72
00:03:22.980 --> 00:03:25.740
because the idea behind this SDK,

73
00:03:25.740 --> 00:03:29.520
as the name kind of implies, is to use it in conjunction

74
00:03:29.520 --> 00:03:31.800
with the OpenAI API,

75
00:03:31.800 --> 00:03:35.970
so the proprietary models provided by OpenAI.

76
00:03:35.970 --> 00:03:40.680
Which is why when visiting the OpenAI documentation,

77
00:03:40.680 --> 00:03:45.680
you find comparable instructions about using the OpenAI SDK,

78
00:03:46.080 --> 00:03:49.140
just that here it of course makes sense.

79
00:03:49.140 --> 00:03:50.850
But indeed this SDK,

80
00:03:50.850 --> 00:03:54.390
this OpenAI SDK has become pretty much the de facto

81
00:03:54.390 --> 00:03:59.130
standard for interacting with LLM APIs.

82
00:03:59.130 --> 00:04:01.860
Many other providers also adopted it.

83
00:04:01.860 --> 00:04:06.060
For example, Google also allows you to interact with Gemini,

84
00:04:06.060 --> 00:04:08.520
which is their paid models, not the free ones,

85
00:04:08.520 --> 00:04:11.370
but Gemini is the paid LLM

86
00:04:11.370 --> 00:04:14.763
provided by Google through the OpenAI SDK.

87
00:04:15.720 --> 00:04:17.880
And it's the same for LM Studio.

88
00:04:17.880 --> 00:04:22.320
The API that is provided by that LM Studio server

89
00:04:22.320 --> 00:04:26.370
that's spun up by setting it to running.

90
00:04:26.370 --> 00:04:29.310
So the API provided by that server here

91
00:04:29.310 --> 00:04:31.950
is essentially in line with the structure

92
00:04:31.950 --> 00:04:34.230
of the OpenAI API

93
00:04:34.230 --> 00:04:38.700
and can therefore be used through the OpenAI SDK.

94
00:04:38.700 --> 00:04:42.000
The only thing you need to do when aiming to do so

95
00:04:42.000 --> 00:04:45.540
is when initializing that OpenAI SDK,

96
00:04:45.540 --> 00:04:50.280
you have to set the base URL parameter here to this value,

97
00:04:50.280 --> 00:04:51.690
which is the domain

98
00:04:51.690 --> 00:04:56.553
or the URL of this locally running LM Studio API.

99
00:04:57.480 --> 00:04:59.880
Because otherwise, without setting this,

100
00:04:59.880 --> 00:05:03.990
it would try to send requests to the actual OpenAI API,

101
00:05:03.990 --> 00:05:06.540
which is of course not what we want.

102
00:05:06.540 --> 00:05:08.280
Instead, we want to send requests

103
00:05:08.280 --> 00:05:10.920
to our locally running model,

104
00:05:10.920 --> 00:05:13.563
and that can be done through that base URL.

105
00:05:14.520 --> 00:05:17.340
Now of course, it's also worth pointing out

106
00:05:17.340 --> 00:05:19.710
that you could also communicate with

107
00:05:19.710 --> 00:05:23.220
that OpenAI compatible API that is exposed

108
00:05:23.220 --> 00:05:27.330
by LM Studio without using the OpenAI SDK.

109
00:05:27.330 --> 00:05:30.780
You can just send a regular HTTP request

110
00:05:30.780 --> 00:05:33.930
with help of any programming language you want.

111
00:05:33.930 --> 00:05:36.180
It is just an API endpoint

112
00:05:36.180 --> 00:05:39.450
you have to send a request to after all.

113
00:05:39.450 --> 00:05:42.390
But it will be an API endpoint that's in line

114
00:05:42.390 --> 00:05:44.250
with the OpenAI API,

115
00:05:44.250 --> 00:05:46.440
and therefore you can use many of the features

116
00:05:46.440 --> 00:05:49.110
provided by the OpenAI API,

117
00:05:49.110 --> 00:05:51.990
just here with our locally running API

118
00:05:51.990 --> 00:05:53.850
that's provided by LM Studio.

119
00:05:53.850 --> 00:05:55.500
So that's quite convenient,

120
00:05:55.500 --> 00:05:58.833
and therefore we can now take a look at some code examples.