WEBVTT

1
00:00:00.810 --> 00:00:04.740
<v Presenter>Now, Ollama does not just have its own API,</v>

2
00:00:04.740 --> 00:00:07.320
which is really powerful because, as explained,

3
00:00:07.320 --> 00:00:09.330
you can do everything you can do

4
00:00:09.330 --> 00:00:13.140
through the command line, also with help of this API.

5
00:00:13.140 --> 00:00:16.920
But they also have an OpenAI compatibility

6
00:00:16.920 --> 00:00:19.413
just like LM Studio did.

7
00:00:20.310 --> 00:00:24.300
So the great thing here is that you can use the OpenAI SDK

8
00:00:24.300 --> 00:00:28.320
as explained in that LM Studio course section

9
00:00:28.320 --> 00:00:32.940
because that OpenAI SDK whilst being built by OpenAI

10
00:00:32.940 --> 00:00:34.920
and whilst, of course, being built in order

11
00:00:34.920 --> 00:00:38.940
to use their API, this SDK can also be used

12
00:00:38.940 --> 00:00:43.940
with other APIs by overriding this base URL setting

13
00:00:44.010 --> 00:00:47.460
and by, for example here, setting it to this address

14
00:00:47.460 --> 00:00:50.073
where the Ollama API is running.

15
00:00:51.150 --> 00:00:53.610
Now, all of that wouldn't work if the Ollama API

16
00:00:53.610 --> 00:00:56.550
wouldn't support it and the default API indeed

17
00:00:56.550 --> 00:00:57.810
has a different shape,

18
00:00:57.810 --> 00:01:02.280
so that API wouldn't work with the OpenAI SDK,

19
00:01:02.280 --> 00:01:05.880
but Ollama simply also offers an additional API

20
00:01:05.880 --> 00:01:08.820
in addition to their built-in default API.

21
00:01:08.820 --> 00:01:11.130
And that additional API is unlocked

22
00:01:11.130 --> 00:01:12.810
and available out of the box.

23
00:01:12.810 --> 00:01:16.110
You don't need to do anything special for that.

24
00:01:16.110 --> 00:01:19.590
So, by just changing that base URL to this URL,

25
00:01:19.590 --> 00:01:20.910
you can communicate

26
00:01:20.910 --> 00:01:25.110
with the Ollama OpenAI compatible API

27
00:01:25.110 --> 00:01:28.920
through the OpenAI SDK, for example.

28
00:01:28.920 --> 00:01:31.110
And, therefore, for example, you could have code

29
00:01:31.110 --> 00:01:34.650
like this where I'm using that OpenAI SDK,

30
00:01:34.650 --> 00:01:37.230
where I changed that base URL

31
00:01:37.230 --> 00:01:41.310
and where I set that API key to not get a warning or error,

32
00:01:41.310 --> 00:01:42.810
but the value doesn't matter.

33
00:01:42.810 --> 00:01:44.790
It's not going to get used anyways

34
00:01:44.790 --> 00:01:47.220
because the model is running locally on our system.

35
00:01:47.220 --> 00:01:49.740
We don't need to authenticate.

36
00:01:49.740 --> 00:01:51.660
But I did set up this client

37
00:01:51.660 --> 00:01:54.870
and then I'm asking for a chat completion.

38
00:01:54.870 --> 00:01:58.050
So for an AI-generated response,

39
00:01:58.050 --> 00:02:00.600
based on that made-up chat history

40
00:02:00.600 --> 00:02:04.320
which I am sending to that Gemma 3 model again.

41
00:02:04.320 --> 00:02:07.770
And, again, here, you could enter any other model identifier

42
00:02:07.770 --> 00:02:11.373
as well, including one of your own customized models.

43
00:02:12.660 --> 00:02:15.270
But I'll go for Gemma 3 again.

44
00:02:15.270 --> 00:02:18.330
And, therefore, now, if I execute this basic py file,

45
00:02:18.330 --> 00:02:21.750
I'm using that OpenAI SDK to communicate

46
00:02:21.750 --> 00:02:26.670
with my locally running, Ollama-powered Gemma 3 model.

47
00:02:26.670 --> 00:02:30.990
And, therefore, here, I got the response by that model.

48
00:02:30.990 --> 00:02:34.320
Of course, you can, therefore, also build your own chat

49
00:02:34.320 --> 00:02:36.857
by again, using the OpenAI SDK

50
00:02:36.857 --> 00:02:40.680
and the Ollama API simply by having an infinite loop

51
00:02:40.680 --> 00:02:43.200
where you keep on asking for more input

52
00:02:43.200 --> 00:02:46.740
and where you then keep on asking the model.

53
00:02:46.740 --> 00:02:51.090
Or you could build an image parsing application

54
00:02:51.090 --> 00:02:54.570
where you find all jpeg and png files

55
00:02:54.570 --> 00:02:56.760
in this demo in a given folder,

56
00:02:56.760 --> 00:02:59.670
in my case here in the images folder,

57
00:02:59.670 --> 00:03:02.250
which indeed is a folder I have in this project,

58
00:03:02.250 --> 00:03:06.540
which includes this image of a lake taken from a plane

59
00:03:06.540 --> 00:03:10.470
and of me in my recording setup.

60
00:03:10.470 --> 00:03:12.480
And you could load these images

61
00:03:12.480 --> 00:03:17.480
and send them again with help of the OpenAI SDK to Ollama

62
00:03:17.700 --> 00:03:20.760
if you're talking to a model that has computer vision,

63
00:03:20.760 --> 00:03:23.073
which is the case for this Gemma 3 model.

64
00:03:24.120 --> 00:03:28.050
The images are converted to a format called Base 64

65
00:03:28.050 --> 00:03:31.680
which is essentially a text representation of the images,

66
00:03:31.680 --> 00:03:36.180
which is the kind of format Ollama and the Gemma model wants

67
00:03:36.180 --> 00:03:39.453
for taking a look at the images, so to say.

68
00:03:40.410 --> 00:03:42.990
And then you can ask questions about these images,

69
00:03:42.990 --> 00:03:45.180
like, for example, you can ask Ollama,

70
00:03:45.180 --> 00:03:48.870
and the Gemma 3 model to describe this image in detail.

71
00:03:48.870 --> 00:03:51.930
And I do this for both images.

72
00:03:51.930 --> 00:03:54.900
So if I execute this image parser python file,

73
00:03:54.900 --> 00:03:57.660
it starts with the max image here,

74
00:03:57.660 --> 00:03:59.610
sends that along with the prompt

75
00:03:59.610 --> 00:04:02.730
to Gemma 3 powered by Ollama.

76
00:04:02.730 --> 00:04:04.590
And since I'm not streaming in the response,

77
00:04:04.590 --> 00:04:05.733
this can take a while.

78
00:04:06.810 --> 00:04:09.210
And then here, I got that response

79
00:04:09.210 --> 00:04:11.880
where indeed it again correctly describes

80
00:04:11.880 --> 00:04:15.390
that I'm making a thumbs-up gesture

81
00:04:15.390 --> 00:04:17.760
that I'm in a recording streaming setup

82
00:04:17.760 --> 00:04:20.763
and all the other details here also sound good to me.

83
00:04:21.630 --> 00:04:23.820
And it's then, of course, also processing

84
00:04:23.820 --> 00:04:26.400
that other image of that lake.

85
00:04:26.400 --> 00:04:28.710
And here, with it being done,

86
00:04:28.710 --> 00:04:30.960
it, again, correctly captures

87
00:04:30.960 --> 00:04:33.333
that this was taken from an airplane.

88
00:04:34.740 --> 00:04:38.880
Here, we also see that we got some layered mountains

89
00:04:38.880 --> 00:04:40.833
and, of course, that lake.

90
00:04:41.970 --> 00:04:44.550
So that all works and is another example

91
00:04:44.550 --> 00:04:47.280
of using an Ollama-hosted,

92
00:04:47.280 --> 00:04:51.123
locally running OpenAI model programmatically.