WEBVTT

1
00:00:00.540 --> 00:00:02.640
<v Maximilian>So we can chat with Ollama.</v>

2
00:00:02.640 --> 00:00:04.410
That's pretty straightforward.

3
00:00:04.410 --> 00:00:06.900
There are two important features about this chat,

4
00:00:06.900 --> 00:00:08.460
which I wanna highlight here.

5
00:00:08.460 --> 00:00:11.550
One is related to multi-line messages,

6
00:00:11.550 --> 00:00:14.430
because right now, if I type something here, like Hi,

7
00:00:14.430 --> 00:00:17.490
and I then want to enter a new line to structure my input,

8
00:00:17.490 --> 00:00:21.690
for example, if I hit Enter, I just send this as a message,

9
00:00:21.690 --> 00:00:24.690
which might not always be what I wanna do.

10
00:00:24.690 --> 00:00:26.640
So in order to send multi-line messages,

11
00:00:26.640 --> 00:00:30.150
you can and should use triple double quotes.

12
00:00:30.150 --> 00:00:32.280
Then if you hit Enter, you don't send a message,

13
00:00:32.280 --> 00:00:34.110
but you enter a new line.

14
00:00:34.110 --> 00:00:37.215
So here I could now write a message like this,

15
00:00:37.215 --> 00:00:41.160
and you then end with triple double quotes again.

16
00:00:41.160 --> 00:00:43.770
So opening and closing triple double quotes

17
00:00:43.770 --> 00:00:46.800
and in between, your multi-line message.

18
00:00:46.800 --> 00:00:48.990
And this will then send the message.

19
00:00:48.990 --> 00:00:50.891
So that's one important thing.

20
00:00:50.891 --> 00:00:52.470
What's also important

21
00:00:52.470 --> 00:00:55.380
is that when dealing with a vision model,

22
00:00:55.380 --> 00:00:57.510
which Gemma3 is,

23
00:00:57.510 --> 00:01:00.930
you can, of course, not just use it to handle text input,

24
00:01:00.930 --> 00:01:02.820
but also image input.

25
00:01:02.820 --> 00:01:06.150
Gemma3 models and a couple of other models as well

26
00:01:06.150 --> 00:01:09.840
are able to extract information from images.

27
00:01:09.840 --> 00:01:10.740
Now, you can, of course,

28
00:01:10.740 --> 00:01:14.340
use Ollama programmatically to send images to it,

29
00:01:14.340 --> 00:01:17.310
and that is something we'll explore later in this section,

30
00:01:17.310 --> 00:01:20.880
but you can also use an image here in the chat.

31
00:01:20.880 --> 00:01:21.713
And to do that,

32
00:01:21.713 --> 00:01:24.210
all you have to do is type your prompt

33
00:01:24.210 --> 00:01:26.193
like "What's on this image?",

34
00:01:27.120 --> 00:01:29.610
and then, separated by a space,

35
00:01:29.610 --> 00:01:32.580
you provide a path to that image.

36
00:01:32.580 --> 00:01:35.010
In my case, it's an image called max.jpg,

37
00:01:35.010 --> 00:01:38.160
which sits in the same folder as I'm currently in.

38
00:01:38.160 --> 00:01:40.170
So I provide a relative path

39
00:01:40.170 --> 00:01:43.170
to that image, starting with ./.

40
00:01:43.170 --> 00:01:45.300
You could also have an absolute path here

41
00:01:45.300 --> 00:01:47.400
to the image that might be stored somewhere else

42
00:01:47.400 --> 00:01:48.840
on your system.

43
00:01:48.840 --> 00:01:50.250
And now, if I send this,

44
00:01:50.250 --> 00:01:53.160
Ollama detects that I added an image to the prompt.

45
00:01:53.160 --> 00:01:54.930
And for it to detect that,

46
00:01:54.930 --> 00:01:58.950
your image path must come last after your prompt.

47
00:01:58.950 --> 00:02:00.513
The order is important.

48
00:02:01.470 --> 00:02:04.770
And it now sent this image, along with my prompt, to Ollama.

49
00:02:04.770 --> 00:02:07.890
And therefore, here, I get a nice output

50
00:02:07.890 --> 00:02:09.450
describing what's on the image,

51
00:02:09.450 --> 00:02:12.930
that it's a man in a recording setup

52
00:02:12.930 --> 00:02:16.410
and that there is a thumbs-up gesture.

53
00:02:16.410 --> 00:02:17.550
And that is correct

54
00:02:17.550 --> 00:02:20.970
because this here is the image I sent to Ollama.

55
00:02:20.970 --> 00:02:22.893
It's this max.jpg file.

56
00:02:23.940 --> 00:02:27.660
So if you are working with a multimodal model

57
00:02:27.660 --> 00:02:29.040
like the Gemma3 model,

58
00:02:29.040 --> 00:02:32.640
which is not just capable of handling text, but also images,

59
00:02:32.640 --> 00:02:34.260
you can send both to it.

60
00:02:34.260 --> 00:02:36.633
It's just the order that's important.

