WEBVTT

1
00:00:00.480 --> 00:00:03.750
<v Maximilian>Creating customized versions,</v>

2
00:00:03.750 --> 00:00:08.370
adjusted versions of open Large Language Models

3
00:00:08.370 --> 00:00:10.830
as you learned in the previous lectures,

4
00:00:10.830 --> 00:00:13.530
is one of the big advantages

5
00:00:13.530 --> 00:00:17.040
or core features offered by Ollama.

6
00:00:17.040 --> 00:00:20.880
And you saw how you can set the system message or parameters

7
00:00:20.880 --> 00:00:22.380
or even a chat history

8
00:00:22.380 --> 00:00:25.170
and then save that as a new model

9
00:00:25.170 --> 00:00:28.350
to reuse it later.

10
00:00:28.350 --> 00:00:31.800
Now, besides using set and save

11
00:00:31.800 --> 00:00:34.350
to configure and save a model,

12
00:00:34.350 --> 00:00:37.650
you can also use another approach.

13
00:00:37.650 --> 00:00:41.190
You can create a so-called Modelfile,

14
00:00:41.190 --> 00:00:43.800
and here I'm doing this in Visual Studio Code,

15
00:00:43.800 --> 00:00:45.600
which is simply a text editor.

16
00:00:45.600 --> 00:00:47.730
You can also use a regular text editor.

17
00:00:47.730 --> 00:00:50.220
You don't need a code editor for this,

18
00:00:50.220 --> 00:00:52.320
and you can do this anywhere on your system

19
00:00:52.320 --> 00:00:54.390
in any folder of your choice.

20
00:00:54.390 --> 00:00:56.790
This is also a file without a file extension.

21
00:00:56.790 --> 00:00:58.983
It's just a file called Modelfile

22
00:01:00.240 --> 00:01:03.840
because Ollama has this concept of Modelfiles,

23
00:01:03.840 --> 00:01:06.900
which in the end is inspired by Dockerfiles.

24
00:01:06.900 --> 00:01:08.340
Now you don't need to know Docker.

25
00:01:08.340 --> 00:01:11.760
If Dockerfiles don't mean anything to you, that's no problem.

26
00:01:11.760 --> 00:01:14.400
The idea behind these model files is essentially

27
00:01:14.400 --> 00:01:17.730
that you can configure models,

28
00:01:17.730 --> 00:01:22.080
adjust models just as we did with /set,

29
00:01:22.080 --> 00:01:24.630
but without doing it in the command line,

30
00:01:24.630 --> 00:01:28.410
and then you can load models based on that model file.

31
00:01:28.410 --> 00:01:30.360
So you can simply configure a model

32
00:01:30.360 --> 00:01:33.453
in a more convenient way here, so to speak.

33
00:01:34.680 --> 00:01:38.640
Every Modelfile starts with a FROM instruction,

34
00:01:38.640 --> 00:01:42.360
which defines the base model you want to build upon.

35
00:01:42.360 --> 00:01:47.360
In my case here, gemma3:12b-it-qat.

36
00:01:50.940 --> 00:01:54.810
So using that same identifier I used all the time.
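
NOTE
A minimal sketch of the first line of the Modelfile at this point.
It assumes the identifier uses Ollama's usual name:tag form; the colon is not spelled out in the audio.
```
# Base model this customized model builds on
FROM gemma3:12b-it-qat
```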

37
00:01:54.810 --> 00:01:56.640
That will tell Ollama

38
00:01:56.640 --> 00:02:00.300
that this is the general AI model I want to use,

39
00:02:00.300 --> 00:02:02.040
but now I can override

40
00:02:02.040 --> 00:02:05.310
or adjust certain settings about this model.

41
00:02:05.310 --> 00:02:08.130
For example, as you also see in the official docs,

42
00:02:08.130 --> 00:02:11.160
which unfortunately are only available on GitHub.

43
00:02:11.160 --> 00:02:11.993
You'll find a link

44
00:02:11.993 --> 00:02:14.700
to that model file documentation attached.

45
00:02:14.700 --> 00:02:16.620
For example, you can set parameters

46
00:02:16.620 --> 00:02:19.260
with the PARAMETER instruction.

47
00:02:19.260 --> 00:02:23.190
You can set a system message with the SYSTEM instruction.

48
00:02:23.190 --> 00:02:25.710
You can also set more advanced things,

49
00:02:25.710 --> 00:02:29.010
but I'll get back to one advanced concept,

50
00:02:29.010 --> 00:02:31.800
the template, a little bit later.

51
00:02:31.800 --> 00:02:36.480
For now, I'll focus on PARAMETER, SYSTEM, and also MESSAGE.

52
00:02:36.480 --> 00:02:37.860
So let's do that here.

53
00:02:37.860 --> 00:02:39.960
Let's set a parameter

54
00:02:39.960 --> 00:02:43.140
and I'll set the num_ctx parameter,

55
00:02:43.140 --> 00:02:46.413
which defines the available context window length,

56
00:02:47.580 --> 00:02:50.340
and I'll set that to 10,000 tokens.

57
00:02:50.340 --> 00:02:52.500
We can also set another parameter.

58
00:02:52.500 --> 00:02:57.240
For example, the temperature, to let's say 0.5.

59
00:02:57.240 --> 00:02:58.740
The default for the Gemma model

60
00:02:58.740 --> 00:03:01.470
at this point in time is actually 1.

61
00:03:01.470 --> 00:03:03.270
Here I'm setting it to 0.5

62
00:03:03.270 --> 00:03:05.820
to make it a bit less creative.
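
NOTE
A sketch of the two parameter lines described here, placed below the FROM line.
The values are the ones named in the audio; the comments are added for illustration.
```
# Context window length in tokens
PARAMETER num_ctx 10000
# Lower than the Gemma 3 default of 1, for less varied output
PARAMETER temperature 0.5
```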

63
00:03:05.820 --> 00:03:08.520
And don't forget, you'll find the default params

64
00:03:08.520 --> 00:03:12.000
for the model you're using in the Ollama models catalog

65
00:03:12.000 --> 00:03:14.100
on the detail page of a given model.

66
00:03:14.100 --> 00:03:18.270
There under params, you find the default parameters.

67
00:03:18.270 --> 00:03:21.270
So here for Gemma 3, the default temperature is one.

68
00:03:21.270 --> 00:03:24.450
Now I'm setting it to 0.5 here,

69
00:03:24.450 --> 00:03:27.510
and I'm doing that through that parameter instruction

70
00:03:27.510 --> 00:03:28.500
in that model file,

71
00:03:28.500 --> 00:03:32.040
and that is one of the officially supported instructions.

72
00:03:32.040 --> 00:03:33.690
Of course, you can only include

73
00:03:33.690 --> 00:03:35.100
the instructions you find here

74
00:03:35.100 --> 00:03:37.140
because these are the only instructions

75
00:03:37.140 --> 00:03:39.300
Ollama will understand.

76
00:03:39.300 --> 00:03:41.280
I'll also add a system message,

77
00:03:41.280 --> 00:03:44.100
and that can, of course, be anything you want.

78
00:03:44.100 --> 00:03:46.410
It can be a short one, it can be a long one,

79
00:03:46.410 --> 00:03:50.400
like, "You are a friendly assistant.

80
00:03:50.400 --> 00:03:55.400
You will not answer any questions related to LM Studio."
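
NOTE
A sketch of the system message as a single unwrapped line in the Modelfile.
Whether the on-screen file puts quotes around the text is not clear from the audio; this version assumes it does not.
```
SYSTEM You are a friendly assistant. You will not answer any questions related to LM Studio.
```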

81
00:03:57.300 --> 00:03:59.850
Now, you should make sure that it's all in one line.

82
00:03:59.850 --> 00:04:02.220
Here, it's getting wrapped by my code editor,

83
00:04:02.220 --> 00:04:04.230
but that's just a visual wrap.

84
00:04:04.230 --> 00:04:05.850
Technically, it's all in one line,

85
00:04:05.850 --> 00:04:07.650
and you should ensure that this is the case,

86
00:04:07.650 --> 00:04:10.080
because if you split it across multiple lines,

87
00:04:10.080 --> 00:04:13.230
the next line would be interpreted as a new instruction,

88
00:04:13.230 --> 00:04:15.120
but an unknown instruction

89
00:04:15.120 --> 00:04:19.530
because Ollama only understands FROM, PARAMETER, SYSTEM,

90
00:04:19.530 --> 00:04:22.140
and whatever else is specified here.

91
00:04:22.140 --> 00:04:23.940
So that's why that should all be in one line.

92
00:04:23.940 --> 00:04:26.070
Again, here it looks like it's in a new line

93
00:04:26.070 --> 00:04:28.020
because my code editor breaks it

94
00:04:28.020 --> 00:04:31.200
across multiple lines visually, but it is in one line,

95
00:04:31.200 --> 00:04:33.870
as you can also tell by the line numbers.

96
00:04:33.870 --> 00:04:37.350
What's also an interesting setting is the message setting.

97
00:04:37.350 --> 00:04:41.580
This allows you to add messages to this model file

98
00:04:41.580 --> 00:04:45.420
and therefore to the model you'll build based on that file.

99
00:04:45.420 --> 00:04:48.600
And for that, remember that a couple of lectures ago

100
00:04:48.600 --> 00:04:51.630
when I first showed you the /save command,

101
00:04:51.630 --> 00:04:54.300
I did that to save a chat history

102
00:04:54.300 --> 00:04:57.000
and to reuse that chat history.

103
00:04:57.000 --> 00:05:01.290
Here, we can bake a chat history into this model file

104
00:05:01.290 --> 00:05:03.150
and that history will then be used

105
00:05:03.150 --> 00:05:06.060
by any models we build based on that file,

106
00:05:06.060 --> 00:05:08.490
which is something I'll show you in a second.

107
00:05:08.490 --> 00:05:11.700
Now, after message, you should specify who sent the message,

108
00:05:11.700 --> 00:05:14.640
and that can either be user, if it's you,

109
00:05:14.640 --> 00:05:18.990
if you want to simulate that you sent the message to the AI,

110
00:05:18.990 --> 00:05:21.030
or assistant if you want to simulate

111
00:05:21.030 --> 00:05:26.030
or kind of fake that the AI assistant sent the message.

112
00:05:26.550 --> 00:05:30.360
So here we could, for example, send a user message

113
00:05:30.360 --> 00:05:34.060
and say something like, "Hi, this is a question

114
00:05:34.920 --> 00:05:39.900
submitted via the contact form on the website."

115
00:05:39.900 --> 00:05:42.120
Again, there is no real line break here

116
00:05:42.120 --> 00:05:43.410
and there shouldn't be one.

117
00:05:43.410 --> 00:05:46.113
It's just a visual line break by my editor.

118
00:05:47.040 --> 00:05:49.080
And then I'll add another message here,

119
00:05:49.080 --> 00:05:54.080
assistant, "Hi, thanks for contacting us!"
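
NOTE
A sketch of the baked-in chat history described here: each MESSAGE line names a role (user or assistant) followed by the message text, all on one line.
The exact on-screen punctuation is assumed from the audio.
```
MESSAGE user Hi, this is a question submitted via the contact form on the website.
MESSAGE assistant Hi, thanks for contacting us!
```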

120
00:05:56.100 --> 00:05:59.340
And now that would be some made up chat history

121
00:05:59.340 --> 00:06:01.290
with which we always start

122
00:06:01.290 --> 00:06:04.800
when we run this adjusted model version,

123
00:06:04.800 --> 00:06:07.173
which we're building here with that model file.

124
00:06:08.100 --> 00:06:09.030
And that could therefore

125
00:06:09.030 --> 00:06:11.700
be one example Modelfile you want to build.

126
00:06:11.700 --> 00:06:16.560
The question now is how do we translate this model file

127
00:06:16.560 --> 00:06:20.973
into an actual model we can run with the Ollama command?

