WEBVTT

1
00:00:00.510 --> 00:00:02.550
<v Maximilian>So the Modelfile concept</v>

2
00:00:02.550 --> 00:00:04.470
is an important concept.

3
00:00:04.470 --> 00:00:07.530
It allows you to build customized versions

4
00:00:07.530 --> 00:00:10.050
based on existing models.

5
00:00:10.050 --> 00:00:13.050
You can also take it a step further though.

6
00:00:13.050 --> 00:00:18.050
You can technically also build new models with a Modelfile

7
00:00:18.150 --> 00:00:22.803
that are not based on base models from the Ollama catalog.

8
00:00:23.820 --> 00:00:27.090
Instead, you could have a GGUF file,

9
00:00:27.090 --> 00:00:30.330
which is a file format that stores these parameters,

10
00:00:30.330 --> 00:00:33.720
these weights, along with some metadata.

11
00:00:33.720 --> 00:00:36.210
And you could get such a GGUF file

12
00:00:36.210 --> 00:00:40.770
because you trained the model yourself or from Hugging Face.

13
00:00:40.770 --> 00:00:45.180
Now for example, here, I have the Qwen3 model,

14
00:00:45.180 --> 00:00:48.510
or to be precise, a quantized version of that model.

15
00:00:48.510 --> 00:00:52.050
It's a normal model shared on Hugging Face.

16
00:00:52.050 --> 00:00:53.250
And that is a model

17
00:00:53.250 --> 00:00:56.250
that is indeed already supported by Ollama.

18
00:00:56.250 --> 00:01:00.570
So you do find Qwen3 in their model catalog as well.

19
00:01:00.570 --> 00:01:03.240
But if it were not included there,

20
00:01:03.240 --> 00:01:05.160
and for some models on Hugging Face,

21
00:01:05.160 --> 00:01:06.450
that might be the case

22
00:01:06.450 --> 00:01:09.480
because whilst most models and definitely all important

23
00:01:09.480 --> 00:01:13.680
and popular models are part of the Ollama model catalog,

24
00:01:13.680 --> 00:01:15.960
some models might not be.

25
00:01:15.960 --> 00:01:17.760
And therefore for such situations,

26
00:01:17.760 --> 00:01:21.600
you could download such a GGUF file from Hugging Face.

27
00:01:21.600 --> 00:01:23.460
You could download the model,

28
00:01:23.460 --> 00:01:25.320
the parameters from Hugging Face,

29
00:01:25.320 --> 00:01:29.310
and then use that downloaded GGUF file as a base

30
00:01:29.310 --> 00:01:33.720
to set up your own Ollama-compatible model.

31
00:01:33.720 --> 00:01:34.590
How do you do that?

32
00:01:34.590 --> 00:01:36.150
Well, here for this example,

33
00:01:36.150 --> 00:01:38.130
and you'll find a link to that attached.

34
00:01:38.130 --> 00:01:41.730
There is a list of download links here

35
00:01:41.730 --> 00:01:44.310
for different quantized versions of that model.

36
00:01:44.310 --> 00:01:48.480
And I wanna use the Q4_K_M version here.

37
00:01:48.480 --> 00:01:50.520
And in general, as you also see here,

38
00:01:50.520 --> 00:01:53.370
it's recommended to go for the Q4 versions,

39
00:01:53.370 --> 00:01:57.390
the 4-bit quantized versions, and then K_M or K_S.

40
00:01:57.390 --> 00:02:00.690
These typically give you the best trade-off between quality

41
00:02:00.690 --> 00:02:04.620
and performance and reducing hardware requirements.

42
00:02:04.620 --> 00:02:07.380
And you can download this file here

43
00:02:07.380 --> 00:02:09.480
directly from Hugging Face.

44
00:02:09.480 --> 00:02:11.220
And again, this is just an example.

45
00:02:11.220 --> 00:02:12.450
You could also have a model

46
00:02:12.450 --> 00:02:15.270
that you trained yourself of course,

47
00:02:15.270 --> 00:02:17.010
and then you could create a Modelfile

48
00:02:17.010 --> 00:02:19.020
next to that downloaded model.

49
00:02:19.020 --> 00:02:21.690
So next to this downloaded GGUF file,

50
00:02:21.690 --> 00:02:23.130
which is a file that contains

51
00:02:23.130 --> 00:02:26.580
all these parameters and weights, and some metadata.

52
00:02:26.580 --> 00:02:28.140
And now with the FROM instruction,

53
00:02:28.140 --> 00:02:31.320
you would point at that downloaded file.

54
00:02:31.320 --> 00:02:34.320
So you would use that file name including the file extension

55
00:02:35.640 --> 00:02:38.100
and specify it like this here.

56
00:02:38.100 --> 00:02:40.650
And now you would build a brand new model,

57
00:02:40.650 --> 00:02:42.510
or you would define a brand new model

58
00:02:42.510 --> 00:02:44.220
based on that downloaded model,

59
00:02:44.220 --> 00:02:46.803
based on these parameters and weights.

60
00:02:48.120 --> 00:02:50.070
Of course, you can still add a system message

61
00:02:50.070 --> 00:02:51.363
and parameters,

62
00:02:52.380 --> 00:02:56.070
but you also might want to specify a template.

63
00:02:56.070 --> 00:02:59.250
Now the great thing about GGUF files is that,

64
00:02:59.250 --> 00:03:00.270
as I explained,

65
00:03:00.270 --> 00:03:03.330
they do not just contain the parameters,

66
00:03:03.330 --> 00:03:05.430
but also some metadata.

67
00:03:05.430 --> 00:03:07.110
And one important piece of metadata

68
00:03:07.110 --> 00:03:10.020
they do include is the chat template,

69
00:03:10.020 --> 00:03:13.323
which in the case of Qwen3 is rather complex as you see.

70
00:03:14.340 --> 00:03:18.797
You can also see that on the Qwen3 page of Ollama.

71
00:03:18.797 --> 00:03:20.430
There, if you open the template,

72
00:03:20.430 --> 00:03:22.983
you get a rather complex template there as well.

73
00:03:23.970 --> 00:03:26.970
By the way, these template formats are different.

74
00:03:26.970 --> 00:03:30.240
Ollama uses the Go templating language.

75
00:03:30.240 --> 00:03:31.620
On Hugging Face here,

76
00:03:31.620 --> 00:03:34.170
that is a different language.

77
00:03:34.170 --> 00:03:35.940
So you would need to use

78
00:03:35.940 --> 00:03:38.880
that Go templating language expected by Ollama

79
00:03:38.880 --> 00:03:40.893
if you were to set up your own template.

80
00:03:42.060 --> 00:03:43.650
Now I'll first show you what happens

81
00:03:43.650 --> 00:03:45.750
if you don't specify a template though.

82
00:03:45.750 --> 00:03:48.090
So I'll just set up a basic Modelfile

83
00:03:48.090 --> 00:03:50.880
that just has this FROM instruction and nothing else.

84
00:03:50.880 --> 00:03:53.220
And of course we could add a system prompt or parameters,

85
00:03:53.220 --> 00:03:54.633
but I won't add anything.

86
00:03:55.470 --> 00:03:59.130
And I'll then create my own Qwen model version

87
00:03:59.130 --> 00:04:01.923
based on that Modelfile with ollama create.

88
00:04:03.870 --> 00:04:08.490
And if I now run this and I ask it something,

89
00:04:08.490 --> 00:04:10.623
I get a super weird response.

90
00:04:14.640 --> 00:04:17.490
And I got this weird response by the model

91
00:04:17.490 --> 00:04:21.510
because it had no structure at all, it had no template,

92
00:04:21.510 --> 00:04:26.510
so it did not understand the message I sent to it in the end.

93
00:04:26.640 --> 00:04:28.200
The message I sent to it

94
00:04:28.200 --> 00:04:30.030
was not structured in a way,

95
00:04:30.030 --> 00:04:33.030
was not wrapped with the right identifiers

96
00:04:33.030 --> 00:04:35.490
that it knew from the training process.

97
00:04:35.490 --> 00:04:37.770
Now of course, I can use the chat template here

98
00:04:37.770 --> 00:04:39.960
from Hugging Face and try to use that,

99
00:04:39.960 --> 00:04:42.210
but as mentioned, this must be adjusted

100
00:04:42.210 --> 00:04:44.970
to use the Go templating language.

101
00:04:44.970 --> 00:04:47.940
And whilst you could of course try to convert this manually

102
00:04:47.940 --> 00:04:49.980
or with help of AI, here,

103
00:04:49.980 --> 00:04:51.990
I'll take the easy route

104
00:04:51.990 --> 00:04:54.630
and I'll just use the one that was built by Ollama.

105
00:04:54.630 --> 00:04:57.390
And of course in reality, I would use the entire model

106
00:04:57.390 --> 00:04:59.160
that was published by Ollama.

107
00:04:59.160 --> 00:05:00.750
This is just a demo here,

108
00:05:00.750 --> 00:05:05.610
but here I'll just use their template, copy all of that,

109
00:05:05.610 --> 00:05:09.090
go back to my Modelfile, and add such a template.

110
00:05:09.090 --> 00:05:13.590
Add the triple quotes to insert a multi-line template,

111
00:05:13.590 --> 00:05:17.493
and insert this here, like this.

112
00:05:18.810 --> 00:05:21.420
Now you can ignore the fact that something's red here,

113
00:05:21.420 --> 00:05:22.623
this should work.

114
00:05:23.700 --> 00:05:26.640
So now with that template added,

115
00:05:26.640 --> 00:05:31.380
I'll first of all remove my model, my broken model,

116
00:05:31.380 --> 00:05:33.990
and then I'll repeat the command for creating it,

117
00:05:33.990 --> 00:05:36.000
now based on this updated Modelfile,

118
00:05:36.000 --> 00:05:38.253
which does now include a proper template.

119
00:05:39.240 --> 00:05:41.880
And if I now run this updated model

120
00:05:41.880 --> 00:05:43.977
and I ask, "How are you?"

121
00:05:45.630 --> 00:05:47.850
I get a more reasonable output.

122
00:05:47.850 --> 00:05:50.070
Now this model, the Qwen model,

123
00:05:50.070 --> 00:05:52.020
is actually a reasoning model which tries

124
00:05:52.020 --> 00:05:54.420
to think first before it responds,

125
00:05:54.420 --> 00:05:56.310
which is why we see this block.

126
00:05:56.310 --> 00:05:58.590
But it generally understood my message

127
00:05:58.590 --> 00:06:01.260
and it gave a reasonable response.

128
00:06:01.260 --> 00:06:03.453
So that's working the way it should,

129
00:06:04.410 --> 00:06:06.360
and that therefore does not just show you

130
00:06:06.360 --> 00:06:09.330
how you can set up models from scratch

131
00:06:09.330 --> 00:06:12.540
just based on such a GGUF file, for example.

132
00:06:12.540 --> 00:06:16.770
But it also shows you the importance of adding templates.

133
00:06:16.770 --> 00:06:18.480
Again, because it's important,

134
00:06:18.480 --> 00:06:20.790
you don't need those when using the models

135
00:06:20.790 --> 00:06:23.970
that are published in the Ollama model catalog.

136
00:06:23.970 --> 00:06:26.580
And you also don't need to specify templates when building

137
00:06:26.580 --> 00:06:30.270
your own customized versions based on those models.

138
00:06:30.270 --> 00:06:33.450
But if you build a model from scratch, as I just showed you,

139
00:06:33.450 --> 00:06:35.643
then these templates do matter.

