WEBVTT

1
00:00:00.000 --> 00:00:01.590
<v ->[Maximilian Schwarzmuller] So as you learned,</v>

2
00:00:01.590 --> 00:00:03.870
you can save and load sessions.

3
00:00:03.870 --> 00:00:06.390
And what happens technically is that a copy

4
00:00:06.390 --> 00:00:07.920
of the model gets created

5
00:00:07.920 --> 00:00:11.460
and for that copied model, all of your settings,

6
00:00:11.460 --> 00:00:14.490
your system message, your parameter settings,

7
00:00:14.490 --> 00:00:17.970
and your existing chat history are saved

8
00:00:17.970 --> 00:00:21.570
and baked into that model, so to speak.

9
00:00:21.570 --> 00:00:25.980
And you can see that model if you run ollama list

10
00:00:25.980 --> 00:00:28.110
because that lists all the models

11
00:00:28.110 --> 00:00:30.659
that have been downloaded onto your system.

12
00:00:30.659 --> 00:00:34.520
You also have ollama ps, which you can run

13
00:00:34.520 --> 00:00:38.970
to list all currently running models.

14
00:00:38.970 --> 00:00:41.430
So here, ollama ps does indeed show me

15
00:00:41.430 --> 00:00:44.490
the Gemma 3 model I started because by default,

16
00:00:44.490 --> 00:00:47.370
Ollama keeps them running for five minutes

17
00:00:47.370 --> 00:00:49.890
and after five minutes without action,

18
00:00:49.890 --> 00:00:52.260
it will remove them from memory so

19
00:00:52.260 --> 00:00:55.380
that they no longer take up memory space.

20
00:00:55.380 --> 00:00:58.310
But this one, since I just used it, is still running.

21
00:00:58.310 --> 00:01:02.580
And here, you can also see the size it's occupying in memory

22
00:01:02.580 --> 00:01:06.553
and where it's running, whether it's on the GPU or the CPU.

23
00:01:06.553 --> 00:01:09.481
GPU, of course, is better as you learned.

24
00:01:09.481 --> 00:01:13.410
But ollama list shows you all the models you downloaded

25
00:01:13.410 --> 00:01:16.921
onto your system or all the models that are available here.

26
00:01:16.921 --> 00:01:18.690
And that does include

27
00:01:18.690 --> 00:01:23.280
that quantized Gemma 3 12-billion-parameter model

28
00:01:23.280 --> 00:01:24.800
I downloaded earlier,

29
00:01:24.800 --> 00:01:29.800
but it also includes this s1 model I created, this

30
00:01:30.150 --> 00:01:33.750
copy I created, which is that Gemma 3 model

31
00:01:33.750 --> 00:01:35.870
because that is the model I saved,

32
00:01:35.870 --> 00:01:39.300
but it's that model with my system message,

33
00:01:39.300 --> 00:01:41.160
with my parameter settings,

34
00:01:41.160 --> 00:01:44.510
and with my chat history baked in.

35
00:01:44.510 --> 00:01:48.270
And therefore, if I know that I want to use this model,

36
00:01:48.270 --> 00:01:50.670
that I want to continue with that chat history

37
00:01:50.670 --> 00:01:51.531
and use all the settings

38
00:01:51.531 --> 00:01:55.440
as I saved them, then when using ollama run, instead

39
00:01:55.440 --> 00:01:57.921
of using ollama run gemma3

40
00:01:57.921 --> 00:02:00.720
and so on, I could use ollama run s1

41
00:02:00.720 --> 00:02:02.820
since that's the name I chose for saving it.

42
00:02:02.820 --> 00:02:05.790
And that will now restore that session

43
00:02:05.790 --> 00:02:10.500
and run that model with all the settings I did save.

44
00:02:10.500 --> 00:02:13.268
Now, of course, over time, as you work with Ollama,

45
00:02:13.268 --> 00:02:16.830
as you experiment with it and as you add more

46
00:02:16.830 --> 00:02:18.780
and more models, you might end up

47
00:02:18.780 --> 00:02:20.640
with a long list of models.

48
00:02:20.640 --> 00:02:22.470
And, of course, these models

49
00:02:22.470 --> 00:02:24.673
are taking up space on your system.

50
00:02:24.673 --> 00:02:29.040
Now, technically, despite it saying here

51
00:02:29.040 --> 00:02:32.203
that each model has a size of roughly nine gigabytes,

52
00:02:32.203 --> 00:02:37.203
Ollama is smart and it won't just copy models

53
00:02:37.620 --> 00:02:41.040
if you save derivatives that are based on that model,

54
00:02:41.040 --> 00:02:42.450
like that s1 model.

55
00:02:42.450 --> 00:02:45.285
So it will actually not take up 18 gigabytes

56
00:02:45.285 --> 00:02:47.344
of disk space here.

57
00:02:47.344 --> 00:02:51.450
Instead, it will just save the differences,

58
00:02:51.450 --> 00:02:54.706
the additional configuration you applied to the saved model,

59
00:02:54.706 --> 00:02:58.306
and then use that saved metadata in conjunction

60
00:02:58.306 --> 00:03:02.925
with the original model files when you run the s1 model.

61
00:03:02.925 --> 00:03:05.616
So you can easily create derivatives

62
00:03:05.616 --> 00:03:07.807
and customized versions of models

63
00:03:07.807 --> 00:03:11.376
without quickly cluttering up your disk.

64
00:03:11.376 --> 00:03:13.110
That's the good news.

65
00:03:13.110 --> 00:03:16.260
But still, of course, over time, you might end up

66
00:03:16.260 --> 00:03:17.776
with a long list of models

67
00:03:17.776 --> 00:03:21.990
where you might simply not need all the models anymore.

68
00:03:21.990 --> 00:03:25.560
That's why Ollama also gives you the rm command,

69
00:03:25.560 --> 00:03:27.510
which you can use to remove a model.

70
00:03:27.510 --> 00:03:31.848
And remove really means remove: it also deletes it from the disk.

71
00:03:31.848 --> 00:03:36.240
So therefore here, I can run ollama rm s1

72
00:03:36.240 --> 00:03:39.133
and that will remove that s1 model.

73
00:03:39.133 --> 00:03:42.570
Again, in case of this model, since it was based on

74
00:03:42.570 --> 00:03:43.800
that Gemma 3 model,

75
00:03:43.800 --> 00:03:46.373
it didn't actually take up any significant amount

76
00:03:46.373 --> 00:03:47.565
of disk space.

77
00:03:47.565 --> 00:03:50.954
But still, of course, there is no reason to keep old models

78
00:03:50.954 --> 00:03:54.450
around if you're just not using them anymore.

79
00:03:54.450 --> 00:03:57.432
What you can also do with the ollama command

80
00:03:57.432 --> 00:04:00.771
when managing your locally downloaded models

81
00:04:00.771 --> 00:04:05.074
is use the show command to learn more about a model.

82
00:04:05.074 --> 00:04:09.125
For example, here, I can use ollama show gemma3,

83
00:04:09.125 --> 00:04:13.304
and then I'll just repeat that identifier here

84
00:04:13.304 --> 00:04:15.570
to learn more about this model.

85
00:04:15.570 --> 00:04:18.450
And essentially, that's the same kind of information we got

86
00:04:18.450 --> 00:04:22.500
with the /show command when running that model.

87
00:04:22.500 --> 00:04:24.780
Here, you can get it without starting that model

88
00:04:24.780 --> 00:04:27.000
just to learn more about it.

89
00:04:27.000 --> 00:04:29.713
And that can, of course, be particularly helpful

90
00:04:29.713 --> 00:04:34.713
if you don't know what exactly a model was about anymore.

91
00:04:34.856 --> 00:04:38.070
Because, of course, here with the gemma3 model,

92
00:04:38.070 --> 00:04:41.130
it's pretty clear that it's, well, the gemma3 model,

93
00:04:41.130 --> 00:04:43.465
the 12 billion parameter version of it.

94
00:04:43.465 --> 00:04:47.766
But what if you did run that model

95
00:04:47.766 --> 00:04:49.832
and you then set a system message

96
00:04:49.832 --> 00:04:54.832
like "You are a friendly assistant," and you then saved this as

97
00:04:56.283 --> 00:05:00.723
custom-m1, or whatever; you can choose any identifier you want,

98
00:05:00.723 --> 00:05:04.980
and then you quit and you went on vacation, and

99
00:05:04.980 --> 00:05:06.210
after a time, you came back

100
00:05:06.210 --> 00:05:09.024
and you're wondering, what's this model about?

101
00:05:09.024 --> 00:05:13.672
Well, for situations like this, you can use ollama show

102
00:05:13.672 --> 00:05:17.220
and you will learn that this was indeed based

103
00:05:17.220 --> 00:05:19.410
on the gemma3 model family,

104
00:05:19.410 --> 00:05:23.400
the 12-billion-parameter flavor specifically.

105
00:05:23.400 --> 00:05:26.130
And then you might remember that this was indeed

106
00:05:26.130 --> 00:05:29.431
that custom version of that gemma3 model you saved

107
00:05:29.431 --> 00:05:32.201
that had that system message baked in.

108
00:05:32.201 --> 00:05:35.670
So that's why the ollama show command can be useful

109
00:05:35.670 --> 00:05:38.490
for inspecting and understanding your models,

110
00:05:38.490 --> 00:05:40.473
especially your custom models.

