WEBVTT

1
00:00:00.720 --> 00:00:01.770
<v Instructor>So we now heard</v>

2
00:00:01.770 --> 00:00:04.890
about a couple of popular open models.

3
00:00:04.890 --> 00:00:06.990
You learned where to find more models

4
00:00:06.990 --> 00:00:09.960
and you learned why using an open model,

5
00:00:09.960 --> 00:00:14.610
why running it locally may be a good and interesting idea.

6
00:00:14.610 --> 00:00:17.970
What are your options for doing so then?

7
00:00:17.970 --> 00:00:22.560
How can you run open large language models locally then?

8
00:00:22.560 --> 00:00:24.840
Well, there are plenty of ways

9
00:00:24.840 --> 00:00:27.510
of running open models locally on your system

10
00:00:27.510 --> 00:00:30.540
or also on servers you may be rented,

11
00:00:30.540 --> 00:00:33.720
but arguably, the most popular options

12
00:00:33.720 --> 00:00:38.720
are using Llma.cpp, LM Studio or Ollama.

13
00:00:39.630 --> 00:00:42.090
Though it's worth noting that LM Studio

14
00:00:42.090 --> 00:00:47.090
and Ollama are really just wrappers around Llma.cpp.

15
00:00:47.130 --> 00:00:49.770
that make using it more accessible

16
00:00:49.770 --> 00:00:54.770
because Llma.cpp is a very low level solution.

17
00:00:55.080 --> 00:01:00.080
It is a project, a tool you can install on your system,

18
00:01:00.090 --> 00:01:05.090
you can run it on your system, but doing so is a bit clunky

19
00:01:05.190 --> 00:01:08.550
and definitely requires some technical expertise

20
00:01:08.550 --> 00:01:12.060
and maybe also programming experience.

21
00:01:12.060 --> 00:01:14.340
For example, if you wanted to download it,

22
00:01:14.340 --> 00:01:16.950
at least at the point of time where I'm recording this,

23
00:01:16.950 --> 00:01:19.590
especially on Windows, you have to go

24
00:01:19.590 --> 00:01:23.310
to the GitHub repository of this project to release this,

25
00:01:23.310 --> 00:01:27.330
and then from these files, pick the one that's the right one

26
00:01:27.330 --> 00:01:29.610
for your operating system.

27
00:01:29.610 --> 00:01:32.520
But even after doing so, you'll in the end just get

28
00:01:32.520 --> 00:01:35.130
a bunch of tools like the llama-cli

29
00:01:35.130 --> 00:01:39.390
which you have to execute in combination with a gguf file

30
00:01:39.390 --> 00:01:40.620
which is a file format.

31
00:01:40.620 --> 00:01:43.020
You for example, find on Hugging Face

32
00:01:43.020 --> 00:01:46.740
that contains the parameters of a model and some metadata.

33
00:01:46.740 --> 00:01:50.790
And then you would have a very basic AI chatbot experience

34
00:01:50.790 --> 00:01:54.000
in your command line interface, in your terminal.

35
00:01:54.000 --> 00:01:55.260
You could also spin up a server,

36
00:01:55.260 --> 00:01:58.435
but all that definitely is a bit more advanced

37
00:01:58.435 --> 00:02:03.180
and not really aimed at the normal regular end user

38
00:02:03.180 --> 00:02:06.810
who might not have a strong technical background.

39
00:02:06.810 --> 00:02:10.260
That's why on this project page, this GitHub page,

40
00:02:10.260 --> 00:02:11.940
you also find a section

41
00:02:11.940 --> 00:02:14.520
where they essentially recommend other tools

42
00:02:14.520 --> 00:02:16.470
that build up on Llama

43
00:02:16.470 --> 00:02:21.030
or work together with Llama that make using it easier.

44
00:02:21.030 --> 00:02:24.780
And here, you for example, find LM Studio,

45
00:02:24.780 --> 00:02:26.550
which is one of those other tools I mentioned

46
00:02:26.550 --> 00:02:29.880
and one of the main tools we'll focus on in this course

47
00:02:29.880 --> 00:02:33.336
and also, Ollama, which is the other major tool we'll use

48
00:02:33.336 --> 00:02:35.040
in this course.

49
00:02:35.040 --> 00:02:37.174
So as explained, these tools will use Llma.cpp

50
00:02:37.174 --> 00:02:38.460
under the hood.

51
00:02:38.460 --> 00:02:41.220
They will take advantage of it because it is an amazing tool

52
00:02:41.220 --> 00:02:43.440
that delivers amazing performance

53
00:02:43.440 --> 00:02:45.990
for running open models locally,

54
00:02:45.990 --> 00:02:47.730
but they make it much easier

55
00:02:47.730 --> 00:02:50.580
because Llma.cpp itself as mentioned,

56
00:02:50.580 --> 00:02:53.040
is a very low level solution

57
00:02:53.040 --> 00:02:55.680
that's aimed at experienced developers

58
00:02:55.680 --> 00:02:59.880
or people with a strong technical background, I would say.

59
00:02:59.880 --> 00:03:02.760
It does only offer that CLI and server mode as I mentioned.

60
00:03:02.760 --> 00:03:05.730
And whilst it does have plenty of configuration options,

61
00:03:05.730 --> 00:03:09.300
it also basically needs you to understand them

62
00:03:09.300 --> 00:03:11.850
and you might even be forced to build your own version

63
00:03:11.850 --> 00:03:16.200
of Llma.cpp to use it to its fullest extent.

64
00:03:16.200 --> 00:03:19.047
That's why instead in this course, we'll use LM Studio

65
00:03:19.047 --> 00:03:22.200
and Ollama and get all the Llma.cpp benefits

66
00:03:22.200 --> 00:03:24.030
without the pain.

67
00:03:24.030 --> 00:03:27.780
For example, LM Studio is an amazing solution

68
00:03:27.780 --> 00:03:30.510
for normal users.

69
00:03:30.510 --> 00:03:35.510
It gives you a very user-friendly graphical user interface

70
00:03:35.700 --> 00:03:38.730
and no technical expertise is required

71
00:03:38.730 --> 00:03:41.190
in order to use LM Studio.

72
00:03:41.190 --> 00:03:45.270
It makes it really simple to download, configure,

73
00:03:45.270 --> 00:03:48.360
and use open models on your system

74
00:03:48.360 --> 00:03:52.140
and it gives you a nice chat interface that even allows you

75
00:03:52.140 --> 00:03:54.120
to upload and use files

76
00:03:54.120 --> 00:03:57.060
and for example, ask questions about images

77
00:03:57.060 --> 00:03:59.160
or PDF documents.

78
00:03:59.160 --> 00:04:01.530
It does also give you a server mode

79
00:04:01.530 --> 00:04:04.050
besides this graphical user interface

80
00:04:04.050 --> 00:04:07.770
so that you could also run LM Studio locally on your system

81
00:04:07.770 --> 00:04:11.550
and interact with it through code if you were building

82
00:04:11.550 --> 00:04:14.820
an AI-powered application, some internal tool

83
00:04:14.820 --> 00:04:18.330
or some automation you wanna run on your system.

84
00:04:18.330 --> 00:04:21.660
So you can also use open models programmatically

85
00:04:21.660 --> 00:04:24.000
with LM Studio, which is also something

86
00:04:24.000 --> 00:04:25.770
we'll explore in this course.

87
00:04:25.770 --> 00:04:29.550
But what's really great about LM Studio is the ease

88
00:04:29.550 --> 00:04:32.730
of use it gives you whilst also giving you

89
00:04:32.730 --> 00:04:34.620
advanced configuration options

90
00:04:34.620 --> 00:04:37.050
that are there if you need them.

91
00:04:37.050 --> 00:04:40.920
So that will be one key tool we'll explore in depth

92
00:04:40.920 --> 00:04:42.540
in this course.

93
00:04:42.540 --> 00:04:45.540
Another amazing tool would be Ollama.

94
00:04:45.540 --> 00:04:46.710
Unlike LM Studio,

95
00:04:46.710 --> 00:04:49.680
this does not give you a graphical user interface,

96
00:04:49.680 --> 00:04:52.200
but it does give you a quite user-friendly

97
00:04:52.200 --> 00:04:54.180
command line interface.

98
00:04:54.180 --> 00:04:56.670
It's therefore not aimed at users

99
00:04:56.670 --> 00:04:59.190
that have no technical experience at all.

100
00:04:59.190 --> 00:05:02.430
You need some technical expertise,

101
00:05:02.430 --> 00:05:04.890
but you don't need to know how to write code

102
00:05:04.890 --> 00:05:07.500
or be an advanced system administrator

103
00:05:07.500 --> 00:05:09.660
or anything like that in order to use it.

104
00:05:09.660 --> 00:05:14.310
It's still very user-friendly as you'll see in this course.

105
00:05:14.310 --> 00:05:16.440
It gives you that CLI mode

106
00:05:16.440 --> 00:05:19.800
and just like LM Studio, also a server mode

107
00:05:19.800 --> 00:05:23.010
where you can communicate with the open models

108
00:05:23.010 --> 00:05:25.200
that are loaded through Ollama

109
00:05:25.200 --> 00:05:29.550
and managed through Ollama programmatically, which again,

110
00:05:29.550 --> 00:05:32.820
is also something we will explore in this course.

111
00:05:32.820 --> 00:05:35.730
And just like LM Studio, it works out of the box,

112
00:05:35.730 --> 00:05:39.600
but also does give you advanced configuration options

113
00:05:39.600 --> 00:05:41.820
that are there if you need them.

114
00:05:41.820 --> 00:05:45.300
And therefore, LM Studio and Ollama are the two main options

115
00:05:45.300 --> 00:05:47.400
we will explore in this course.

116
00:05:47.400 --> 00:05:50.520
Now I also want to mention one other way

117
00:05:50.520 --> 00:05:55.520
of using open models like the Llama models provided by Meta.

118
00:05:56.340 --> 00:06:00.720
And that would through paid services like Groq.

119
00:06:00.720 --> 00:06:04.320
Now, just to be clear, when using such a paid service,

120
00:06:04.320 --> 00:06:07.410
you of course lose some of the advantages you have

121
00:06:07.410 --> 00:06:09.090
when running these models locally

122
00:06:09.090 --> 00:06:12.270
on your system or your own servers.

123
00:06:12.270 --> 00:06:17.100
But when using Groq for example, you get easy

124
00:06:17.100 --> 00:06:22.050
and quite affordable on demand API-based access

125
00:06:22.050 --> 00:06:25.770
to some of the most popular open models out there.

126
00:06:25.770 --> 00:06:29.670
And especially if you're considering renting your own server

127
00:06:29.670 --> 00:06:31.710
to host and run a model there,

128
00:06:31.710 --> 00:06:34.653
you could instead also consider using Groq

129
00:06:34.653 --> 00:06:37.170
and their API to maybe get access

130
00:06:37.170 --> 00:06:42.030
to the same model at a higher speed and a lower price

131
00:06:42.030 --> 00:06:46.530
because they are able to host these models at scale.

132
00:06:46.530 --> 00:06:48.810
Of course, you lose the full control

133
00:06:48.810 --> 00:06:50.850
and privacy you have when hosting it

134
00:06:50.850 --> 00:06:52.500
on your own servers though,

135
00:06:52.500 --> 00:06:54.420
so it's simply a trade off that may

136
00:06:54.420 --> 00:06:56.790
or may not be worth it for you.

137
00:06:56.790 --> 00:06:59.490
Now in this course, we'll definitely not focus

138
00:06:59.490 --> 00:07:00.930
on services like Groq.

139
00:07:00.930 --> 00:07:04.950
Instead, we will really take a close look at running models

140
00:07:04.950 --> 00:07:07.440
locally with LM Studio and Ollama.

141
00:07:07.440 --> 00:07:09.570
But I still wanted to mention Groq

142
00:07:09.570 --> 00:07:13.020
and services like it as a possible alternative

143
00:07:13.020 --> 00:07:15.093
to running open models locally.