WEBVTT

1
00:00:00.510 --> 00:00:04.500
<v Instructor>So there are open large language models</v>

2
00:00:04.500 --> 00:00:07.290
which you can download and run on your system,

3
00:00:07.290 --> 00:00:10.260
and in this course, you will learn how exactly that works

4
00:00:10.260 --> 00:00:12.240
and you will learn about hardware requirements.

5
00:00:12.240 --> 00:00:14.880
And that indeed you can run

6
00:00:14.880 --> 00:00:18.017
many of these models on consumer hardware,

7
00:00:18.017 --> 00:00:20.490
even on low-end hardware.

8
00:00:20.490 --> 00:00:22.770
But the question is, why would you want to do that?

9
00:00:22.770 --> 00:00:26.730
Why not just use proprietary large language models?

10
00:00:26.730 --> 00:00:31.730
Why not stick to ChatGPT, Google Gemini, xAI's Grok

11
00:00:33.090 --> 00:00:35.580
or anything like that?

12
00:00:35.580 --> 00:00:39.360
What's the advantage of using open large language models

13
00:00:39.360 --> 00:00:42.180
compared to proprietary ones?

14
00:00:42.180 --> 00:00:44.130
Well, for example,

15
00:00:44.130 --> 00:00:46.620
because they're free to use.

16
00:00:46.620 --> 00:00:48.420
They're open, you can download them

17
00:00:48.420 --> 00:00:49.620
and run them on your system,

18
00:00:49.620 --> 00:00:51.420
and that's exactly what we'll do in this course.

19
00:00:51.420 --> 00:00:53.520
Now you must respect their license,

20
00:00:53.520 --> 00:00:56.010
but private use is pretty much always allowed.

21
00:00:56.010 --> 00:00:58.440
Commercial use is also typically allowed,

22
00:00:58.440 --> 00:01:01.470
and I'll get back to the license part later.

23
00:01:01.470 --> 00:01:02.850
So you can use them for free.

24
00:01:02.850 --> 00:01:04.770
Whereas for proprietary models,

25
00:01:04.770 --> 00:01:06.270
you, of course, have to pay

26
00:01:06.270 --> 00:01:08.220
either based on your usage

27
00:01:08.220 --> 00:01:09.690
or a subscription fee,

28
00:01:09.690 --> 00:01:11.274
like it's the case with ChatGPT

29
00:01:11.274 --> 00:01:15.480
and basically all these other AI chatbots.

30
00:01:15.480 --> 00:01:18.030
But of course, you happily pay for these models

31
00:01:18.030 --> 00:01:21.240
because the open models are way worse

32
00:01:21.240 --> 00:01:23.100
than the proprietary ones, right?

33
00:01:23.100 --> 00:01:26.460
The performance is worse, the results are worse.

34
00:01:26.460 --> 00:01:28.680
Well, not really.

35
00:01:28.680 --> 00:01:31.020
If you take a look at benchmarks,

36
00:01:31.020 --> 00:01:33.810
and of course, you should take those with a grain of salt

37
00:01:33.810 --> 00:01:36.510
because models can be optimized for benchmarks.

38
00:01:36.510 --> 00:01:38.370
And of course, the providers of these models

39
00:01:38.370 --> 00:01:40.830
only publish the results they like.

40
00:01:40.830 --> 00:01:43.800
But still, if you take a look at benchmark comparisons

41
00:01:43.800 --> 00:01:44.700
like this one here,

42
00:01:44.700 --> 00:01:46.950
which was published by Google,

43
00:01:46.950 --> 00:01:49.290
you can see that their Gemma 3 models,

44
00:01:49.290 --> 00:01:52.830
which are their open large language models,

45
00:01:52.830 --> 00:01:54.960
are not much worse

46
00:01:54.960 --> 00:01:58.710
than their best-in-class proprietary models.

47
00:01:58.710 --> 00:02:00.060
Now this is, of course, a bit older

48
00:02:00.060 --> 00:02:01.560
by the time you are watching this,

49
00:02:01.560 --> 00:02:03.840
and actually even by the time I'm recording this,

50
00:02:03.840 --> 00:02:05.550
there are newer models available,

51
00:02:05.550 --> 00:02:07.380
but this is just an example.

52
00:02:07.380 --> 00:02:09.060
The trend of course continues.

53
00:02:09.060 --> 00:02:12.120
These open models are not much worse

54
00:02:12.120 --> 00:02:14.130
than the proprietary ones.

55
00:02:14.130 --> 00:02:15.900
If you ignore the benchmarks

56
00:02:15.900 --> 00:02:17.430
and you instead take something like

57
00:02:17.430 --> 00:02:20.460
the Chatbot Arena leaderboard,

58
00:02:20.460 --> 00:02:23.100
which is a leaderboard where users like you and me

59
00:02:23.100 --> 00:02:25.680
vote models up and down,

60
00:02:25.680 --> 00:02:30.180
you will see that the open models like DeepSeek,

61
00:02:30.180 --> 00:02:32.100
but also Google's Gemma 3 model,

62
00:02:32.100 --> 00:02:34.890
which is another open model just to make that really clear,

63
00:02:34.890 --> 00:02:39.420
rank amongst the top ranks of that leaderboard.

64
00:02:39.420 --> 00:02:42.870
So clearly they're not far behind the proprietary models

65
00:02:42.870 --> 00:02:44.610
and they're free.

66
00:02:44.610 --> 00:02:47.070
Because unlike those proprietary models,

67
00:02:47.070 --> 00:02:49.530
these open models, or their weights

68
00:02:49.530 --> 00:02:51.150
specifically, can be downloaded

69
00:02:51.150 --> 00:02:55.500
and can be run locally on your machine or on your server.

70
00:02:55.500 --> 00:02:59.610
And therefore you don't just get the free usage

71
00:02:59.610 --> 00:03:02.160
if you ignore hardware costs, of course,

72
00:03:02.160 --> 00:03:05.340
but you also get 100% privacy.

73
00:03:05.340 --> 00:03:08.490
The data you send to that model, your prompts,

74
00:03:08.490 --> 00:03:10.170
the output it generates,

75
00:03:10.170 --> 00:03:13.200
any documents or images you might be using in the prompt,

76
00:03:13.200 --> 00:03:17.340
all that stays on your machine, it never leaves it.

77
00:03:17.340 --> 00:03:19.530
And that of course, is a huge advantage

78
00:03:19.530 --> 00:03:23.130
compared to using proprietary large language models.

79
00:03:23.130 --> 00:03:26.100
There, your prompts or the generated output

80
00:03:26.100 --> 00:03:28.680
might be used for further training.

81
00:03:28.680 --> 00:03:30.870
Even if it's not, it might be logged.

82
00:03:30.870 --> 00:03:32.340
And even if that's not the case,

83
00:03:32.340 --> 00:03:34.380
you still have to send your data

84
00:03:34.380 --> 00:03:38.160
to the servers of OpenAI or Google.

85
00:03:38.160 --> 00:03:40.320
So if you are using a large language model

86
00:03:40.320 --> 00:03:42.360
to analyze sensitive data,

87
00:03:42.360 --> 00:03:44.970
to analyze a confidential document,

88
00:03:44.970 --> 00:03:48.180
you might not wanna do that when using a proprietary model.

89
00:03:48.180 --> 00:03:50.490
You might not even be allowed to do that

90
00:03:50.490 --> 00:03:52.590
if you're working in a company.

91
00:03:52.590 --> 00:03:56.400
So that's a huge advantage of running open models locally.

92
00:03:56.400 --> 00:03:57.660
It stays on your machine.

93
00:03:57.660 --> 00:04:00.270
You have guaranteed privacy.

94
00:04:00.270 --> 00:04:02.550
In addition, you got no vendor lock-in,

95
00:04:02.550 --> 00:04:04.320
you have full control

96
00:04:04.320 --> 00:04:06.840
over that model that's running locally.

97
00:04:06.840 --> 00:04:08.280
For the proprietary models,

98
00:04:08.280 --> 00:04:11.820
if OpenAI or Google or xAI, doesn't matter,

99
00:04:11.820 --> 00:04:14.190
if they decide that they wanna roll out

100
00:04:14.190 --> 00:04:15.810
a new version of a model,

101
00:04:15.810 --> 00:04:19.440
that they want to change some quotas or rate limits,

102
00:04:19.440 --> 00:04:21.060
there's nothing you can do about that.

103
00:04:21.060 --> 00:04:23.070
If the model suddenly performs worse

104
00:04:23.070 --> 00:04:25.800
than it did yesterday, you're out of luck.

105
00:04:25.800 --> 00:04:28.020
Of course, if you run it locally on your machine,

106
00:04:28.020 --> 00:04:29.400
however, that can't happen,

107
00:04:29.400 --> 00:04:32.850
you control which version of which model is running there.

108
00:04:32.850 --> 00:04:36.810
So you have full control, you got no vendor lock-in.

109
00:04:36.810 --> 00:04:39.510
Kind of related to that point,

110
00:04:39.510 --> 00:04:41.520
these locally running open models,

111
00:04:41.520 --> 00:04:43.380
of course, are offline first.

112
00:04:43.380 --> 00:04:46.140
They are running on your machine after all,

113
00:04:46.140 --> 00:04:49.410
so you got low or almost no latency

114
00:04:49.410 --> 00:04:51.600
when interacting with them,

115
00:04:51.600 --> 00:04:53.790
which is of course particularly interesting

116
00:04:53.790 --> 00:04:57.570
if you maybe build your own internal tools that leverage AI.

117
00:04:57.570 --> 00:05:00.270
You can also use an open model

118
00:05:00.270 --> 00:05:01.951
in your own AI-powered applications,

119
00:05:01.951 --> 00:05:04.800
and that is one example use case,

120
00:05:04.800 --> 00:05:07.680
one of many we'll take a look at throughout this course,

121
00:05:07.680 --> 00:05:10.260
how you can use such locally running

122
00:05:10.260 --> 00:05:12.600
open models programmatically.

123
00:05:12.600 --> 00:05:15.750
Of course, with proprietary large language models,

124
00:05:15.750 --> 00:05:19.140
for example, ChatGPT, an internet connection is required.

125
00:05:19.140 --> 00:05:21.317
You can't use them from inside an airplane

126
00:05:21.317 --> 00:05:24.150
or when servers are down.

127
00:05:24.150 --> 00:05:26.640
And that's why open large language models

128
00:05:26.640 --> 00:05:29.610
when being used locally or on your own servers

129
00:05:29.610 --> 00:05:32.220
are great for many use cases.

130
00:05:32.220 --> 00:05:34.650
You can use them as regular chatbots

131
00:05:34.650 --> 00:05:36.210
because as I showed you,

132
00:05:36.210 --> 00:05:39.210
they do perform really well for that.

133
00:05:39.210 --> 00:05:41.550
So you can just use them as a replacement

134
00:05:41.550 --> 00:05:45.270
for ChatGPT or any of these other chatbots.

135
00:05:45.270 --> 00:05:46.560
And therefore you can, for example,

136
00:05:46.560 --> 00:05:48.390
also use them to generate code

137
00:05:48.390 --> 00:05:50.670
or do anything else like that.

138
00:05:50.670 --> 00:05:53.670
But they especially shine when using them

139
00:05:53.670 --> 00:05:55.800
in tools you might be building,

140
00:05:55.800 --> 00:05:57.720
when using them for tasks like

141
00:05:57.720 --> 00:06:00.510
text summarization, data analysis

142
00:06:00.510 --> 00:06:03.930
or content generation with few-shot prompting.

143
00:06:03.930 --> 00:06:06.900
And these are examples we will explore

144
00:06:06.900 --> 00:06:07.800
throughout this course

145
00:06:07.800 --> 00:06:11.370
and I'll show you how to use these open models for that.

146
00:06:11.370 --> 00:06:13.410
And therefore, it's really just the cases

147
00:06:13.410 --> 00:06:15.900
where you need cutting-edge performance,

148
00:06:15.900 --> 00:06:18.630
where you need the best-in-class performance,

149
00:06:18.630 --> 00:06:20.190
where you have to reach

150
00:06:20.190 --> 00:06:23.130
for a proprietary large language model.

151
00:06:23.130 --> 00:06:25.440
So this course is actually also not about

152
00:06:25.440 --> 00:06:29.370
replacing ChatGPT with such a locally running open model,

153
00:06:29.370 --> 00:06:31.950
though you could probably do that for many use cases,

154
00:06:31.950 --> 00:06:33.690
maybe for all use cases;

155
00:06:33.690 --> 00:06:36.300
it's about getting the best of both worlds

156
00:06:36.300 --> 00:06:38.160
and it's about leveraging and using

157
00:06:38.160 --> 00:06:41.190
such an open, locally running AI model

158
00:06:41.190 --> 00:06:44.340
for all those use cases where it really shines

159
00:06:44.340 --> 00:06:46.800
and where it can offer a huge advantage.

160
00:06:46.800 --> 00:06:50.133
And we'll explore all that step by step in this course.