WEBVTT

1
00:00:00.720 --> 00:00:02.220
<v Maximilian>So what exactly</v>

2
00:00:02.220 --> 00:00:05.550
are open large language models?

3
00:00:05.550 --> 00:00:08.700
What exactly is open about them?

4
00:00:08.700 --> 00:00:11.490
Now to understand that, let's take a step back

5
00:00:11.490 --> 00:00:14.940
and let's look at what exactly defines

6
00:00:14.940 --> 00:00:16.800
a large language model.

7
00:00:16.800 --> 00:00:19.830
And in the end you could say it's two things.

8
00:00:19.830 --> 00:00:23.310
It's the training algorithm, the code that was written

9
00:00:23.310 --> 00:00:25.890
by the people that want to train such a model.

10
00:00:25.890 --> 00:00:29.550
So by the engineers working at OpenAI or Google,

11
00:00:29.550 --> 00:00:32.760
for example, and it's then that code in the end

12
00:00:32.760 --> 00:00:36.270
that influences or controls the training process.

13
00:00:36.270 --> 00:00:39.150
And as the result of that training process,

14
00:00:39.150 --> 00:00:42.510
we get a large language model that's in the end

15
00:00:42.510 --> 00:00:46.350
defined by its model weights or parameters,

16
00:00:46.350 --> 00:00:48.960
which is another common term you might have heard.

17
00:00:48.960 --> 00:00:52.890
In general, you might have heard about large language models

18
00:00:52.890 --> 00:00:57.360
being referred to as 600-billion-parameter models

19
00:00:57.360 --> 00:00:58.890
or something like that.

20
00:00:58.890 --> 00:01:01.020
And by the way, if you didn't, that's no problem.

21
00:01:01.020 --> 00:01:04.650
I'll briefly explain it here, because for this course

22
00:01:04.650 --> 00:01:08.970
and in general when working with large language models,

23
00:01:08.970 --> 00:01:11.940
it's important to have at least a rough understanding

24
00:01:11.940 --> 00:01:14.220
of how these models work.

25
00:01:14.220 --> 00:01:17.490
So if you ask a model like the one that's being used

26
00:01:17.490 --> 00:01:20.160
by ChatGPT, which is not open by the way,

27
00:01:20.160 --> 00:01:24.420
but still a large language model, if you ask such a model,

28
00:01:24.420 --> 00:01:29.420
"How do LLMs work?" or anything else of course,

29
00:01:29.490 --> 00:01:33.180
that input, your text, gets broken down

30
00:01:33.180 --> 00:01:36.570
into smaller pieces called tokens.

31
00:01:36.570 --> 00:01:40.110
A token can be an entire word or a part of a word,

32
00:01:40.110 --> 00:01:43.140
and those models use tokens instead of words

33
00:01:43.140 --> 00:01:45.660
so that we don't have an infinite number of words

34
00:01:45.660 --> 00:01:47.820
for all those different languages we have over

35
00:01:47.820 --> 00:01:52.380
the entire world, but instead, a finite number of tokens

36
00:01:52.380 --> 00:01:55.170
that are building blocks of different words.

37
00:01:55.170 --> 00:01:58.140
But a token can also be an entire word.

38
00:01:58.140 --> 00:02:01.020
These tokens are then represented numerically

39
00:02:01.020 --> 00:02:04.110
by assigning a unique token ID to every token.

40
00:02:04.110 --> 00:02:05.550
And it's then these token IDs

41
00:02:05.550 --> 00:02:09.270
that are fed into the trained large language model.
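
NOTE
Editor's sketch, not part of the lecture: a toy illustration of splitting text into tokens and mapping them to token IDs.
The vocabulary and the greedy longest-match rule are assumptions made for illustration; real tokenizers use learned vocabularies and different algorithms.
```python
# Toy vocabulary (hypothetical): maps each token string to a unique token ID.
VOCAB = {"how": 0, " do": 1, " LL": 2, "Ms": 3, " work": 4, "?": 5}
def tokenize(text, vocab):
    """Greedy longest-match split of text into known tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no token covers position {pos}")
    return tokens
tokens = tokenize("how do LLMs work?", VOCAB)
token_ids = [VOCAB[t] for t in tokens]
print(tokens)     # ['how', ' do', ' LL', 'Ms', ' work', '?']
print(token_ids)  # [0, 1, 2, 3, 4, 5]
```
Note how "LLMs" is covered by two tokens, while "how" is a whole word in a single token.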

42
00:02:09.270 --> 00:02:12.180
That large language model is essentially a massive,

43
00:02:12.180 --> 00:02:15.660
complex artificial neural network made up

44
00:02:15.660 --> 00:02:19.260
of interconnected layers and billions of nodes

45
00:02:19.260 --> 00:02:22.590
and connections that connect these different nodes.

46
00:02:22.590 --> 00:02:27.390
And every connection has a weight associated with it

47
00:02:27.390 --> 00:02:30.330
or a parameter, as we also call it.
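
NOTE
Editor's sketch, not part of the lecture: counting the weights in a tiny fully connected network, one weight per connection.
The layer widths are hypothetical, and real large language models use more elaborate layer types, so this only shows where parameter counts come from.
```python
# Hypothetical layer widths: 8 inputs, two hidden layers of 16 nodes, 4 outputs.
layer_sizes = [8, 16, 16, 4]
def count_weights(sizes):
    """One weight per connection between adjacent, fully connected layers."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))
print(count_weights(layer_sizes))  # 8*16 + 16*16 + 16*4 = 448
```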

48
00:02:30.330 --> 00:02:32.160
And it's these different weights

49
00:02:32.160 --> 00:02:35.490
that link all these different nodes that in the end

50
00:02:35.490 --> 00:02:39.540
influence which output token or token ID

51
00:02:39.540 --> 00:02:43.470
will be produced by the model for a given input.

52
00:02:43.470 --> 00:02:45.630
Though it's worth noting that the model

53
00:02:45.630 --> 00:02:48.540
will actually not just produce one token,

54
00:02:48.540 --> 00:02:52.080
but a list of token candidates

55
00:02:52.080 --> 00:02:56.520
where each candidate has a probability assigned to it.

56
00:02:56.520 --> 00:02:59.370
That probability is derived from all those weights

57
00:02:59.370 --> 00:03:02.910
or parameters based on the input tokens, the input text,

58
00:03:02.910 --> 00:03:04.830
and not just the latest token by the way,

59
00:03:04.830 --> 00:03:08.340
but the entire history of tokens, the entire history of text

60
00:03:08.340 --> 00:03:11.340
that was fed into the large language model.

61
00:03:11.340 --> 00:03:13.800
And then based on those generated probabilities,

62
00:03:13.800 --> 00:03:15.690
one of those candidates is picked.

63
00:03:15.690 --> 00:03:18.660
And if you control the application

64
00:03:18.660 --> 00:03:21.780
that's using the AI model, you can actually configure

65
00:03:21.780 --> 00:03:24.510
how that token should be picked,

66
00:03:24.510 --> 00:03:28.230
so how many tokens should be considered and so on.
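
NOTE
Editor's sketch, not part of the lecture: turning raw candidate scores into probabilities and picking one token.
The names top-k and temperature are common terms for such settings but are assumptions here, as are the example scores; the settings a given application exposes may differ.
```python
import math
import random
def softmax(scores, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
def sample_top_k(scores, k, temperature=1.0, rng=random):
    """Pick a token ID at random from the k most likely candidates."""
    probs = softmax(scores, temperature)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
# Hypothetical raw scores for a five-token vocabulary.
scores = [2.0, 1.0, 0.5, -1.0, -3.0]
print(sample_top_k(scores, k=1))  # always 0: only the best candidate is considered
print(sample_top_k(scores, k=3, temperature=0.8))  # one of 0, 1, or 2
```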

67
00:03:28.230 --> 00:03:31.230
And we'll get back to that a little bit later in this course

68
00:03:31.230 --> 00:03:33.600
once we actually start using open models

69
00:03:33.600 --> 00:03:35.760
locally on our system.

70
00:03:35.760 --> 00:03:39.540
So these models are really just statistical models

71
00:03:39.540 --> 00:03:43.410
that generate output tokens and assign probabilities

72
00:03:43.410 --> 00:03:45.360
to the different output tokens

73
00:03:45.360 --> 00:03:47.760
based on the input you fed into them.

74
00:03:47.760 --> 00:03:50.220
And they produce token after token

75
00:03:50.220 --> 00:03:53.130
after token, so they're token generators,

76
00:03:53.130 --> 00:03:56.430
but they generate those tokens based on all

77
00:03:56.430 --> 00:04:00.030
the previously generated tokens and your input text,

78
00:04:00.030 --> 00:04:02.490
and that's how they're able to generate text

79
00:04:02.490 --> 00:04:06.150
that makes sense, where the different tokens and words

80
00:04:06.150 --> 00:04:09.750
and sentences are connected to each other.
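
NOTE
Editor's sketch, not part of the lecture: the token-after-token loop, where every new token is chosen from the whole history so far.
The scoring function is a hypothetical stand-in for a trained model, and always taking the top candidate is just one possible picking rule.
```python
def toy_next_token_scores(history):
    """Hypothetical stand-in for a trained model: scores depend on the whole history."""
    vocab_size = 4
    # Favor the token whose ID equals the sum of the history modulo the vocabulary size.
    favored = sum(history) % vocab_size
    return [1.0 if token_id == favored else 0.0 for token_id in range(vocab_size)]
def generate(prompt_ids, steps):
    """Append one token per step, feeding the growing history back in."""
    history = list(prompt_ids)
    for _ in range(steps):
        scores = toy_next_token_scores(history)
        # Greedy choice: take the highest-scoring candidate.
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        history.append(next_id)  # the new token becomes part of the next input
    return history
print(generate([1, 2], steps=4))  # [1, 2, 3, 2, 0, 0]
```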

81
00:04:09.750 --> 00:04:13.380
That's in a nutshell how large language models work.

82
00:04:13.380 --> 00:04:15.300
And by the way, if you wanna learn way more

83
00:04:15.300 --> 00:04:18.150
about how they work and what you can do with them

84
00:04:18.150 --> 00:04:20.310
and about prompt engineering and all that,

85
00:04:20.310 --> 00:04:23.190
my dedicated generative AI course

86
00:04:23.190 --> 00:04:25.020
is of course the way to go.

87
00:04:25.020 --> 00:04:27.600
Here in this course we won't dive deeply into that,

88
00:04:27.600 --> 00:04:30.600
but it is important to have this general understanding

89
00:04:30.600 --> 00:04:32.610
of what's going on here.

90
00:04:32.610 --> 00:04:34.560
So we have these model weights

91
00:04:34.560 --> 00:04:37.650
or parameters that describe the connections

92
00:04:37.650 --> 00:04:40.170
between these different nodes in this neural network.

93
00:04:40.170 --> 00:04:43.590
And it's these parameters that influence which output

94
00:04:43.590 --> 00:04:45.780
will be generated for a given input.

95
00:04:45.780 --> 00:04:48.030
That's the result of this training process,

96
00:04:48.030 --> 00:04:51.810
this huge list of weights or parameters

97
00:04:51.810 --> 00:04:54.210
that link the different nodes.

98
00:04:54.210 --> 00:04:57.720
And when we talk about open large language models,

99
00:04:57.720 --> 00:05:01.080
it's typically these weights or parameters

100
00:05:01.080 --> 00:05:03.660
that are publicly released under a specific

101
00:05:03.660 --> 00:05:05.850
license that controls how you are allowed

102
00:05:05.850 --> 00:05:08.700
to use these weights or parameters.

103
00:05:08.700 --> 00:05:12.930
And I'll get back to that license part a little bit later,

104
00:05:12.930 --> 00:05:14.190
but that's important.

105
00:05:14.190 --> 00:05:18.000
These open large language models typically

106
00:05:18.000 --> 00:05:21.570
are not about their training code being open.

107
00:05:21.570 --> 00:05:23.700
That's typically not the case.

108
00:05:23.700 --> 00:05:26.910
Instead, it's the result of the training process,

109
00:05:26.910 --> 00:05:28.740
these model parameters.

110
00:05:28.740 --> 00:05:30.480
That's the part that's open.

111
00:05:30.480 --> 00:05:33.090
And that's of course the important part when it comes

112
00:05:33.090 --> 00:05:34.950
to running these models locally,

113
00:05:34.950 --> 00:05:38.190
because you typically don't wanna train them locally.

114
00:05:38.190 --> 00:05:40.290
You would need the training code for that,

115
00:05:40.290 --> 00:05:41.970
but you wanna run them locally.

116
00:05:41.970 --> 00:05:45.090
And for that, you just need these parameters

117
00:05:45.090 --> 00:05:47.970
as you'll see throughout this course.

118
00:05:47.970 --> 00:05:51.750
And that's also the difference compared to closed

119
00:05:51.750 --> 00:05:54.420
or proprietary models like the ones

120
00:05:54.420 --> 00:05:56.940
that are being used by ChatGPT.

121
00:05:56.940 --> 00:06:01.200
There, you don't get access to the model parameters

122
00:06:01.200 --> 00:06:03.690
or weights, so you neither have access

123
00:06:03.690 --> 00:06:05.790
to the code that was used for training

124
00:06:05.790 --> 00:06:08.550
nor to the result of that training process.

125
00:06:08.550 --> 00:06:12.600
All you can use is the application, ChatGPT,

126
00:06:12.600 --> 00:06:14.880
that OpenAI built for you,

127
00:06:14.880 --> 00:06:18.720
or you can use the model through OpenAI's API,

128
00:06:18.720 --> 00:06:23.250
but you cannot download those weights and run them locally

129
00:06:23.250 --> 00:06:25.833
or on your own server, for example.

