WEBVTT

00:00.240 --> 00:05.240
In this video, we'll cover the most important information about large language models.

00:05.240 --> 00:09.520
This knowledge is essential. We'll talk about how they are built,

00:09.560 --> 00:12.120
what a large language model even is,

00:12.120 --> 00:16.760
fine-tuning, how large language models create their responses, and a lot

00:16.800 --> 00:17.400
more.

00:17.640 --> 00:21.720
Now there is a question: why do we want to learn about that?

00:21.760 --> 00:23.360
And actually let me explain.

00:23.400 --> 00:31.080
When we create our workflows and automations inside n8n, we are using agents, which are purely LLM

00:31.080 --> 00:31.680
models.

00:31.680 --> 00:36.800
For example, we are using ChatGPT, DeepSeek, and other applications.

00:36.920 --> 00:40.840
And when you understand how they actually work,

00:40.840 --> 00:47.320
these LLM models, you are able to create better systems, more sophisticated agents, and in general better

00:47.320 --> 00:48.200
projects.

00:48.200 --> 00:53.880
So this video can be quite long, and it will cover all of the information you need to know

00:54.040 --> 00:55.560
about LLM models.

00:55.560 --> 00:57.560
And again it will be very important.

00:57.600 --> 00:59.720
So with all that being said let's dive in.

01:00.010 --> 01:00.490
All right.

01:00.490 --> 01:02.650
So let's start with the most basic question.

01:02.650 --> 01:04.010
What is an LLM?

01:04.130 --> 01:05.650
And it's a large language model.

01:05.650 --> 01:11.090
And here we can read: an LLM is an advanced type of artificial intelligence that can understand

01:11.090 --> 01:16.570
and generate human-like text. LLMs power tools like ChatGPT, Claude, DeepSeek, and many others.

01:16.730 --> 01:22.050
So here, for example, we've got Grok, Google Gemini, ChatGPT, and again Claude.

01:22.170 --> 01:24.130
So these are LLM models.

01:24.130 --> 01:27.810
And again we are using them actually with our agents.

01:27.810 --> 01:29.610
So you know we are using these models.

01:29.610 --> 01:35.170
They are trained on huge amounts of text from the internet, books, and more.

01:35.290 --> 01:38.090
So I like to compare that to a big library.

01:38.090 --> 01:42.970
And not only one, but thousands of them, or sometimes even more.

01:43.170 --> 01:47.530
So an LLM is built from a huge amount of data.

01:47.730 --> 01:49.690
So it draws on all of that data.

01:50.170 --> 01:51.290
It's trained on that.

01:51.290 --> 01:57.930
And it's very powerful, because imagine: inside an LLM you can have information

01:58.490 --> 02:05.490
from a lot of big libraries, like articles, blog posts, a lot

02:05.530 --> 02:06.170
of data.

02:06.210 --> 02:08.050
But more about that later.

02:08.450 --> 02:13.330
So they learn to answer questions, write stories, summarize, translate, code, and much more.

02:13.650 --> 02:15.130
So again it's also powerful.

02:15.130 --> 02:21.370
You've got a model that appears to think and is trained on a huge amount of data.

02:21.650 --> 02:28.850
There are even models that work a bit like people, with thinking and reasoning processes, and

02:28.850 --> 02:33.850
by now we have seen a lot of research about health care, diagnosing

02:33.850 --> 02:35.490
different problems, and so on.

02:35.490 --> 02:40.450
So these models are very powerful.

02:40.810 --> 02:45.170
You will see that later in this video.

02:45.490 --> 02:46.570
Let's go ahead.

02:47.290 --> 02:51.250
And here we've got another topic: how an LLM is built.

02:51.290 --> 02:56.090
And I can say it's not that hard to comprehend because we've got only two files.

02:56.090 --> 02:59.740
So at its core, an LLM is surprisingly simple to run.

02:59.780 --> 03:01.100
We've got only two files.

03:01.100 --> 03:02.980
The first one is the parameters file.

03:03.340 --> 03:03.820
Actually,

03:03.820 --> 03:08.340
I will explain that more later, after we cover more topics,

03:08.340 --> 03:11.940
for example, how an LLM creates its responses.

03:12.100 --> 03:15.740
This is the giant file containing everything the model knows.

03:15.740 --> 03:19.100
It's the result of training on huge datasets.

03:19.140 --> 03:23.980
Take a model like Llama 2 70B, with 70 billion parameters.

03:23.980 --> 03:25.780
So this is 70 billion.

03:26.060 --> 03:29.620
This file can be over 140GB.

03:29.820 --> 03:31.420
So it's a huge file.
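
NOTE
The 140 GB figure follows from simple arithmetic. This is a minimal sketch, assuming each parameter is stored as a 2-byte (16-bit) number. The storage format is an assumption, not something stated in the video.
```python
# Rough size estimate for a parameters file (assumption: 2 bytes per weight).
N_PARAMS = 70_000_000_000  # 70 billion parameters, as in the Llama 2 70B example
BYTES_PER_PARAM = 2        # assumed 16-bit storage per weight
def params_file_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Return the approximate file size in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bytes_per_param / 10**9
size_gb = params_file_size_gb(N_PARAMS, BYTES_PER_PARAM)
print(f"{size_gb:.0f} GB")  # prints "140 GB"
```
With 4 bytes per parameter, the same function would give about 280 GB.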

03:31.620 --> 03:35.060
So we've got what was learned from huge datasets stored there,

03:35.060 --> 03:37.140
and the file size is very big.

03:37.540 --> 03:42.700
And the second file is a runner file, which is the code file.

03:42.820 --> 03:49.060
It's a short script (C, Python, or other languages) that reads the parameters and runs the model, letting you

03:49.060 --> 03:50.060
interact with it.

03:50.660 --> 03:53.740
So we've got only two files.

03:53.740 --> 03:55.860
So there you've got the analogy.

03:56.060 --> 03:59.550
Think of the parameters file as the brain.

03:59.550 --> 04:03.390
So it's the brain: memory and experience.

04:03.710 --> 04:05.910
And then the runner file as the body.

04:05.910 --> 04:08.230
So, the code that makes it work.

04:08.470 --> 04:12.950
So the runner file takes all of the knowledge, all of the information

04:12.950 --> 04:13.670
from there,

04:13.830 --> 04:16.950
and performs the specific actions.

04:17.150 --> 04:20.950
So this is very simple. I think it's not that hard to comprehend.

04:20.950 --> 04:28.670
So we've got what the model learned from huge datasets here, inside the parameters file,

04:28.670 --> 04:34.630
and here the runner file, which does specific tasks based on the parameters file.
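
NOTE
The two-file idea can be sketched in miniature. This is a toy illustration only: the JSON file, the numbers in it, and the run_model function are all hypothetical stand-ins, and a real parameters file holds billions of numeric weights rather than a small word table.
```python
import json
import os
import tempfile
# "Parameters file": a toy table of next-word weights (made-up numbers).
toy_parameters = {"in": {"Paris": 0.97, "France": 0.02, "London": 0.01}}
params_path = os.path.join(tempfile.mkdtemp(), "toy_parameters.json")
with open(params_path, "w", encoding="utf-8") as f:
    json.dump(toy_parameters, f)
# "Runner file": a few lines of code that read the parameters and use them.
def run_model(path: str, last_word: str) -> str:
    """Load the weights from disk and return the highest-weighted next word."""
    with open(path, encoding="utf-8") as f:
        weights = json.load(f)
    candidates = weights[last_word]
    return max(candidates, key=candidates.get)
next_word = run_model(params_path, "in")
print(next_word)  # prints "Paris"
```
The point of the split is that the code stays tiny while all of the "knowledge" lives in the data file.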

04:34.630 --> 04:36.630
All right let's go ahead.

04:36.830 --> 04:38.870
And now there is a very important topic.

04:38.910 --> 04:40.790
How does an LLM think?

04:41.030 --> 04:48.590
This is tricky, because sometimes we may think that an LLM is thinking like a real human.

04:48.630 --> 04:55.150
However, this is not true, because an LLM doesn't understand what you type the way a human does; instead, it

04:55.150 --> 04:59.600
relies on its weights.

04:59.720 --> 05:03.400
Let me explain: LLMs don't think like humans.

05:03.400 --> 05:09.040
They predict the next word in a sentence, one word at a time, using everything they learned during

05:09.080 --> 05:09.760
training.

05:09.840 --> 05:13.440
For example: "The Eiffel Tower is located in..."

05:13.440 --> 05:18.880
So here the LLM analyzes the context and assigns probabilities to possible next words.

05:19.040 --> 05:21.560
So it doesn't understand this text the way we do.

05:21.800 --> 05:25.840
Instead, we've got a lot of parameters learned from huge datasets.

05:25.880 --> 05:28.000
Now imagine it

05:28.000 --> 05:30.720
this way: "The Eiffel Tower is located in..."

05:30.760 --> 05:35.480
So the LLM analyzes the context and assigns probabilities to possible next words.

05:35.480 --> 05:37.520
And now imagine the following situation.

05:37.560 --> 05:39.840
You've seen, let's say, ten articles.

05:39.840 --> 05:40.280
All right.

05:40.280 --> 05:43.480
In your career you've seen ten articles, whatever they were.

05:43.640 --> 05:51.160
And every time you came across this sentence, you saw the

05:51.160 --> 05:52.240
word Paris next.

05:52.240 --> 05:52.880
All right.

05:52.880 --> 05:55.840
LLMs work this way.

05:55.840 --> 06:02.840
They predict what fits the previous words and the entire sentence,

06:02.840 --> 06:03.400
you know.

06:03.400 --> 06:10.080
So here we've got "The Eiffel Tower is located in", and in

06:10.080 --> 06:10.960
the datasets

06:11.160 --> 06:13.400
the next word was mostly Paris.

06:13.400 --> 06:14.120
All right.

06:14.160 --> 06:18.960
And here the probability of that is 97%.

06:19.280 --> 06:25.520
The second candidate is France, the third is London, and the fourth is New York.

06:25.840 --> 06:27.360
So we've got the explanation.

06:27.360 --> 06:33.200
The model will most likely choose Paris as the next word, because it learned that this is the best

06:33.200 --> 06:33.760
fit.

06:33.760 --> 06:37.440
It learned that from a huge amount of data: datasets and more.
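
NOTE
The step from "scores" to "probabilities" is usually done with a softmax function. A minimal sketch follows. The scores are made up for illustration and were picked so that Paris comes out at roughly 97%, as in the video. A real model computes such scores from its weights for every word in its vocabulary.
```python
import math
# Made-up scores (logits) for words that could follow "The Eiffel Tower is located in".
logits = {"Paris": 6.0, "France": 2.0, "London": 1.0, "New York": 0.5}
def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}
probs = softmax(logits)
best = max(probs, key=probs.get)  # the most likely next word
for word, p in probs.items():
    print(f"{word}: {p:.1%}")
```
Picking the highest-probability word every time is only one option. Sampling from the probabilities is another.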

06:38.080 --> 06:43.800
However, there is a very important setting, which is called temperature.

06:43.960 --> 06:50.360
When we work with our models, we can often adjust the temperature.

06:50.520 --> 06:53.120
I can say it controls the creativity.

06:53.120 --> 06:59.450
When the temperature is higher, the model is more creative and might surprise you with

06:59.490 --> 07:01.130
unusual words or phrases.

07:01.250 --> 07:06.730
So imagine that Paris suddenly has a lower percentage,

07:06.730 --> 07:10.010
and, for example, France has a higher percentage.

07:10.250 --> 07:16.490
So with a higher temperature, the model produces more creative responses,

07:16.490 --> 07:17.970
more unusual ones.

07:18.010 --> 07:20.890
With a low temperature, however, it's very predictable.

07:20.890 --> 07:26.570
The model is more conservative and almost always picks the most likely next word. Here is the example.

07:26.570 --> 07:30.010
With low temperature: "The cat sat on the mat."

07:30.010 --> 07:33.330
And with high temperature: "The cat sat on the piano."
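
NOTE
Temperature is commonly implemented by dividing the scores by the temperature before the softmax step. A minimal sketch follows. The scores for "mat", "sofa", and "piano" are made up, and the function names are hypothetical.
```python
import math
import random
logits = {"mat": 4.0, "sofa": 2.5, "piano": 1.0}  # made-up scores after "The cat sat on the"
def probs_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide scores by the temperature, then apply softmax."""
    scaled = {w: s / temperature for w, s in scores.items()}
    peak = max(scaled.values())
    exps = {w: math.exp(s - peak) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}
def sample_word(scores: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one next word according to the temperature-adjusted probabilities."""
    probs = probs_with_temperature(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
low = probs_with_temperature(logits, 0.2)   # sharp: "mat" dominates
high = probs_with_temperature(logits, 2.0)  # flat: "piano" gets a real chance
print(f"piano at T=0.2: {low['piano']:.6f}, at T=2.0: {high['piano']:.3f}")
print(sample_word(logits, 2.0, random.Random(0)))
```
A low temperature sharpens the distribution toward the top word, and a high temperature flattens it so unusual words are drawn more often.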

07:34.010 --> 07:35.330
So it works this way.

07:35.370 --> 07:40.810
The most important fact is that an LLM doesn't think like a human.

07:40.970 --> 07:46.090
Instead, it uses the weights it learned from the data.

07:46.410 --> 07:48.410
And from those,

07:48.610 --> 07:51.410
it decides what should come next.

07:51.410 --> 07:53.210
So again you've got the probabilities.

07:53.460 --> 07:58.700
And now let's go back to our parameters file because it's very important.

07:58.940 --> 08:04.620
During training, every time the model makes a prediction, it checks whether it was right or wrong.

08:04.740 --> 08:11.620
If it was wrong, it slightly tweaks the parameters, which are weights,

08:11.820 --> 08:14.820
so it gets a little better next time.

08:15.060 --> 08:17.100
So parameters are weights.

08:17.100 --> 08:18.420
This is not the text itself;

08:18.460 --> 08:19.300
these are weights.

08:19.540 --> 08:24.780
And every time there is an answer, it tweaks them.

08:24.940 --> 08:25.500
Yeah.

08:25.660 --> 08:27.580
It keeps tweaking them toward better predictions.
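
NOTE
The "tweak the weights after a wrong prediction" idea can be sketched as a few gradient descent steps. This is a toy with only two adjustable scores and an assumed learning rate. Real training adjusts billions of weights in the same spirit.
```python
import math
def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
words = ["Paris", "London"]
weights = [0.0, 0.0]   # toy parameters: one adjustable score per candidate word
target = 0             # index of the correct next word ("Paris")
learning_rate = 0.5    # assumed step size
before = softmax(weights)[target]
for _ in range(10):
    probs = softmax(weights)
    # Gradient of the cross-entropy loss for each score: prob - (1 if correct else 0).
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]
after = softmax(weights)[target]
print(f"P(Paris) before: {before:.2f}, after: {after:.2f}")
```
Each step nudges the score of the correct word up and the other score down, so the probability of "Paris" rises from its starting value of 0.5.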

08:27.700 --> 08:33.380
So the takeaway is that LLM models are still being improved.

08:33.460 --> 08:37.060
But more about that later, in the part on fine-tuning and improving the model.

08:37.220 --> 08:43.460
So let's go ahead to the fourth point, which is training LLMs.

08:43.700 --> 08:45.820
So it's like compressing the internet.

08:46.220 --> 08:50.460
Earlier I gave you a comparison to a library.

08:50.500 --> 08:54.140
However, the internet is also a great comparison.

08:54.140 --> 08:56.500
So on the internet you've got a lot of data.

08:56.500 --> 09:02.460
Training an LLM is like compressing a huge chunk of the internet into a brain of parameters.

09:02.580 --> 09:07.660
Companies like Meta, OpenAI, Google, and DeepSeek train models using massive datasets:

09:07.700 --> 09:12.020
websites, books, articles, sometimes ten terabytes or more.

09:12.940 --> 09:14.580
So it's a huge amount.

09:15.020 --> 09:18.060
I've also got a nice

09:18.060 --> 09:21.060
note for you, because DeepSeek did something differently.

09:21.340 --> 09:24.460
I explained that in my DeepSeek masterclass,

09:24.460 --> 09:26.660
but more about that in a moment.

09:26.860 --> 09:33.300
So training requires thousands of GPUs for days or weeks, costing millions of dollars.

09:33.460 --> 09:36.140
So the cost is really, really high.

09:36.500 --> 09:38.900
The result is a huge parameters file.

09:38.900 --> 09:42.220
It's not a copy of the internet, but a lossy compression.

09:42.260 --> 09:45.100
It remembers patterns, facts and styles.
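
NOTE
"Lossy compression" can be illustrated with a toy model that only counts which word follows which. This is a count-based stand-in with a made-up three-sentence corpus, not how a neural LLM is trained, but it shows the same idea: patterns are kept and the original text is not.
```python
from collections import Counter, defaultdict
corpus = [
    "the eiffel tower is located in paris",
    "the louvre is located in paris",
    "big ben is located in london",
]
def train_bigram_counts(sentences: list[str]) -> dict[str, Counter]:
    """Count which word follows which; the counts are the toy 'parameters'."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts
params = train_bigram_counts(corpus)
# The table keeps the pattern that "in" is followed by "paris" twice and "london" once,
# but the original sentences themselves are not stored anywhere in it.
print(params["in"].most_common())  # [('paris', 2), ('london', 1)]
```
From the counts alone you cannot tell which landmark went with which city, which is the "lossy" part.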

09:45.100 --> 09:46.220
And here is the note.

09:46.260 --> 09:51.550
Typical LLMs are trained on data from the internet.

09:51.590 --> 09:54.670
That takes a huge amount of GPU power

09:55.110 --> 09:59.830
and storage space, so the cost is very, very high.

10:00.150 --> 10:07.670
However, DeepSeek was released, and it was a total surprise for everyone, because what they did, they

10:07.710 --> 10:13.550
reportedly trained their models not only on data from the internet, articles, books, and similar huge data

10:13.590 --> 10:14.030
sets.

10:14.350 --> 10:18.670
They also trained their model on other models.

10:18.670 --> 10:26.150
In simple words, they took the work of other models and trained their own model on that.

10:26.150 --> 10:33.110
So the reported cost was about $6 million, which is a tiny amount.

10:33.150 --> 10:33.510
All right.

10:33.550 --> 10:35.110
For training models.

10:35.390 --> 10:36.710
Let's read it.

10:36.710 --> 10:41.790
Models like DeepSeek may use unique training methods or data sources, but the core concept remains: train

10:41.830 --> 10:45.230
on huge data to create a general-purpose language engine.

10:45.950 --> 10:48.150
So again, it's very important.

10:48.350 --> 10:55.360
Training an LLM means taking all of the data and information, putting it inside, and using

10:55.360 --> 10:58.360
that to do specific tasks.

10:58.960 --> 11:01.960
Next, we have open-source versus closed models.

11:01.960 --> 11:06.120
With open-source models, the weights and code are available to everyone.

11:06.120 --> 11:10.440
You can download, read, modify, or even fine-tune them for custom uses.

11:10.920 --> 11:16.960
And what's very important: when you have an open-source model, you can download it

11:16.960 --> 11:18.080
to your computer.

11:18.200 --> 11:22.280
And that means you can even run it without an internet connection.

11:22.320 --> 11:26.120
It runs on your actual computer locally.

11:26.280 --> 11:28.360
And here you've got examples.

11:28.360 --> 11:32.560
So Llama 2, Mistral, DeepSeek, Zephyr, Phi-3 (Microsoft).

11:33.280 --> 11:41.400
And what is also very important: when you run your model on your computer, you

11:41.440 --> 11:42.960
keep all of the information,

11:43.160 --> 11:43.720
yeah,

11:43.760 --> 11:44.480
private.

11:44.680 --> 11:51.920
However, with closed models you're working on, for example, OpenAI servers, Anthropic servers,

11:51.920 --> 11:53.240
or even DeepSeek servers,

11:53.240 --> 11:56.760
and all of the data you provide is sent over to these companies,

11:56.760 --> 11:59.600
so, let's say, to China, the United States, and so on.

11:59.800 --> 12:06.480
So when you want to be very safe with what you type into the model, you can

12:06.520 --> 12:08.280
download DeepSeek to your computer.

12:08.320 --> 12:08.880
All right.

12:08.880 --> 12:09.760
You can do this.

12:09.760 --> 12:15.000
And then, as long as the model runs fully locally, your data isn't sent anywhere.

12:15.680 --> 12:18.800
Now let's talk about what closed models are.

12:18.800 --> 12:23.520
You can access them only through a web interface or API.

12:23.880 --> 12:30.480
Code and weights are not public; they are used via subscription or licensing. Examples: ChatGPT, Claude, Gemini.

12:30.600 --> 12:35.640
So these are the most popular models. It's very important to know the difference.

12:36.240 --> 12:41.040
Now we have fine-tuning: how LLMs are improved.

12:41.040 --> 12:48.360
After initial training, companies (and now users) can fine-tune LLMs to become better assistants, specialists,

12:48.360 --> 12:49.450
or agents.

12:49.610 --> 12:50.530
Fine-tuning:

12:50.530 --> 12:53.210
start with a base LLM (general knowledge),

12:53.250 --> 12:59.690
then train further on custom datasets like Q&A pairs, conversations, or company documents.

12:59.730 --> 13:04.610
Also, models can be trained on users' responses.

13:04.650 --> 13:04.970
Yeah.

13:05.010 --> 13:11.530
So when you type a response, you provide data that can be used in later training to improve the parameters, so the model

13:11.530 --> 13:13.410
gets better over time.

13:13.570 --> 13:19.450
So the model becomes specialized: better at helping, following instructions, or doing specific tasks.
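
NOTE
Fine-tuning means continuing training from an existing model on a smaller custom dataset. A toy sketch with word counts follows. The counts and the "support chat" data are made up, and real fine-tuning updates neural network weights with gradient steps rather than adding to counts.
```python
from collections import Counter
# Toy "base model": next-word counts learned from general text (made-up numbers).
base_counts = {"reset": Counter({"button": 5, "password": 1})}
# Hypothetical company support chats used as the custom fine-tuning dataset.
custom_data = ["reset password"] * 4 + ["reset password link"]
def fine_tune(counts: dict[str, Counter], sentences: list[str]) -> dict[str, Counter]:
    """Return a copy of the counts, updated with the custom sentences."""
    tuned = {word: Counter(c) for word, c in counts.items()}
    for sentence in sentences:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            tuned.setdefault(current, Counter())[following] += 1
    return tuned
tuned_counts = fine_tune(base_counts, custom_data)
print(base_counts["reset"].most_common(1)[0][0])   # button
print(tuned_counts["reset"].most_common(1)[0][0])  # password
```
The base model is left untouched, and the tuned copy now prefers the word that matters in the support setting.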

13:19.570 --> 13:21.410
Think of it as experience.

13:21.450 --> 13:21.770
Yeah.

13:21.810 --> 13:22.850
As experience.

13:22.850 --> 13:24.170
Say you work, for example,

13:24.170 --> 13:29.370
in a job for, let's say, one month versus ten years.

13:29.410 --> 13:30.810
That is a huge gap.

13:31.130 --> 13:32.650
Why is that important?

13:32.690 --> 13:38.330
It turns a generic document generator into a helpful assistant, customer support bot, coding helper

13:38.330 --> 13:40.250
or expert in a field.

13:40.410 --> 13:45.730
So it works this way. It's very important to know how LLMs are improved.

13:45.730 --> 13:48.780
Over time they just get more data,

13:48.780 --> 13:51.060
and through that they get feedback.

13:51.060 --> 13:51.660
All right.

13:51.660 --> 13:53.220
They collect the feedback.

13:53.220 --> 13:57.140
And based on the feedback they get better and produce better results.

13:57.820 --> 14:00.140
Yeah it's a similar situation.

14:00.140 --> 14:03.700
Whenever you've got a chat... actually, let me change the camera.

14:03.940 --> 14:08.580
Let's say you've got a chat with ChatGPT and it produces bad responses for you.

14:08.580 --> 14:11.860
And you say, all right, give me something else.

14:11.860 --> 14:14.900
I don't like the uppercase.

14:14.900 --> 14:19.380
I don't like, let's say, bullet points. And then you kind of train the model,

14:19.420 --> 14:22.020
but only within that chat.

14:22.020 --> 14:26.940
So it's like a small comparison of how the models are trained.

14:26.980 --> 14:30.940
Of course, real training is more advanced; this was just a small digression.

14:30.980 --> 14:34.380
LLMs as agents: real-world applications.

14:34.660 --> 14:41.180
And now we come to a very important step, because we are using LLMs inside n8n.

14:41.340 --> 14:44.340
So when we create agents we use them.

14:44.460 --> 14:48.310
And now you have a lot of knowledge about them.

14:48.350 --> 14:48.750
All right.

14:48.790 --> 14:49.830
After this video.

14:50.230 --> 14:55.630
And I can say it's very important, because modern LLMs aren't just chatbots. With tools like Zapier,

14:55.670 --> 14:59.350
n8n, and other automation platforms, LLMs become agents.

14:59.390 --> 15:06.070
They can connect to APIs, control apps, automate tasks, answer customer emails, summarize documents,

15:06.070 --> 15:06.910
and more.
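
NOTE
The agent pattern can be sketched as a loop where the model names a tool and ordinary code runs it. Everything here is hypothetical: fake_llm stands in for a real model call, and the two tools are toy functions, not the API of Zapier, n8n, or any real platform.
```python
import json
def fake_llm(user_message: str) -> str:
    """Stand-in for a real LLM call; returns a tool request as JSON text."""
    if "summarize" in user_message.lower():
        return json.dumps({"tool": "summarize", "argument": user_message})
    return json.dumps({"tool": "reply", "argument": "Thanks, we will get back to you."})
def summarize(text: str) -> str:
    """Toy tool: keep only the first eight words."""
    return " ".join(text.split()[:8]) + "..."
def reply(text: str) -> str:
    """Toy tool: pretend to send an email reply and report what was sent."""
    return f"Sent: {text}"
TOOLS = {"summarize": summarize, "reply": reply}
def run_agent(user_message: str) -> str:
    """Ask the (fake) model which tool to use, then call that tool."""
    request = json.loads(fake_llm(user_message))
    return TOOLS[request["tool"]](request["argument"])
result = run_agent("Where is my order?")
print(result)  # Sent: Thanks, we will get back to you.
```
In a real workflow, the tools would be API calls or app actions, and the model's text output decides which one runs.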

15:07.070 --> 15:13.630
With Retrieval-Augmented Generation (RAG), they can reference live documents and company knowledge.
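
NOTE
RAG boils down to two steps: find the most relevant document, then put it into the prompt. A minimal sketch follows, assuming simple word overlap for retrieval. Real systems typically use embeddings and a vector search instead, and the documents here are made up.
```python
documents = [
    "Refunds are processed within 14 days of the return request.",
    "Support is available Monday to Friday from 9:00 to 17:00.",
    "Shipping to the EU takes 3 to 5 business days.",
]
def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words without punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}
def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    question_words = tokenize(question)
    return max(docs, key=lambda d: len(question_words & tokenize(d)))
def build_prompt(question: str, docs: list[str]) -> str:
    """Put the retrieved text in front of the question for the LLM to read."""
    context = retrieve(question, docs)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"
prompt = build_prompt("How long do refunds take?", documents)
print(prompt)
```
The model then answers from the supplied context instead of relying only on what it saw during training.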

15:13.950 --> 15:20.710
LLMs can be fine-tuned or plugged into workflows, making them work for businesses, creators, developers,

15:20.710 --> 15:21.470
and more.

15:21.830 --> 15:25.870
So again, we are using LLM models such as ChatGPT.

15:26.550 --> 15:29.430
Here we are using Gemini.

15:29.590 --> 15:33.590
So again, in the videos we are using LLM models.

15:33.870 --> 15:41.110
And again, it's very important to comprehend that it works this way and not another,

15:41.270 --> 15:46.670
because otherwise you just create worse agents and worse projects.

15:46.670 --> 15:52.990
It's very important to comprehend that, because we base a lot of different

15:52.990 --> 15:55.030
workflows and automations on that.

15:55.750 --> 15:59.110
And last but not least, we've got the summary.

15:59.110 --> 16:06.550
So, the power of LLMs: LLMs are advanced word predictors powered by billions of parameters, trained on

16:06.590 --> 16:09.390
huge data running from just two files.

16:09.590 --> 16:15.270
Open-source LLMs are democratizing AI, letting anyone experiment, run, and improve models.

16:15.390 --> 16:21.390
Fine-tuning and automation make LLMs real-world agents: not just chatbots, but powerful AI tools for

16:21.390 --> 16:22.310
any task.

16:22.830 --> 16:29.830
So after this video, we have a good understanding of what LLMs are, how they work, how they are built

16:29.870 --> 16:34.110
in general, how they are improved, and a lot more.

16:34.110 --> 16:35.230
I hope you enjoyed it.

16:35.230 --> 16:41.150
And now we can use this knowledge to create better automations and workflows inside

16:41.190 --> 16:45.310
n8n. Thank you for watching, and I will see you in the next video.
