WEBVTT

00:00.440 --> 00:03.680
Today we are going to talk about large language models.

00:05.440 --> 00:12.960
Before we start cooking and diving into all these exciting topics like powerful AI agents and automations

00:12.960 --> 00:19.400
that can transform your work and business, let's go back to the basics and briefly discuss large language

00:19.400 --> 00:19.960
models.

00:21.760 --> 00:22.160
Why?

00:22.200 --> 00:28.720
Because understanding how they work will help you create more advanced and powerful AI agents,

00:28.760 --> 00:30.880
automations, and AI systems.

00:32.040 --> 00:36.640
This knowledge is going to be crucial as we move into more advanced topics.

00:38.360 --> 00:41.200
Alright, so let's start with a simple question.

00:41.640 --> 00:44.120
What actually is a large language model?

00:44.760 --> 00:51.440
If you have used ChatGPT, Claude, Llama 2, or any other modern AI assistant, you have already used one.

00:52.320 --> 00:54.240
But let's take a step back.

00:55.160 --> 00:57.240
What do these models actually do?

00:57.800 --> 01:05.250
At their core, they're just really, really good at predicting the next word in a sentence.

01:06.570 --> 01:12.610
Imagine you are texting someone and your phone suggests words to complete your sentence.

01:12.850 --> 01:15.290
Something like "Hey, how are you..."

01:16.330 --> 01:19.690
You expect the word "doing" or "today" to appear.

01:20.250 --> 01:27.690
That's because your phone has seen common phrases and makes predictions based on what it has learned.

01:28.370 --> 01:32.330
LLMs work the same way, but on a much larger scale.

01:33.730 --> 01:40.810
They've read massive amounts of text, like books, articles, code, and conversations, and use that

01:40.810 --> 01:44.770
knowledge to make incredibly accurate predictions.

01:45.890 --> 01:47.610
They don't think like humans.

01:48.130 --> 01:50.650
They don't understand like we do.

01:51.490 --> 01:56.170
They just use probabilities to determine the most likely next word.

01:56.730 --> 02:00.070
And when you scale this up with billions of examples,

02:00.430 --> 02:02.870
you get something that feels intelligent.

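NOTE
Editor's sketch of the phone-keyboard idea described above: count which word follows which, then turn the counts into probabilities.
The toy corpus and the names follow_counts and next_word_probs are assumptions for illustration, not how any real LLM is implemented.
```python
from collections import Counter, defaultdict
# Hypothetical toy corpus; a real LLM trains on vastly more text.
corpus = "hey how are you doing . hey how are you today . how are you doing".split()
# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1
def next_word_probs(prev):
    """Turn raw follow-up counts into probabilities for the next word."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}
probs = next_word_probs("you")
print(probs)
```
In this toy corpus "doing" follows "you" more often than "today", so it gets the higher probability.
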
02:04.870 --> 02:12.630
Now, if we strip away all the marketing buzz and hype, what's an LLM actually made of?

02:14.150 --> 02:19.070
Well, surprisingly, an LLM is just two files.

02:19.390 --> 02:26.150
The first one is a parameters file which contains all the learned knowledge.

02:27.070 --> 02:32.150
And the second file is a code file which runs the model to generate text.

02:33.430 --> 02:36.150
Now let's compare it to a music player.

02:36.430 --> 02:38.710
So imagine you have a playlist of songs.

02:39.430 --> 02:47.110
And that playlist is like the parameters file because it stores all the information, the lyrics, the

02:47.110 --> 02:47.870
melodies.

02:48.750 --> 02:51.830
And now you need something to actually play the music.

02:53.230 --> 03:00.880
That's where the code file comes in, because it's the music player that reads the playlist and makes

03:00.880 --> 03:04.240
sound. For LLMs,

03:04.280 --> 03:12.080
the parameters file contains billions of numbers that together encode what the model has learned.

03:12.960 --> 03:17.320
So it's a compressed version of all the text it has processed.

03:18.080 --> 03:25.480
The code file is a relatively small program that takes in a prompt, applies mathematical functions,

03:26.360 --> 03:31.320
and then spits out the next word based on probability.

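NOTE
Editor's sketch of the "two files" picture described above, under heavy simplification.
The JSON table stands in for the parameters file and run_model stands in for the code file; the file name, the table contents, and the function name are all hypothetical.
```python
import json
import os
import tempfile
# Hypothetical "parameters file": a real one holds billions of learned numbers.
params = {"how are you": {"doing": 0.6, "today": 0.3, "feeling": 0.1}}
params_path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(params_path, "w") as f:
    json.dump(params, f)
def run_model(path, prompt):
    """The "code file": load the parameters and return the most likely next word."""
    with open(path) as f:
        table = json.load(f)
    scores = table[prompt]
    return max(scores, key=scores.get)
next_word = run_model(params_path, "how are you")
print(next_word)
```
The point is the split: the knowledge lives in the data file, while the program that reads it stays small.
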
03:34.440 --> 03:38.240
Now how does an LLM actually learn?

03:38.560 --> 03:42.960
How do you go from raw text to a working AI model?

03:43.440 --> 03:48.520
So the answer is a lot of data and a ridiculous amount of computing power.

03:49.880 --> 03:53.160
So first companies collect massive datasets.

03:54.240 --> 04:02.490
Think of books, Wikipedia, news articles, scientific papers, Reddit discussions, even public

04:02.490 --> 04:03.690
code repositories.

04:04.010 --> 04:08.370
And the goal is to feed the model as much human knowledge as possible.

04:10.970 --> 04:12.370
Next comes training.

04:12.570 --> 04:17.330
So this is where the magic happens because the model starts with random numbers.

04:18.410 --> 04:25.850
Yes, actually at the start it knows nothing and we force it to predict the next word in billions of

04:25.850 --> 04:26.770
sentences.

04:28.050 --> 04:34.850
When it gets a word wrong, we update the parameters using a technique called gradient descent.

04:35.290 --> 04:37.490
And over time it gets better.

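NOTE
Editor's sketch of the training step described above: start from random numbers, measure the error on the next word, and nudge the parameters with gradient descent.
The three-word vocabulary, the learning rate, and the step count are assumptions chosen to keep the example tiny.
```python
import math
import random
vocab = ["mat", "floor", "sofa"]  # hypothetical candidate next words
target = "mat"  # the word that actually came next in the training text
random.seed(0)
logits = [random.uniform(-1, 1) for _ in vocab]  # parameters start as random numbers
def softmax(xs):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    return [e / sum(exps) for e in exps]
learning_rate = 0.5
start_prob = softmax(logits)[vocab.index(target)]
for step in range(100):
    probs = softmax(logits)
    # Cross-entropy gradient per logit: prob - 1 for the target word, prob for the others.
    grads = [p - (1.0 if w == target else 0.0) for p, w in zip(probs, vocab)]
    # Gradient descent update: move each parameter against its gradient.
    logits = [x - learning_rate * g for x, g in zip(logits, grads)]
end_prob = softmax(logits)[vocab.index(target)]
print(round(start_prob, 3), round(end_prob, 3))
```
After the updates, the probability assigned to the correct word is much higher than at the random start.
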
04:37.970 --> 04:47.090
So this process runs for weeks on thousands of GPUs, which are the super fast processors used for AI.

04:47.130 --> 04:55.140
Training is kind of like taking all the knowledge of the internet and squishing it down into a compressed

04:55.140 --> 04:57.660
format that fits in a single file.

05:00.020 --> 05:03.980
Okay, now let's go deeper into how these models actually generate text.

05:05.100 --> 05:07.660
So it all comes down to one simple idea:

05:08.300 --> 05:09.820
predicting the next word.

05:10.860 --> 05:12.300
So let's take an example.

05:13.540 --> 05:15.780
The cat sat on the...

05:16.780 --> 05:19.180
What word do you expect to come next?

05:19.460 --> 05:21.980
Most people would say "mat."

05:22.660 --> 05:23.140
But why?

05:23.180 --> 05:26.860
Because you've seen the phrase before and it's very common.

05:27.940 --> 05:30.020
An LLM does the same thing.

05:30.340 --> 05:33.020
It looks at the words before the blank

05:35.460 --> 05:39.340
and assigns probabilities to possible next words.

05:40.340 --> 05:43.740
So "floor" gets about a 10% probability,

05:43.940 --> 05:47.220
"sofa" about 5%, and "mat" 31%.

05:48.940 --> 05:56.720
So the model picks the most probable one and moves on to predict the next word and the next and the

05:56.720 --> 05:57.200
next.

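NOTE
Editor's sketch of the generation loop described above: pick the most probable next word, append it, and repeat.
The probability table and the generate function are hypothetical, and this table only looks at the previous word, whereas a real LLM conditions on the whole context.
```python
# Hypothetical next-word probabilities keyed by the previous word.
next_word_table = {
    "the": {"cat": 0.5, "mat": 0.3, "floor": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"on": 0.9, "down": 0.1},
    "on": {"the": 0.8, "a": 0.2},
}
def generate(prompt, max_new_words):
    """Greedy decoding: repeatedly append the single most probable next word."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = next_word_table.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)
text = generate("the", 4)
print(text)
```
Always taking the top word is the simplest strategy; production systems often sample from the probabilities instead, so the output varies between runs.
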
05:57.960 --> 06:05.280
So this simple mechanism, repeated billions of times, leads to intelligent-sounding conversations.

06:06.000 --> 06:09.480
Now we need to differentiate two types of models:

06:09.600 --> 06:11.600
open source models and closed source models.

06:12.600 --> 06:15.720
Because not all LLMs are the same.

06:16.360 --> 06:19.480
Some are open source and some are closed source.

06:20.160 --> 06:28.040
So open source models like DeepSeek, Llama 2, Mistral, and Falcon are free to download and modify, so you

06:28.040 --> 06:29.920
can run them on your own computer,

06:30.520 --> 06:34.080
tweak them, and fine-tune them for specific tasks.

06:36.520 --> 06:45.560
On the other hand, closed source models like GPT-4, Gemini, and Claude are owned by companies like OpenAI

06:45.920 --> 06:46.640
and Google.

06:47.720 --> 06:54.010
You can use them, but you don't get access to the architecture or how they work internally.

06:54.330 --> 06:57.090
So open source models give you more control.

06:58.130 --> 07:02.730
Now, raw LLMs are actually not that smart.

07:03.090 --> 07:08.290
They can generate text, but they don't naturally follow instructions well.

07:09.050 --> 07:15.210
So to turn them into useful assistants like ChatGPT, they go through a process called fine-tuning.

07:16.050 --> 07:26.090
And fine-tuning basically means taking a trained LLM and giving it specific examples of how it should

07:26.090 --> 07:27.610
respond to questions.

07:27.770 --> 07:36.930
Humans write sample conversations, and the model is adjusted so that it learns to mimic helpful, structured

07:37.410 --> 07:38.250
responses.

07:39.450 --> 07:41.250
So this is how ChatGPT knows

07:41.450 --> 07:50.780
to answer questions politely, summarize articles, and explain concepts clearly: because it was trained

07:50.780 --> 07:54.340
to do so after the initial training phase.

07:57.820 --> 08:05.660
Now, fine-tuning takes a pre-trained LLM and specializes it for a specific task or use case.

08:05.940 --> 08:13.620
So this is done by providing the model with a curated dataset, which is basically a collection of

08:14.300 --> 08:22.020
high-quality, human-written examples that teach the model how to respond in a specific way.

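NOTE
Editor's sketch of what a curated fine-tuning dataset might look like: a few human-written prompt and response pairs turned into training text.
The field names "prompt", "response", and "text", the "User:" and "Assistant:" labels, and the JSON Lines layout are assumptions for illustration; real fine-tuning pipelines define their own formats.
```python
import json
# Hypothetical curated examples written by humans.
examples = [
    {"prompt": "How do I reset my password?", "response": "Open Settings, choose Security, then select Reset Password."},
    {"prompt": "Summarize: The meeting moved to Friday.", "response": "The meeting is now on Friday."},
]
def to_training_text(example):
    """Join a prompt and its ideal response into one text the model learns to continue."""
    return f"User: {example['prompt']}\nAssistant: {example['response']}"
# One JSON object per line, a common layout for training data files.
jsonl_lines = [json.dumps({"text": to_training_text(ex)}) for ex in examples]
print("\n".join(jsonl_lines))
```
The training objective stays next-word prediction; only the data changes, so the model learns to continue a question with a helpful answer.
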
08:23.820 --> 08:31.180
The result is a fine-tuned LLM that performs better in a targeted domain, whether that's answering customer service

08:31.180 --> 08:34.140
questions or writing code.

08:39.180 --> 08:47.980
Now, every few months, a new, more powerful LLM comes out, because improving AI has largely come down to adding

08:48.350 --> 08:56.350
more training data and more computing power. So each generation of models gets bigger, faster, and smarter.

08:56.670 --> 09:05.230
Just like upgrading a car engine, newer models perform better than the older ones, so the cycle will

09:05.230 --> 09:07.750
continue as AI technology evolves.

09:15.430 --> 09:17.550
So AI is no longer just a chatbot.

09:18.230 --> 09:22.990
It's evolving into an operating system for knowledge and automation.

09:23.430 --> 09:30.870
Right now, we are already seeing AI models that don't just generate text, but can search the web,

09:30.910 --> 09:36.510
run code, interact with APIs, and even control applications.

09:37.150 --> 09:44.870
So these functionalities mean AI is moving beyond passive response generation and becoming an active

09:46.360 --> 09:48.920
kind of participant in workflows.

09:50.760 --> 09:58.440
And this is where things get really interesting, because if AI can process text, images, audio, and

09:58.440 --> 10:05.680
interact with external tools like systems and apps, what are we actually talking about?

10:07.320 --> 10:08.480
AI agents.

10:11.280 --> 10:13.240
AI agents are the next step.

10:14.200 --> 10:21.400
These are intelligent systems that don't just generate responses but take actions on their own.

10:21.920 --> 10:28.240
They can schedule tasks, automate workflows, analyze real time data, and even make decisions based

10:28.240 --> 10:29.240
on context.

10:30.160 --> 10:34.200
So now let's shift gears and dive into this next evolution:

10:36.240 --> 10:37.120
AI agents.

10:37.960 --> 10:43.120
So we'll go through how they work and how they are shaping the future of automation.

10:47.300 --> 10:49.620
Let's quickly recap what we have covered today.

10:50.180 --> 10:56.060
So we started with LLMs, which predict words based on vast amounts of training data.

10:56.500 --> 11:03.100
Then we explored how fine-tuning transforms these raw models into useful AI systems.

11:04.740 --> 11:06.820
But AI is evolving beyond just text generation.

11:06.820 --> 11:14.620
It is becoming multimodal, understanding not just words but images, audio, and

11:14.620 --> 11:15.420
real-world data.

11:17.500 --> 11:24.740
And this shift is turning AI into an operating system for automation, where models don't just answer

11:24.740 --> 11:28.260
questions but interact with apps, tools, and the internet.

11:29.860 --> 11:38.460
And now we are entering the era of AI agents: systems that don't just generate responses, but actually

11:38.500 --> 11:40.420
take actions on their own.

11:42.180 --> 11:44.620
Thanks for your attention and see you in the next lesson.