WEBVTT

00:00.880 --> 00:01.520
Hi everyone.

00:01.520 --> 00:02.400
Welcome back.

00:03.120 --> 00:10.400
Today we are diving into RAG, which stands for Retrieval Augmented Generation and is a powerful technique

00:10.400 --> 00:13.360
that makes AI agents and automation way smarter.

00:14.240 --> 00:20.720
By the end of this lesson, you will know what RAG is and why it's so useful, how it retrieves

00:20.720 --> 00:26.120
and generates information dynamically, and how to build a RAG-powered workflow in n8n.

00:26.880 --> 00:28.120
So let's jump right in.

00:32.200 --> 00:35.160
First, what is retrieval augmented generation?

00:36.400 --> 00:37.560
Think of it like this.

00:38.240 --> 00:42.120
Large language models only use what they have been trained on.

00:42.560 --> 00:49.200
So if you ask them about something new, they might just guess or worse, hallucinate.

00:50.280 --> 00:54.920
So RAG fixes this by adding a retrieval step before generating a response.

00:56.240 --> 01:03.730
So retrieval means it fetches the most relevant, up-to-date info from databases, documents, sheets,

01:04.330 --> 01:05.650
APIs, etc.

01:07.010 --> 01:13.810
Generation means the AI then uses that info to create a better, more accurate response.

01:14.570 --> 01:22.290
For example, imagine you ask an AI-powered assistant about today's stock prices.

01:22.930 --> 01:32.570
A normal chatbot like ChatGPT might give you outdated info. Even if you are using ChatGPT search,

01:33.010 --> 01:35.610
it might still make mistakes.

01:36.770 --> 01:40.050
That's why it's better to use Perplexity for searching the web.

01:44.050 --> 01:45.730
Now why is that important?

01:47.130 --> 01:51.170
So most LLMs have a fixed knowledge base.

01:52.210 --> 02:00.070
They don't know anything after the last training update. Even if they have access to tools and the

02:00.070 --> 02:00.750
internet,

02:01.270 --> 02:04.790
they don't have access to your internal private documents, right?

02:05.470 --> 02:13.910
So with RAG, we can use external sources of data. Instead of relying on outdated training data, you

02:13.910 --> 02:21.910
can pull in real-time information like stock prices, news or customer support data.

02:23.230 --> 02:27.790
And RAG makes AI more accurate by reducing hallucinations.

02:29.230 --> 02:29.670
All right.

02:29.670 --> 02:33.030
So now let's break down how RAG works in simple terms.

02:33.390 --> 02:37.270
Imagine you ask your RAG-powered AI assistant a question.

02:37.870 --> 02:42.510
Let's say: what's the latest refund policy for my company?

02:44.310 --> 02:45.350
First, the LLM,

02:46.510 --> 02:54.030
so your AI assistant's brain, analyzes your question and figures out what information it needs.

02:54.630 --> 03:02.560
Then, instead of making up an answer straight away, it searches through connected data sources, and this

03:02.560 --> 03:07.000
could be internal documentation, a database or a knowledge base.

03:09.160 --> 03:14.560
Then the system retrieves the most relevant information related to your question.

03:15.520 --> 03:23.240
And then the LLM combines that information with its own language understanding to generate a factual

03:23.280 --> 03:25.200
and relevant response.

03:26.800 --> 03:33.760
And finally, the AI gives you an answer that's backed by real data and in some cases even includes

03:33.760 --> 03:37.280
references to the sources it used.

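NOTE
The retrieve-then-generate steps described above can be sketched in a few
lines of Python. This is only a minimal illustration, not how n8n or any
particular product implements it: the documents are made-up sample text,
tokenize, retrieve and build_prompt are hypothetical helper names, and
simple word overlap stands in for a real search over a knowledge base.
The final call to an LLM is left out; the sketch stops at the prompt that
would be sent to the model.
```python
# Toy retrieve-then-generate flow. DOCS and all function names here are
# hypothetical and exist only for illustration.
import re
DOCS = {
    "refund-policy": "Refunds are available within 30 days of purchase with a receipt.",
    "warranty-policy": "Laptops include a 2 year limited hardware warranty.",
    "shipping-policy": "Standard shipping takes 3 to 5 business days.",
}
def tokenize(text):
    # Lowercase the text and split it into a set of words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))
def retrieve(question, docs, top_k=1):
    # Retrieval step: rank documents by how many words they share with the question.
    q = tokenize(question)
    def score(name):
        return len(q & tokenize(name + " " + docs[name]))
    return sorted(docs, key=score, reverse=True)[:top_k]
def build_prompt(question, docs, sources):
    # Generation step input: the retrieved text is placed in the prompt so
    # the model can answer from it rather than from training data alone.
    context = "\n".join(f"[{name}] {docs[name]}" for name in sources)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
question = "What is the latest refund policy?"
sources = retrieve(question, DOCS)
prompt = build_prompt(question, DOCS, sources)
print(sources)
print(prompt)
```
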
03:38.440 --> 03:45.880
For example, instead of saying I'm not sure, an AI assistant powered by RAG for customer support

03:46.160 --> 03:54.040
retrieves the exact refund policy from company documents and provides an accurate response to the customer.

03:54.440 --> 03:57.960
It will be much easier to understand by looking at this diagram.

03:57.960 --> 04:10.250
So let's say a customer asks a chatbot, what's the warranty policy for my new laptop? An assistant

04:10.250 --> 04:14.130
powered by RAG handles it step by step.

04:18.850 --> 04:22.370
So the customer types the question into the chat.

04:23.530 --> 04:29.490
Then the chatbot processes the request and determines it needs warranty details.

04:29.690 --> 04:36.930
So instead of relying on old pre-trained responses, it performs a real-time search.

04:37.770 --> 04:48.650
So it retrieves the latest warranty policy from the company's internal database and, if

04:48.650 --> 04:53.450
needed, it checks the company's online help center for updates.

04:53.930 --> 05:00.180
It can even search the web for manufacturer announcements or regulation changes.

05:00.620 --> 05:04.980
So the assistant gathers the most relevant data.

05:05.500 --> 05:09.740
And then generates an accurate response based on the latest information.

05:10.220 --> 05:16.980
So the customer instantly gets the most up-to-date answer without needing to call support or search

05:17.020 --> 05:17.740
manually.

05:19.020 --> 05:26.100
It's really helpful because if the company updated its return policy yesterday, the AI assistant powered

05:26.100 --> 05:30.300
by RAG will instantly reflect the changes.

05:31.140 --> 05:33.980
Now, how does RAG work in your company?

05:34.540 --> 05:42.020
For example, imagine you build an AI assistant for your company's customer support and sales team.

05:42.500 --> 05:43.820
And it is powered by RAG.

05:45.460 --> 05:50.100
And in this scenario, a customer asks what are your refund policies?

05:50.540 --> 05:53.900
Instead of answering I don't know, like ChatGPT might

05:53.940 --> 06:03.640
if it lacks the data and it's not fine-tuned. Or it can hallucinate and give you a random response.

06:06.360 --> 06:14.760
Instead of that, your AI assistant powered by RAG retrieves the latest refund policy from your company's

06:14.800 --> 06:17.280
knowledge base or internal documentation.

06:17.600 --> 06:27.480
So it summarizes the key points, like refund eligibility, required steps, etc., and the customer

06:27.520 --> 06:30.320
gets a precise and up-to-date answer.

06:31.000 --> 06:33.240
And it really reduces confusion.

06:34.200 --> 06:40.040
Now let's jump into n8n and explore the components of a RAG-powered assistant.

06:41.720 --> 06:46.240
You can build this RAG assistant relatively easily in n8n.

06:46.440 --> 06:48.880
And I will show you exactly how in the next lesson.

06:49.880 --> 06:53.200
It might look complex, but trust me, it's super simple.

06:53.920 --> 06:58.490
And this particular example is very practical and very useful.

07:00.210 --> 07:04.250
So here is how it works step by step.

07:06.170 --> 07:09.170
The assistant, which is our Tools Agent.

07:09.370 --> 07:10.970
This is the core of the workflow.

07:11.250 --> 07:20.530
So inside it has a system prompt which is a set of instructions that tell our assistant what to do.

07:21.010 --> 07:24.290
It decides how to process user queries.

07:24.290 --> 07:28.570
So basically, what we expect this assistant to do.

07:29.690 --> 07:31.770
Next, we have the chat model.

07:31.770 --> 07:33.210
So, the OpenAI Chat Model.

07:33.490 --> 07:36.490
This node is the brain of our assistant.

07:37.250 --> 07:39.170
Inside this node we can select

07:41.210 --> 07:42.730
between different LLMs.

07:43.090 --> 07:48.050
And this is the AI that actually generates responses.

07:49.770 --> 07:54.650
It doesn't just guess. It waits until it has the right information before answering,

07:55.690 --> 08:00.180
not based on ChatGPT's general knowledge.

08:03.020 --> 08:05.100
Now, Postgres Chat Memory.

08:05.100 --> 08:09.300
So this acts as our assistant's persistent memory.

08:09.300 --> 08:15.100
So it helps our assistant remember past interactions with users, which makes conversations feel more

08:15.100 --> 08:18.660
natural because it knows the context of the conversation.

08:20.420 --> 08:22.900
Now we have the Supabase Vector Store.

08:22.940 --> 08:25.940
This is where all the company knowledge is stored.

08:26.220 --> 08:29.660
Basically, this is part of a vector database.

08:30.500 --> 08:32.740
We'll dive into vector databases later.

08:33.300 --> 08:43.940
For now, just know that it acts like an AI powered search engine and helps our LLM retrieve the most

08:43.940 --> 08:47.940
relevant information before the chatbot responds.

08:50.020 --> 08:51.140
Now, embeddings.

08:51.940 --> 09:00.480
This node is basically the setup that converts text into a format that the AI model,

09:00.720 --> 09:04.040
so the large language model, can search through quickly.

09:04.360 --> 09:08.200
It turns human language into numerical representations.

09:08.720 --> 09:16.800
So when a user asks a question, LLMs search through embeddings to find the most relevant data in the

09:16.800 --> 09:17.600
vector store.

09:18.760 --> 09:21.920
So we'll explore embeddings in more detail later.

09:22.240 --> 09:24.760
But for now, just know that they help

09:24.800 --> 09:28.960
AI understand and compare text efficiently.

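NOTE
The idea of turning text into numbers and then searching by similarity
can be sketched as below. This is a toy under stated assumptions, not the
behavior of the n8n Embeddings node or the Supabase Vector Store: the
vocabulary, the sample chunks and the names embed, cosine and search are
all hypothetical, and word counts stand in for the learned vectors that a
real embedding model produces.
```python
# Toy embedding search. VOCAB, CHUNKS and all function names are
# hypothetical; real embeddings come from a trained embedding model.
import math
import re
VOCAB = ["refund", "warranty", "laptop", "shipping", "days", "policy"]
def embed(text):
    # Turn text into a vector of word counts over a tiny fixed vocabulary.
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]
def cosine(a, b):
    # Cosine similarity: higher means the two vectors point in a more similar direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
CHUNKS = [
    "Refund policy: refund requests are accepted within 30 days.",
    "Warranty policy: every laptop has a 2 year warranty.",
    "Shipping policy: shipping takes 3 to 5 days.",
]
# The "vector store": each chunk is saved next to its embedding.
STORE = [(chunk, embed(chunk)) for chunk in CHUNKS]
def search(question):
    # Embed the question, then return the chunk whose vector is most similar.
    q = embed(question)
    return max(STORE, key=lambda item: cosine(q, item[1]))[0]
best = search("How long is the warranty on my laptop?")
print(best)
```
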
09:29.880 --> 09:34.440
Now let's talk about how RAG keeps AI models,

09:34.480 --> 09:36.920
so large language models, useful.

09:37.840 --> 09:45.800
So most AI assistants either guess answers or rely on old pre-trained data.

09:46.760 --> 09:50.040
That's a problem if you need real-time, accurate info.

09:50.480 --> 09:55.120
So RAG solves this by pulling data from the right place at the right time.

09:57.050 --> 10:00.290
Basically, it searches internal documents.

10:00.610 --> 10:04.770
Let's say an employee asks how many sick days do I get?

10:05.050 --> 10:14.570
So instead of just making some random response, our assistant checks the official HR handbook and delivers

10:14.570 --> 10:16.850
the exact policy details.

10:19.170 --> 10:22.210
Our RAG-powered assistant can retrieve web data.

10:22.210 --> 10:28.770
So, for example, if a customer asks what's the latest price for this laptop?

10:29.890 --> 10:31.450
Our assistant doesn't guess.

10:31.450 --> 10:36.770
It pulls live pricing from the store's website and provides the real answer.

10:38.010 --> 10:44.370
Now, it also adapts to fast changing industries like a finance assistant.

10:45.650 --> 10:51.650
Let's say it gets asked what's the new tax rule for freelancers?

10:51.970 --> 11:00.620
So instead of giving outdated info, it retrieves the latest IRS update and provides an accurate response.

11:01.220 --> 11:08.660
So in most cases, instead of guessing or saying I don't know, it fetches the answer from the right

11:08.700 --> 11:10.940
source, which is really useful.

11:11.780 --> 11:17.980
So now, these are the most popular use cases for RAG.

11:19.060 --> 11:21.300
Where does RAG actually make a difference?

11:22.540 --> 11:30.180
First of all, in customer support, because instead of a chatbot giving a generic check our website

11:30.180 --> 11:33.780
response, it fetches real answers from past tickets,

11:34.380 --> 11:35.740
frequently asked questions,

11:36.580 --> 11:37.900
past conversations,

11:38.700 --> 11:40.300
customer history, etc.

11:41.620 --> 11:45.220
It is also very useful in chatbots and virtual assistants.

11:46.020 --> 11:50.500
Think of a virtual assistant that doesn't just repeat old info.

11:50.540 --> 11:57.990
It grabs live updates on news, weather or company policies before responding.

12:01.070 --> 12:03.350
It's often used in education and research.

12:04.510 --> 12:12.670
So for students looking for a concept or a lawyer searching for the latest case law, RAG can retrieve

12:12.670 --> 12:15.590
the most relevant documents very quickly.

12:15.950 --> 12:22.670
It is also very useful in e-commerce, so it can check live stock availability and even suggest personalized

12:22.670 --> 12:25.270
recommendations based on past orders.

12:29.910 --> 12:35.150
It's well known in finance and banking, so customers don't need to dig through statements.

12:35.190 --> 12:41.990
AI can instantly retrieve account balances, loan options or interest rates in real time.

12:43.470 --> 12:50.150
Next, healthcare and insurance, where it can pull up your actual policy info before answering.

12:51.670 --> 12:53.890
And lastly, legal and compliance.

12:54.810 --> 13:02.530
So instead of providing outdated info, it can grab the latest regulations and contract updates before

13:02.890 --> 13:04.410
giving advice.

13:04.930 --> 13:08.090
All right, so let's wrap up what we have covered today.

13:09.050 --> 13:18.410
RAG retrieves real data before generating answers, which makes an LLM's responses way more accurate.

13:18.890 --> 13:20.490
It can pull from internal...

13:21.210 --> 13:29.490
It can pull data from internal documents, websites and databases, keeping responses more accurate

13:29.730 --> 13:30.610
and up to date.

13:32.610 --> 13:40.890
Assistants powered by RAG can give you real, useful information, and you can build RAG-powered assistants

13:40.890 --> 13:43.690
yourself using n8n relatively easily.

13:45.130 --> 13:47.770
In the next lesson, we'll dive into n8n.

13:48.890 --> 13:51.090
So stick around and let's start cooking.

13:52.090 --> 13:53.010
See you in the next one!
