WEBVTT

00:00.120 --> 00:02.680
So here we are in the Edit Fields node.

00:02.680 --> 00:05.640
And this is such a core node.

00:05.640 --> 00:11.320
This node used to be called the Set node, and this is where a data engineer lives.

00:11.320 --> 00:14.040
This is all about mapping data from one format to another.

00:14.080 --> 00:17.960
It's got nothing to do with AI, although of course working with data is all about AI.

00:18.120 --> 00:23.800
But it's absolutely crucial for this kind of data engineering work.

00:24.000 --> 00:27.920
And you can see that there are some special, clever ways to do it with JSON.

00:27.920 --> 00:32.440
But we're going to do manual mapping, which I think is most common.

00:32.440 --> 00:39.680
And I'm going to explain that what we want to do is create a data set which has two fields,

00:39.880 --> 00:45.120
a field called content, which is going to contain the text that's best to go in our knowledge

00:45.120 --> 00:47.760
base, language which can be vectorized.

00:47.760 --> 00:52.160
And then another field called category, which will say what category of thing this is.

00:52.160 --> 00:56.440
And that's because that feels like useful metadata that we'd like to tag everything with,

00:56.440 --> 00:57.920
however it gets vectorized.

00:58.040 --> 01:02.320
We're probably not going to make huge use of that, but it's a great

01:02.320 --> 01:07.880
practice to have that kind of metadata, and it could be used to filter on the results in different

01:07.880 --> 01:08.320
ways.

01:08.640 --> 01:12.760
So I press Add Field, and this allows me to come up with some fields.

01:12.760 --> 01:15.360
And I said we're going to have one field that's called content.

01:15.360 --> 01:18.880
And we're going to have another field that's called category.

01:18.880 --> 01:25.080
So that everything that comes out, every record in the inputs, every

01:25.080 --> 01:31.760
one of these items, is going to end up with a content and a category being built on the right-hand side.

01:32.000 --> 01:32.280
Okay.

01:32.280 --> 01:33.680
Let's do content first.

01:33.720 --> 01:34.160
Okay.

01:34.200 --> 01:39.560
So again, we're writing here what should go in a field called content, which is going to come over

01:39.560 --> 01:40.000
here.

01:40.000 --> 01:41.840
And we can choose what type it is.

01:41.880 --> 01:48.720
Is it a string, a number, true or false, an array (a list), an object, or just something else,

01:48.760 --> 01:51.120
something like a PDF document.

01:51.320 --> 01:53.760
And we are going to say string.

01:53.760 --> 01:58.240
And what's going to go in here is going to be the text associated with this information.

01:58.520 --> 02:00.160
And really this is up to us.

02:00.160 --> 02:05.750
We want to describe something so that, if this ends up going to an LLM, it will be useful.

02:05.750 --> 02:10.870
So we might as well just try and turn this information here into something which will be handy.

02:10.870 --> 02:14.710
So we will say something like "Product name:".

02:14.870 --> 02:18.790
And, sorry, I want this to be an expression, not fixed.

02:19.030 --> 02:20.470
So product name.

02:20.590 --> 02:22.790
And then we will have the name of the product.

02:22.790 --> 02:24.470
So remember how we're going to do this.

02:24.510 --> 02:28.630
It's $json.name.

02:28.950 --> 02:33.030
And you can see in the result below that it says "Product name:"

02:33.030 --> 02:34.990
and then the keyboard.

02:35.030 --> 02:38.910
All right, let's put an empty line, and let's say category.

02:40.350 --> 02:42.830
And let's put in the category: $json.Category,

02:44.190 --> 02:49.190
with Category spelt with a capital C because that's how it is in the table that's coming in.

02:49.750 --> 02:51.030
And then we will.

02:51.070 --> 02:51.390
Why not.

02:51.430 --> 02:57.630
We'll just put everything, basically. We'll say the SKU, which is $json.SKU,

02:59.510 --> 03:01.670
and then we'll give it the price.

03:05.020 --> 03:08.660
$json.Price.

03:08.900 --> 03:10.100
Now let's see how that comes out.

03:10.140 --> 03:14.860
If you look down at the result, you'll see the price has come through just

03:14.860 --> 03:17.020
as it came in, without a dollar sign.

03:17.020 --> 03:18.980
Because that's how it is in the table.

03:18.980 --> 03:25.180
And maybe it will be helpful for us to put a dollar sign there so that that's what goes to the model.

03:25.180 --> 03:30.180
And it knows in this content that it's getting something that is a dollar value.

03:30.500 --> 03:31.700
Okay.

03:31.860 --> 03:37.500
And then finally, and most importantly, description. Let me spell it right.

03:37.500 --> 03:38.700
Description.

03:38.860 --> 03:46.980
And this is where I'm going to put in the description of it, of course coming in from the

03:46.980 --> 03:47.660
input.

03:47.820 --> 03:48.780
There it goes.

03:49.060 --> 03:49.700
Okay.

03:49.820 --> 03:52.300
And so this is a field called content.

03:52.300 --> 03:57.100
And this is what it evaluates to: it comes to this bunch of text down here.

03:57.380 --> 04:01.380
Now to someone that's used to data engineering and mapping things carefully, this might seem a bit

04:01.380 --> 04:02.020
janky.

04:02.100 --> 04:07.300
We're just sort of writing some text here, and I came up with "product name" because I thought it sounded

04:07.300 --> 04:08.700
good for an LLM.

04:08.860 --> 04:14.900
And this is really where data engineering meets data science, where it becomes

04:14.940 --> 04:16.220
more of an AI concern.

04:16.220 --> 04:18.020
We want to try and come up with a string here

04:18.020 --> 04:21.620
that's going to be most reasonable if it's sent to an LLM,

04:21.700 --> 04:26.380
so it gives it the most relevant context to answer a question about this product.

04:26.380 --> 04:29.340
And so, you know, there's no right or wrong way of doing it.

04:29.340 --> 04:30.660
You want to experiment.

04:30.700 --> 04:36.300
You want to try different ways that you can make this content be rich for the LLM and get that right.

04:36.300 --> 04:41.900
And the best way to find out is to experiment, trial and error until you get good outcomes.

04:41.900 --> 04:45.460
But this seems like a perfectly reasonable starting point to me.

04:45.500 --> 04:45.900
Okay.

04:45.940 --> 04:51.340
And now next up we've got this category field that I have a feeling is going to be useful.

04:51.340 --> 04:56.780
And we want that to be basically the contents of this field right here, which is also called category,

04:56.780 --> 04:59.500
but with a capital C, because that's how it is in the table.

04:59.780 --> 05:00.340
Okay.

05:00.540 --> 05:03.140
So we should be able to do this quite easily.

05:03.140 --> 05:06.620
Now we click on Expression, and we could drag and drop from here.

05:06.620 --> 05:08.420
But now we're getting more familiar with this.

05:08.460 --> 05:12.660
We know it's $json.Category, with a capital C.

05:12.940 --> 05:13.660
There it is.

05:13.660 --> 05:16.620
Keyboard. That seems to work perfectly.

05:16.820 --> 05:22.820
We've got ourselves a mapping. We have mapped from this stuff here on the left to something new, which

05:22.820 --> 05:25.620
is going to have two fields: content and category.

05:25.620 --> 05:30.980
Content is going to be some text that we feel is going to be useful for an LLM. Category is going to

05:30.980 --> 05:34.140
be whether it's a keyboard, whether it's a monitor or whatever it happens to be.

05:34.460 --> 05:39.180
All right, let's give this a try. We can simply press the Execute Step button right here.

05:39.180 --> 05:42.300
Just press that and look what happens on the right.

05:42.300 --> 05:45.740
It shows us the result of mapping this

05:45.740 --> 05:47.940
through this: we get this.

05:47.980 --> 05:50.260
We get something that's perfectly civilized.

05:50.420 --> 05:53.580
Product name, category, keyboard, price, description.

05:53.580 --> 05:56.700
That's what we'll be providing to the LLM as context.

05:56.700 --> 06:00.580
And we also have this useful extra piece of information: category is keyboard.

06:00.900 --> 06:01.860
Excellent.

06:01.900 --> 06:04.060
We have just done some data mapping.

06:04.100 --> 06:07.530
And if I now go back to the prior screen and I press Execute Workflow,

06:07.570 --> 06:08.810
it does it for the whole lot.

06:08.850 --> 06:16.650
All 60 rows have just been mapped to the new data format, and that is the beginning of some data pipes.

06:16.690 --> 06:17.370
Okay.

06:17.410 --> 06:20.970
At this point, I'd like you, please, to go to Supabase.

06:21.250 --> 06:27.130
This is supabase.com, and if you haven't already, please set up an account.

06:27.130 --> 06:29.410
You need to create an organization.

06:29.410 --> 06:32.450
As I said yesterday, mine is called Donna Research.

06:32.490 --> 06:34.010
Feel free to call it whatever you want.

06:34.050 --> 06:35.490
Don't call it Donna research though.

06:36.930 --> 06:41.410
And then once you've set it up, you should find yourself pretty quickly at a projects page.

06:41.410 --> 06:42.770
You won't have any projects yet.

06:42.890 --> 06:47.410
If not, then click on this projects in the left nav or click on the name of your organization and come

06:47.410 --> 06:47.770
here.

06:48.250 --> 06:52.690
You should be able to find your way to this screen, where there should be a new project button that

06:52.690 --> 06:58.330
you can press. Come on in, and let's call the project "rag" as the name of the project.

06:58.370 --> 07:00.090
The compute size can be tiny.

07:00.090 --> 07:02.090
You should be able to be on the free plan.

07:02.090 --> 07:04.810
If you're on a paid plan, then this costs a cent an hour.

07:04.810 --> 07:06.530
And of course you can bring this down,

07:06.650 --> 07:09.730
delete it, after we're done with this activity.

07:10.050 --> 07:11.450
And I will

07:11.490 --> 07:14.490
then come up with a database password. Region:

07:14.490 --> 07:15.730
you should pick the region closest to you.

07:15.770 --> 07:18.730
Mine would normally be in the Americas, but I am in the UK today.

07:19.170 --> 07:21.610
So Europe is, I guess, right for me right now.

07:21.850 --> 07:28.170
But I think I'm going to go Americas anyway, because I probably want to be closest to where my n8n

07:28.210 --> 07:28.770
is running.

07:28.890 --> 07:33.250
And once you're ready with that, we'll press Create New Project.

07:33.290 --> 07:33.730
Okay.

07:33.770 --> 07:35.850
So I've just put in a strong password there.

07:35.850 --> 07:40.530
I've kept a record of my password, safely in a password manager, as should you.

07:40.730 --> 07:43.210
And now I'm pressing Create New Project.

07:43.210 --> 07:46.090
And it's telling me that, because I'm on a paid plan, it will increase my costs.

07:46.090 --> 07:46.370
But

07:46.370 --> 07:51.610
I think this should be something that will be free for you as part of the free trial account.

07:51.810 --> 07:54.370
And it's been set up and here we go.

07:54.570 --> 07:55.930
It's called rag.

07:55.970 --> 07:56.890
It's getting set up.

07:56.890 --> 08:01.090
It will take a few minutes while it gets things together, including your API endpoints.

08:01.090 --> 08:03.450
This is terminology that you're so familiar with.

08:03.610 --> 08:08.610
And then we will be the proud owners of our first Supabase project.
