WEBVTT

00:00.160 --> 00:00.920
Okay.

00:01.360 --> 00:06.360
And now we go back to our workflow and we'll give this another try.

00:06.400 --> 00:11.640
We'll refresh this and we'll say hi there and we'll see what happens.

00:12.200 --> 00:12.800
Off it goes.

00:12.800 --> 00:13.680
It's gone to Gemini.

00:13.680 --> 00:14.680
It's converted to speech.

00:14.720 --> 00:16.680
The whole workflow ran successfully.

00:16.680 --> 00:18.280
It didn't say anything to me.

00:18.520 --> 00:21.040
Maybe you're expecting that, but it couldn't do that.

00:21.040 --> 00:26.000
All we did was we called an API that generated a file which is an audio file.

00:26.000 --> 00:26.640
Here it is.

00:26.640 --> 00:28.880
We can now download it and listen to it.

00:28.880 --> 00:32.640
I'm going to press the download button and it's just downloaded something.

00:32.680 --> 00:35.400
And then we'll go and listen. Okay.

00:35.440 --> 00:39.400
So I've just launched that file, and I've got this little player here.

00:39.400 --> 00:40.760
Let's see what we got.

00:40.960 --> 00:41.440
Hello.

00:41.480 --> 00:42.560
How can I help you today?

00:43.840 --> 00:45.080
Wow, okay.

00:45.320 --> 00:46.920
She's very chirpy today.

00:47.160 --> 00:47.600
Hello.

00:47.640 --> 00:48.680
How can I help you today?

00:49.360 --> 00:54.480
That, I guess, is Miss Walker speaking with much enthusiasm to us.

00:54.480 --> 00:55.840
And that is what we got.

00:55.880 --> 00:57.320
We got that through this API.

00:57.360 --> 01:00.600
Let's just try changing this to another person.

01:00.600 --> 01:01.720
Let's try Jason.

01:01.760 --> 01:05.120
I think we need someone to counteract that energy level.

01:05.160 --> 01:07.520
Someone calm, meditative and soothing.

01:07.560 --> 01:11.360
Again, this is going to be making an API request to ElevenLabs.

01:11.480 --> 01:12.440
Back we go.

01:12.600 --> 01:16.560
We will reset the chat session and we will say hi there.

01:17.760 --> 01:19.000
Okay, here we go.

01:19.040 --> 01:21.800
It goes across, it goes to the agent and it's converted.

01:21.800 --> 01:22.920
That was all super quick.

01:23.160 --> 01:24.840
We got back a bunch of stuff.

01:24.960 --> 01:26.880
Now we come back into this node.

01:27.400 --> 01:28.320
Here it is.

01:28.360 --> 01:36.240
We download this file too, and then I'm going to try right-clicking on this.

01:36.920 --> 01:38.360
I'm going to show it in Finder.

01:38.360 --> 01:42.280
And then I'll bring up the player as before and we'll listen to this. Okay.

01:42.280 --> 01:43.200
Here we go.

01:43.240 --> 01:44.160
Let's listen.

01:44.560 --> 01:44.920
Hello.

01:44.960 --> 01:46.160
How can I help you today?

01:46.920 --> 01:48.880
Definitely a more soothing Jason.

01:48.880 --> 01:49.560
Thank you.

01:49.840 --> 01:54.520
And so that gives us a good sense of how it works to generate audio.

01:54.560 --> 01:56.200
Doesn't feel very satisfying yet.

01:56.200 --> 01:57.560
We've got a little bit more to go.

01:57.600 --> 02:01.880
Okay, so next up we're going to go back to our workflow.

02:01.880 --> 02:03.520
And we're going to do some new things.

02:03.520 --> 02:06.200
We're first of all going to get rid of this guy.

02:06.320 --> 02:08.600
No more coming in that way.

02:08.640 --> 02:12.200
Instead, we're going to come in a completely different way.

02:12.360 --> 02:15.480
We are going to add a webhook.

02:15.840 --> 02:16.880
So I'm going to click

02:16.920 --> 02:24.000
plus, and I'm going to say On webhook call. This is saying I want to be triggered by a webhook.

02:24.200 --> 02:29.440
Now we have already experienced this a few times because we've used things like Slack and Telegram as

02:29.440 --> 02:29.720
well.

02:29.760 --> 02:37.360
Both of them allowed us to set up a webhook using n8n nodes specifically designed for Slack and for

02:37.360 --> 02:38.080
Telegram.

02:38.080 --> 02:41.520
In this case, we're using the generic one that I mentioned.

02:41.560 --> 02:48.520
This is the generic On webhook call node that allows us to configure any webhook. Anyone that hits

02:48.560 --> 02:55.480
this endpoint right here for test or this endpoint for production will trigger this particular webhook.

02:55.480 --> 03:01.160
So it's going to sit there listening on this, and anyone that hits it will trigger this once we press

03:01.160 --> 03:02.440
the listen for event button.

03:02.720 --> 03:04.520
And that's how it works.

03:04.520 --> 03:05.280
It's that simple.

03:05.320 --> 03:08.320
There's no more configuration than that, which I just love.

03:08.320 --> 03:09.200
It's so easy.

03:09.520 --> 03:14.630
So when something comes in, it's going to hit this webhook.

03:14.670 --> 03:15.390
All right.

03:15.430 --> 03:16.590
So what are we going to do with this?

03:16.630 --> 03:24.070
Well, if I go back to the canvas again, let's see where it put that

03:24.070 --> 03:25.030
webhook over here.

03:25.030 --> 03:27.510
I don't know why it's over there, but we're going to move it right here.

03:27.510 --> 03:29.390
So in comes that webhook.

03:29.790 --> 03:30.430
All right.

03:30.430 --> 03:35.630
And you can see, by the way, that it says GET. We can just look at it again.

03:35.670 --> 03:40.550
There's this HTTP method that can be GET, or there are other options as well.

03:40.550 --> 03:44.830
But as I mentioned, GET and POST are the most common.

03:44.830 --> 03:50.190
I'm actually going to change it from GET to POST because we're going to want this to be pushing information

03:50.190 --> 03:50.830
to us.
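
NOTE
Editorial sketch, not part of the recording: one way a caller might build a POST request that pushes audio to the webhook.
The URL below is a hypothetical placeholder, and sending the audio as a raw request body with an audio/webm content type is an assumption; the real page may use a different encoding.
```python
import urllib.request
# Hypothetical placeholder; the real test and production URLs are shown in the webhook node.
WEBHOOK_URL = "https://example.invalid/webhook-test/voice"
def build_audio_post(url: str, audio: bytes, content_type: str = "audio/webm") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={"Content-Type": content_type},
    )
request = build_audio_post(WEBHOOK_URL, b"fake-audio-bytes")
print(request.get_method(), request.full_url)
# Actually sending it would be: urllib.request.urlopen(request).read()
```
The request is only constructed here, so the sketch runs without a network connection.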

03:51.150 --> 03:51.710
Okay.

03:51.990 --> 03:54.270
So that is set up.

03:54.430 --> 04:00.390
And now, for whatever gets posted in, we're going to set it up so that

04:00.390 --> 04:02.350
we can post some audio.

04:02.550 --> 04:05.990
I'm going to want to send that to ElevenLabs again.

04:06.230 --> 04:08.750
So here we're going to add in ElevenLabs.

04:09.150 --> 04:16.670
This time it's something that's going to take audio and convert it into text; in other words, it's going to transcribe.

04:16.990 --> 04:20.110
So we come up here and we're going to say we want text.

04:20.150 --> 04:25.670
We want speech to text, to take audio and turn it into text.

04:26.190 --> 04:27.750
So this is what we're going to do.

04:27.990 --> 04:32.470
And this is going to connect with our ElevenLabs account.

04:32.470 --> 04:34.430
It defaults to that which is ideal.

04:34.750 --> 04:37.670
And now I'll go back to the canvas and we have this.

04:37.990 --> 04:43.390
And once this has gone to text I'm simply going to want to plug that into the agent.

04:43.590 --> 04:44.150
Wow.

04:44.510 --> 04:45.830
Could it be as simple as that?

04:45.910 --> 04:50.790
So basically, with the webhook, we're going to have a URL that we'll be able to post audio

04:50.790 --> 04:51.270
to.

04:51.310 --> 04:54.590
When we do, it's going to call ElevenLabs to turn that into text.

04:54.590 --> 04:57.110
And it's going to send the text to the agent.

04:57.510 --> 05:02.390
Now, you're spotting that I'm doing something bad here.

05:02.430 --> 05:03.830
There's something I'm missing.

05:04.270 --> 05:06.590
And we will try this out.

05:06.630 --> 05:08.070
We'll see that fail.

05:08.070 --> 05:10.750
And then we will fix it so that we can get it just right.

05:10.750 --> 05:16.190
So the main thing we've done wrong, as you probably guessed, is that we haven't configured

05:16.190 --> 05:22.070
these different steps to be correctly looking at their inputs and figuring out how they should interpret

05:22.070 --> 05:24.030
their inputs to do the job they've got to do.

05:24.150 --> 05:30.270
So in this case, with this node, we haven't told it how to read into the input data and pluck out

05:30.270 --> 05:31.190
what it needs.

05:31.190 --> 05:33.950
And similarly for the AI agent, remember this thing.

05:33.990 --> 05:36.550
It's expecting to be connected to a chat trigger.

05:36.550 --> 05:37.590
And it's not.

05:37.710 --> 05:39.510
And so it's going to get confused.

05:39.510 --> 05:41.270
It's not going to find chat input.
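
NOTE
Editorial sketch, not part of the recording: the kind of field mapping the agent step is missing, written as plain Python.
The field names "text" and "chatInput" are assumptions for illustration; the real names should be read from the data that actually arrives at each node.
In n8n itself this mapping would be set in the node's configuration rather than in Python.
```python
# Assumed item shape, for illustration only.
transcription_item = {"text": "Hi there"}
def to_agent_input(item: dict, source_field: str = "text", target_field: str = "chatInput") -> dict:
    """Copy the transcribed text into the field the agent is assumed to read."""
    if source_field not in item:
        raise KeyError(f"expected field {source_field!r} in the incoming item")
    return {target_field: item[source_field]}
agent_item = to_agent_input(transcription_item)
print(agent_item)
```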

05:41.510 --> 05:47.630
The easiest way to fix this is to actually try it, see the data that comes in and then correct it.

05:47.670 --> 05:48.950
And that's what we're going to do.

05:49.030 --> 05:53.190
But first there's one more step to this flow that we have to add in.

05:53.310 --> 05:59.030
And I'm just going to show you that now. The last step we have to add in is all about what happens when

05:59.030 --> 06:05.230
this webhook is called, what gets returned to the web page that called this webhook, or to the caller,

06:05.230 --> 06:07.750
whoever the caller is. It is going to be a web page, but it needn't be.

06:07.790 --> 06:08.750
It could be something else.

06:08.750 --> 06:13.830
Right now, the way we've got this set up is to respond immediately, which means when you call this

06:13.830 --> 06:16.950
webhook, it's going to respond with, like, OK.

06:17.270 --> 06:19.790
And it's meanwhile going to be kicking off this flow.

06:19.830 --> 06:23.470
Now, that's kind of useless because what we want to come back to the web page is going to be the audio.

06:23.510 --> 06:25.750
We're going to want it to say something back.

06:25.990 --> 06:28.630
So we don't want to respond immediately.

06:28.750 --> 06:35.190
There are a few different options here, but we're going to choose to respond using the Respond to Webhook

06:35.190 --> 06:36.150
node.

06:36.630 --> 06:37.190
Okay.

06:37.390 --> 06:42.430
And what that means is that it's going to want us to add another node to the end of this.

06:42.590 --> 06:44.070
Let me just hide all of this for a second.

06:44.070 --> 06:45.070
Give ourselves some more room.

06:45.070 --> 06:47.510
We're going to want to add in a node right here.

06:47.510 --> 06:51.870
And this is going to be of type Respond to Webhook.

06:52.030 --> 06:52.910
Here it is.

06:53.150 --> 06:58.110
So this is something that can respond with whatever came back here.

06:58.390 --> 07:01.950
And we can respond with first incoming item.

07:01.950 --> 07:08.670
What we actually want to respond with is the field called data.

07:08.670 --> 07:13.870
You see, the way this comes in with data, we're going to want to respond with a binary file, which

07:13.870 --> 07:16.510
is the data that comes back here.

07:16.710 --> 07:18.510
So that's what we want it to do.

07:19.070 --> 07:19.710
Okay.

07:20.070 --> 07:22.260
Potentially we might need to add some stuff here.

07:22.300 --> 07:24.060
We'll see if this hangs together.

07:24.060 --> 07:27.660
We're going to fix this up once we've got the flow all in place.

07:27.940 --> 07:32.420
So to recap, if we come back here, let's do the tidy button to tidy this all up.

07:32.420 --> 07:33.500
But very nice.

07:33.500 --> 07:34.700
Here we see what's happening.

07:34.700 --> 07:38.020
We've set up a webhook; that little lightning bolt shows

07:38.020 --> 07:44.580
it's a trigger. When someone hits the URL that we can collect from here,

07:44.580 --> 07:46.260
it will then collect some audio,

07:46.420 --> 07:50.500
turn that into text by calling ElevenLabs,

07:50.540 --> 07:52.180
and put that into our agent.

07:52.180 --> 07:58.580
It will then convert the output back into speech which produces an audio file, and it will use that

07:58.580 --> 08:05.300
as the thing that gets responded back to this webhook, to this URL that someone else has hit.
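
NOTE
Editorial sketch, not part of the recording: the recap above expressed as one function, assuming each step can be treated as a plain callable.
The three stand-in functions are hypothetical and make no network calls; in the real workflow those steps are the speech-to-text node, the agent, and the text-to-speech node.
```python
from typing import Callable
def handle_voice_request(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    ask_agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run the three steps in order and return the audio that answers the caller."""
    text = transcribe(audio_in)
    reply = ask_agent(text)
    return synthesize(reply)
# Stand-ins so the sketch runs offline: bytes in, text through the agent, bytes out.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")
def fake_agent(text: str) -> str:
    return f"You said: {text}"
def fake_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")
response_audio = handle_voice_request(b"hi there", fake_transcribe, fake_agent, fake_synthesize)
print(response_audio)
```
The returned bytes play the role of the binary file that is sent back to whoever called the webhook.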

08:05.740 --> 08:06.660
You got that?

08:06.860 --> 08:10.540
Click around, make sure you understand what each of these different nodes does.

08:10.820 --> 08:14.300
And then we're actually going to test this thing out.

08:14.300 --> 08:20.020
So on my desktop I've just got a file that is called voice.html.

08:20.180 --> 08:23.460
And I will put that in the course resources so you can get it too.

08:23.620 --> 08:26.100
And it's a very simple web page.

08:26.260 --> 08:27.500
And here it is.

08:27.500 --> 08:34.300
It is one where you type in a URL and then you press Start recording to record in the web page,

08:34.300 --> 08:35.860
in this case a Chrome browser.

08:36.020 --> 08:36.980
And then you press stop.

08:36.980 --> 08:42.700
And when you press Send to n8n, it will take this URL and it will post that audio.

08:42.900 --> 08:46.940
And then with what comes back you'll be able to play that.

08:47.140 --> 08:51.580
And it's simple HTML, it's just a raw HTML file.

08:51.580 --> 08:56.820
And I could tell you that I painstakingly wrote this HTML file, but it would be a complete lie.

08:57.180 --> 09:02.500
I usually use Claude Code, but in this case I simply asked ChatGPT, can you make this

09:02.500 --> 09:03.380
HTML page?

09:03.460 --> 09:04.420
And it just did.

09:04.460 --> 09:08.740
It knew that, and it even put in nice stuff like a tip: use the test URL while you're building and then switch

09:08.740 --> 09:09.700
to the production URL.

09:09.740 --> 09:11.780
I didn't ask it to do that, but it's quite right.

09:11.780 --> 09:13.140
So good for it.

09:13.140 --> 09:15.940
And you could also write this from scratch if you wanted.

09:15.940 --> 09:16.940
Or just use mine.

09:16.940 --> 09:17.980
I'll make that available.

09:18.140 --> 09:19.060
Okay.

09:19.100 --> 09:24.620
And so with that we're now going to go and find this URL, put it in here and give it a whirl.

09:24.620 --> 09:27.740
And it's not going to work to start with because we're going to need to fix up the workflow.

09:27.740 --> 09:28.900
But let's at least try that.
