WEBVTT

00:00.080 --> 00:01.640
Look, in this journey,

00:01.640 --> 00:06.880
some days are more challenging and some days are more satisfying.

00:07.160 --> 00:07.840
Today.

00:08.480 --> 00:10.040
Today is both.

00:10.200 --> 00:11.600
Today is a bit of both.

00:11.600 --> 00:13.000
I've got something challenging.

00:13.000 --> 00:14.800
I've got something satisfying.

00:15.040 --> 00:18.680
Today we are integrating ElevenLabs with n8n.

00:18.840 --> 00:21.520
We're putting them both together and it's going to be fun.

00:21.560 --> 00:21.760
Okay.

00:21.800 --> 00:27.960
So obviously we're going to be using n8n and ElevenLabs together in order to have voice agents that have

00:27.960 --> 00:29.440
workflows in n8n.

00:29.760 --> 00:35.600
Now there are actually two different ways to do this, two completely different integration patterns.

00:35.760 --> 00:37.000
And we're going to do both.

00:37.280 --> 00:39.280
One of them is definitely better.

00:39.280 --> 00:43.240
And I will explain which and why, but we're still going to do both because it's good practice and it's

00:43.240 --> 00:45.760
going to help you really understand what's going on.

00:45.920 --> 00:52.040
So the first of them is where n8n is the boss; n8n is running the show.

00:52.040 --> 00:54.200
It is orchestrating everything.

00:54.200 --> 00:58.560
The end-to-end workflow is described in n8n with all the different steps.

00:58.560 --> 01:07.900
And one of those steps is calling out to the ElevenLabs API to turn text into speech: TTS, text-to-speech.

01:07.940 --> 01:09.740
That's when we want the thing to talk to us.

01:09.740 --> 01:12.700
And we'll use ElevenLabs for generating the audio.

01:12.980 --> 01:19.140
And a different step, probably earlier in the process, is STT, speech-to-text.

01:19.180 --> 01:24.060
This is when we take some audio and turn it into text, which would go into an LLM.

01:24.420 --> 01:29.300
And so these are steps in the workflow, which is where n8n would call out to ElevenLabs.

01:29.500 --> 01:32.980
Hopefully that makes complete sense to you that that's how it can come together.
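
NOTE
A minimal sketch of the n8n-as-boss pattern described above, written as plain Python
rather than as a real n8n workflow. The three step functions are hypothetical stand-ins
(not ElevenLabs or n8n APIs); the point is only that one orchestrator runs
speech-to-text, then the LLM, then text-to-speech, in sequence.
```python
# Hypothetical stand-ins: in a real n8n workflow each step would be a node,
# and the two speech steps would call out to ElevenLabs.
def speech_to_text(audio: bytes) -> str:
    # Stand-in for the speech-to-text step: pretend the audio bytes are the words.
    return audio.decode("utf-8")
#
def ask_llm(transcript: str) -> str:
    # Stand-in for the LLM step that turns the transcript into a reply.
    return f"You said: {transcript}"
#
def text_to_speech(text: str) -> bytes:
    # Stand-in for the text-to-speech step: pretend the reply text is audio.
    return text.encode("utf-8")
#
def run_workflow(audio_in: bytes) -> bytes:
    # The orchestrator owns the whole sequence, one step after another.
    transcript = speech_to_text(audio_in)
    reply = ask_llm(transcript)
    return text_to_speech(reply)
#
audio_out = run_workflow(b"hello")
print(audio_out)
```
Swapping vendors in this pattern would mean replacing only the two speech stand-ins.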

01:33.460 --> 01:34.780
So what's the other way?

01:34.820 --> 01:38.820
So the other way, as you probably guessed, is where ElevenLabs is running the show.

01:38.860 --> 01:39.860
It's the boss.

01:39.900 --> 01:44.420
The orchestration is driven by ElevenLabs first and foremost.

01:44.460 --> 01:49.740
You basically build a voice agent in ElevenLabs, much as we've already done.

01:49.980 --> 01:52.700
And you give that voice agent a tool.

01:52.820 --> 01:59.060
And we didn't actually use tools, but I showed them to you, and you use that tool to make a call

01:59.060 --> 02:02.780
into n8n to carry out some sub-workflow.

02:02.980 --> 02:08.360
And basically you have your business logic, your business workflow, be part of n8n.

02:08.400 --> 02:11.520
So in this way, the two of them are kind of collaborating.

02:11.520 --> 02:14.760
But ElevenLabs is the overall end-to-end process.

02:14.760 --> 02:16.600
It sort of owns the show.
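
NOTE
A rough sketch of the second pattern, where the voice agent is in charge and hands one
piece of work to n8n through a tool. Everything here is a hypothetical stand-in: the
tool name, the arguments, and the function that plays the part of the n8n sub-workflow.
In practice the tool would make an HTTP request to an n8n webhook instead of calling
a local function.
```python
def n8n_sub_workflow(arguments: dict) -> dict:
    # Stand-in for the business logic that would live in an n8n workflow.
    return {"status": "booked", "name": arguments["name"]}
#
# The voice agent is given named tools; each tool maps to one sub-workflow.
TOOLS = {"book_appointment": n8n_sub_workflow}
#
def handle_tool_call(tool_name: str, arguments: dict) -> dict:
    # The agent decides mid-conversation to use a tool; we route the call.
    if tool_name not in TOOLS:
        return {"status": "error", "reason": f"unknown tool: {tool_name}"}
    return TOOLS[tool_name](arguments)
#
result = handle_tool_call("book_appointment", {"name": "Ada"})
print(result)
```
The conversation itself never leaves the voice agent; only the tool call crosses over.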

02:16.760 --> 02:19.000
So what are the reasons for doing it each way?

02:19.040 --> 02:25.800
Well, the benefit of n8n orchestration, on the left, is obviously that it is simple.

02:26.000 --> 02:33.560
It's clear. You get all of your workflow in one place: you've got one canvas, and on it you see everything

02:33.560 --> 02:35.600
that's going on and that's great.

02:35.960 --> 02:38.360
It also gives you a more switchable API.

02:38.400 --> 02:44.800
You've got a link to ElevenLabs, and you could swap that out for any other API that offers

02:44.960 --> 02:47.680
text to speech and speech to text.

02:47.760 --> 02:52.600
You just have to replace those two API calls and bam, you're on to a different vendor.

02:52.600 --> 02:54.200
And that's super convenient.

02:54.200 --> 02:57.880
So those are the benefits of n8n orchestration.

02:57.920 --> 03:02.080
And with that, you're probably wondering what could possibly be the benefits of ElevenLabs orchestration

03:02.080 --> 03:04.040
because it's clearly more complex.

03:04.040 --> 03:04.760
It's not really.

03:04.800 --> 03:06.450
It's kind of both orchestrating.

03:06.490 --> 03:10.490
You're first and foremost starting in ElevenLabs, but then you've got some workflow and then it

03:10.490 --> 03:10.690
ends.

03:10.690 --> 03:13.050
So it just sounds more complicated all in all.

03:13.050 --> 03:14.890
So why would you pick this ever?

03:15.090 --> 03:16.450
Well, look, here's the thing.

03:16.730 --> 03:17.730
Latency.

03:17.770 --> 03:19.410
It's about latency.

03:19.730 --> 03:24.050
If you're using n8n to orchestrate, then you first have to collect the audio.

03:24.170 --> 03:28.530
And then you have to call out to get that audio turned into text.

03:28.690 --> 03:32.970
If you're using ElevenLabs, it is a voice agent platform.

03:32.970 --> 03:36.970
And that means it's all built to be real time from the get go.

03:37.170 --> 03:40.890
As you're speaking, it's automatically converting that into text.

03:40.890 --> 03:43.210
It's not a linear process.

03:43.210 --> 03:47.170
And it's designed to be great at this voice agent interaction.

03:47.170 --> 03:48.330
That's what it does.

03:48.490 --> 03:53.130
And from that perspective, it's great to be using it as much as we can.

03:53.690 --> 03:58.370
And in many ways, using them together like this gives us the ability to have the best of both.

03:58.370 --> 04:04.450
We can use all of the features and functions that you get in ElevenLabs when it comes to voices, and then

04:04.450 --> 04:07.070
also take advantage of n8n for the orchestration.

04:07.070 --> 04:11.790
So yes, it's more complex, but it does mean you can use both platforms to the max.

04:11.790 --> 04:17.670
And as part of that, it's worth mentioning that ElevenLabs has some advanced functions. It

04:17.670 --> 04:25.190
has features to, for example, use a telephone: have it connect to a phone and be able

04:25.190 --> 04:27.470
to speak on the phone to one of your voice agents.

04:27.470 --> 04:29.230
And this way you get that.

04:29.230 --> 04:34.270
Otherwise you're using n8n to collect the audio, and you don't get the advantage of all of

04:34.270 --> 04:36.750
these pro features in ElevenLabs.

04:36.750 --> 04:40.350
So there's some pretty big benefits on the right hand side as well.

04:40.350 --> 04:43.430
So which do you think I'm going to recommend as being the right way?

04:43.910 --> 04:46.510
It's definitely the right hand side.

04:46.710 --> 04:49.110
You know, I like to say simple is always better.

04:49.110 --> 04:50.190
That's one of my mantras.

04:50.190 --> 04:54.590
I'm always trying to push you to make things simpler, not more complex.

04:54.590 --> 05:00.630
But this is a clear case where you get a lot for the complexity and it's worth it.

05:00.790 --> 05:06.590
Latency is everything when it comes to voice agents. Having to wait is agonizing if you're talking

05:06.630 --> 05:07.910
to an agent.

05:07.910 --> 05:12.130
And so, yes, you need the voice agent from ElevenLabs.

05:12.130 --> 05:14.490
You need to use the right hand pattern.

05:14.530 --> 05:17.410
We're going to do both so we can experiment with both.

05:17.450 --> 05:21.690
Often people start with the left-hand pattern because it's the more obvious.

05:21.690 --> 05:27.210
It's the sort of go-to, but very quickly we'll discover the difference between them and why it is

05:27.210 --> 05:29.250
that the right one is much more compelling.

05:29.410 --> 05:30.930
But as I say, we'll try both.

05:30.970 --> 05:32.170
Okay, now, don't hate me.

05:32.170 --> 05:37.370
But before we go and build all of this and we've got some juicy building to do today, but before we

05:37.410 --> 05:44.730
do, we need to do a little bit of a refresher on a couple of things. API stuff, but it's good stuff.

05:44.730 --> 05:45.650
It's good to get through this.

05:45.650 --> 05:50.330
So first of all, just to quickly remind you about this thing, the webhook, that we encountered a couple of

05:50.330 --> 05:56.410
days ago. There's this idea that you could have a node and it can be calling out to an endpoint, making

05:56.450 --> 05:59.770
like a web call to some URL, some web address.

05:59.890 --> 06:04.170
And that's one way you can think about making an API request to get information.

06:04.530 --> 06:06.690
And this is like request-response.

06:06.690 --> 06:09.330
You're hitting an HTTP route as it's called.

06:09.370 --> 06:11.030
Like a web address.

06:11.510 --> 06:16.950
And then there's this other idea: that someone else could call you.

06:16.990 --> 06:22.750
You could make a web endpoint available and say, hey, if someone calls this web endpoint,

06:22.950 --> 06:27.790
which is like saying, if someone asks me if I can give them a web page at this address, I'm going

06:27.830 --> 06:30.870
to treat that like that's them telling me an event.

06:30.870 --> 06:36.430
They're telling me something's happened. So I can say, hey, if you hit me on this web link,

06:36.470 --> 06:38.670
I'm going to know that you want to give me information.

06:38.670 --> 06:39.990
You want to tell me something?

06:40.030 --> 06:41.150
You want to tell me

06:41.150 --> 06:42.270
that a message has arrived:

06:42.270 --> 06:45.030
an email has arrived, a Slack message has arrived.

06:45.030 --> 06:48.790
And when you do it that way, it's described as having a webhook.

06:48.990 --> 06:54.150
And the webhook is being called by the external system, like Slack in this case.

06:54.430 --> 06:57.990
So that was the idea behind a webhook that we met before.
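
NOTE
To make the webhook idea concrete, here is a small sketch using only the Python
standard library. It is an illustration, not how n8n or Slack implement webhooks:
the handler, the helper names, the port, and the event shape (a JSON object with a
"type" field) are all assumptions made for this example.
```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
#
received_events = []  # events delivered by external callers, kept in memory
#
def handle_event(raw_body: bytes) -> dict:
    # Treat the incoming request body as a JSON event and record it.
    event = json.loads(raw_body or b"{}")
    received_events.append(event)
    return {"ok": True, "received": event.get("type")}
#
class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The external system calls us; we read what it sent and acknowledge.
        length = int(self.headers.get("Content-Length", 0))
        reply = json.dumps(handle_event(self.rfile.read(length))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)
#
def make_server(port: int = 8000) -> HTTPServer:
    # Hypothetical helper: binds the webhook to localhost on the given port.
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```
Nothing listens until you call make_server().serve_forever(); after that, any system
that posts JSON to the address is, in effect, telling you that an event has happened.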

06:58.030 --> 06:58.350
Okay.

06:58.390 --> 07:01.070
And now, one last thing.

07:01.070 --> 07:02.550
After this we're going to go and build stuff.

07:02.550 --> 07:09.390
But I want to give you some terminology, some technical stuff. And guess what, it's the usual

07:09.390 --> 07:12.850
story: for some of you, this is like, yeah, yeah, yeah, come on, I know all this. But for some of

07:12.850 --> 07:14.970
you, this is going to be hairy.

07:15.130 --> 07:18.730
And I want to just say you don't need to remember anything I'm about to say.

07:18.730 --> 07:21.850
If it doesn't mean anything to you, just listen.

07:21.890 --> 07:23.210
Like, get the gist.

07:23.210 --> 07:27.330
I'm going to repeat this a few times, you know, so you can let it sink in over time.

07:27.450 --> 07:32.090
But just to get you more comfortable with the terminology, let's dig in.

07:32.090 --> 07:35.210
So I've used the expression calling an API a fair bit.

07:35.210 --> 07:40.650
And I'm talking about calling a web API, the most standard kind of API.

07:40.770 --> 07:45.890
And yeah, the way it's achieved is you're making a web request. You're using this protocol,

07:45.890 --> 07:50.770
this standard called HTTP, which is basically the way that web pages are exchanged.

07:50.770 --> 07:56.330
So you're using this HTTP standard to make a request over the internet to a web address.

07:56.490 --> 07:58.970
And that's what we mean by calling an API.

07:59.210 --> 08:05.770
And the URL that you use, the web address that you're going to request or hit is called an endpoint.

08:05.770 --> 08:07.370
As you know, you've heard this.

08:07.370 --> 08:12.410
And so the language that you might hear, if you're saying it properly, is: I'm calling an API

08:12.530 --> 08:19.820
by making an HTTP request to an endpoint. Calling an API by making an HTTP request to an endpoint.

08:19.820 --> 08:21.540
And that should just connect with you.

08:21.580 --> 08:24.020
You should be like, okay, I hear that language.

08:24.020 --> 08:25.140
I know what it means.

08:25.500 --> 08:31.540
If you're on the receiving end of this, if you want someone to call a URL to tell you something, to

08:31.620 --> 08:35.900
give you some information, like an event has happened, I need to trigger something.

08:36.100 --> 08:39.620
Then that endpoint you would describe as a webhook. That's what it means.

08:39.620 --> 08:45.500
So you might say, hey, I've set up a webhook; you can make an HTTP request to that webhook to

08:45.540 --> 08:51.860
tell me about X, like you've received an email or something like that. Or, the other way, I could

08:51.860 --> 08:57.260
say, hey, can you please tell me your webhook and I will call your webhook to inform you that an

08:57.260 --> 08:59.380
event has happened that you need to know about.

08:59.380 --> 09:04.740
So that's the kind of language you might use, the terminology to talk about webhooks.

09:04.740 --> 09:11.660
And then just to give you one more detail, there are in fact different types of these HTTP calls,

09:11.780 --> 09:13.020
different flavors of them.

09:13.020 --> 09:14.760
It's actually known as the method.

09:14.760 --> 09:22.040
Different methods of an HTTP call, sometimes called the verb. The most common are GET and POST.

09:22.200 --> 09:27.280
So you sometimes hear people say an HTTP GET and an HTTP POST.

09:27.320 --> 09:33.040
Two different types of HTTP request. And usually a GET,

09:33.040 --> 09:36.800
an HTTP GET, is used when you're trying to retrieve information.

09:36.800 --> 09:42.000
You want to find something, you want to get information about it: you're doing an HTTP GET, and the

09:42.000 --> 09:48.160
information will typically come back in the form of JSON as the response to your HTTP GET.

09:48.640 --> 09:55.960
And an HTTP POST is usually used to send information to the third party, and you send that information

09:55.960 --> 09:57.840
in the form of JSON.

09:58.000 --> 10:01.680
So that is an HTTP GET and an HTTP POST.
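
NOTE
The GET and POST distinction above can be sketched with Python's standard urllib.
The base URL, paths, and payload are hypothetical, chosen only for illustration, and
the requests are built but never sent, so this runs without any network access.
```python
import json
import urllib.request
from urllib.parse import urlencode
#
BASE_URL = "https://api.example.com"  # hypothetical endpoint host
#
# GET: retrieve information. Parameters travel in the URL; there is no body.
query = urlencode({"language": "en"})
get_request = urllib.request.Request(f"{BASE_URL}/voices?{query}", method="GET")
#
# POST: send information. The JSON payload travels in the request body.
payload = {"text": "Hello there"}
post_request = urllib.request.Request(
    f"{BASE_URL}/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
#
def describe(request: urllib.request.Request) -> str:
    # Summarize a request as "METHOD endpoint", the way you would say it aloud.
    return f"{request.get_method()} {request.full_url}"
#
print(describe(get_request))
print(describe(post_request))
```
Passing either object to urllib.request.urlopen would actually make the HTTP request
to the endpoint.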

10:01.680 --> 10:06.120
And I say this because we will be selecting GET and POST in a couple of places.

10:06.120 --> 10:08.840
So maybe this is your first time hearing about it.

10:08.840 --> 10:10.720
Or maybe you know this stuff back to front.

10:10.760 --> 10:13.560
Anyway, that is the terminology.

10:13.600 --> 10:14.800
We're done with it.

10:14.840 --> 10:16.880
It's time to do some building.
