WEBVTT

00:00.210 --> 00:01.650
Eden Marco: Hey there, Eden here.

00:01.650 --> 00:02.700
And in this video,

00:02.700 --> 00:05.550
I want to address a commonly asked question

00:05.550 --> 00:08.460
about using LLMs in production.

00:08.460 --> 00:12.420
So, this is the debate over whether to use an open source LLM,

00:12.420 --> 00:15.870
like DeepSeek or Llama 3.2,

00:15.870 --> 00:18.870
or to use managed large language models

00:18.870 --> 00:22.860
like GPT-4o mini by OpenAI,

00:22.860 --> 00:25.230
or using Sonnet by Anthropic,

00:25.230 --> 00:27.090
or using Google Gemini.

00:27.090 --> 00:29.550
And my point of view is going to be the point of view

00:29.550 --> 00:31.800
of enterprise organizations

00:31.800 --> 00:33.900
and what I would suggest they do.

00:33.900 --> 00:36.990
Of course, there is no one size fits all,

00:36.990 --> 00:38.430
and I think every use case

00:38.430 --> 00:40.590
should be considered independently.

00:40.590 --> 00:43.860
And I'm going to give you my general opinion

00:43.860 --> 00:46.440
and I will try to touch on as many aspects

00:46.440 --> 00:48.813
as I can when it comes to this debate.

00:50.580 --> 00:53.430
Just a very important disclaimer,

00:53.430 --> 00:55.500
and this disclaimer is super important.

00:55.500 --> 00:56.610
I'm not a lawyer.

00:56.610 --> 00:58.920
This is not legal advice,

00:58.920 --> 01:01.620
and you should consult with your legal team

01:01.620 --> 01:06.090
and privacy team before integrating any LLM-based solution

01:06.090 --> 01:07.890
in your enterprise.

01:07.890 --> 01:08.940
There are a lot of rules

01:08.940 --> 01:12.270
and a lot of regulations that I'm not aware of,

01:12.270 --> 01:14.640
and the topic of data retention

01:14.640 --> 01:17.910
and privacy is very, very sensitive

01:17.910 --> 01:20.310
and should be handled appropriately.

01:20.310 --> 01:23.880
I'm also not representing any LLM vendor here

01:23.880 --> 01:25.800
and I'm not giving legal advice

01:25.800 --> 01:28.980
and every LLM vendor is going to have a EULA,

01:28.980 --> 01:31.470
an end user license agreement

01:31.470 --> 01:33.690
where they have their terms of service

01:33.690 --> 01:36.930
and they specify how they handle your data.

01:36.930 --> 01:41.160
And it's a legal document that you should look into, okay?

01:41.160 --> 01:43.170
I'm just giving you my 2 cents here.

01:43.170 --> 01:46.680
And again, I'm not a lawyer and this is not legal advice.

01:46.680 --> 01:49.260
So, this is a very important disclaimer.

01:49.260 --> 01:52.950
You should always talk to your legal team and privacy team

01:52.950 --> 01:55.950
and act according to their guidance.

01:55.950 --> 01:57.810
This video is for educational purposes,

01:57.810 --> 02:01.170
and I'm going to give you my 2 cents on this topic.

02:01.170 --> 02:03.600
So, you should take everything I say in this video

02:03.600 --> 02:05.010
with a grain of salt

02:05.010 --> 02:07.140
and you should really do your own research

02:07.140 --> 02:08.673
when it comes to this topic.

02:12.990 --> 02:16.410
All right, so let me start by saying that right now,

02:16.410 --> 02:19.020
at least in 2025,

02:19.020 --> 02:21.720
open source models have come a long way

02:21.720 --> 02:24.150
and they are becoming quite good.

02:24.150 --> 02:26.040
And we have models like DeepSeek,

02:26.040 --> 02:29.250
which achieves amazing results on benchmarks

02:29.250 --> 02:32.520
and can actually outperform managed models.

02:32.520 --> 02:35.160
And in the future, it is inevitable

02:35.160 --> 02:38.010
that we're going to have more and more open source models

02:38.010 --> 02:41.730
which outperform those managed large language models.

02:41.730 --> 02:42.810
Okay, so let's start

02:42.810 --> 02:45.390
with using open source models in production.

02:45.390 --> 02:49.530
So, first of all, supposedly they're going to be cost-effective

02:49.530 --> 02:51.510
because they're free,

02:51.510 --> 02:54.210
and they're sometimes cheaper to operate

02:54.210 --> 02:56.130
because they're smaller

02:56.130 --> 02:57.780
than proprietary models.

02:57.780 --> 03:00.240
So, I'd put a question mark on this one

03:00.240 --> 03:03.420
because I don't believe that at large scale

03:03.420 --> 03:06.240
they're actually going to be much cheaper.

03:06.240 --> 03:07.073
All right.

03:07.073 --> 03:08.700
And let's talk about customization,

03:08.700 --> 03:12.090
because organizations can fine-tune those models

03:12.090 --> 03:14.490
for specific tasks or domains,

03:14.490 --> 03:16.770
so that they can potentially outperform

03:16.770 --> 03:20.970
those general-purpose proprietary large language models.

03:20.970 --> 03:23.490
And I think one of the major advantages

03:23.490 --> 03:26.820
of using open source models in production for organizations

03:26.820 --> 03:29.340
is the control and privacy.

03:29.340 --> 03:33.060
So, companies can host the models themselves,

03:33.060 --> 03:35.160
on their internal servers,

03:35.160 --> 03:37.950
so the data doesn't leave their servers

03:37.950 --> 03:41.220
and they can keep it safe and private.

03:41.220 --> 03:43.410
And I think this is especially true

03:43.410 --> 03:46.500
for companies which are highly regulated,

03:46.500 --> 03:48.810
like banks, maybe hospitals

03:48.810 --> 03:51.870
or organizations dealing with health data

03:51.870 --> 03:55.770
which are subject to compliance requirements and heavy regulation.

03:55.770 --> 03:59.430
All right, so let me now tackle those advantages.

03:59.430 --> 04:04.430
So, the cost-effectiveness argument doesn't hold up, in my opinion,

04:04.530 --> 04:07.740
because even though the model is free,

04:07.740 --> 04:10.950
since it's open source and we can actually run it

04:10.950 --> 04:12.720
locally on our machine,

04:12.720 --> 04:16.110
deploying it so that it can actually be served

04:16.110 --> 04:21.110
to customers at large scale is a very, very hard task.

04:21.360 --> 04:24.990
We need to handle availability, durability,

04:24.990 --> 04:27.360
scalability, all the agility.

04:27.360 --> 04:28.890
We need to handle security

04:28.890 --> 04:31.470
and we need to handle a lot of things.

04:31.470 --> 04:33.420
And now our task becomes,

04:33.420 --> 04:36.420
rather than developing an LLM application,

04:36.420 --> 04:41.130
making an LLM able to serve our customers

04:41.130 --> 04:42.690
and handle our scale.

04:42.690 --> 04:46.230
So, I think this really derails the goal

04:46.230 --> 04:48.030
of using open source models

04:48.030 --> 04:51.540
because it shifts all the responsibility

04:51.540 --> 04:55.500
and all the work onto us, and now we have to handle a lot of operations.

04:55.500 --> 04:59.160
And one might argue that we can use managed services

04:59.160 --> 05:01.350
that host those open source models,

05:01.350 --> 05:03.570
for example, services like Groq.

05:03.570 --> 05:06.990
So, they are correct, but once we do that,

05:06.990 --> 05:09.330
we actually lose a lot of the benefits

05:09.330 --> 05:11.220
of using open source models

05:11.220 --> 05:13.650
because we wanted them to run on our own servers

05:13.650 --> 05:15.060
so they can be private

05:15.060 --> 05:19.230
and we'll have control over what's going to be generated.

05:19.230 --> 05:23.970
And once we use services like Groq, we lose that.

05:23.970 --> 05:25.770
And let's talk about pricing.

05:25.770 --> 05:29.430
If we deploy it ourselves, it's going to cost a lot.

05:29.430 --> 05:31.800
Even if it doesn't cost a lot in compute

05:31.800 --> 05:34.080
and GPUs to serve those models,

05:34.080 --> 05:36.300
it's going to cost a lot because we need

05:36.300 --> 05:38.700
to pay engineers to deploy them

05:38.700 --> 05:42.210
and an operations team to handle and monitor them.

05:42.210 --> 05:45.810
So, a lot of cost is involved in deploying those LLMs,

05:45.810 --> 05:47.100
those open source LLMs.

05:47.100 --> 05:50.010
And even if we go to managed services like Groq

05:50.010 --> 05:51.180
and use them,

05:51.180 --> 05:54.360
the pricing is not that compelling

05:54.360 --> 05:58.860
and it's not much cheaper than those managed LLMs,

05:58.860 --> 06:02.460
the proprietary LLMs from vendors like OpenAI,

06:02.460 --> 06:05.820
Anthropic, Google, and many more.

06:05.820 --> 06:08.760
And the trend for those first-party models

06:08.760 --> 06:11.700
is that they're getting better,

06:11.700 --> 06:15.840
faster, and cheaper over time.

06:15.840 --> 06:18.600
All right, so let me talk about the benefits

06:18.600 --> 06:21.060
of using managed LLMs,

06:21.060 --> 06:25.440
those proprietary models that are offered as a service.

06:25.440 --> 06:27.840
And let's talk about the advantages.

06:27.840 --> 06:29.700
So, first of all, the ease of use.

06:29.700 --> 06:31.680
They're super easy to use

06:31.680 --> 06:33.450
and they're simple to integrate

06:33.450 --> 06:35.670
and we don't need to handle deployment.

06:35.670 --> 06:38.070
So, it reduces the time to market

06:38.070 --> 06:41.070
because we can simply plug and play.

06:41.070 --> 06:43.980
They're very reliable and offer support

06:43.980 --> 06:47.280
because the vendors provide professional support and updates

06:47.280 --> 06:50.040
and optimizations over time.

06:50.040 --> 06:51.660
And regarding compliance.

06:51.660 --> 06:55.980
So, most major LLM vendors are actually compliant with SOC 2

06:55.980 --> 06:57.720
and HIPAA,

06:57.720 --> 07:01.530
and every vendor can tell you whether they're compliant or not

07:01.530 --> 07:04.740
and with which compliance standards.

07:04.740 --> 07:08.370
All right, and let's talk now also about performance.

07:08.370 --> 07:10.260
So, those

07:10.260 --> 07:13.440
managed LLMs from the big vendors

07:13.440 --> 07:18.440
are usually quite good and they deliver high-quality results.

07:18.510 --> 07:20.790
And let's talk about the elephant in the room.

07:20.790 --> 07:24.990
So, sending sensitive data to a third-party service

07:24.990 --> 07:29.310
may not be suitable for all organizations.

07:29.310 --> 07:31.920
However, many organizations are cloud-based.

07:31.920 --> 07:33.360
And if they're cloud-based,

07:33.360 --> 07:36.120
then their data is already in the cloud.

07:36.120 --> 07:38.800
So, they're storing their databases in AWS

07:39.690 --> 07:43.200
or in Google Cloud, and the data is already there.

07:43.200 --> 07:47.280
So, why is it so scary to send some prompts

07:47.280 --> 07:49.260
to those vendors?

07:49.260 --> 07:51.120
Take Anthropic, for example.

07:51.120 --> 07:53.670
Anthropic has their models available

07:53.670 --> 07:57.510
on AWS Bedrock and on Google Cloud as well.

07:57.510 --> 08:00.480
So, customers who deploy their services

08:00.480 --> 08:03.870
on those clouds can consume the model there,

08:03.870 --> 08:07.140
so the data stays in the cloud they're already using.

08:07.140 --> 08:09.810
And if, for example, you're using Gemini from Google

08:09.810 --> 08:11.790
and you're already deployed on Google Cloud,

08:11.790 --> 08:15.270
then it's no different from using another database

08:15.270 --> 08:17.070
or another managed service.

08:17.070 --> 08:17.903
All right.

08:17.903 --> 08:20.250
And I also want to address fine-tuning.

08:20.250 --> 08:25.020
So, you can actually fine-tune proprietary models

08:25.020 --> 08:29.160
and the vendors actually offer this capability.

08:29.160 --> 08:33.630
However, I'm not personally a big fan of fine-tuning

08:33.630 --> 08:35.940
because, in my opinion, in most cases

08:35.940 --> 08:39.360
we don't really need it, and it's simply going to cost us,

08:39.360 --> 08:43.140
whether it's the time spent creating a fine-tuning dataset

08:43.140 --> 08:45.810
or the actual compute to fine-tune the model.

08:45.810 --> 08:49.110
And I think because models are so much better now,

08:49.110 --> 08:50.610
with the right prompt

08:50.610 --> 08:53.280
and the right few-shot examples,

08:53.280 --> 08:56.910
we can achieve great results that are acceptable

08:56.910 --> 09:00.210
and don't require us to go and fine-tune a model.

09:00.210 --> 09:01.710
Let me know if you want me

09:01.710 --> 09:04.413
to make a video elaborating on this topic.