1
00:00:00,100 --> 00:00:02,740
In this video, we'll be optimizing the image

2
00:00:02,740 --> 00:00:05,520
search endpoint with a simple but powerful

3
00:00:05,520 --> 00:00:09,600
Gemini API feature, the Media Resolution feature.

4
00:00:09,600 --> 00:00:12,280
The Gemini API provides a Media Resolution

5
00:00:12,280 --> 00:00:14,680
setting that you can add to the generation

6
00:00:14,680 --> 00:00:16,040
configuration.

7
00:00:16,040 --> 00:00:18,020
This setting helps control the processing

8
00:00:18,020 --> 00:00:21,540
quality applied to the image and video inputs

9
00:00:21,540 --> 00:00:24,720
sent to a multimodal Gemini model.

10
00:00:24,720 --> 00:00:27,200
The primary benefit of this is the optimization

11
00:00:27,200 --> 00:00:30,700
trade-off between cost and latency versus detail

12
00:00:30,700 --> 00:00:33,880
and accuracy in the image interpretation.

13
00:00:33,880 --> 00:00:36,400
For our travel companion, we can easily get

14
00:00:36,400 --> 00:00:38,580
away with processing our images at medium

15
00:00:38,580 --> 00:00:42,160
resolution instead of high resolution.

16
00:00:42,160 --> 00:00:44,660
This is because it is easy for the model to

17
00:00:44,660 --> 00:00:47,500
detect a landmark without needing too

18
00:00:47,500 --> 00:00:51,640
much granular detail in the image it is given.

19
00:00:51,640 --> 00:00:53,740
The benefits of having our images processed

20
00:00:53,740 --> 00:00:57,420
at lower resolutions include cost control.

21
00:00:57,420 --> 00:00:59,560
Using a lower resolution setting generally

22
00:00:59,560 --> 00:01:02,560
results in lower input token counts.

23
00:01:02,560 --> 00:01:05,140
This directly translates to lower API costs

24
00:01:05,140 --> 00:01:08,740
for processing image and video input.

25
00:01:08,740 --> 00:01:10,620
Next is latency reduction.

26
00:01:10,620 --> 00:01:12,740
Processing lower resolution images and videos

27
00:01:12,740 --> 00:01:16,480
requires less computational time for the model.

28
00:01:16,480 --> 00:01:18,560
This leads to faster response times, which

29
00:01:18,560 --> 00:01:21,620
is especially crucial in real-time applications.

30
00:01:21,620 --> 00:01:25,520
And finally, we have performance optimization.

31
00:01:25,520 --> 00:01:28,180
For tasks that don't require high detail, for

32
00:01:28,180 --> 00:01:30,220
example, identifying the general subject of

33
00:01:30,220 --> 00:01:32,900
an image or asking a question about a video's

34
00:01:32,900 --> 00:01:36,380
scene change, using a lower resolution avoids

35
00:01:36,380 --> 00:01:38,640
unnecessary processing overhead while still

36
00:01:38,640 --> 00:01:42,280
maintaining sufficient accuracy.

37
00:01:42,280 --> 00:01:44,160
With a good understanding of what this setting

38
00:01:44,160 --> 00:01:47,600
is all about, let us implement it on our image

39
00:01:47,600 --> 00:01:49,200
prompt request.

40
00:01:49,200 --> 00:01:52,420
Let's open up that handler and if we scroll

41
00:01:52,420 --> 00:01:56,540
down to the request code in the config section

42
00:01:56,540 --> 00:01:59,380
of the image request, we can simply set the

43
00:01:59,380 --> 00:02:02,520
image processing resolution to operate at

44
00:02:02,520 --> 00:02:03,980
medium resolution.

45
00:02:03,980 --> 00:02:06,400
So let's do that.

46
00:02:06,400 --> 00:02:13,260
Here I'm going to simply set a media resolution

47
00:02:13,260 --> 00:02:24,140
setting, and that is set to types.MediaResolution,

48
00:02:24,140 --> 00:02:26,240
and the media resolution we're going to select

49
00:02:26,240 --> 00:02:31,880
is MEDIA_RESOLUTION_MEDIUM.

50
00:02:31,880 --> 00:02:33,160
Awesome.

51
00:02:33,160 --> 00:02:35,120
Now all our images are going to be processed

52
00:02:35,120 --> 00:02:37,820
at medium resolution.

53
00:02:37,820 --> 00:02:39,280
Now let's test the application to make sure

54
00:02:39,280 --> 00:02:42,280
that everything works as expected.

55
00:02:42,280 --> 00:02:44,260
We won't see any change

56
00:02:44,260 --> 00:02:47,160
in the frontend, but now we know that tokens

57
00:02:47,160 --> 00:02:48,920
are being saved.

58
00:02:48,920 --> 00:02:53,280
Let's just pull up our command line, start

59
00:02:53,280 --> 00:02:58,160
up our server, and now let's head over to

60
00:02:58,160 --> 00:02:59,460
the browser.

61
00:02:59,460 --> 00:03:01,460
Now here in the browser, let us go to our

62
00:03:01,460 --> 00:03:03,780
image search section.

63
00:03:03,780 --> 00:03:06,440
Select our London Eye image, which is of high

64
00:03:06,440 --> 00:03:12,160
resolution, 4272 by 2848.

65
00:03:12,160 --> 00:03:16,360
And search as we have always done.

66
00:03:16,360 --> 00:03:19,040
After a few seconds, we get our response back,

67
00:03:19,040 --> 00:03:20,300
just as before.

68
00:03:20,300 --> 00:03:22,740
Now while nothing stands out in how the application

69
00:03:22,740 --> 00:03:25,720
works, behind the scenes, we can rest assured

70
00:03:25,720 --> 00:03:28,360
that images are being processed at medium

71
00:03:28,360 --> 00:03:29,520
resolution.

72
00:03:29,520 --> 00:03:32,960
So, even if a user uploads a very high-resolution

73
00:03:32,960 --> 00:03:35,820
image, just like we've done, the processing

74
00:03:35,820 --> 00:03:38,700
is done at a lower resolution, and with this

75
00:03:38,700 --> 00:03:42,000
simple setting, we are saving a lot of tokens.