WEBVTT

00:00.060 --> 00:02.220
-: Hey. So in this video we're gonna learn about

00:02.220 --> 00:03.600
Anthropic's computer use

00:03.600 --> 00:06.726
and how to set up their example code from GitHub.

00:06.726 --> 00:09.870
Anthropic has made a fine-tuned version of Claude

00:09.870 --> 00:11.070
and its other models,

00:11.070 --> 00:14.070
where it can click on specific elements within the desktop

00:14.070 --> 00:16.380
using an agent-based system.

00:16.380 --> 00:18.360
Claude can now use computers.

00:18.360 --> 00:19.410
This is really impactful,

00:19.410 --> 00:21.270
and we'll see this in the next couple of years

00:21.270 --> 00:25.410
where LLMs will be able to use entire computer systems,

00:25.410 --> 00:28.740
anywhere from coding editors, text editors, browsers.

00:28.740 --> 00:30.870
This is gonna give a range

00:30.870 --> 00:33.870
and plethora of opportunities for software engineers

00:33.870 --> 00:36.450
for automating tasks that were traditionally hard to do

00:36.450 --> 00:38.250
because they didn't have a REST API,

00:38.250 --> 00:41.220
and building web scrapers by hand is quite brittle

00:41.220 --> 00:42.480
and often bespoke work.

00:42.480 --> 00:44.760
We'll need to visit github.com,

00:44.760 --> 00:46.350
go to /anthropics,

00:46.350 --> 00:49.350
and go to /anthropic-quickstarts,

00:49.350 --> 00:50.183
and then you'll see

00:50.183 --> 00:51.660
there's a range of different quickstarts here.

00:51.660 --> 00:53.280
The one that we are interested in using

00:53.280 --> 00:55.230
is the computer-use-demo.

00:55.230 --> 00:58.080
My suggestion is to git clone this repository

00:58.080 --> 00:59.610
onto your local desktop,

00:59.610 --> 01:03.180
and then we can go into the computer-use-demo,

01:03.180 --> 01:05.550
and then we'll have a look at the setup steps for that.

01:05.550 --> 01:08.940
My suggestion would be to install Docker on your machine,

01:08.940 --> 01:09.840
and then what we're gonna do

01:09.840 --> 01:12.840
is then export the Anthropic API key,

01:12.840 --> 01:15.240
and then we're gonna run this docker run command.
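
NOTE
A minimal sketch of the docker run step just mentioned, written in Python so the command can be inspected before it is run.
The image tag and the port list are assumptions taken from the quickstart README at the time of writing, and build_docker_command is a hypothetical helper, not part of the repository.
Check the README for the current command before relying on this.
```python
import os
import shlex
# Assumption: image tag and ports as listed in the quickstart README; they may have changed.
IMAGE = "ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest"
PORTS = (5900, 8501, 6080, 8080)
def build_docker_command(image=IMAGE, ports=PORTS):
    """Return the argv list for launching the demo container (hypothetical helper)."""
    # "-e NAME" with no value forwards the variable from the host environment,
    # so the API key itself never appears in the command line.
    cmd = ["docker", "run", "-e", "ANTHROPIC_API_KEY"]
    for port in ports:
        # Publish each container port on the same host port.
        cmd += ["-p", f"{port}:{port}"]
    return cmd + ["-it", image]
if __name__ == "__main__":
    if "ANTHROPIC_API_KEY" not in os.environ:
        print("Set ANTHROPIC_API_KEY before running the command below.")
    # Print rather than execute, so the command can be reviewed first.
    print(shlex.join(build_docker_command()))
```
Printing the command instead of running it keeps the sketch safe to try; paste the output into a terminal once it looks right.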

01:15.240 --> 01:19.380
My API key is already exported as this environment variable,

01:19.380 --> 01:21.750
so I'm just gonna copy this docker run command,

01:21.750 --> 01:22.800
and then what that's gonna do

01:22.800 --> 01:25.740
is it's going to download all the images

01:25.740 --> 01:28.590
and it's going to start setting those up.

01:28.590 --> 01:31.860
It will be setting up a Linux container,

01:31.860 --> 01:34.950
and we can see that if we look inside of the Dockerfile.

01:34.950 --> 01:37.680
They actually use a Debian-based image,

01:37.680 --> 01:39.690
and then they install a range of different things

01:39.690 --> 01:42.330
such as some UI requirements.

01:42.330 --> 01:43.800
They install Python.

01:43.800 --> 01:45.810
They also install some network tools.

01:45.810 --> 01:48.000
And also, they provide some other things,

01:48.000 --> 01:50.850
such as Mozilla Firefox,

01:50.850 --> 01:53.370
and they also install LibreOffice.

01:53.370 --> 01:56.130
And as well as that, then they also do some other things

01:56.130 --> 01:58.410
to make sure that Python is set up correctly,

01:58.410 --> 02:02.130
and they make sure that the screen height

02:02.130 --> 02:07.130
is set to 768 pixels and the width to 1,024 pixels.

02:08.970 --> 02:10.860
After a while, that should then initialize

02:10.860 --> 02:13.410
and you can go to localhost:8080.

02:13.410 --> 02:16.530
And then for example, in here on the left hand side,

02:16.530 --> 02:19.140
you've got a Streamlit app, which will accept commands.

02:19.140 --> 02:21.930
So I can say open Office,

02:21.930 --> 02:24.510
and then what it's gonna do is it's gonna run those commands

02:24.510 --> 02:27.060
against a tool-calling agent,

02:27.060 --> 02:28.620
and it's decided that it's going to take

02:28.620 --> 02:30.750
a screenshot first.

02:30.750 --> 02:33.930
And then after that, you can see that the LibreOffice Calc icon

02:33.930 --> 02:35.190
is on the desktop.

02:35.190 --> 02:36.450
And then you've got here

02:36.450 --> 02:38.550
where it's actually got this tip of the day.

02:38.550 --> 02:41.460
And then it's gonna go on and move the mouse to a coordinate,

02:41.460 --> 02:44.250
and then it's decided to click on that,

02:44.250 --> 02:45.720
and it's clicked on that OK button.

02:45.720 --> 02:48.120
So you can see it's able to load things like Office,

02:48.120 --> 02:49.590
and this is really, really great

02:49.590 --> 02:52.020
because we can actually utilize this.

02:52.020 --> 02:53.370
What it's also decided to do,

02:53.370 --> 02:54.780
it's decided to left click

02:54.780 --> 02:57.840
and it's gone into the top-left column here.

02:57.840 --> 02:59.760
So again, we can do things like

02:59.760 --> 03:04.760
close this Office application and open a browser,

03:06.360 --> 03:10.980
and go to Google and search for cats.

03:10.980 --> 03:15.000
And so, it's basically deciding what next steps to do

03:15.000 --> 03:17.700
by taking a screenshot every one or two seconds,

03:17.700 --> 03:20.700
and then what it does is it uses a range of different tools.

03:20.700 --> 03:24.210
So it can use keyboard, it can use mouse movements,

03:24.210 --> 03:26.760
and all of this is implemented in the GitHub repository.
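
NOTE
A rough sketch of the observe, decide, act loop described here, assuming the model call and the desktop tools are passed in as plain functions.
Action, run_agent_loop, decide and execute are hypothetical names for illustration; this is not the repository's actual loop.
```python
from dataclasses import dataclass
from typing import Optional
@dataclass
class Action:
    kind: str  # e.g. "screenshot", "mouse_move", "left_click", "type"
    payload: Optional[object] = None  # coordinates or text, depending on kind
def run_agent_loop(decide, execute, max_steps=10):
    """Feed each observation back to the model until it returns no further action."""
    history = []
    # Start from a screenshot so the model has something to look at.
    observation = execute(Action("screenshot"))
    for _ in range(max_steps):
        # decide() stands in for the model call; it returns None when the task is done.
        action = decide(observation, history)
        if action is None:
            break
        history.append(action)
        # execute() stands in for the keyboard, mouse, and screenshot tools.
        observation = execute(action)
    return history
```
In a quick trial, decide can be a function that replays a scripted list of actions and execute can simply return a string describing what it did.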

03:26.760 --> 03:29.730
And now it's going and it's initializing Firefox.

03:29.730 --> 03:31.950
And after it's initialized Firefox,

03:31.950 --> 03:34.110
it's then decided that it wants to left click,

03:34.110 --> 03:36.900
and it's gonna then go into the address bar.

03:36.900 --> 03:40.020
The computer tool is then used to type in text,

03:40.020 --> 03:42.060
and it's gone to Google, typed in text,

03:42.060 --> 03:44.160
and then it's come back with cats.

03:44.160 --> 03:47.190
My one caveat here is you might get rate limited.

03:47.190 --> 03:49.080
So depending on your Anthropic API key's limits,

03:49.080 --> 03:51.330
this might be something that you experience.
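
NOTE
One common way to cope with rate limits is to retry with exponential backoff; the sketch below shows the idea under stated assumptions.
RateLimited is a hypothetical stand-in for whatever error your client raises on a rate-limit response, and call_with_backoff is an illustrative helper, not something the demo provides.
```python
import time
class RateLimited(Exception):
    """Hypothetical stand-in for a client's rate-limit error."""
def call_with_backoff(request, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call request(), retrying up to `retries` times with doubling delays."""
    attempt = 0
    while True:
        try:
            return request()
        except RateLimited:
            if attempt >= retries:
                # Out of retries: surface the original error to the caller.
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```
The sleep parameter is injectable so the helper can be exercised without actually waiting.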

03:51.330 --> 03:53.640
But computer use here is a demo.

03:53.640 --> 03:56.400
It's in beta, so it's very experimental,

03:56.400 --> 03:59.910
but it's definitely the direction things are heading.

03:59.910 --> 04:01.590
And within the next couple of years,

04:01.590 --> 04:02.880
I can see this being a way

04:02.880 --> 04:05.670
that you can interact directly with computers

04:05.670 --> 04:08.100
so that you can automate repetitive tasks

04:08.100 --> 04:11.040
where you have data across different types of applications

04:11.040 --> 04:13.170
that are potentially on that machine.

04:13.170 --> 04:15.370
Cool. Alright, I'll see you in the next one.
