WEBVTT

00:00:00.080 --> 00:00:03.980
Giving AI agents ways of verifying their
work on their own,

00:00:04.040 --> 00:00:07.060
like automated tests or browser access,
is very

00:00:07.140 --> 00:00:10.760
useful. You can kind of take that to the

00:00:10.800 --> 00:00:14.360
extreme by building a loop where you

00:00:14.400 --> 00:00:17.480
essentially don't interact with Claude
Code at all

00:00:17.540 --> 00:00:21.460
anymore, and that pattern is also known

00:00:21.560 --> 00:00:25.180
as the Ralph loop. It's named after

00:00:25.340 --> 00:00:29.300
Ralph Wiggum,
the character from The Simpsons, for its

00:00:29.600 --> 00:00:33.380
naive persistence.
The ultimate idea behind that Ralph loop

00:00:33.480 --> 00:00:37.430
is pretty simple.
You create a shell script, and you find

00:00:37.500 --> 00:00:41.420
this updated starting project,
including the shell script attached,

00:00:41.440 --> 00:00:45.380
in case you wanna try it. And that shell script,
in the end, sets

00:00:45.480 --> 00:00:49.300
up a loop, a loop
that goes through a maximum amount of

00:00:49.360 --> 00:00:52.960
iterations specified by you, and
that maximum amount of

00:00:53.000 --> 00:00:56.860
iterations is simply an input that's
expected to ensure

00:00:56.900 --> 00:01:00.400
that the loop doesn't keep on running
forever, but that it has some

00:01:00.760 --> 00:01:04.540
exit condition. But whilst the loop
is running,

00:01:04.599 --> 00:01:07.800
Claude Code is invoked over and over

00:01:07.860 --> 00:01:11.720
again, with all permissions allowed,
and therefore, you

00:01:11.780 --> 00:01:14.230
should configure it to run in sandbox
mode.
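
NOTE
A minimal sketch of the bounded loop described above, not the script shipped
with the project. The names run_agent, ralph_loop and calls are assumptions
made for illustration, and run_agent is a stand-in so the sketch runs without
Claude Code installed.
```shell
#!/usr/bin/env bash
# Sketch of a bounded agent loop. run_agent is a stand-in (an assumption),
# so the script runs without Claude Code installed.
set -u
calls=0
run_agent() {
  # A real script would call the CLI here instead, for example:
  #   claude -p "$1"
  # plus whatever permission and sandbox settings your setup uses.
  calls=$((calls + 1))
  echo "iteration $calls: $1"
}
ralph_loop() {
  local max_iterations="$1" prompt="$2" i
  # Refuse to start without a positive integer cap, so the loop always has an exit condition.
  case "$max_iterations" in
    ''|*[!0-9]*|0) echo "usage: ralph_loop <max_iterations> <prompt>" >&2; return 1 ;;
  esac
  i=1
  while [ "$i" -le "$max_iterations" ]; do
    run_agent "$prompt"
    i=$((i + 1))
  done
}
ralph_loop 3 "Pick one task where passes is false, implement it, verify it."
```
Run as written, this prints three "iteration" lines and then stops, because
the cap of 3 is the only exit condition in this sketch.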

00:01:14.280 --> 00:01:18.040
More on that in a second.
And we're not running Claude Code

00:01:18.120 --> 00:01:21.700
in the interactive mode in
which we used it throughout the course,

00:01:21.740 --> 00:01:25.280
instead with the -p flag,
we can pass in a prompt

00:01:25.380 --> 00:01:29.210
directly. And the prompt we do pass in
is this one here, where we

00:01:29.240 --> 00:01:32.870
point at some files,
which I'll show you in a second,

00:01:33.040 --> 00:01:36.680
ask Claude Code to pick a task from a list
of

00:01:36.740 --> 00:01:40.590
tasks where passes is set to false,
and you'll

00:01:40.600 --> 00:01:44.470
see that list in a second,
and where we then want it to tackle that

00:01:44.520 --> 00:01:48.160
task and also verify its changes by

00:01:48.200 --> 00:01:51.610
running tests and by also visiting the

00:01:51.700 --> 00:01:55.430
website by using the Playwright MCP,
so by testing

00:01:55.460 --> 00:01:58.760
those changes in the browser,
as you saw over the last

00:01:58.800 --> 00:02:02.600
lectures. Now, that list of tasks
is simply a JSON

00:02:02.660 --> 00:02:06.620
document with a bunch of tasks,
and the format is totally up to you.

00:02:06.660 --> 00:02:10.300
This will be handled by an AI, so there
is no fixed format.

00:02:10.320 --> 00:02:13.980
This is just one convention you often see,
that you have a

00:02:14.000 --> 00:02:17.760
description of the task, some steps
that make up the task, and a

00:02:17.820 --> 00:02:21.120
passes flag, which initially
is set to false for all

00:02:21.180 --> 00:02:25.060
tasks. And the idea is
that the agent changes it to true

00:02:25.120 --> 00:02:28.420
task by task once it worked through the
tasks.
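
NOTE
A small sketch of what such a task list could look like, plus a check for
unfinished tasks. The field names follow the convention just described; the
two tasks, the temporary file and the count_pending helper are made up for
illustration and are not taken from the project.
```shell
#!/usr/bin/env bash
# Sketch of a task list file plus a check for unfinished tasks.
# Field names follow the convention described in the video; the two tasks are made up.
set -eu
task_file="$(mktemp)"
cat > "$task_file" <<'EOF'
[
  {
    "description": "Users can sign up",
    "steps": ["Add a sign-up form", "Store the new user", "Add a test"],
    "passes": false
  },
  {
    "description": "Users can create notes",
    "steps": ["Add a note editor", "Save the note", "Add a test"],
    "passes": true
  }
]
EOF
# Count tasks still marked as not passing. This plain-text match assumes the
# exact spacing used above; a JSON-aware tool would be more robust.
count_pending() {
  grep -c '"passes": false' "$1" || true
}
pending="$(count_pending "$task_file")"
echo "pending tasks: $pending"
rm -f "$task_file"
```
With the sample data above this prints "pending tasks: 1". Since the format is
up to you, treat the field names as one possible convention.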

00:02:28.460 --> 00:02:32.030
The idea is also
that the agent doesn't have to go in

00:02:32.140 --> 00:02:35.240
order, but chooses on its own the next task it should

00:02:35.320 --> 00:02:39.230
tackle. Now,
this task list in my case here was derived

00:02:39.280 --> 00:02:43.060
from this spec document,
which I created early on in this

00:02:43.100 --> 00:02:47.040
course,
and I used AI for deriving this list of

00:02:47.080 --> 00:02:50.900
tasks here. And you can, of course,
define any tasks you want and be

00:02:51.000 --> 00:02:54.860
as granular as you want.
You should try to not be too

00:02:54.920 --> 00:02:58.580
granular or too general, though. The tasks

00:02:58.800 --> 00:03:02.280
should typically be focusing on one main
problem

00:03:02.460 --> 00:03:06.260
each. But then the idea
is really just that you

00:03:06.340 --> 00:03:10.280
keep Claude Code running on its own,
task by task,

00:03:10.380 --> 00:03:14.280
and that it verifies its work on its own
with tests, with browser

00:03:14.380 --> 00:03:17.650
access,
and you as a human don't do anything.

00:03:17.700 --> 00:03:21.240
You keep Claude working
and you come back once it's done to

00:03:21.360 --> 00:03:24.480
verify the overall result.
That's the idea.

00:03:24.540 --> 00:03:27.580
Now, important,
you have to grant Claude Code

00:03:28.180 --> 00:03:32.120
broad permissions,
because otherwise it would need your

00:03:32.180 --> 00:03:35.860
permission to, for example, edit a file
or to send an HTTP request, and the

00:03:35.900 --> 00:03:39.760
idea is that it can go on its own.
So since that's the case, you

00:03:39.840 --> 00:03:42.860
must make sure that you get this sandbox
mode

00:03:43.120 --> 00:03:46.320
enabled. You could also try running Docker

00:03:46.620 --> 00:03:50.460
Sandbox here,
run Claude with Docker Sandbox,

00:03:50.520 --> 00:03:54.210
as shown earlier,
but I consistently ran into various

00:03:54.280 --> 00:03:57.800
issues there when it, for example,
came to Claude Code trying to use the

00:03:57.820 --> 00:04:01.340
browser,
which is why I'm using this built-in

00:04:01.440 --> 00:04:05.160
sandbox mode instead.
This is a very important setting for this

00:04:05.200 --> 00:04:08.920
loop.
You don't wanna grant it broad permissions

00:04:08.940 --> 00:04:12.660
without it. It could delete your hard drive.
Just as before,

00:04:12.720 --> 00:04:16.700
you also still use other features like
agents or

00:04:16.760 --> 00:04:20.160
skills or a CLAUDE.md file with

00:04:20.200 --> 00:04:24.010
instructions because you're still using
Claude Code, just in

00:04:24.060 --> 00:04:27.920
that autopilot mode. Well,
and then the idea simply is that you

00:04:27.960 --> 00:04:31.640
invoke this Ralph shell script here,
and you give it a

00:04:31.700 --> 00:04:35.680
maximum number of iterations, like 10
or whatever, and then it

00:04:35.780 --> 00:04:39.680
starts.
It takes a look at this prd.json file,

00:04:39.690 --> 00:04:43.380
picks a task, and goes to work.
And it'll report back whenever

00:04:43.440 --> 00:04:47.320
it finished a task,
and it will progress through

00:04:47.380 --> 00:04:51.050
task by task,
and let you know once it's done overall,

00:04:51.100 --> 00:04:54.010
or once the max number of iterations expired here,
of course.
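
NOTE
A sketch of the two ways such a loop can end: no pending task is left, or the
iteration cap is reached. fake_agent and pending_count are assumptions made
for illustration, and the one-object-per-line task file is a simplification
that keeps the text matching trivial.
```shell
#!/usr/bin/env bash
# Sketch: stop when no task is pending or when the iteration cap is reached.
# fake_agent is a stand-in (an assumption) that flips one task per run.
set -u
task_file="$(mktemp)"
printf '%s\n' '{"passes": false}' '{"passes": false}' > "$task_file"
pending_count() { grep -c '"passes": false' "$1"; }
fake_agent() {
  # Mark the first pending task as passing, as the real agent is asked to do.
  awk '!flipped && /"passes": false/ { sub(/false/, "true"); flipped = 1 } { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
max_iterations=10
i=0
while [ "$i" -lt "$max_iterations" ] && [ "$(pending_count "$task_file")" -gt 0 ]; do
  fake_agent "$task_file"
  i=$((i + 1))
done
# Decide which exit condition ended the loop by checking the task file itself.
if [ "$(pending_count "$task_file")" -eq 0 ]; then
  status="all tasks pass"
else
  status="max iterations reached"
fi
echo "$status after $i iterations"
rm -f "$task_file"
```
With two pending tasks and a cap of 10 this prints "all tasks pass after 2
iterations". Lowering max_iterations to 1 would end on the cap instead.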

00:04:54.080 --> 00:04:57.740
If you wanna see what it's doing,
you can always, by the way,

00:04:57.840 --> 00:05:01.060
run a new Claude session in a separate
terminal window.

00:05:01.120 --> 00:05:04.840
Obviously,
you must keep this one here open

00:05:04.900 --> 00:05:08.840
and run claude -c to continue the last session
that was started, which

00:05:09.140 --> 00:05:13.040
should be the session started by the Ralph
loop, and then you can see

00:05:13.140 --> 00:05:16.690
what it's doing. So here, for example,
it's writing a database initialization

00:05:16.760 --> 00:05:19.490
script now. Now, this
is not live updating though.

00:05:19.520 --> 00:05:22.920
You'll have to run it again to see the
latest state, and you could also

00:05:22.940 --> 00:05:26.820
interfere here, but that, of course,
is not really the idea of

00:05:26.860 --> 00:05:30.620
the Ralph loop. Now, I did actually let

00:05:30.660 --> 00:05:34.280
Ralph work on that same project we built
manually throughout the

00:05:34.360 --> 00:05:38.300
course, or manually with AI, I guess,
and the

00:05:38.340 --> 00:05:41.240
result looks different from what we built
in this course.

00:05:41.280 --> 00:05:45.270
But I have a working application where I

00:05:45.320 --> 00:05:48.040
can sign up and log in and create

00:05:48.100 --> 00:05:50.240
notes,

00:05:53.700 --> 00:05:56.800
which supports rich text editing, public

00:05:56.820 --> 00:06:00.180
sharing, saving, viewing, viewing

00:06:00.260 --> 00:06:03.924
publicly shared notes. And also

00:06:03.964 --> 00:06:06.814
deleting,
including this confirmation dialogue.

00:06:06.824 --> 00:06:10.664
And all that was built automatically by
that Ralph loop.

00:06:10.904 --> 00:06:14.704
I did not interfere at any time. In
that project I shared

00:06:14.764 --> 00:06:18.504
with you, here it, for example,
finished the first task now.

00:06:18.584 --> 00:06:22.364
So, the question therefore
is should you only use the Ralph

00:06:22.424 --> 00:06:25.964
loop from now on? And I'd say it depends.

00:06:26.044 --> 00:06:29.614
It can definitely yield good results,
like the application I built you.

00:06:29.644 --> 00:06:33.604
It's not a super complex app, of course,
but it works and

00:06:33.664 --> 00:06:37.204
it works because Claude Code
was able to test the

00:06:37.244 --> 00:06:40.904
changes on its own with the Playwright
MCP, with automated

00:06:40.914 --> 00:06:43.594
tests, and that feedback loop
is important.

00:06:43.594 --> 00:06:47.554
So especially for prototyping
or building utility software or

00:06:47.624 --> 00:06:50.614
internal tools, it can be amazing.

00:06:50.644 --> 00:06:54.464
You can just offload work to it
and then work on something else.

00:06:54.564 --> 00:06:58.213
Obviously,
you'll need to plan ahead though,

00:06:58.284 --> 00:07:02.274
you need that spec document. You need to derive those tasks,
you should take a look

00:07:02.304 --> 00:07:05.444
at them and not just blindly trust what AI
generated here for

00:07:05.564 --> 00:07:09.244
you. So it does have some upfront cost
or some

00:07:09.284 --> 00:07:13.164
upfront effort, but it can then,
of course, do its job and

00:07:13.264 --> 00:07:17.024
you can work on something else. It can,
however, potentially

00:07:17.084 --> 00:07:21.064
cost a lot of tokens because since the AI
does

00:07:21.124 --> 00:07:25.064
work on its own and since it does need to
verify everything by running tests,

00:07:25.104 --> 00:07:29.034
by generating all these browser usage
commands, which all cost a

00:07:29.064 --> 00:07:32.864
lot of tokens,
you do spend a lot of tokens, and that

00:07:32.944 --> 00:07:36.684
is something you should be aware of.
So you can burn through your tokens in

00:07:36.744 --> 00:07:39.354
your Claude Code subscription quite quickly.

00:07:39.364 --> 00:07:40.984
And there's a trade-off, of course.

00:07:41.024 --> 00:07:44.584
When using this autonomous loop,
you have less

00:07:44.594 --> 00:07:48.584
control. You only see what it's doing
when you

00:07:48.644 --> 00:07:52.264
check in from time to time or
when you run the application from time to

00:07:52.304 --> 00:07:55.824
time. Whilst it's working,
you have no insight into what's

00:07:55.884 --> 00:07:59.564
happening. You don't use plan mode
and see what it plans to

00:07:59.604 --> 00:08:03.144
do, you don't have
that live stream of what it's

00:08:03.154 --> 00:08:07.124
doing. And of course, if you come back
and decide that something doesn't work

00:08:07.164 --> 00:08:11.053
or look the way you want it to look,
well then a lot of tokens were

00:08:11.124 --> 00:08:14.854
spent for nothing. But of course,
on the other hand,

00:08:14.904 --> 00:08:18.873
you can do other work and you can of course work on
multiple projects in parallel by

00:08:18.944 --> 00:08:21.954
having multiple such loops running in
parallel.

00:08:21.964 --> 00:08:25.324
It's also worth noting though
that AI code quality will

00:08:25.544 --> 00:08:28.764
vary. A working application
is not the same as a

00:08:28.804 --> 00:08:32.564
production-ready application.
The code generated can have bugs,

00:08:32.584 --> 00:08:35.844
including serious performance
or security bugs.

00:08:35.904 --> 00:08:39.624
You always must verify and review
that code

00:08:39.703 --> 00:08:43.044
before you push it to production,
especially if it's public

00:08:43.144 --> 00:08:45.644
and/or you're charging money for it.

00:08:45.704 --> 00:08:49.124
Don't blindly trust it.
Don't fall into the vibe coding

00:08:49.224 --> 00:08:53.084
trap. And AI can also get stuck.
I had it more

00:08:53.104 --> 00:08:56.964
than once that the Ralph loop would simply
get stuck and

00:08:57.064 --> 00:09:01.044
I had to restart it manually
or I had to help

00:09:01.084 --> 00:09:04.284
it get unstuck in other ways by installing
an extra

00:09:04.364 --> 00:09:08.134
dependency. That might not happen for you,
that might not happen with

00:09:08.344 --> 00:09:12.304
future versions of Claude Code.
It definitely did happen from time to

00:09:12.344 --> 00:09:15.904
time for me though,
and therefore the Ralph loop can

00:09:15.944 --> 00:09:19.604
be useful. It does not replace you as a

00:09:19.624 --> 00:09:23.444
developer evaluating the results though
and steering the AI

00:09:23.464 --> 00:09:27.364
in the right direction.
You must create a good

00:09:27.424 --> 00:09:31.244
plan. You must derive good tasks. You must

00:09:31.304 --> 00:09:35.124
have a solid project setup where tests
are enabled,

00:09:35.184 --> 00:09:38.954
where you give the AI just the right
skills and so on.

00:09:39.004 --> 00:09:42.824
So there is some upfront work involved
which you have to go

00:09:42.844 --> 00:09:46.033
through in order to get good results.

00:09:46.044 --> 00:09:49.954
But then, the Ralph loop can be useful,
especially as mentioned

00:09:50.044 --> 00:09:53.924
in my case for prototyping
or simple products.