WEBVTT

1
00:00:00.180 --> 00:00:01.800
<v ->Hey there, Eden here.</v>

2
00:00:01.800 --> 00:00:02.880
And in this video,

3
00:00:02.880 --> 00:00:06.960
we're going to discuss the bad side of agentic coding.

4
00:00:06.960 --> 00:00:09.630
And specifically, we are going to discuss

5
00:00:09.630 --> 00:00:12.780
the security quality of code

6
00:00:12.780 --> 00:00:14.910
generated by coding agents.

7
00:00:14.910 --> 00:00:17.070
So this is code generated by Cursor,

8
00:00:17.070 --> 00:00:18.960
by Claude Code, Codex,

9
00:00:18.960 --> 00:00:21.660
Gemini CLI, Antigravity.

10
00:00:21.660 --> 00:00:24.630
Every coding agent is going to produce code

11
00:00:24.630 --> 00:00:28.200
and we can evaluate the security quality of that code.

12
00:00:28.200 --> 00:00:30.480
So we are going to review this blog,

13
00:00:30.480 --> 00:00:33.630
Bad Vibes: Comparing the Secure Coding Capabilities

14
00:00:33.630 --> 00:00:36.630
of Popular Coding Agents by Tenzai.

15
00:00:36.630 --> 00:00:40.790
And just to give you a bit of background on who Tenzai is.

16
00:00:40.790 --> 00:00:45.240
So Tenzai is a company which does AI hacking.

17
00:00:45.240 --> 00:00:48.030
So they use AI agents to do red teaming

18
00:00:48.030 --> 00:00:51.750
and to penetration test websites and applications

19
00:00:51.750 --> 00:00:53.760
and they offer a security product.

20
00:00:53.760 --> 00:00:57.180
So they're in a really interesting and growing domain,

21
00:00:57.180 --> 00:01:00.240
utilizing AI agents for security purposes.

22
00:01:00.240 --> 00:01:02.520
And just to give you a bit of context,

23
00:01:02.520 --> 00:01:06.540
they raised $75 million in seed

24
00:01:06.540 --> 00:01:08.190
to build their platform.

25
00:01:08.190 --> 00:01:09.990
And this huge seed round

26
00:01:09.990 --> 00:01:13.230
was due to the fact that those guys are second timers.

27
00:01:13.230 --> 00:01:15.480
So they're the co-founders of Guardicore,

28
00:01:15.480 --> 00:01:20.130
which sold to Akamai for $600 million a couple of years ago,

29
00:01:20.130 --> 00:01:22.530
so this startup is well funded.

30
00:01:22.530 --> 00:01:24.030
Now, just to give you a quick disclaimer,

31
00:01:24.030 --> 00:01:26.130
I do actually know the team,

32
00:01:26.130 --> 00:01:28.680
so I have worked with them in the past,

33
00:01:28.680 --> 00:01:30.360
and the team is super talented.

34
00:01:30.360 --> 00:01:32.520
All right, so let's go and review the blog here.

35
00:01:32.520 --> 00:01:35.550
So what they did here is a security benchmark

36
00:01:35.550 --> 00:01:37.710
of popular AI coding agents,

37
00:01:37.710 --> 00:01:41.430
Cursor, Claude Code, Codex, Replit, and Devin,

38
00:01:41.430 --> 00:01:45.660
found 69 vulnerabilities across 15 apps.

39
00:01:45.660 --> 00:01:48.060
Every agent shipped vulnerable code,

40
00:01:48.060 --> 00:01:52.650
broken auth, SSRF, missing controls, and more.

41
00:01:52.650 --> 00:01:55.200
Here's what broke and why it matters.

42
00:01:55.200 --> 00:01:59.460
So this blog is actually going to explore the artifacts

43
00:01:59.460 --> 00:02:01.050
of all of those coding agents.

44
00:02:01.050 --> 00:02:02.460
So in this diagram,

45
00:02:02.460 --> 00:02:05.010
we can see all the coding agents right over here

46
00:02:05.010 --> 00:02:07.110
and the number of vulnerabilities

47
00:02:07.110 --> 00:02:09.390
they produced in this benchmark.

48
00:02:09.390 --> 00:02:11.280
So we can see the number of critical,

49
00:02:11.280 --> 00:02:14.130
the number of high vulnerabilities and low/medium.

50
00:02:14.130 --> 00:02:16.620
So we can see right here that every coding agent

51
00:02:16.620 --> 00:02:19.050
left us with vulnerable code.

52
00:02:19.050 --> 00:02:22.290
So they say vibe coding has fundamentally changed

53
00:02:22.290 --> 00:02:23.670
how we create software.

54
00:02:23.670 --> 00:02:26.310
While coding agents deliver enormous benefits,

55
00:02:26.310 --> 00:02:30.540
their rapid adoption raises many important questions.

56
00:02:30.540 --> 00:02:34.140
As end-to-end AI-generated applications become common,

57
00:02:34.140 --> 00:02:36.210
are vibe coded applications secure?

58
00:02:36.210 --> 00:02:38.340
So what they did actually here

59
00:02:38.340 --> 00:02:41.100
is sort of build those applications end-to-end

60
00:02:41.100 --> 00:02:43.590
with one, two, maybe three prompts.

61
00:02:43.590 --> 00:02:46.770
So this is not the regular workflow

62
00:02:46.770 --> 00:02:48.930
where a developer in an enterprise

63
00:02:48.930 --> 00:02:50.430
is usually going to iterate.

64
00:02:50.430 --> 00:02:51.900
And in this use case here,

65
00:02:51.900 --> 00:02:54.060
when we one shot those applications,

66
00:02:54.060 --> 00:02:57.600
we actually leave a lot of room for the AI,

67
00:02:57.600 --> 00:02:59.700
for the agents to decide what to do.

68
00:02:59.700 --> 00:03:01.920
So we don't narrow them

69
00:03:01.920 --> 00:03:03.870
and we don't scope them

70
00:03:03.870 --> 00:03:05.880
and we really let them go wild here.

71
00:03:05.880 --> 00:03:08.580
So of course, we're going to get some bad results here.

72
00:03:08.580 --> 00:03:09.930
And in this example,

73
00:03:09.930 --> 00:03:12.870
we're going to get very bad and vulnerable code.

74
00:03:12.870 --> 00:03:15.720
So in this post, we explore the security challenges

75
00:03:15.720 --> 00:03:17.790
introduced by vibe coding.

76
00:03:17.790 --> 00:03:20.820
We set out to compare five popular coding agents

77
00:03:20.820 --> 00:03:23.760
and assess their ability to write secure code.

78
00:03:23.760 --> 00:03:25.740
We tested the following coding agents

79
00:03:25.740 --> 00:03:28.650
with their default models during December 2025.

80
00:03:28.650 --> 00:03:31.290
And here, we can see all the coding agents that were tested.

81
00:03:31.290 --> 00:03:33.060
And to compare them accurately,

82
00:03:33.060 --> 00:03:34.470
we tasked each agent

83
00:03:34.470 --> 00:03:37.560
with building a series of identical applications

84
00:03:37.560 --> 00:03:40.170
using the same prompts and tech stack.

85
00:03:40.170 --> 00:03:41.730
Our goal was to replicate

86
00:03:41.730 --> 00:03:44.370
a typical iterative development process,

87
00:03:44.370 --> 00:03:48.180
simulating a user building an application from ground up,

88
00:03:48.180 --> 00:03:50.010
one of the most common use cases

89
00:03:50.010 --> 00:03:52.050
for AI coding agents.

90
00:03:52.050 --> 00:03:53.970
So it wasn't one shotted,

91
00:03:53.970 --> 00:03:55.200
it was actually iterative,

92
00:03:55.200 --> 00:03:57.240
so I take back what I said.

93
00:03:57.240 --> 00:04:00.120
All right, so here we can see we have prompt number one,

94
00:04:00.120 --> 00:04:03.330
which includes the tech stack and basic design.

95
00:04:03.330 --> 00:04:06.480
Prompt number two asked to implement RBAC,

96
00:04:06.480 --> 00:04:08.250
role-based access control,

97
00:04:08.250 --> 00:04:10.080
and additional functionality.

98
00:04:10.080 --> 00:04:11.370
And prompt number three

99
00:04:11.370 --> 00:04:14.340
is going to implement additional roles in the RBAC

100
00:04:14.340 --> 00:04:16.200
and additional functionality.

101
00:04:16.200 --> 00:04:18.510
And here we can see we gave all those prompts

102
00:04:18.510 --> 00:04:20.640
to Codex, Claude Code, Cursor and Replit,

103
00:04:20.640 --> 00:04:22.500
and we got back an application.

104
00:04:22.500 --> 00:04:24.270
Once we had our application,

105
00:04:24.270 --> 00:04:26.280
we turned to the question of security.

106
00:04:26.280 --> 00:04:28.020
Using Tenzai's agent,

107
00:04:28.020 --> 00:04:31.650
we analyzed each of the apps to identify vulnerabilities.

108
00:04:31.650 --> 00:04:34.470
This resulted in a small and very interesting dataset

109
00:04:34.470 --> 00:04:38.280
containing a total of 69 vulnerabilities.

110
00:04:38.280 --> 00:04:41.970
And obviously, this blog was meant to promote their product,

111
00:04:41.970 --> 00:04:45.810
so their AI agents which do penetration testing,

112
00:04:45.810 --> 00:04:48.750
but still we have some very interesting findings here,

113
00:04:48.750 --> 00:04:51.930
and we have some very important discussions in this blog.

114
00:04:51.930 --> 00:04:53.970
So here we can see the table from above,

115
00:04:53.970 --> 00:04:56.820
all the vulnerabilities divided by the coding agents

116
00:04:56.820 --> 00:04:59.820
and we can see the level of vulnerabilities they introduced.

117
00:04:59.820 --> 00:05:01.200
After analyzing the results,

118
00:05:01.200 --> 00:05:02.670
we uncovered common behaviors,

119
00:05:02.670 --> 00:05:03.900
recurring failure patterns,

120
00:05:03.900 --> 00:05:05.820
and finally an answer to the question,

121
00:05:05.820 --> 00:05:08.430
which agent wrote the most secure code?

122
00:05:08.430 --> 00:05:10.770
And I'll give you a quick spoiler,

123
00:05:10.770 --> 00:05:12.330
there isn't really a winner here,

124
00:05:12.330 --> 00:05:15.420
only who lost the most.

125
00:05:15.420 --> 00:05:17.100
Let's start with the good news.

126
00:05:17.100 --> 00:05:18.750
Based on our experimentation,

127
00:05:18.750 --> 00:05:21.150
coding agents appear to be quite effective

128
00:05:21.150 --> 00:05:23.880
at avoiding certain classes of bugs.

129
00:05:23.880 --> 00:05:26.760
Notable examples were the notorious categories

130
00:05:26.760 --> 00:05:28.050
of injection attacks.

131
00:05:28.050 --> 00:05:30.450
Across all of the applications we developed,

132
00:05:30.450 --> 00:05:34.620
we didn't encounter a single exploitable SQL injection

133
00:05:34.620 --> 00:05:37.590
or cross-site scripting vulnerability,

134
00:05:37.590 --> 00:05:39.870
two bug classes that have been staples

135
00:05:39.870 --> 00:05:42.150
of the OWASP Top 10 for years.

136
00:05:42.150 --> 00:05:46.590
And I think this is mostly because all of those attacks

137
00:05:46.590 --> 00:05:48.480
are almost nonexistent

138
00:05:48.480 --> 00:05:50.580
because every framework that we're going to be using

139
00:05:50.580 --> 00:05:52.590
to implement our web application

140
00:05:52.590 --> 00:05:56.040
or an ORM to talk to our database

141
00:05:56.040 --> 00:05:57.840
is going to handle it for us.

142
00:05:57.840 --> 00:06:00.240
And there have been tons of research

143
00:06:00.240 --> 00:06:03.900
and tons of best practices for those attacks,

144
00:06:03.900 --> 00:06:06.660
for SQL injection and cross-site scripting.

145
00:06:06.660 --> 00:06:07.493
And they continue,

146
00:06:07.493 --> 00:06:10.530
our observation is that coding agents perform well

147
00:06:10.530 --> 00:06:11.730
when the vulnerability class

148
00:06:11.730 --> 00:06:13.950
has well defined built-in protections.

149
00:06:13.950 --> 00:06:15.330
For SQL injections,

150
00:06:15.330 --> 00:06:18.240
agents consistently used parameterized queries,

151
00:06:18.240 --> 00:06:19.470
and this is the best practice

152
00:06:19.470 --> 00:06:21.780
when writing code for a database,

153
00:06:21.780 --> 00:06:24.540
resulting in secure database interactions,

154
00:06:24.540 --> 00:06:26.460
as can be seen in the following code.

155
00:06:26.460 --> 00:06:30.030
So here, we can see an example of where we get here

156
00:06:30.030 --> 00:06:33.090
the username from the user,

157
00:06:33.090 --> 00:06:37.680
and here we prepare a SQL statement.

158
00:06:37.680 --> 00:06:41.400
So here we're sending the SQL statement to the database,

159
00:06:41.400 --> 00:06:42.870
but we're not executing it,

160
00:06:42.870 --> 00:06:46.170
and here we can see we have a placeholder for the username.

161
00:06:46.170 --> 00:06:48.450
And when we want to execute this query,

162
00:06:48.450 --> 00:06:51.150
we send the name

163
00:06:51.150 --> 00:06:53.340
that we took from the user.

164
00:06:53.340 --> 00:06:56.460
Notice, we don't sanitize here the user input,

165
00:06:56.460 --> 00:06:59.550
but because we divided our query to first send the template

166
00:06:59.550 --> 00:07:00.383
to the database

167
00:07:00.383 --> 00:07:02.730
and only then send the value from the user,

168
00:07:02.730 --> 00:07:05.610
then we protect ourselves from SQL injection

169
00:07:05.610 --> 00:07:08.070
because if we were to simply go

170
00:07:08.070 --> 00:07:09.750
and execute right from the get go

171
00:07:09.750 --> 00:07:10.980
instead of this placeholder

172
00:07:10.980 --> 00:07:13.800
to put here the unsanitized user input,

173
00:07:13.800 --> 00:07:17.010
everything here can be evaluated

174
00:07:17.010 --> 00:07:18.330
according to the user input,

175
00:07:18.330 --> 00:07:21.270
and this is going to be vulnerable to SQL injection,

176
00:07:21.270 --> 00:07:24.150
so the division here is actually what's going to protect us.

177
00:07:24.150 --> 00:07:26.430
And of course, if we're going to be using an ORM,

178
00:07:26.430 --> 00:07:28.560
it's going to do all the work for us here.

179
00:07:28.560 --> 00:07:29.880
With cross-site scripting,

180
00:07:29.880 --> 00:07:32.790
the agents' code often didn't sanitize input,

181
00:07:32.790 --> 00:07:35.310
but it used frontend frameworks properly,

182
00:07:35.310 --> 00:07:38.280
which prevented the vulnerability from becoming exploitable.

183
00:07:38.280 --> 00:07:39.570
In the following example,

184
00:07:39.570 --> 00:07:43.680
Tenzai's agent identifies a potential XSS vulnerability

185
00:07:43.680 --> 00:07:47.760
where the API returns a raw stored XSS payload,

186
00:07:47.760 --> 00:07:51.270
but determines that the issue is not currently exploitable

187
00:07:51.270 --> 00:07:53.370
because the frontend properly escapes it.

188
00:07:53.370 --> 00:07:54.660
And in this example,

189
00:07:54.660 --> 00:07:56.160
we can see we don't have

190
00:07:56.160 --> 00:07:58.800
any cross-site scripting vulnerabilities

191
00:07:58.800 --> 00:07:59.730
which are exploitable.

192
00:07:59.730 --> 00:08:00.930
So just to remind you,

193
00:08:00.930 --> 00:08:04.650
cross-site scripting is where an attacker

194
00:08:04.650 --> 00:08:09.000
is going to be able to run malicious code in our browser.

195
00:08:09.000 --> 00:08:13.140
So here, we can see that they actually managed

196
00:08:13.140 --> 00:08:17.700
to manipulate the server to return malicious code scripts

197
00:08:17.700 --> 00:08:19.530
to execute on the browser.

198
00:08:19.530 --> 00:08:22.440
However, the browser eventually did not run it

199
00:08:22.440 --> 00:08:25.560
because it was implemented by a framework

200
00:08:25.560 --> 00:08:28.980
which escaped the answer from the server.

201
00:08:28.980 --> 00:08:31.830
So instead of running a script,

202
00:08:31.830 --> 00:08:34.890
it escaped those characters here

203
00:08:34.890 --> 00:08:39.120
and those are simply shown as a string here.

204
00:08:39.120 --> 00:08:40.470
So in this example,

205
00:08:40.470 --> 00:08:44.370
we don't really have a vulnerable cross-site scripting here

206
00:08:44.370 --> 00:08:47.100
because we weren't able to run code in the browser.

207
00:08:47.100 --> 00:08:49.980
And they say while they might occasionally slip up,

208
00:08:49.980 --> 00:08:52.800
agents are more likely to avoid vulnerability classes

209
00:08:52.800 --> 00:08:55.650
that come with clear-cut dos and don'ts.

210
00:08:55.650 --> 00:08:59.340
So this is SQL injection and XSS vulnerabilities.

211
00:08:59.340 --> 00:09:00.900
All right, let's talk about the bad.

212
00:09:00.900 --> 00:09:04.170
While coding agents did relatively well with vulnerabilities

213
00:09:04.170 --> 00:09:06.600
that have clear and generic solutions,

214
00:09:06.600 --> 00:09:09.510
they struggled with issues that didn't have one.

215
00:09:09.510 --> 00:09:11.610
Let's examine some common pitfalls,

216
00:09:11.610 --> 00:09:13.500
and here they talk about authorization.

217
00:09:13.500 --> 00:09:17.190
So let's recap on authentication and authorization.

218
00:09:17.190 --> 00:09:20.610
So authentication is the process of you verifying

219
00:09:20.610 --> 00:09:23.250
who the person doing stuff on the website is.

220
00:09:23.250 --> 00:09:25.200
So this is the user logging in

221
00:09:25.200 --> 00:09:27.300
and you know who the user is.

222
00:09:27.300 --> 00:09:29.280
Authorization is the process

223
00:09:29.280 --> 00:09:31.590
where we are going to limit the things

224
00:09:31.590 --> 00:09:33.780
that the user can do depending on their role.

225
00:09:33.780 --> 00:09:35.280
And we often do it with something

226
00:09:35.280 --> 00:09:38.070
which is called RBAC, role-based access control.

227
00:09:38.070 --> 00:09:40.710
And here they go, coding agents did very poorly

228
00:09:40.710 --> 00:09:43.320
in terms of properly enforcing authorization.

229
00:09:43.320 --> 00:09:46.080
So this is limiting the users

230
00:09:46.080 --> 00:09:48.150
to do stuff or not to do stuff.

231
00:09:48.150 --> 00:09:50.820
They managed basic requirements reasonably well,

232
00:09:50.820 --> 00:09:52.590
but struggled significantly

233
00:09:52.590 --> 00:09:56.040
as authorization logic became more complex,

234
00:09:56.040 --> 00:10:00.030
despite clear and detailed guidance in our prompts.

235
00:10:00.030 --> 00:10:02.460
One of the most common issues we encountered

236
00:10:02.460 --> 00:10:06.270
was improper authorization when accessing APIs here.

237
00:10:06.270 --> 00:10:10.470
In one case, we had the agent create a shopping site.

238
00:10:10.470 --> 00:10:13.620
Codex introduced a critical authorization flaw,

239
00:10:13.620 --> 00:10:18.540
an order API checks if shoppers are viewing their own orders,

240
00:10:18.540 --> 00:10:20.670
but completely skips the validation

241
00:10:20.670 --> 00:10:22.950
for users with any other role.

242
00:10:22.950 --> 00:10:26.820
As a result, users with a different role like seller

243
00:10:26.820 --> 00:10:29.400
can access any order in the system.

244
00:10:29.400 --> 00:10:32.190
So here, we can see the GraphQL query,

245
00:10:32.190 --> 00:10:34.500
and here we can see the authorization check.

246
00:10:34.500 --> 00:10:38.370
So it checks that if we have a shopper

247
00:10:38.370 --> 00:10:41.310
and that shopper doesn't have the same ID

248
00:10:41.310 --> 00:10:44.100
as the ID in the request we want to see,

249
00:10:44.100 --> 00:10:45.540
so this means that a user

250
00:10:45.540 --> 00:10:47.700
is trying to see somebody else's information,

251
00:10:47.700 --> 00:10:49.530
we want to throw a forbidden error,

252
00:10:49.530 --> 00:10:50.830
but if not, we want to return the order.

253
00:10:50.830 --> 00:10:53.220
And of course, we have here a flaw

254
00:10:53.220 --> 00:10:56.100
because we have other roles besides shopper.

255
00:10:56.100 --> 00:10:58.980
So for example, if the role is going to be seller,

256
00:10:58.980 --> 00:11:00.780
so we are going to be skipping this

257
00:11:00.780 --> 00:11:02.310
and we're going to return the order.

258
00:11:02.310 --> 00:11:05.160
So in fact, all of the sellers have the capability

259
00:11:05.160 --> 00:11:07.710
to get all of the orders of all of the users.

260
00:11:07.710 --> 00:11:10.380
And of course, this is something we do not want to do,

261
00:11:10.380 --> 00:11:12.090
which is really, really bad here.

262
00:11:12.090 --> 00:11:14.940
Now, I don't know if this was a result

263
00:11:14.940 --> 00:11:16.950
of the one-shot prompt they did

264
00:11:16.950 --> 00:11:19.860
or they asked to implement authorization.

265
00:11:19.860 --> 00:11:21.630
However, in my opinion,

266
00:11:21.630 --> 00:11:23.490
when we use agentic coding tools,

267
00:11:23.490 --> 00:11:24.990
those are the kind of things

268
00:11:24.990 --> 00:11:26.910
where we have a shared responsibility.

269
00:11:26.910 --> 00:11:28.830
Sorry, it's not shared responsibility,

270
00:11:28.830 --> 00:11:31.770
it's the responsibility of us as developers

271
00:11:31.770 --> 00:11:34.410
to make sure that the authorization process

272
00:11:34.410 --> 00:11:35.610
is handled properly.

273
00:11:35.610 --> 00:11:36.780
So to be honest,

274
00:11:36.780 --> 00:11:39.960
we can't really expect the coding agents

275
00:11:39.960 --> 00:11:41.460
to perform this well for us.

276
00:11:41.460 --> 00:11:44.370
And while, of course, it would be great for coding agents

277
00:11:44.370 --> 00:11:46.590
to implement everything properly,

278
00:11:46.590 --> 00:11:49.110
I'm not expecting them to handle my security.

279
00:11:49.110 --> 00:11:50.700
When it comes to security,

280
00:11:50.700 --> 00:11:53.250
I as a developer want all the control,

281
00:11:53.250 --> 00:11:55.080
and I want to tell the coding agents

282
00:11:55.080 --> 00:11:57.030
what to implement as far as security

283
00:11:57.030 --> 00:11:58.620
and what not to implement.

284
00:11:58.620 --> 00:11:59.970
In another case,

285
00:11:59.970 --> 00:12:03.600
Claude mistakenly allowed unauthenticated access

286
00:12:03.600 --> 00:12:05.730
to an order deletion API.

287
00:12:05.730 --> 00:12:08.100
If the requesting user is authenticated,

288
00:12:08.100 --> 00:12:10.590
the code performed an ownership test.

289
00:12:10.590 --> 00:12:13.950
But if a request was unauthenticated,

290
00:12:13.950 --> 00:12:15.240
the test was skipped

291
00:12:15.240 --> 00:12:16.568
and the file was deleted.

292
00:12:16.568 --> 00:12:18.750
(laughs) So this is actually very funny.

293
00:12:18.750 --> 00:12:20.790
So here, they check that there is a user

294
00:12:20.790 --> 00:12:21.927
which is logged in,

295
00:12:21.927 --> 00:12:23.910
and here they limit the deletion

296
00:12:23.910 --> 00:12:27.270
only for admins and for sellers,

297
00:12:27.270 --> 00:12:29.550
of course, only on their product.

298
00:12:29.550 --> 00:12:30.450
And if it's not,

299
00:12:30.450 --> 00:12:32.550
they're going to return failed to delete

300
00:12:32.550 --> 00:12:34.110
and then they go to delete it.

301
00:12:34.110 --> 00:12:36.150
(laughs) Yeah, but this check here,

302
00:12:36.150 --> 00:12:39.600
if user then this means that the user is authenticated.

303
00:12:39.600 --> 00:12:42.330
So basically if we have an unauthenticated user,

304
00:12:42.330 --> 00:12:43.230
they can do anything,

305
00:12:43.230 --> 00:12:46.920
so this is like having a door with a lock

306
00:12:46.920 --> 00:12:49.380
but leaving the door open.

307
00:12:49.380 --> 00:12:50.213
Very funny.

308
00:12:50.213 --> 00:12:52.860
So Tenzai's agent identified this vulnerability

309
00:12:52.860 --> 00:12:55.170
by methodically testing different APIs.

310
00:12:55.170 --> 00:12:58.830
So here we can see them testing it with tokens

311
00:12:58.830 --> 00:13:00.300
and without tokens.

312
00:13:00.300 --> 00:13:02.220
Eventually they managed to delete it.

313
00:13:02.220 --> 00:13:03.840
And while the root cause varied,

314
00:13:03.840 --> 00:13:05.280
the pattern is consistent,

315
00:13:05.280 --> 00:13:06.990
coding agents frequently introduce

316
00:13:06.990 --> 00:13:08.670
authorization vulnerabilities.

317
00:13:08.670 --> 00:13:10.530
So yeah, just to summarize,

318
00:13:10.530 --> 00:13:12.570
we want to handle this ourselves.

319
00:13:12.570 --> 00:13:14.520
We want to handle authorization

320
00:13:14.520 --> 00:13:16.170
and we want this full control,

321
00:13:16.170 --> 00:13:19.530
so we can't really expect the agent to do our security.

322
00:13:19.530 --> 00:13:20.970
Okay, let's continue.

323
00:13:20.970 --> 00:13:22.860
Business logic vulnerabilities.

324
00:13:22.860 --> 00:13:25.947
Agents seem very prone to business logic vulnerabilities.

325
00:13:25.947 --> 00:13:27.420
And this actually makes sense

326
00:13:27.420 --> 00:13:29.880
because they don't know all the business logic

327
00:13:29.880 --> 00:13:31.980
and they don't have it in the context.

328
00:13:31.980 --> 00:13:35.640
And of course, if we are going to one shot an application,

329
00:13:35.640 --> 00:13:39.690
they're going to give us lots of business logic issues.

330
00:13:39.690 --> 00:13:43.440
So again, we want this control to tell exactly the agents

331
00:13:43.440 --> 00:13:44.670
what to do and what not to do.

332
00:13:44.670 --> 00:13:47.610
While human developers bring intuitive understanding

333
00:13:47.610 --> 00:13:50.700
that helps them grasp how workflows should operate,

334
00:13:50.700 --> 00:13:52.500
agents lack this common sense

335
00:13:52.500 --> 00:13:55.170
and depend mainly on explicit instructions.

336
00:13:55.170 --> 00:13:57.720
So this is something we all know from LLMs

337
00:13:57.720 --> 00:13:58.980
and agents in general.

338
00:13:58.980 --> 00:14:01.320
Without sufficiently detailed specifications,

339
00:14:01.320 --> 00:14:03.810
agents can easily overlook important nuances.

340
00:14:03.810 --> 00:14:06.060
For example, when we didn't specify

341
00:14:06.060 --> 00:14:10.620
that the quantity of items in a shop order must be positive,

342
00:14:10.620 --> 00:14:14.310
four out of five agents did not verify it

343
00:14:14.310 --> 00:14:18.090
and allowed attackers to create orders with a negative total.

344
00:14:18.090 --> 00:14:20.520
So here we can see an example of a shopping cart

345
00:14:20.520 --> 00:14:24.270
where we have the quantity which is not positive here.

346
00:14:24.270 --> 00:14:26.850
Yeah, so of course, we should have some validation here,

347
00:14:26.850 --> 00:14:29.550
and this kind of validation is trivial,

348
00:14:29.550 --> 00:14:31.290
and I'm pretty sure that every developer

349
00:14:31.290 --> 00:14:33.750
who is going to implement something like this

350
00:14:33.750 --> 00:14:35.670
is going to be making a check

351
00:14:35.670 --> 00:14:37.590
that the number is going to be positive,

352
00:14:37.590 --> 00:14:38.790
it's going to be an integer.

353
00:14:38.790 --> 00:14:40.560
And again, I have to say

354
00:14:40.560 --> 00:14:44.280
that I am actually quite surprised by this finding

355
00:14:44.280 --> 00:14:47.820
because those LLMs which the agents are using

356
00:14:47.820 --> 00:14:49.680
are based on code

357
00:14:49.680 --> 00:14:51.720
and probably the code comes from GitHub,

358
00:14:51.720 --> 00:14:53.370
from open source projects.

359
00:14:53.370 --> 00:14:57.330
And I'm pretty sure that for most open source projects

360
00:14:57.330 --> 00:14:59.370
and the code that it was trained on,

361
00:14:59.370 --> 00:15:00.330
this kind of issue

362
00:15:00.330 --> 00:15:04.950
to make sure that a quantity is going to be an integer

363
00:15:04.950 --> 00:15:08.640
is something which is pretty obvious.

364
00:15:08.640 --> 00:15:10.650
And I don't think there is a lot of code out there

365
00:15:10.650 --> 00:15:14.160
with those kinds of bugs and vulnerabilities.

366
00:15:14.160 --> 00:15:15.270
Let's continue.

367
00:15:15.270 --> 00:15:17.760
So similarly, three out of five agents

368
00:15:17.760 --> 00:15:20.880
allowed products to be created with a negative price.

369
00:15:20.880 --> 00:15:22.950
Looking at Replit's implementation,

370
00:15:22.950 --> 00:15:26.400
we can see that the API responsible for the product creation

371
00:15:26.400 --> 00:15:28.470
takes the price directly from the user input

372
00:15:28.470 --> 00:15:30.090
without any validation.

373
00:15:30.090 --> 00:15:31.620
So yeah, here we can see,

374
00:15:31.620 --> 00:15:33.120
it simply takes the input

375
00:15:33.120 --> 00:15:35.640
and simply makes the SQL query,

376
00:15:35.640 --> 00:15:37.320
simply plugs in the value

377
00:15:37.320 --> 00:15:39.510
without checking or sanitizing it.

378
00:15:39.510 --> 00:15:41.730
And to be honest, what's surprising me here

379
00:15:41.730 --> 00:15:44.820
is that in order to check that the price

380
00:15:44.820 --> 00:15:46.740
of an item is not negative,

381
00:15:46.740 --> 00:15:49.080
then we have two places to do it,

382
00:15:49.080 --> 00:15:50.700
so we need to do it in the frontend

383
00:15:50.700 --> 00:15:52.590
and we need to do it in the backend.

384
00:15:52.590 --> 00:15:55.830
So in both cases, the agent blew it.

385
00:15:55.830 --> 00:15:58.230
Tenzai's agent identified this vulnerability

386
00:15:58.230 --> 00:15:59.640
through static code analysis

387
00:15:59.640 --> 00:16:01.440
and then dynamically validated it.

388
00:16:01.440 --> 00:16:03.510
So here we can see them making the request

389
00:16:03.510 --> 00:16:06.030
injecting a negative number here.

390
00:16:06.030 --> 00:16:07.620
So this is actually very cool to see

391
00:16:07.620 --> 00:16:10.530
because here Tenzai is also doing static code analysis,

392
00:16:10.530 --> 00:16:12.420
so they're reviewing the code

393
00:16:12.420 --> 00:16:13.710
and then based on that,

394
00:16:13.710 --> 00:16:16.170
they're going to do the penetration testing,

395
00:16:16.170 --> 00:16:18.300
so interesting thing.

396
00:16:18.300 --> 00:16:20.010
These are relatively simple examples,

397
00:16:20.010 --> 00:16:23.040
yet nearly all agents failed to implement them correctly.

398
00:16:23.040 --> 00:16:26.190
In more complex scenarios involving nuanced business logic,

399
00:16:26.190 --> 00:16:28.470
this pattern will likely worsen.

400
00:16:28.470 --> 00:16:30.360
All right, now let's talk about

401
00:16:30.360 --> 00:16:32.490
unsolved vulnerability classes.

402
00:16:32.490 --> 00:16:33.810
As aforementioned,

403
00:16:33.810 --> 00:16:36.990
coding agents handle solved vulnerabilities pretty well

404
00:16:36.990 --> 00:16:39.780
like SQL injection or cross-site scripting

405
00:16:39.780 --> 00:16:43.020
where frameworks provide robust built-in protections.

406
00:16:43.020 --> 00:16:44.430
With injection attacks,

407
00:16:44.430 --> 00:16:47.220
the boundary between safe and vulnerable is clear:

408
00:16:47.220 --> 00:16:49.530
data should never be evaluated as code,

409
00:16:49.530 --> 00:16:51.060
and we saw an example of it,

410
00:16:51.060 --> 00:16:53.670
and the clear boundary enables generic solutions

411
00:16:53.670 --> 00:16:55.800
that prevent vulnerabilities in most scenarios.

412
00:16:55.800 --> 00:16:58.650
Now this picture changes dramatically

413
00:16:58.650 --> 00:17:01.560
with unsolved vulnerability classes,

414
00:17:01.560 --> 00:17:03.510
where that clear boundary dissolves.

415
00:17:03.510 --> 00:17:06.300
And here they're talking about SSRF,

416
00:17:06.300 --> 00:17:07.950
server-side request forgery,

417
00:17:07.950 --> 00:17:10.410
and SSRF is when the server

418
00:17:10.410 --> 00:17:12.420
is going to be making the requests

419
00:17:12.420 --> 00:17:14.520
to places it shouldn't make requests to,

420
00:17:14.520 --> 00:17:17.100
and this was not the intention of the server developer.

421
00:17:17.100 --> 00:17:19.020
There is no universal rule

422
00:17:19.020 --> 00:17:21.660
for distinguishing legitimate URL fetches

423
00:17:21.660 --> 00:17:23.130
from malicious ones.

424
00:17:23.130 --> 00:17:25.200
The line between safe and dangerous

425
00:17:25.200 --> 00:17:27.420
depends heavily on the context,

426
00:17:27.420 --> 00:17:29.910
making generic solutions impossible.

427
00:17:29.910 --> 00:17:32.940
And to test how the agents handle this type of vulnerability,

428
00:17:32.940 --> 00:17:35.970
we included an SSRF pitfall

429
00:17:35.970 --> 00:17:37.380
in one of the applications,

430
00:17:37.380 --> 00:17:41.370
a link preview feature that fetches user-provided URLs.

431
00:17:41.370 --> 00:17:42.900
And this is a pitfall

432
00:17:42.900 --> 00:17:45.930
because in order to show that preview,

433
00:17:45.930 --> 00:17:49.080
we need our server to make a request to the URL,

434
00:17:49.080 --> 00:17:52.740
and the URL is something which is given by the user.

435
00:17:52.740 --> 00:17:55.290
So if the user is going to put here something malicious,

436
00:17:55.290 --> 00:17:58.500
we are going to be vulnerable to SSRF.

437
00:17:58.500 --> 00:18:01.890
We gave the agents no security guidelines whatsoever.

438
00:18:01.890 --> 00:18:03.900
The result was unanimous,

439
00:18:03.900 --> 00:18:06.960
all five agents introduced an SSRF vulnerability,

440
00:18:06.960 --> 00:18:10.440
allowing attackers to invoke requests to arbitrary URLs.

441
00:18:10.440 --> 00:18:12.990
Tenzai's agent identified the missing filter

442
00:18:12.990 --> 00:18:16.110
and created a PoC Python script to confirm exploitation

443
00:18:16.110 --> 00:18:17.790
by mapping internal services.

444
00:18:17.790 --> 00:18:19.110
So here in this vulnerability,

445
00:18:19.110 --> 00:18:22.170
the malicious script is going to enumerate the network,

446
00:18:22.170 --> 00:18:23.550
and of course it can do more

447
00:18:23.550 --> 00:18:26.160
because the attacker can basically make our server send requests on their behalf.

448
00:18:26.160 --> 00:18:29.430
Ask an agent explicitly to implement an allowlist,

449
00:18:29.430 --> 00:18:32.550
and it's likely to succeed and prevent SSRF.

450
00:18:32.550 --> 00:18:35.730
But leave the security approach to the agent's discretion

451
00:18:35.730 --> 00:18:38.070
when no known solution exists,

452
00:18:38.070 --> 00:18:39.930
and it will almost certainly fail.
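
NOTE
A minimal sketch of the allowlist idea mentioned above, not the blog's or any agent's actual code.
The host names in ALLOWED_HOSTS and the function name is_allowed_preview_url are hypothetical.
It only validates the URL string; a real link preview would also need to handle redirects and DNS resolution.
```python
from urllib.parse import urlsplit
# Hypothetical allowlist for illustration; the blog does not list specific hosts.
ALLOWED_HOSTS = frozenset({"example.com", "docs.example.com"})
def is_allowed_preview_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are rejected.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # hostname excludes any userinfo and port, and is already lower-cased.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts
```
The check runs before the server fetches anything, so internal addresses never reach the HTTP client unless someone lists them on purpose.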

453
00:18:39.930 --> 00:18:43.710
So again, this feature of the preview,

454
00:18:43.710 --> 00:18:45.300
to be honest for me it's pretty trivial

455
00:18:45.300 --> 00:18:46.980
that it's not going to come out safe,

456
00:18:46.980 --> 00:18:49.050
for me seeing this feature

457
00:18:49.050 --> 00:18:51.060
that they're going to request from the agent.

458
00:18:51.060 --> 00:18:53.100
And this can be solved, of course,

459
00:18:53.100 --> 00:18:56.370
by simply prompting it again and adding this

460
00:18:56.370 --> 00:18:57.900
allowlist mechanism

461
00:18:57.900 --> 00:18:59.100
to make it secure.

462
00:18:59.100 --> 00:19:01.800
So I'm pretty sure if you're a developer

463
00:19:01.800 --> 00:19:04.260
who is aware of those attacks,

464
00:19:04.260 --> 00:19:06.180
which is super, super important by the way,

465
00:19:06.180 --> 00:19:08.310
so even when using coding agents,

466
00:19:08.310 --> 00:19:11.760
you as a developer will not let this happen.

467
00:19:11.760 --> 00:19:13.050
I know at least I won't.

468
00:19:13.050 --> 00:19:14.730
All right, let's talk about the ugly,

469
00:19:14.730 --> 00:19:17.100
and the most concerning finding from the research

470
00:19:17.100 --> 00:19:20.910
wasn't the vulnerabilities in code that agents wrote,

471
00:19:20.910 --> 00:19:23.460
but ones that were introduced by code

472
00:19:23.460 --> 00:19:25.080
the agents didn't write.

473
00:19:25.080 --> 00:19:28.170
All of the agents across every test they performed

474
00:19:28.170 --> 00:19:32.490
failed miserably when it came to security controls.

475
00:19:32.490 --> 00:19:35.670
It wasn't that they implemented them incorrectly,

476
00:19:35.670 --> 00:19:37.890
in almost all cases they didn't implement them at all.

477
00:19:37.890 --> 00:19:39.300
So all of the coding agents,

478
00:19:39.300 --> 00:19:42.540
they didn't implement CSRF protection by default,

479
00:19:42.540 --> 00:19:46.170
and CSRF is cross-site request forgery.

480
00:19:46.170 --> 00:19:50.520
And this is basically when we are logged into our bank,

481
00:19:50.520 --> 00:19:51.720
for example,

482
00:19:51.720 --> 00:19:54.150
and we then visit a malicious website.

483
00:19:54.150 --> 00:19:55.710
And the malicious website

484
00:19:55.710 --> 00:19:58.170
is going to run code in our browser,

485
00:19:58.170 --> 00:20:03.170
which is going to be sending requests to APIs

486
00:20:03.390 --> 00:20:04.920
with our credentials,

487
00:20:04.920 --> 00:20:07.530
for example, to our bank where we were logged in.

488
00:20:07.530 --> 00:20:08.820
So this is very, very bad

489
00:20:08.820 --> 00:20:11.040
and protection against it can be enforced on the server,

490
00:20:11.040 --> 00:20:14.100
but the agents didn't go and implement it by default.
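
NOTE
A rough sketch of one common server-side CSRF defense, a per-session token, assuming the session is a plain dict.
The function names issue_csrf_token and is_valid_csrf_token are hypothetical and do not come from the blog.
A malicious site can make the browser send cookies, but it cannot read this token, so its forged request fails the check.
```python
import hmac
import secrets
def issue_csrf_token(session: dict) -> str:
    """Create a random token, store it in the session, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token
def is_valid_csrf_token(session: dict, submitted) -> bool:
    """Accept a state-changing request only if it echoes the session's token."""
    expected = session.get("csrf_token")
    if not expected or not isinstance(submitted, str) or not submitted:
        return False
    # Compare as bytes in constant time to avoid leaking the token via timing.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```
In practice most web frameworks ship a CSRF middleware that does this for you, so the real work is remembering to turn it on.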

491
00:20:14.100 --> 00:20:17.040
Also, security headers, they didn't implement,

492
00:20:17.040 --> 00:20:19.560
they didn't implement login rate limiting.

493
00:20:19.560 --> 00:20:21.240
Except for one case,

494
00:20:21.240 --> 00:20:23.280
every application included a login page

495
00:20:23.280 --> 00:20:26.580
with zero rate limiting or account lockout mechanisms,

496
00:20:26.580 --> 00:20:28.950
enabling password brute-force attacks.
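
NOTE
A minimal in-memory sketch of login rate limiting with a sliding window, under the assumption of a single process.
The class name LoginRateLimiter and the default limits are hypothetical; the blog does not prescribe values.
```python
import time
from collections import defaultdict, deque
class LoginRateLimiter:
    """Allow at most max_attempts login attempts per key within window_seconds."""
    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock
        self._attempts = defaultdict(deque)
    def allow(self, key: str) -> bool:
        now = self._clock()
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```
The key could be the username, the client IP, or both; a multi-server deployment would need a shared store instead of a local dict.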

497
00:20:28.950 --> 00:20:31.320
And rate limiting is very, very important,

498
00:20:31.320 --> 00:20:33.270
and again, I think this is something

499
00:20:33.270 --> 00:20:37.980
we as developers need to explicitly tell the agent.

500
00:20:37.980 --> 00:20:40.380
And when I'm working with coding agents,

501
00:20:40.380 --> 00:20:43.200
I am not expecting them to do my security.

502
00:20:43.200 --> 00:20:46.680
And implementing rate limiting in every page,

503
00:20:46.680 --> 00:20:47.640
not only in login,

504
00:20:47.640 --> 00:20:50.280
this is something I must do as a developer.

505
00:20:50.280 --> 00:20:52.620
Of course, I can use coding agents to help me do it,

506
00:20:52.620 --> 00:20:54.870
but again, thinking about those things

507
00:20:54.870 --> 00:20:57.750
is really the responsibility of the developers here.

508
00:20:57.750 --> 00:20:58.920
So just to reiterate,

509
00:20:58.920 --> 00:21:00.480
I just want to say that

510
00:21:00.480 --> 00:21:03.060
of course what we're seeing is concerning,

511
00:21:03.060 --> 00:21:07.170
but I really believe that the responsibility of security

512
00:21:07.170 --> 00:21:08.910
is on the developer,

513
00:21:08.910 --> 00:21:11.910
not the agent, not the LLM,

514
00:21:11.910 --> 00:21:13.110
only the developer.

515
00:21:13.110 --> 00:21:14.490
And this is really the difference

516
00:21:14.490 --> 00:21:18.870
between getting help and outsourcing our reasoning to the LLM.

517
00:21:18.870 --> 00:21:21.120
I'm not expecting the LLM to do all of that.

518
00:21:21.120 --> 00:21:24.480
I'm not expecting the LLM to do the thinking instead of me.

519
00:21:24.480 --> 00:21:29.010
I'm using the LLM and the agents to help me code faster,

520
00:21:29.010 --> 00:21:31.710
but I am really not leaving

521
00:21:31.710 --> 00:21:34.200
all of those important things for the LLM to do,

522
00:21:34.200 --> 00:21:36.690
especially without me seeing everything

523
00:21:36.690 --> 00:21:38.730
that they wrote and validating it.

524
00:21:38.730 --> 00:21:41.040
All right, let's go and continue.

525
00:21:41.040 --> 00:21:42.540
In another example,

526
00:21:42.540 --> 00:21:43.950
in the single case where Claude Code

527
00:21:43.950 --> 00:21:45.660
actually implemented rate limiting,

528
00:21:45.660 --> 00:21:48.810
Tenzai's agents quickly realized that it was flawed

529
00:21:48.810 --> 00:21:51.780
and could be bypassed using the X-Forwarded-For header.

530
00:21:51.780 --> 00:21:53.760
So here you can see in this example

531
00:21:53.760 --> 00:21:56.370
that we are first being rate limited

532
00:21:56.370 --> 00:22:00.060
and then they simply added the X-Forwarded-For header

533
00:22:00.060 --> 00:22:04.200
and they're simply making the request from another IP,

534
00:22:04.200 --> 00:22:06.870
and boom, it's going to be,

535
00:22:06.870 --> 00:22:09.720
and boom, they bypass this rate-limiting here.
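
NOTE
A sketch of why that bypass works and one way to avoid it, assuming the app sits behind a known reverse proxy.
The proxy address in TRUSTED_PROXIES and the function name client_ip are hypothetical, and headers is assumed to be a dict keyed exactly by "X-Forwarded-For".
The idea is to ignore the header unless the direct peer is a trusted proxy, and then to read it from the right, where the proxy appended the address it actually saw.
```python
import ipaddress
# Hypothetical reverse proxy address, for illustration only.
TRUSTED_PROXIES = frozenset({"10.0.0.5"})
def client_ip(peer_ip: str, headers: dict, trusted_proxies=TRUSTED_PROXIES) -> str:
    """Pick the address to rate limit on without trusting client-supplied headers."""
    if peer_ip not in trusted_proxies:
        # A direct client can put anything in X-Forwarded-For, so ignore it.
        return peer_ip
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    # Walk right to left: the rightmost entries were appended by our own proxies.
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return peer_ip
        if hop not in trusted_proxies:
            return hop
    return peer_ip
```
A rate limiter keyed on the leftmost header value, or on the header from any caller, is the kind that can be bypassed by sending a new fake IP with each request.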

536
00:22:09.720 --> 00:22:10.950
And the pattern is clear:

537
00:22:10.950 --> 00:22:14.670
coding agents build what they are explicitly asked for,

538
00:22:14.670 --> 00:22:16.590
often in reasonably secure ways,

539
00:22:16.590 --> 00:22:19.590
but completely fail to grasp the bigger picture.

540
00:22:19.590 --> 00:22:21.480
They lack the security mindsets

541
00:22:21.480 --> 00:22:24.090
to proactively introduce defensive mechanisms

542
00:22:24.090 --> 00:22:26.010
that weren't explicitly requested.

543
00:22:26.010 --> 00:22:28.443
So yeah, it looks like they agree with me,

544
00:22:29.520 --> 00:22:31.530
and those are the states.

545
00:22:31.530 --> 00:22:33.510
And I think this blog is important

546
00:22:33.510 --> 00:22:37.290
because it shows you the state of the security quality

547
00:22:37.290 --> 00:22:39.360
of coding agents today.

548
00:22:39.360 --> 00:22:41.700
Pretty sure that one year from now,

549
00:22:41.700 --> 00:22:43.080
maybe two or three years from now,

550
00:22:43.080 --> 00:22:44.700
things are going to get better.

551
00:22:44.700 --> 00:22:46.080
But in my opinion,

552
00:22:46.080 --> 00:22:48.420
always the responsibility of security

553
00:22:48.420 --> 00:22:50.130
is going to be on the developer.

554
00:22:50.130 --> 00:22:52.020
And they say the winner is?

555
00:22:52.020 --> 00:22:53.430
After gathering all the results,

556
00:22:53.430 --> 00:22:56.280
we compared the number of exploitable vulnerabilities.

557
00:22:56.280 --> 00:23:00.390
And here we can see Claude Code has 16,

558
00:23:00.390 --> 00:23:02.490
Devin has 14,

559
00:23:02.490 --> 00:23:04.530
and the list goes on and on.

560
00:23:04.530 --> 00:23:06.630
And there isn't really a winner here,

561
00:23:06.630 --> 00:23:09.450
all the coding agents introduced vulnerabilities,

562
00:23:09.450 --> 00:23:11.040
which we do not want.

563
00:23:11.040 --> 00:23:11.910
So as you can see,

564
00:23:11.910 --> 00:23:14.460
all agents introduced a significant number of vulnerabilities

565
00:23:14.460 --> 00:23:16.200
across different applications.

566
00:23:16.200 --> 00:23:17.550
Based on our results,

567
00:23:17.550 --> 00:23:18.810
consistent with the findings

568
00:23:18.810 --> 00:23:20.730
from the broader security research community,

569
00:23:20.730 --> 00:23:22.680
as of today, it doesn't really matter

570
00:23:22.680 --> 00:23:23.790
which agent we're using,

571
00:23:23.790 --> 00:23:27.030
vulnerabilities are almost certainly going to be introduced

572
00:23:27.030 --> 00:23:30.030
if we're going to be using them without caution.

573
00:23:30.030 --> 00:23:31.290
So this raises the question,

574
00:23:31.290 --> 00:23:33.540
what can developers do to improve security

575
00:23:33.540 --> 00:23:35.550
in their AI-generated code?

576
00:23:35.550 --> 00:23:37.680
Just do your own security.

577
00:23:37.680 --> 00:23:39.990
And here they talk about vibing secure code.

578
00:23:39.990 --> 00:23:41.940
The first option that might come to mind

579
00:23:41.940 --> 00:23:44.460
would be to target the prompt itself.

580
00:23:44.460 --> 00:23:45.960
Can we refine the instruction

581
00:23:45.960 --> 00:23:48.273
to make agents more security-aware?

582
00:23:49.350 --> 00:23:52.530
So this is basically saying write the implementation

583
00:23:52.530 --> 00:23:53.670
in a secure way,

584
00:23:53.670 --> 00:23:55.980
which may help a bit in the prompt here,

585
00:23:55.980 --> 00:23:58.320
but it's not going to help us enough.

586
00:23:58.320 --> 00:24:01.710
And here they talk about a study comparing several methods:

587
00:24:01.710 --> 00:24:03.630
generic security instructions,

588
00:24:03.630 --> 00:24:05.730
having the LLM identify security risks

589
00:24:05.730 --> 00:24:07.140
before implementation,

590
00:24:07.140 --> 00:24:08.580
and even explicit directions

591
00:24:08.580 --> 00:24:10.530
to avoid specific vulnerability types.

592
00:24:10.530 --> 00:24:14.010
Surprisingly, none of these techniques proved effective

593
00:24:14.010 --> 00:24:16.800
at meaningfully reducing vulnerabilities.

594
00:24:16.800 --> 00:24:18.930
Based on our testing and recent research,

595
00:24:18.930 --> 00:24:22.680
no comprehensive solution to this issue currently exists.

596
00:24:22.680 --> 00:24:25.020
This makes it critical for developers

597
00:24:25.020 --> 00:24:28.110
to understand the common pitfalls of coding agents

598
00:24:28.110 --> 00:24:29.640
and prepare accordingly,

599
00:24:29.640 --> 00:24:32.730
and for developers to do their own security, of course.

600
00:24:32.730 --> 00:24:34.590
As models change so rapidly,

601
00:24:34.590 --> 00:24:36.570
our precise results may be outdated

602
00:24:36.570 --> 00:24:38.220
by the time you finish reading this.

603
00:24:38.220 --> 00:24:41.940
Despite that, the key lessons from the experience remain.

604
00:24:41.940 --> 00:24:44.070
Coding agents cannot be trusted

605
00:24:44.070 --> 00:24:46.260
to design secure applications.

606
00:24:46.260 --> 00:24:50.010
While they may produce secure code some of the time,

607
00:24:50.010 --> 00:24:52.110
agents consistently fail to implement

608
00:24:52.110 --> 00:24:55.380
critical security controls without explicit guidance.

609
00:24:55.380 --> 00:24:58.800
Don't expect your coding agents to implement CSRF protection

610
00:24:58.800 --> 00:25:00.960
unless you explicitly ask for it,

611
00:25:00.960 --> 00:25:02.490
exactly what I told you.

612
00:25:02.490 --> 00:25:04.410
Don't be surprised if they leave out

613
00:25:04.410 --> 00:25:06.390
critical security headers.

614
00:25:06.390 --> 00:25:09.240
When clear guardrails exist, agents deliver.

615
00:25:09.240 --> 00:25:11.730
If there's a well-established definition of secure

616
00:25:11.730 --> 00:25:14.430
versus insecure baked into the framework,

617
00:25:14.430 --> 00:25:16.920
agents tend to get it right.

618
00:25:16.920 --> 00:25:18.750
Vulnerabilities with clear solutions

619
00:25:18.750 --> 00:25:21.570
like SQL injection and cross-site scripting

620
00:25:21.570 --> 00:25:24.810
are less likely to appear in your vibe-coded app.

621
00:25:24.810 --> 00:25:27.540
But in ambiguous contexts, they falter.

622
00:25:27.540 --> 00:25:29.730
Where boundaries are not clear-cut,

623
00:25:29.730 --> 00:25:32.250
business logic workflows, authorization rules,

624
00:25:32.250 --> 00:25:34.680
and other nuanced security decisions,

625
00:25:34.680 --> 00:25:36.600
agents will make mistakes.

626
00:25:36.600 --> 00:25:37.920
Unlike syntax errors,

627
00:25:37.920 --> 00:25:41.070
these judgment calls lack standard tests.

628
00:25:41.070 --> 00:25:45.240
Yeah, because when we vibe code an application,

629
00:25:45.240 --> 00:25:48.210
we're probably not going to have any security tests for it

630
00:25:48.210 --> 00:25:49.950
that are going to run automatically.

631
00:25:49.950 --> 00:25:51.750
While we may have linters

632
00:25:51.750 --> 00:25:54.000
and when we run the application we can see it would break,

633
00:25:54.000 --> 00:25:55.950
with security, this is not the case

634
00:25:55.950 --> 00:25:59.160
so agents don't really validate and verify security.

635
00:25:59.160 --> 00:26:02.700
And here they say the most effective approach is testing,

636
00:26:02.700 --> 00:26:03.660
promoting their product.

637
00:26:03.660 --> 00:26:06.690
Like human developers, agents will always make mistakes.

638
00:26:06.690 --> 00:26:08.670
Even as models improve at coding,

639
00:26:08.670 --> 00:26:10.050
vulnerabilities will persist.

640
00:26:10.050 --> 00:26:12.660
As they accelerate development velocity,

641
00:26:12.660 --> 00:26:14.670
the volume of introduced vulnerabilities

642
00:26:14.670 --> 00:26:15.990
will grow proportionally,

643
00:26:15.990 --> 00:26:18.810
quickly overwhelming traditional testing approaches.

644
00:26:18.810 --> 00:26:20.880
While AI agents may introduce vulnerabilities,

645
00:26:20.880 --> 00:26:22.770
they also excel at identifying them.

646
00:26:22.770 --> 00:26:24.600
And this is Tenzai's product,

647
00:26:24.600 --> 00:26:27.150
their agentic red teaming.

648
00:26:27.150 --> 00:26:29.610
To keep pace with AI-accelerated code development,

649
00:26:29.610 --> 00:26:31.650
organizations need a paradigm shift:

650
00:26:31.650 --> 00:26:35.880
deploy AI agents not only to target code, but to secure it.

651
00:26:35.880 --> 00:26:38.610
So here we can see they're plugging in their product,

652
00:26:38.610 --> 00:26:41.820
which at least from this blog looks really, really solid

653
00:26:41.820 --> 00:26:43.290
and looks like it's very capable

654
00:26:43.290 --> 00:26:45.420
of detecting important vulnerabilities.

655
00:26:45.420 --> 00:26:46.950
And here they say the same technology

656
00:26:46.950 --> 00:26:48.270
creating security risks

657
00:26:48.270 --> 00:26:51.900
can be your most powerful defense against them.

658
00:26:51.900 --> 00:26:53.130
I do agree with them.

659
00:26:53.130 --> 00:26:57.600
I think it's super important to have security testing,

660
00:26:57.600 --> 00:26:59.580
of course, when it's automated,

661
00:26:59.580 --> 00:27:02.850
but I really think we should shift left all the security.

662
00:27:02.850 --> 00:27:06.300
So we should catch everything when we code it.

663
00:27:06.300 --> 00:27:09.720
And catching it when it's already deployed,

664
00:27:09.720 --> 00:27:11.880
it's way too late in my opinion.

665
00:27:11.880 --> 00:27:13.860
And I think all of the issues that we saw,

666
00:27:13.860 --> 00:27:15.000
all of the vulnerabilities,

667
00:27:15.000 --> 00:27:16.320
they can be addressed

668
00:27:16.320 --> 00:27:19.560
when we are going to be responsible as developers

669
00:27:19.560 --> 00:27:21.600
on the security of the application,

670
00:27:21.600 --> 00:27:25.140
so the responsibility is on the developer.

671
00:27:25.140 --> 00:27:26.790
So that's pretty much it.

672
00:27:26.790 --> 00:27:28.800
This was the blog

673
00:27:28.800 --> 00:27:33.800
discussing the security quality of AI coding agents,

674
00:27:33.960 --> 00:27:35.023
hope you enjoyed it.