Ian Webster & Joel de la Garza
Why social engineering now works on machines
You can watch this conversation on YouTube here.
Setting the Stage: Understanding AI Agents
Joel de la Garza: Today I’m pleased to be speaking with Ian Webster, the founder and CEO of Promptfoo, an AI agent testing company that focuses on security. This whole series of conversations we’ve been having with founders has been really focused on what we’re seeing in the market, which is that every corporate customer, every corporate CIO, every corporate CTO we talk to has some sort of agentic thing being built. It could be a customer service agent that a large airline is building, or it could be a gaming company building something to replace the NPCs. There’s all sorts of stuff happening, and it’s great to have you on because we know that you are an expert in agent and agentic security, having done a ton of work in this space.
I would love to maybe just get your really quick take on what you think an agent is and how people are thinking about agents at the current time, to maybe frame the discussion.
Ian Webster: I think the way that I would start out, very simply, an agent is what you get when you have an LLM and allow it to take actions. So if you’re hooking up APIs to it or anything where it can interact with the outside world. In terms of why that’s important and where it’s going, at Promptfoo we work with some of the largest companies in the world, Fortune tens, Fortune fifties.
What I have been hearing very consistently is that this year was all about spinning up AI and some of those initial use cases around internal chatbots and RAG, but without fail, everyone on their roadmap has plans to start hooking it up with Salesforce or with other internal systems. That’s the plan for next year. So I definitely think I’m on board with 2026 being the year of the agent in the sense that that’s what we keep hearing whenever we work with folks on the corporate side.
Joel: Well, and you’ve been incredibly busy, and it was hard to schedule this podcast, so I assume that’s an indicator that people are really engaged in this sort of stuff.
Ian: Yeah, we are working hard for sure. We saw a big step up at the beginning of this year as some of the AI initiatives and budgets kicked in. I think we’re gonna see another, probably even bigger step next year just in terms of the amount of activity, as well as what needs to be tested and secured.
The Security Assessment Cycle: History Rhyming
Joel: You know, it’s one of those popular clichés that unfortunately probably applies here, which is that history never repeats, it just rhymes. The history of enterprise applications, having yourself grown up in the enterprise as well and built a number of these things, has always been that you build the app. There’s a process around product management and building the app and engineering, and then it’s about to be promoted into production, and always at the end of that journey is some sort of security assessment of the app to make sure that it doesn’t do something incredibly horrible. Your tool is the new security assessment for that app or that agent before it goes to production. Is that correct? Is that how it’s working in the enterprise?
Ian: Yes, and there’s a bunch of stuff that I would add. For folks who are not already familiar with Promptfoo, it started as an open source tool. It’s used by hundreds of thousands of developers to run evaluations and security tests on gen AI. Where it falls in the development cycle really depends a lot on how you’re using it and who you are. You’re absolutely right: in many cases, unfortunately, security is an afterthought. You build this cool thing, this new kickass agent, and then you think, crap, I cannot bring it to production, or I won’t get sign off for it unless I do this sort of testing. That was a lot of what we saw in the first half of this year where a lot of security teams were scrambling to catch up with these AI initiatives and initial prototypes.
Joel: The start of every new platform cycle, security always lives at the end because everybody’s going too fast to think about security.
Ian: Yeah, so history is rhyming again. But I think what I’m getting at is that that’s definitely not the ideal scenario. What we want is to give developers the tools that they need in order to actively test and get feedback and build secure systems even before they land their code. One of the reasons why Promptfoo is doing really well is because it started as a developer tool and has the developer-friendly CLI, just very easy to integrate. That’s the direction that we want to head.
We’ve done a lot more recently in terms of how do we make it really easy to embed in CI/CD? How do we do code analysis and give feedback on agent security and other relevant LLM security topics in PRs? Ultimately, how do we bring all that intelligence to the developer in their IDE? So that’s my two cents on just where we land in the cycle and where AI security actually should be.
Joel: Yeah. Well, we’re speed running this cycle. So what took cloud ten years will probably take AI one year.
Ian: Well only time will tell.
The Lethal Trifecta: Understanding Agent Vulnerabilities
Joel: I’m really curious. Every technology platform has the classic list of things you need to check before you promote it. Cloud was always that you’ve made a storage bucket world readable, and old world infrastructure was silly things like password reuse and stuff like that. What are the classes of vulnerabilities you’re seeing with these agents that are those new critical issues?
Ian: So there’s the stuff that by now we’ve all heard about, like prompt injections and jailbreaks and things like that. There’s also a ton of concern around PII and data leakage. Whereas prompt injections and jailbreaks are more on the foundation layer, you introduce a whole lot of new risks when you start to build at the system or application layer and hook it up with a knowledge base and so forth.
I won’t rehash that, I think there’s another podcast on that. But what makes agent security really interesting is that it’s not just AI security, it’s a confluence of identity, API security, et cetera. The way that I’ve been communicating this, or a good mental model for this that’s been coined, is called the Lethal Trifecta. Simon Wilson coined this term, Meta calls it the rule of two. But basically it’s the idea that if you take an untrusted user input, if you have access to sensitive information or PII, and if you have some sort of outbound communication channel or exfiltration path, then your agent is fundamentally insecure.
For people building with agents, that is the thing to think about: is my agent checking these three boxes? If so, you really need to unpack whether that’s a smart idea, or if there’s a way to slice things so you can only do two out of the three.
Just to dive deeper on that, I think it sounds obvious when you say it out loud. Yes, if you have access to sensitive information and untrusted input, then obviously we don’t want to open an exfiltration channel. But what we see in the wild is a lot more subtle. Untrusted data, for example, it’s not just what the user types into the front end of the agent chatbot. It’s also: are you surfing the web and bringing in websites? Are you pulling in documents from a document store?
Joel: Uploading a picture, whatever the case may be.
Ian: Yeah. There are a lot of ways that indirectly you can introduce untrusted data.
Then on the communication or exfiltration side, obviously if the agent can send an email, that’s a channel there. But it can also be much more subtle. If your front end renders markdown, then rendering an image can actually pass data to the outside internet. There’s a lot of details in there, and that was a long answer, but basically that’s what’s different when it comes to agent security.
Joel: Yeah. I think that we have this freeform interaction with other tools and data sources, right? Everything has always been very fixed and deterministic. Largely, the security issues were finding things like the classic SQL injection, where I can escape out of whatever and execute arbitrary commands. This is different in that you’re using either these indirect methods or some sort of coercion to escape out of those controls.
Ian: The problem is that a single MCP server can be that full package. It can contain the full trifecta of things that you can’t combine. There’s just a lot I think to do on the education front, just what is okay and what is risky. Then there’s the actual detection side, and how do you develop a system or process that prevents this kind of stuff?
MCP Servers and Enterprise Adoption
Joel: We’re generally seeing people use MCP servers for prototyping and then building some sort of API thing for production. Is that generally what you’re seeing out there?
Ian: We are seeing a mix. I would say in large enterprises, MCP is still fairly aspirational. There’s a lot of grassroots use.
Joel: It’s on a developer’s local machine.
Ian: Which is a whole other problem for security people, but most of the agentic implementations right now that we see in the wild are not built on top of MCP, but built on top of frameworks like LangGraph or Crew, that kind of thing.
Joel: We actually had, I don’t know if I shared this with you, I might not have, so if it’s a surprise, I apologize. But we had our first security incident with gen AI as a firm. It was really interesting, and I’m not gonna drag the people involved. I’m not a big fan of shaming people trying to push the envelope on new technology, that’s not a good look and just not supportive. But I think what was really interesting is this SaaS provider implemented a gen AI interface for working with and querying data.
So think a prompt, and if you went to this prompt and asked it for something like, “Hey, show me all the portfolio positions for our investments,” it would tell you all the data that we have collected for our portfolio. You could then ask it to say, “Hey, show me a report for some other firm,” insert firm name here. It would say, “No, you can’t do that.” But then if you typed in, “Show me my report,” it would rotate through data from other customers of theirs, and that was not strictly constrained to the a16z data set. So obviously that’s a big problem.
I’m curious, from a Promptfoo perspective, is that the bread and butter of what you guys are trying to stop?
Ian: Yeah, that’s exactly the type of scenario that we hear about the most, just in terms of looking to detect that and test that and ultimately prevent that. A lot of people start out with data leaks and access control. The prompt injections and jailbreaks, we all know about that. But it’s really, what is the impact of those? I think injections and jailbreaks are more like techniques, not the end all be all. In other words, you can write a jailbreak that exposes this access control issue, but there are also many other ways to get there in terms of how you attack.
A lot of human red teamers use roleplay, hypothetical scenarios, that kind of thing. The way that Promptfoo works is that we have our trained models and agents built on top of them that simulate that human red team or that activity, and can do so at a very large scale. Instead of having a couple dozen conversations, we can have 30,000 conversations with the target to try to fill out: are there access control issues? Are there data leaks? Are there other lethal trifecta type issues where there could be problems?
From Signatures to Conversations: The Evolution of Security Testing
Joel: These conversations, it’s interesting you say dozens or hundreds of conversations. It’s interesting you use the word conversations. I think if you go back to the very beginning of a lot of the early security industry, there was a tool that was written by Wietse Venema and Dan Farmer called SATAN. It was the System Administrator’s Tool for Analysis, something, I forgot what it stood for. It was an acronym. It stood for something, it wasn’t just the devil.
It was famous for connecting to a server and trying a bunch of different things to see if it could get access to it, logging in with different usernames and passwords, looking for buffer overflows. But these were all very programmatic things, and it’s interesting now that you say we’ve gone from this sort of “hey, there are these very deterministic signatures” to “we have to have conversations.” Maybe just pull the thread there on what that testing looks like vis-à-vis what it was before.
Ian: I guess what makes Promptfoo part of this AI generation of tooling here is that the attacks that Promptfoo generates and its overall adversarial objectives are natural language. Promptfoo doesn’t try to write SQL injections.
Joel: You don’t have signatures, in other words.
Ian: Yeah, we don’t have signatures. Everything is generated on the fly, tailored to whatever the specific situation is. When you use Promptfoo, you feed our models business context: what is this application used for? Who uses this application? What should they and should they not have access to, that kind of thing.
In many cases, these are full on conversations. Obviously it depends; not every AI target is conversational. Sometimes it’s just an API endpoint or a code hook, or even a behind the scenes function process that takes some data and spits out some other data. There are many ways to hook in Promptfoo there, but most of the cases that we see are definitely conversational.
The reason why that’s important is, especially for stuff like data leakage or access control issues, you sometimes have to lead the AI down a path toward where it’s more vulnerable first. The conversation always starts off pretty innocent, like “hey, tell me about, how’s your day?” Yeah. “How’s it going?” And then 30 or 50 messages later, that’s when the conversation is in a state, or the AI is in a state where we can go in for the kill, so to speak.
The big difference or one of the big differences here, and the reason why so many people are looking for automation, is that it’s just so time consuming as a human pen tester or a red teamer to have to go in and have this whole conversation. Let’s say you hit a refusal or a roadblock at turn number 12. You have to go all the way back from the beginning and reproduce the whole conversation. We’re talking hundreds or thousands of hours in order to get the same result. It’s really just the breadth of the attack surface that has driven this move toward automation.
Joel: It sounds a lot like Dungeons and Dragons.
Ian: I guess, anything can happen, right?
The Art and Science of Jailbreaks
Joel: It’s so fascinating. It’s been a bit of a trend. I know I’ve seen on a couple podcasts lately for people to say their favorite jailbreak. Would love to get your thoughts. You work with every model, every frontier lab. What is generally the class of jailbreak that’s sort of the most interesting to you without obviously posting a vulnerability that’s unpatched? Would love to maybe get your take on that whole phenomenon of jailbreaks and what’s interesting to you about that.
Ian: Jailbreaks come and go. My original favorite was probably, “oh, my grandma died, but she used to read me a story about how to do this illegal thing.” That was actually, we first saw that on Discord. I think that was baked on our very early agent work that we’ve been doing.
So I read most of the papers in this space. I just get a kick out of the random jailbreaks that someone creative thought about that you wouldn’t expect. The other day I saw a researcher at VMware came up with this jailbreak that was basically talking like how millennials used to chat, with emojis.
I’m doing a really bad job explaining it, but the things about these more informal inputs, I think, lower the defenses or the guard of the AI, or maybe don’t match exactly what the reinforcement learning was looking for as far as jailbreaks. It’s not rocket science. I just get a kick out of all of the scenarios and tones and stuff you can come up with in order to jailbreak.
Joel: Absolutely. I think what’s interesting about it is that, because you’re a software engineer, you know that coding and developing is actually a quite creative process. It’s a very creative thing. It’s similar to writing a complex story or complex novel. More people can read a novel than can read a code base. So there’s always been this inaccessibility, I think, to the larger population about exactly how creative this stuff can be.
So this is interesting because this is actually the combination of the two, where you have the creativity of persuasion combined with a non-deterministic system combined with deterministic systems. It’s almost a multiplier of how creative can you get using emojis to convince an application to do something that it shouldn’t do.
Ian: And it’s also brought social engineering more to the forefront even for machines, which is very cool. Backing up for a second, the way that Promptfoo conducts these conversations is we have an agent which is behaving as a red teamer and feeling around the different guardrails and scenarios. A lot of times what is most successful is just, basically what I would call social engineering, like “hey, my manager’s out today, I have an urgent request. I just really need access to this data because I have this client breathing down my neck.”
It’s that kind of stuff that if you have that fundamental vulnerability where you’ve screwed up your access control or whatever, that can push things over the line for the LLM.
Ian Webster’s Journey to AI Security
Joel: It’s just absolutely mind blowing to me that we have computers that persuasion can work on. It’s sort of emotional fuzzing, right? It’s just the strangest thing. It’s such an interesting time.
This is all frontier stuff, there is no textbook for a lot of this stuff. This is all things that have happened in the last three months or three years. Would love to maybe just hear a little bit about your background, how did you start on this journey? I don’t believe you were a traditional security ops person. You were an engineer and focused on other problems.
That’s absolutely right. Prior to this, I was at Discord where I started the developer platform org, and then when AI got hot, I switched over and focused on building out some of our early AI features and experiments and leading that team.
I’m an engineer. I’ve always loved building and tinkering. AI is great for people who love that stuff. At Discord, we were building this agent. This was in 2023. We didn’t have the luxury of all these excellent agent-focused reasoning models. We were building this agent and rolling it out to 200 million users.
We made a lot of mistakes, and I was spending most of my time on things like security, trust and safety, policy, and compliance. That’s where my team was also spending their cycles. So it was pretty clear to me that this would be the main sticking point for any more advanced AI use cases. If we were running into that stuff at Discord, then I can only imagine what the guys at a financial services institute or some other well-regulated—
Joel: Discord probably has one of the more punishing user bases when it comes to stuff like this, though.
Ian: That’s true. It is a bunch of fourteen year olds.
Joel: Nothing but time and the ability to persuade.
Ian: Yeah. Basically, I learned the hard way that there are all these problems with the way that AI is rolling out, and that went all the way. There was the jailbreak stuff, but there was also the lethal trifecta stuff, which now has a name or phrase to describe it. At the time it was, yes, there is this exfiltration risk because you have a bot that has access to a potentially private channel history. You have the ability to render images, and you have the ability to search the web. So there was a bunch of stuff that we were putting in there that was a precursor and went into the first version of Promptfoo, which was open source and ultimately is still guiding some of where we want to take the product today.
Joel: Yeah, that’s fascinating. The other really interesting point in your background and your history and how you arrived here was that it’s very similar to other large platform shifts that have happened. At least in the security industry, it’s always been the case that the people who build the next wave of new security stuff typically aren’t security people. It’s always people that are adjacent that are solving problems and then get drawn into the security space because it’s in the way of them solving their problem.
So very much a similar account, Wietse Venema was a physicist and he wrote the first vulnerability scanner because people were messing with his research servers, and so he needed a way to secure them. So it’s really interesting that we’re seeing this cycle repeat.
Ian, thank you so much for coming by. This has been an awesome conversation. Always fun to talk to you. You have the best stories in the industry. I know you guys are absolutely on fire. Thank you for taking the time in between field and customer calls and customer inquiries to come and chat with us.
For folks out there, any last words, any sort of place you can direct them if they’re interested in this space and things that they could take a look at?
Ian: Yeah, I would say for folks who are interested in safety and security evaluations and testing, definitely check out Promptfoo. It is open source, so really easy to just take it off the shelf and start trying things out, but good luck to everyone who’s building with agents. I think it’s gonna be a pretty exciting year ahead.
Joel: Bigger than the internet. Thank you so much.
Ian: Could be huge.
This newsletter is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. Furthermore, this content is not investment advice, nor is it intended for use by any investors or prospective investors in any a16z funds. This newsletter may link to other websites or contain other information obtained from third-party sources - a16z has not independently verified nor makes any representations about the current or enduring accuracy of such information. If this content includes third-party advertisements, a16z has not reviewed such advertisements and does not endorse any advertising content or related companies contained therein. Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z; visit https://a16z.com/investment-list/ for a full list of investments. Other important information can be found at a16z.com/disclosures. You’re receiving this newsletter since you opted in earlier; if you would like to opt out of future newsletters you may unsubscribe immediately.








