
Training AI to Think Like an Attacker: The Red Team Approach to Risk Engineering

By Omer Yehudai | Feb 16, 2026 | 4 min

TPRM has always been painfully manual, so it’s no surprise the industry is rushing to apply AI to it.

But there is a fatal flaw in the current approach: the market is using AI to automate the exact same process that we already know is broken.

As we’ve written before, checkbox TPRM is dead. Yet the default use case for AI today is simply to read SOC 2 reports, extract clauses, and automatically fill out risk spreadsheets.

The industry is using AI to build a faster auditor. But automating a broken process doesn't reduce your risk; it just helps you ignore it at scale.

At Lema, we didn't build our AI to automate the checklist. We built it to scale our expertise as offensive security veterans.

Standard AI models are built to be helpful assistants. They are polite, agreeable, and trusting. If you hand a standard LLM a vendor’s security policy, it reads it like a junior auditor: "Do they have an encryption policy? Yes. Pass."

That is why they fail at security.

When we look at a target organization, we don't care about their policy documents. We look for the exploitable reality.

To build Lema, we couldn't just fine-tune a model to summarize text. We had to train an agent to simulate the mindset of an elite vulnerability researcher, continuously and at scale.

Here is how we taught our AI to ignore the compliance checklist and think like an attacker.

Phase 1: Context First

An attacker never looks at a target in a vacuum. They start by mapping the Attack Surface.

Before they waste energy trying to break into a vendor, they ask: "If I compromise this, does it matter?"

  • If a vendor has weak security but no access to sensitive data? Ignore.
  • If a vendor has strong security but Write Access to the production codebase? Target.

Standard risk tools fail because they treat every vendor equally. They flag a "Critical Risk" on a marketing tool the same way they flag an engineering tool.

We trained our agents to prioritize based on Context, not just content. By integrating with your internal environment, ingesting data from procurement, and monitoring the vendor’s public footprint, the agent establishes the "Blast Radius" before it even looks at a document. It calculates: If this vendor turns malicious, where can they hurt us the most?

It filters out the noise so it can focus its computing power on the vendors that actually hold the keys to the kingdom.
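To make the triage idea concrete, here is a minimal sketch of blast-radius scoring. Everything in it is illustrative: the `Vendor` shape, the scope and data weights, and the scores themselves are assumptions for the example, not Lema's actual model.

```python
from dataclasses import dataclass

# Hypothetical weights for the example; a real model would be far richer.
SCOPE_WEIGHTS = {"read": 1, "write": 5, "admin": 10}
DATA_WEIGHTS = {"none": 0, "internal": 2, "pii": 6, "source_code": 8}

@dataclass
class Vendor:
    name: str
    scopes: list        # access granted, e.g. pulled from IdP/procurement data
    data_classes: list  # classes of data the vendor can reach

def blast_radius(v: Vendor) -> int:
    """Rough proxy for: if this vendor turns malicious, how much can they hurt us?"""
    return (sum(SCOPE_WEIGHTS.get(s, 0) for s in v.scopes)
            + sum(DATA_WEIGHTS.get(d, 0) for d in v.data_classes))

vendors = [
    Vendor("newsletter-tool", ["read"], ["none"]),
    Vendor("code-scanner", ["write"], ["source_code"]),
]

# Triage: spend analysis effort on high-blast-radius vendors first.
for v in sorted(vendors, key=blast_radius, reverse=True):
    print(v.name, blast_radius(v))
```

The point of the sketch is the ordering, not the numbers: the code scanner with write access to source outranks the newsletter tool regardless of what either vendor's paperwork says.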

Phase 2: Hunting for Signals (The "Bullshit Detector")

Once an attacker identifies a high-value target, they don't look for what the target says. They look for Signals that contradict the narrative.

A vendor’s SOC2 report is a polished marketing document. But their operational reality leaks signals everywhere:

risk_triangulation.log
// TARGET: Vendor_Analysis_Engine
// STATUS: Cross-referencing legal claims vs. technical reality...
> The API Signal
CLAIM: "We don't retain data" (Legal Team)
REALITY: retention_period defaults to "Forever" (Developer Docs)
> The Integration Signal
CLAIM: "Least Privilege" (Security Page)
REALITY: requests Admin scope (Installation Guide)
> The Talent Signal
CLAIM: "We prioritize security" (Trust Center)
REALITY: security_hires_last_18_months = 0 (Talent Data)
-- End of Automated Analysis --

We trained our agents to ignore the "Happy Path" of the documents and instead triangulate these weak signals.

The agent scans technical documentation, public repos, and integration scopes looking for the friction between the promise and the reality. It doesn't ask "Is this compliant?" It asks "Where is the lie?"
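Stripped to its skeleton, triangulation is a cross-reference: structured claims on one side, observed technical signals on the other, and the contradictions are the findings. The sketch below assumes the claims and signals have already been extracted into key/value form; the keys and values are invented for illustration.

```python
# Hypothetical structured signals; in practice these would be extracted
# from legal pages, developer docs, installation guides, public repos, etc.
claims = {
    "data_retention": "none",            # legal team: "we don't retain data"
    "access_model": "least_privilege",   # security page
}
observed = {
    "data_retention": "forever",         # developer docs: retention_period default
    "access_model": "admin_scope",       # installation guide requests Admin
}

def triangulate(claims: dict, observed: dict) -> list:
    """Return the contradictions: claims undercut by an observed technical signal."""
    contradictions = []
    for key, claimed in claims.items():
        actual = observed.get(key)
        if actual is not None and actual != claimed:
            contradictions.append((key, claimed, actual))
    return contradictions

for key, claimed, actual in triangulate(claims, observed):
    print(f"{key}: claimed {claimed!r}, observed {actual!r}")
```

The hard part in reality is the extraction, not the comparison; but the comparison is where "Is this compliant?" becomes "Where is the lie?"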

Phase 3: Stop Blocking. Start Architecting

The "Department of No" is dead. In the alliance economy, blocking a vendor because they have a "B-" security rating doesn't make you secure; it makes you obsolete.

The Auditor looks at a risky vendor and asks: "How do we stop this?" The Risk Engineer looks at the same vendor and asks: "How do we architect the connection so a breach doesn't matter?"

We trained our agents to focus on Impact Reduction. Instead of flagging a binary "High Risk" alert that freezes the procurement process, the Agent analyzes the actual technical usage and proposes architectural countermeasures.

  • The Signal: "The vendor requests Write Access to the repo, but their documentation confirms they only perform code scanning."
  • The Engineering Fix: "Do not block the vendor. Enforce a Read-Only scope at the IdP level."

This is the difference between bureaucracy and security. The business gets to move at velocity, while the Risk Engineer quietly limits the blast radius in the background. It turns TPRM from a gatekeeper into a guardrail.
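The write-access example above can be sketched as a least-privilege recommendation: compare the scope a vendor requests against the scope its observed usage actually needs, and propose a downgrade instead of a block. The scope model and wording here are assumptions for illustration, not Lema's output format.

```python
# Illustrative scope ordering: higher rank = more privilege.
SCOPE_RANK = {"read": 1, "write": 2, "admin": 3}

def recommend(requested: dict, used: dict) -> dict:
    """Propose least-privilege mitigations instead of a binary block/allow.

    requested: resource -> scope the vendor asks for
    used:      resource -> scope its observed usage actually requires
    """
    fixes = {}
    for resource, req in requested.items():
        actual = used.get(resource, req)
        if SCOPE_RANK[actual] < SCOPE_RANK[req]:
            fixes[resource] = f"enforce {actual}-only scope at the IdP level"
    return fixes

# Vendor asks for write on the repo, but its docs confirm read-only scanning.
print(recommend({"repo": "write"}, {"repo": "read"}))
```

The output is a guardrail, not a gate: procurement proceeds, and the connection is architected so the excess privilege never exists in the first place.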

Conclusion: The Digital Red Teamer

We didn't build Lema to help you read more PDFs. We built it to automate the skepticism of a risk engineer.

We are moving TPRM from a passive administrative task to an active defense discipline. By training agents to think like attackers—prioritizing context, hunting for signals, and engineering safe access—we stop managing compliance and start Engineering Risk.

Key Takeaways

Context over Content: Like an attacker, the AI prioritizes vendors based on their access and blast radius, not just their paperwork.
Signal Triangulation: We train agents to find the "lie" by cross-referencing legal claims against technical signals in APIs and docs.
Enablement Engineering: The goal isn't to block tools, but to suggest architectural mitigations (like scope reduction) that allow the business to run fast, safely.

FAQs

How is Lema’s AI different from a standard LLM wrapper?

Most "AI-powered" tools are just legacy workflows with a new coat of paint. They use generic models to summarize SOC 2 reports, which often leads to hallucinations and missed risks. Lema uses a patented zero-hallucination engine designed to think like a vulnerability researcher. We don’t just summarize; we forensically map evidence against your specific control framework and triangulate signals from public data to find what the vendor isn't telling you.

Can I trust the accuracy of an automated assessment?

In security, "close enough" is a failure. Lema provides 100% accurate, audit-ready evidence. Unlike black-box AI, every finding is linked directly to a specific clause or artifact. We’ve moved beyond the "Trust Me" phase of AI into a "Show Me the Evidence" model, ensuring that every assessment is high-fidelity and defensible.

We have thousands of vendors. How does Lema handle scale without adding noise?

Attackers don't treat every target equally, and neither should your TPRM. Standard tools create noise because they apply the same 200-question assessment to a core engineering vendor and a marketing newsletter tool. Lema handles scale through autonomous triage. By integrating with your internal stack, the Agent instantly calculates the actual "Blast Radius" of all your vendors. If a vendor has zero access to sensitive data, Lema deprioritizes or auto-resolves them. It focuses its computing power—and your team's time—exclusively on the vendors that actually hold the keys to the kingdom.

Does Lema replace our GRC or Procurement tools?

No, we make them intelligent. Lema integrates seamlessly with your existing GRC and procurement systems to act as the intelligence layer. We replace the manual, point-in-time spreadsheets and static questionnaires with a continuous, automated system of record for third-party risk.

What happens to our data? Is it used to train your models?

Security is our foundation. Lema utilizes a Zero-Retention Architecture. Your artifacts are processed in a dedicated, isolated tenant, and your data is never used to train our models. For organizations requiring even higher levels of control, we offer a self-hosted model for artifact analysis that runs on your own infrastructure.

Attackers also exploit vendors we don't know about. How does Lema handle "Shadow IT"?

Checklists only work for the vendors you know about. Lema’s AI continuously monitors your environment to detect unsanctioned third parties that have bypassed procurement. We identify these "shadow" relationships and flag high-risk ones immediately, ensuring your visibility matches your actual exposure.

About the Author
Omer Yehudai
Co-Founder & CPO @ Lema AI
Omer Yehudai is the CPO & Co-Founder of Lema, where he leads the product layer for third-party Risk Engineering. A former vulnerability researcher in Israel’s elite Unit 8200, Omer brings security-first thinking and a deeply technical approach to the world of TPRM. He has built products from zero to scale and led R&D teams at startups. At Lema, he’s focused on building a category-defining agentic Risk Engineering platform.