When writing the code is the easy part

June 12, 2026

We gave an AI read access to our production systems. Here's why, and how we made sure it can't change a thing.

Engineering Manager

“Why did system A not process this event from queue B?”

It’s Tuesday. I’ve been tasked with investigating a monitor that’s flagging a potential issue. After verifying that I do, in fact, work at my company by completing two-factor auth (several times), I find that our log aggregator isn’t showing the logs that would explain why this transaction started but never completed. Later, I realize I was searching the wrong log facet. My LaCroix is warm.

Making a feature work once is coding. Making it work every time is engineering. And like any engineer, we wanted to automate away the manual work of ensuring quality. So we took a leap.

We gave an AI read access to our production systems.

At a security-conscious company, that sentence should make you a little nervous. Here's why we did it, and how we made sure we didn’t build the next Skynet.

Writing code is the part of the job that AI already does well, and we lean on it all day. But writing code was never the part that kept us up at night. The hard part comes after you ship: confirming the thing actually works, noticing when it quietly stops, and chasing down why. That's where the quality of a product gets decided, and the trickiest parts of that work had never been automated.

It matters more for us than it might somewhere else. Envoy started as the iPad you sign in on at the front desk. We still do that, but the company has grown into a security and compliance platform, and the data we hold reflects it: access logs, identity, audit trails, a record of who went where and when. When that's what you're storing, a feature that's only "mostly working" isn't a cosmetic problem. A gap in it is a security gap. So "is this doing what it's supposed to, in production, right now?" is a question we ask constantly, and more rides on the answer than it used to.

The part that isn't writing code

Answering that question by hand is tedious. Checking whether a feature works end to end means visiting four or five tools, each with its own login and its own way of showing you half the picture. Debugging an incident is the same scavenger hunt, except now something is actually broken and the clock is running. You spend more time clicking between dashboards than thinking about the actual problem.

This is the sort of thing a model is genuinely good at. It can keep a dozen systems in its head at once, follow a single event through all of them, and point at where things went wrong. The catch was always access. You don't hand an AI the keys to production casually, and at a security company you don't do it at all until you have a very good answer to "what happens if it breaks something."

Illustration of a person analyzing interconnected data, cloud, and security icons in a network.

What we built

Our answer was a data harness. It's a read-only window into production that the AI can look through: you ask a question in natural language, it goes and finds the answer, and it tells you what it found.

We started with a Lumos-based access request model: any request for read-only data starts with a human-initiated, human-approved ticket. If the ticket is approved, the engineer gets a short, time-limited window in which they can access read-only replicas through an MCP (Model Context Protocol) server. That server, which exposes the tooling to the AI, is internal-only, and its implementation details aren't visible to the model, so the model has to go through the MCP interface instead of hallucinating its own ways to retrieve data. The interface is designed around keeping data private: we keep all customer data away from the model that calls it. Anything sensitive is swapped for a stand-in before the LLM sees it, and each stand-in is consistent enough that the LLM can still tell two records belong to the same person without ever learning who that person is. Every relational database column is mapped in advance, and the MCP replaces content agnostically, even in columns with no hint about sensitivity. Everything fails closed, too. If a safety step can't run for some reason, the request just dies rather than falling back to handing over the real data.
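To make the stand-in and fail-closed ideas concrete, here is a minimal sketch that assumes a keyed HMAC-SHA256 as the pseudonymization technique; the post doesn't say which technique Envoy's MCP server uses. `COLUMN_POLICY`, the column names, and the hard-coded key are all hypothetical. The same input always maps to the same token, so two rows for one visitor still line up, and a column that isn't in the map aborts the whole request instead of passing a raw value through.

```python
import hashlib
import hmac

# Hypothetical key; a real deployment would load and rotate it via a secrets manager.
PSEUDONYM_KEY = b"hypothetical-session-key"

# Hypothetical column map, prepared in advance. Every column the tool can
# return must appear here; anything else is treated as a failure.
COLUMN_POLICY = {
    "event_id": "pass",
    "status": "pass",
    "visitor_email": "pseudonymize",
    "visitor_name": "pseudonymize",
}


class RedactionError(Exception):
    """Raised when a value cannot be shown to be safe for the model."""


def pseudonym(value, key=PSEUDONYM_KEY):
    # Keyed HMAC-SHA256 is deterministic for a given key, so the same input
    # always yields the same token, but it can't be reversed without the key.
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon_" + digest[:12]  # truncated for readability


def redact_row(row):
    clean = {}
    for column, value in row.items():
        policy = COLUMN_POLICY.get(column)
        if policy == "pass":
            clean[column] = value
        elif policy == "pseudonymize":
            clean[column] = None if value is None else pseudonym(value)
        else:
            # Fail closed: an unmapped column aborts the request
            # instead of leaking a raw value.
            raise RedactionError(f"no policy for column {column!r}")
    return clean


def redact_rows(rows):
    # Every row is redacted before anything is returned; a single failure
    # raises, so no partial result is handed back.
    return [redact_row(row) for row in rows]
```

The key matters: with a plain, unkeyed hash, anyone could hash a list of known email addresses and match them against the tokens.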

The hard part was never making it powerful. It was making it safe, and at our compliance bar, "pretty safe" doesn't count. So we didn't write a polite instruction asking the model not to change anything. We made changing things impossible: read-only users, replica-only access, strict database authorization. We built a custom MCP server that enforces guardrails, security measures, and anonymization as a deterministic pipeline. Access is read-only at every layer we could put a layer on, and the commands that would let it write or delete just aren't part of the tool, so there's nothing there for it to call. If one of those guardrails ever failed, the others would still hold.
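A minimal sketch of the layering idea, using Python's built-in `sqlite3` as a stand-in for a production replica (the post doesn't name the database). The three layers are assumptions chosen for illustration: a connection opened in SQLite's read-only URI mode in place of a read-only database user, a deliberately simple SELECT-only check, and a tool registry with no write tool in it. `open_replica`, `run_select`, and `TOOLS` are hypothetical names.

```python
import pathlib
import sqlite3


class GuardrailError(Exception):
    """Raised when a request is rejected before it reaches the database."""


def open_replica(path):
    # Layer 1: the connection itself can't write. SQLite's read-only URI mode
    # stands in here for a read-only user on a read-only replica.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def check_statement(sql):
    # Layer 2: a deliberately simple syntactic check. It is not a SQL parser
    # and isn't trusted on its own; layer 1 still backs it up.
    if not sql.lstrip().lower().startswith("select"):
        raise GuardrailError("only SELECT statements are allowed")
    return sql


def run_select(conn, sql, params=()):
    # The one query tool: validate, execute with bound parameters, return rows.
    return conn.execute(check_statement(sql), params).fetchall()


# Layer 3: the complete set of tools the model can call. There is no write
# or delete tool to misuse because none is registered.
TOOLS = {"run_select": run_select}
```

The prefix check is not what's being trusted here. If something slips past it, the read-only connection still rejects the write, which is the "others would still hold" property described above.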

Illustration of data flow from servers through security, user verification, and sync to a laptop user.

The change nobody wants to make

One use case we didn't see coming is the scary stuff. Every team has a change that's been parked on the backlog for a year because nobody's quite sure what it will break, and the longer it sits, the worse that gets. Once an AI can read what's really in production (not just the code, but the actual state: the real shapes and volumes of the data and the edge cases nobody wrote down), it can walk through where a change would go wrong before you run it on anything real. That doesn't so much make the work faster as make it possible. Some of those tech-debt improvements were stuck for one reason only: we couldn't see well enough to feel safe starting. Now we can create checkpoints, do a dry run, and inspect the data after every step to answer the real question: should we trust this migration?
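A minimal sketch of a dry run with checkpoints, again using `sqlite3` and assuming the rehearsal runs against a disposable copy of the data rather than the read-only replica itself (the post doesn't say where dry runs execute). Each step is applied inside one transaction, the data is inspected after every step, and everything is rolled back at the end. The `visits` and `sites` tables, `STEPS`, and `inspect_visits` are hypothetical.

```python
import sqlite3


def dry_run(conn, steps, inspect):
    """Rehearse a migration: apply each step, record a checkpoint, then undo it all.

    steps   -- list of (name, sql) pairs, applied in order
    inspect -- callable(conn) -> dict describing the data at that point
    Assumes no transaction is already open on conn.
    """
    previous = conn.isolation_level
    conn.isolation_level = None  # take explicit control of the transaction
    checkpoints = []
    try:
        conn.execute("BEGIN")
        checkpoints.append(("before", inspect(conn)))
        for name, sql in steps:
            conn.execute(sql)
            # Checkpoint: what the data looks like after this step.
            checkpoints.append((name, inspect(conn)))
    finally:
        # Always roll back: this is a rehearsal, not the migration.
        # SQLite DDL is transactional, so schema changes are undone too.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        conn.isolation_level = previous
    return checkpoints


def inspect_visits(conn):
    # Hypothetical inspection for a hypothetical backfill of visits.site_id.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(visits)")]
    observed = {"rows": conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]}
    if "site_id" in columns:
        observed["unmatched"] = conn.execute(
            "SELECT COUNT(*) FROM visits WHERE site_id IS NULL"
        ).fetchone()[0]
    return observed


# Hypothetical migration: add a site_id column and backfill it by site name.
STEPS = [
    ("add_column", "ALTER TABLE visits ADD COLUMN site_id INTEGER"),
    ("backfill",
     "UPDATE visits SET site_id = "
     "(SELECT id FROM sites WHERE sites.name = visits.location)"),
]
```

Run against a copy where one visit's location is stored as `'hq '` instead of `'HQ'`, the final checkpoint reports one unmatched row (exactly the kind of edge case nobody wrote down), and the table comes back unchanged afterward.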

What actually changed

The difference is mostly speed. Checking whether a feature works in production used to take half an hour of poking around; now it's closer to ten seconds. Working out why something broke used to mean an hour of dragging in other teams, and these days it's usually a minute or two. A customer bug that would have eaten a day or two of cross-system detective work tends to get a verified fix the same morning. And the daily "is anything on fire" check stopped being a row of browser tabs and turned into one question we type once.

None of this made our AI write more of our code. It gave the AI a safe pair of eyes on the place the code actually lives. The check that used to cost an afternoon costs a sentence. The incident that used to mean five dashboards and a guess is now a question with an answer attached. And since you can run those same questions on a schedule, sometimes the answer comes back before a customer has noticed anything is wrong. That last one is the version we care about most.

Illustration showing complex data streams with errors flowing to a central icon, then verified to a growing bar graph.

For a company whose systems now help decide who gets through a door, that isn't a nice-to-have. It's how we keep the bar where it needs to be while everything around it gets more serious. Our engineers ship features faster, more safely, and with more confidence than ever before. It’s night and day. For anyone building software today, we highly recommend investing in visibility engineering.

If you’d like to help us solve interesting problems at serious scale across complex products, join our team at envoy.com/jobs.

AUTHOR BIO

Dave Mun

Engineering Manager

Dave Mun is an Engineering Manager at Envoy, where he leads the Physical Identity and Access Management platform team. With a background spanning real-time communications, workplace technology, and physical security systems, Dave has spent nearly seven years helping build the infrastructure that powers secure, connected workplaces. He specializes in distributed systems, access control, AI-enabled workflows, and scaling engineering teams to solve complex, real-world problems.