Claude Code OpenTelemetry Monitoring: What Employees and Employers Need to Know
May 31, 2026
What Claude Code OpenTelemetry monitoring actually captures
Picture a Monday morning, first day back from leave. You open a terminal. Claude Code shows a consent prompt you have never seen before, asking you to accept a managed setting an employer set, an OTEL_EXPORTER_OTLP_ENDPOINT, the OpenTelemetry endpoint Claude Code will export telemetry to.
You hit no. The prompt comes back the next session. You restart Claude Code. Same prompt. You open Teams and find a thread already running, engineers cracking big-brother jokes, the person responsible for the Claude Code rollout answering questions. You add yours. Why is this being pushed without anyone being told what it captures, who can read it, and how long it is kept.
The choice on screen is binary. Yes, I trust these settings. Or No, exit Claude Code. Red pill or blue pill, with no third option for "let me read what I'm signing first." Anthropic's own warning text on that prompt names the stakes plainly: "execution of arbitrary code or interception of your prompts and responses." That is not the language of cost tracking.
That hypothetical is the whole article. The rest is me thinking about it without picking the easy answer.
What the setting actually does
OTel on Claude Code is not only cost tracking. The Anthropic dashboard and the Claude Code agent SDK already expose cost and usage data without OTel. The reason an organization wires up an OpenTelemetry exporter is usually that they want more than cost: traces, correlation, attribution, auditing, and the ability to join AI telemetry with their existing observability and finance stack. Those are reasonable goals.
The honest part is that OTel only carries what the application instruments and what the collector forward. Depending on implementation, that can include:
- Prompts and responses, as documented in OpenTelemetry's GenAI observability blog and in Anthropic's monitoring guide
- Tool and MCP invocations, including parameters and success or failure states
- File paths read or modified
- Human approval decisions for sensitive tool actions
- Model names, token counts, timing, latency, and error information per Claude Code's monitoring docs
| What OTel can capture | What OTel cannot capture |
|---|---|
| Prompts you typed | Model's hidden chain-of-thought |
| Responses received | Reasoning steps inside the model |
| Tool and MCP invocations + parameters | Content you never submitted |
| File paths read or modified | Data on other employees' sessions |
| Human approval decisions | Local clipboard or IDE state |
| Model name, token counts, latency, errors | Anything redacted at the collector before export |
What it does not include is the model's hidden chain of thought. The pipe can expose what you typed and what came back. It does not expose the model's private reasoning. The distinction matters when imagining the worst case, because the worry should be specific.
OpenTelemetry's own security guidance on handling sensitive data is explicit that implementers must minimize, redact, hash, or drop sensitive attributes rather than assume telemetry is safe by default. Unless that work is done, the same pipeline can become a broad surveillance layer rather than a scoped observability one. The Art of CTO has a good walkthrough of how to set this up in a privacy-aware way.
The honest first question is not "do you trust your employer" It is "what is the collector dropping before it leaves the machine, who configured it, and where is that written down?"
Why companies want this, in good faith
Companies rarely frame this as employee surveillance. They frame it as observability, security, governance, and ROI measurement, and those motives are often real. Datadog's piece on Claude Code monitoring and Elastic's security-labs walkthrough both make the case. So does Jellyfish on Claude Code analytics and Harmonic's security practitioner guide.
If I were the CTO signing off, here is what I'd tell myself.
The company is about to spend serious money on this tool. The native dashboard is thin. I cannot allocate cost by team, correlate usage with outcomes, see where the agent fails, or build the executive dashboard I owe finance. I also have a security problem: Claude Code reads files, writes code, calls tools, and touches infrastructure, which makes it part of the operational attack surface. Traces are how I reconstruct incidents and detect misuse. And I have a training problem: I want to know where engineers are stuck so I can build the right enablement.
Those are real reasons. I could sign off on them.
What I couldn't sign off on is the path of least resistance, where security wants traces, finance wants attribution, legal wants auditability, and HR wants investigatory capability, until a system justified as observability becomes an always-on record of how people think through problems. That is the scope creep that most privacy practitioners flag before any of these rollouts begin.
The same instinct shows up around outliers. In any large organization, some people will misunderstand the tool, some will use it casually for personal tasks, and a smaller number may use it in clearly unauthorized ways. Telemetry feels attractive because it promises evidence when aggregate anomalies appear, as the IBA notes in its guidance on AI in employee disputes. That is also a real problem. But the answer to a few outliers is rarely "log everyone forever."
What employees give up by saying yes
The most important thing many employees give up is not secrecy but cognitive privacy. AI coding assistants create a space where people think aloud, ask naive questions, test bad ideas, and iterate toward a solution. That space becomes less candid once prompts can be centrally searched or reviewed, as the American Psychological Association documents on electronic monitoring.
OTel-backed collection wont expose the model's internal reasoning, but it can still expose what the engineer typed: messy prompts, mistaken assumptions, internal URLs, partial credentials pasted by accident, vulnerability descriptions, abandoned experiments. Even if a policy says content will not be used for performance management, employees can reasonably ask what technically prevents that later if the data already exists. Dutch works council guidance covers this exact question.
There is also a concentration risk. Centralizing rich prompt logs creates a high-value target accessible to internal admins and, in some cases, third-party observability vendors. If the backend is breached, misconfigured, or over-shared internally, the consequence can be much worse than losing ordinary application logs because prompt content often contains business context, architecture detail, and sensitive mistakes in unusually compact form.
The boring worst case
If I were the employee, what would worry me here isn't the dramatic version. It's the quiet one.
Someone in middle management discovers they can rank engineers by prompts per day, rework ratio, average iterations to convergence. It looks objective. It moves into HR conversations. Performance reviews start citing telemetry the engineer did not know was being watched at that resolution. Reputations bend on numbers nobody agreed should exist.
This is not paranoia. It is the natural drift of any rich measurement system that lives near HR. The seat-based SaaS era avoided this problem by accident, because seats did not produce a per-person performance signal. Usage-priced AI plus content-level telemetry produces exactly that signal, and produces it whether the organization meant to or not.
The receipts that should worry an employee are boring on purpose. A session spent on a false lead. A prompt asking how to phrase a hard message to a colleague. A debugging spiral that exposes a mental model of a system before it's cleaned up. None of it is misuse. All of it is intimate.
We treat Teams as a communication channel with privacy rules. AI is also a communication channel, and a more intimate one. The difference is that Teams records a finished message. AI records the rough draft of how you think before the message exists.
The European frame: GDPR, the AI Act, works councils
This is where the conversation in Europe diverges from the same conversation in the US, and that divergence matters even if your employer is global.
GDPR. Employee-linked prompt and telemetry data is personal data. Systematic monitoring of workers is often high-risk processing that requires strict necessity, proportionality, transparency, and usually a Data Protection Impact Assessment. Blanket collection of full AI session content is difficult to justify if less intrusive alternatives can reach the same result.
The AI Act. The Act adds a layer when AI systems are used to monitor or evaluate workers or influence employment-related decisions. If Claude Code telemetry is later used to assess productivity, rank employees, or support disciplinary decisions, the organization risks moving closer to a regulated high-risk employment use case with significantly heavier governance duties, as Recital 57 of the Act outlines. Enforcement is hybrid: the EU AI Office oversees general-purpose model providers, while member states designate national market surveillance authorities for other AI systems, including workplace deployments. In practice, implementation has lagged, which means formal enforcement is real but uneven.
Dutch works councils. In the Netherlands, the practical control point is closer than the regulator. Dutch works councils have consent rights over policies involving employee monitoring. Dutch commentary has specifically highlighted privacy and tracking as areas where works councils should intervene early, and dutch-law.com summarizes the rules on monitoring employees in the Netherlands. If a Dutch employer is pushing OTel without that conversation, that is not a procurement question, it is a works council question.
When the prompt pops up on your terminal in Europe, you have the right to ask whether there is a DPIA, what lawful basis the rollout is using, and whether the works council has been consulted. Asking those questions is not adversarial. It is the thing the law expects.
The psychology of being watched
Electronic monitoring is associated with worse mental-health and workplace outcomes. The American Psychological Association reports that monitored workers were more likely to feel tense or stressed, more likely to rate their mental health as poor or fair, and more likely to say work harmed their mental health than non-monitored workers. European research on AI monitoring finds the same pattern, and a 2023 PMC paper traces the mechanism.
The mechanism matters. Monitoring reduces perceived autonomy, increases impression management, and weakens psychological safety, especially in knowledge work where experimentation and error are part of the job. In AI-assisted coding, this is especially damaging because the highest-value use of the tool often involves exploratory, imperfect, unpolished interaction. The more an organization monitors AI use at the content layer, the more likely workers are to use the tool in a guarded, superficial, less innovative way. Over time, that produces more distant workers, more defensive behavior, and less organizational learning, even if short-term compliance visibility improves.
That is the paradox. The thing you measured got smaller because you measured it.
How this compares with monitoring you already accepted
Organizations already monitor many things. Teams, Microsoft 365, and related compliance suites can give administrators broad access to messages, files, call metadata, presence data, and discovery workflows. Many employees underestimate how normal this already is in corporate environments, including activity tracking inside Microsoft Teams.
Companies can also inspect network traffic at different levels. URL logging records visited domains and timing. SSL or HTTPS interception can, when properly deployed with a corporate root certificate, decrypt and inspect some categories of traffic on managed devices for security purposes, though Broadcom's privacy guidance makes clear that the legal basis in Europe is narrow. In European contexts, this is generally easier to justify for narrow security purposes than for generalized productivity surveillance, and still requires transparency and proportionality.
The crucial comparison is this: email and Teams mostly capture finished communications. Full AI prompt monitoring captures the rough draft of thought and exploration before it becomes a work product. That is why many employees experience it as more intimate and more invasive than ordinary business-system logging.
What a sane rollout looks like
A privacy-aware organization can still measure value and reduce risk without normalizing content surveillance. The defensible design usually looks like this:
- Metadata first. Model, latency, failures, token counts, coarse tool categories. The Claude Code monitoring docs already cover most of this without content capture.
- Redact or drop sensitive attributes at the collector, not after storage. OpenTelemetry's own guidance is the reference.
- Separate observability from HR and performance evaluation. Structural, not "we promise."
- Limit access through role-based controls and audited access paths.
- Short retention windows by default. Long retention only on documented investigation cause.
- Involve works councils or employee representatives before rollout in jurisdictions where required, per Dutch works council guidance.
- Document a DPIA and publish a plain-language explanation of what is collected and why.
That program will not satisfy organizations that want the most complete possible visibility. It is, however, far more likely to survive legal, ethical, and cultural scrutiny.
The outlier problem, treated like adults
Organizations face real misuse problems. Some employees do not understand what data should never be entered into an AI tool. Some use company tools for clearly personal work. A smaller number behave recklessly or opportunistically. But this is not a new governance problem. The closest analogies are email misuse, internet misuse, BYOD, and phone-system abuse. In each case, mature organizations eventually converged on the same pattern: clear acceptable use policies, training, aggregate monitoring, technical controls, and targeted investigation on cause, rather than permanent content-level surveillance for everyone. MWH's employee monitoring overview frames the same playbook.
A proportionate response to AI tool misuse typically includes:
- An AI acceptable use policy with examples of permitted and prohibited use
- Mandatory onboarding and periodic refresher training
- Budget and rate limits per user or team
- DLP or pattern-based controls for credentials, PII, and regulated data where feasible
- Aggregate anomaly detection for unusually high usage or unusual patterns
- A documented escalation path for targeted review requiring authorization and oversight
This is usually more defensible than building a system that records every prompt by default.
Where I'd change my mind
I'd flip to yes the day an organization showed four things in writing.
Scope. Metadata only, with explicit redaction of prompt and response content at the collector.
Access. A short named list of who can read what, with audited access paths.
Retention. Short windows by default. Long retention only on documented investigation cause.
Separation. Telemetry stays out of HR and performance evaluation. Structural, not promised.
That is not unreasonable. It is the same thing mature organizations did with email, internet, and BYOD after the first decade of getting it wrong.
Conclusion
Saying yes to Claude Code monitoring is not agreeing to a setting. It may be agreeing to a new relationship between workers, employers, and AI-assisted thought. For employers, the appeal is real: observability, governance, cost control, auditability. For employees, the concern is equally real: once the system can capture prompts, responses, and tool traces, it can become a mechanism of visibility far beyond ordinary software telemetry.
The strongest answer is usually neither absolute trust nor absolute rejection. It is a demand for scope, safeguards, proportionality, and alternatives. Where those do not exist, privacy-aware employees are rational to prefer local models, self-hosted options, or tools with narrower and more transparent logging. Where organizations want adoption without alienation, they should design for minimum necessary monitoring rather than maximum possible capture.
A managed setting in your terminal is not the form factor of a policy decision. It is the form factor of a checkbox. Treat it like the policy decision it actually is.
Frequently Asked Questions
Does saying yes automatically mean my employer reads every prompt?
Not necessarily. The setting enables the capability for telemetry export, but what is actually stored and reviewed depends on the organization's collector configuration, backend, retention policy, and access controls. The problem for employees is that capability without clear scope is already a meaningful risk.
Can a company claim this is only for cost tracking?
It can claim that, but basic cost tracking already exists through native Claude Code and Anthropic usage tooling. OpenTelemetry becomes more attractive when the company wants richer tracing, correlation, investigation, and potentially content-level visibility.
If I use a different AI tool, am I private again?
Not automatically. Vendor telemetry, company API keys, network logging, endpoint agents, or SSL inspection may still expose usage depending on the setup. True privacy is strongest when prompts stay local and off the company network.
Are local models good enough to replace Claude Code?
For some tasks, yes. For others, not yet. Local models can be excellent for routine coding, scaffolding, autocomplete, and low-sensitivity work, but frontier hosted models often remain better at deep reasoning and complex agentic tasks unless the employee has strong local hardware.
Can my employer already see my Teams messages and files?
In many Microsoft 365 environments, administrators can access broad compliance and discovery capabilities for messages, files, and related metadata. That is one reason prompt monitoring feels like a boundary shift: it moves AI conversations into the same visibility zone as enterprise messaging, but at the level of rough drafts rather than finished communication.
Should I ask for a DPIA or works council approval?
In Europe, especially in the Netherlands, yes. If AI prompt monitoring is being rolled out, asking whether there is a DPIA, what lawful basis is relied on, and whether the works council approved the system is the right move.
Is full-content monitoring necessary to measure ROI?
Usually not. ROI can often be estimated with metadata, cost and usage dashboards, surveys, output metrics, and targeted sampling rather than always-on content capture. Full-content logging may improve investigation and analysis, but it also raises legal and trust costs.
Can the AI Act or GDPR really apply to developer prompt logs?
Yes. If prompt logs can be tied to identifiable employees, GDPR clearly applies, and systematic monitoring can trigger heightened duties such as a DPIA. If AI tooling is used to monitor, evaluate, or influence decisions about workers, AI Act issues become more relevant as well.
What is the safest implementation path?
Start with metadata-only collection, aggressive redaction, short retention, role-based access, a documented escalation path, and a clear separation between technical observability and HR decision-making. Treat content capture as an exception-based capability, not a default.
What about employees who misuse the tool for personal work?
That is primarily a policy and governance issue, not a justification for universal content surveillance. Mature responses include acceptable use policies, training, anomaly detection, rate limits, DLP, and targeted investigation on cause.
Can HTTPS interception solve this more quietly at the network level?
In some environments, HTTPS interception can expose traffic to many web services on managed devices, but it is a blunt instrument, legally sensitive, and often technically limited by certificate pinning or app behavior. It is generally easier to justify for security filtering than for broad employee productivity surveillance.
What should a transparent internal communication say?
It should state what is collected, what is not collected, where it is stored, who can access it, how long it is kept, what legal basis is relied on, whether works council consultation occurred, whether content is used in HR decisions, and what alternatives exist for higher-sensitivity work.
Related articles
One call. We'll show you exactly what we'd build with your team.
No pitch decks. No generic proposals. Just a conversation about your workflows and what we can automate together.