How to measure AI-assisted code
AI coding tools are everywhere. Knowing whether they actually help is a different problem — and most teams answer it wrong. Here's how to measure AI properly: adoption, productivity, and real business impact.
Adoption is not impact
The most common AI metric you'll see is a usage number: "half our pull requests used AI." It feels meaningful, but it only tells you the tool is being used, not that it's working. A team can have high adoption and slower delivery, or low adoption and big gains on the work that matters. To measure AI-assisted code well, separate three questions: are people using it (adoption), is it making them faster without hurting quality (productivity), and is it improving business outcomes (impact)?
Step 1: Detect AI-assisted work
You can't measure what you can't see. AI-assisted code is detectable from signals your tools already emit: commit trailers and co-authorship metadata left by assistants like Claude Code, the Copilot agent, Cursor, and aider, combined with vendor usage data from Anthropic, GitHub Copilot, Cursor, and OpenAI. Treat detection as a floor, not a precise census — it's enough to compare AI-assisted work against the rest.
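As a minimal sketch of the commit-trailer half of detection, the snippet below flags a commit when a Co-authored-by trailer names an assistant. The marker substrings and the function names (`is_ai_assisted`, `ai_share`) are illustrative assumptions, not an official or exhaustive list, and the result is a lower bound: work done with an assistant that leaves no trailer goes uncounted.

```python
import re

# Illustrative marker substrings (assumptions, not an exhaustive or official list).
AI_MARKERS = ("claude", "copilot", "cursor", "aider")

# Git trailers are "Key: value" lines, conventionally at the end of a message.
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z-]*):\s*(?P<value>.+)$")


def is_ai_assisted(commit_message: str) -> bool:
    """Return True if a Co-authored-by trailer names a known assistant.

    For simplicity this scans every line of the message rather than
    parsing only the trailing trailer block.
    """
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            continue
        if match.group("key").lower() != "co-authored-by":
            continue
        value = match.group("value").lower()
        if any(marker in value for marker in AI_MARKERS):
            return True
    return False


def ai_share(commit_messages: list[str]) -> float:
    """Lower-bound share of commits that carry a detectable AI signal."""
    if not commit_messages:
        return 0.0
    flagged = sum(is_ai_assisted(message) for message in commit_messages)
    return flagged / len(commit_messages)
```

In practice the messages would come from your repository history, and the flags would be joined with vendor usage data before any comparison is made.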
Step 2: Measure adoption
Roll the signals into an AI adoption score: breadth (the share of work that's AI-assisted), coverage (how many teams have adopted), and engagement (suggestion-acceptance rate and how many licensed seats are actually active). Track it over time and you see a maturity arc (experimenting, adopting, scaled) plus the friction points blocking wider use. A rising acceptance rate is one of the clearer signs that engineers are learning to prompt well.
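One way to roll breadth, coverage, and engagement into a single score is a weighted average, sketched below. The weights, the 0-100 scale, the stage thresholds, and every name here (`AdoptionInputs`, `adoption_score`, `maturity_stage`) are assumptions chosen for illustration; the text above does not prescribe a formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdoptionInputs:
    ai_assisted_prs: int
    total_prs: int
    adopting_teams: int
    total_teams: int
    accepted_suggestions: int
    shown_suggestions: int
    active_seats: int
    licensed_seats: int


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that treats an empty denominator as zero adoption."""
    return numerator / denominator if denominator > 0 else 0.0


def adoption_score(x: AdoptionInputs,
                   weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted 0-100 score; the default weights are hypothetical."""
    breadth = _ratio(x.ai_assisted_prs, x.total_prs)
    coverage = _ratio(x.adopting_teams, x.total_teams)
    acceptance = _ratio(x.accepted_suggestions, x.shown_suggestions)
    seat_use = _ratio(x.active_seats, x.licensed_seats)
    engagement = (acceptance + seat_use) / 2
    w_breadth, w_coverage, w_engagement = weights
    return 100 * (w_breadth * breadth
                  + w_coverage * coverage
                  + w_engagement * engagement)


def maturity_stage(score: float) -> str:
    """Map a score to a stage; the cut-offs are hypothetical."""
    if score < 34:
        return "experimenting"
    if score < 67:
        return "adopting"
    return "scaled"
```

Keeping the three components visible alongside the combined score matters more than the exact weights, since a single number can hide which dimension is lagging.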
Step 3: Measure productivity — speed and quality together
Speed alone is a trap. The seat price of an assistant is trivial next to a salary, so any time saved looks like enormous ROI — but AI code can take longer to review, raise rework when it's subtly wrong, and add tech debt. Measure AI-assisted pull requests against non-assisted ones on both cycle time and quality (review rework, change failure rate). And watch for the bottleneck shifting: when AI lifts authoring throughput, review often becomes the new constraint.
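The comparison described above can be sketched as a side-by-side summary of the two cohorts, reporting speed and quality together. The `PullRequest` fields are hypothetical stand-ins for whatever your Git host and incident tracker actually record.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    ai_assisted: bool      # from the detection step; a lower bound
    cycle_hours: float     # first commit to merge
    review_hours: float    # review requested to approval
    rework_commits: int    # commits pushed after the first review
    caused_failure: bool   # linked to an incident or rollback


def summarize(prs: list[PullRequest]) -> Optional[dict[str, float]]:
    """Speed and quality metrics for one cohort, or None if it is empty."""
    if not prs:
        return None
    return {
        "count": len(prs),
        "median_cycle_hours": median(pr.cycle_hours for pr in prs),
        "median_review_hours": median(pr.review_hours for pr in prs),
        "mean_rework_commits": mean(pr.rework_commits for pr in prs),
        "change_failure_rate": sum(pr.caused_failure for pr in prs) / len(prs),
    }


def compare(prs: list[PullRequest]) -> dict[str, Optional[dict[str, float]]]:
    """Split PRs into AI-assisted and baseline cohorts and summarize each."""
    assisted = [pr for pr in prs if pr.ai_assisted]
    baseline = [pr for pr in prs if not pr.ai_assisted]
    return {"ai_assisted": summarize(assisted), "baseline": summarize(baseline)}
```

Reporting review time separately from total cycle time is what makes a shifted bottleneck visible: authoring can get faster while review hours climb. A raw cohort comparison like this is descriptive only, because the two groups of pull requests may differ in size and difficulty.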
Step 4: Measure impact — causally
This is where most tools stop short. To know whether AI caused a delivery improvement, you need a counterfactual: what would have happened without it. AI impact measurement uses a difference-in-differences design, comparing how metrics change for adopting teams against a control group of non-adopting teams over the same window. Provided the two groups would have trended alike without AI, that comparison separates the AI effect from everything else going on. Teams are not randomly assigned to adopt, so the result is best framed honestly as association rather than proof, but it is credible enough to put in front of a board.
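In its simplest two-group, two-period form, the difference-in-differences estimate is the change for adopting teams minus the change for control teams. The sketch below computes that from per-team metric values; the function name and the choice of the mean as the summary statistic are assumptions, and a production analysis would typically use a regression with standard errors rather than four averages.

```python
from statistics import mean


def diff_in_diff(treated_pre: list[float],
                 treated_post: list[float],
                 control_pre: list[float],
                 control_post: list[float]) -> float:
    """Two-by-two difference-in-differences on group means.

    Returns (treated change) - (control change). The estimate is only
    meaningful if both groups would have followed parallel trends
    in the absence of AI adoption.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

For example, if median cycle time for adopting teams falls from 42 to 32 hours while control teams fall from 42 to 39 hours over the same window, the estimate is -7 hours: the 10-hour improvement minus the 3 hours that happened anyway.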
Step 5: Connect it to outcomes
Finally, tie measurement to the business. Compute ROI as real tool spend against measured lift, decide which tools earn their seat, and check work allocation — is the capacity AI frees up flowing to growth work, or being absorbed by maintenance? That's the question executives actually care about.
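A rough sketch of the two calculations above: ROI as measured lift against real spend, and the share of engineering hours going to each category of work. Converting lift to money via hours saved times a loaded hourly cost is an assumption made for illustration, as are the function names and category labels.

```python
def roi(hours_saved: float, loaded_hourly_cost: float, tool_spend: float) -> float:
    """Return ROI as a multiple of spend: (value - spend) / spend.

    hours_saved should come from the measured lift, not from vendor
    estimates; loaded_hourly_cost is a hypothetical input.
    """
    if tool_spend <= 0:
        raise ValueError("tool_spend must be positive")
    value = hours_saved * loaded_hourly_cost
    return (value - tool_spend) / tool_spend


def allocation_shares(hours_by_category: dict[str, float]) -> dict[str, float]:
    """Share of total hours per work category, e.g. growth vs. maintenance."""
    total = sum(hours_by_category.values())
    if total == 0:
        return {category: 0.0 for category in hours_by_category}
    return {category: hours / total
            for category, hours in hours_by_category.items()}
```

Comparing `allocation_shares` before and after adoption shows whether freed capacity is moving toward growth work or being absorbed by maintenance.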
Put it together
Adoption → productivity → outcomes is a framework, not a single number. DXSignal measures all three from data you already generate, with the causal design at its core. See how AI Impact works, or estimate the upside with the AI ROI calculator.