AI Across the Engineering SDLC - A Cleverbit Client Case Study
CASE STUDY
Like most engineering teams, this client had started using AI in the way that felt most natural: individuals prompting a chat tool when it was useful. The output varied by person and by prompt. It helped sometimes, but it did not change the consistency of what the team shipped, and it introduced a new problem: AI-generated code that looked right but was not grounded in how the product actually worked.
The client was ambitious and wanted to change how it was using AI.
The goal was to build AI into the way the team works, in a way that was consistent, traceable, and safe to rely on.
"We were the first to identify the stronger AI provider versus the obvious incumbent. We started exploring sub-agents and skills as early as possible, which allows teams with different backgrounds to build consistent-looking code. Before, the code used to look much more fragmented."
Engineering Lead
How we approached the situation
Three decisions shaped the whole engagement.
- The first was choosing the right foundation. The client worked with Cleverbit’s R&D team to evaluate available models against their actual work, running tests focused on their context, and eventually backed the provider that held up better in practice. That gave the team a head start as sub-agents and skills became viable.
- The second was connecting AI to real systems rather than relying on its training data. Using the Model Context Protocol, Cleverbit integrated the AI with Azure DevOps, GitHub, Datadog, a custom ServiceNow connector, and Azure. When an agent can follow a trail from a customer incident through to the pull request that fixed it and into live production telemetry, the output is grounded in something verifiable. That is what makes it reliable enough to act on.
- The third was keeping humans in control at every gate. AI drafts, proposes, and reviews. Engineers and QA approve. Every skill that creates an artefact has explicit sign-off checkpoints. Review and analysis skills are read-only. Establishing this as a cultural norm, not just a technical constraint, was a deliberate part of the work.
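The sign-off rule in the third decision can be pictured with a small sketch. This is an illustration only, not the client's implementation: the names `Skill`, `Proposal`, `approve`, and `publish` are hypothetical, and the sketch assumes a single human approval is enough to release an artefact.

```python
from dataclasses import dataclass, field
from enum import Enum


class SkillKind(Enum):
    CREATES_ARTEFACT = "creates_artefact"  # e.g. drafts a design or a test
    READ_ONLY = "read_only"                # e.g. review or analysis


@dataclass
class Skill:
    name: str  # hypothetical skill name
    kind: SkillKind


@dataclass
class Proposal:
    skill: Skill
    draft: str
    approved_by: list[str] = field(default_factory=list)


def approve(proposal: Proposal, reviewer: str) -> None:
    """Record a human sign-off on the drafted artefact."""
    proposal.approved_by.append(reviewer)


def publish(proposal: Proposal) -> str:
    """Release an artefact only after at least one human has signed off."""
    if proposal.skill.kind is SkillKind.READ_ONLY:
        raise PermissionError(
            f"{proposal.skill.name} is read-only and cannot create artefacts"
        )
    if not proposal.approved_by:
        raise PermissionError(
            f"{proposal.skill.name} output needs human sign-off before publishing"
        )
    return proposal.draft
```

In this sketch the gate lives in `publish`, so a draft cannot reach the shared codebase or backlog without a named approver, and a read-only skill cannot publish at all.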
"A lot of our effort went towards migrating legacy code from the monolith. We came up with a skill with just enough context: rationale, ways of working, guardrails like multi-tenancy. The skill got us 80% there, then we'd iterate to polish."
Engineering, development and migration
What we built
Over the course of the engagement, twelve uses of AI were embedded across the client’s SDLC, each built as a version-controlled skill, a sub-agent workflow, or a custom application on top of the connected context.
Requirements and component design
AI now builds requirements documents and component designs to consistent writing rules, with quality rules built in. It has already caught duplicated and misassigned work, flagging when a component overlapped another team’s area before it was built twice.
Legacy migration
The client was carrying significant monolith migration work. A migration skill was built with just enough context: the rationale, the ways of working, and guardrails such as multi-tenancy rules. It gets each migration roughly 80% of the way, then a developer takes it to done. The consistency gain matters as much as the time saving.
Code consistency across teams
The client’s engineering teams came from different backgrounds and the codebase reflected that. Shared conventions are now encoded once in the skill library and available to everyone. Consistent accessibility handling and right-to-left support are explicit goals.
Agentic code review
Peer review skills tuned to the client’s stack now check for regressions, performance issues, null-safety, security, and design quality, not just whether a piece of work meets its acceptance criteria. This takes load off other developers and catches issues before they reach merge.
Test strategy
The AI-assisted test strategy draws on acceptance criteria, past incidents, historical defects, and review threads. It questions the design rather than just confirming it. A QA engineer still signs off before anything is created; the agent runs a multi-step chain from strategy through to test creation with mandatory human gates.
Onboarding and local environment setup
New developers now get AI-assisted setup grounded in the repository’s own configuration and documentation. The long tail of environment-specific issues that previously consumed senior time has largely been absorbed.
Pre-sales feasibility
Custom feature requests used to get a yes or no. The client’s engineering team can now put together a proof-of-concept to show a request is feasible, so sales conversations come with something to demonstrate.
Automated release notes
Detailed customer-facing release notes are generated automatically for both platforms from one source of truth. For the client’s regulated audience, they also flag the data-privacy and compliance implications of each change by default.
Delivery and quality intelligence
Monthly bug-trend analysis links accepted bugs to the customers that raised them, showing recurring areas and guiding priorities. P1/P2 incident root-cause analysis surfaces reliability risks before they compound. Every figure traces back to a source record.
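One way to keep every figure traceable is to carry the source record ids alongside each count. The sketch below is a hypothetical illustration: the `Bug` fields and the `bug_trend` function are assumptions, not the client's schema or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    bug_id: str    # hypothetical work-item identifier
    customer: str  # customer that raised the bug
    area: str      # product area the bug was filed against


def bug_trend(bugs):
    """Count accepted bugs per (area, customer) and keep the ids behind each count."""
    trend = defaultdict(list)
    for bug in bugs:
        trend[(bug.area, bug.customer)].append(bug.bug_id)
    # Each row carries its count and the source records that produced it.
    rows = [
        {"area": area, "customer": customer, "count": len(ids), "sources": sorted(ids)}
        for (area, customer), ids in trend.items()
    ]
    # Largest counts first, so recurring areas surface at the top.
    return sorted(rows, key=lambda r: (-r["count"], r["area"], r["customer"]))
```

Because each row lists its `sources`, a reader can check any count against the underlying work items rather than taking the number on trust.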
In-house troubleshooting assistant
Engineers and support staff ask questions in plain language and get back likely causes, troubleshooting steps, known workarounds, the linked defect, and the release that fixed it. The search covers the client’s entire support and engineering history, including discussion and resolution threads where the real answer usually sits. Every response cites its sources. The client is now extending this from a single-team agent into an org-wide knowledge platform on Microsoft Copilot, Azure AI Foundry, and Microsoft Fabric.
Live production root-cause analysis
An AI agent pulls live telemetry alongside the source code and matches what production is doing to the code paths that could explain it. In one incident during the engagement, it narrowed a server collapse to two evidence-backed hypotheses, identified the candidate configuration and code for each, and proposed fixes for the SRE team to evaluate. It keeps proven findings separate from hypotheses, so the output is a reliable starting point rather than a guess.
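The separation of proven findings from hypotheses can be sketched as a simple partition. This is an assumption about how such a report might be structured, not a description of the agent itself: `Claim`, its fields, and `split_report` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    evidence: list[str] = field(default_factory=list)  # e.g. telemetry or code references
    verified: bool = False  # set by a human or a reproducible check


def split_report(claims):
    """Return (findings, hypotheses); a claim is a finding only if verified and evidenced."""
    findings = [c for c in claims if c.verified and c.evidence]
    hypotheses = [c for c in claims if not (c.verified and c.evidence)]
    return findings, hypotheses
```

Under this rule an unverified claim stays a hypothesis however much evidence is attached, which keeps the two lists from blurring together in the report.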
Cloud cost analysis
The client measured savings from moving a search and analytics fleet to ARM-based silicon and Linux. Key figures were checked against billed cloud meters, reconciled to the portal, and reviewed by a separate agent that pushed back on framing risks: baseline choice, performance parity, and licence attribution. The result is a set of savings figures with clearly labelled ranges and a short list of items still to verify, plus a method the client can repeat each month.
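A savings range of this kind can come from computing the figure against more than one defensible baseline and checking each input against the bill. The sketch below is a minimal illustration under stated assumptions: the source does not describe the client's method, and `savings_range`, `reconcile`, and the 1% default tolerance are hypothetical.

```python
def savings_range(baseline_costs, current_cost):
    """Return (low, high) savings across alternative baseline choices."""
    if not baseline_costs:
        raise ValueError("at least one baseline is required")
    savings = [baseline - current_cost for baseline in baseline_costs]
    return min(savings), max(savings)


def reconcile(computed, billed, tolerance=0.01):
    """True if a computed figure matches the billed meter within a relative tolerance."""
    if billed == 0:
        return computed == 0
    return abs(computed - billed) / abs(billed) <= tolerance
```

Reporting the low and high ends together makes the baseline choice visible instead of hiding it inside a single headline number.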
"We integrated context engineering using agents, which reduced hallucinations and gave us quality output. The AI peer reviews reduced the burden on other developers and raised quality in the early stages, with faster output."
Engineering, platform
What the client measured
The client’s own assessment of outcomes, reported at the close of the engagement:
Code and artefacts are now consistent across teams regardless of individual background or seniority. Problems are being caught before merge, where the cost of fixing them is lowest. Routine review work and environment support are increasingly handled by agents, freeing senior engineers for work that requires judgement. New developers reach productivity faster. Legacy migration is moving at a pace the team considers sustainable for the first time. And because every output is grounded in connected systems with human sign-off, the results are reliable enough to build on.
The client is continuing to extend the capability and has indicated that the patterns developed during the engagement are now forming the basis of their broader AI engineering strategy.
"I'm using it when a customer success manager asks if something can be done custom. In the past I'd reply with a simple answer; now I'm trying to give them a proof-of-concept."
Engineering, product