AI Across the Engineering SDLC: A Cleverbit Client Case Study

How a leading SaaS provider built AI into its engineering organisation, and what it measured.

CASE STUDY

Like most engineering teams, this client had started using AI in the way that felt most natural: individuals prompting a chat tool when it was useful. The output varied by person and by prompt. It helped sometimes, but it did not change the consistency of what the team shipped, and it introduced a new problem: AI-generated code that looked right but was not grounded in how the product actually worked.

The client was ambitious and wanted to change how it was using AI.

The goal was to build AI into the way the team works, in a way that was consistent, traceable, and safe to rely on.

Cleverbit worked with the client’s engineering organisation to design and build that capability.

"We were the first to identify the stronger AI provider versus the obvious incumbent. We started exploring sub-agents and skills as early as possible, which allows teams with different backgrounds to build consistent-looking code. Before, the code used to look much more fragmented."

Engineering Lead

How we approached the situation

Three decisions shaped the whole engagement.

The first was choosing the right foundation. The client worked with Cleverbit’s R&D team to evaluate the available models against its actual work, running tests focused on its own context, and eventually backed the provider that held up better in practice. That gave the team a head start as sub-agents and skills became viable.

The second was connecting AI to real systems rather than relying on its training data. Using the Model Context Protocol, Cleverbit integrated the AI with Azure DevOps, GitHub, Datadog, a custom ServiceNow connector, and Azure. When an agent can follow a trail from a customer incident through to the pull request that fixed it and into live production telemetry, the output is grounded in something verifiable. That is what makes it reliable enough to act on.

The third was keeping humans in control at every gate. AI drafts, proposes, and reviews. Engineers and QA approve. Every skill that creates an artefact has explicit sign-off checkpoints. Review and analysis skills are read-only. Establishing this as a cultural norm, not just a technical constraint, was a deliberate part of the work.
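As a rough illustration of that gating rule, the sketch below shows one way it could be expressed in Python. It is not the client’s implementation: `Skill`, `GateResult`, and `run_with_gate` are hypothetical names, and the approval callback stands in for whatever sign-off mechanism the team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Skill:
    """Hypothetical skill descriptor (an assumption, not the client's schema)."""
    name: str
    creates_artefact: bool          # False means read-only (review, analysis)
    run: Callable[[str], str]       # produces a draft or a set of review notes


@dataclass
class GateResult:
    skill: str
    output: str
    status: str                     # "read_only", "approved", or "rejected"
    approver: Optional[str] = None


def run_with_gate(
    skill: Skill,
    task: str,
    approve: Optional[Callable[[str], bool]] = None,
    approver: Optional[str] = None,
) -> GateResult:
    """Run a skill; anything that creates an artefact needs a named human decision."""
    draft = skill.run(task)
    if not skill.creates_artefact:
        # Review and analysis skills change nothing, so no sign-off is needed.
        return GateResult(skill.name, draft, "read_only")
    if approve is None or approver is None:
        raise PermissionError(f"{skill.name} creates an artefact and needs sign-off")
    status = "approved" if approve(draft) else "rejected"
    return GateResult(skill.name, draft, status, approver)
```

The point of the design is that the gate lives in the runner rather than in each skill, so a new artefact-creating skill cannot skip sign-off by omission.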

"A lot of our effort went towards migrating legacy code from the monolith. We came up with a skill with just enough context: rationale, ways of working, guardrails like multi-tenancy. The skill got us 80% there, then we'd iterate to polish."

Engineering, development and migration

What we built

Over the course of the engagement, twelve uses of AI were embedded across the client’s SDLC, each built as a version-controlled skill, a sub-agent workflow, or a custom application on top of the connected context.

Requirements and component design

AI now builds requirements documents and component designs in a consistent writing style, with quality rules built in. It has already caught duplicated and misassigned work, flagging when a component overlapped another team’s area before it was built twice.

Legacy migration

The client was carrying significant monolith migration work. A migration skill was built with just enough context: the rationale, the ways of working, and guardrails such as multi-tenancy rules. It gets each migration roughly 80% of the way, then a developer takes it to done. The consistency gain matters as much as the time saving.

Code consistency across teams

The client’s engineering teams came from different backgrounds and the codebase reflected that. Shared conventions are now encoded once in the skill library and available to everyone. Consistent accessibility handling and right-to-left support are explicit goals.

Agentic code review

Peer review skills tuned to the client’s stack now check for regressions, performance issues, null-safety, security, and design quality, not just whether a piece of work meets its acceptance criteria. This takes load off other developers and catches issues before they reach merge.

Test strategy

The AI-assisted test strategy draws on acceptance criteria, past incidents, historical defects, and review threads. It questions the design rather than just confirming it. A QA engineer still signs off before anything is created; the agent runs a multi-step chain from strategy through to test creation with mandatory human gates.

Onboarding and local environment setup

New developers now get AI-assisted setup grounded in the repository’s own configuration and documentation. The long tail of environment-specific issues that previously consumed senior time has largely been absorbed.

Pre-sales feasibility

Custom feature requests used to get a yes or no. The client’s engineering team can now put together a proof-of-concept to show a request is feasible, so sales conversations come with something to demonstrate.

Automated release notes

Detailed customer-facing release notes are generated automatically for both platforms from one source of truth. For the client’s regulated audience, they also flag the data-privacy and compliance implications of each change by default.

Delivery and quality intelligence

Monthly bug-trend analysis links accepted bugs to the customers that raised them, showing recurring areas and guiding priorities. P1/P2 incident root-cause analysis surfaces reliability risks before they compound. Every figure traces back to a source record.
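To make the traceability idea concrete, here is a minimal sketch of a monthly roll-up in which every count keeps the record IDs behind it. The `Bug` fields and the `bug_trends` function are assumptions for illustration; the client’s real data model is not described in this case study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    """Hypothetical accepted-bug record; field names are illustrative."""
    record_id: str   # ID of the source record in the tracking system
    area: str        # product area the bug was accepted against
    customer: str    # customer that raised it
    month: str       # "YYYY-MM"


def bug_trends(bugs, month):
    """Group one month's bugs by area, keeping customers and source record IDs."""
    trends = defaultdict(lambda: {"count": 0, "customers": set(), "sources": []})
    for bug in bugs:
        if bug.month != month:
            continue
        entry = trends[bug.area]
        entry["count"] += 1
        entry["customers"].add(bug.customer)
        entry["sources"].append(bug.record_id)  # every figure stays traceable
    # Most frequently recurring areas first.
    ordered = sorted(trends.items(), key=lambda item: item[1]["count"], reverse=True)
    return dict(ordered)
```

Because each area carries its `sources` list, a reader who doubts a figure can open the underlying records rather than take the summary on trust.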

In-house troubleshooting assistant

Engineers and support staff ask questions in plain language and get back likely causes, troubleshooting steps, known workarounds, the linked defect, and the release that fixed it. The search covers the client’s entire support and engineering history, including discussion and resolution threads where the real answer usually sits. Every response cites its sources. The client is now extending this from a single-team agent into an org-wide knowledge platform on Microsoft Copilot, Azure AI Foundry, and Microsoft Fabric.

Live production root-cause analysis

An AI agent pulls live telemetry alongside the source code and matches what production is doing to the code paths that could explain it. In one incident during the engagement, it narrowed a server collapse to two evidence-backed hypotheses, identified the candidate configuration and code for each, and proposed fixes for the SRE team to evaluate. It keeps proven findings separate from hypotheses, so the output is a reliable starting point rather than a guess.
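The separation between proven findings and hypotheses can be sketched as a small data structure. This is an assumed shape, not the agent’s actual output format: `RcaReport`, `Finding`, and `Status` are hypothetical names, and the rule that every entry must cite evidence is our reading of "evidence-backed".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PROVEN = "proven"
    HYPOTHESIS = "hypothesis"


@dataclass
class Finding:
    summary: str
    evidence: List[str]   # references to telemetry queries, code paths, configs
    status: Status


@dataclass
class RcaReport:
    """Hypothetical report container that never mixes the two categories."""
    findings: List[Finding] = field(default_factory=list)

    def add(self, summary: str, evidence: List[str], confirmed: bool = False) -> Finding:
        if not evidence:
            # An entry with no evidence is a guess, so it is rejected outright.
            raise ValueError("every finding or hypothesis must cite evidence")
        status = Status.PROVEN if confirmed else Status.HYPOTHESIS
        finding = Finding(summary, list(evidence), status)
        self.findings.append(finding)
        return finding

    def proven(self) -> List[Finding]:
        return [f for f in self.findings if f.status is Status.PROVEN]

    def hypotheses(self) -> List[Finding]:
        return [f for f in self.findings if f.status is Status.HYPOTHESIS]
```

Entries default to `HYPOTHESIS`, so promoting something to `PROVEN` is always an explicit decision.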

Cloud cost analysis

The client measured savings from moving a search and analytics fleet to ARM-based silicon and Linux. Key figures were checked against billed cloud meters, reconciled to the portal, and reviewed by a separate agent that pushed back on framing risks: baseline choice, performance parity, and licence attribution. The result is a set of savings figures with clearly labelled ranges and a short list of items still to verify, plus a method the client can repeat each month.
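One simple way to report savings as a labelled range rather than a single number is to carry the baseline uncertainty through the calculation. The sketch below assumes a low and a high baseline cost and a single billed cost after the move; `savings_range` and `SavingsRange` are hypothetical names, and no figures from the engagement are used.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SavingsRange:
    low: float                      # saving against the most conservative baseline
    high: float                     # saving against the most generous baseline
    to_verify: Tuple[str, ...]      # open items that could still move the range


def savings_range(
    baseline_low: float,
    baseline_high: float,
    billed_after: float,
    to_verify: Iterable[str] = (),
) -> SavingsRange:
    """Monthly saving as a range, given an uncertain baseline and a billed cost."""
    if baseline_low > baseline_high:
        raise ValueError("baseline_low must not exceed baseline_high")
    return SavingsRange(
        low=baseline_low - billed_after,
        high=baseline_high - billed_after,
        to_verify=tuple(to_verify),
    )
```

Keeping `to_verify` on the result means the caveats travel with the numbers each time the analysis is repeated, instead of living in a separate note.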

"We integrated context engineering using agents, which reduced hallucinations and gave us quality output. The AI peer reviews reduced the burden on other developers and raised quality in the early stages, with faster output."

Engineering, platform

What the client measured

The client’s own assessment of outcomes, reported at the close of the engagement:

Code and artefacts are now consistent across teams regardless of individual background or seniority. Problems are being caught before merge, where the cost of fixing them is lowest. Routine review work and environment support are increasingly handled by agents, freeing senior engineers for work that requires judgement. New developers reach productivity faster. Legacy migration is moving at a pace the team considers sustainable for the first time. And because every output is grounded in connected systems with human sign-off, the results are reliable enough to build on.

The client is continuing to extend the capability and has indicated that the patterns developed during the engagement are now forming the basis of their broader AI engineering strategy.

"I'm using it when a customer success manager asks if something can be done custom. In the past I'd reply with a simple answer; now I'm trying to give them a proof-of-concept."

Engineering, product

*Client name withheld by agreement. This case study is available to share with prospective clients under NDA on request.*
