Case Studies: Brands Winning with AI Tools

The phrase "AI case study" carries baggage. Most published examples are vendor-supplied marketing artifacts with the methodology stripped out, a logo bolted on, and a percentage gain that does not survive a follow-up question. The cases below were selected because each one has a primary source — a press release, an earnings transcript, an interview with a named executive, or a peer-reviewed analysis — and because each one carries enough detail that an operator in a similar situation can ask "would this work for us?" and get a reasoned answer. The companies span customer service, finance, retail, marketing, ecommerce, healthcare, and education. They are not the only AI wins of 2022-2025; they are the ones with documentation that holds up.

Klarna — customer service automation at scale

Function: Customer support. Tool: OpenAI-powered conversational agent integrated into Klarna's customer service surfaces.

Challenge: Klarna handles tens of millions of customers across dozens of markets and many languages. Customer service volume scales with growth, and offshore agent capacity has limits on quality, latency, and cost.

Solution: An OpenAI-based assistant launched in early 2024, grounded in Klarna's policies and integrated with order and account systems. Designed to handle common categories of enquiry end-to-end, with human handover for complex or sensitive cases.
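The split between end-to-end automation and human handover can be sketched as a simple routing rule. This is an illustrative sketch only, not Klarna's actual logic; the category names and confidence threshold are assumptions:

```python
# Hypothetical triage sketch: route an enquiry to the assistant or a human.
# Categories and the confidence threshold are illustrative assumptions.

AUTOMATABLE = {"order_status", "return_policy", "payment_schedule"}
SENSITIVE = {"fraud_report", "complaint", "account_closure"}

def route(category: str, model_confidence: float, threshold: float = 0.8) -> str:
    """Send common, high-confidence enquiries to the assistant;
    send sensitive or low-confidence ones to a human agent."""
    if category in SENSITIVE:
        return "human"
    if category in AUTOMATABLE and model_confidence >= threshold:
        return "assistant"
    return "human"

print(route("order_status", 0.93))   # common, high-confidence enquiry
print(route("fraud_report", 0.99))   # sensitive: always a human
```

The design point the Klarna case illustrates is that the sensitive list is checked first: no confidence score overrides the decision to keep certain categories human.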

Results (per Klarna's February 2024 release):

  • 2.3 million conversations per month within the first month of launch.
  • Equivalent to the work of around 700 full-time agents.
  • Average resolution time of approximately 2 minutes versus 11 minutes for human agents.
  • Customer satisfaction scores comparable to humans.
  • Roughly 25% reduction in repeat enquiries.
  • Forecast $40 million profit improvement for 2024.

Lessons: One workflow, deep automation, clear pre-AI baseline. The Klarna case is the cleanest reference for any company evaluating conversational AI in customer service. Note the 2025 follow-up: Klarna stated publicly that it was rebalancing back toward more human agents for higher-stakes cases — not a reversal of the deployment, but a recalibration of how much of customer service belongs in which tier. The story underneath: even very successful AI deployments find their natural automation share, and it is rarely 100%.

JPMorgan Chase — internal LLM platform (LLM Suite)

Function: Cross-functional employee productivity. Tool: "LLM Suite," an internal generative AI platform built on top of OpenAI's models, deployed to tens of thousands of employees through 2024.

Challenge: JPMorgan banned consumer ChatGPT for employee use in early 2023 over data security concerns, but employees needed access to generative AI capability. Sending data to public APIs was incompatible with the bank's regulatory and data-handling requirements.

Solution: A private deployment running OpenAI's models inside JPMorgan's controlled infrastructure with strict data isolation. By mid-2024 the platform had been rolled out to a reported 60,000+ employees, with use cases ranging from research synthesis to email drafting to coding assistance. The bank stated publicly that it considers the platform foundational infrastructure rather than an experiment.

Results: JPMorgan has not published a single ROI figure, but executives have referenced productivity gains across multiple functions in earnings discussions and at industry events. Executives have publicly cited figures in the $1-1.5 billion range for the annual business value the bank attributes to its AI efforts.

Lessons: Build over buy when the data is too sensitive to send anywhere else, and when scale makes API economics unfavourable. JPMorgan's pattern — a horizontal employee productivity platform that becomes infrastructure — is the path most regulated-industry leaders have chosen.

Stitch Fix — ML-driven personal styling

Function: Product personalisation. Tool: Internal ML and analytics stack, augmented in 2023-2024 with generative AI elements.

Challenge: Stitch Fix sends personalised clothing selections to subscribers, with human stylists making final decisions. Scaling that process while preserving fit and style accuracy is the entire business model.

Solution: A long-running combination of ML on customer attributes, garment attributes, and feedback signals, with human stylists in the loop. Generative AI added in 2023-2024 for personalised note generation, styling tips, and back-end analytics including a tool the company has called "expert-in-the-loop" generative styling.

Results: Stitch Fix has reported gross margin pressure in recent years driven by broader retail conditions, but its core ML system remains a structural differentiator and is referenced in earnings calls as central to fit accuracy. The company's public materials describe ML-driven inventory management as one of the levers maintaining margin during difficult market conditions.

Lessons: ML and AI compounded over a decade beats a generative AI sprint launched yesterday. The Stitch Fix model — human-in-the-loop personalisation augmented by ML, then by generative AI — is a more durable pattern than full automation. The case is also a useful corrective: AI does not save a business from broader market pressures.

Heinz — generative-AI-led brand campaign

Function: Marketing. Tool: DALL-E 2 image generation.

Challenge: Reinforce Heinz's longstanding category-coding ("if you ask for ketchup, you mean Heinz") in a way that felt fresh and earned attention.

Solution: The "A.I. Ketchup" campaign in summer 2022 prompted DALL-E 2 with various ketchup-themed phrases ("ketchup in the Renaissance," "ketchup as Egyptian hieroglyphs") and showed that the model produced bottles consistently shaped like Heinz, supporting the brand insight.

Results: The campaign won a Cannes Lions award and produced widely reported earned media. Agency Rethink reported that the campaign generated 1.15 billion earned impressions and a 38:1 ROI on earned media value compared with paid spend, per Rethink's post-campaign analysis. Standard caveats apply to agency-self-reported figures, but the cultural moment was unambiguously real.

Lessons: AI-generated imagery as the SUBJECT of a campaign, with the brand insight ("AI itself thinks ketchup means Heinz") doing the work, is a more durable use of the technology than AI as just a faster Photoshop. Heinz also published a 2023 follow-up using AI in the visuals; the earned media was meaningfully smaller, supporting the rule that the moment of "look, AI" passes fast.

Spotify — recommendation and AI DJ

Function: Engagement and retention. Tool: Internal ML stack plus OpenAI for the AI DJ voice host launched in 2023.

Challenge: Spotify's value proposition, relative to simply owning a record collection, rests on helping listeners find music they will love. Recommendation accuracy is the engine of retention.

Solution: Discovery Weekly (launched 2015) and other personalised playlists, layered over a deep ML recommendation system. The 2023 launch of "DJ" added an AI-voiced host using OpenAI's technology and Sonantic's voice cloning, introducing recommendations conversationally.

Results: Spotify has stated in public materials that personalised recommendations drive a substantial share of total listening time on the platform — frequently quoted at over 30%, varying by definition. The company credited the DJ feature with strong engagement among adopting users in the quarters following its launch.

Lessons: Recommendation as engagement engine is the longest-standing AI category in consumer software, and the wins compound. Spotify demonstrates the difference between buying an AI feature for a quarter and building a personalisation system as a permanent capability.

Walmart — AI for search and supply chain

Function: Search and operations. Tools: Internal AI/ML platforms supplemented with generative AI capabilities from 2023 onward.

Challenge: Walmart operates one of the largest ecommerce sites in the US and runs a globally significant supply chain. Even small percentage improvements on search relevance, ad targeting, and supply chain accuracy translate to billions of dollars.

Solution: Generative AI integrated into search to handle natural-language queries; an AI-augmented internal assistant ("My Assistant") rolled out to corporate employees; ML-driven demand forecasting and inventory placement at scale; and a 2023-2024 push into generative AI for product description and content generation.

Results: Walmart has shared selective figures in earnings discussions and 2024 investor day materials. The company stated in 2024 that AI had contributed to operational efficiency and ecommerce growth, and that "Sparky" and other AI capabilities had been deployed in customer-facing surfaces. As with most public retailers, the headline number cannot be isolated from broader operations performance.

Lessons: Scale changes the calculus. For a retailer of Walmart's size, internal builds and platform investments make economic sense in ways they would not for smaller competitors. The case also shows the diversity of AI applications inside one company — search, support, content, supply chain, internal productivity — running in parallel.

Duolingo — generative AI in language learning

Function: Product. Tool: OpenAI's GPT-4, integrated as the core of Duolingo's premium "Duolingo Max" tier launched in March 2023.

Challenge: Duolingo's freemium model was hugely successful but the product ceiling was real: rote drills are not a good simulation of the conversational practice language learners need. The company needed a step-change in product capability.

Solution: Duolingo Max, a premium tier with two GPT-4-powered features — "Explain My Answer" (a conversational tutor explaining mistakes) and "Roleplay" (open-ended conversation practice with the model). Launched in select languages and markets in March 2023, expanded through 2024.

Results: Duolingo's public statements credit AI as a contributor to subscriber growth and ARPU improvement. The company's 2024 results showed substantial ARPU lift driven by Max adoption alongside other premium tiers. The number of paid subscribers crossed 8 million by 2024 (per company filings), and the company has consistently referenced AI as a contributor.

Lessons: AI as the basis for a new pricing tier is one of the cleanest paths to monetising the technology. Duolingo did not retrofit AI into the free tier and try to claim productivity savings; it built a new feature set, charged for it, and let the market decide.

Morgan Stanley — financial advisor knowledge agent

Function: Wealth management productivity. Tool: Custom assistant built on OpenAI's GPT-4, deployed to financial advisors from 2023.

Challenge: Morgan Stanley's advisors needed faster access to the bank's vast internal research, policy, and product knowledge base. The institutional knowledge existed in tens of thousands of documents that no advisor could read.

Solution: A retrieval-augmented assistant grounded in the bank's research library, deployed first to a pilot group of advisors, then expanded. The system answers natural-language questions, cites source documents, and supports document drafting tasks.
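The retrieval-grounding pattern can be sketched in miniature. A production system would use embedding models and an LLM over a large document store; this toy version uses a bag-of-words cosine over an in-memory corpus, and the document names are invented for illustration:

```python
# Minimal retrieval-grounding sketch: find the best source document for a
# query, so the answer can cite it. Toy corpus; names are illustrative.
import math
from collections import Counter

DOCS = {
    "research/muni-bonds.txt": "municipal bond tax treatment and yield outlook",
    "policy/gift-limits.txt": "client gift policy annual limits and approvals",
}

def vec(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Return the path of the best-matching document -- the citation."""
    q = vec(query)
    return max(DOCS, key=lambda d: cosine(q, vec(DOCS[d])))

print(retrieve("gift policy limits"))  # -> policy/gift-limits.txt
```

The structural point matches the Morgan Stanley description: the system's output is anchored to a named source document, which is what makes answers checkable by the advisor.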

Results: Morgan Stanley has stated publicly that the assistant has been adopted by the majority of its advisor population and that it has improved time-to-answer on client-relevant research queries. Specific quantitative outcomes have been referenced selectively in interviews and conference appearances rather than as headline numbers.

Lessons: The Morgan Stanley case is a canonical example of "internal knowledge agent" done well — bounded scope, clear user, retrieval-grounded, deployed gradually. It is the template most large knowledge-work organisations are now imitating.

Moderna — AI-augmented drug discovery operations

Function: Pharmaceutical R&D and operations. Tool: Internal "mChat" assistant built on OpenAI plus various ML systems for drug design.

Challenge: Moderna's mRNA platform produces an enormous research and operational data exhaust. Internal teams spend significant time finding, summarising, and acting on information across functions.

Solution: A 2023 partnership with OpenAI and the deployment of "mChat," an internal assistant deployed across thousands of employees, used for literature review, internal document Q&A, and various productivity workflows. Combined with longer-running ML investments in protein design, sequence optimisation, and clinical operations.

Results: Moderna executives have publicly referenced mChat's role in employee productivity and have included AI in strategic priorities. As with other deeply integrated deployments, headline ROI figures are not isolated.

Lessons: The case shows the breadth of AI applicability inside a complex science-driven company — productivity assistant, R&D acceleration, operational analytics — running as part of a coherent platform rather than as isolated experiments.

Unilever — content and research automation

Function: Marketing operations. Tool: Internal AI platform "Unily" plus various third-party generative tools.

Challenge: Unilever runs hundreds of brands across most countries on Earth, each requiring local creative variants, translations, and consumer research. Scaling marketing operations through human production was structurally limited.

Solution: Generative AI integrated into the creative production pipeline for first-draft generation, localisation, and asset variation. The internal platform makes AI tools available broadly inside the marketing organisation, with governance for brand voice and compliance.

Results: Unilever has shared figures including a 30% reduction in content production cost on certain campaigns and shorter concept-testing cycles. As with most corporate announcements of this kind, the press-release numbers focus on throughput rather than direct ROI, but the operational lift is real.

Lessons: Industrialising the long tail of marketing production — variants, localisations, channel adaptations — is the highest-leverage AI win for large CPG companies. The unit economics flip on work that human production made too expensive to do well.

Common patterns across the cases

  • One workflow, deeply automated, before scaling: the wins came from going deep on a specific job, not from "AI everywhere".
  • Pre-AI baseline measured: each company could state the metric the AI was meant to move.
  • Owner with operational accountability: the successful programmes had named owners whose KPIs moved with the project.
  • Internal infrastructure for sensitive data: regulated-industry leaders (JPMorgan, Morgan Stanley, Moderna) built internal platforms rather than send data out.
  • Long-running ML compounded by generative AI: the mature winners (Spotify, Stitch Fix, Walmart) had ML systems before 2022; generative AI extended them rather than replacing them.
  • Public methodology and sources: the cases that hold up under scrutiny are the ones with attributable detail.

The cases that did NOT make this list are equally instructive. Several mid-2023 vendor case studies named major brands that, when contacted by journalists, either denied the scope of the work or could not confirm the figures attributed to them. The lesson for buyers: never make a budget decision based on a vendor case study without an attributed quote and a callable reference.

Frequently asked questions

Why are most published AI case studies unreliable?

Because the genre is structurally promotional. Vendors write case studies to sell software; agencies write them to win awards; brands cooperate when the numbers flatter them and quietly ignore them when they do not. Unreliable cases share three features: no named source, no baseline metric, and a percentage gain unanchored to absolute numbers. When all three are missing, treat the case as advertising. The cases on this list were filtered for at least one named source and at least one absolute-number anchor.
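The three-feature filter described above is mechanical enough to write down. A sketch, with illustrative field names:

```python
# Sketch of the three-marker reliability filter described above.
# Field names and example records are illustrative.

def looks_like_advertising(case: dict) -> bool:
    """True when all three reliability markers are missing:
    a named source, a baseline metric, and an absolute-number anchor."""
    return all([
        not case.get("named_source"),
        not case.get("baseline_metric"),
        not case.get("absolute_anchor"),
    ])

vendor_case = {"named_source": None, "baseline_metric": None,
               "absolute_anchor": None, "claimed_gain_pct": 340}
klarna_case = {"named_source": "company press release",
               "baseline_metric": "11 min average resolution",
               "absolute_anchor": "2.3M conversations/month"}

print(looks_like_advertising(vendor_case))  # True
print(looks_like_advertising(klarna_case))  # False
```

Note the deliberately weak test: a single marker is enough to escape the "advertising" bucket, which matches the filter used for this list (at least one named source and one absolute-number anchor).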

What is the most replicable AI win for a smaller business?

The closest analogue to a small-business pattern in this list is content scaling (Unilever's variant-and-localisation work, Heinz's creative use of generative imagery). For an under-100-person business, the same pattern works at smaller scale: generative AI to produce faster localisations, more creative variants, and better-personalised outbound. Our 90-day SME plan covers the implementation sequence.

Which case is most relevant if we are a regulated industry?

JPMorgan and Morgan Stanley are the cleanest references. Both prioritised internal deployment of foundation models with strict data isolation, both built around bounded knowledge and productivity use cases, and both expanded gradually with explicit governance. The lesson: in regulated contexts, building (or partnering for a private deployment) usually beats buying public-API SaaS for any workflow that touches customer or proprietary data.

How should we measure ROI on a case like Klarna's?

The Klarna case isolates the metric (work-equivalent of agents, resolution time, CSAT) which is rare. Most companies should aim for the same isolation: a single workflow, a baseline before AI, a controlled rollout, and a measurement window of months not weeks. Avoid ROI claims that mix multiple workflows, conflate productivity with revenue, or omit ongoing run-cost.
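The arithmetic behind a Klarna-style claim is simple enough to sketch. The 700 agent-equivalents figure is Klarna's published number; the fully-loaded agent cost, build cost, and run cost below are invented assumptions to show the shape of the calculation, including the ongoing run-cost the answer above warns against omitting:

```python
# Back-of-envelope ROI sketch for a Klarna-style deployment.
# Only the 700 agent-equivalents figure is from Klarna's release;
# all cost figures are illustrative assumptions.

def annual_roi(agents_displaced: int,
               cost_per_agent: float,    # assumed fully-loaded annual cost
               build_cost: float,        # one-off integration cost
               annual_run_cost: float) -> float:
    """Net annual benefit divided by total first-year spend."""
    gross_saving = agents_displaced * cost_per_agent
    return (gross_saving - annual_run_cost) / (build_cost + annual_run_cost)

# 700 agent-equivalents at an assumed $40,000 fully-loaded cost,
# with an assumed $3M build and $5M annual run cost:
print(annual_roi(700, 40_000, 3_000_000, 5_000_000))
```

Two of the answer's cautions are visible in the function signature: run-cost is subtracted from the benefit, not ignored, and the calculation covers exactly one workflow rather than blending several.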

Is custom build always better than buying a SaaS tool?

No. Most of the cases on this list combine elements of both — a vendor model (often OpenAI) plus internal integration and grounding work. The pure-build vs pure-buy decision is rarely the actual choice. The real choice is "which part of the stack do we own and which part do we lease," and the answer depends on data sensitivity, scale, and whether the workflow is core to competitive position. Our vendor evaluation guide covers the framework.

Will AI replace whole job functions in companies like these?

The pattern visible across the cases is shift, not elimination. Customer service teams shrink and shift to QA and handover work. Marketing production teams reorganise around AI-augmented workflows. Knowledge workers gain more leverage per person. What disappears is narrow tasks within roles, not whole roles, at least within the timeframe these cases cover.

What is the most underweighted lesson from these cases?

That the model is rarely the bottleneck. Across every case, the success or failure depended more on workflow scoping, owner accountability, change management, and data hygiene than on the choice of foundation model. Companies obsessing over which LLM to pick while neglecting those operational factors get worse outcomes than companies picking a competent default and investing in the operational work.

The bottom line

The verifiable AI wins of 2022-2025 share a profile that is narrower and less glamorous than the marketing version of the story. One workflow at a time. A real baseline. A named owner. A willingness to publish detail that survives follow-up questions. The leaders are not the loudest voices; they are running fewer, deeper projects and measuring honestly. The pattern compounds over years, not quarters. For broader strategy framing, see our AI for business pillar; for the marketing-specific subset of these cases, see our marketing case studies guide; for the vendor evaluation framework, see our development companies guide.

Last updated: May 2026