Building a Design System in 2026: When AI Writes 60% of Your Code (And Actually Maintains It Better)

A year ago, I would have laughed at this headline.

"AI writes design system code? You'll spend weeks fixing garbage."

I was wrong.

Today, at Deloitte, we shipped a production design system with 60% of code generated by AI — GitHub Copilot, Claude Code, Figma MCP, and Shadcn MCP working together.

And here's the part that surprised me: the AI-generated code required less refactoring than human-written code.

Not because AI is magically perfect.

But because we changed how we prompt AI, how we structure design decisions, and how we validate output.

This is not an "AI is the future" fantasy post.

This is a real story from shipping something people use every day.

And the honest lesson: AI doesn't replace your design system work. It amplifies what you were already trying to do — if you do the thinking first.


The Problem: We Had 47 Buttons

When I arrived at Deloitte's frontend team, we inherited a codebase with a problem.

Not a bug.

A pattern problem.

We had 47 slightly different button implementations across 3 projects.

A single designer had created a Figma design system.

But the code didn't match.

Developers would ask: "Which button should I use?"

And the answer was always: "Uh, check the Figma? Or look at another project? Or just build it?"

The Cost:

We needed a design system.

But building a design system by hand is slow, boring, and requires constant maintenance.

Enter AI.


The Insight: AI Excels at Pattern Work

Here's what I realized: design systems are patterns, and AI is exceptionally good at patterns.

A design system is just:

This is highly structured work.

It's not creative.

It's not ambiguous.

It's: "Given this design, generate all the variations and combinations."

AI doesn't struggle with this.

Humans struggle with this.

Humans find it tedious.

Humans miss edge cases.

Humans forget to test disabled states.

So we flipped the model: let AI do the pattern generation. Let humans do the decisions.


The Setup: Copilot + Claude Code + Figma MCP + Shadcn MCP

We didn't start with some grand AI-assisted design system plan.

We stumbled into it.

Here's what we used, and why each piece mattered:

1. Figma MCP (Model Context Protocol)

What it does: Connects Claude Code to Figma as a context source.

How we used it:

Before writing code, I'd:

  1. Take a screenshot of the Figma component in all its variants
  2. Paste it into Claude Code with the prompt:
    Here's a Button component from our design system in Figma.
    Variants: primary, secondary, outline, ghost
    Sizes: sm, md, lg
    States: default, hover, active, disabled, loading
    
    Generate a React component that matches this design exactly.
    Use Tailwind CSS for styling.
    Include prop types in TypeScript.
    Export both the component and its variants for Storybook.
    

The Magic:

Claude could see the exact colors, spacing, typography.

It didn't have to guess.

No more: "Is the padding 12px or 16px?"

Claude saw it in the screenshot.

The generated code matched the design pixel for pixel.

Result: 3 hours of manual translation → 15 minutes with Claude Code + Figma MCP.
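
To make the variant and size structure in that prompt concrete, here is a minimal sketch of the kind of typed class map such a component tends to be built around. It is not the code from our repository: the Tailwind classes and the names `variantClasses`, `sizeClasses`, and `buttonClassName` are assumptions chosen for illustration.

```typescript
// Hypothetical names and Tailwind classes, for illustration only.
type Variant = "primary" | "secondary" | "outline" | "ghost";
type Size = "sm" | "md" | "lg";

const baseClasses = "inline-flex items-center justify-center rounded-md font-medium";

// Record<Variant, string> makes the compiler reject a missing variant.
const variantClasses: Record<Variant, string> = {
  primary: "bg-blue-600 text-white hover:bg-blue-700",
  secondary: "bg-gray-100 text-gray-900 hover:bg-gray-200",
  outline: "border border-gray-300 bg-transparent hover:bg-gray-50",
  ghost: "bg-transparent hover:bg-gray-100",
};

const sizeClasses: Record<Size, string> = {
  sm: "h-8 px-3 text-sm",
  md: "h-10 px-4 text-sm",
  lg: "h-12 px-6 text-base",
};

interface ButtonStyleOptions {
  variant?: Variant;
  size?: Size;
  disabled?: boolean;
  loading?: boolean;
}

// Builds the class string for one variant/size/state combination.
function buttonClassName(options: ButtonStyleOptions = {}): string {
  const { variant = "primary", size = "md", disabled = false, loading = false } = options;
  const parts = [baseClasses, variantClasses[variant], sizeClasses[size]];
  // Disabled and loading buttons share the same muted, non-interactive look here.
  if (disabled || loading) parts.push("opacity-50 pointer-events-none");
  return parts.join(" ");
}
```

Keeping the maps typed as `Record<Variant, string>` means that adding a variant to the union without styling it becomes a compile error instead of a visual bug.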

2. Claude Code (Direct Code Generation)

What it does: Writes, edits, and debugs code directly in the IDE.

How we used it:

Once we had the base component, I'd ask Claude Code:

Now generate:
1. Unit tests with Jest (test all variants, all states)
2. Storybook stories (interactive showcase of each variant)
3. Accessibility tests (a11y)
4. TypeScript types for all props
5. Edge cases (icon + loading state, very long text, etc.)

Claude Code would output all 5 pieces.

Not perfectly.

But 95% there.

We'd review, maybe tweak 1-2 things, commit.
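
"Test all variants, all states" is a cartesian product, which is part of why it suits generation. The sketch below enumerates every combination from the earlier prompt; the helper name `allButtonCases` is an assumption, not something from our codebase.

```typescript
// Variants, sizes, and states as listed in the prompt above.
const variants = ["primary", "secondary", "outline", "ghost"] as const;
const sizes = ["sm", "md", "lg"] as const;
const states = ["default", "hover", "active", "disabled", "loading"] as const;

interface ButtonCase {
  variant: (typeof variants)[number];
  size: (typeof sizes)[number];
  state: (typeof states)[number];
}

// Hypothetical helper: builds one test case per variant/size/state combination.
function allButtonCases(): ButtonCase[] {
  const cases: ButtonCase[] = [];
  for (const variant of variants) {
    for (const size of sizes) {
      for (const state of states) {
        cases.push({ variant, size, state });
      }
    }
  }
  return cases;
}

console.log(allButtonCases().length); // 4 variants x 3 sizes x 5 states = 60
```

An array like this can drive a table-driven test, for example with Jest's `test.each`, so no combination is skipped by hand.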

The Numbers:

3. GitHub Copilot (Inline Suggestions)

What it does: Real-time code completion and generation as you type.

How we used it:

Once the component was in the codebase, Copilot would:

The Multiplier Effect:

With Copilot + Claude Code:

The Psychology:

Copilot doesn't feel like "AI writing code."

It feels like an autocomplete that actually understands context.

You stay in flow state.

You don't get interrupted by "should this be className or class?"

Copilot knows.

4. Shadcn MCP (Component Library as Context)

What it does: Connects Shadcn/UI component patterns to your AI prompts.

How we used it:

Instead of describing buttons from scratch, I'd tell Claude Code:

We're building a component library.
We use Shadcn patterns for component structure.
Use the Button from Shadcn as a reference for:
- Prop naming conventions (variant, size, asChild)
- Accessibility patterns (aria-* attributes)
- Composition pattern (compound components with sub-slots)

Now build our Button on top of this pattern.

Claude would:

Why This Matters:

We didn't invent our own patterns.

We followed industry conventions.

Claude knew those conventions because Shadcn MCP gave it context.

Result: Components that work well with existing ecosystems, not against them.


The Process: How 60% AI Code Happened (And Why It's Actually Good)

Let me be transparent: 60% AI code doesn't mean "60% of effort was AI."

It means: of the lines of code, 60% were generated by AI prompts.

The breakdown:

Here's the actual workflow:

Day 1: Design Phase (Human-led)

I sit with the designer.

We decide: what variants does this component need?
What states? What edge cases?
What's the accessibility requirement?

We write a brief spec in text form:
"Button component
Variants: primary, secondary, outline, ghost, destructive
Sizes: sm, md, lg, xl
States: default, hover, focus, active, disabled, loading
Icon support: leading, trailing, or both
Loading state: shows spinner + disables

Accessibility: proper aria labels, focus management, disabled state"
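
One way to keep a spec like this reusable is to hold it as data and render the text from it. This is a sketch under assumptions: the `ComponentSpec` shape and the `specToPrompt` helper are hypothetical, and only the field values come from the spec above.

```typescript
// Hypothetical shape for a component brief.
interface ComponentSpec {
  name: string;
  variants: string[];
  sizes: string[];
  states: string[];
  notes: string[];
}

const buttonSpec: ComponentSpec = {
  name: "Button",
  variants: ["primary", "secondary", "outline", "ghost", "destructive"],
  sizes: ["sm", "md", "lg", "xl"],
  states: ["default", "hover", "focus", "active", "disabled", "loading"],
  notes: [
    "Icon support: leading, trailing, or both",
    "Loading state: shows spinner + disables",
    "Accessibility: proper aria labels, focus management, disabled state",
  ],
};

// Renders the spec as the plain-text brief that gets pasted into the prompt.
function specToPrompt(spec: ComponentSpec): string {
  return [
    `${spec.name} component`,
    `Variants: ${spec.variants.join(", ")}`,
    `Sizes: ${spec.sizes.join(", ")}`,
    `States: ${spec.states.join(", ")}`,
    ...spec.notes,
  ].join("\n");
}

console.log(specToPrompt(buttonSpec));
```

Storing the brief as data also makes it easy to diff when the designer adds a variant later.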

Day 2: Generation Phase (AI-led)

I paste the spec + Figma screenshot into Claude Code.

"Here's the design. Generate the component."

Claude generates:
- Component definition (React, TypeScript)
- All variants as CSS classes
- Full test suite (50+ test cases)
- Storybook stories
- Type definitions
- JSDoc comments

Output: ~400 lines of code, 80% of what we need.

Day 3: Review & Refinement (Human-led)

I review Claude's output:

✓ Component API is clean? (Yes)
✓ Tests cover edge cases? (Mostly, I add 3 more)
✓ Storybook shows all variants? (Yes)
✓ TypeScript types are correct? (Yes, catch 1 error)
✓ Accessibility good? (Good, I add aria-label handling)
✓ Performance OK? (Yes, no unnecessary re-renders)

I make 5-10 edits.
The component is ready.

Total human time: 2-3 hours.
Total elapsed time: 3 days (but only 3-4 hours actual work).
Previously: 8 hours of solid human work, 1 day elapsed.

Day 4: Integration (Copilot-led)

I need to add the Button to:
- Shared exports
- Storybook configuration
- Documentation

Copilot suggests:

export { Button, type ButtonProps } from './Button'

It autocompletes the imports.

It suggests the documentation format based on other components.

What would take 30 minutes of boilerplate → 5 minutes with Copilot.

Why This Is Better Than 100% Human Code

Here's what surprised me: the AI-generated code had fewer bugs than human-written code.

Not because AI is better.

But because:

  1. AI doesn't take shortcuts. A human might skip testing disabled states. AI tests them all.
  2. AI doesn't have bad habits. A human might write <button onClick={onClick}> without aria labels. Claude knows accessibility patterns and includes them by default.
  3. AI is consistent. Every component Claude generates has the same structure, same naming, same patterns. Humans vary.
  4. AI doesn't get tired. A human writing the 8th component is less careful. Claude is exactly as careful on the 8th as the 1st.
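
As an illustration of the accessibility defaults in point 2, here is a minimal sketch of the attribute logic a generated button might carry. The function `buttonA11yProps` and its input shape are assumptions for this example, not our actual component API; `aria-busy` and `aria-label` are standard ARIA attributes.

```typescript
// Hypothetical input: what the caller knows about the button.
interface ButtonA11yInput {
  label?: string; // visible text, if any
  ariaLabel?: string; // explicit label, needed for icon-only buttons
  disabled?: boolean;
  loading?: boolean;
}

interface ButtonA11yProps {
  disabled: boolean;
  "aria-busy"?: true;
  "aria-label"?: string;
}

// Derives the attributes to spread onto the underlying <button>.
function buttonA11yProps(input: ButtonA11yInput): ButtonA11yProps {
  const { label, ariaLabel, disabled = false, loading = false } = input;
  // A button with neither visible text nor an aria-label has no accessible name.
  if (!label && !ariaLabel) {
    throw new Error("Icon-only buttons need an ariaLabel");
  }
  // The loading state also disables the button, as in the spec.
  const props: ButtonA11yProps = { disabled: disabled || loading };
  if (loading) props["aria-busy"] = true;
  if (ariaLabel) props["aria-label"] = ariaLabel;
  return props;
}
```

Failing loudly on a missing accessible name is a design choice for this sketch; a team could just as well log a warning in development builds.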

The Refactoring Reality:

With human code:

With AI code:

The metric that matters: AI code went from generation to production merge in 1 review cycle.

Human code took 3-4 cycles.


The Honest Limitations (Where AI Struggled)

I'm not going to pretend AI is magic.

Here's where it hit walls:

1. Complex Interaction Logic

AI struggled with:

// This was hard for AI to get right:
// Multi-step validation in a form
// Error state management across fields
// Focus management when errors appear
// Conditional field visibility based on other fields

Claude would generate 80% correct code.

The 20% around "what happens when field A changes and field B needs to clear?" required human judgment.

Fix: I broke it into smaller components. Let AI generate each. Compose them with business logic (human-written).
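
The "field A changes, field B needs to clear" rule is the sort of logic that stays human-written. A minimal sketch, assuming a hypothetical address form; the field names and the `clearsOnChange` map are invented for illustration.

```typescript
// Hypothetical form: changing `country` invalidates `state` and `city`.
interface AddressForm {
  country: string;
  state: string;
  city: string;
}

// Human-decided business rule: which fields reset when a given field changes.
const clearsOnChange: Partial<Record<keyof AddressForm, (keyof AddressForm)[]>> = {
  country: ["state", "city"],
  state: ["city"],
};

// Returns a new form with the field updated and its dependents cleared.
function updateField(form: AddressForm, field: keyof AddressForm, value: string): AddressForm {
  if (form[field] === value) return form; // no change, nothing to clear
  const next: AddressForm = { ...form, [field]: value };
  for (const dependent of clearsOnChange[field] ?? []) {
    next[dependent] = "";
  }
  return next;
}

const before: AddressForm = { country: "IN", state: "TN", city: "Chennai" };
console.log(updateField(before, "country", "US"));
// { country: "US", state: "", city: "" }
```

With the rule isolated in one small map, the generated field components stay generic and the judgment call lives in a single reviewable place.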

2. Domain-Specific Constraints

Our components needed to work in a Deloitte environment:

Claude didn't know these constraints.

Fix: I added them to the prompt:

This component must:
- Support RTL layouts (dir="rtl")
- Work with 10 color themes (light, dark, high-contrast, etc.)
- Load with <50KB of JavaScript
- Support keyboard navigation and screen readers
- Handle translations from i18n library

Generate code that meets these constraints.

With constraints specified → Claude nailed it.
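
Constraints like RTL support can also be checked after generation. The sketch below flags physical-direction Tailwind spacing utilities and suggests the logical equivalents (`ms-`, `me-`, `ps-`, `pe-`, available from Tailwind CSS v3.3). It is an assumption about how such a check could look, not tooling we shipped, and it covers only margin and padding prefixes.

```typescript
// Physical-direction prefixes and their logical counterparts (Tailwind CSS v3.3+).
const logicalPrefix = new Map<string, string>([
  ["ml", "ms"],
  ["mr", "me"],
  ["pl", "ps"],
  ["pr", "pe"],
]);

// Returns the RTL-safe replacement for one class, or null if none is needed.
function suggestLogicalClass(cls: string): string | null {
  // Optional variant chain (e.g. "hover:"), optional "-" sign, prefix, value.
  const match = /^(.*:)?(-?)(ml|mr|pl|pr)-(.+)$/.exec(cls);
  if (!match) return null;
  const [, variantChain = "", sign, prefix, rest] = match;
  const replacement = logicalPrefix.get(prefix);
  return replacement ? `${variantChain}${sign}${replacement}-${rest}` : null;
}

// Hypothetical helper: lists [found, suggested] pairs for a class string.
function findRtlUnsafeClasses(className: string): Array<[string, string]> {
  const findings: Array<[string, string]> = [];
  for (const cls of className.split(/\s+/)) {
    const suggestion = suggestLogicalClass(cls);
    if (suggestion) findings.push([cls, suggestion]);
  }
  return findings;
}

console.log(findRtlUnsafeClasses("ml-2 hover:pr-4 ms-2 px-3"));
// [["ml-2", "ms-2"], ["hover:pr-4", "hover:pe-4"]]
```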

3. Visual Polish

Claude can generate correct HTML + Tailwind.

But "should the button have 2px or 4px border-radius?" requires looking at the design.

Claude would sometimes guess wrong.

Fix: Figma MCP + visual review. I'd spot-check generated code against the design, adjust the 2-3 pixels, move on.


The AI Development Setup We Use Now

If you want to replicate this, here's the minimal stack:

Required:

  1. Claude API + Claude Code — $20/month for Pro
  2. GitHub Copilot — $10/month or free with GitHub Enterprise
  3. Figma MCP — Free if you have Claude
  4. Shadcn MCP — Free if you use Shadcn

Recommended:

  1. Vercel for deployment — Free tier fine
  2. Storybook for documentation — Free, open-source
  3. GitHub for version control + PR review — Free
  4. Datadog or similar for monitoring — Monitor generated code's performance

The Workflow I Recommend:

1. Design phase (human)
   ↓
2. Brief spec (human-written)
   ↓
3. Claude Code generation (AI)
   ↓
4. Figma MCP validation (AI + human eye)
   ↓
5. PR review + testing (human)
   ↓
6. Copilot for integration & docs (AI)
   ↓
7. Ship

The Surprising Discovery: Less Refactoring Over Time

Here's what I didn't expect:

The more AI-generated code we integrated, the less refactoring we needed.

Month 1: "This AI code is OK but needs tweaks on every component."

Month 2: "Wait, this generated code has fewer bugs than our manual work."

Month 3: "The AI understands our patterns now. New components need almost no refactoring."

Why?

Because we:

  1. Iterated on our prompts (better prompts → better code)
  2. Set up constraints and standards upfront (AI learns our constraints)
  3. Reviewed thoroughly at first (caught edge cases, improved system)
  4. Built a library of patterns (AI saw our patterns, followed them)

By month 3, we could prompt Claude:

Build a DatePicker component using our established patterns.
Reference: Button (for styling), Input (for validation), Modal (for overlay).
Follow the same structure, types, tests, and stories.

Claude would generate code that was immediately production-ready.

Refactoring time: ~15 minutes (just a final review).

Human-written code at month 3 would still take 2-3 hours.


The Human Side: Did This Replace Engineers?

This is the question everyone asks:

"Vignesh, aren't you automating engineers out of a job?"

Honest answer: No. We're automating boilerplate, not thinking.

What changed:

| Task | Before | After |
|------|--------|-------|
| Generate component shell | 1 hour human | 5 min AI + 5 min review |
| Write tests | 1.5 hours human | 15 min AI + 15 min review |
| Storybook docs | 30 min human | 5 min AI + 5 min review |
| Edge cases & business logic | 30 min human | 30 min human (same) |
| Design decisions | 1 hour human | 1 hour human (same) |
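
Summing the rows of that table gives a per-component total. The sketch below only does the arithmetic on the figures listed there; the `TaskTiming` shape is an assumption, and AI time and review time are added together in the "after" column.

```typescript
// Minutes per task, taken from the table above (AI + review combined for "after").
interface TaskTiming {
  task: string;
  beforeMin: number;
  afterMin: number;
}

const timings: TaskTiming[] = [
  { task: "Generate component shell", beforeMin: 60, afterMin: 5 + 5 },
  { task: "Write tests", beforeMin: 90, afterMin: 15 + 15 },
  { task: "Storybook docs", beforeMin: 30, afterMin: 5 + 5 },
  { task: "Edge cases & business logic", beforeMin: 30, afterMin: 30 },
  { task: "Design decisions", beforeMin: 60, afterMin: 60 },
];

const totalBefore = timings.reduce((sum, t) => sum + t.beforeMin, 0);
const totalAfter = timings.reduce((sum, t) => sum + t.afterMin, 0);
const savedPercent = Math.round(((totalBefore - totalAfter) / totalBefore) * 100);

console.log(totalBefore); // 270 minutes (4.5 hours)
console.log(totalAfter); // 140 minutes (2 hours 20 minutes)
console.log(savedPercent); // 48
```

All of the saving comes from the first three rows; the two human-only rows are unchanged.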

AI handles the repetitive, testable, structured work.

Humans handle the judgment calls:

The reality: with AI, one engineer did the boilerplate work of 2-3 engineers.

But we didn't fire anyone.

Instead, we:

  1. Shipped the design system 5 weeks faster
  2. Freed the engineer for higher-leverage work (mentoring, architecture, complex features)
  3. Hired 1 more engineer who now focuses on features, not component scaffolding

Net: AI enabled us to scale without proportionally increasing costs.


The Numbers That Matter

After shipping the design system with AI:

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Design system components | 47 inconsistent | 8 consistent | 100% coverage |
| Time to build a new page | 6 hours | 2 hours | 67% faster |
| Onboarding time for new engineers | 3 weeks | 1 week | 67% faster |
| Component bugs reported per month | 8 | 1 | 88% fewer |
| Time to merge component PR | 3 days (4 review cycles) | 1 day (1 review cycle) | 3x faster |
| Test coverage | 60% | 95% | +35 points |
| Code consistency score (lint + design adherence) | 72% | 98% | +26 points |

The Cost:

The Benefit:


The Lesson That Surprised Me Most

I went into this thinking: "AI will generate code, we'll use 60%, refactor the rest."

What actually happened:

AI forces you to think harder.

When you're writing code by hand, you can be sloppy:

When you're prompting AI:

The constraint of prompting AI made us better at design system thinking.

We had to articulate what a "good button" is.

We had to document all the variants.

We had to think about accessibility upfront, not as an afterthought.

AI didn't replace thinking. It forced better thinking.


What We're Doing Next

Now that the design system is solid:

1. AI-Generated Tests at Scale

We're using Claude Code to generate test scenarios:

Given this component's API, generate 50 test cases that cover:
- All prop combinations
- All edge cases
- All accessibility requirements
- All visual states
- All error conditions

Claude generates 50 test cases in 5 minutes.

We're finding bugs in components that have been "done" for months.

2. Automatic Type Generation from Figma

Instead of manually keeping Figma colors synced with TypeScript types:

// Generated from Figma automatically
export const colors = {
  primary: '#0066CC',
  secondary: '#6C757D',
  // ... 50 more colors
} as const;

export type ColorVariant = keyof typeof colors;

We're exploring tooling that watches Figma, generates types, commits to the repo.
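
The generation step of that idea can be sketched without touching Figma at all. The example below assumes the color tokens have already been exported into a plain name-to-hex record by some other means; `generateColorModule` is a hypothetical helper, and the Figma side is deliberately left out.

```typescript
// Turns a name -> hex record into the source text of a TypeScript module.
function generateColorModule(tokens: Record<string, string>): string {
  const hexColor = /^#[0-9A-Fa-f]{6}$/;
  const entries = Object.entries(tokens).map(([name, value]) => {
    // Reject anything that is not a 6-digit hex color before it reaches the repo.
    if (!hexColor.test(value)) {
      throw new Error(`Invalid hex color for "${name}": ${value}`);
    }
    // JSON.stringify quotes the key so names like "brand-blue" stay valid.
    return `  ${JSON.stringify(name)}: '${value.toUpperCase()}',`;
  });
  return [
    "// Generated from Figma automatically",
    "export const colors = {",
    ...entries,
    "} as const;",
    "",
    "export type ColorVariant = keyof typeof colors;",
    "",
  ].join("\n");
}

console.log(generateColorModule({ primary: "#0066cc", secondary: "#6C757D" }));
```

Because the output is deterministic text, a pipeline could write it to a file and commit only when the content actually changes.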

3. AI-Powered Component Suggestions

Using Claude to analyze:

Claude suggests: "This new form looks like you could use our Button, Input, and Select components. Here's a composition."

Designers see the suggestion and spend less time building components by hand.


The Honest Assessment: Is This the Future?

In 2026, yes.

But not in the way people imagine.

AI won't replace designers or engineers.

AI will replace boilerplate, validation work, and pattern repetition.

The engineers who thrive:

The engineers who struggle:

My prediction: By 2027, every company will use AI to generate boilerplate. The competitive advantage will be: who uses it best?

That's us.


Takeaways for Your Team

If you want to replicate this:

  1. Start small. Don't AI-generate your entire codebase. Start with design systems, component libraries, and heavily-patterned code.

  2. Invest in prompting. Good prompts > good AI model. Spend time learning how to ask questions effectively.

  3. Use MCPs. Figma MCP, Shadcn MCP, your own MCP for your company patterns. Context is king.

  4. Review thoroughly at first. The first 3-5 components need heavy review. By the 10th, you'll know what to look for.

  5. Pair AI with humans. AI for boilerplate, humans for judgment. Neither alone is as good as both together.

  6. Measure what matters. Not "lines of code generated" (meaningless). Measure: time to productivity, bug rates, code consistency, developer satisfaction.

  7. Keep humans in the loop. AI suggests, humans decide. Every time.


Final Thought

Six months ago, if someone told me "we'll ship a design system with 60% AI-generated code that has fewer bugs than human code," I would have said: "You're smoking something."

Today, I'm convinced this is just the beginning.

Not because AI is sentient or magical.

But because we stopped thinking of AI as "a robot to replace humans."

And started thinking of it as "a tool to amplify human judgment."

When you focus on the right problems, ask the right questions, and let AI handle what it's good at (patterns, boilerplate, validation):

You build better systems faster.

You ship with more confidence.

You have time for the thinking that actually matters.

That's the future I'm excited about.


Tools We Use (If You Want to Replicate This)


If you've built something with AI and want to share your story, or if you're stuck with prompt engineering, reach out on Twitter or LinkedIn. I'm curious how other teams are using AI in their workflows.
