Building a Design System in 2026: When AI Writes 60% of Your Code (And Actually Maintains It Better)

A year ago, I would have laughed at this headline.

"AI writes design system code? You'll spend weeks fixing garbage."

I was wrong.

Today, at Deloitte, we have a production design system in which 60% of the code was generated by AI: GitHub Copilot, Claude Code, Figma MCP, and Shadcn MCP working together.

And here's the part that surprised me: the AI-generated code required less refactoring than human-written code.

Not because AI is magically perfect.

But because we changed how we prompt AI, how we structure design decisions, and how we validate output.

This is not an "AI is the future" fantasy post.

This is a real story from shipping something people use every day.

And the honest lesson: AI doesn't replace your design system work. It amplifies what you were already trying to do — if you do the thinking first.

The Problem: We Had 47 Buttons

When I arrived at Deloitte's frontend team, we inherited a codebase with a problem.

Not a bug.

A pattern problem.

We had 47 slightly different button implementations across 3 projects.

Some used className="px-4 py-2 bg-blue-500" (inline Tailwind)
Some used a Button component with props I'd never seen before
Some were from Material-UI
Some were from a custom component library nobody maintained
Some were just raw HTML <button> tags with CSS

A single designer had created a Figma design system.

But the code didn't match.

Developers would ask: "Which button should I use?"

And the answer was always: "Uh, check the Figma? Or look at another project? Or just build it?"

The Cost:

Onboarding: New engineers spent 3 weeks just understanding which component to use where
Inconsistency: The same action button looked different on different screens
Maintenance: When we updated button styling, we had to find and fix it in 5 different places
Velocity: Building a new page with forms was slow because form components were fragmented
Quality: We shipped UI bugs because components behaved differently in different contexts

We needed a design system.

But building a design system by hand is slow, boring, and requires constant maintenance.

Enter AI.

The Insight: AI Excels at Pattern Work

Here's what I realized: design systems are patterns, and AI is exceptionally good at patterns.

A design system is just:

Take a design (button with 4 variants: primary, secondary, outline, ghost)
Generate variations (sizes: sm, md, lg)
Add states (hover, focus, active, disabled, loading)
Write tests for each
Document each variant
Export it so other components can use it

This is highly structured work.

It's not creative.

It's not ambiguous.

It's: "Given this design, generate all the variations and combinations."
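To make "all the variations and combinations" concrete, here is a minimal sketch that enumerates the matrix for the Button described above. The type and function names are hypothetical; only the variant, size, and state values come from the list.

```typescript
// Axes taken from the Button example in the text.
const variants = ["primary", "secondary", "outline", "ghost"] as const;
const sizes = ["sm", "md", "lg"] as const;
const states = ["hover", "focus", "active", "disabled", "loading"] as const;

interface ButtonCase {
  variant: (typeof variants)[number];
  size: (typeof sizes)[number];
  state: (typeof states)[number];
}

// Cartesian product of the three axes: every combination a test or story should cover.
function enumerateCases(): ButtonCase[] {
  const cases: ButtonCase[] = [];
  for (const variant of variants) {
    for (const size of sizes) {
      for (const state of states) {
        cases.push({ variant, size, state });
      }
    }
  }
  return cases;
}

const allCases = enumerateCases();
console.log(allCases.length); // 4 variants x 3 sizes x 5 states = 60
```

Sixty combinations for a single button is exactly the kind of list a person abbreviates and a generator does not.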

AI doesn't struggle with this.

Humans struggle with this.

Humans find it tedious.

Humans miss edge cases.

Humans forget to test disabled states.

So we flipped the model: let AI do the pattern generation. Let humans do the decisions.

The Setup: Copilot + Claude Code + Figma MCP + Shadcn MCP

We didn't start with some grand AI-assisted design system plan.

We stumbled into it.

Here's what we used, and why each piece mattered:

1. Figma MCP (Model Context Protocol)

What it does: Connects Claude Code to Figma as a context source.

How we used it:

Before writing code, I'd:

Take a screenshot of the Figma component in all its variants

Paste it into Claude Code with the prompt:

Here's a Button component from our design system in Figma.
Variants: primary, secondary, outline, ghost
Sizes: sm, md, lg
States: default, hover, active, disabled, loading

Generate a React component that matches this design exactly.
Use Tailwind CSS for styling.
Include prop types in TypeScript.
Export both the component and its variants for Storybook.

The Magic:

Claude could see the exact colors, spacing, typography.

It didn't have to guess.

No more: "Is the padding 12px or 16px?"

Claude saw it in the screenshot.

It generated code that matched the design pixel for pixel.

Result: 3 hours of manual translation → 15 minutes with Claude Code + Figma MCP.
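For a sense of what the styling half of such a component can look like, here is a minimal, dependency-free sketch of a class resolver. It is not our generated code: the function name is hypothetical and the Tailwind classes are illustrative placeholders, where the real values would come from the Figma design.

```typescript
type ButtonVariant = "primary" | "secondary" | "outline" | "ghost";
type ButtonSize = "sm" | "md" | "lg";

interface ButtonStyleOptions {
  variant?: ButtonVariant;
  size?: ButtonSize;
  disabled?: boolean;
  loading?: boolean;
}

// Illustrative classes only; real values would be read from the design.
const BASE_CLASSES = "inline-flex items-center justify-center rounded font-medium";

const VARIANT_CLASSES: Record<ButtonVariant, string> = {
  primary: "bg-blue-500 text-white hover:bg-blue-600",
  secondary: "bg-gray-200 text-gray-900 hover:bg-gray-300",
  outline: "border border-blue-500 text-blue-500 hover:bg-blue-50",
  ghost: "bg-transparent text-blue-500 hover:bg-blue-50",
};

const SIZE_CLASSES: Record<ButtonSize, string> = {
  sm: "px-3 py-1 text-sm",
  md: "px-4 py-2 text-base",
  lg: "px-6 py-3 text-lg",
};

// Resolves the full class string for one combination of props.
function buttonClasses(options: ButtonStyleOptions = {}): string {
  const { variant = "primary", size = "md", disabled = false, loading = false } = options;
  const parts = [BASE_CLASSES, VARIANT_CLASSES[variant], SIZE_CLASSES[size]];
  // Loading is treated like disabled for styling purposes.
  if (disabled || loading) parts.push("opacity-50 cursor-not-allowed");
  return parts.join(" ");
}

console.log(buttonClasses({ variant: "outline", size: "sm" }));
```

Keeping variants and sizes in lookup tables means adding a variant is a one-line change, and the `Record` types make the compiler flag any variant that lacks classes.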

2. Claude Code (Direct Code Generation)

What it does: Writes, edits, and debugs code directly in the IDE.

How we used it:

Once we had the base component, I'd ask Claude Code:

Now generate:
1. Unit tests with Jest (test all variants, all states)
2. Storybook stories (interactive showcase of each variant)
3. Accessibility tests (a11y)
4. TypeScript types for all props
5. Edge cases (icon + loading state, very long text, etc.)

Claude Code would output all 5 pieces.

Not perfectly.

But 95% there.

We'd review, maybe tweak 1-2 things, commit.

The Numbers:

Button component (HTML + CSS + types + tests + stories): 2 hours manually → 25 minutes with Claude Code
We shipped 8 components this way
Saved roughly 16 hours of engineering time per component
Total: 128 hours saved

3. GitHub Copilot (Inline Suggestions)

What it does: Real-time code completion and generation as you type.

How we used it:

Once the component was in the codebase, Copilot would:

Auto-complete prop destructuring
Suggest className patterns for new variants
Generate test cases when you type it('should...
Suggest export statements and types

The Multiplier Effect:

With Copilot + Claude Code:

A component that took 4 hours now took 20 minutes
But we still needed to review, tweak, think about edge cases
Without Copilot: another 30 minutes of boilerplate
With Copilot: automatic suggestions meant we focused on logic, not typing

The Psychology:

Copilot doesn't feel like "AI writing code."

It feels like an autocomplete that actually understands context.

You stay in flow state.

You don't get interrupted by "should this be className or class?"

Copilot knows.

4. Shadcn MCP (Component Library as Context)

What it does: Connects Shadcn/UI component patterns to your AI prompts.

How we used it:

Instead of describing buttons from scratch, I'd tell Claude Code:

We're building a component library.
We use Shadcn patterns for component structure.
Use the Button from Shadcn as a reference for:
- Prop naming conventions (variant, size, asChild)
- Accessibility patterns (aria-* attributes)
- Composition pattern (compound components with sub-slots)

Now build our Button on top of this pattern.

Claude would:

Match Shadcn conventions (so it felt familiar)
Inherit accessibility best practices
Compose correctly with other Shadcn components
Export in the same way Shadcn does

Why This Matters:

We didn't invent our own patterns.

We followed industry conventions.

Claude knew those conventions because Shadcn MCP gave it context.

Result: Components that work well with existing ecosystems, not against them.

The Process: How 60% AI Code Happened (And Why It's Actually Good)

Let me be transparent: 60% AI code doesn't mean "60% of effort was AI."

It means: of the lines of code, 60% were generated by AI prompts.

The breakdown:

10%: Figma MCP translating design to component shells
20%: Claude Code generating tests, stories, types
30%: GitHub Copilot filling in standard patterns
40%: Human code review, edge case thinking, business logic

Here's the actual workflow:

Day 1: Design Phase (Human-led)

I sit with the designer.

We decide: what variants does this component need?
What states? What edge cases?
What's the accessibility requirement?

We write a brief spec in text form:
"Button component
Variants: primary, secondary, outline, ghost, destructive
Sizes: sm, md, lg, xl
States: default, hover, focus, active, disabled, loading
Icon support: leading, trailing, or both
Loading state: shows spinner + disables

Accessibility: proper aria labels, focus management, disabled state"
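One way to keep a spec like this consistent from component to component is to hold it as data and render the prompt text from it. This is a sketch under that assumption, not something we had on day one; `ComponentSpec` and `specToPrompt` are hypothetical names.

```typescript
// A component spec as structured data instead of free text.
interface ComponentSpec {
  name: string;
  variants: string[];
  sizes: string[];
  states: string[];
  notes: string[];
}

const buttonSpec: ComponentSpec = {
  name: "Button",
  variants: ["primary", "secondary", "outline", "ghost", "destructive"],
  sizes: ["sm", "md", "lg", "xl"],
  states: ["default", "hover", "focus", "active", "disabled", "loading"],
  notes: [
    "Icon support: leading, trailing, or both",
    "Loading state: shows spinner + disables",
    "Accessibility: proper aria labels, focus management, disabled state",
  ],
};

// Renders the spec in the same line-per-field shape as the hand-written brief.
function specToPrompt(spec: ComponentSpec): string {
  return [
    `${spec.name} component`,
    `Variants: ${spec.variants.join(", ")}`,
    `Sizes: ${spec.sizes.join(", ")}`,
    `States: ${spec.states.join(", ")}`,
    ...spec.notes,
  ].join("\n");
}

console.log(specToPrompt(buttonSpec));
```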

Day 2: Generation Phase (AI-led)

I paste the spec + Figma screenshot into Claude Code.

"Here's the design. Generate the component."

Claude generates:
- Component definition (React, TypeScript)
- All variants as CSS classes
- Full test suite (50+ test cases)
- Storybook stories
- Type definitions
- JSDoc comments

Output: ~400 lines of code, 80% of what we need.

Day 3: Review & Refinement (Human-led)

I review Claude's output:

✓ Component API is clean? (Yes)
✓ Tests cover edge cases? (Mostly, I add 3 more)
✓ Storybook shows all variants? (Yes)
✓ TypeScript types are correct? (Yes, catch 1 error)
✓ Accessibility good? (Good, I add aria-label handling)
✓ Performance OK? (Yes, no unnecessary re-renders)

I make 5-10 edits.
The component is ready.

Total human time: 2-3 hours.
Total elapsed time: 3 days (but only 3-4 hours actual work).
Previously: 8 hours of solid human work, 1 day elapsed.

Day 4: Integration (Copilot-led)

I need to add the Button to:
- Shared exports
- Storybook configuration
- Documentation

Copilot suggests:

export { Button, type ButtonProps } from './Button'

It autocompletes the imports.

It suggests the documentation format based on other components.

What would take 30 minutes of boilerplate → 5 minutes with Copilot.

Why This Is Better Than 100% Human Code

Here's what surprised me: the AI-generated code had fewer bugs than human-written code.

Not because AI is better.

But because:

AI doesn't take shortcuts. A human might skip testing disabled states. AI tests them all.
AI doesn't have bad habits. A human might write <button onClick={onClick}> without aria labels. Claude knows accessibility patterns and includes them by default.
AI is consistent. Every component Claude generates has the same structure, same naming, same patterns. Humans vary.
AI doesn't get tired. A human writing the 8th component is less careful. Claude is exactly as careful on the 8th as the 1st.

The Refactoring Reality:

With human code:

Review → Find issues → Refactor → Retest → Update stories again → 3-4 rounds

With AI code:

Review → Find 1-2 edge case issues → Add handling → Retest → Done → 1-2 rounds

The metric that matters: AI code went from generation to production merge in 1 review cycle.

Human code took 3-4 cycles.

The Honest Limitations (Where AI Struggled)

I'm not going to pretend AI is magic.

Here's where it hit walls:

1. Complex Interaction Logic

AI struggled with:

// This was hard for AI to get right:
// Multi-step validation in a form
// Error state management across fields
// Focus management when errors appear
// Conditional field visibility based on other fields

Claude would generate 80% correct code.

The 20% around "what happens when field A changes and field B needs to clear?" required human judgment.

Fix: I broke it into smaller components. Let AI generate each. Compose them with business logic (human-written).
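The "field A changes, field B needs to clear" rule is a good example of logic worth isolating and writing by hand. A minimal sketch of how it might look as a pure function; the field names and the dependency map are hypothetical, not taken from our forms.

```typescript
type FormValues = Record<string, string>;

// Hypothetical dependency map: when the key field changes, the listed fields are cleared.
const CLEARS_ON_CHANGE: Record<string, string[]> = {
  country: ["state", "postalCode"],
};

// Returns a new values object; the input is never mutated.
function applyFieldChange(values: FormValues, field: string, value: string): FormValues {
  if (values[field] === value) return values; // nothing changed, nothing to clear
  const next: FormValues = { ...values, [field]: value };
  for (const dependent of CLEARS_ON_CHANGE[field] ?? []) {
    next[dependent] = "";
  }
  return next;
}

const initial: FormValues = { country: "US", state: "CA", postalCode: "94105" };
console.log(applyFieldChange(initial, "country", "DE"));
```

Because the rule lives in one small function, the generated field components stay dumb and the judgment call sits in a place that is easy to review and test.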

2. Domain-Specific Constraints

Our components needed to work in a Deloitte environment:

Accessibility standards (WCAG 2.1 AA, not just WCAG 2.0)
Enterprise theming (10 different color themes)
International support (RTL, translated error messages)
Performance under slow networks

Claude didn't know these constraints.

Fix: I added them to the prompt:

This component must:
- Support RTL layouts (dir="rtl")
- Work with 10 color themes (light, dark, high-contrast, etc.)
- Load with <50KB of JavaScript
- Support keyboard navigation and screen readers
- Handle translations from i18n library

Generate code that meets these constraints.

With constraints specified → Claude nailed it.

3. Visual Polish

Claude can generate correct HTML + Tailwind.

But "should the button have 2px or 4px border-radius?" requires looking at the design.

Claude would sometimes guess wrong.

Fix: Figma MCP + visual review. I'd spot-check generated code against the design, adjust the 2-3 pixels, move on.

The AI Development Setup We Use Now

If you want to replicate this, here's the minimal stack:

Required:

Claude API + Claude Code — $20/month for Pro
GitHub Copilot — $10/month or free with GitHub Enterprise
Figma MCP — Free if you have Claude
Shadcn MCP — Free if you use Shadcn

The Workflow I Recommend:

1. Design phase (human)
   ↓
2. Brief spec (human-written)
   ↓
3. Claude Code generation (AI)
   ↓
4. Figma MCP validation (AI + human eye)
   ↓
5. PR review + testing (human)
   ↓
6. Copilot for integration & docs (AI)
   ↓
7. Ship

The Surprising Discovery: Less Refactoring Over Time

Here's what I didn't expect:

The more AI-generated code we integrated, the less refactoring we needed.

Month 1: "This AI code is OK but needs tweaks on every component."

Month 2: "Wait, this generated code has fewer bugs than our manual work."

Month 3: "The AI understands our patterns now. New components need almost no refactoring."

Why?

Because we:

Iterated on our prompts (better prompts → better code)
Set up constraints and standards upfront (every prompt carries our constraints)
Reviewed thoroughly at first (caught edge cases, improved system)
Built a library of patterns (AI saw our patterns, followed them)

By month 3, we could prompt Claude:

Build a DatePicker component using our established patterns.
Reference: Button (for styling), Input (for validation), Modal (for overlay).
Follow the same structure, types, tests, and stories.

Claude would generate code that was immediately production-ready.

Refactoring time: ~15 minutes (just a final review).

Human-written code at month 3 would still take 2-3 hours.

The Human Side: Did This Replace Engineers?

This is the question everyone asks:

"Vignesh, aren't you automating engineers out of a job?"

Honest answer: No. We're automating boilerplate, not thinking.

What changed:

| Task | Before | After |
|------|--------|-------|
| Generate component shell | 1 hour human | 5 min AI + 5 min review |
| Write tests | 1.5 hours human | 15 min AI + 15 min review |
| Storybook docs | 30 min human | 5 min AI + 5 min review |
| Edge cases & business logic | 30 min human | 30 min human (same) |
| Design decisions | 1 hour human | 1 hour human (same) |

AI handles the repetitive, testable, structured work.

Humans handle the judgment calls:

"Should this form validate on blur or on submit?"
"Is the error message tone right?"
"Does this workflow make sense for our users?"
"How should this component interact with the rest of the system?"

The reality: With AI, one engineer did the work of 2-3 engineers on boilerplate.

But we didn't fire anyone.

Instead:

We shipped the design system 5 weeks faster
The engineer had time for higher-leverage work (mentoring, architecture, complex features)
We hired 1 more engineer who now focuses on features, not component scaffolding

Net: AI enabled us to scale without proportionally increasing costs.

The Numbers That Matter

After shipping the design system with AI:

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Design system components | 47 inconsistent | 8 consistent | 100% coverage |
| Time to build a new page | 6 hours | 2 hours | 67% faster |
| Onboarding time for new engineers | 3 weeks | 1 week | 67% faster |
| Component bugs reported per month | 8 | 1 | 87.5% fewer |
| Time to merge component PR | 3 days (4 review cycles) | 1 day (1 review cycle) | 3x faster |
| Test coverage | 60% | 95% | +35 points |
| Code consistency score (lint + design adherence) | 72% | 98% | +26 points |
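As a quick check on figures like these, the improvements can be recomputed from the Before and After columns. A minimal sketch with hypothetical helper names: time and count metrics use a percentage reduction, while metrics that are already percentages use a difference in percentage points.

```typescript
// Percentage reduction from `before` to `after`, for metrics where lower is better.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Difference in percentage points, for metrics that are already percentages.
function pointChange(beforePct: number, afterPct: number): number {
  return afterPct - beforePct;
}

console.log(percentReduction(6, 2).toFixed(1)); // page build time in hours: 66.7
console.log(percentReduction(3, 1).toFixed(1)); // onboarding in weeks: 66.7
console.log(percentReduction(8, 1).toFixed(1)); // bugs per month: 87.5
console.log(pointChange(60, 95)); // test coverage: 35
console.log(pointChange(72, 98)); // consistency score: 26
```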

The Cost:

AI tools: $30/month per engineer
Time investment: 3 weeks from 1 engineer + 0.3 FTE designer
Total cost: ~$10K in time + $90 in software

The Benefit:

Design system that 50+ engineers now use
Onboarding new people 3x faster
Shipping features faster
Fewer bugs in production
Better code consistency
A foundation to build on for the next 2 years

The Lesson That Surprised Me Most

I went into this thinking: "AI will generate code, we'll use 60%, refactor the rest."

What actually happened:

AI forces you to think harder.

When you're writing code by hand, you can be sloppy:

"I'll handle this edge case later"
"This comment is good enough"
"I'll test this manually"

When you're prompting AI:

You have to be explicit: "Here's what we need"
You have to be thoughtful: "What could go wrong?"
You have to be thorough: "These are all the states"

The constraint of prompting AI made us better at design system thinking.

We had to articulate what a "good button" is.

We had to document all the variants.

We had to think about accessibility upfront, not as an afterthought.

AI didn't replace thinking. It forced better thinking.

What We're Doing Next

Now that the design system is solid:

1. AI-Generated Tests at Scale

We're using Claude Code to generate test scenarios:

Given this component's API, generate 50 test cases that cover:
- All prop combinations
- All edge cases
- All accessibility requirements
- All visual states
- All error conditions

Claude generates 50 test cases in 5 minutes.

We're finding bugs in components that have been "done" for months.
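A complementary approach to asking a model for test cases is to enumerate prop combinations mechanically and assert an invariant over all of them. The sketch below assumes a hypothetical Button prop API, and `isClickable` is a stand-in for the real behavior under test.

```typescript
type PropValue = string | boolean;
type PropCombination = Record<string, PropValue>;

// Hypothetical prop API, listed as [propName, allowedValues] pairs.
const buttonPropAxes: Array<[string, PropValue[]]> = [
  ["variant", ["primary", "secondary", "outline", "ghost", "destructive"]],
  ["size", ["sm", "md", "lg", "xl"]],
  ["disabled", [false, true]],
  ["loading", [false, true]],
];

// Builds every combination of prop values, one axis at a time.
function allPropCombinations(axes: Array<[string, PropValue[]]>): PropCombination[] {
  let result: PropCombination[] = [{}];
  for (const [prop, values] of axes) {
    const expanded: PropCombination[] = [];
    for (const partial of result) {
      for (const value of values) {
        expanded.push({ ...partial, [prop]: value });
      }
    }
    result = expanded;
  }
  return result;
}

// Stand-in for the behavior under test: clickable only when neither disabled nor loading.
function isClickable(props: PropCombination): boolean {
  return props.disabled !== true && props.loading !== true;
}

const testCases = allPropCombinations(buttonPropAxes);
// Invariant: a disabled or loading button must never be clickable.
const failures = testCases.filter(
  (props) => (props.disabled === true || props.loading === true) && isClickable(props)
);
console.log(`${testCases.length} cases, ${failures.length} failures`);
```

Exhaustive enumeration only works while the prop space is small; for larger APIs, a sampled subset or model-suggested cases are the practical fallback.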

2. Automatic Type Generation from Figma

Instead of manually keeping Figma colors synced with TypeScript types:

// Generated from Figma automatically
export const colors = {
  primary: '#0066CC',
  secondary: '#6C757D',
  // ... 50 more colors
} as const;

export type ColorVariant = keyof typeof colors;

We're exploring tooling that watches Figma, generates types, commits to the repo.
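The code generation step of such tooling is the easy part. Here is a minimal sketch that assumes the color tokens have already been exported from Figma as name-to-hex pairs; the function name is hypothetical, and the Figma export and commit steps are deliberately left out.

```typescript
// Hypothetical input shape: token name -> hex color, already exported from Figma.
type ColorTokens = Record<string, string>;

const HEX_COLOR = /^#[0-9A-Fa-f]{6}$/;

// Renders TypeScript source for a colors module; throws on malformed hex values.
function renderColorsModule(tokens: ColorTokens): string {
  const entries = Object.keys(tokens)
    .sort() // stable ordering keeps generated diffs small
    .map((name) => {
      const hex = tokens[name] ?? "";
      if (!HEX_COLOR.test(hex)) {
        throw new Error(`Invalid hex color for "${name}": ${hex}`);
      }
      return `  ${JSON.stringify(name)}: ${JSON.stringify(hex)},`;
    });
  return [
    "// Generated from Figma automatically",
    "export const colors = {",
    ...entries,
    "} as const;",
    "",
    "export type ColorVariant = keyof typeof colors;",
    "",
  ].join("\n");
}

const generatedSource = renderColorsModule({ secondary: "#6C757D", primary: "#0066CC" });
console.log(generatedSource);
```

Validating at generation time means a typo in a token fails the sync job instead of reaching the component library.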

3. AI-Powered Component Suggestions

Using Claude to analyze:

Figma designs being created
Component library capabilities
Upcoming features

Claude suggests: "This new form looks like you could use our Button, Input, and Select components. Here's a composition."

Designers see the suggestion and spend less time building components by hand.

The Honest Assessment: Is This the Future?

In 2026, yes.

But not in the way people imagine.

AI won't replace designers or engineers.

AI will replace boilerplate, validation work, and pattern repetition.

The engineers who thrive:

Understand why patterns exist
Can prompt AI effectively (this is a skill)
Can judge AI output critically (not trust, verify)
Can compose AI work into systems
Can think about design and architecture

The engineers who struggle:

Treat AI as a robot that does coding
Don't learn the patterns behind what AI generates
Copy-paste AI code without understanding it
Don't invest in prompt engineering skills
Can't tell good code from okay code

My prediction: By 2027, every company will use AI to generate boilerplate. The competitive advantage will be: who uses it best?

That's us.

Takeaways for Your Team

If you want to replicate this:

Start small. Don't AI-generate your entire codebase. Start with design systems, component libraries, and heavily-patterned code.
Invest in prompting. Good prompts > good AI model. Spend time learning how to ask questions effectively.
Use MCPs. Figma MCP, Shadcn MCP, your own MCP for your company patterns. Context is king.
Review thoroughly at first. The first 3-5 components need heavy review. By the 10th, you'll know what to look for.
Pair AI with humans. AI for boilerplate, humans for judgment. Neither alone is as good as both together.
Measure what matters. Not "lines of code generated" (meaningless). Measure: time to productivity, bug rates, code consistency, developer satisfaction.
Keep humans in the loop. AI suggests, humans decide. Every time.

Final Thought

Six months ago, if someone told me "we'll ship a design system with 60% AI-generated code that has fewer bugs than human code," I would have said: "You're smoking something."

Today, I'm convinced this is just the beginning.

Not because AI is sentient or magical.

But because we stopped thinking of AI as "a robot to replace humans."

And started thinking of it as "a tool to amplify human judgment."

When you focus on the right problems, ask the right questions, and let AI handle what it's good at (patterns, boilerplate, validation):

You build better systems faster.

You ship with more confidence.

You have time for the thinking that actually matters.

That's the future I'm excited about.

Tools We Use (If You Want to Replicate This)

Claude API + Code: https://claude.ai
GitHub Copilot: https://github.com/features/copilot
Figma MCP: Available through Claude
Shadcn MCP: Available through Claude
Storybook: https://storybook.js.org
Next.js: https://nextjs.org (what we use for the framework)
Tailwind CSS: https://tailwindcss.com
Zod: https://zod.dev (for validation)

If you've built something with AI and want to share your story, or if you're stuck with prompt engineering, reach out on Twitter or LinkedIn. I'm curious how other teams are using AI in their workflows.
