Building a Design System in 2026: When AI Writes 60% of Your Code (And Actually Maintains It Better)
A year ago, I would have laughed at this headline.
"AI writes design system code? You'll spend weeks fixing garbage."
I was wrong.
Today, at Deloitte, we shipped a production design system with 60% of code generated by AI — GitHub Copilot, Claude Code, Figma MCP, and Shadcn MCP working together.
And here's the part that surprised me: the AI-generated code required less refactoring than human-written code.
Not because AI is magically perfect.
But because we changed how we prompt AI, how we structure design decisions, and how we validate output.
This is not an "AI is the future" fantasy post.
This is a real story from shipping something people use every day.
And the honest lesson: AI doesn't replace your design system work. It amplifies what you were already trying to do — if you do the thinking first.
The Problem: We Had 47 Buttons
When I arrived at Deloitte's frontend team, we inherited a codebase with a problem.
Not a bug.
A pattern problem.
We had 47 slightly different button implementations across 3 projects.
- Some used className="px-4 py-2 bg-blue-500" (inline Tailwind)
- Some used a Button component with props I'd never seen before
- Some were from Material-UI
- Some were from a custom component library nobody maintained
- Some were just raw HTML <button> tags with CSS
A single designer had created a Figma design system.
But the code didn't match.
Developers would ask: "Which button should I use?"
And the answer was always: "Uh, check the Figma? Or look at another project? Or just build it?"
The Cost:
- Onboarding: New engineers spent 3 weeks just understanding which component to use where
- Inconsistency: The same action button looked different on different screens
- Maintenance: When we updated button styling, we had to find and fix it in 5 different places
- Velocity: Building a new page with forms was slow because form components were fragmented
- Quality: We shipped UI bugs because components behaved differently in different contexts
We needed a design system.
But building a design system by hand is slow, boring, and requires constant maintenance.
Enter AI.
The Insight: AI Excels at Pattern Work
Here's what I realized: design systems are patterns, and AI is exceptionally good at patterns.
A design system is just:
- Take a design (button with 4 variants: primary, secondary, outline, ghost)
- Generate variations (sizes: sm, md, lg)
- Add states (hover, focus, active, disabled, loading)
- Write tests for each
- Document each variant
- Export it so other components can use it
This is highly structured work.
It's not creative.
It's not ambiguous.
It's: "Given this design, generate all the variations and combinations."
AI doesn't struggle with this.
Humans struggle with this.
Humans find it tedious.
Humans miss edge cases.
Humans forget to test disabled states.
So we flipped the model: let AI do the pattern generation. Let humans do the decisions.
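To make the size of that pattern space concrete, here is a minimal sketch that enumerates every variant, size, and state combination listed above. The names (`enumerateButtonCases`, `ButtonCase`) are hypothetical and only illustrate the idea; this is not code from the shipped system.

```typescript
// Values mirror the variants, sizes, and states listed above.
const variants = ["primary", "secondary", "outline", "ghost"] as const;
const sizes = ["sm", "md", "lg"] as const;
const states = ["hover", "focus", "active", "disabled", "loading"] as const;

// Hypothetical shape for one combination that needs a test and a story.
interface ButtonCase {
  variant: (typeof variants)[number];
  size: (typeof sizes)[number];
  state: (typeof states)[number];
}

// Builds the full cartesian product of variant x size x state.
function enumerateButtonCases(): ButtonCase[] {
  const cases: ButtonCase[] = [];
  for (const variant of variants) {
    for (const size of sizes) {
      for (const state of states) {
        cases.push({ variant, size, state });
      }
    }
  }
  return cases;
}

// 4 variants x 3 sizes x 5 states = 60 combinations for a single component.
console.log(enumerateButtonCases().length);
```

Sixty combinations for one button is exactly the kind of exhaustive, mechanical list a person tends to cut corners on.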
The Setup: Copilot + Claude Code + Figma MCP + Shadcn MCP
We didn't start with some grand AI-assisted design system plan.
We stumbled into it.
Here's what we used, and why each piece mattered:
1. Figma MCP (Model Context Protocol)
What it does: Connects Claude Code to Figma as a context source.
How we used it:
Before writing code, I'd:
- Take a screenshot of the Figma component in all its variants
- Paste it into Claude Code with the prompt:
Here's a Button component from our design system in Figma.
Variants: primary, secondary, outline, ghost
Sizes: sm, md, lg
States: default, hover, active, disabled, loading
Generate a React component that matches this design exactly.
Use Tailwind CSS for styling.
Include prop types in TypeScript.
Export both the component and its variants for Storybook.
The Magic:
Claude could see the exact colors, spacing, typography.
It didn't have to guess.
No more: "Is the padding 12px or 16px?"
Claude saw it in the screenshot.
The generated code matched the design almost pixel for pixel.
Result: 3 hours of manual translation → 15 minutes with Claude Code + Figma MCP.
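For readers who have not seen this style of component, here is a rough sketch of the styling core such a prompt might produce. It is framework-free so it runs on its own. The Tailwind classes are illustrative placeholders rather than real design tokens, and `buttonClassName` is a hypothetical helper name.

```typescript
type ButtonVariant = "primary" | "secondary" | "outline" | "ghost";
type ButtonSize = "sm" | "md" | "lg";

// Classes shared by every button (placeholder values, not real tokens).
const baseClasses = "inline-flex items-center justify-center rounded-md font-medium";

// One entry per variant; Record forces every variant to be handled.
const variantClasses: Record<ButtonVariant, string> = {
  primary: "bg-blue-600 text-white hover:bg-blue-700",
  secondary: "bg-gray-100 text-gray-900 hover:bg-gray-200",
  outline: "border border-gray-300 bg-transparent hover:bg-gray-50",
  ghost: "bg-transparent hover:bg-gray-100",
};

const sizeClasses: Record<ButtonSize, string> = {
  sm: "h-8 px-3 text-sm",
  md: "h-10 px-4 text-sm",
  lg: "h-12 px-6 text-base",
};

interface ButtonStyleOptions {
  variant?: ButtonVariant;
  size?: ButtonSize;
  disabled?: boolean;
  loading?: boolean;
}

// Resolves the final class string; defaults to a medium primary button.
function buttonClassName(options: ButtonStyleOptions = {}): string {
  const { variant = "primary", size = "md", disabled = false, loading = false } = options;
  const parts = [baseClasses, variantClasses[variant], sizeClasses[size]];
  // Disabled and loading buttons are dimmed and ignore pointer input.
  if (disabled || loading) parts.push("opacity-50 pointer-events-none");
  return parts.join(" ");
}

console.log(buttonClassName({ variant: "outline", size: "sm" }));
```

Typing the maps as `Record<ButtonVariant, string>` means that adding a variant to the union without adding its classes becomes a compile error.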
2. Claude Code (Direct Code Generation)
What it does: Writes, edits, and debugs code directly in the IDE.
How we used it:
Once we had the base component, I'd ask Claude Code:
Now generate:
1. Unit tests with Jest (test all variants, all states)
2. Storybook stories (interactive showcase of each variant)
3. Accessibility tests (a11y)
4. TypeScript types for all props
5. Edge cases (icon + loading state, very long text, etc.)
Claude Code would output all 5 pieces.
Not perfectly.
But 95% there.
We'd review, maybe tweak 1-2 things, commit.
The Numbers:
- Button component (HTML + CSS + types + tests + stories): 2 hours manually → 25 minutes with Claude Code
- We shipped 8 components this way
- Saved roughly 1.6 hours of engineering time per component (2 hours down to 25 minutes)
- Total: roughly 13 hours saved
3. GitHub Copilot (Inline Suggestions)
What it does: Real-time code completion and generation as you type.
How we used it:
Once the component was in the codebase, Copilot would:
- Auto-complete prop destructuring
- Suggest className patterns for new variants
- Generate test cases when you type it('should...
- Suggest export statements and types
The Multiplier Effect:
With Copilot + Claude Code:
- A component that took 4 hours now took 20 minutes
- But we still needed to review, tweak, think about edge cases
- Without Copilot: another 30 minutes of boilerplate
- With Copilot: automatic suggestions meant we focused on logic, not typing
The Psychology:
Copilot doesn't feel like "AI writing code."
It feels like an autocomplete that actually understands context.
You stay in flow state.
You don't get interrupted by "should this be className or class?"
Copilot knows.
4. Shadcn MCP (Component Library as Context)
What it does: Connects Shadcn/UI component patterns to your AI prompts.
How we used it:
Instead of describing buttons from scratch, I'd tell Claude Code:
We're building a component library.
We use Shadcn patterns for component structure.
Use the Button from Shadcn as a reference for:
- Prop naming conventions (variant, size, asChild)
- Accessibility patterns (aria-* attributes)
- Composition pattern (compound components with sub-slots)
Now build our Button on top of this pattern.
Claude would:
- Match Shadcn conventions (so it felt familiar)
- Inherit accessibility best practices
- Compose correctly with other Shadcn components
- Export in the same way Shadcn does
Why This Matters:
We didn't invent our own patterns.
We followed industry conventions.
Claude knew those conventions because Shadcn MCP gave it context.
Result: Components that work well with existing ecosystems, not against them.
The Process: How 60% AI Code Happened (And Why It's Actually Good)
Let me be transparent: 60% AI code doesn't mean "60% of effort was AI."
It means: of the lines of code, 60% were generated by AI prompts.
The breakdown:
- 10%: Figma MCP translating design to component shells
- 20%: Claude Code generating tests, stories, types
- 30%: GitHub Copilot filling in standard patterns
- 40%: Human code review, edge case thinking, business logic
Here's the actual workflow:
Day 1: Design Phase (Human-led)
I sit with the designer.
We decide: what variants does this component need?
What states? What edge cases?
What's the accessibility requirement?
We write a brief spec in text form:
"Button component
Variants: primary, secondary, outline, ghost, destructive
Sizes: sm, md, lg, xl
States: default, hover, focus, active, disabled, loading
Icon support: leading, trailing, or both
Loading state: shows spinner + disables
Accessibility: proper aria labels, focus management, disabled state"
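Two lines of that spec translate directly into checkable behavior: loading implies disabled, and icon-only buttons need an accessible name. A minimal sketch of how that could be encoded follows. The `iconOnly` flag and the function name are assumptions made for illustration, not part of the spec above.

```typescript
interface ButtonState {
  disabled?: boolean;
  loading?: boolean;
  /** Hypothetical flag: the button renders an icon with no visible text. */
  iconOnly?: boolean;
  /** Accessible name, needed when there is no visible text. */
  ariaLabel?: string;
}

interface ButtonDomAttributes {
  disabled: boolean;
  "aria-busy"?: true;
  "aria-label"?: string;
}

// Turns the spec's state rules into the attributes placed on the <button>.
function resolveButtonAttributes(state: ButtonState): ButtonDomAttributes {
  if (state.iconOnly && !state.ariaLabel) {
    throw new Error("Icon-only buttons need an ariaLabel");
  }
  const attrs: ButtonDomAttributes = {
    // Per the spec: the loading state shows a spinner and also disables.
    disabled: Boolean(state.disabled || state.loading),
  };
  if (state.loading) attrs["aria-busy"] = true;
  if (state.ariaLabel) attrs["aria-label"] = state.ariaLabel;
  return attrs;
}

console.log(resolveButtonAttributes({ loading: true }));
```

Writing rules like these down before generation gives the reviewer on Day 3 something concrete to check the output against.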
Day 2: Generation Phase (AI-led)
I paste the spec + Figma screenshot into Claude Code.
"Here's the design. Generate the component."
Claude generates:
- Component definition (React, TypeScript)
- All variants as CSS classes
- Full test suite (50+ test cases)
- Storybook stories
- Type definitions
- JSDoc comments
Output: ~400 lines of code, 80% of what we need.
Day 3: Review & Refinement (Human-led)
I review Claude's output:
✓ Component API is clean? (Yes)
✓ Tests cover edge cases? (Mostly, I add 3 more)
✓ Storybook shows all variants? (Yes)
✓ TypeScript types are correct? (Yes, catch 1 error)
✓ Accessibility good? (Good, I add aria-label handling)
✓ Performance OK? (Yes, no unnecessary re-renders)
I make 5-10 edits.
The component is ready.
Total human time: 2-3 hours.
Total elapsed time: 3 days (but only 3-4 hours actual work).
Previously: 8 hours of solid human work, 1 day elapsed.
Day 4: Integration (Copilot-led)
I need to add the Button to:
- Shared exports
- Storybook configuration
- Documentation
Copilot suggests:
export { Button, type ButtonProps } from './Button'
It autocompletes the imports.
It suggests the documentation format based on other components.
What would take 30 minutes of boilerplate → 5 minutes with Copilot.
Why This Is Better Than 100% Human Code
Here's what surprised me: the AI-generated code had fewer bugs than human-written code.
Not because AI is better.
But because:
- AI doesn't take shortcuts. A human might skip testing disabled states. AI tests them all.
- AI doesn't have bad habits. A human might write <button onClick={onClick}> without aria labels. Claude knows accessibility patterns and includes them by default.
- AI is consistent. Every component Claude generates has the same structure, same naming, same patterns. Humans vary.
- AI doesn't get tired. A human writing the 8th component is less careful. Claude is exactly as careful on the 8th as the 1st.
The Refactoring Reality:
With human code:
- Review → Find issues → Refactor → Retest → Reupdate stories → 3-4 rounds
With AI code:
- Review → Find 1-2 edge case issues → Add handling → Retest → Done → 1-2 rounds
The metric that matters: AI code went from generation to production merge in 1 review cycle.
Human code took 3-4 cycles.
The Honest Limitations (Where AI Struggled)
I'm not going to pretend AI is magic.
Here's where it hit walls:
1. Complex Interaction Logic
AI struggled with:
// This was hard for AI to get right:
// Multi-step validation in a form
// Error state management across fields
// Focus management when errors appear
// Conditional field visibility based on other fields
Claude would generate 80% correct code.
The 20% around "what happens when field A changes and field B needs to clear?" required human judgment.
Fix: I broke it into smaller components. Let AI generate each. Compose them with business logic (human-written).
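The "field A changes, field B clears" rule is the kind of business logic that stays human-written. One possible shape for it is a small pure function driven by a dependency map. The field names and the `clearsOnChange` map here are hypothetical examples, not the real form.

```typescript
type FormValues = Record<string, string>;

// Hypothetical dependency map: when the key changes, the listed fields reset.
const clearsOnChange: Record<string, string[]> = {
  country: ["state", "postalCode"],
  state: ["postalCode"],
};

// Returns the next form values after one field changes.
function applyFieldChange(values: FormValues, field: string, value: string): FormValues {
  // Re-selecting the same value should not wipe dependent fields.
  if (values[field] === value) return values;
  const next: FormValues = { ...values, [field]: value };
  for (const dependent of clearsOnChange[field] ?? []) {
    next[dependent] = "";
  }
  return next;
}

const before = { country: "US", state: "CA", postalCode: "94016" };
console.log(applyFieldChange(before, "country", "DE"));
```

Keeping the rule in data rather than scattered across change handlers makes it easy for a person to review, and leaves the generated field components free of cross-field knowledge.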
2. Domain-Specific Constraints
Our components needed to work in a Deloitte environment:
- Accessibility standards (WCAG 2.1 AA, not just WCAG 2.0)
- Enterprise theming (10 different color themes)
- International support (RTL, translated error messages)
- Performance under slow networks
Claude didn't know these constraints.
Fix: I added them to the prompt:
This component must:
- Support RTL layouts (dir="rtl")
- Work with 10 color themes (light, dark, high-contrast, etc.)
- Load with <50KB of JavaScript
- Support keyboard navigation and screen readers
- Handle translations from i18n library
Generate code that meets these constraints.
With constraints specified → Claude nailed it.
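As one concrete example of what the RTL constraint means in code: Tailwind CSS 3.3 and later ships logical spacing utilities (`ps-`, `pe-`, `ms-`, `me-`) that flip automatically under dir="rtl". The sketch below is a hypothetical helper, not part of the library described here. It rewrites physical left/right spacing classes to their logical equivalents and does not try to handle arbitrary values that contain colons.

```typescript
// Physical prefix -> logical prefix (start/end instead of left/right).
const logicalPrefix: Record<string, string> = { pl: "ps", pr: "pe", ml: "ms", mr: "me" };

// Rewrites e.g. "pl-4 md:mr-2" to "ps-4 md:me-2"; other classes pass through.
function toLogicalClasses(className: string): string {
  return className
    .split(/\s+/)
    .filter(Boolean)
    .map((cls) => {
      // Keep variant prefixes such as "hover:" or "md:" untouched.
      const lastColon = cls.lastIndexOf(":");
      const variantPrefix = cls.slice(0, lastColon + 1);
      const utility = cls.slice(lastColon + 1);
      const match = /^(-?)(pl|pr|ml|mr)-(.+)$/.exec(utility);
      if (!match) return cls;
      const sign = match[1] ?? "";
      const side = match[2] ?? "";
      const value = match[3] ?? "";
      return `${variantPrefix}${sign}${logicalPrefix[side] ?? side}-${value}`;
    })
    .join(" ");
}

console.log(toLogicalClasses("pl-4 mr-2 md:pr-6 text-sm"));
```

A check like this can also run in review to flag generated components that still use physical spacing.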
3. Visual Polish
Claude can generate correct HTML + Tailwind.
But "should the button have 2px or 4px border-radius?" requires looking at the design.
Claude would sometimes guess wrong.
Fix: Figma MCP + visual review. I'd spot-check generated code against the design, adjust the 2-3 pixels, move on.
The AI Development Setup We Use Now
If you want to replicate this, here's the minimal stack:
Required:
- Claude API + Claude Code: $20/month for Pro
- GitHub Copilot: $10/month for an individual plan (Business and Enterprise seats are billed separately)
- Figma MCP: Free if you have Claude
- Shadcn MCP: Free if you use Shadcn
Recommended:
- Vercel for deployment: Free tier fine
- Storybook for documentation: Free, open-source
- GitHub for version control + PR review: Free
- Datadog or similar for monitoring: Monitor generated code's performance
The Workflow I Recommend:
1. Design phase (human)
↓
2. Brief spec (human-written)
↓
3. Claude Code generation (AI)
↓
4. Figma MCP validation (AI + human eye)
↓
5. PR review + testing (human)
↓
6. Copilot for integration & docs (AI)
↓
7. Ship
The Surprising Discovery: Less Refactoring Over Time
Here's what I didn't expect:
The more AI-generated code we integrated, the less refactoring we needed.
Month 1: "This AI code is OK but needs tweaks on every component."
Month 2: "Wait, this generated code has fewer bugs than our manual work."
Month 3: "The AI understands our patterns now. New components need almost no refactoring."
Why?
Because we:
- Iterated on our prompts (better prompts → better code)
- Set up constraints and standards upfront (AI learns our constraints)
- Reviewed thoroughly at first (caught edge cases, improved system)
- Built a library of patterns (AI saw our patterns, followed them)
By month 3, we could prompt Claude:
Build a DatePicker component using our established patterns.
Reference: Button (for styling), Input (for validation), Modal (for overlay).
Follow the same structure, types, tests, and stories.
Claude would generate code that was immediately production-ready.
Refactoring time: ~15 minutes (just a final review).
Human-written code at month 3 would still take 2-3 hours.
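Once prompts settle into a shape like this, they can be treated as templates. Here is a small sketch of that idea. `buildComponentPrompt` and its input type are hypothetical, and the wording simply mirrors the DatePicker prompt above.

```typescript
interface ComponentPromptInput {
  name: string;
  /** Existing components to imitate, and what to borrow from each. */
  references: { component: string; borrow: string }[];
  /** Optional hard requirements appended as a bullet list. */
  constraints?: string[];
}

// Assembles a generation prompt in the same shape as the DatePicker example.
function buildComponentPrompt(input: ComponentPromptInput): string {
  const referenceList = input.references
    .map((ref) => `${ref.component} (for ${ref.borrow})`)
    .join(", ");
  const lines = [
    `Build a ${input.name} component using our established patterns.`,
    `Reference: ${referenceList}.`,
    "Follow the same structure, types, tests, and stories.",
  ];
  if (input.constraints && input.constraints.length > 0) {
    lines.push("This component must:");
    for (const constraint of input.constraints) lines.push(`- ${constraint}`);
  }
  return lines.join("\n");
}

console.log(
  buildComponentPrompt({
    name: "DatePicker",
    references: [
      { component: "Button", borrow: "styling" },
      { component: "Input", borrow: "validation" },
      { component: "Modal", borrow: "overlay" },
    ],
  })
);
```

Keeping the template in the repo means prompt improvements get reviewed and versioned like any other change.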
The Human Side: Did This Replace Engineers?
This is the question everyone asks:
"Vignesh, aren't you automating engineers out of a job?"
Honest answer: No. We're automating boilerplate, not thinking.
What changed:
| Task | Before | After |
|------|--------|-------|
| Generate component shell | 1 hour human | 5 min AI + 5 min review |
| Write tests | 1.5 hours human | 15 min AI + 15 min review |
| Storybook docs | 30 min human | 5 min AI + 5 min review |
| Edge cases & business logic | 30 min human | 30 min human (same) |
| Design decisions | 1 hour human | 1 hour human (same) |
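Adding up the rows of the table above makes the shift clearer. This is plain arithmetic on the table's own figures; the variable names are just for illustration.

```typescript
interface TaskTime {
  task: string;
  beforeMin: number;
  afterMin: number; // AI time plus review time, in minutes
}

// Figures copied from the table, converted to minutes.
const taskTimes: TaskTime[] = [
  { task: "Generate component shell", beforeMin: 60, afterMin: 10 },
  { task: "Write tests", beforeMin: 90, afterMin: 30 },
  { task: "Storybook docs", beforeMin: 30, afterMin: 10 },
  { task: "Edge cases & business logic", beforeMin: 30, afterMin: 30 },
  { task: "Design decisions", beforeMin: 60, afterMin: 60 },
];

const totalBeforeMin = taskTimes.reduce((sum, t) => sum + t.beforeMin, 0); // 270
const totalAfterMin = taskTimes.reduce((sum, t) => sum + t.afterMin, 0); // 140
const savedPercent = Math.round((1 - totalAfterMin / totalBeforeMin) * 100); // 48

// The last two rows are pure judgment work and do not shrink.
const judgmentMin = 30 + 60; // 90 of the remaining 140 minutes

console.log({ totalBeforeMin, totalAfterMin, savedPercent, judgmentMin });
```

By these figures a component drops from 4.5 hours to about 2 hours 20 minutes, and roughly two thirds of what remains is the judgment work that never changed.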
AI handles the repetitive, testable, structured work.
Humans handle the judgment calls:
- "Should this form validate on blur or on submit?"
- "Is the error message tone right?"
- "Does this workflow make sense for our users?"
- "How should this component interact with the rest of the system?"
The reality: With AI, one engineer did the work of 2-3 engineers on boilerplate.
But we didn't fire anyone.
Instead:
- We shipped the design system 5 weeks faster
- The engineer had time for higher-leverage work (mentoring, architecture, complex features)
- We hired 1 more engineer who now focuses on features, not component scaffolding
Net: AI enabled us to scale without proportionally increasing costs.
The Numbers That Matter
After shipping the design system with AI:
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Design system components | 47 inconsistent | 8 consistent | 100% coverage |
| Time to build a new page | 6 hours | 2 hours | 67% faster |
| Onboarding time for new engineers | 3 weeks | 1 week | 67% faster |
| Component bugs reported per month | 8 | 1 | 87% fewer |
| Time to merge component PR | 3 days (4 review cycles) | 1 day (1 review cycle) | 3x faster |
| Test coverage | 60% | 95% | +35 points |
| Code consistency score (lint + design adherence) | 72% | 98% | +26 points |
The Cost:
- AI tools: $30/month per engineer
- Time investment: 3 weeks from 1 engineer + 0.3 FTE designer
- Total cost: ~$10K in time + $90 in software
The Benefit:
- Design system that 50+ engineers now use
- Onboarding new people 3x faster
- Shipping features faster
- Fewer bugs in production
- Better code consistency
- A foundation to build on for the next 2 years
The Lesson That Surprised Me Most
I went into this thinking: "AI will generate code, we'll use 60%, refactor the rest."
What actually happened:
AI forces you to think harder.
When you're writing code by hand, you can be sloppy:
- "I'll handle this edge case later"
- "This comment is good enough"
- "I'll test this manually"
When you're prompting AI:
- You have to be explicit: "Here's what we need"
- You have to be thoughtful: "What could go wrong?"
- You have to be thorough: "These are all the states"
The constraint of prompting AI made us better at design system thinking.
We had to articulate what a "good button" is.
We had to document all the variants.
We had to think about accessibility upfront, not as an afterthought.
AI didn't replace thinking. It forced better thinking.
What We're Doing Next
Now that the design system is solid:
1. AI-Generated Tests at Scale
We're using Claude Code to generate test scenarios:
Given this component's API, generate 50 test cases that cover:
- All prop combinations
- All edge cases
- All accessibility requirements
- All visual states
- All error conditions
Claude generates 50 test cases in 5 minutes.
We're finding bugs in components that have been "done" for months.
2. Automatic Type Generation from Figma
Instead of manually keeping Figma colors synced with TypeScript types:
// Generated from Figma automatically
export const colors = {
primary: '#0066CC',
secondary: '#6C757D',
// ... 50 more colors
} as const;
export type ColorVariant = keyof typeof colors;
We're exploring tooling that watches Figma, generates types, commits to the repo.
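The generation half of that idea is simple enough to sketch. The function below takes a plain name-to-hex map and emits the module source shown above. How the tokens get out of Figma is deliberately left out, and the input shape and function name are assumptions.

```typescript
// Hypothetical input: token name -> 6-digit hex color, however it was exported.
type ColorTokens = Record<string, string>;

// Emits TypeScript source for a colors module plus its derived union type.
function generateColorModule(tokens: ColorTokens): string {
  const hexPattern = /^#[0-9A-Fa-f]{6}$/;
  // Sort by name so regenerated files produce stable, reviewable diffs.
  const entries = Object.entries(tokens).sort(([a], [b]) => a.localeCompare(b));
  const body = entries.map(([name, value]) => {
    if (!hexPattern.test(value)) {
      throw new Error(`Invalid hex color for "${name}": ${value}`);
    }
    // Keys are quoted so names like "brand-blue" stay valid.
    return `  ${JSON.stringify(name)}: '${value.toUpperCase()}',`;
  });
  return [
    "// Generated file. Do not edit by hand.",
    "export const colors = {",
    ...body,
    "} as const;",
    "export type ColorVariant = keyof typeof colors;",
    "",
  ].join("\n");
}

console.log(generateColorModule({ secondary: "#6c757d", primary: "#0066cc" }));
```

Validating each value before writing the file means a bad export fails the generation step instead of reaching the components.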
3. AI-Powered Component Suggestions
Using Claude to analyze:
- Figma designs being created
- Component library capabilities
- Upcoming features
Claude suggests: "This new form looks like you could use our Button, Input, and Select components. Here's a composition."
Designers see the suggestion and spend less time building components by hand.
The Honest Assessment: Is This the Future?
In 2026, yes.
But not in the way people imagine.
AI won't replace designers or engineers.
AI will replace boilerplate, validation work, and pattern repetition.
The engineers who thrive:
- Understand why patterns exist
- Can prompt AI effectively (this is a skill)
- Can judge AI output critically (not trust, verify)
- Can compose AI work into systems
- Can think about design and architecture
The engineers who struggle:
- Treat AI as a robot that does coding
- Don't learn the patterns behind what AI generates
- Copy-paste AI code without understanding it
- Don't invest in prompt engineering skills
- Can't tell good code from okay code
My prediction: By 2027, every company will use AI to generate boilerplate. The competitive advantage will be: who uses it best?
That's us.
Takeaways for Your Team
If you want to replicate this:
- Start small. Don't AI-generate your entire codebase. Start with design systems, component libraries, and heavily-patterned code.
- Invest in prompting. Good prompts > good AI model. Spend time learning how to ask questions effectively.
- Use MCPs. Figma MCP, Shadcn MCP, your own MCP for your company patterns. Context is king.
- Review thoroughly at first. The first 3-5 components need heavy review. By the 10th, you'll know what to look for.
- Pair AI with humans. AI for boilerplate, humans for judgment. Neither alone is as good as both together.
- Measure what matters. Not "lines of code generated" (meaningless). Measure: time to productivity, bug rates, code consistency, developer satisfaction.
- Keep humans in the loop. AI suggests, humans decide. Every time.
Final Thought
Six months ago, if someone told me "we'll ship a design system with 60% AI-generated code that has fewer bugs than human code," I would have said: "You're smoking something."
Today, I'm convinced this is just the beginning.
Not because AI is sentient or magical.
But because we stopped thinking of AI as "a robot to replace humans."
And started thinking of it as "a tool to amplify human judgment."
When you focus on the right problems, ask the right questions, and let AI handle what it's good at (patterns, boilerplate, validation):
You build better systems faster.
You ship with more confidence.
You have time for the thinking that actually matters.
That's the future I'm excited about.
Tools We Use (If You Want to Replicate This)
- Claude API + Code: https://claude.ai
- GitHub Copilot: https://github.com/features/copilot
- Figma MCP: Available through Claude
- Shadcn MCP: Available through Claude
- Storybook: https://storybook.js.org
- Next.js: https://nextjs.org (what we use for the framework)
- Tailwind CSS: https://tailwindcss.com
- Zod: https://zod.dev (for validation)
If you've built something with AI and want to share your story, or if you're stuck with prompt engineering, reach out on Twitter or LinkedIn. I'm curious how other teams are using AI in their workflows.