4 Engineers, 12 Microservices, and an AI Teammate: Scaling Our OMS Team with Claude
There's a specific kind of dread that settles in on a Sunday evening when your on-call week is about to start. For our OMS team (one PM and three engineers), that feeling was amplified by the twelve microservices we shepherded. We were perpetually drowning in context-switching, boilerplate code, and a growing backlog. We weren't building. We were bailing water.
The problem: death by context switches
Our microservice architecture, born from the best of intentions, had matured into a constellation of siloed knowledge. Each service was its own kingdom of forgotten lore and undocumented tribal wisdom. Adding a new return reason felt like orchestrating diplomacy between twelve mutually suspicious nations.
The pain points:
- Context switching: Tracing a single user journey meant jumping between multiple codebases, deployment pipelines, and dashboards. Productivity was theoretical.
- Boilerplate treadmill: Writing the same NestJS modules, DTOs, repository patterns, and Jest tests across services. We were human copy-paste machines.
- Documentation as afterthought: Hard to update Confluence when production is on fire. Onboarding was a nightmare. Incident response was an archaeological dig.
- Cursory code reviews: Everyone was stretched thin. Small inconsistencies slipped through and greeted us with PagerDuty alerts at 3 AM.
We couldn't hire 12 more engineers. We couldn't delete 8 microservices (though the thought crossed my mind during certain 3 AM calls). We needed a force multiplier.
An AI-augmented workflow
We didn't throw a chatbot at the problem. We integrated Claude into the development lifecycle with a specific goal: eliminate the drudgery and free our brains for the complex architectural problems that actually need human judgment.
Universal intern: context retrieval
The first step was feeding Claude our entire repository, architectural decision records, API docs, and sanitized Slack conversations from #oms-dev. Now, instead of interrupting the one engineer who remembers why a specific gRPC endpoint was designed that way six years ago, we just ask.
Typical query:
"What's the full data flow for a return request, from returns-api to inventory-service? Include key database tables and event topics."
Response:
- returns-api receives POST to /returns.
- Validates payload via CreateReturnDto.
- ReturnsService creates entry in returns table with PENDING status.
- Publishes return.created event to oms-events Kafka topic.
- logistics-service consumes event, arranges pickup, updates to IN_TRANSIT.
- inventory-service consumes return.received event, updates product_inventory, triggers refund.
Key files: returns.service.ts in returns-api and InventoryEventHandler.ts in inventory-service.
Onboarding went from a 3-week headache to a 3-day guided tour.
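For readers who think in types, the return flow from that response can be written down as data. This is only a sketch: the service, event, and topic names come from the example response, while the FlowStep shape and the helper functions are hypothetical. Where the response does not say who publishes an event or on which topic, the sketch records null rather than guessing.

```typescript
// Hypothetical model of the return flow; only the string values are from the post.
type ServiceName = "returns-api" | "logistics-service" | "inventory-service";
type EventName = "return.created" | "return.received";

interface FlowStep {
  event: EventName;
  publisher: ServiceName | null; // null: the example response does not say
  topic: string | null;          // null: the example response does not say
  consumer: ServiceName;
  effect: string;
}

const returnFlow: FlowStep[] = [
  {
    event: "return.created",
    publisher: "returns-api",
    topic: "oms-events",
    consumer: "logistics-service",
    effect: "arranges pickup, updates status to IN_TRANSIT",
  },
  {
    event: "return.received",
    publisher: null,
    topic: null,
    consumer: "inventory-service",
    effect: "updates product_inventory, triggers refund",
  },
];

// Which services react to a given event?
function consumersOf(event: EventName): ServiceName[] {
  return returnFlow
    .filter((step) => step.event === event)
    .map((step) => step.consumer);
}

// One human-readable line per hop, handy for onboarding notes.
function describeFlow(): string[] {
  return returnFlow.map(
    (step) =>
      `${step.publisher ?? "(publisher not stated)"} --${step.event}--> ${step.consumer}: ${step.effect}`
  );
}
```

Keeping the gaps explicit is the point: a new hire can see at a glance which parts of the flow still need a follow-up question.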
Boilerplate buster: code generation
This is where we saw the biggest productivity gains. We created standardized prompts for common tasks in our NestJS/TypeScript environment.
Prompt:
Create a NestJS DTO for 'UpdateShipmentStatus' using class-validator.
- 'shipmentId': string, UUID format, required.
- 'status': enum of 'IN_TRANSIT', 'DELIVERED', 'DELAYED', required.
- 'notes': string, optional, max 500 characters.
- 'eventTimestamp': ISO8601 date string, required.
Generated code:
    import { IsString, IsUUID, IsEnum, IsOptional, MaxLength, IsISO8601 } from 'class-validator';
    export enum ShipmentStatus {
      IN_TRANSIT = 'IN_TRANSIT',
      DELIVERED = 'DELIVERED',
      DELAYED = 'DELAYED',
    }
    export class UpdateShipmentStatusDto {
      @IsUUID()
      @IsString()
      readonly shipmentId: string;
      @IsEnum(ShipmentStatus)
      readonly status: ShipmentStatus;
      @IsOptional()
      @IsString()
      @MaxLength(500)
      readonly notes?: string;
      @IsISO8601()
      readonly eventTimestamp: string;
    }
We extended this to generate entire modules, services with CRUD operations, and Jest test skeletons. The time saved from boilerplate goes to solving actual business problems.
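One way to keep prompts like the one above standardized is to render them from a small spec instead of retyping them. The sketch below is an illustration, not our actual tooling: FieldSpec and buildDtoPrompt are hypothetical names, and the wording it produces only approximates the hand-written prompt (for example, it always puts "required" or "optional" last).

```typescript
// Hypothetical prompt template for DTO generation requests.
interface FieldSpec {
  name: string;
  description: string; // free text, e.g. "string, UUID format"
  required: boolean;
}

// Renders a header line plus one bullet per field.
function buildDtoPrompt(dtoName: string, fields: FieldSpec[]): string {
  const header = `Create a NestJS DTO for '${dtoName}' using class-validator.`;
  const bullets = fields.map(
    (f) => `- '${f.name}': ${f.description}, ${f.required ? "required" : "optional"}.`
  );
  return [header, ...bullets].join("\n");
}

// Usage: rebuild the UpdateShipmentStatus request from a spec.
const prompt = buildDtoPrompt("UpdateShipmentStatus", [
  { name: "shipmentId", description: "string, UUID format", required: true },
  { name: "status", description: "enum of 'IN_TRANSIT', 'DELIVERED', 'DELAYED'", required: true },
  { name: "notes", description: "string, max 500 characters", required: false },
  { name: "eventTimestamp", description: "ISO8601 date string", required: true },
]);
```

Because the spec is plain data, the same list of fields could also drive a matching test-skeleton prompt, which keeps the DTO and its tests from drifting apart.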
Quality guardian: automated first-pass reviews
We integrated Claude into CI/CD via a custom GitHub Action. On every PR, it performs a first-pass review. Not replacing human oversight for logic and architecture, but catching the common stuff that slows us down.
Example PR comment from claude-bot:
Suggestion (Performance): In ProductService.ts line 84, this Array.prototype.find inside a loop is O(n²). Convert productsToFind to a Map before the loop for O(1) lookup.
Suggestion (Clarity): Variable data on line 112 is vague. customerOrderHistory would improve readability.
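To make the performance suggestion concrete, here is a minimal before-and-after sketch. The actual code in ProductService.ts is not shown in this post, so the Product and OrderLine types and both function names are hypothetical; only the productsToFind name and the find-to-Map change come from the review comment. The Map version assumes product ids are unique.

```typescript
// Hypothetical types; the real ProductService.ts is not reproduced here.
interface Product { id: string; name: string; }
interface OrderLine { productId: string; quantity: number; }

// Before: Array.prototype.find scans productsToFind once per order line,
// so the loop is O(lines * products).
function labelLinesWithFind(lines: OrderLine[], productsToFind: Product[]): string[] {
  const labels: string[] = [];
  for (const line of lines) {
    const product = productsToFind.find((p) => p.id === line.productId);
    labels.push(`${line.quantity} x ${product ? product.name : "unknown"}`);
  }
  return labels;
}

// After: build the Map once, then each lookup is O(1) on average.
function labelLinesWithMap(lines: OrderLine[], productsToFind: Product[]): string[] {
  const productsById = new Map<string, Product>(
    productsToFind.map((p): [string, Product] => [p.id, p])
  );
  const labels: string[] = [];
  for (const line of lines) {
    const product = productsById.get(line.productId);
    labels.push(`${line.quantity} x ${product ? product.name : "unknown"}`);
  }
  return labels;
}

// Usage: both versions produce the same labels for the same input.
const sampleProducts: Product[] = [
  { id: "p1", name: "Mug" },
  { id: "p2", name: "Lamp" },
];
const sampleLines: OrderLine[] = [
  { productId: "p2", quantity: 1 },
  { productId: "p9", quantity: 3 },
];
const withFind = labelLinesWithFind(sampleLines, sampleProducts);
const withMap = labelLinesWithMap(sampleLines, sampleProducts);
```

One behavioral caveat worth checking in a human review: with duplicate ids, find returns the first match while a Map keeps the last one inserted, so the refactor is only a drop-in replacement when ids are unique.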
The results
Did we become a 16-person team? No. Headcount is still four. But we started shipping like a team three times our size.
- Cycle time from ticket to production: cut in half.
- Production bugs: down by over 60%. The AI catches simple mistakes; humans focus on complex logic.
- Developer morale: Engineers spend less time on toil and more on creative problem-solving. The Sunday dread is still there, but it's lighter.
Your next hire might be an API call
AI won't fix bad architecture or a toxic culture. It's a tool, and its effectiveness depends on who's using it.
But if you treat it as a tireless pair programmer, it can free up your most valuable resource: your team's collective brainpower. It lets your best people do their best work.
Stop bailing water. Start building a better boat.