Stop Writing God Prompts: Building a Multi-Agent Pipeline

Building a monolithic application is like handing a single chef a warehouse of ingredients and demanding a seven-course tasting menu in ten minutes. They will panic and serve you a blender full of mush. That is exactly how most of us treat Large Language Models right now. We cram extensive market research into one bloated prompt along with financial projections and risk analysis. Then we fire it off to an API and wonder why the output reads like a high school book report.

I burned through fifty dollars of API credits last month trying to generate a cohesive business plan with a single query. The model kept forgetting its own assumptions. If the market size was $10B in section one, it magically became $50B by the finance section. In my experience, output quality drops off sharply when you ask an LLM to hold too many distinct logical frameworks in one prompt. A single agent trying to act as a researcher and financial analyst while also being a skeptical reviewer is an architectural nightmare.

The Microservice Mindset for AI

The fix is treating AI like microservices. Instead of one massive prompt, I wired up five distinct agents.

The Market Researcher pulls live web data.
The Competitor Analyst cross-references the research against known players.
The Strategist builds the revenue models.
The Quant calculates TAM/SAM/SOM and a 3-year projection.
The Devil's Advocate looks at the previous four outputs and aggressively attacks the weak points.

This chain means Agent 2 does not have to guess the market size. It receives Agent 1's hard data as read-only context. It is just state management for words.
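
To make the read-only idea concrete, here is a minimal sketch, assuming a hypothetical PipelineState type and withResult helper (neither is part of the pipeline code later in this post). Each agent gets a frozen snapshot and hands back a new one, so an earlier agent's numbers cannot be silently replaced.

```typescript
// Hypothetical shape: each field is written once by the agent that owns it.
interface PipelineState {
  readonly marketData?: string;
  readonly competitors?: string;
}

// Returns a new frozen state instead of mutating the old one, and refuses
// to overwrite a field that an earlier agent has already filled in.
function withResult(
  state: PipelineState,
  key: keyof PipelineState,
  value: string
): PipelineState {
  if (state[key] !== undefined) {
    throw new Error(`"${key}" was already set by an earlier agent`);
  }
  return Object.freeze({ ...state, [key]: value });
}

// Placeholder strings stand in for real agent output.
const afterResearch = withResult({}, 'marketData', 'placeholder research output');
const afterAnalysis = withResult(afterResearch, 'competitors', 'placeholder competitor list');
```

Calling withResult(afterAnalysis, 'marketData', ...) throws, which is the point: the market size from Agent 1 stays whatever Agent 1 said it was.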

I also realized different models are good at different things. In my experience, Claude is exceptional at natural phrasing and nuance. Gemini handles structured data without complaining. GPT-4o is a brute-force reasoning engine.

Multi-Model Routing in TypeScript

Let me show you how the routing and fallback mechanism actually looks in TypeScript. If a provider call throws, we gracefully degrade to the next provider in the list.

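type ModelVendor = 'anthropic' | 'openai' | 'google';

interface RouteConfig {
  primary: ModelVendor;
  fallback: ModelVendor[];
  temperature: number;
}

// Provider-specific client wrapper, not shown in this post.
// The signature is assumed from how it is called below.
declare function callLLM(vendor: ModelVendor, prompt: string, temperature: number): Promise<string>;

const taskRouter: Record<string, RouteConfig> = {
  research: { primary: 'google', fallback: ['openai'], temperature: 0.2 },
  strategy: { primary: 'anthropic', fallback: ['openai', 'google'], temperature: 0.7 },
  finance: { primary: 'openai', fallback: ['google'], temperature: 0.1 }
};

async function executeWithFallback(task: string, prompt: string): Promise<string> {
  const config = taskRouter[task];
  if (!config) {
    throw new Error(`No route configured for task "${task}"`);
  }
  // Try the primary vendor first, then each fallback in order.
  const vendors = [config.primary, ...config.fallback];

  for (const vendor of vendors) {
    try {
      return await callLLM(vendor, prompt, config.temperature);
    } catch (error) {
      // Log the cause so a failing vendor is visible, then try the next one.
      console.warn(`Vendor ${vendor} failed. Falling back...`, error);
    }
  }
  throw new Error("All models failed. Time to panic.");
}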
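
One gap worth flagging: the loop above only moves on when a call throws. A request that hangs never throws, so it never triggers the fallback. Below is a minimal sketch of a timeout wrapper, assuming a hypothetical helper named withTimeout; it is not part of the original router.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping the callLLM call inside the try block with withTimeout would turn a hung provider into an ordinary rejection that the existing catch already handles. The right time limit depends on the model and the prompt size, so treat any number you pick as a tuning knob.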

That handles reliability. Next is the pipeline itself. The output of one agent must flow into the next. I built this around a strict context object so the agents do not overwrite each other's work.

interface AgentContext {
  marketData?: string;
  competitors?: string;
  strategy?: string;
}

// Abbreviated: the Strategist and Quant steps are omitted here, so
// context.strategy is never filled in by this version.
async function runAgentPipeline(topic: string) {
  const context: AgentContext = {};

  // Agent 1: Market Researcher
  context.marketData = await executeWithFallback(
    'research',
    `Find market sizing for ${topic}`
  );

  // Agent 2: Competitor Analyst, fed Agent 1's output
  context.competitors = await executeWithFallback(
    'strategy',
    `Given this market data: ${context.marketData}, identify 3 competitors.`
  );

  // Devil's Advocate: sees everything gathered so far
  const critique = await executeWithFallback(
    'strategy',
    `Review this entire context and tear it apart: ${JSON.stringify(context)}`
  );

  return { context, critique };
}
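
The Quant step from the agent list is not in the pipeline above, but the arithmetic it is asked for is easy to write down, which is also handy for sanity-checking whatever numbers the agent returns. This is a rough sketch: sizeMarket and projectRevenue are hypothetical names, and every input value is made up for illustration.

```typescript
interface MarketSizing {
  tam: number; // total addressable market, USD
  sam: number; // serviceable addressable market, USD
  som: number; // serviceable obtainable market, USD
}

// Percentages are whole numbers here (20 means 20%).
function sizeMarket(tam: number, serviceablePct: number, obtainablePct: number): MarketSizing {
  const sam = (tam * serviceablePct) / 100;
  const som = (sam * obtainablePct) / 100;
  return { tam, sam, som };
}

// Compounds a starting revenue figure at a fixed annual growth rate.
function projectRevenue(startingRevenue: number, annualGrowthPct: number, years: number): number[] {
  const projection: number[] = [];
  let revenue = startingRevenue;
  for (let year = 1; year <= years; year++) {
    revenue *= 1 + annualGrowthPct / 100;
    projection.push(Math.round(revenue));
  }
  return projection;
}

// Hypothetical inputs: a $10B TAM, 20% serviceable, 5% obtainable,
// then $1M starting revenue growing 50% per year for three years.
const sizing = sizeMarket(10_000_000_000, 20, 5);
const threeYear = projectRevenue(1_000_000, 50, 3);
```

If the finance section later quotes a TAM that does not match sizing.tam, you have caught the $10B-turned-$50B problem before it reaches the reader.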

Real-time Rendering with SSE

The final piece is rendering this without making the user stare at a loading spinner for forty seconds while five agents chat with each other. A Next.js 15 App Router route handler can return a streamed response, which works well for Server-Sent Events (SSE). You pipe the stream right to the client as the agents hit their milestones.

import { NextResponse } from 'next/server';

export async function POST(req: Request) {
  const { topic } = await req.json();
  const stream = new TransformStream();
  const writer = stream.writable.getWriter();

  // Deliberately not awaited: the response has to be returned right away so
  // the client can start reading while the pipeline is still running.
  startPipelineAndStream(topic, writer).catch((error) => {
    console.error('Pipeline stream failed:', error);
  });

  return new NextResponse(stream.readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}

async function startPipelineAndStream(topic: string, writer: WritableStreamDefaultWriter) {
  const encoder = new TextEncoder();
  // One SSE event per call; `msg` must not contain newlines.
  const send = (msg: string) => writer.write(encoder.encode(`data: ${msg}\n\n`));

  try {
    await send("Agent 1: Market sizing started...");
    // Pipeline logic runs here...
    await send("Agent 1: Complete. Passing to Agent 2.");
  } finally {
    // Always close the stream so the client is not left waiting.
    await writer.close();
  }
}
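
On the client side, the browser's EventSource only issues GET requests, so it cannot call the POST handler above directly. Here is a minimal sketch of reading the stream with fetch instead. The names parseSseBuffer and streamPlan are hypothetical, and the parser only handles the `data:` lines this handler emits.

```typescript
// Splits buffered SSE text into complete `data:` messages plus any trailing
// partial frame, which the caller keeps for the next chunk.
function parseSseBuffer(buffer: string): { messages: string[]; rest: string } {
  const frames = buffer.split('\n\n');
  const rest = frames.pop() ?? '';
  const messages = frames
    .map((frame) =>
      frame
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, ''))
        .join('\n')
    )
    .filter((msg) => msg.length > 0);
  return { messages, rest };
}

// Hypothetical client: POSTs a topic to `url` and reports each milestone.
async function streamPlan(
  url: string,
  topic: string,
  onMessage: (msg: string) => void
): Promise<void> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ topic }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A network chunk can end mid-frame, so buffer until "\n\n" arrives.
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    parsed.messages.forEach(onMessage);
    buffer = parsed.rest;
  }
}
```

Each message passed to onMessage is one agent milestone, so the UI can swap the spinner for a running status line.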

Conclusion

Stop treating language models like omniscient oracles. They are functions. String them together. You must also manage their state and enforce boundaries. It takes more setup, but you actually get to sleep at night knowing your application is not randomly hallucinating numbers.