Stop Feeding Your AI Garbage: Context-Aware Search for Agents

The $500 Regex Mistake

I woke up one morning to an OpenAI API bill that looked more like a mortgage payment. My custom coding agent had spent the night trying to refactor a legacy module. Instead of surgically modifying the three relevant functions, it had cat'd the entire 50,000-line codebase into the context window, hallucinated a few dependencies, and timed out.

The problem was never the model's intelligence. It was the retrieval strategy. We treat AI agents like senior engineers but hand them tools from the 1970s. We hand them grep.

grep is great if you are a human with a terminal and a vague memory of a variable name. It falls apart when an LLM tries to build a mental map of a system. grep returns lines of text. Agents need symbols, definitions, and relationships. When an agent asks "Where is processAuth defined?" and your tool returns 500 lines of logs, comments, and string literals, you are burning tokens on noise.
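
To make the noise concrete, here is a minimal sketch contrasting a grep-style substring match with a definition-only filter. The sample lines are invented for illustration, and the regex is only a crude stand-in for what a real parser would do; none of this is cgrep's actual implementation.

```typescript
// Hypothetical sample lines; not taken from a real codebase.
const sampleLines: string[] = [
  'function processAuth(token: string): boolean {',
  '  // processAuth is called on every request',
  '  logger.info("processAuth started");',
  'const ok = processAuth(req.token);',
  'export const PROCESS_AUTH_DOC = "see processAuth";',
];

// grep-style: every line that contains the text is a match.
function textMatches(lines: string[], symbol: string): string[] {
  return lines.filter((line) => line.includes(symbol));
}

// Definition-only: keep lines that look like a function declaration of the symbol.
// A regex is an assumption made for brevity; an AST query would be far more robust.
function definitionMatches(lines: string[], symbol: string): string[] {
  const escaped = symbol.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const definition = new RegExp(`^\\s*(export\\s+)?(async\\s+)?function\\s+${escaped}\\b`);
  return lines.filter((line) => definition.test(line));
}

const noisy = textMatches(sampleLines, 'processAuth');
const precise = definitionMatches(sampleLines, 'processAuth');
console.log(`text matches: ${noisy.length}, definition matches: ${precise.length}`);
```

In this toy sample the substring search returns every line, including the comment, the log call, and the string literal, while the definition filter returns only the declaration.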

We need to stop treating code like text strings and start treating it like a graph. That is where tools like cgrep (Context-aware Grep) enter the picture.

The System Problem: Token Efficiency vs. Context Accuracy

Building effective RAG (Retrieval-Augmented Generation) for code is harder than for text. With a PDF, semantic similarity works fine. With code, "similar" text is often irrelevant. Fifty different files importing the same library all match the same query.

The bottleneck is the "Context Payload":

Low Precision: Standard search floods the context window with irrelevant matches.
High Latency: The agent reads garbage, thinks, requests more files, reads more garbage.
Cost: Every irrelevant line is a micro-transaction draining your budget.

cgrep addresses this by combining two techniques: BM25 (a widely used ranking function for text search) and Tree-sitter (a parser that produces a syntax tree for each source file). It does not just find the string "User"; it finds the class definition of User, or the callers of User.login.
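
For readers who have not met BM25, here is a minimal sketch of the scoring formula over pre-tokenized documents. It covers the text-ranking half only; how cgrep actually combines it with syntax information is not shown here, and the tiny corpus is made up for illustration. The values k1 = 1.2 and b = 0.75 are common defaults, not settings taken from cgrep.

```typescript
// Minimal BM25 scorer. Each document is an array of tokens.
function bm25Scores(docs: string[][], query: string[], k1 = 1.2, b = 0.75): number[] {
  const n = docs.length;
  if (n === 0) return [];
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / n;

  return docs.map((doc) => {
    let score = 0;
    for (const term of query) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const df = docs.filter((d) => d.includes(term)).length; // docs containing the term
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + (b * doc.length) / avgLen;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

// Hypothetical corpus: a class definition, an import line, and an unrelated line.
const corpus: string[][] = [
  ['class', 'User', 'login'],
  ['import', 'User'],
  ['logger', 'info'],
];
const scores = bm25Scores(corpus, ['User']);
console.log(scores);
```

In this toy corpus the shorter import line scores higher than the class definition, because BM25 rewards short documents and knows nothing about syntax. That is the gap the syntax tree is meant to close.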

The Two-Stage Flow: Locate, Then Expand

This is the system pattern that caught my attention. Instead of a naive "search and dump," cgrep encourages a lazy-loading architecture for agents:

Locate: The agent asks for a symbol. The tool returns a lightweight list of candidates (file paths + minimal context).
Expand: The agent selects the specific candidate it needs, and the tool extracts the full AST node (the entire function or class body).

This can reduce token consumption by over 90% in large codebases like PyTorch. It is the difference between reading the table of contents and reading the whole library.
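
The arithmetic behind that kind of saving can be sketched as follows. The four-characters-per-token ratio is a rough rule of thumb, not a real tokenizer, and the names LocateCandidate and compareStrategies are hypothetical. The inputs below are synthetic, so the result illustrates the calculation rather than confirming any measured figure.

```typescript
// Hypothetical shape of one "locate" result.
interface LocateCandidate {
  file: string;
  line: number;
  signature: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function compareStrategies(
  fileContents: string[],        // what a naive "dump everything" would send
  candidates: LocateCandidate[], // what stage 1 (locate) sends
  chosenBody: string             // what stage 2 (expand) sends
): { dumpTokens: number; twoStageTokens: number; savedFraction: number } {
  const dumpTokens = fileContents.reduce((sum, f) => sum + estimateTokens(f), 0);
  const twoStageTokens =
    estimateTokens(JSON.stringify(candidates)) + estimateTokens(chosenBody);
  const savedFraction = dumpTokens === 0 ? 0 : 1 - twoStageTokens / dumpTokens;
  return { dumpTokens, twoStageTokens, savedFraction };
}

// Synthetic inputs: two 4,000-character files versus one 400-character function body.
const comparison = compareStrategies(
  ['x'.repeat(4000), 'y'.repeat(4000)],
  [{ file: 'a.ts', line: 1, signature: 'f()' }],
  'z'.repeat(400)
);
console.log(comparison);
```

The saving scales with how much of the codebase the agent never has to read, so the gap widens as files get larger and the target symbol gets smaller.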

Implementation: Integrating Semantic Search

While cgrep is a binary (written in Rust or Go), we interface with it via TypeScript in our agent workflows. It supports the Model Context Protocol (MCP), which is quickly becoming the standard for connecting LLMs to local tools.

Let's build a TypeScript wrapper that follows this "Locate -> Expand" pattern, so you can integrate it into a custom agent loop without relying solely on cgrep's pre-built MCP server.

Prerequisites

Assume you have the cgrep binary installed and available in your PATH.

The TypeScript Agent Tool

Here is how I structure a tool definition that forces the agent to be frugal with tokens. We define two distinct tools, find_symbol and read_symbol, implemented below as the methods locateSymbol and expandSymbol.

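import { execFile } from 'child_process';
import { promisify } from 'util';

// execFile passes arguments as an array and never spawns a shell, so a query
// containing quotes or `$(...)` cannot inject commands the way exec would allow.
const execFileAsync = promisify(execFile);

// 1. Define the types for our agent's mental model
type SymbolKind = 'function' | 'class' | 'variable' | 'unknown';

interface SearchResult {
  file: string;
  line: number;
  kind: SymbolKind;
  signature: string;
}

interface SymbolContent {
  fullCode: string;
  dependencies: string[];
}

// Shape of one match in the JSON output, as assumed throughout this post.
interface RawMatch {
  filePath: string;
  startLine: number;
  astType: string;
  preview: string;
}

// Collapse any AST type we do not model into 'unknown'.
const toKind = (astType: string): SymbolKind =>
  astType === 'function' || astType === 'class' || astType === 'variable'
    ? astType
    : 'unknown';

class CodeNavigator {
  private projectRoot: string;

  constructor(root: string) {
    this.projectRoot = root;
  }

  /**
   * STAGE 1: LOCATE
   * Fast, cheap, low-token output.
   * Instead of cat-ing files, we find specific AST nodes.
   */
  async locateSymbol(query: string): Promise<SearchResult[]> {
    try {
      // Using cgrep's AST analysis to find definitions only.
      // Check the subcommand and flags against your installed version.
      const { stdout } = await execFileAsync(
        'cgrep',
        ['search', '--query', query, '--type', 'definition', '--format', 'json'],
        { cwd: this.projectRoot }
      );

      const raw: { matches?: RawMatch[] } = JSON.parse(stdout);

      // Map to a token-efficient format for the LLM
      return (raw.matches ?? []).map((m) => ({
        file: m.filePath,
        line: m.startLine,
        kind: toKind(m.astType),
        signature: m.preview
      }));
    } catch (error) {
      console.error("Search failed", error);
      return [];
    }
  }

  /**
   * STAGE 2: EXPAND
   * The agent decides this is the right symbol, so we pay the token cost.
   */
  async expandSymbol(filePath: string, line: number): Promise<SymbolContent | null> {
    try {
      const { stdout } = await execFileAsync(
        'cgrep',
        ['extract', '--file', filePath, '--line', String(line)],
        { cwd: this.projectRoot }
      );

      return {
        fullCode: stdout,
        dependencies: this.parseImports(stdout)
      };
    } catch (err) {
      // Log instead of failing silently so a broken extract call is visible.
      console.error("Extract failed", err);
      return null;
    }
  }

  // Placeholder: import extraction is left unimplemented in this sketch.
  private parseImports(code: string): string[] {
    return [];
  }
}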
 
// Usage Example within an Agent Loop
(async () => {
  const nav = new CodeNavigator('./legacy-monorepo');
 
  console.log("Agent: Looking for 'AuthService'...");
  const candidates = await nav.locateSymbol('AuthService');
 
  console.log("Candidates found:", candidates);
 
  if (candidates.length > 0) {
    const bestMatch = candidates[0];
    console.log(`Agent: Expanding ${bestMatch.file}...`);
 
    const content = await nav.expandSymbol(bestMatch.file, bestMatch.line);
    console.log("Full Context loaded (first 50 chars):", content?.fullCode.substring(0, 50));
  }
})();
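
The class above exposes methods, while the agent sees tools named find_symbol and read_symbol. One way to bridge the two is a small dispatcher, sketched below. The JSON-Schema-style descriptions, the SymbolNavigator interface, and dispatchToolCall are all hypothetical names for illustration; the exact envelope for tool definitions depends on your LLM provider.

```typescript
// Minimal interface matching the two navigator methods; any object with
// these methods (such as the CodeNavigator above) can be passed in.
interface SymbolNavigator {
  locateSymbol(query: string): Promise<unknown[]>;
  expandSymbol(filePath: string, line: number): Promise<unknown>;
}

// Tool descriptions handed to the model. Field names are an assumption.
const toolDefinitions = [
  {
    name: 'find_symbol',
    description: 'List candidate definitions for a symbol. Cheap; call this first.',
    parameters: {
      type: 'object',
      properties: { query: { type: 'string' } },
      required: ['query'],
    },
  },
  {
    name: 'read_symbol',
    description: 'Return the full body of one candidate from find_symbol.',
    parameters: {
      type: 'object',
      properties: { file: { type: 'string' }, line: { type: 'integer' } },
      required: ['file', 'line'],
    },
  },
];

// Route a model-issued tool call to the navigator and return a JSON string.
async function dispatchToolCall(
  nav: SymbolNavigator,
  toolName: string,
  args: Record<string, unknown>
): Promise<string> {
  switch (toolName) {
    case 'find_symbol': {
      const query = args['query'];
      if (typeof query !== 'string') {
        return JSON.stringify({ error: 'query must be a string' });
      }
      return JSON.stringify(await nav.locateSymbol(query));
    }
    case 'read_symbol': {
      const file = args['file'];
      const line = args['line'];
      if (typeof file !== 'string' || typeof line !== 'number' || !Number.isInteger(line)) {
        return JSON.stringify({ error: 'file must be a string and line an integer' });
      }
      return JSON.stringify(await nav.expandSymbol(file, line));
    }
    default:
      return JSON.stringify({ error: `unknown tool: ${toolName}` });
  }
}
```

Validating arguments before they reach the navigator matters because the model, not your code, chooses them. Returning errors as JSON rather than throwing lets the agent read the message and retry.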

Why This Matters

In a standard setup, the agent might call read_file('auth_service.ts'). If that file is 2,000 lines long, you just lost context space for the actual reasoning.

By forcing the locateSymbol step, we give the agent a menu. It sees:

AuthService (Class, /src/auth/service.ts)
AuthService (Interface, /src/common/types.ts)
MockAuthService (Class, /tests/mocks.ts)

The agent can look at that list and say, "I only care about the Interface right now." That one decision saves thousands of tokens.
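
A menu like that can be rendered with a few lines of code. The sketch below is one possible formatter; MenuEntry and renderMenu are hypothetical names, and the cap on the number of entries is an added assumption to keep the menu itself from growing without bound.

```typescript
// Hypothetical shape of one menu row.
interface MenuEntry {
  symbol: string;
  kind: string;
  file: string;
}

// Render candidates as a compact numbered list, truncating long result sets.
function renderMenu(entries: MenuEntry[], limit = 10): string {
  const shown = entries
    .slice(0, limit)
    .map((e, i) => `${i + 1}. ${e.symbol} (${e.kind}, ${e.file})`);
  const hidden = entries.length - shown.length;
  if (hidden > 0) {
    shown.push(`... ${hidden} more not shown`);
  }
  return shown.join('\n');
}

const menu = renderMenu([
  { symbol: 'AuthService', kind: 'Class', file: '/src/auth/service.ts' },
  { symbol: 'AuthService', kind: 'Interface', file: '/src/common/types.ts' },
  { symbol: 'MockAuthService', kind: 'Class', file: '/tests/mocks.ts' },
]);
console.log(menu);
```

Numbering the rows gives the agent a short, unambiguous handle ("expand candidate 2") instead of making it repeat a full path.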

The MCP Connection

Writing wrappers is educational, but standard protocols are better. cgrep supports MCP out of the box. If you use Claude Desktop or Cursor, you don't need the TypeScript wrapper above; you just point the config to the cgrep binary.

{
  "mcpServers": {
    "cgrep": {
      "command": "cgrep",
      "args": ["mcp", "serve"]
    }
  }
}

This exposes the tool to the model natively. The model understands it has a tool capable of semantic search. When you type "Refactor the login logic," the model queries the cgrep server for references to the login function, pulls the specific implementation, and ignores the 400 unit tests that also contain the word "login."

Conclusion

We are moving past the "wow, it writes code" phase and entering the "how do we make it work at scale" phase. Tools like cgrep represent a move from brute-force text processing to structure-aware navigation.

The AST is not new. Compilers have used it for decades. We are just finally pointing our AI colleagues at the same map that compilers already had. If you are building agents or just tired of your context window overflowing, stop using grep. Give your agent a map, not a flashlight.