Documents as Build Artifacts: Orchestrating Pandoc Templates with TypeScript
For the first three years of my solopreneur journey, I manually formatted every client invoice and architectural proposal in Google Docs, carefully nudging margins and fixing page breaks until 2 AM. I genuinely do not know how to feel about those lost hours. There is something profoundly embarrassing about designing a fault-tolerant, event-driven microservice architecture during the day, only to revert to dragging text boxes around a WYSIWYG editor like a digital caveman at night.
We treat our code as a sacred artifact: we lint it, track it in version control, and deploy it deterministically through continuous integration pipelines. Yet when it comes to the documents we actually produce, such as resumes and pitch decks, we abandon all our systems-thinking principles and accept brittle formatting that breaks because someone pressed the spacebar twice.
This cognitive dissonance bothered me for years. The truth is, documents should just be another build artifact.
I eventually stumbled into the world of document compilation and transitioned my entire workflow to plain Markdown. But a basic Markdown-to-PDF conversion usually looks terrible out of the box. The missing link between "raw text" and "professional client deliverable" is the presentation layer.
The Template Underworld
Pandoc is legendary in the engineering space. It is the Swiss Army knife of document conversion, capable of parsing dozens of markup formats into a single abstract syntax tree (AST) and rendering that tree into an even wider range of output formats.
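To make the AST idea concrete: running pandoc proposal.md -t json prints the parsed document as JSON, where every node carries a type tag t and, usually, contents c. The sketch below assumes only that JSON shape, handles just a handful of inline node types, and lists a document's top-level headings. The type and function names are illustrative, not part of any library.

```typescript
// Minimal types for Pandoc's JSON AST (as printed by `pandoc -t json`).
// Every node has a tag `t`; most nodes also carry contents `c`.
interface PandocNode {
  t: string;
  c?: unknown;
}

interface PandocDocument {
  "pandoc-api-version": number[];
  meta: Record<string, unknown>;
  blocks: PandocNode[];
}

// Flattens inline nodes to plain text. Only a few inline types are handled.
function inlineText(inlines: PandocNode[]): string {
  return inlines
    .map((node) => {
      switch (node.t) {
        case "Str":
          return node.c as string;
        case "Space":
        case "SoftBreak":
          return " ";
        case "Emph":
        case "Strong":
          return inlineText(node.c as PandocNode[]);
        case "Code":
          // Code contents are [attributes, text]
          return (node.c as [unknown, string])[1];
        default:
          return "";
      }
    })
    .join("");
}

// Header contents are [level, [id, classes, key-value pairs], inlines]
type HeaderContents = [number, [string, string[], [string, string][]], PandocNode[]];

// Lists top-level headings only (headings nested inside Divs are not visited)
function listHeadings(doc: PandocDocument): { level: number; id: string; text: string }[] {
  return doc.blocks
    .filter((block) => block.t === "Header")
    .map((block) => {
      const [level, [id], inlines] = block.c as HeaderContents;
      return { level, id, text: inlineText(inlines) };
    });
}
```

In practice the input would come from JSON.parse applied to Pandoc's JSON output; this is also the tree that Pandoc filters receive and modify.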
However, the raw output of a basic Pandoc command is aesthetically barren. You get the structure, but none of the soul. For a long time, I assumed creating beautiful PDFs required writing raw LaTeX, a language that feels like parsing XML with regex while someone yells at you in German.
Then I discovered the expansive ecosystem of community-driven Pandoc templates. These are pre-configured layouts that map Markdown frontmatter to complex typesetting rules. Suddenly, I was binding data to views.
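Pandoc templates are plain text files containing placeholders such as $title$, which Pandoc fills in from the document's metadata and from -V variables. The toy function below only illustrates that data-binding idea. It is not Pandoc's template engine, which also supports conditionals ($if(...)$), loops ($for(...)$), partials, and more.

```typescript
// Toy interpolation in the spirit of Pandoc's $variable$ syntax.
// "$$" produces a literal "$"; unknown variables render as an empty string.
function renderTemplate(template: string, vars: Record<string, string | number>): string {
  return template.replace(/\$\$|\$([A-Za-z0-9_-]+)\$/g, (match: string, name?: string) => {
    if (match === "$$") return "$";
    const value = name !== undefined ? vars[name] : undefined;
    return value === undefined ? "" : String(value);
  });
}
```

For example, rendering \title{$title$} with a title of "Acme Proposal" yields \title{Acme Proposal}, which is the same contract a real LaTeX or Typst template fulfils at a much larger scale.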
Some of the architectural lifesavers I rely on include:
- Eisvogel: A brilliant LaTeX template designed for computer science notes and technical reports. It beautifully handles code blocks, syntax highlighting, and deeply nested sections. I use it for all client-facing architecture proposals.
- CV Boilerplate & The Markdown Resume: Templates specifically designed to turn a bulleted list of your career trauma into a sleek, ATS-friendly PDF or HTML file.
- Tufte CSS: For HTML outputs, mapping Markdown to the elegant, margin-note-heavy typographic style of Edward Tufte.
- Invoice Boilerplates: Simple, automated systems that take YAML frontmatter containing line items and hourly rates and spit out a perfectly calculated, formatted PDF invoice.
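The calculation an invoice template depends on is simple enough to sketch. This assumes hypothetical line items shaped like { description, hours, rate } (field names invented for illustration). Working in integer cents keeps the printed line amounts and the printed total consistent with each other.

```typescript
// Hypothetical shape of one frontmatter line item (field names are illustrative)
interface LineItem {
  description: string;
  hours: number;
  rate: number; // hourly rate in major currency units, e.g. dollars
}

interface InvoiceLine {
  description: string;
  amountCents: number;
}

// Rounds each line to whole cents, then sums the rounded lines,
// so the printed lines always add up exactly to the printed total.
function computeInvoice(items: LineItem[]): { lines: InvoiceLine[]; totalCents: number } {
  const lines = items.map((item) => ({
    description: item.description,
    amountCents: Math.round(item.hours * item.rate * 100),
  }));
  const totalCents = lines.reduce((sum, line) => sum + line.amountCents, 0);
  return { lines, totalCents };
}

// Formats cents for display with the built-in Intl API
function formatMoney(cents: number, currency = "USD"): string {
  return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(cents / 100);
}
```

The formatted strings could then be handed to the template with -V, the same way the invoice branch of the pipeline later passes amount.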
The Rendering Engine Dilemma: LaTeX vs. Typst
Before we build our automation pipeline, we need to address the elephant in the typesetting room.
Historically, PDF generation via Pandoc relied heavily on LaTeX. LaTeX is incredibly powerful but notoriously fragile. I have spent entirely too much of my life debugging cryptic errors because a Unicode character like an arrow (→) was missing from the default font, causing the entire compiler to silently drop the character or crash.
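One cheap guard against the missing-glyph problem is a pre-flight scan that lists every non-ASCII character a document uses. You can then check those characters against the font you configure (with XeLaTeX, Pandoc's mainfont variable selects it). This is a sketch, and the function names are illustrative.

```typescript
// Counts each distinct non-ASCII character in a document.
// for...of iterates by code point, so characters outside the BMP (e.g. emoji) stay whole.
function nonAsciiChars(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const ch of text) {
    const codePoint = ch.codePointAt(0) ?? 0;
    if (codePoint > 0x7f) {
      counts.set(ch, (counts.get(ch) ?? 0) + 1);
    }
  }
  return counts;
}

// Formats a character as a Unicode code point label such as "U+2192"
function codePointLabel(ch: string): string {
  const codePoint = ch.codePointAt(0) ?? 0;
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}
```

Printing codePointLabel for each key gives you a short list to compare against the font's coverage before the compiler silently drops anything.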
Furthermore, controlling page breaks or table column widths in LaTeX via Markdown often requires injecting raw TeX commands into your text, which defeats the entire purpose of using a pristine, portable Markdown file.
Recently, the community has been shifting toward Typst, a modern typesetting system written in Rust that compiles dramatically faster than LaTeX and is significantly more intuitive to write. Current Pandoc releases can output Typst directly and use it as a PDF engine via --pdf-engine=typst. If you are building a document pipeline today, I highly recommend using Typst templates where possible, though the LaTeX ecosystem (like Eisvogel) still holds the crown for sheer variety.
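Switching a pipeline to Typst is mostly a matter of passing different arguments to Pandoc. Here is a minimal sketch that assumes a hypothetical template at templates/invoice.typ and a hypothetical client variable that the template would read.

```typescript
interface TypstBuildOptions {
  inputPath: string;
  outputPath: string;
  templatePath?: string; // e.g. a custom .typ template (hypothetical path)
  variables?: Record<string, string>; // passed to the template via -V
}

// Builds a Pandoc argument list that renders the PDF through Typst instead of LaTeX
function typstArgs(opts: TypstBuildOptions): string[] {
  const args = [opts.inputPath, "-o", opts.outputPath, "--pdf-engine=typst"];
  if (opts.templatePath) {
    args.push(`--template=${opts.templatePath}`);
  }
  for (const [key, value] of Object.entries(opts.variables ?? {})) {
    args.push("-V", `${key}=${value}`);
  }
  return args;
}
```

Because these arguments go straight to spawn without a shell, a value containing spaces, such as client=Acme Corp, needs no extra quoting.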
Architecting the Pipeline in TypeScript
Knowing these tools exist is only half the battle. As a developer, I refuse to remember complex CLI flags for every invoice I generate. We need an orchestrator.
Let's build a Node.js/TypeScript tool to act as our Document Build Pipeline. It reads our Markdown files, validates their YAML frontmatter with Zod, and then selects the correct Pandoc template before spawning the compilation process.
Step 1: Defining the Schema
First, we need strict contracts for our document metadata. If I am generating an invoice, the system must ensure I haven't forgotten the client's billing address. Zod is perfect for this.
```typescript
import { z } from "zod";
import matter from "gray-matter";
import { readFileSync } from "fs";
import { spawn } from "child_process";
import path from "path";

// Define our strictly typed frontmatter schemas
const BaseDocSchema = z.object({
  title: z.string(),
  // gray-matter's YAML parser turns unquoted dates into Date objects
  date: z.string().or(z.date()),
  author: z.string().optional(),
});

const InvoiceSchema = BaseDocSchema.extend({
  type: z.literal("invoice"),
  client_name: z.string(),
  client_address: z.string(), // Billing address is mandatory on invoices
  amount_due: z.number(),
  currency: z.string().default("USD"),
  due_date: z.string().or(z.date()), // Same unquoted-date caveat as `date`
});

const ProposalSchema = BaseDocSchema.extend({
  type: z.literal("proposal"),
  client_name: z.string(),
  version: z.string(), // Quote it in YAML (version: "1.0"); unquoted 1.0 parses as a number
  confidential: z.boolean().default(true),
});

// A discriminated union to handle all supported document types
const DocumentFrontmatter = z.discriminatedUnion("type", [
  InvoiceSchema,
  ProposalSchema,
]);

type DocMeta = z.infer<typeof DocumentFrontmatter>;
```

Step 2: The Compilation Engine
Next, we need a class to handle orchestration. It reads the file, extracts and validates the metadata, selects the appropriate Pandoc template, and executes the build.
```typescript
// Additional import used by build() to create the output directory
import { mkdirSync } from "fs";

class DocumentPipeline {
  private readonly outputDir: string;
  private readonly templateDir: string;

  constructor(outputDir: string, templateDir: string) {
    this.outputDir = outputDir;
    this.templateDir = templateDir;
  }

  /**
   * Reads the markdown file, parses frontmatter, and validates it.
   * Throws on invalid frontmatter so the caller decides how to exit.
   */
  private parseFile(filePath: string): { meta: DocMeta; content: string } {
    const rawContent = readFileSync(filePath, "utf-8");
    const { data, content } = matter(rawContent);
    try {
      const validatedMeta = DocumentFrontmatter.parse(data);
      return { meta: validatedMeta, content };
    } catch (error) {
      console.error(`Frontmatter validation failed for ${filePath}`);
      if (error instanceof z.ZodError) {
        console.error(JSON.stringify(error.issues, null, 2));
      }
      throw error;
    }
  }

  /**
   * Maps a document type to its specific Pandoc arguments and templates.
   */
  private getBuildArgs(meta: DocMeta, inputPath: string, outputPath: string): string[] {
    const baseArgs = [
      inputPath,
      "-o", outputPath,
      "--pdf-engine=xelatex", // XeLaTeX has better Unicode support than pdfLaTeX
    ];

    switch (meta.type) {
      case "proposal":
        return [
          ...baseArgs,
          `--template=${path.join(this.templateDir, "eisvogel.tex")}`,
          "--listings", // Optional: lets Eisvogel style code blocks with the LaTeX listings package
          "--number-sections",
          "-V", "titlepage=true",
          "-V", "toc=true",
          "-V", "titlepage-color=0b2a3d",
        ];
      case "invoice":
        return [
          ...baseArgs,
          `--template=${path.join(this.templateDir, "invoice-boilerplate.tex")}`,
          "-V", `client=${meta.client_name}`,
          "-V", `amount=${meta.amount_due}`,
        ];
      default:
        throw new Error("Unsupported document type");
    }
  }

  /**
   * Executes the Pandoc process.
   */
  public async build(inputPath: string): Promise<void> {
    console.log(`\n[Build Started] Analyzing ${inputPath}...`);
    const { meta } = this.parseFile(inputPath);

    // Make sure the output directory exists before Pandoc writes to it
    mkdirSync(this.outputDir, { recursive: true });
    const fileName = path.parse(inputPath).name;
    const outputPath = path.join(this.outputDir, `${fileName}.pdf`);
    const args = this.getBuildArgs(meta, inputPath, outputPath);

    console.log(`[Compiling] Generating ${meta.type.toUpperCase()} -> ${outputPath}`);

    return new Promise((resolve, reject) => {
      const pandoc = spawn("pandoc", args);

      // Fires if the process cannot be started at all (e.g. pandoc is not on PATH)
      pandoc.on("error", reject);

      pandoc.stderr.on("data", (data) => {
        // Pandoc writes warnings to stderr even on success
        console.warn(`[Pandoc Warning]: ${data.toString().trim()}`);
      });

      pandoc.on("close", (code) => {
        if (code === 0) {
          console.log(`[Success] Artifact generated successfully at ${outputPath}`);
          resolve();
        } else {
          console.error(`[Fatal] Pandoc exited with code ${code}`);
          reject(new Error(`Build failed for ${inputPath} (exit code ${code})`));
        }
      });
    });
  }
}
```

Step 3: Executing the Pipeline
Finally, we need an entry point to trigger the build. You can wire it into a Git pre-commit hook or a CI/CD pipeline, or run it locally via a package.json script.
```typescript
// index.ts (same module as the code above, so DocumentPipeline and path are in scope)
async function main() {
  // __dirname is only defined when this file runs as CommonJS
  const pipeline = new DocumentPipeline(
    path.resolve(__dirname, "./dist"),
    path.resolve(__dirname, "./templates")
  );

  const targetFile = process.argv[2];
  if (!targetFile) {
    console.error("Please provide a markdown file path.");
    process.exit(1);
  }

  try {
    await pipeline.build(targetFile);
  } catch (err) {
    console.error("Pipeline execution halted.", err);
    process.exit(1);
  }
}

main();
```

The Solopreneur's ROI
I keep coming back to the mental clarity this system provides. When I land a new consulting client, I don't open Word. I open Neovim. I create a file named proposal-acme-corp.md and set type: proposal in the frontmatter. My editor immediately auto-completes the required Zod schema fields.
I write the architecture specs in plain text, drop in Mermaid.js diagrams (Pandoc has no built-in Mermaid support, so rendering them requires a filter), save, and run npm run build:docs. The TypeScript orchestrator validates the metadata and injects it into the Eisvogel template, and the system compiles a 15-page PDF with syntax-highlighted code blocks, a clickable table of contents, and perfectly branded title pages.
If the client asks for a revision, I change a few lines of plain text before committing the update to Git and recompiling. The diffs are readable and history is permanent. Formatting remains completely decoupled from the content.
There are still edge cases. Tables written in plain-text Markdown remain an absolute nightmare to fit perfectly into constrained PDF page widths, and a deeply nested list will sometimes misbehave. But these are engineering problems, solvable by tweaking the template or writing a custom Pandoc Lua filter, rather than layout problems that require dragging a ruler with my mouse.
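Both the Mermaid diagrams and Lua filters mentioned above reach Pandoc the same way: as extra command-line flags. Pandoc applies --filter programs (external JSON filters, for example the mermaid-filter package) and --lua-filter scripts in the order they appear on the command line, so a helper that emits them should preserve that order. The filter list and the filters/wide-tables.lua path in this sketch are hypothetical.

```typescript
// Hypothetical description of one filter step.
// "json" filters are external programs that read and write Pandoc's JSON AST;
// "lua" filters are scripts run by Pandoc's built-in Lua interpreter.
type PandocFilter =
  | { kind: "json"; command: string }
  | { kind: "lua"; path: string };

// Converts filter descriptions into Pandoc flags.
// Pandoc runs filters in command-line order, so array order is preserved.
function filterArgs(filters: PandocFilter[]): string[] {
  return filters.map((filter) =>
    filter.kind === "lua" ? `--lua-filter=${filter.path}` : `--filter=${filter.command}`
  );
}
```

The result could be spread into the array built by getBuildArgs, for example [...baseArgs, ...filterArgs(filters)], so each document type can opt into its own filters.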
Moving Forward
Stop writing business-critical documents in fragile visual editors. Invoices, proposals, and architecture documents deserve the same rigorous infrastructure as your codebase.
Treating Markdown files as source code and frontmatter as a configuration layer turns Pandoc templates into UI components. This paradigm unlocks a level of velocity and professionalism that most independent developers miss.