Structuring Privacy: Why Your Architecture Is Complicit in Data Leaks

Building a distributed system without explicit data boundaries is exactly like declaring every variable globally in a legacy monolith. Eventually, a rogue function is going to read or leak something it has no business touching.

Years ago, I found myself staring at a Slack message from a board member of a startup I was advising. The request was infuriatingly casual: they wanted us to dump our entire user profile database, including millions of rows of photos and demographic data, and send it over to a "partner" company doing machine learning for facial recognition. No formal contract. No user consent. Just a "synergy" play because the founders happened to be investors in both companies.

The legal team was completely fine with it, citing some buried clause in our Terms of Service. I refused to run the query. It was a messy, uncomfortable fight that cost me political capital, but it forced a realization that completely changed how I build software: if a single database dump can compromise millions of people, the architecture itself is complicit.

We love to think of system security in terms of bad actors. We build massive moats around our databases and configure complex VPCs to keep the hackers out. But the reality is far more depressing. The biggest threat to user data is an internal product manager with legitimate access, looking for a quick revenue bump by piping user data to a third-party API.

When we treat user data like a giant global variable, ripe for the taking by any internal service, we fail our users. Relying on privacy policies to protect data is like relying on code comments to prevent bugs. Nobody reads them, and they carry zero execution-time weight.

We need to build systems where doing the wrong thing is technically difficult. We have to enforce privacy at the compiler and infrastructure level.

The Fallacy of the "Trusted" Internal Network

I used to browse engineering forums and see a recurring, cynical sentiment whenever data privacy came up. The consensus always boiled down to a dark nihilism: "Every online service is an adversary. The only safe move is to delete your accounts."

I hate this mindset. It shifts the burden of privacy onto the consumer. If the only way for a user to protect their identity is to unplug from the digital world entirely, we have completely failed as a profession.

The root of this failure is architectural. Most backend architectures operate on a "soft center" model. Once a request makes it past the API gateway and authentication layer, the internal microservices have a free-for-all. Service A can pull full user objects just to get an email address. Service B logs raw payloads to Datadog for "debugging," while Service C ships the unencrypted mess to an analytics warehouse.

To fix this, we have to start treating Personally Identifiable Information (PII) like a radioactive isotope. It needs shielding and specialized handling. It should never be passed around naked.
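Here is a minimal runtime sketch of that "shielding" idea in plain TypeScript. The `Sensitive` wrapper and its `reveal()` method are hypothetical names and are not part of the design described in this post. The raw value sits in an ES private field, so the two implicit serialization paths, `JSON.stringify` and string coercion, only ever see a mask. This does not stop code that deliberately calls `reveal()`. Its job is to make the accidental leak, such as the debug log or the analytics payload, the hard path.

```typescript
// Hypothetical runtime container for PII. The raw value lives in an ES
// private field; implicit serialization paths only ever see a mask.
class Sensitive<T> {
  readonly #value: T;
  readonly label: string;

  constructor(value: T, label: string) {
    this.#value = value;
    this.label = label;
  }

  // The one explicit, greppable way to get the raw value back out.
  reveal(): T {
    return this.#value;
  }

  // Invoked by JSON.stringify, e.g. when a request body or structured log is built.
  toJSON(): string {
    return `[REDACTED ${this.label}]`;
  }

  // Invoked by template literals and String(...).
  toString(): string {
    return `[REDACTED ${this.label}]`;
  }
}

const email = new Sensitive('user@example.com', 'email');

console.log(JSON.stringify({ userId: 'usr_98765', email }));
// {"userId":"usr_98765","email":"[REDACTED email]"}
console.log(`Sending welcome mail to ${email}`);
// Sending welcome mail to [REDACTED email]
```

Because `reveal()` is a single, searchable call, reviewers can grep a diff for it and ask why raw PII is being unwrapped.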

Type-Level Privacy with TypeScript

The first line of defense is the compiler. If a developer accidentally tries to send a user's unredacted profile to a generic logging service, the build should fail.

TypeScript has no built-in nominal typing, but we can emulate it with Branded Types (also called Opaque Types). TypeScript is structurally typed by default, meaning if two objects have the same shape, they are interchangeable. This is dangerous for PII. To the compiler, a string containing a user's location looks exactly like a string containing a harmless config value.
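To make the problem concrete, here is a small illustration with hypothetical names. Type aliases document intent but enforce nothing, so a location flows into a function meant for config values without a single warning.

```typescript
// Two aliases that read as different things to a human...
type UserLocation = string;
type ConfigValue = string;

// ...and a debug helper that is only meant to receive harmless config values.
function formatDebugLine(key: string, value: ConfigValue): string {
  return `[debug] ${key}=${value}`;
}

const homeCity: UserLocation = 'New York';

// Compiles cleanly: both aliases are just `string`, so the compiler
// cannot tell a user's location from a config value.
const line = formatDebugLine('region', homeCity);
console.log(line); // [debug] region=New York
```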

Let's force the compiler to care.

// types/privacy.ts
 
// We create a unique symbol that doesn't exist at runtime
declare const __brand: unique symbol;
 
// The Brand helper attaches a phantom type to our primitive
type Brand<T, B> = T & { readonly [__brand]: B };
 
// Define our strictly typed PII fields
export type PII_Email = Brand<string, 'Email'>;
export type PII_Location = Brand<string, 'Location'>;
export type PII_Biometric = Brand<string, 'Biometric'>;
 
// A helper function to explicitly cast raw data to PII types
// This acts as our secure gateway into the domain
export const makeEmail = (email: string): PII_Email => 
  email.toLowerCase().trim() as PII_Email;
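`makeEmail` is a gateway only by convention: it normalizes and then casts whatever string it receives. The sketch below is a slightly stricter variant and an assumption, not part of the original design. It validates before casting and returns `null` otherwise. `parseEmail` and `EMAIL_SHAPE` are hypothetical names, and the regular expression is deliberately loose because real email validation rules are out of scope here.

```typescript
declare const __brand: unique symbol;
type Brand<T, B> = T & { readonly [__brand]: B };
type PII_Email = Brand<string, 'Email'>;

// Deliberately loose shape check: something@something.tld, no whitespace.
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical validating constructor: the `as` cast is reached only after
// the input has been normalized and has passed the shape check.
function parseEmail(raw: string): PII_Email | null {
  const normalized = raw.toLowerCase().trim();
  return EMAIL_SHAPE.test(normalized) ? (normalized as PII_Email) : null;
}

const accepted = parseEmail('  User@Example.com ');
const rejected = parseEmail('not-an-email');
console.log(accepted, rejected); // user@example.com null
```

Returning `null` forces callers to handle the rejection case before they ever hold a `PII_Email`.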

Now, how does this protect us? Let's say we have an external analytics service. We define its input interface strictly, refusing to accept any branded PII types.

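// services/analytics.ts
import { PII_Email, PII_Location, PII_Biometric } from '../types/privacy';

type AnyPII = PII_Email | PII_Location | PII_Biometric;

// A utility type that forbids PII from being passed. It maps over every
// field of the payload and turns PII-branded fields into `never`.
// (Checking each value against something like `string | number` would not
// work: a branded string is still a `string`, so it would slip through.)
type AssertNoPII<T> = {
  [K in keyof T]: T[K] extends AnyPII ? never : T[K];
};

interface AnalyticsEvent<T> {
  eventName: string;
  // T is inferred from the payload itself; the intersection then fails to
  // compile if any field carries a PII brand.
  payload: T & AssertNoPII<T>;
}

export class AnalyticsGateway {
  static track<T extends object>(event: AnalyticsEvent<T>): void {
    // Implementation details...
    console.log(`Tracking ${event.eventName}`, event.payload);
  }
}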

If a well-meaning but rushed developer tries to pass the user's email into the analytics tracker, TypeScript throws a fit.

// domain/user.ts
import { makeEmail, PII_Email, PII_Location } from '../types/privacy';
import { AnalyticsGateway } from '../services/analytics';
 
interface User {
  id: string;
  email: PII_Email;
  location: PII_Location;
  loginCount: number;
}
 
const currentUser: User = {
  id: 'usr_98765',
  email: makeEmail('user@example.com'),
  location: 'New York' as PII_Location,
  loginCount: 42
};
 
// ❌ COMPILER ERROR: Type 'PII_Email' is not assignable to type 'never'.
AnalyticsGateway.track({
  eventName: 'USER_LOGIN',
  payload: {
    id: currentUser.id,
    email: currentUser.email, 
    loginCount: currentUser.loginCount
  }
});

This is a massive win. We have moved privacy enforcement from a dusty Confluence document directly into the IDE. The developer is forced to stop and explicitly strip or hash the email before tracking the event.
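Here is a sketch of what "explicitly strip or hash" might look like. It swaps the email for a keyed pseudonym before tracking. The block is self-contained, so it redefines the brand types and uses a simplified `track` stand-in. `PseudonymousId`, `pseudonymize`, and the hard-coded key are hypothetical, and the code assumes Node's `crypto` module. It uses a keyed HMAC instead of a bare hash because emails are guessable: anyone can hash a list of known addresses and match an unkeyed digest.

```typescript
import { createHmac } from 'node:crypto';

declare const __brand: unique symbol;
type Brand<T, B> = T & { readonly [__brand]: B };

type PII_Email = Brand<string, 'Email'>;
// Hypothetical non-PII brand: a stable, keyed pseudonym for a user.
type PseudonymousId = Brand<string, 'PseudonymousId'>;

// Mapped-type check: any field typed as PII becomes `never`.
type AssertNoPII<T> = { [K in keyof T]: T[K] extends PII_Email ? never : T[K] };

// Simplified stand-in for AnalyticsGateway.track that returns the serialized event.
function track<T extends object>(eventName: string, payload: T & AssertNoPII<T>): string {
  return `${eventName} ${JSON.stringify(payload)}`;
}

// Assumption: in a real system this key comes from a secret store, never source code.
const ANALYTICS_HMAC_KEY = 'replace-with-a-secret-from-your-vault';

// Keyed hash: the same email always maps to the same ID, but without the
// key nobody can hash a list of known emails and match them against it.
function pseudonymize(email: PII_Email): PseudonymousId {
  return createHmac('sha256', ANALYTICS_HMAC_KEY).update(email).digest('hex') as PseudonymousId;
}

const email = 'user@example.com' as PII_Email;

// track('USER_LOGIN', { email });  // ❌ rejected: 'PII_Email' is not assignable to 'never'
const line = track('USER_LOGIN', { user: pseudonymize(email), loginCount: 42 });
console.log(line);
```

Because `PseudonymousId` carries a different brand from `PII_Email`, the compiler accepts the pseudonym while still rejecting the raw address.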

The Data Egress Gateway Pattern

Types are great, but they don't protect us from malicious intent or direct database dumps. We need runtime infrastructure to act as a physical checkpoint.

My approach to this is the Data Egress Gateway. Instead of allowing services to make arbitrary HTTP requests to third-party APIs (like a facial recognition service or a marketing tool), all external traffic must route through a dedicated egress proxy.

This proxy has one job: inspect outgoing payloads and redact anything that looks like user data unless an explicit, auditable policy allows it.

Here is a simplified Node.js implementation of an egress filter that replaces the values of sensitive keys with hashes before the payload leaves your VPC.

// infrastructure/egress.ts
import axios, { AxiosRequestConfig } from 'axios';
import { createHmac } from 'crypto';

const SENSITIVE_KEYS = new Set([
  'email',
  'password',
  'ssn',
  'face_encoding',
  'latitude',
  'longitude',
  'dob'
]);

// The HMAC key is read from the environment (the variable name is
// illustrative). An unkeyed SHA-256 of an email can be reversed by anyone
// who hashes a list of known addresses; without the key, an HMAC cannot.
function loadHashKey(): string {
  const key = process.env.EGRESS_HMAC_KEY;
  if (!key) {
    throw new Error('EGRESS_HMAC_KEY must be set before the egress gateway is used');
  }
  return key;
}

const HASH_KEY = loadHashKey();

export class EgressGateway {
  /**
   * Recursively walks the payload and hashes the values of sensitive keys.
   * Arrays are mapped, objects are traversed. Keys are matched by exact
   * (case-insensitive) name, so a field like 'userEmail' is NOT caught.
   */
  private static redactPayload(data: any): any {
    if (Array.isArray(data)) {
      return data.map(item => this.redactPayload(item));
    }

    if (data !== null && typeof data === 'object') {
      const sanitized = { ...data };
      for (const [key, value] of Object.entries(sanitized)) {
        if (SENSITIVE_KEYS.has(key.toLowerCase())) {
          // Replace with a keyed, deterministic hash so analytics can still
          // count unique users without being able to recover who they are.
          sanitized[key] = this.hashValue(value);
        } else if (typeof value === 'object') {
          sanitized[key] = this.redactPayload(value);
        }
      }
      return sanitized;
    }

    return data;
  }

  private static hashValue(val: unknown): string {
    // Serialize non-strings so distinct objects don't all hash as "[object Object]".
    const input = typeof val === 'string' ? val : String(JSON.stringify(val));
    return createHmac('sha256', HASH_KEY).update(input).digest('hex');
  }

  /**
   * The only allowed method for sending data to third parties.
   */
  static async postToThirdParty(url: string, payload: any, config?: AxiosRequestConfig) {
    const safePayload = this.redactPayload(payload);

    // Log the egress attempt for auditing (key names only, never values)
    console.info(`[EGRESS] Sending payload to ${url}`, {
      keys_sent: Object.keys(safePayload ?? {})
    });

    return axios.post(url, safePayload, config);
  }
}

When you force all external integrations through an egress class, and back it with network rules that block direct outbound traffic from every other service, you create a chokepoint. If the board demands a raw export of user photos to a machine learning firm, an engineer has to go into the EgressGateway and bypass the redaction logic to commit that code.

That commit creates an audit trail. It requires a pull request and code review. It transforms a casual, quiet violation of trust into a loud, deliberate engineering decision. Most bad ideas die when you force them into the light of a pull request.
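The gateway above works from a denylist of key names, so a new field called `home_address` would pass straight through. The text calls for "an explicit, auditable policy." One way to express that is a per-destination allowlist checked into the repository. This is sketched here as an assumption, not the author's implementation. `EGRESS_POLICY`, `applyEgressPolicy`, and the hostnames are hypothetical. Unknown destinations are rejected outright. Only top-level fields are filtered, so nested objects under an allowed key pass through whole.

```typescript
// Hypothetical egress policy, kept in version control so every change to it
// goes through a pull request. Keys are destination hostnames; values are
// the top-level payload fields that destination is allowed to receive.
const EGRESS_POLICY: ReadonlyMap<string, ReadonlySet<string>> = new Map([
  ['api.analytics.example.com', new Set(['eventName', 'routingId', 'loginCount'])],
  ['hooks.billing.example.com', new Set(['routingId', 'plan'])],
]);

class EgressPolicyError extends Error {}

// Allowlist filter: a field nobody has classified yet is dropped by default,
// and a destination with no policy entry is refused entirely.
function applyEgressPolicy(
  url: string,
  payload: Record<string, unknown>,
): Record<string, unknown> {
  const host = new URL(url).hostname;
  const allowed = EGRESS_POLICY.get(host);
  if (!allowed) {
    throw new EgressPolicyError(`No egress policy for ${host}; request blocked`);
  }

  const filtered: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(payload)) {
    if (allowed.has(key)) {
      filtered[key] = value;
    }
  }
  return filtered;
}

const outbound = applyEgressPolicy('https://api.analytics.example.com/v1/events', {
  eventName: 'USER_LOGIN',
  routingId: 'rt_3f9c',
  email: 'user@example.com',
  loginCount: 42,
});

console.log(JSON.stringify(outbound));
// {"eventName":"USER_LOGIN","routingId":"rt_3f9c","loginCount":42}
```

Suppose someone sends a request to a destination with no entry, such as the machine-learning partner from the opening story. It fails loudly instead of quietly succeeding, and the only way to make it succeed is a reviewed change to the policy.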

Decoupling Identity from Telemetry

The most pervasive architectural sin I see in startups is the "God Table." The users table holds the hashed password, the billing address, and metadata about how many times they clicked the "upgrade" button.

Because everything is coupled, any microservice that needs to know if a user is active accidentally gains access to their home address.

We fix this with strict data minimization and database segregation. Identity data should live in one isolated database. Behavioral telemetry should live entirely separately, linked only by an opaque, non-reversible identifier.

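Let's map out how this looks using a hypothetical Prisma schema. Both models are shown in one listing for readability. In a real deployment, each database would have its own schema file, datasource, and generated client, so the application service never holds a client for the vault.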

// schema.prisma
 
// --- DATABASE 1: IDENTITY VAULT ---
// This database has strictly limited access. Only the Auth service 
// can read or write to it. It is never replicated to analytics.
model Identity {
  internal_uuid     String   @id @default(uuid())
  email             String   @unique
  password_hash     String
  legal_name        String?
  // This is the key we share with the rest of the system.
  // It is a one-way hash generated at signup.
  system_routing_id String   @unique 
  created_at        DateTime @default(now())
}
 
// --- DATABASE 2: APPLICATION STATE ---
// This database powers the actual application. It knows nothing 
// about the user's real-world identity.
model UserProfile {
  // We use the opaque routing ID, not the internal UUID or email.
  routing_id        String   @id 
  display_name      String
  theme_preference  String   @default("dark")
  login_count       Int      @default(0)
}

By physically separating the data, a compromise in the application layer or a sloppy analytics export yields none of the identity data: no emails, no legal names, no password hashes. If a product manager asks for a data dump of the UserProfile table to send to a third party, they get a pile of random strings, display names, and UI preferences. The sensitive data remains locked in the identity vault, untouched.
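The schema comment describes `system_routing_id` as a one-way hash generated at signup. One caution applies. If that hash is computed from the email alone, it is only as opaque as emails are unguessable, which is not very. Below is a minimal sketch, assuming Node's `crypto` module, that mints the routing ID from random bytes so there is nothing to reverse. `createIdentity`, `mintRoutingId`, and the `rt_` prefix are hypothetical, and `NewIdentity` mirrors only a subset of the `Identity` columns.

```typescript
import { randomBytes, randomUUID } from 'node:crypto';

// Hypothetical subset of the columns the auth service writes to the vault.
interface NewIdentity {
  internal_uuid: string;
  email: string;
  system_routing_id: string;
}

// Random, not derived from the email: there is nothing to reverse, and the
// only mapping back to a person is the row inside the vault itself.
function mintRoutingId(): string {
  return `rt_${randomBytes(16).toString('hex')}`;
}

function createIdentity(email: string): NewIdentity {
  return {
    internal_uuid: randomUUID(),
    email: email.toLowerCase().trim(),
    system_routing_id: mintRoutingId(),
  };
}

const identity = createIdentity('User@Example.com');
// Only this value leaves the auth service; UserProfile rows are keyed by it.
const profileKey = identity.system_routing_id;
console.log(profileKey.length); // 35
```

The trade-off is that the mapping from person to routing ID can never be recomputed, only looked up. That lookup happens in the vault, where the schema says access should be narrowest.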

The Human Cost of Lazy Architecture

I keep coming back to the sheer arrogance of casual data sharing. When you collect a user's face or personal preferences, you are taking custody of a piece of their life.

We abstract this away behind JSON payloads and database rows, making it easy to forget that row[42] is a real person with a right to be left alone. By trading massive datasets to train algorithms or score partnerships, companies are betraying the people who trusted them.

Regulators are woefully behind. A government agency might occasionally wake up and issue a sternly worded letter to extract a meaningless promise that the company will "do better." Financial penalties are rare, and when they do happen, they are treated as a minor operating expense by the offending corporation.

The law will not force our industry to be ethical.

That responsibility falls entirely on us, the engineers and architects writing the code. We are the ones who actually build the pipes. We have the power to put valves and filters on those pipes.

Stop building systems that make it easy to do the wrong thing. Lock down your types and choke your external network egress. Segregate your databases. Treat privacy as a rigid structural requirement defined in your infrastructure.

Because at the end of the day, your architecture is your morality in code.