TFD-019: EA Catalog Storage & Delivery Model

Date: 2026-05-15
Status: Accepted
Author: Infrastructure Engineer
Scope: infrastructure

Context

agent-ea (the Enterprise Architect digital talent) historically produced its LeanIX Object + Relationship catalogs as per-DAE CSV files rendered to standalone HTML. Across a customer's many engagements (DAE-0001…000N) this created CSV sprawl: no single source of truth, no referential integrity, no cross-DAE view, and copies leaking into every solution folder.

A proof of concept (Cloudflare D1 + Pages) was built and deployed live, collapsing a customer's catalogs into one queryable database with a published site. During scoping, a hard requirement surfaced: STM and Transgesco are separate customers, even though all DAE work currently sits under the OneDrive - STM folder. Transgesco's mandate is explicitly IT autonomy from STM, with PL-104 / LCOM compliance implications. This makes customer data isolation a compliance constraint, not a convenience.

A second refinement also surfaced: the catalog only stabilises after the validation gate. Before that, extraction output is corrected iteratively and CSV/spreadsheets are the natural correction surface (see the manipulable-output principle).

Decision

agent-ea's enterprise architecture catalog adopts a per-customer D1 + Pages storage and delivery model with a CSV-validated lifecycle:

  1. Lifecycle. CSV remains the working / validation / correction surface through the validation gate. Upload to the customer's D1 occurs only once the catalog is validated. D1 + Pages is the published consumption layer, never the editing surface. Re-validation follows the same path: re-export to CSV, correct, then re-upload.

  2. One D1 database per customer. Each customer gets exactly one D1 database; each DAE is a tagged catalog inside it (catalogs.dae + catalog discriminators). Hard isolation: separate databases, separate Pages projects, no cross-customer queries. Cross-DAE queries are permitted within a single customer.

  3. Factory-vs-delivery hosting split. During an active engagement the factory hosts the live customer D1 + Pages (factory Cloudflare account). Customer delivery is a wrangler d1 export → portable .sqlite + CSV bundle the customer can run independently of the factory and of any specific LLM/tooling (foundry rule). The delivered artifact, not the hosted instance, is the contractual deliverable.

  4. Naming & ownership. D1 database name = customer slug (transgesco, stm). Pages project name = customer slug; note the public *.pages.dev subdomain is globally namespaced and may collide (observed: stmstm-3kl.pages.dev) — cosmetic only, resolved later via a custom domain. The importer is config-driven (customers.json); adding a customer or DAE is a config edit.
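The isolation and naming rules in points 2 and 4 can be sketched with local SQLite files standing in for D1 (which is SQLite-based). This is a minimal illustration under stated assumptions: the shape of customers.json, the DAE-to-customer assignments, the objects table, and the function names are all hypothetical; only the customer slugs and the catalogs.dae tag come from the decision above.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical customers.json content; the DAE assignments are placeholders.
CUSTOMERS_JSON = """
{
  "stm":        {"daes": ["DAE-0001", "DAE-0002"]},
  "transgesco": {"daes": ["DAE-0003"]}
}
"""

# Assumed schema: only catalogs.dae is named in the decision record.
SCHEMA = """
CREATE TABLE IF NOT EXISTS catalogs (
    id  INTEGER PRIMARY KEY,
    dae TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS objects (
    id         TEXT NOT NULL,
    catalog_id INTEGER NOT NULL REFERENCES catalogs(id),
    name       TEXT NOT NULL,
    PRIMARY KEY (id, catalog_id)
);
"""

def open_customer_db(root: Path, slug: str) -> sqlite3.Connection:
    """Open (or create) the single database that belongs to one customer."""
    conn = sqlite3.connect(root / f"{slug}.sqlite")
    conn.executescript(SCHEMA)
    return conn

def register_daes(root: Path, config: dict) -> dict:
    """Create one database per customer slug and one catalog row per DAE."""
    paths = {}
    for slug, entry in config.items():
        conn = open_customer_db(root, slug)
        with conn:  # commits on success
            conn.executemany(
                "INSERT OR IGNORE INTO catalogs (dae) VALUES (?)",
                [(dae,) for dae in entry["daes"]],
            )
        conn.close()
        paths[slug] = root / f"{slug}.sqlite"
    return paths

def list_daes(db_path: Path) -> list:
    """Cross-DAE view, scoped to a single customer's database."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT dae FROM catalogs ORDER BY dae").fetchall()
    conn.close()
    return [row[0] for row in rows]

root = Path(tempfile.mkdtemp())
paths = register_daes(root, json.loads(CUSTOMERS_JSON))
stm_daes = list_daes(paths["stm"])
transgesco_daes = list_daes(paths["transgesco"])
```

Because each customer has its own file, no query can span customers without deliberately opening a second connection; adding a customer or DAE is only a change to the JSON.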

Open items deferred to follow-up (not blocking this decision)

  • Post-delivery hosting: whether/how long the factory keeps a customer's live site running after handover, and who bears the (currently $0, free-tier) cost and credential ownership. Default until decided: factory hosts during engagement only; post-delivery the customer runs the exported bundle.
  • Custom-domain strategy per customer.
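For the exported bundle that the customer runs after handover, one possible packaging step is sketched below. It assumes the export step yields a SQL dump as text, which is then rebuilt into a standalone .sqlite file with one CSV per table. The function name, the bundle layout, and the sample dump are illustrative assumptions, not the factory's actual tooling.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def build_bundle(sql_dump: str, out_dir: Path, slug: str) -> list:
    """Rebuild a standalone .sqlite file from a SQL dump and write one CSV per table."""
    out_dir.mkdir(parents=True, exist_ok=True)
    db_path = out_dir / f"{slug}.sqlite"
    conn = sqlite3.connect(db_path)
    conn.executescript(sql_dump)
    # List user tables only, skipping SQLite's internal tables.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    written = [db_path]
    for table in tables:
        cur = conn.execute(f'SELECT * FROM "{table}"')
        csv_path = out_dir / f"{table}.csv"
        with csv_path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)
        written.append(csv_path)
    conn.close()
    return written

# Stand-in for a real dump; the actual content would come from the export step.
SAMPLE_DUMP = """
CREATE TABLE catalogs (id INTEGER PRIMARY KEY, dae TEXT NOT NULL);
INSERT INTO catalogs VALUES (1, 'DAE-0001');
"""

bundle = build_bundle(SAMPLE_DUMP, Path(tempfile.mkdtemp()) / "bundle", "example-customer")
bundle_names = [p.name for p in bundle]
```

The resulting folder needs nothing beyond a SQLite client or a spreadsheet to open, which is the independence property the default relies on.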

Alternatives Considered

Per-DAE CSV → HTML (status quo)

Zero infrastructure and perfectly customer-independent, but no referential integrity, no cross-DAE view, and active sprawl across solution folders. Rejected: it is a degraded approximation of the EA repository that LeanIX is meant to be.

Per-DAE D1 (one database per request)

Kills sprawl within a single request but keeps each DAE an island — no enterprise-wide model, defeating the "enterprise architecture" purpose. Rejected in favour of per-customer scope.

Single shared D1 across all customers (catalog/customer column)

Simplest to operate, but places STM and Transgesco data in one database. Rejected outright: violates the customer data-isolation / compliance constraint.

Atlassian Assets / headless CMS as the store

Native object modelling (Assets) or strong publishing (CMS), but both introduce vendor lock-in and break the foundry independence rule (customer cannot run the deliverable without the SaaS). Rejected.

Consequences

Positive

  • One queryable enterprise model per customer; cross-DAE questions become answerable. Referential integrity is enforceable (orphan checks).
  • Compliance-grade customer isolation by construction.
  • Foundry rule preserved: portable .sqlite + CSV deliverable, no factory lock-in.
  • CSV-during-validation keeps the correction workflow unchanged; the DB step is additive, not disruptive.
  • Config-driven importer makes onboarding a new customer/DAE trivial.
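The orphan check mentioned in the first bullet can be expressed as a single query once objects and relationships live in one database. The sketch below uses an in-memory SQLite database; the table and column names (objects, relationships, source_id, target_id, kind) and the sample rows are assumptions for illustration, not the catalog's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE relationships (
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    kind      TEXT NOT NULL
);
INSERT INTO objects VALUES ('app-1', 'Billing'), ('app-2', 'CRM');
INSERT INTO relationships VALUES
    ('app-1', 'app-2', 'depends_on'),
    ('app-1', 'app-9', 'depends_on');   -- app-9 was never catalogued
""")

# A relationship is an orphan when either endpoint has no matching object row.
ORPHAN_SQL = """
SELECT r.source_id, r.target_id, r.kind
FROM relationships AS r
LEFT JOIN objects AS s ON s.id = r.source_id
LEFT JOIN objects AS t ON t.id = r.target_id
WHERE s.id IS NULL OR t.id IS NULL
ORDER BY r.source_id, r.target_id
"""

orphans = conn.execute(ORPHAN_SQL).fetchall()
print(orphans)  # [('app-1', 'app-9', 'depends_on')]
```

With per-DAE CSV files the same check would require loading and joining files by hand, which is why it was not practical under the status quo.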

Negative

  • agent-ea's output contract changes: a validated-upload step replaces loose CSV emission. The pipeline and the leanix-catalog-extract toolkit must be updated (Phase 3a).
  • A build/publish step (CSV ⇄ D1 ⇄ Pages) now exists where there was a direct CSV → HTML render.
  • Factory now operates per-customer cloud resources (credentials, account hygiene) during engagements.

Neutral

  • Cloudflare free tier covers current volume; cost is $0 today but is a factory-account dependency to track.
  • wrangler.toml is swapped in place per customer at deploy time (the wrangler.{customer}.toml files are the sources of truth) because pages deploy does not accept a custom config path.
  • Existing STM DAEs without Object/Relation catalogs (e.g. DAE-0006, Jira-only) are simply not ingested; no data loss, they were never catalog sources.
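The wrangler.toml swap described above could look roughly like the following. This is a sketch under assumptions: the helper names, the "dist" output directory, and the placeholder config contents are hypothetical, and the deploy command is only built as an argument list here, not executed.

```python
import shutil
import tempfile
from pathlib import Path

def activate_config(project_dir: Path, customer: str) -> Path:
    """Copy wrangler.{customer}.toml over wrangler.toml and return the active file."""
    source = project_dir / f"wrangler.{customer}.toml"
    if not source.is_file():
        raise FileNotFoundError(f"no per-customer config: {source.name}")
    target = project_dir / "wrangler.toml"
    shutil.copyfile(source, target)
    return target

def deploy_command(customer: str, site_dir: str = "dist") -> list:
    """Build (not run) the deploy command; Pages project name = customer slug."""
    return ["wrangler", "pages", "deploy", site_dir, "--project-name", customer]

# Demo in a throwaway directory with placeholder config contents.
project = Path(tempfile.mkdtemp())
(project / "wrangler.stm.toml").write_text('name = "stm"\n', encoding="utf-8")
(project / "wrangler.transgesco.toml").write_text('name = "transgesco"\n', encoding="utf-8")

active = activate_config(project, "transgesco")
active_text = active.read_text(encoding="utf-8")
cmd = deploy_command("transgesco")
```

Failing fast when the per-customer file is missing avoids deploying with whichever customer's wrangler.toml happened to be left in place by the previous run.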