TFD-019: EA Catalog Storage & Delivery Model
Date: 2026-05-15
Status: Accepted
Author: Infrastructure Engineer
Scope: infrastructure
Context
agent-ea (the Enterprise Architect digital talent) historically produced its LeanIX Object + Relationship catalogs as per-DAE CSV files rendered to standalone HTML. Across a customer's many engagements (DAE-0001…000N) this created CSV sprawl: no single source of truth, no referential integrity, no cross-DAE view, and copies leaking into every solution folder.
A proof of concept (Cloudflare D1 + Pages) was built and deployed live, collapsing a
customer's catalogs into one queryable database with a published site.
During scoping, a hard requirement surfaced: STM and Transgesco are
SEPARATE customers (even though all DAE work currently sits under the
OneDrive - STM folder). Transgesco's mandate is explicitly IT autonomy
separate from STM, with PL-104 / LCOM compliance implications. This makes
customer data isolation a compliance constraint, not a convenience.
A second refinement also surfaced: the catalog only stabilises after the validation gate. Before that, extraction output is corrected iteratively and CSV/spreadsheets are the natural correction surface (see the manipulable-output principle).
Decision
agent-ea's enterprise architecture catalog adopts a per-customer D1 + Pages storage and delivery model with a CSV-validated lifecycle:
- Lifecycle. CSV remains the working / validation / correction surface through the validation gate. Upload to the customer's D1 occurs only once the catalog is validated. D1 + Pages is the published consumption layer, never the editing surface. If a published catalog needs re-validation, the loop is: re-export CSV → correct → re-upload.
- One D1 database per customer. Each customer gets exactly one D1 database; each DAE is a tagged catalog inside it (catalogs.dae + catalog discriminators). Hard isolation: separate databases, separate Pages projects, no cross-customer queries. Cross-DAE queries are permitted within a single customer.
- Factory-vs-delivery hosting split. During an active engagement the factory hosts the live customer D1 + Pages (factory Cloudflare account). Customer delivery is a wrangler d1 export → portable .sqlite + CSV bundle that the customer can run independently of the factory and of any specific LLM/tooling (foundry rule). The delivered artifact, not the hosted instance, is the contractual deliverable.
- Naming & ownership. D1 database name = customer slug (transgesco, stm). Pages project name = customer slug. Note that the public *.pages.dev subdomain is globally namespaced and may collide (observed: stm → stm-3kl.pages.dev); this is cosmetic only and is resolved later via a custom domain. The importer is config-driven (customers.json); adding a customer or DAE is a config edit.
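To make the config-driven importer and the validation gate concrete, here is a minimal sketch in Python. The shape of customers.json is not specified in this record, so the field names (d1_database, pages_project, daes), the DAE assignments, and the helper names resolve_target and plan_upload are all illustrative assumptions, not the actual importer.

```python
import json

# Hypothetical customers.json content. Field names and DAE assignments are
# illustrative assumptions; only the customer slugs come from this record.
CUSTOMERS_JSON = """
{
  "stm": {"d1_database": "stm", "pages_project": "stm", "daes": ["DAE-0001"]},
  "transgesco": {"d1_database": "transgesco", "pages_project": "transgesco", "daes": ["DAE-0002"]}
}
"""


def resolve_target(config: dict, customer: str, dae: str) -> dict:
    """Return the upload target for one DAE catalog, or raise if it is not configured."""
    if customer not in config:
        raise KeyError(f"unknown customer: {customer}")
    entry = config[customer]
    if dae not in entry["daes"]:
        # A DAE can only land in the database of the customer it is configured under.
        raise ValueError(f"{dae} is not configured for customer {customer}")
    return {
        "database": entry["d1_database"],
        "pages_project": entry["pages_project"],
        "dae": dae,
    }


def plan_upload(config: dict, customer: str, dae: str, validated: bool) -> dict:
    """Refuse to plan an upload until the catalog has passed the validation gate."""
    if not validated:
        raise RuntimeError("catalog not validated; keep correcting the CSV")
    return resolve_target(config, customer, dae)
```

Under these assumptions, onboarding a customer or DAE is one more entry in the JSON, and a catalog that is unvalidated or configured under another customer never reaches an upload target.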
Open items deferred to follow-up (not blocking this decision)
- Post-delivery hosting: whether/how long the factory keeps a customer's live site running after handover, and who bears the (currently $0, free-tier) cost and credential ownership. Default until decided: factory hosts during engagement only; post-delivery the customer runs the exported bundle.
- Custom-domain strategy per customer.
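The exported bundle in the default above could be assembled roughly as follows. This is a sketch under stated assumptions: it assumes the wrangler d1 export step yields a plain SQL dump, and it turns that dump into a .sqlite file plus one CSV per table using only the Python standard library. The function name build_bundle and the file layout are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def build_bundle(sql_dump: str, out_dir: Path, name: str) -> list[Path]:
    """Load a SQL dump into <name>.sqlite and write one CSV per table."""
    out_dir.mkdir(parents=True, exist_ok=True)
    db_path = out_dir / f"{name}.sqlite"
    if db_path.exists():
        db_path.unlink()  # rebuild from scratch so the bundle matches the dump
    written = [db_path]
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(sql_dump)
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            csv_path = out_dir / f"{table}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow([col[0] for col in cursor.description])  # header row
                writer.writerows(cursor)
            written.append(csv_path)
        conn.commit()
    finally:
        conn.close()
    return written
```

The resulting files need nothing beyond SQLite and a spreadsheet tool to read, which is the point of the foundry rule: the customer does not need the factory account or any hosted service to use them.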
Alternatives Considered
Per-DAE CSV → HTML (status quo)
Zero infrastructure and perfectly customer-independent, but no referential integrity, no cross-DAE view, and active sprawl across solution folders. Rejected: it is a degraded approximation of an EA repository, which is exactly what LeanIX is meant to be.
Per-DAE D1 (one database per request)
Kills sprawl within a single request but keeps each DAE an island — no enterprise-wide model, defeating the "enterprise architecture" purpose. Rejected in favour of per-customer scope.
Single shared D1 across all customers (catalog/customer column)
Simplest to operate, but places STM and Transgesco data in one database. Rejected outright: violates the customer data-isolation / compliance constraint.
Atlassian Assets / headless CMS as the store
Native object modelling (Assets) or strong publishing (CMS), but both introduce vendor lock-in and break the foundry independence rule (customer cannot run the deliverable without the SaaS). Rejected.
Consequences
Positive
- One queryable enterprise model per customer; cross-DAE questions become answerable. Referential integrity is enforceable (orphan checks).
- Compliance-grade customer isolation by construction.
- Foundry rule preserved: portable .sqlite + CSV deliverable, no factory lock-in.
- CSV-during-validation keeps the correction workflow unchanged; the DB step is additive, not disruptive.
- Config-driven importer makes onboarding a new customer/DAE trivial.
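The orphan checks and cross-DAE questions listed above can be illustrated with a small in-memory SQLite example. Only the catalogs table with its dae and catalog discriminators comes from this record; the objects and relationships tables, their columns, and the sample rows are hypothetical stand-ins for the real schema.

```python
import sqlite3

# Hypothetical schema: only catalogs.dae / catalogs.catalog are named in this record.
SCHEMA = """
CREATE TABLE catalogs (id INTEGER PRIMARY KEY, dae TEXT NOT NULL, catalog TEXT NOT NULL);
CREATE TABLE objects (
    id TEXT PRIMARY KEY,
    catalog_id INTEGER NOT NULL REFERENCES catalogs(id),
    name TEXT NOT NULL
);
CREATE TABLE relationships (
    id INTEGER PRIMARY KEY,
    catalog_id INTEGER NOT NULL REFERENCES catalogs(id),
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL
);
"""

# Relationships whose source or target object does not exist anywhere in the database.
ORPHANS = """
SELECT r.id, c.dae
FROM relationships r
JOIN catalogs c ON c.id = r.catalog_id
LEFT JOIN objects s ON s.id = r.source_id
LEFT JOIN objects t ON t.id = r.target_id
WHERE s.id IS NULL OR t.id IS NULL
ORDER BY r.id
"""

# Object names that appear in more than one DAE of the same customer.
CROSS_DAE = """
SELECT o.name, COUNT(DISTINCT c.dae) AS dae_count
FROM objects o
JOIN catalogs c ON c.id = o.catalog_id
GROUP BY o.name
HAVING COUNT(DISTINCT c.dae) > 1
ORDER BY o.name
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO catalogs VALUES (?, ?, ?)",
                     [(1, "DAE-0001", "objects"), (2, "DAE-0002", "objects")])
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)",
                     [("a1", 1, "Billing"), ("a2", 1, "CRM"), ("b1", 2, "Billing")])
    conn.executemany("INSERT INTO relationships VALUES (?, ?, ?, ?)",
                     [(1, 1, "a1", "a2"), (2, 2, "b1", "zz")])  # "zz" does not exist
    return conn


def find_orphans(conn: sqlite3.Connection) -> list:
    return conn.execute(ORPHANS).fetchall()


def shared_objects(conn: sqlite3.Connection) -> list:
    return conn.execute(CROSS_DAE).fetchall()
```

With per-DAE CSV files neither query is possible without first stitching the files together by hand; with one database per customer both are single statements.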
Negative
- agent-ea's output contract changes: a validated-upload step replaces loose CSV emission. Pipeline and leanix-catalog-extract toolkit must be updated (Phase 3a).
- A build/publish step (CSV ⇄ D1 ⇄ Pages) now exists where there was a direct CSV → HTML render.
- Factory now operates per-customer cloud resources (credentials, account hygiene) during engagements.
Neutral
- Cloudflare free tier covers current volume; cost is $0 today but is a factory-account dependency to track.
- wrangler.toml is swapped in place per customer at deploy time (the wrangler.{customer}.toml files are the sources of truth) because pages deploy does not accept a custom config path.
- Existing STM DAEs without Object/Relation catalogs (e.g. DAE-0006, Jira-only) are simply not ingested; no data loss, they were never catalog sources.
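The in-place wrangler.toml swap can be sketched as a small pre-deploy helper. This is an assumption-laden illustration rather than the factory's actual deploy script: the helper names activate_config and deploy_command and the site output directory are hypothetical, and the command is only built here, not executed.

```python
import shutil
from pathlib import Path


def activate_config(project_dir: Path, customer: str) -> Path:
    """Copy wrangler.<customer>.toml over wrangler.toml and return the active file."""
    source = project_dir / f"wrangler.{customer}.toml"
    if not source.is_file():
        raise FileNotFoundError(f"no config for customer {customer}: {source}")
    target = project_dir / "wrangler.toml"
    # The per-customer file stays the source of truth; wrangler.toml is a throwaway copy.
    shutil.copyfile(source, target)
    return target


def deploy_command(customer: str, site_dir: str = "site") -> list[str]:
    """Build the Pages deploy command; the project name is the customer slug."""
    # "site" is a hypothetical build output directory.
    return ["wrangler", "pages", "deploy", site_dir, "--project-name", customer]
```

A caller would run activate_config first and then pass the command list to subprocess.run with cwd set to the project directory. Because wrangler.toml is overwritten on every deploy, it should never be edited directly.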