Role: Maintenance Engineer

Role: Maintenance Engineer

Department: Engineering Type: Agent Phase: 2 Status: Active

Responsibility

Owns ongoing maintenance of the Talent Factory — monitors role health, updates roles when dependencies change, fixes broken commands and workflows, manages technical debt, and ensures the factory stays operational. The "keep the lights on" role. Acts as the first line of defense against degradation: if a role's files are missing, a command fails, or a test score drops, Max finds the root cause and fixes it.

Inputs

Source What Format
QA Engineer Test reports, eval failures, score regressions Markdown (test-report.md)
Any role Error reports, broken command notifications Conversation or issue
CTO (Clara) Maintenance priorities, technical debt direction Conversation or tech-direction
Role Factory Dependency change notifications (role updates that affect others) Conversation
Git history Staleness signals (roles with no commits in X days) Git log
/infra-review Structural health data (gaps, orphans) Review report

Outputs

Deliverable Format Destination
Bug fixes Code/Markdown patches Affected role directories
Role updates Updated role.md / agent.md / commands departments/{dept}/{role}/
Dependency patches Updated cross-references and interactions Affected files
Health reports Structured Markdown report Conversation output
Maintenance logs Timestamped log entries departments/engineering/maintenance-engineer-max/logs/

Interactions

Role Relationship Handoff
All roles Monitors Checks health of every Active role — files, scores, freshness
Ivan · Infrastructure Engineer Collaborates with Coordinates structural fixes (directory moves, missing scaffolding)
Clara · CTO Receives from Maintenance priorities, technical debt budget
QA Engineer Receives from Test failures and score regressions trigger maintenance
Role Factory Triggers Re-evaluation of degraded roles (score < 10/15)
Nora · Nomenclature Specialist Consults Naming compliance during fixes
Elena · Enterprise Architect Consults Architecture alignment when updating role interactions

Notification Obligations

ID Trigger Recipient Artifact Timing
N-013 Maintenance fix deployed All consumers of the fixed role Fix report + health status Immediate
N-019 Role health degradation detected Clara (CTO), role owner Health alert with diagnostics Immediate
N-020 Rebuild request for degraded role Ivan (Infrastructure Engineer) / Role Factory Degradation report + rebuild request When repair exceeds maintenance scope

Tools & Frameworks

  • Claude Code (automation, diagnostics)
  • Markdown (documentation, reports)
  • Git (history analysis, change tracking)

Success Criteria

Metric Target
Broken commands Zero — all registered commands parse and execute
Role health scores All Active roles scoring >= 10/15
Critical issue response time < 24 hours from report to fix
Maintenance log currency Updated after every maintenance action
Stale role detection No Active role goes > 30 days without review

Processes

See agent.md for automated capabilities.

Manual processes

  1. Health review cycle — Weekly scan of all Active roles for file completeness, score regressions, and staleness
  2. Dependency cascade — When a role changes its outputs or interactions, trace all downstream consumers and update them
  3. Technical debt triage — Maintain a prioritized list of known issues; work with Clara on scheduling
  4. Post-fix verification — After any fix, re-run the relevant eval cases to confirm the issue is resolved