diff --git a/dashboard/.gitea/workflows/ci.yml b/dashboard/.gitea/workflows/ci.yml
index 936e7a1..30d90dd 100644
--- a/dashboard/.gitea/workflows/ci.yml
+++ b/dashboard/.gitea/workflows/ci.yml
@@ -79,11 +79,18 @@ jobs:
       - name: Coverage gate (backend)
         run: pnpm --filter @bytelyst/devops-backend test:coverage
 
-      # TODO(ci-e2e-hardening): Playwright E2E needs a started stack + ops-API
-      # interception before it can run deterministically in CI. Tracked in
-      # docs/prompts/ci-e2e-hardening.md (Phase 5 P2). Re-enable once wired.
-      # - name: E2E tests
-      #   run: pnpm --filter @bytelyst/devops-web test:e2e
+      # Playwright browsers are pulled per-CI-run. The web suite (`pnpm
+      # test:e2e`) starts its own Next dev server via Playwright's
+      # `webServer` config; the backend is intentionally NOT started — the
+      # hermes spec intercepts `/api/hermes/ops` (which would otherwise
+      # need to shell out to systemctl/git/ps on a live VM) and the
+      # dashboard spec mocks every other backend route via `page.route`.
+      # See `docs/prompts/ci-e2e-hardening.md` for the design.
+      - name: Install Playwright browsers
+        run: pnpm --filter @bytelyst/devops-web exec playwright install --with-deps chromium
+
+      - name: E2E tests
+        run: pnpm --filter @bytelyst/devops-web test:e2e
 
   docker-build:
     name: Build Docker Images
diff --git a/dashboard/web/e2e/hermes.spec.ts b/dashboard/web/e2e/hermes.spec.ts
index c7faad3..c646476 100644
--- a/dashboard/web/e2e/hermes.spec.ts
+++ b/dashboard/web/e2e/hermes.spec.ts
@@ -13,6 +13,33 @@ const adminUser = {
   mfaMethods: [],
 };
 
+// /hermes mounts , which calls api.getHermesOps() against the
+// backend's `/api/hermes/ops` endpoint. The backend shells out to systemctl /
+// git / ps / du on the live VM and is therefore neither available nor
+// deterministic in CI. We intercept the fetch with a fixture snapshot so the
+// E2E suite can run against the web stack alone.
+// Shape mirrors `HermesOpsSnapshot` in `web/src/lib/api.ts` (which mirrors the
+// backend Zod schema in `backend/src/modules/hermes-ops/types.ts`). Empty
+// `quickLinks`/`instances` arrays are deliberate — the panel is only required
+// to render without throwing in CI; the mission-control overview is what the
+// suite actually asserts on.
+const hermesOpsSnapshot = {
+  generatedAt: '2026-01-01T00:00:00.000Z',
+  tailscaleIp: '100.0.0.1',
+  emergencyDriveUpload: {
+    name: 'hermes-emergency-drive-upload.timer',
+    active: false,
+    nextRun: null,
+    lastRun: null,
+  },
+  activeSessions: { active: 0, updatedAt: '2026-01-01T00:00:00.000Z' },
+  cronJobs: [],
+  recentAlerts: [],
+  quickLinks: [],
+  instances: [],
+  warnings: [],
+};
+
 test.describe('Hermes Mission Control', () => {
   test.beforeEach(async ({ page }) => {
     await page.addInitScript(() => {
@@ -23,6 +50,14 @@ test.describe('Hermes Mission Control', () => {
     await page.route('**/auth/me', async (route) => {
       await route.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify(adminUser) });
     });
+
+    await page.route('**/api/hermes/ops', async (route) => {
+      await route.fulfill({
+        status: 200,
+        contentType: 'application/json',
+        body: JSON.stringify(hermesOpsSnapshot),
+      });
+    });
   });
 
   test('renders the mission control overview and navigates to companion views', async ({ page }) => {
diff --git a/docs/hermes_dashboard_v2_roadmap.md b/docs/hermes_dashboard_v2_roadmap.md
index 759c5ee..b2b1d21 100644
--- a/docs/hermes_dashboard_v2_roadmap.md
+++ b/docs/hermes_dashboard_v2_roadmap.md
@@ -126,7 +126,7 @@ This is the biggest operational asymmetry and the reason half the ops-panel warn
 - [x] **P1:** Resolve the SSE TODO — either ship a Fastify-5-compatible log-stream or remove the SSE claim from docs/UI.
*(Chose **remove**: dropped `fastify-sse-v2` dep, deleted commented-out plugin import + TODO from `server.ts` and `deployments/routes.ts`, rewrote the README/DEPLOYMENT.md "Log Streaming" section as "Logs (JSON-polled, no SSE)". Web client already polls `/deployments/:id/logs` via `apiRequest` — no UI change needed. If a real-time stream is wanted later, implement via `reply.raw` and update docs in the same change.)*
 - [x] **P1:** Fix doc drift (web port 3000 vs 3049; endpoint URLs; merge duplicate deployment docs). *(`DEPLOYMENT.md` is now canonical; `DEPLOYMENT_GUIDE.md` reduced to a redirect stub; `deploy.sh` updated. Added an explicit "Ports — quick reference" table to `DEPLOYMENT.md` distinguishing container `:3000`, Compose host `:3049`, Traefik production. README and ENDPOINTS.md cross-link to it. Marks REVIEW_ACTIONS #5 resolved.)*
 - [x] **P1:** Document the docker-socket + host-log/script mount privilege surface (the backend reads cross-user/host paths — blast radius must be written down; consider an allow-list wrapper over the raw socket). *(New "Privilege Surface" section in `dashboard/DEPLOYMENT.md` enumerating every mount, every shell-outing route + commands + auth gate, the blast-radius if an admin token leaks, five known sharp edges, and a P1→P3 mitigation roadmap. Concurrent fix: `/code-quality/check` was reachable unauthenticated despite shelling out to `npm run` in a caller-supplied path — `requireAdmin` added. Allow-list wrapper around `docker`/`bash`/`npm` invocations and `projectPath` validation are queued as the next P1s; running the container as non-root and replacing the raw `docker.sock` with a verb-restricted proxy are P2/P3.)*
-- [ ] **P2:** Structured backend logging (pino → stdout); wire E2E (`hermes.spec.ts`) into CI with a started stack.
+- [x] **P2:** Structured backend logging (pino → stdout); wire E2E (`hermes.spec.ts`) into CI with a started stack.
*(Two commits: (1) `lib/logger.ts` exposes a configured pino instance shared between Fastify (via `loggerInstance`) and any non-request code path, with `LOG_LEVEL` env knob and built-in redaction for Authorization/Cookie headers + common secret-shaped field names; runtime `console.error` sites in deployments/orchestrator, system, backup, and vm modules ported over to structured logs. (2) E2E in CI: hermes spec now intercepts `/api/hermes/ops` with a fixture snapshot so it's deterministic without a live backend; CI workflow runs `playwright install --with-deps chromium` then `pnpm test:e2e` (web suite starts its own Next dev via Playwright's `webServer` config). Verified locally: 6/6 E2E green, 51/51 unit tests green, coverage gate ≥95% lines.)*
 
 ## Phase 6 — Mission Control UX polish (G6)
 
@@ -189,7 +189,7 @@ Update only with evidence (source review, tests, build output, or browser/VM ver
 - [ ] Phase 2 — Instance dimension + switcher
 - [ ] Phase 3 — Real telemetry ingestion + panes converted
 - [ ] Phase 4 — Bheem/Uma parity (backup, watchdog, restore drill)
-- [ ] Phase 5 — App/CI hardening (P0 done; P1/P2 pending)
+- [x] Phase 5 — App/CI hardening (P0/P1/P2 done; P2 follow-ups in DEPLOYMENT.md mitigation roadmap remain)
 - [ ] Phase 6 — UX polish
 - [ ] Phase 7 — Security & access
 - [ ] Phase 8 — Notifications & Telegram
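
Review note: the spec comment says the fixture's shape mirrors `HermesOpsSnapshot`, but nothing in this diff enforces that, so the fixture could drift from the real type without failing the suite. A minimal sketch of a dependency-free shape check follows. It assumes the field types can be inferred from the fixture alone. `SnapshotFixtureShape`, `matchesFixtureShape`, and `sampleSnapshot` are hypothetical names that do not exist in the repository, and the real `HermesOpsSnapshot` type or backend Zod schema may be stricter or differ.

```typescript
// Hypothetical shape, inferred only from the fixture added in hermes.spec.ts.
// Assumption: nextRun/lastRun are `string | null` (the fixture only shows null).
interface SnapshotFixtureShape {
  generatedAt: string;
  tailscaleIp: string;
  emergencyDriveUpload: {
    name: string;
    active: boolean;
    nextRun: string | null;
    lastRun: string | null;
  };
  activeSessions: { active: number; updatedAt: string };
  cronJobs: unknown[];
  recentAlerts: unknown[];
  quickLinks: unknown[];
  instances: unknown[];
  warnings: unknown[];
}

// True for plain (non-null, non-array) objects.
const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

const isNullableString = (v: unknown): v is string | null =>
  v === null || typeof v === 'string';

// Structural check only: it verifies field presence and primitive types,
// not the contents of the arrays or the validity of the timestamps.
function matchesFixtureShape(value: unknown): value is SnapshotFixtureShape {
  if (!isRecord(value)) return false;
  const record: Record<string, unknown> = value;
  const timer = record.emergencyDriveUpload;
  const sessions = record.activeSessions;
  const arrayKeys = ['cronJobs', 'recentAlerts', 'quickLinks', 'instances', 'warnings'];
  return (
    typeof record.generatedAt === 'string' &&
    typeof record.tailscaleIp === 'string' &&
    isRecord(timer) &&
    typeof timer.name === 'string' &&
    typeof timer.active === 'boolean' &&
    isNullableString(timer.nextRun) &&
    isNullableString(timer.lastRun) &&
    isRecord(sessions) &&
    typeof sessions.active === 'number' &&
    typeof sessions.updatedAt === 'string' &&
    arrayKeys.every((key) => Array.isArray(record[key]))
  );
}

// Copy of the fixture values from the diff, used here only for illustration.
const sampleSnapshot = {
  generatedAt: '2026-01-01T00:00:00.000Z',
  tailscaleIp: '100.0.0.1',
  emergencyDriveUpload: {
    name: 'hermes-emergency-drive-upload.timer',
    active: false,
    nextRun: null,
    lastRun: null,
  },
  activeSessions: { active: 0, updatedAt: '2026-01-01T00:00:00.000Z' },
  cronJobs: [],
  recentAlerts: [],
  quickLinks: [],
  instances: [],
  warnings: [],
};

console.log(matchesFixtureShape(sampleSnapshot));
```

If the web package already exports the `HermesOpsSnapshot` type, annotating the fixture with it (or using `satisfies`) would likely catch drift at compile time with less code; the runtime check above is only an alternative for cases where the type cannot be imported into the spec.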