Closes the final Phase 5 P1 checkbox and REVIEW_ACTIONS #6. The backend container has root-equivalent host access via the docker socket, host log mounts, and the VM scripts mount, but until now the "who can do what to the host?" answer was scattered across compose files and route handlers. This commit centralizes it. DEPLOYMENT.md gains a "Privilege Surface" section that lists: - every host mount + container path + mode + purpose - every shell-outing route, the actual commands it runs, and the auth gate on each - what an admin token can do today (≈ host shell) - five known sharp edges (un-allow-listed container names, unvalidated projectPath, no per-route audit-log on shell-outs, container runs as root, global rate-limit only) - a P1 → P3 mitigation roadmap (allow-list wrapper around shell-outs, projectPath validation, audit-logging shell-outs, drop root in container, replace docker.sock with a verb-restricted proxy) Concurrent code fix: `POST /code-quality/check` was reachable **unauthenticated** despite shelling out to `npm run typecheck/lint/ build/test:run` in a caller-supplied `projectPath`. Added `preHandler: requireAdmin` to bring it in line with every other shell-outing route in the dashboard. Same commit because the documentation table promises this gate exists. REVIEW_ACTIONS #6 marked RESOLVED with the rationale; roadmap checkbox ticked. Tests, typecheck, lint (0 errors), build, and coverage gate (≥95% lines on every gated file) all stay green. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
92 lines
8.3 KiB
Markdown
92 lines
8.3 KiB
Markdown
# Dashboard Repo Review — Top Actions
|
|
|
|
Reviewed: 2026-05-27. Scope: `/opt/bytelyst/learning_ai_devops_tools/dashboard` (the ByteLyst DevOps Dashboard pnpm workspace: `backend/` Fastify 5 + `web/` Next.js 16).
|
|
|
|
Baseline state (verified during review):
|
|
- `pnpm typecheck` — passes for both backend and web.
|
|
- `pnpm test:run` — passes (backend 9 tests / 1 file, web 11 tests / 2 files).
|
|
- `pnpm secret-scan` — clean.
|
|
- `.env` is gitignored; only `.env.example` files are tracked.
|
|
|
|
The dashboard is functional and well-structured, but several issues block CI, hide regressions, and create operational risk. Actions are ordered by priority.
|
|
|
|
---
|
|
|
|
## P0 — Broken / Urgent
|
|
|
|
### 1. CI workflow points at a non-existent path
|
|
`.gitea/workflows/ci.yml` runs everything from `/opt/bytelyst/bytelyst-devops-tools/dashboard`, but the actual checkout lives at `/opt/bytelyst/learning_ai_devops_tools/dashboard`. The same wrong path is hard-coded in `DEPLOYMENT.md` and `scripts/deploy-hotcopy.sh`.
|
|
|
|
- Action: replace the hard-coded path with `${{ gitea.workspace }}` (or a single `WORKDIR` env var) in <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/.gitea/workflows/ci.yml" />, then fix the two other references in <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/DEPLOYMENT.md" /> and <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/scripts/deploy-hotcopy.sh" />.
|
|
- Verify: trigger a CI run on a throwaway branch and confirm green.
|
|
|
|
### 2. "Lint" steps are no-ops
|
|
Both `backend/package.json` and `web/package.json` define `lint` as `echo 'No linting configured...'`. The CI step "Lint" therefore always passes regardless of code quality. There is no ESLint, Biome, or equivalent configured anywhere in the workspace.
|
|
|
|
- Action: pick one tool (recommend ESLint + `@typescript-eslint` for backend, Next.js's built-in ESLint config for web, since `next` already ships it). Wire `next lint` into `web/package.json` and add a minimal `.eslintrc` to backend.
|
|
- Verify: `pnpm lint` returns a non-zero exit on a deliberately bad change.
|
|
|
|
---
|
|
|
|
## P1 — Important Gaps
|
|
|
|
### 3. Test coverage is extremely thin
|
|
Backend has 12 modules (`services`, `deployments`, `health`, `audit`, `backup`, `system`, `env`, `azure-config`, `code-quality`, `cosmos-config`, `hermes-ops`, `vm`) but only `services` has a test file. The deployment orchestrator (`backend/src/modules/deployments/orchestrator.ts`), CSRF (`backend/src/lib/csrf.ts`), and auth (`backend/src/lib/auth.ts`) — the highest-risk surfaces — have no tests at all.
|
|
|
|
- Action: add `*.test.ts` for at least `auth`, `csrf`, `deployments/orchestrator`, and `health` repository before adding more features. Mirror the style of <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/backend/src/modules/services/services.test.ts" />.
|
|
- Add `pnpm test:coverage` to CI and fail under a threshold (start at 50 %, raise over time).
|
|
|
|
### 4. SSE deployment-log streaming is disabled — RESOLVED (removed)
|
|
The TODO has been resolved by **removing the SSE claim**, not by shipping it: the `fastify-sse-v2` dependency is gone from `backend/package.json`, the commented-out import + plugin registration are gone from `backend/src/server.ts`, and the deployment-log endpoint is now documented as JSON-polled. The web client never used `EventSource` (`web/src/lib/api.ts` already polls `/api/deployments/:id/logs` via the normal `apiRequest` helper), so no UI change was required. README/DEPLOYMENT.md updated to match. If a real-time stream is wanted later, ship it explicitly via `reply.raw` and update the docs in the same change.
|
|
|
|
### 5. Documentation drift — RESOLVED
|
|
- `DEPLOYMENT.md` is now the single canonical deployment guide; `DEPLOYMENT_GUIDE.md` is reduced to a one-line redirect for backwards compat with `deploy.sh` and external links. `deploy.sh` updated to reference `DEPLOYMENT.md`.
|
|
- `DEPLOYMENT.md` carries an explicit **Ports — quick reference** table that distinguishes container port (`:3000`), Compose host port (`:3049`), and the Traefik production URL — so the 3000-vs-3049 question has one truthful answer per deployment mode rather than three contradictory prose claims.
|
|
- `README.md` "Web port: 3000" rewritten to call out container vs Compose host vs dev-mode explicitly and link to the port table.
|
|
- `ENDPOINTS.md` got a top-of-file note: every `localhost:3000` URL in the file refers to `pnpm dev`; substitute `:3049` for the Dockerized stack. The `https://api.bytelyst.com/api/devops` vs `/devops` ambiguity was already resolved by the existing "URL Note" section (kept).
|
|
|
|
### 6. Docker socket + host log mounts are very privileged — RESOLVED (documented; allow-list wrapper queued)
|
|
`DEPLOYMENT.md` now has a **Privilege Surface** section enumerating every host mount, every shell-outing route + the exact commands it runs, the auth gate on each, and an explicit blast-radius statement ("an admin token is equivalent to a host shell, modulo what the codebase wraps"). Concurrent fix: the `/code-quality/check` endpoint was missing `requireAdmin` and was therefore reachable unauthenticated even though it shells out to `npm run typecheck/lint/build/test:run` in a caller-supplied `projectPath` — that's been gated to admin in the same commit. Two follow-up P1s remain in the doc's mitigation roadmap (allow-list wrapper around shell-outs; validate `code-quality/check` `projectPath` against an allowed root set); P2/P3 cover audit-logging shell-outs, dropping root in the container, and moving off the raw docker socket.
|
|
|
|
---
|
|
|
|
## P2 — Hygiene
|
|
|
|
### 7. Backend module structure isn't enforced
|
|
Most modules follow the `routes.ts / repository.ts / types.ts` triple, but a few have extras (`deployments/orchestrator.ts`). There is no architectural test, README, or generator. New contributors will diverge.
|
|
|
|
- Action: add a short `backend/src/modules/README.md` describing the convention, and (optionally) an architectural test using `dependency-cruiser` or a custom vitest.
|
|
|
|
### 8. README is unfocused
|
|
`README.md` mixes "Recent Improvements" (a changelog), feature list, setup, env vars, and full API docs into one 219-line file. The first cat of the file even shows it begins with two blank lines after the title — easy to miss content.
|
|
|
|
- Action: trim README to: what / quickstart / pointers. Move "Recent Improvements" into `CHANGELOG.md` and keep API docs only in `ENDPOINTS.md` / Swagger.
|
|
|
|
### 9. `.pnpmfile.cjs` dual-mode install is undocumented in CI
|
|
`pnpm install:common-plat` vs `pnpm install:gitea` is only mentioned in the README. The CI workflow uses `install:common-plat`, which only works if the runner has the sibling `learning_ai_common_plat` checkout available. That assumption isn't asserted anywhere.
|
|
|
|
- Action: add a pre-install check that fails fast with a clear message if the expected workspace path is missing, and document the runner prerequisites in the CI file.
|
|
|
|
### 10. No production logging / metrics story
|
|
`backend/src/server.ts` uses Fastify's default logger only. There is a `web/src/lib/telemetry.ts` file but nothing wires it to a backend. The dashboard advertises "monitoring" but doesn't emit its own structured telemetry.
|
|
|
|
- Action: decide on a target (pino transports → stdout for container logs is enough for now) and write down the choice. If Prometheus / OpenTelemetry is in scope, file a tracked issue rather than leaving it implied.
|
|
|
|
### 11. E2E tests aren't wired into local workflow
|
|
`web/e2e/dashboard.spec.ts` and `web/e2e/hermes.spec.ts` exist and `pnpm test:e2e` is defined, but nothing documents how to start the backend+web before running them, and CI's E2E step (visible in `.gitea/workflows/ci.yml`) is cut off in the file — need to confirm it actually launches the stack.
|
|
|
|
- Action: read the bottom half of `ci.yml` and confirm the E2E job sets up backend+web; if not, fix it. Add a `pnpm test:e2e` recipe to README that explicitly says "run `pnpm dev` first" or use Playwright's `webServer` config.
|
|
|
|
---
|
|
|
|
## Suggested execution order
|
|
|
|
1. Fix the CI path (#1) — unblocks everything else.
|
|
2. Reconcile the SSE TODO (#4) — either remove the claim or ship the feature.
|
|
3. Add real linting (#2) and tighten test coverage on auth/csrf/orchestrator (#3).
|
|
4. Documentation pass: ports, deployment docs, README trim (#5, #8).
|
|
5. Privilege/operational hardening (#6, #10).
|
|
6. Convention + DX polish (#7, #9, #11).
|
|
|
|
Each item above is small enough to land as a single PR.
|