# Dashboard Repo Review — Top Actions
Reviewed: 2026-05-27. Scope: `/opt/bytelyst/learning_ai_devops_tools/dashboard` (the ByteLyst DevOps Dashboard pnpm workspace: `backend/` Fastify 5 + `web/` Next.js 16).
Baseline state (verified during review):
- `pnpm typecheck` — passes for both backend and web.
- `pnpm test:run` — passes (backend 9 tests / 1 file, web 11 tests / 2 files).
- `pnpm secret-scan` — clean.
- `.env` is gitignored; only `.env.example` files are tracked.
The dashboard is functional and well-structured, but several issues block CI, hide regressions, and create operational risk. Actions are ordered by priority.
---
## P0 — Broken / Urgent
### 1. CI workflow points at a non-existent path
`.gitea/workflows/ci.yml` runs everything from `/opt/bytelyst/bytelyst-devops-tools/dashboard`, but the actual checkout lives at `/opt/bytelyst/learning_ai_devops_tools/dashboard`. The same wrong path is hard-coded in `DEPLOYMENT.md` and `scripts/deploy-hotcopy.sh`.
- Action: replace the hard-coded path with `${{ gitea.workspace }}` (or a single `WORKDIR` env var) in , then fix the two other references in and .
- Verify: trigger a CI run on a throwaway branch and confirm green.
### 2. "Lint" steps are no-ops
Both `backend/package.json` and `web/package.json` define `lint` as `echo 'No linting configured...'`. The CI step "Lint" therefore always passes regardless of code quality. There is no ESLint, Biome, or equivalent configured anywhere in the workspace.
- Action: pick one tool (recommend ESLint + `@typescript-eslint` for backend, Next.js's built-in ESLint config for web, since `next` already ships it). Wire `next lint` into `web/package.json` and add a minimal `.eslintrc` to backend.
- Verify: `pnpm lint` returns a non-zero exit on a deliberately bad change.
---
## P1 — Important Gaps
### 3. Test coverage is extremely thin
Backend has 12 modules (`services`, `deployments`, `health`, `audit`, `backup`, `system`, `env`, `azure-config`, `code-quality`, `cosmos-config`, `hermes-ops`, `vm`) but only `services` has a test file. The deployment orchestrator (`backend/src/modules/deployments/orchestrator.ts`), CSRF (`backend/src/lib/csrf.ts`), and auth (`backend/src/lib/auth.ts`) — the highest-risk surfaces — have no tests at all.
- Action: add `*.test.ts` for at least `auth`, `csrf`, `deployments/orchestrator`, and `health` repository before adding more features. Mirror the style of .
- Add `pnpm test:coverage` to CI and fail under a threshold (start at 50 %, raise over time).
### 4. SSE deployment-log streaming is disabled — RESOLVED (removed)
The TODO has been resolved by **removing the SSE claim**, not by shipping it: the `fastify-sse-v2` dependency is gone from `backend/package.json`, the commented-out import + plugin registration are gone from `backend/src/server.ts`, and the deployment-log endpoint is now documented as JSON-polled. The web client never used `EventSource` (`web/src/lib/api.ts` already polls `/api/deployments/:id/logs` via the normal `apiRequest` helper), so no UI change was required. README/DEPLOYMENT.md updated to match. If a real-time stream is wanted later, ship it explicitly via `reply.raw` and update the docs in the same change.
### 5. Documentation drift — RESOLVED
- `DEPLOYMENT.md` is now the single canonical deployment guide; `DEPLOYMENT_GUIDE.md` is reduced to a one-line redirect for backwards compat with `deploy.sh` and external links. `deploy.sh` updated to reference `DEPLOYMENT.md`.
- `DEPLOYMENT.md` carries an explicit **Ports — quick reference** table that distinguishes container port (`:3000`), Compose host port (`:3049`), and the Traefik production URL — so the 3000-vs-3049 question has one truthful answer per deployment mode rather than three contradictory prose claims.
- `README.md` "Web port: 3000" rewritten to call out container vs Compose host vs dev-mode explicitly and link to the port table.
- `ENDPOINTS.md` got a top-of-file note: every `localhost:3000` URL in the file refers to `pnpm dev`; substitute `:3049` for the Dockerized stack. The `https://api.bytelyst.com/api/devops` vs `/devops` ambiguity was already resolved by the existing "URL Note" section (kept).
### 6. Docker socket + host log mounts are very privileged — RESOLVED (documented; allow-list wrapper queued)
`DEPLOYMENT.md` now has a **Privilege Surface** section enumerating every host mount, every shell-outing route + the exact commands it runs, the auth gate on each, and an explicit blast-radius statement ("an admin token is equivalent to a host shell, modulo what the codebase wraps"). Concurrent fix: the `/code-quality/check` endpoint was missing `requireAdmin` and was therefore reachable unauthenticated even though it shells out to `npm run typecheck/lint/build/test:run` in a caller-supplied `projectPath` — that's been gated to admin in the same commit. Two follow-up P1s remain in the doc's mitigation roadmap (allow-list wrapper around shell-outs; validate `code-quality/check` `projectPath` against an allowed root set); P2/P3 cover audit-logging shell-outs, dropping root in the container, and moving off the raw docker socket.
---
## P2 — Hygiene
### 7. Backend module structure isn't enforced
Most modules follow the `routes.ts / repository.ts / types.ts` triple, but a few have extras (`deployments/orchestrator.ts`). There is no architectural test, README, or generator. New contributors will diverge.
- Action: add a short `backend/src/modules/README.md` describing the convention, and (optionally) an architectural test using `dependency-cruiser` or a custom vitest.
### 8. README is unfocused
`README.md` mixes "Recent Improvements" (a changelog), feature list, setup, env vars, and full API docs into one 219-line file. The first cat of the file even shows it begins with two blank lines after the title — easy to miss content.
- Action: trim README to: what / quickstart / pointers. Move "Recent Improvements" into `CHANGELOG.md` and keep API docs only in `ENDPOINTS.md` / Swagger.
### 9. `.pnpmfile.cjs` dual-mode install is undocumented in CI
`pnpm install:common-plat` vs `pnpm install:gitea` is only mentioned in the README. The CI workflow uses `install:common-plat`, which only works if the runner has the sibling `learning_ai_common_plat` checkout available. That assumption isn't asserted anywhere.
- Action: add a pre-install check that fails fast with a clear message if the expected workspace path is missing, and document the runner prerequisites in the CI file.
### 10. No production logging / metrics story
`backend/src/server.ts` uses Fastify's default logger only. There is a `web/src/lib/telemetry.ts` file but nothing wires it to a backend. The dashboard advertises "monitoring" but doesn't emit its own structured telemetry.
- Action: decide on a target (pino transports → stdout for container logs is enough for now) and write down the choice. If Prometheus / OpenTelemetry is in scope, file a tracked issue rather than leaving it implied.
### 11. E2E tests aren't wired into local workflow
`web/e2e/dashboard.spec.ts` and `web/e2e/hermes.spec.ts` exist and `pnpm test:e2e` is defined, but nothing documents how to start the backend+web before running them, and CI's E2E step (visible in `.gitea/workflows/ci.yml`) is cut off in the file — need to confirm it actually launches the stack.
- Action: read the bottom half of `ci.yml` and confirm the E2E job sets up backend+web; if not, fix it. Add a `pnpm test:e2e` recipe to README that explicitly says "run `pnpm dev` first" or use Playwright's `webServer` config.
---
## Suggested execution order
1. Fix the CI path (#1) — unblocks everything else.
2. Reconcile the SSE TODO (#4) — either remove the claim or ship the feature.
3. Add real linting (#2) and tighten test coverage on auth/csrf/orchestrator (#3).
4. Documentation pass: ports, deployment docs, README trim (#5, #8).
5. Privilege/operational hardening (#6, #10).
6. Convention + DX polish (#7, #9, #11).
Each item above is small enough to land as a single PR.