Closes the Phase 5 P1 doc-drift checkbox and REVIEW_ACTIONS #5. The 3000-vs-3049 confusion came from prose claims in three docs that each picked a different "right" answer. The truth is: the web container listens on :3000; docker-compose maps `127.0.0.1:3049:3000`; production is fronted by Traefik on `https://devops.bytelyst.com`. Encoding that explicitly so future readers don't have to dig through compose files: - DEPLOYMENT.md becomes canonical. Its content is now the (more accurate) old DEPLOYMENT_GUIDE.md merged with a "Ports — quick reference" table covering Local dev / Docker Compose / Production Traefik, plus a Local-development section for `pnpm dev`. - DEPLOYMENT_GUIDE.md → 5-line redirect stub pointing at DEPLOYMENT.md (kept for `deploy.sh` and any external links). - deploy.sh updated to point at DEPLOYMENT.md. - README.md "Web port: 3000" line rewritten to spell out container vs Compose-host vs dev-mode and link to the port table. - ENDPOINTS.md gets a top-of-file note: every `localhost:3000` URL in that file is the `pnpm dev` workflow; substitute `:3049` for the Dockerized stack. - REVIEW_ACTIONS.md #5 marked RESOLVED with the rationale. No code, behavior, lint, or test changes. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
94 lines
8.0 KiB
Markdown
94 lines
8.0 KiB
Markdown
# Dashboard Repo Review — Top Actions
|
|
|
|
Reviewed: 2026-05-27. Scope: `/opt/bytelyst/learning_ai_devops_tools/dashboard` (the ByteLyst DevOps Dashboard pnpm workspace: `backend/` Fastify 5 + `web/` Next.js 16).
|
|
|
|
Baseline state (verified during review):
|
|
- `pnpm typecheck` — passes for both backend and web.
|
|
- `pnpm test:run` — passes (backend 9 tests / 1 file, web 11 tests / 2 files).
|
|
- `pnpm secret-scan` — clean.
|
|
- `.env` is gitignored; only `.env.example` files are tracked.
|
|
|
|
The dashboard is functional and well-structured, but several issues block CI, hide regressions, and create operational risk. Actions are ordered by priority.
|
|
|
|
---
|
|
|
|
## P0 — Broken / Urgent
|
|
|
|
### 1. CI workflow points at a non-existent path
|
|
`.gitea/workflows/ci.yml` runs everything from `/opt/bytelyst/bytelyst-devops-tools/dashboard`, but the actual checkout lives at `/opt/bytelyst/learning_ai_devops_tools/dashboard`. The same wrong path is hard-coded in `DEPLOYMENT.md` and `scripts/deploy-hotcopy.sh`.
|
|
|
|
- Action: replace the hard-coded path with `${{ gitea.workspace }}` (or a single `WORKDIR` env var) in <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/.gitea/workflows/ci.yml" />, then fix the two other references in <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/DEPLOYMENT.md" /> and <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/scripts/deploy-hotcopy.sh" />.
|
|
- Verify: trigger a CI run on a throwaway branch and confirm green.
|
|
|
|
### 2. "Lint" steps are no-ops
|
|
Both `backend/package.json` and `web/package.json` define `lint` as `echo 'No linting configured...'`. The CI step "Lint" therefore always passes regardless of code quality. There is no ESLint, Biome, or equivalent configured anywhere in the workspace.
|
|
|
|
- Action: pick one tool (recommend ESLint + `@typescript-eslint` for backend, Next.js's built-in ESLint config for web, since `next` already ships it). Wire `next lint` into `web/package.json` and add a minimal `.eslintrc` to backend.
|
|
- Verify: `pnpm lint` returns a non-zero exit on a deliberately bad change.
|
|
|
|
---
|
|
|
|
## P1 — Important Gaps
|
|
|
|
### 3. Test coverage is extremely thin
|
|
Backend has 12 modules (`services`, `deployments`, `health`, `audit`, `backup`, `system`, `env`, `azure-config`, `code-quality`, `cosmos-config`, `hermes-ops`, `vm`) but only `services` has a test file. The deployment orchestrator (`backend/src/modules/deployments/orchestrator.ts`), CSRF (`backend/src/lib/csrf.ts`), and auth (`backend/src/lib/auth.ts`) — the highest-risk surfaces — have no tests at all.
|
|
|
|
- Action: add `*.test.ts` for at least `auth`, `csrf`, `deployments/orchestrator`, and `health` repository before adding more features. Mirror the style of <ref_file file="/opt/bytelyst/learning_ai_devops_tools/dashboard/backend/src/modules/services/services.test.ts" />.
|
|
- Add `pnpm test:coverage` to CI and fail under a threshold (start at 50 %, raise over time).
|
|
|
|
### 4. SSE deployment-log streaming is disabled — RESOLVED (removed)
|
|
The TODO has been resolved by **removing the SSE claim**, not by shipping it: the `fastify-sse-v2` dependency is gone from `backend/package.json`, the commented-out import + plugin registration are gone from `backend/src/server.ts`, and the deployment-log endpoint is now documented as JSON-polled. The web client never used `EventSource` (`web/src/lib/api.ts` already polls `/api/deployments/:id/logs` via the normal `apiRequest` helper), so no UI change was required. README/DEPLOYMENT.md updated to match. If a real-time stream is wanted later, ship it explicitly via `reply.raw` and update the docs in the same change.
|
|
|
|
### 5. Documentation drift — RESOLVED
|
|
- `DEPLOYMENT.md` is now the single canonical deployment guide; `DEPLOYMENT_GUIDE.md` is reduced to a one-line redirect for backwards compat with `deploy.sh` and external links. `deploy.sh` updated to reference `DEPLOYMENT.md`.
|
|
- `DEPLOYMENT.md` carries an explicit **Ports — quick reference** table that distinguishes container port (`:3000`), Compose host port (`:3049`), and the Traefik production URL — so the 3000-vs-3049 question has one truthful answer per deployment mode rather than three contradictory prose claims.
|
|
- `README.md` "Web port: 3000" rewritten to call out container vs Compose host vs dev-mode explicitly and link to the port table.
|
|
- `ENDPOINTS.md` got a top-of-file note: every `localhost:3000` URL in the file refers to `pnpm dev`; substitute `:3049` for the Dockerized stack. The `https://api.bytelyst.com/api/devops` vs `/devops` ambiguity was already resolved by the existing "URL Note" section (kept).
|
|
|
|
### 6. Docker socket + host log mounts are very privileged
|
|
`docker-compose.yml` mounts `/var/run/docker.sock`, the host `scripts` directory, and three host log paths into `devops-backend`. This is the same risk profile as Portainer but with custom code reading/writing those mounts. There is no documentation of which backend module talks to the docker socket or what commands it issues.
|
|
|
|
- Action: document the privilege surface (which routes shell out, which call `docker`), and consider a thin allow-list wrapper instead of mounting the raw socket. At minimum, add a section to `DEPLOYMENT.md` enumerating these mounts and their purpose so reviewers know the blast radius.
|
|
|
|
---
|
|
|
|
## P2 — Hygiene
|
|
|
|
### 7. Backend module structure isn't enforced
|
|
Most modules follow the `routes.ts / repository.ts / types.ts` triple, but a few have extras (`deployments/orchestrator.ts`). There is no architectural test, README, or generator. New contributors will diverge.
|
|
|
|
- Action: add a short `backend/src/modules/README.md` describing the convention, and (optionally) an architectural test using `dependency-cruiser` or a custom vitest.
|
|
|
|
### 8. README is unfocused
|
|
`README.md` mixes "Recent Improvements" (a changelog), feature list, setup, env vars, and full API docs into one 219-line file. The first cat of the file even shows it begins with two blank lines after the title — easy to miss content.
|
|
|
|
- Action: trim README to: what / quickstart / pointers. Move "Recent Improvements" into `CHANGELOG.md` and keep API docs only in `ENDPOINTS.md` / Swagger.
|
|
|
|
### 9. `.pnpmfile.cjs` dual-mode install is undocumented in CI
|
|
`pnpm install:common-plat` vs `pnpm install:gitea` is only mentioned in the README. The CI workflow uses `install:common-plat`, which only works if the runner has the sibling `learning_ai_common_plat` checkout available. That assumption isn't asserted anywhere.
|
|
|
|
- Action: add a pre-install check that fails fast with a clear message if the expected workspace path is missing, and document the runner prerequisites in the CI file.
|
|
|
|
### 10. No production logging / metrics story
|
|
`backend/src/server.ts` uses Fastify's default logger only. There is a `web/src/lib/telemetry.ts` file but nothing wires it to a backend. The dashboard advertises "monitoring" but doesn't emit its own structured telemetry.
|
|
|
|
- Action: decide on a target (pino transports → stdout for container logs is enough for now) and write down the choice. If Prometheus / OpenTelemetry is in scope, file a tracked issue rather than leaving it implied.
|
|
|
|
### 11. E2E tests aren't wired into local workflow
|
|
`web/e2e/dashboard.spec.ts` and `web/e2e/hermes.spec.ts` exist and `pnpm test:e2e` is defined, but nothing documents how to start the backend+web before running them, and CI's E2E step (visible in `.gitea/workflows/ci.yml`) is cut off in the file — need to confirm it actually launches the stack.
|
|
|
|
- Action: read the bottom half of `ci.yml` and confirm the E2E job sets up backend+web; if not, fix it. Add a `pnpm test:e2e` recipe to README that explicitly says "run `pnpm dev` first" or use Playwright's `webServer` config.
|
|
|
|
---
|
|
|
|
## Suggested execution order
|
|
|
|
1. Fix the CI path (#1) — unblocks everything else.
|
|
2. Reconcile the SSE TODO (#4) — either remove the claim or ship the feature.
|
|
3. Add real linting (#2) and tighten test coverage on auth/csrf/orchestrator (#3).
|
|
4. Documentation pass: ports, deployment docs, README trim (#5, #8).
|
|
5. Privilege/operational hardening (#6, #10).
|
|
6. Convention + DX polish (#7, #9, #11).
|
|
|
|
Each item above is small enough to land as a single PR.
|