bytelyst-devops-tools/dashboard/REVIEW_ACTIONS.md
Hermes VM 3ee4e7104e fix(dashboard): Phase 5 P0 — correct CI workspace path + real ESLint
- ci.yml: actions/checkout into the runner workspace instead of cd-ing into a
  hard-coded host path and `git reset --hard origin/main` on the live checkout;
  install via `pnpm install:gitea` (self-contained, no sibling common-plat
  checkout); E2E step left as a TODO pointer (ci-e2e-hardening, Phase 5 P2).
- Fix the same stale /opt/bytelyst/bytelyst-devops-tools path in deploy.sh,
  scripts/deploy-hotcopy.sh, DEPLOYMENT.md, DEPLOYMENT_GUIDE.md.
- Replace the no-op `lint` echoes with real ESLint 9 flat configs (js +
  typescript-eslint recommended) for backend and web; add a root `pnpm lint`.
- Fix the 10 errors lint surfaced, incl. require('os') in an ESM backend
  (system/repository.ts -> import * as os), prefer-const x4, and a ternary
  expression-statement in web vm/page.tsx.

Verified locally: secret-scan, lint (0 errors; correctly fails on bad code),
typecheck, unit tests (backend 9 / web 11), and build all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 06:50:32 +00:00

Dashboard Repo Review — Top Actions

Reviewed: 2026-05-27. Scope: /opt/bytelyst/learning_ai_devops_tools/dashboard (the ByteLyst DevOps Dashboard pnpm workspace: backend/ Fastify 5 + web/ Next.js 16).

Baseline state (verified during review):

  • pnpm typecheck — passes for both backend and web.
  • pnpm test:run — passes (backend 9 tests / 1 file, web 11 tests / 2 files).
  • pnpm secret-scan — clean.
  • .env is gitignored; only .env.example files are tracked.

The dashboard is functional and well-structured, but several issues block CI, hide regressions, and create operational risk. Actions are ordered by priority.


P0 — Broken / Urgent

1. CI workflow points at a non-existent path

.gitea/workflows/ci.yml runs everything from /opt/bytelyst/bytelyst-devops-tools/dashboard, but the actual checkout lives at /opt/bytelyst/learning_ai_devops_tools/dashboard. The same wrong path is hard-coded in DEPLOYMENT.md and scripts/deploy-hotcopy.sh.

  • Action: replace the hard-coded path with ${{ gitea.workspace }} (or a single WORKDIR env var) in .gitea/workflows/ci.yml, then fix the two other references in DEPLOYMENT.md and scripts/deploy-hotcopy.sh.
  • Verify: trigger a CI run on a throwaway branch and confirm green.
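
Because the stale path is repeated across several files, a quick scan can confirm that no reference is left before triggering CI. The following is a minimal sketch, not part of the repository: `findStaleReferences` and its input shape are hypothetical, and the file contents in the example are made up for illustration.

```typescript
// Hypothetical helper: report every line that still mentions a stale path.
// `files` maps a file name to its text content (read however you like).
interface StaleReference {
  file: string;
  line: number; // 1-based line number
  text: string;
}

function findStaleReferences(
  files: Record<string, string>,
  stalePath: string,
): StaleReference[] {
  const hits: StaleReference[] = [];
  for (const [file, content] of Object.entries(files)) {
    content.split("\n").forEach((text, index) => {
      if (text.includes(stalePath)) {
        hits.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Example input; the contents are invented for illustration.
const staleHits = findStaleReferences(
  {
    "ci.yml": "steps:\n  - run: cd /opt/bytelyst/bytelyst-devops-tools/dashboard",
    "README.md": "See DEPLOYMENT.md for details.",
  },
  "/opt/bytelyst/bytelyst-devops-tools/dashboard",
);
```

An empty result means the scanned files no longer contain the old path; any hit names the file and line to fix.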

2. "Lint" steps are no-ops

Both backend/package.json and web/package.json define lint as echo 'No linting configured...'. The CI step "Lint" therefore always passes regardless of code quality. There is no ESLint, Biome, or equivalent configured anywhere in the workspace.

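  • Action: pick one tool (recommend ESLint + @typescript-eslint for backend and Next.js's ESLint config for web, since next already ships it). Wire next lint into web/package.json and add a minimal .eslintrc to backend. Check both against the installed versions first: ESLint 9 defaults to flat config (eslint.config.js) rather than .eslintrc, and next lint is deprecated in recent Next.js releases, so calling eslint directly may be the safer route.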
  • Verify: pnpm lint returns a non-zero exit on a deliberately bad change.
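
To keep a no-op from creeping back in, a small guard could inspect the lint script of each package.json before CI trusts it. This is a sketch under assumptions: `isNoopLintScript` is a hypothetical helper, and it is deliberately strict, flagging any script that starts with `echo`.

```typescript
// Hypothetical guard: treat a lint script as a no-op when it is missing,
// empty, or starts with `echo` (as the current scripts do).
function isNoopLintScript(script: string | undefined): boolean {
  if (script === undefined) return true;
  const trimmed = script.trim();
  return trimmed === "" || /^echo(\s|$)/.test(trimmed);
}

// The string below matches the current backend/web scripts quoted above.
const currentScriptIsNoop = isNoopLintScript("echo 'No linting configured...'");
const realScriptIsNoop = isNoopLintScript("eslint .");
```

A CI step could read `scripts.lint` from each package.json and fail when the guard returns true.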

P1 — Important Gaps

3. Test coverage is extremely thin

Backend has 12 modules (services, deployments, health, audit, backup, system, env, azure-config, code-quality, cosmos-config, hermes-ops, vm) but only services has a test file. The deployment orchestrator (backend/src/modules/deployments/orchestrator.ts), CSRF (backend/src/lib/csrf.ts), and auth (backend/src/lib/auth.ts) — the highest-risk surfaces — have no tests at all.

  • Action: add *.test.ts for at least auth, csrf, deployments/orchestrator, and health repository before adding more features. Mirror the style of backend/src/modules/services/services.test.ts.
  • Add pnpm test:coverage to CI and fail under a threshold (start at 50 %, raise over time).
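
A coverage gate of this kind could be expressed in the Vitest config. The sketch below assumes the workspace runs tests with Vitest and has the v8 coverage provider installed; applying the 50% starting point to all four metrics is also an assumption.

```typescript
// vitest.config.ts (sketch). A plain object keeps the example
// dependency-free; wrapping it in defineConfig from "vitest/config"
// would add type checking.
const vitestConfig = {
  test: {
    coverage: {
      provider: "v8", // assumption: @vitest/coverage-v8 is installed
      reporter: ["text", "lcov"],
      // The coverage run fails when any metric drops below its threshold.
      thresholds: {
        lines: 50,
        functions: 50,
        branches: 50,
        statements: 50,
      },
    },
  },
};

export default vitestConfig;
```

Raising the numbers over time then becomes a one-line change that shows up in review.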

4. SSE deployment-log streaming is disabled

backend/src/server.ts and backend/src/modules/deployments/routes.ts both contain the comment "TODO: fastify-sse-v2 has compatibility issues with Fastify 5", and the SSE plugin is commented out. The README still advertises real-time log streaming and the frontend code in web/src/app/page.tsx imports LogViewer, so the user-facing feature is silently broken.

  • Action: either pin a Fastify-5-compatible SSE library and re-enable the route, or remove the SSE claims from README.md / ENDPOINTS.md until it ships. Candidates for the first option are @fastify/eventsource, fastify-sse-v2 >= 5, or a small handcrafted handler using reply.raw; confirm that the chosen package and version exist and support Fastify 5 before pinning. Choose one; do not leave the gap.
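
If the handcrafted route is chosen, the core of it is serializing events in the text/event-stream wire format and writing them to the raw response. The sketch below is an illustration, not the project's code: `SseSink`, `formatSseEvent`, and `sendLogLine` are hypothetical names, and the `log` event name is an assumption.

```typescript
// Minimal shape of the writable we need. Fastify's `reply.raw` (a Node
// http.ServerResponse) satisfies it.
interface SseSink {
  write(chunk: string): unknown;
}

// Serialize one event in the text/event-stream wire format.
function formatSseEvent(event: string, data: string): string {
  // Each line of the payload needs its own "data:" field.
  const dataLines = data
    .split(/\r\n|\r|\n/)
    .map((line) => `data: ${line}`)
    .join("\n");
  // A blank line terminates the event.
  return `event: ${event}\n${dataLines}\n\n`;
}

// Hypothetical helper: push one deployment log line to a connected client.
function sendLogLine(sink: SseSink, line: string): void {
  sink.write(formatSseEvent("log", line));
}
```

In a Fastify route, the handler would typically call `reply.hijack()`, write a 200 response with `Content-Type: text/event-stream` and `Cache-Control: no-cache` on `reply.raw`, and then call `sendLogLine(reply.raw, line)` for each new log line until the client disconnects.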

5. Documentation drift

  • README.md says "Web port: 3000" but docker-compose.yml exposes web as 3049:3000.

  • README.md lists API endpoints inline; ENDPOINTS.md is the canonical source and the two contradict each other in places (e.g. the note about https://api.bytelyst.com/api/devops vs https://api.bytelyst.com/devops).

  • DEPLOYMENT.md and DEPLOYMENT_GUIDE.md overlap; unclear which is canonical.

  • Action: pick ENDPOINTS.md as the single source for URLs and reduce README.md to a pointer. Merge the two deployment docs into one (DEPLOYMENT.md) and delete the loser. Fix the 3000 vs 3049 mismatch.

6. Docker socket + host log mounts are very privileged

docker-compose.yml mounts /var/run/docker.sock, the host scripts directory, and three host log paths into devops-backend. This is the same risk profile as Portainer but with custom code reading/writing those mounts. There is no documentation of which backend module talks to the docker socket or what commands it issues.

  • Action: document the privilege surface (which routes shell out, which call docker), and consider a thin allow-list wrapper instead of mounting the raw socket. At minimum, add a section to DEPLOYMENT.md enumerating these mounts and their purpose so reviewers know the blast radius.
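
A thin allow-list wrapper could look like the sketch below. It assumes the backend invokes the docker CLI with an argument array, which this review has not confirmed; the listed subcommands and the name `checkDockerArgs` are illustrative only.

```typescript
// Hypothetical allow-list: the only docker subcommands the backend may run.
// The entries are illustrative; the real list depends on what the routes need.
const ALLOWED_DOCKER_SUBCOMMANDS: ReadonlySet<string> = new Set([
  "ps",
  "logs",
  "inspect",
  "restart",
]);

type DockerCheck =
  | { allowed: true }
  | { allowed: false; reason: string };

// Fail closed: anything whose first argument is not allow-listed is
// rejected, including global flags placed before the subcommand.
function checkDockerArgs(args: readonly string[]): DockerCheck {
  const [subcommand] = args;
  if (subcommand === undefined) {
    return { allowed: false, reason: "no subcommand given" };
  }
  if (!ALLOWED_DOCKER_SUBCOMMANDS.has(subcommand)) {
    return {
      allowed: false,
      reason: `subcommand "${subcommand}" is not allow-listed`,
    };
  }
  return { allowed: true };
}
```

Routing every docker invocation through one such function also gives the documentation a single place to point at when enumerating the privilege surface.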

P2 — Hygiene

7. Backend module structure isn't enforced

Most modules follow the routes.ts / repository.ts / types.ts triple, but a few have extras (deployments/orchestrator.ts). There is no architectural test, README, or generator. New contributors will diverge.

  • Action: add a short backend/src/modules/README.md describing the convention, and (optionally) an architectural test using dependency-cruiser or a custom vitest.
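
The custom-test option can be small. The sketch below checks the routes.ts / repository.ts / types.ts triple against a directory listing; `findMissingModuleFiles` is a hypothetical helper, and the listing would come from reading backend/src/modules in a real test.

```typescript
// The file triple every module is expected to contain.
const REQUIRED_MODULE_FILES = ["routes.ts", "repository.ts", "types.ts"] as const;

// Given module name -> file names, return what each module is missing.
// Extra files (e.g. orchestrator.ts) are allowed and ignored.
function findMissingModuleFiles(
  modules: Record<string, readonly string[]>,
): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const [name, files] of Object.entries(modules)) {
    const absent = REQUIRED_MODULE_FILES.filter((file) => !files.includes(file));
    if (absent.length > 0) missing[name] = absent;
  }
  return missing;
}

// Example listing; the file names per module are invented for illustration.
const missingFiles = findMissingModuleFiles({
  services: ["routes.ts", "repository.ts", "types.ts", "services.test.ts"],
  vm: ["routes.ts"],
});
```

A test would then assert that the returned object is empty, so a new module that skips the convention fails CI with the missing file names in the message.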

8. README is unfocused

README.md mixes "Recent Improvements" (a changelog), feature list, setup, env vars, and full API docs into one 219-line file. A plain cat of the file also shows two blank lines right after the title, which makes the content that follows easy to miss.

  • Action: trim README to: what / quickstart / pointers. Move "Recent Improvements" into CHANGELOG.md and keep API docs only in ENDPOINTS.md / Swagger.

9. .pnpmfile.cjs dual-mode install is undocumented in CI

pnpm install:common-plat vs pnpm install:gitea is only mentioned in the README. The CI workflow uses install:common-plat, which only works if the runner has the sibling learning_ai_common_plat checkout available. That assumption isn't asserted anywhere.

  • Action: add a pre-install check that fails fast with a clear message if the expected workspace path is missing, and document the runner prerequisites in the CI file.
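
A fail-fast check might look like the following sketch. `checkSiblingCheckout` is a hypothetical helper, the existence test is injected so the function stays easy to unit-test, and the relative path in the example is an assumption about where the sibling checkout lives.

```typescript
type PreflightResult =
  | { ok: true }
  | { ok: false; message: string };

// `exists` is injected for testability; a real pre-install script would
// typically pass fs.existsSync.
function checkSiblingCheckout(
  siblingPath: string,
  exists: (path: string) => boolean,
): PreflightResult {
  if (exists(siblingPath)) return { ok: true };
  return {
    ok: false,
    message:
      `Expected sibling checkout at ${siblingPath} but it is missing. ` +
      "Check it out on the runner, or use `pnpm install:gitea` instead.",
  };
}

// Example with a stubbed filesystem in which nothing exists.
const preflight = checkSiblingCheckout("../learning_ai_common_plat", () => false);
```

The calling script would print `message` and exit non-zero when `ok` is false, so the failure appears before pnpm starts resolving packages.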

10. No production logging / metrics story

backend/src/server.ts uses Fastify's default logger only. There is a web/src/lib/telemetry.ts file but nothing wires it to a backend. The dashboard advertises "monitoring" but doesn't emit its own structured telemetry.

  • Action: decide on a target (pino transports → stdout for container logs is enough for now) and write down the choice. If Prometheus / OpenTelemetry is in scope, file a tracked issue rather than leaving it implied.

11. E2E tests aren't wired into local workflow

web/e2e/dashboard.spec.ts and web/e2e/hermes.spec.ts exist and pnpm test:e2e is defined, but nothing documents how to start the backend and web before running them. The E2E step in .gitea/workflows/ci.yml was only partially visible during this review, so it still needs to be confirmed that it actually launches the stack.

  • Action: read the bottom half of ci.yml and confirm the E2E job sets up backend+web; if not, fix it. Add a pnpm test:e2e recipe to README that explicitly says "run pnpm dev first" or use Playwright's webServer config.
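
With Playwright's webServer option, the test run starts both processes itself. The sketch below is written as a plain object to stay dependency-free; the commands, the backend port, and the health route are assumptions to be replaced with the workspace's real values.

```typescript
// playwright.config.ts (sketch), assumed to live in web/.
const playwrightConfig = {
  testDir: "./e2e",
  // Playwright waits for each `url` to respond before running tests.
  webServer: [
    {
      command: "pnpm --filter backend dev", // assumed script name
      url: "http://localhost:4000/health", // assumed backend port and route
      reuseExistingServer: true, // reuse a stack started via `pnpm dev`
      timeout: 120000,
    },
    {
      command: "pnpm dev", // assumed: starts the Next.js app on port 3000
      url: "http://localhost:3000",
      reuseExistingServer: true,
      timeout: 120000,
    },
  ],
  use: { baseURL: "http://localhost:3000" },
};

export default playwrightConfig;
```

With this in place, `pnpm test:e2e` works on a clean machine and in CI without a separate "start the stack" step.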

Suggested execution order

  1. Fix the CI path (#1) — unblocks everything else.
  2. Reconcile the SSE TODO (#4) — either remove the claim or ship the feature.
  3. Add real linting (#2) and tighten test coverage on auth/csrf/orchestrator (#3).
  4. Documentation pass: ports, deployment docs, README trim (#5, #8).
  5. Privilege/operational hardening (#6, #10).
  6. Convention + DX polish (#7, #9, #11).

Each item above is small enough to land as a single PR.