bytelyst-devops-tools/dashboard
Hermes VM 74a8ee0993 feat(dashboard): close 3 of 5 Phase 5 P2 mitigation items (allow-list, projectPath, audit-log)
Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon) — both genuinely need
container/orchestration work and stay queued.

1. Allow-list shell wrapper (P1)
   New `lib/shell.ts`:
     - `execAllowed(cmd, args, opts)` — `execFile`-only, no shell, no
       interpolation. Single escape hatch for ad-hoc invocations.
     - `dockerRestart(name)` — name validated against
       `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError
       on anything else (including non-strings, shell metacharacters,
       command-substitution attempts). Tests cover all of these.
     - `dockerPrune(kind, {all?})` — kind constrained to
       {container,image,volume,builder}; `--all` only valid for image.
     - `runBashScript(path, args, {allowedRoots})` — script path AND
       cwd both checked against allowed roots; rejects `..` escapes
       and prefix-matching siblings (`/opt/projects-evil` vs
       `/opt/projects`).
     - `runNpmScript(script, {cwd, allowedRoots})` — script ∈
       {typecheck,lint,build,test,test:run,start}; cwd inside roots.
   17 unit tests cover every rejection path. Module added to the
   coverage gate (≥95% lines).
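
The name check described for `dockerRestart` can be sketched as follows. This is a minimal illustration, not the contents of `lib/shell.ts`: `assertContainerName` and `dockerRestartArgv` are hypothetical helper names, and only the character class and the `InvalidShellArgError` name come from the text above.

```typescript
// Hypothetical sketch: the real lib/shell.ts is not shown in this commit message.
class InvalidShellArgError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidShellArgError";
  }
}

// Same character class as quoted above, anchored at both ends so that
// nothing can trail a valid prefix.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

// Returns the name unchanged when it is a string matching the pattern,
// otherwise throws. Non-strings are rejected before the regex runs.
function assertContainerName(name: unknown): string {
  if (typeof name !== "string" || !CONTAINER_NAME.test(name)) {
    const shown = typeof name === "string" ? JSON.stringify(name) : typeof name;
    throw new InvalidShellArgError(`invalid container name: ${shown}`);
  }
  return name;
}

// Builds the (file, args) pair for an execFile-style call. The name is passed
// as its own argv element, so no shell ever parses it.
function dockerRestartArgv(name: unknown): [string, string[]] {
  return ["docker", ["restart", assertContainerName(name)]];
}
```

The resulting pair is meant to be handed to Node's `execFile(file, args)`, which spawns the binary directly rather than through a shell.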

   Migrated highest-risk callers off template-literal `exec`:
     - `vm/repository.ts:restartContainer` → `dockerRestart`. Was
       previously `await execAsync(\`docker restart "${name}"\`)`
       with only a regex check; now goes through the wrapper.
     - `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
       + `execAllowed` for `docker system df`. Drops the array of
       template-literal command strings entirely.
     - `code-quality/repository.ts` → `runNpmScript` for every
       lifecycle invocation. cwd is now the resolved (normalised,
       `..`-collapsed) path, not the raw input.

2. projectPath validation for /code-quality/check (P1)
   `runCodeQualityCheck` now calls
   `assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
   any subprocess spawns. `getAllowedRoots()` reads
   `CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to
   `/opt/bytelyst`). Rejection happens with a clear error message
   listing the configured roots so operators know what to allow.
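
A root-containment check of this kind might look like the sketch below. The function names and the `CODE_QUALITY_ALLOWED_ROOTS` default are taken from the text above; the bodies are assumptions, and the real `getAllowedRoots()` takes no argument (the `env` parameter here only keeps the example testable).

```typescript
import * as path from "node:path";

// Assumed shape: colon-separated roots, falling back to /opt/bytelyst.
function getAllowedRoots(env: Record<string, string | undefined>): string[] {
  const raw = env.CODE_QUALITY_ALLOWED_ROOTS ?? "";
  const roots = raw.split(":").map((r) => r.trim()).filter((r) => r.length > 0);
  return roots.length > 0 ? roots : ["/opt/bytelyst"];
}

// Returns the normalised path when it lies inside one of the roots,
// otherwise throws with the configured roots listed in the message.
function assertPathInAllowedRoots(candidate: string, roots: string[]): string {
  if (!path.posix.isAbsolute(candidate)) {
    throw new Error(`projectPath must be absolute: ${candidate}`);
  }
  const resolved = path.posix.normalize(candidate); // collapses ".." segments
  for (const root of roots) {
    const rel = path.posix.relative(path.posix.normalize(root), resolved);
    // Inside the root when the relative path does not climb out of it.
    // Comparing whole segments rejects siblings such as /opt/projects-evil.
    if (rel === "" || (rel !== ".." && !rel.startsWith("../"))) {
      return resolved;
    }
  }
  throw new Error(
    `projectPath ${resolved} is outside the allowed roots: ${roots.join(", ")}`,
  );
}
```

Computing a relative path, instead of testing a raw string prefix, is what keeps `/opt/projects-evil` from passing as a child of `/opt/projects`.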

3. Audit-log every privileged shell-out (P2)
   `audit/types.ts` extended: `action` now includes `'shell-exec'`,
   `entityType` includes `'host'`. The migration is additive — old
   audit rows still validate.

   Three privileged routes now write a `shell-exec` audit row with
   actor (authUserId / authRole), entity id, and a sanitized details
   payload before responding:
     - `POST /docker/cleanup` — `entityId: docker-cleanup:<type>`,
       details include {type, force, freedSpace}.
     - `POST /vm/cleanup` — `entityId: vm-cleanup:<mode>`.
     - `POST /vm/containers/:name/restart` — `entityId:
       container-restart:<name>`, details include {success, message}.
       Audited even on failure so attempted privileged actions are
       still recorded.
   Audit writes are best-effort — a Cosmos hiccup logs a warn but
   never fails the request the operator was running.
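
The best-effort behaviour can be illustrated with a small wrapper. Everything here is an assumed shape for illustration: `ShellExecAudit`, `AuditWriter`, and `auditBestEffort` are hypothetical names, and the field layout only mirrors what the text above lists (actor, entity id, details).

```typescript
// Hypothetical row shape; the real audit/types.ts is not shown here.
interface ShellExecAudit {
  action: "shell-exec";
  entityType: "host";
  entityId: string;
  actor: { authUserId: string; authRole: string };
  details: Record<string, unknown>;
}

// Stand-in for whatever persists the row (Cosmos in the real backend).
type AuditWriter = (row: ShellExecAudit) => Promise<void>;

// Attempts the write and swallows any failure after logging a warning,
// so the caller's request never fails because of the audit store.
// Returns true when the row was written, false when the write failed.
async function auditBestEffort(
  write: AuditWriter,
  row: ShellExecAudit,
  warn: (msg: string) => void,
): Promise<boolean> {
  try {
    await write(row);
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`audit write failed for ${row.entityId}: ${reason}`);
    return false;
  }
}
```

A route handler would await this after the privileged action completes, whether or not that action succeeded, and then send its normal response.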

Verified: backend typecheck passes, 74/74 unit tests pass (17 new for
shell.ts + audit changes), 7/7 E2E pass, lint 0 errors, coverage gate
≥95% lines on every gated file (which now includes shell.ts).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 08:18:50 +00:00
.gitea/workflows ci(dashboard): Phase 5 P2 — wire Playwright E2E into Gitea CI 2026-05-30 07:28:50 +00:00
backend feat(dashboard): close 3 of 5 Phase 5 P2 mitigation items (allow-list, projectPath, audit-log) 2026-05-30 08:18:50 +00:00
scripts fix(dashboard): Phase 5 P0 — correct CI workspace path + real ESLint 2026-05-30 06:50:32 +00:00
shared feat(devops): adopt trading web deployment model with docker-compose 2026-05-11 03:24:11 +00:00
web feat(dashboard): Phase 6 — severity-tagged alerts + per-instance actions + deep links 2026-05-30 08:03:57 +00:00
.gitignore feat(devops): adopt trading web deployment model with docker-compose 2026-05-11 03:24:11 +00:00
.pnpmfile.cjs feat(devops): adopt trading web deployment model with docker-compose 2026-05-11 03:24:11 +00:00
deploy.sh docs(dashboard): Phase 5 P1 — fix port/endpoint drift, dedupe deployment docs 2026-05-30 07:03:05 +00:00
DEPLOYMENT_GUIDE.md docs(dashboard): Phase 5 P1 — fix port/endpoint drift, dedupe deployment docs 2026-05-30 07:03:05 +00:00
DEPLOYMENT.md feat(dashboard): close 3 of 5 Phase 5 P2 mitigation items (allow-list, projectPath, audit-log) 2026-05-30 08:18:50 +00:00
docker-compose.yml feat(infra): Phase 2.3 — memory limits across all active Docker stacks 2026-05-30 05:26:49 +00:00
ENDPOINTS.md docs(dashboard): Phase 5 P1 — fix port/endpoint drift, dedupe deployment docs 2026-05-30 07:03:05 +00:00
package.json fix(dashboard): Phase 5 P0 — correct CI workspace path + real ESLint 2026-05-30 06:50:32 +00:00
pnpm-lock.yaml chore(dashboard): Phase 5 P1 — remove dead SSE log-stream claim 2026-05-30 07:00:07 +00:00
pnpm-workspace.yaml feat(devops): adopt trading web deployment model with docker-compose 2026-05-11 03:24:11 +00:00
README.md docs(dashboard): Phase 5 P1 — fix port/endpoint drift, dedupe deployment docs 2026-05-30 07:03:05 +00:00
REVIEW_ACTIONS.md docs(dashboard): Phase 5 P1 — document privilege surface; gate /code-quality/check 2026-05-30 07:05:51 +00:00

ByteLyst DevOps Dashboard

Internal DevOps dashboard for deployment orchestration and service monitoring across ByteLyst products.

Architecture

dashboard/
├── backend/          # Fastify 5 backend (port 4004)
│   └── src/
│       ├── lib/      # Config, auth, Cosmos
│       └── modules/  # Services, deployments, health
├── web/              # Next.js 16 frontend (port 3000)
│   └── src/
│       ├── app/      # Pages
│       └── lib/      # API client, auth
└── shared/
    └── product.json  # Product identity

Features

  • Service Registry: Manage all ByteLyst services (trading, notes, clock, etc.)
  • Deployment Orchestration: Trigger deployments via existing bash scripts
  • Health Monitoring: Real-time health checks for all services with caching
  • Deployment History: Audit trail of all deployments with captured logs (JSON-polled by the web client; no SSE)
  • Cross-Navigation: One-click link to Platform Admin dashboard
  • Hermes Mission Control: Read-only mock dashboard for portfolio-wide execution, task ledger, product health, history, agents, and settings
  • Testing: Vitest for backend, React Testing Library for frontend
  • Security: Rate limiting, CORS, security headers, Zod validation
  • Auto-Refresh: Automatic health status updates every 60 seconds

Recent Improvements

Testing Infrastructure

  • Added Vitest for backend testing with test files for services and deployments
  • Added React Testing Library for frontend with API client tests
  • Test scripts: pnpm test (watch mode), pnpm test:run (CI mode)

Health Monitoring

  • Implemented actual HTTP health checks with 10-second timeout
  • Added 30-second caching to avoid overwhelming services
  • Added User-Agent header for health check requests
  • Added admin endpoint to clear health cache (DELETE /api/health/cache)
  • Health status is derived from response time: a check that takes longer than 5 seconds marks the service as degraded
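
The thresholds above (10-second timeout, 30-second cache, degraded above 5 seconds) can be sketched as below. This is an illustration under assumptions: the status names `healthy` and `unhealthy`, the `classify` function, and the `HealthCache` class are hypothetical; only the three numbers and the `degraded` status come from this section.

```typescript
type HealthStatus = "healthy" | "degraded" | "unhealthy";

const TIMEOUT_MS = 10_000;   // request is abandoned after 10 seconds
const DEGRADED_MS = 5_000;   // slower than this counts as degraded
const CACHE_TTL_MS = 30_000; // cached results are reused for 30 seconds

// Maps one check result to a status. `ok` is whether the HTTP call succeeded.
function classify(ok: boolean, elapsedMs: number): HealthStatus {
  if (!ok || elapsedMs >= TIMEOUT_MS) return "unhealthy";
  return elapsedMs > DEGRADED_MS ? "degraded" : "healthy";
}

// Tiny TTL cache keyed by service id. The clock is injectable for testing.
class HealthCache {
  private entries = new Map<string, { status: HealthStatus; checkedAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Returns the cached status, or undefined when missing or expired.
  get(serviceId: string): HealthStatus | undefined {
    const entry = this.entries.get(serviceId);
    if (!entry || this.now() - entry.checkedAt >= CACHE_TTL_MS) return undefined;
    return entry.status;
  }

  set(serviceId: string, status: HealthStatus): void {
    this.entries.set(serviceId, { status, checkedAt: this.now() });
  }

  // Counterpart of DELETE /api/health/cache: drops every cached result.
  clear(): void {
    this.entries.clear();
  }
}
```

A caller would consult `get` first and only issue the HTTP check, then `set` the classified result, when it returns `undefined`.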

API Validation

  • Added Zod schemas for all API routes (services, deployments, health)
  • Proper error handling with BadRequestError from @bytelyst/errors
  • Validated path parameters, query parameters, and request bodies
  • Strict validation on update operations to prevent accidental field changes

Deployment Logs

  • Endpoint GET /api/deployments/:id/logs returns the full captured stdout/stderr + current status as a single JSON payload (admin only).
  • The web client polls this endpoint while a deployment is running. There is intentionally no SSE/WebSocket stream — the previous attempt with fastify-sse-v2 was incompatible with Fastify 5 and was removed. If a real-time stream is needed later, implement it explicitly via reply.raw and update this section in the same change.
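
A polling loop of the kind described above could be sketched like this. The payload fields (`status`, `logs`), the `"running"` status value, the interval, and the poll cap are assumptions for illustration; the fetcher is injected so the sketch does not depend on the real API client.

```typescript
// Assumed payload shape for GET /api/deployments/:id/logs.
interface DeploymentLogs {
  status: string; // assumed to be "running" until the deployment finishes
  logs: string;   // full captured stdout/stderr so far
}

type FetchLogs = (deploymentId: string) => Promise<DeploymentLogs>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the deployment leaves the "running" state, passing every
// snapshot to onUpdate. Gives up after maxPolls attempts.
async function pollDeploymentLogs(
  deploymentId: string,
  fetchLogs: FetchLogs,
  onUpdate: (snapshot: DeploymentLogs) => void,
  intervalMs = 2_000,
  maxPolls = 300,
): Promise<DeploymentLogs> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const snapshot = await fetchLogs(deploymentId);
    onUpdate(snapshot);
    if (snapshot.status !== "running") return snapshot;
    await sleep(intervalMs);
  }
  throw new Error(`deployment ${deploymentId} still running after ${maxPolls} polls`);
}
```

Because each response carries the full log text, the client can simply replace what it displays on every poll instead of merging increments.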

Security Enhancements

  • Added rate limiting: 100 requests per minute per IP
  • Restricted CORS to an allow-list of origins
  • Added security headers: X-Content-Type-Options, X-Frame-Options, X-XSS-Protection, HSTS, Referrer-Policy
  • OPTIONS preflight request handling
  • Credentials support for authenticated requests
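
To make the "100 requests per minute per IP" rule concrete, here is a minimal fixed-window counter. It is only an illustration of the policy: the backend most likely relies on a Fastify plugin rather than hand-written code, and `FixedWindowLimiter` is a hypothetical name.

```typescript
// Illustrative fixed-window limiter; not the dashboard's actual implementation.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private max: number;
  private windowMs: number;

  constructor(max = 100, windowMs = 60_000) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed, false when the caller
  // has used up its quota for the current window.
  allow(ip: string, now: number): boolean {
    const current = this.windows.get(ip);
    if (!current || now - current.start >= this.windowMs) {
      // First request from this IP, or the previous window has elapsed.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.max) return false;
    current.count += 1;
    return true;
  }
}
```

A rejected request would normally be answered with HTTP 429 so that clients know to back off.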

Auto-Refresh

  • Automatic health status refresh every 60 seconds
  • Manual refresh button to clear cache and force health checks
  • Visual feedback with spinning icon during refresh
  • Last health check timestamp displayed on service cards

Setup

Prerequisites

  • Node.js 22+
  • pnpm 10.6.5
  • Azure Cosmos DB credentials
  • Platform Service URL
  • Access to @bytelyst/* packages (via common-plat workspace or Gitea registry)

Installation

The dashboard uses the .pnpmfile.cjs pattern for dynamic dependency resolution, supporting both local workspace and Gitea registry modes.

# For local development (uses workspace links to learning_ai_common_plat)
pnpm install:common-plat

# For production (uses Gitea registry at localhost:3300)
pnpm install:gitea

Backend

cd backend
cp .env.example .env  # Add your credentials
pnpm dev              # Runs on port 4004

Frontend

cd web
cp .env.local.example .env.local  # Add your URLs
pnpm dev              # Next dev server on http://localhost:3000 (no Docker)

Running Both

# From dashboard root
pnpm dev

Environment Variables

Backend (.env)

PORT=4004
PLATFORM_SERVICE_URL=http://localhost:4003
COSMOS_ENDPOINT=https://your-cosmos.documents.azure.com:443/
COSMOS_KEY=your-cosmos-key
COSMOS_DATABASE=bytelyst-platform
JWT_SECRET=your-jwt-secret
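
One way to fail fast on a missing variable is to validate the environment at startup. The sketch below assumes every variable listed above except `PORT` is required and that `PORT` falls back to 4004; `loadConfig` and `BackendConfig` are hypothetical names, and the real config module may be structured differently.

```typescript
interface BackendConfig {
  port: number;
  platformServiceUrl: string;
  cosmosEndpoint: string;
  cosmosKey: string;
  cosmosDatabase: string;
  jwtSecret: string;
}

// Reads the backend settings from an env-like object (process.env in practice).
function loadConfig(env: Record<string, string | undefined>): BackendConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`missing required env var ${name}`);
    return value;
  };

  const port = Number(env.PORT ?? "4004");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got ${env.PORT}`);
  }

  return {
    port,
    platformServiceUrl: required("PLATFORM_SERVICE_URL"),
    cosmosEndpoint: required("COSMOS_ENDPOINT"),
    cosmosKey: required("COSMOS_KEY"),
    cosmosDatabase: required("COSMOS_DATABASE"),
    jwtSecret: required("JWT_SECRET"),
  };
}
```

Calling `loadConfig(process.env)` once at startup surfaces a misconfigured `.env` before the server begins accepting requests.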

Frontend (.env.local)

NEXT_PUBLIC_DEVOPS_API_URL=http://localhost:4004
NEXT_PUBLIC_PLATFORM_URL=http://localhost:4003

Production deployments use https://api.bytelyst.com/devops for NEXT_PUBLIC_DEVOPS_API_URL and https://api.bytelyst.com/platform/api for NEXT_PUBLIC_PLATFORM_URL.

Usage

  1. Seed Services: Click "Seed Services" on the dashboard to register default services
  2. Deploy: Click "Deploy" on any service card to trigger deployment
  3. Monitor: View real-time health status and deployment history
  4. Platform Admin: Click "Platform Admin" link to jump to the admin dashboard
  5. Hermes Mission Control: Visit /hermes for the mock executive command center and the companion routes /hermes/tasks, /hermes/tasks/[id], /hermes/products, /hermes/history, /hermes/agents, and /hermes/settings

Integration with Platform Admin

  • DevOps dashboard links to admin-web at http://localhost:3001
  • Admin-web should have a reciprocal link back to DevOps dashboard
  • Both use platform-service for authentication

API Endpoints

Services

  • GET /api/services - List all services
  • GET /api/services/:id - Get single service
  • POST /api/services - Create service (admin only)
  • PUT /api/services/:id - Update service (admin only)
  • DELETE /api/services/:id - Delete service (admin only)

Deployments

  • GET /api/deployments - Recent deployments (with ?limit= query param)
  • GET /api/deployments/service/:serviceId - Deployments for specific service
  • GET /api/deployments/:id - Single deployment
  • GET /api/deployments/:id/logs - Get captured deployment logs as JSON (web client polls this; no SSE)
  • POST /api/deployments/trigger/:serviceId - Trigger deployment (admin only)

Health

  • GET /api/health - Health of all services
  • GET /api/health/:serviceId - Health of specific service
  • DELETE /api/health/cache - Clear health cache (admin only)

Seed

  • POST /api/seed - Seed default services (admin only)

Development

# Backend typecheck
cd backend && pnpm typecheck

# Frontend typecheck
cd web && pnpm typecheck

# Run tests (watch mode)
pnpm test

# Run tests (CI mode)
pnpm test:run

# Run the backend and web dev servers together
pnpm --filter backend dev & pnpm --filter web dev

Deployment

See DEPLOYMENT.md for detailed deployment instructions.

Deploy as a ByteLyst product:

  • Product ID: devops-internal
  • Backend port: 4004 (host) / 4004 (container)
  • Web port: 3000 (container) — exposed on host as localhost:3049 under Docker Compose; dev mode (pnpm dev) listens directly on localhost:3000. See DEPLOYMENT.md for the full port table.
  • Use the existing deployment scripts in the parent directory
  • Public API base: https://api.bytelyst.com/devops

Production Features

The dashboard includes the following production-oriented features:

  • CI/CD Pipeline: Gitea Actions with build, test, typecheck, lint, E2E tests
  • Security: CSRF protection, rate limiting, CORS, security headers
  • Monitoring: System metrics, Docker management, performance tracking
  • Operations: Database migrations, backup/restore, audit logging
  • Accessibility: ARIA labels, keyboard navigation, skip links
  • PWA: Web app manifest, mobile-friendly
  • Documentation: OpenAPI/Swagger at /docs

See DEPLOYMENT.md for the complete deployment guide.