Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon) — both genuinely need
container/orchestration work and stay queued.
1. Allow-list shell wrapper (P1)
New `lib/shell.ts`:
- `execAllowed(cmd, args, opts)` — `execFile`-only, no shell, no
interpolation. Single escape hatch for ad-hoc invocations.
- `dockerRestart(name)` — name validated against
`[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError
on anything else (including non-strings, shell metacharacters,
command-substitution attempts). Tests cover all of these.
- `dockerPrune(kind, {all?})` — kind constrained to
{container,image,volume,builder}; `--all` only valid for image.
- `runBashScript(path, args, {allowedRoots})` — script path AND
cwd both checked against allowed roots; rejects `..` escapes
and prefix-matching siblings (`/opt/projects-evil` vs
`/opt/projects`).
- `runNpmScript(script, {cwd, allowedRoots})` — script ∈
{typecheck,lint,build,test,test:run,start}; cwd inside roots.
17 unit tests cover every rejection path. Module added to the
coverage gate (≥95% lines).
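The validate-then-`execFile` pattern described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `lib/shell.ts`: the regex and the `InvalidShellArgError` name come from the list above, while `assertContainerName` and the exact error message are hypothetical.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Pattern quoted in the commit message, anchored so the whole string must match.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

class InvalidShellArgError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidShellArgError";
  }
}

// Hypothetical helper: throws unless `name` is a string matching the allow-list pattern.
function assertContainerName(name: unknown): asserts name is string {
  if (typeof name !== "string" || !CONTAINER_NAME.test(name)) {
    throw new InvalidShellArgError("invalid container name");
  }
}

// execFile receives an argv array, so no shell ever parses `name`.
async function dockerRestart(name: unknown): Promise<string> {
  assertContainerName(name);
  const { stdout } = await execFileAsync("docker", ["restart", name]);
  return stdout;
}
```

Validation runs before any process is spawned, so a rejected name never reaches `docker` at all.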
Migrated highest-risk callers off template-literal `exec`:
- `vm/repository.ts:restartContainer` → `dockerRestart`. Was
previously `await execAsync(\`docker restart "${name}"\`)`
with only a regex check; now goes through the wrapper.
- `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
+ `execAllowed` for `docker system df`. Drops the array of
template-literal command strings entirely.
- `code-quality/repository.ts` → `runNpmScript` for every
lifecycle invocation. cwd is now the resolved (normalised,
`..`-collapsed) path, not the raw input.
2. projectPath validation for /code-quality/check (P1)
`runCodeQualityCheck` now calls
`assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
any subprocess spawns. `getAllowedRoots()` reads
`CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to
`/opt/bytelyst`). Rejection happens with a clear error message
listing the configured roots so operators know what to allow.
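A rough sketch of how such a root check might look, assuming POSIX paths (the roots are colon-separated). The function names, the env variable, and the default come from the text above; the bodies and the error wording are assumptions, not the actual implementation.

```typescript
import * as path from "node:path";

// Reads the colon-separated env var; falls back to the documented default.
function getAllowedRoots(
  env: Record<string, string | undefined> = process.env,
): string[] {
  const raw = env.CODE_QUALITY_ALLOWED_ROOTS || "/opt/bytelyst";
  return raw
    .split(":")
    .filter((root) => root.length > 0)
    .map((root) => path.posix.resolve(root));
}

// Returns the normalised path, or throws with the configured roots listed.
function assertPathInAllowedRoots(candidate: string, roots: string[]): string {
  // resolve() collapses `..` segments before the comparison.
  const resolved = path.posix.resolve(candidate);
  // Appending the separator stops `/opt/projects-evil` matching `/opt/projects`.
  const inside = roots.some(
    (root) => resolved === root || resolved.startsWith(root + path.posix.sep),
  );
  if (!inside) {
    throw new Error(
      `path ${resolved} is outside allowed roots: ${roots.join(", ")}`,
    );
  }
  return resolved;
}
```

Returning the resolved path lets callers pass the normalised value, rather than the raw input, to the subprocess.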
3. Audit-log every privileged shell-out (P2)
`audit/types.ts` extended: `action` now includes `'shell-exec'`,
`entityType` includes `'host'`. The migration is additive — old
audit rows still validate.
Three privileged routes now write a `shell-exec` audit row with
actor (authUserId / authRole), entity id, and a sanitized details
payload before responding:
- `POST /docker/cleanup` — `entityId: docker-cleanup:<type>`,
details include {type, force, freedSpace}.
- `POST /vm/cleanup` — `entityId: vm-cleanup:<mode>`.
- `POST /vm/containers/:name/restart` — `entityId:
container-restart:<name>`, details include {success, message}.
Audited even on failure so attempted privileged actions are
still recorded.
Audit writes are best-effort — a Cosmos hiccup logs a warn but
never fails the request the operator was running.
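The best-effort behaviour might be sketched like this. The row shape, `AuditWriter`, and `auditBestEffort` are hypothetical names for illustration; only the `shell-exec` action, the `host` entity type, and the actor fields are taken from the description above.

```typescript
// Hypothetical row shape based on the fields named in the commit message.
interface ShellExecAudit {
  action: "shell-exec";
  entityType: "host";
  entityId: string;
  actor: { authUserId: string; authRole: string };
  details: Record<string, unknown>;
}

// Stand-in for whatever persists the row (Cosmos in the real service).
type AuditWriter = (row: ShellExecAudit) => Promise<void>;

// Never rejects: a failed write is reported through `warn` and swallowed,
// so the operator's request still completes.
async function auditBestEffort(
  write: AuditWriter,
  row: ShellExecAudit,
  warn: (message: string) => void = console.warn,
): Promise<boolean> {
  try {
    await write(row);
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`audit write failed for ${row.entityId}: ${reason}`);
    return false;
  }
}
```

A route handler would await this after the privileged action has finished, whether it succeeded or failed, and then send its response.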
Verified: backend typecheck ✅, 74/74 unit tests ✅ (17 new for
shell.ts + audit changes), 7/7 E2E ✅, lint 0 errors, coverage gate
≥95% lines on every gated file (which now includes shell.ts).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
| Name |
|---|
| .gitea/workflows |
| backend |
| scripts |
| shared |
| web |
| .gitignore |
| .pnpmfile.cjs |
| deploy.sh |
| DEPLOYMENT_GUIDE.md |
| DEPLOYMENT.md |
| docker-compose.yml |
| ENDPOINTS.md |
| package.json |
| pnpm-lock.yaml |
| pnpm-workspace.yaml |
| README.md |
| REVIEW_ACTIONS.md |
ByteLyst DevOps Dashboard
Internal DevOps dashboard for deployment orchestration and service monitoring across ByteLyst products.
Architecture
dashboard/
├── backend/ # Fastify 5 backend (port 4004)
│ └── src/
│ ├── lib/ # Config, auth, Cosmos
│ └── modules/ # Services, deployments, health
├── web/ # Next.js 16 frontend (port 3000)
│ └── src/
│ ├── app/ # Pages
│ └── lib/ # API client, auth
└── shared/
└── product.json # Product identity
Features
- Service Registry: Manage all ByteLyst services (trading, notes, clock, etc.)
- Deployment Orchestration: Trigger deployments via existing bash scripts
- Health Monitoring: Real-time health checks for all services with caching
- Deployment History: Audit trail of all deployments with captured logs (JSON-polled by the web client; no SSE)
- Cross-Navigation: One-click link to Platform Admin dashboard
- Hermes Mission Control: Read-only mock dashboard for portfolio-wide execution, task ledger, product health, history, agents, and settings
- Testing: Vitest for backend, React Testing Library for frontend
- Security: Rate limiting, CORS, security headers, Zod validation
- Auto-Refresh: Automatic health status updates every 60 seconds
Recent Improvements
Testing Infrastructure
- Added Vitest for backend testing with test files for services and deployments
- Added React Testing Library for frontend with API client tests
- Test scripts: `pnpm test` (watch mode), `pnpm test:run` (CI mode)
Health Monitoring
- Implemented actual HTTP health checks with 10-second timeout
- Added 30-second caching to avoid overwhelming services
- Added User-Agent header for health check requests
- Added admin endpoint to clear health cache (`DELETE /api/health/cache`)
- Health status determined by response time: >5s = degraded
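The timing rules above (10-second timeout, 30-second cache, >5s = degraded) could be combined along these lines. This is a sketch under assumptions: the `"healthy"` and `"down"` status names, the User-Agent value, and the function names are hypothetical, and it relies on the global `fetch` and `AbortSignal.timeout` available in Node.js 22.

```typescript
type HealthStatus = "healthy" | "degraded" | "down";

const TIMEOUT_MS = 10_000; // abort the request after 10 seconds
const DEGRADED_MS = 5_000; // responses slower than this count as degraded
const CACHE_TTL_MS = 30_000; // reuse a result for 30 seconds

// Maps a response outcome to a status; only the >5s rule is from the README.
function classify(ok: boolean, elapsedMs: number): HealthStatus {
  if (!ok) return "down";
  return elapsedMs > DEGRADED_MS ? "degraded" : "healthy";
}

const cache = new Map<string, { status: HealthStatus; checkedAt: number }>();

async function checkHealth(
  url: string,
  now: () => number = Date.now,
): Promise<HealthStatus> {
  const hit = cache.get(url);
  if (hit && now() - hit.checkedAt < CACHE_TTL_MS) return hit.status;

  const started = now();
  let status: HealthStatus;
  try {
    const res = await fetch(url, {
      signal: AbortSignal.timeout(TIMEOUT_MS),
      headers: { "User-Agent": "devops-dashboard-health-check" }, // hypothetical value
    });
    status = classify(res.ok, now() - started);
  } catch {
    // Timeouts and network errors are both treated as down in this sketch.
    status = "down";
  }
  cache.set(url, { status, checkedAt: now() });
  return status;
}
```

Clearing the cache, as the admin endpoint does, would amount to `cache.clear()` in this sketch.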
API Validation
- Added Zod schemas for all API routes (services, deployments, health)
- Proper error handling with BadRequestError from @bytelyst/errors
- Validated path parameters, query parameters, and request bodies
- Strict validation on update operations to prevent accidental field changes
Deployment Logs
- Endpoint `GET /api/deployments/:id/logs` returns the full captured stdout/stderr + current status as a single JSON payload (admin only).
- The web client polls this endpoint while a deployment is `running`. There is intentionally no SSE/WebSocket stream: the previous attempt with `fastify-sse-v2` was incompatible with Fastify 5 and was removed. If a real-time stream is needed later, implement it explicitly via `reply.raw` and update this section in the same change.
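The polling approach described above might look roughly like this on the client. The payload field names (`status`, `logs`), the 2-second interval, and the function name are assumptions for illustration; the README only states that the endpoint returns logs plus status and is polled while the deployment is running.

```typescript
// Assumed payload shape; the real field names may differ.
interface DeploymentLogs {
  status: string;
  logs: string;
}

// Polls until the deployment leaves the "running" state, reporting each snapshot.
async function pollDeploymentLogs(
  fetchLogs: () => Promise<DeploymentLogs>,
  onUpdate: (snapshot: DeploymentLogs) => void,
  intervalMs = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<DeploymentLogs> {
  for (;;) {
    const snapshot = await fetchLogs();
    onUpdate(snapshot);
    if (snapshot.status !== "running") return snapshot;
    await sleep(intervalMs);
  }
}
```

In the web client, `fetchLogs` would wrap a request to `GET /api/deployments/:id/logs`; injecting it (and `sleep`) keeps the loop testable without a network.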
Security Enhancements
- Added rate limiting: 100 requests per minute per IP
- Improved CORS with allowed origins whitelist
- Added security headers: X-Content-Type-Options, X-Frame-Options, X-XSS-Protection, HSTS, Referrer-Policy
- OPTIONS preflight request handling
- Credentials support for authenticated requests
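To illustrate the "100 requests per minute per IP" rule, here is a minimal in-memory fixed-window limiter. It is only a sketch of the idea: the README does not say which mechanism the backend uses, and the class and method names are hypothetical.

```typescript
// Fixed-window counter keyed by client IP (illustrative only).
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is within the limit for the current window.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

A fixed window is the simplest variant; it allows short bursts across a window boundary, which a sliding window or token bucket would smooth out.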
Auto-Refresh
- Automatic health status refresh every 60 seconds
- Manual refresh button to clear cache and force health checks
- Visual feedback with spinning icon during refresh
- Last health check timestamp displayed on service cards
Setup
Prerequisites
- Node.js 22+
- pnpm 10.6.5
- Azure Cosmos DB credentials
- Platform Service URL
- Access to @bytelyst/* packages (via common-plat workspace or Gitea registry)
Installation
The dashboard uses the .pnpmfile.cjs pattern for dynamic dependency resolution, supporting both local workspace and Gitea registry modes.
# For local development (uses workspace links to learning_ai_common_plat)
pnpm install:common-plat
# For production (uses Gitea registry at localhost:3300)
pnpm install:gitea
Backend
cd backend
cp .env.example .env # Add your credentials
pnpm dev # Runs on port 4004
Frontend
cd web
cp .env.local.example .env.local # Add your URLs
pnpm dev # Next dev server on http://localhost:3000 (no Docker)
Running Both
# From dashboard root
pnpm dev
Environment Variables
Backend (.env)
PORT=4004
PLATFORM_SERVICE_URL=http://localhost:4003
COSMOS_ENDPOINT=https://your-cosmos.documents.azure.com:443/
COSMOS_KEY=your-cosmos-key
COSMOS_DATABASE=bytelyst-platform
JWT_SECRET=your-jwt-secret
Frontend (.env.local)
NEXT_PUBLIC_DEVOPS_API_URL=http://localhost:4004
NEXT_PUBLIC_PLATFORM_URL=http://localhost:4003
Production deployments use https://api.bytelyst.com/devops for NEXT_PUBLIC_DEVOPS_API_URL and https://api.bytelyst.com/platform/api for NEXT_PUBLIC_PLATFORM_URL.
Usage
- Seed Services: Click "Seed Services" on the dashboard to register default services
- Deploy: Click "Deploy" on any service card to trigger deployment
- Monitor: View real-time health status and deployment history
- Platform Admin: Click "Platform Admin" link to jump to the admin dashboard
- Hermes Mission Control: Visit `/hermes` for the mock executive command center and the companion routes `/hermes/tasks`, `/hermes/tasks/[id]`, `/hermes/products`, `/hermes/history`, `/hermes/agents`, and `/hermes/settings`
Integration with Platform Admin
- DevOps dashboard links to admin-web at `http://localhost:3001`
- Admin-web should have a reciprocal link back to DevOps dashboard
- Both use platform-service for authentication
API Endpoints
Services
- `GET /api/services` - List all services
- `GET /api/services/:id` - Get single service
- `POST /api/services` - Create service (admin only)
- `PUT /api/services/:id` - Update service (admin only)
- `DELETE /api/services/:id` - Delete service (admin only)
Deployments
- `GET /api/deployments` - Recent deployments (with `?limit=` query param)
- `GET /api/deployments/service/:serviceId` - Deployments for specific service
- `GET /api/deployments/:id` - Single deployment
- `GET /api/deployments/:id/logs` - Get captured deployment logs as JSON (web client polls this; no SSE)
- `POST /api/deployments/trigger/:serviceId` - Trigger deployment (admin only)
Health
- `GET /api/health` - Health of all services
- `GET /api/health/:serviceId` - Health of specific service
- `DELETE /api/health/cache` - Clear health cache (admin only)
Seed
- `POST /api/seed` - Seed default services (admin only)
Development
# Backend typecheck
cd backend && pnpm typecheck
# Frontend typecheck
cd web && pnpm typecheck
# Run tests (watch mode)
pnpm test
# Run tests (CI mode)
pnpm test:run
# Run both
pnpm --filter backend dev & pnpm --filter web dev
Deployment
See DEPLOYMENT.md for detailed deployment instructions.
Deploy as a ByteLyst product:
- Product ID: `devops-internal`
- Backend port: 4004 (host) / 4004 (container)
- Web port: 3000 (container), exposed on host as `localhost:3049` under Docker Compose; dev mode (`pnpm dev`) listens directly on `localhost:3000`. See `DEPLOYMENT.md` for the full port table.
- Use existing deployment scripts in parent directory
- Public API base: `https://api.bytelyst.com/devops`
Production Features
The dashboard includes comprehensive production-ready features:
- CI/CD Pipeline: Gitea Actions with build, test, typecheck, lint, E2E tests
- Security: CSRF protection, rate limiting, CORS, security headers
- Monitoring: System metrics, Docker management, performance tracking
- Operations: Database migrations, backup/restore, audit logging
- Accessibility: ARIA labels, keyboard navigation, skip links
- PWA: Web app manifest, mobile-friendly
- Documentation: OpenAPI/Swagger at `/docs`
See DEPLOYMENT.md for complete deployment guide.