bytelyst-devops-tools/dashboard/backend/src/modules/system/routes.ts
Hermes VM 74a8ee0993 feat(dashboard): close 3 of 5 Phase 5 P2 mitigation items (allow-list, projectPath, audit-log)
Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon) — both genuinely need
container/orchestration work and stay queued.

1. Allow-list shell wrapper (P1)
   New `lib/shell.ts`:
     - `execAllowed(cmd, args, opts)` — `execFile`-only, no shell, no
       interpolation. Single escape hatch for ad-hoc invocations.
     - `dockerRestart(name)` — name validated against
       `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError
       on anything else (including non-strings, shell metacharacters,
       command-substitution attempts). Tests cover all of these.
     - `dockerPrune(kind, {all?})` — kind constrained to
       {container,image,volume,builder}; `--all` only valid for image.
     - `runBashScript(path, args, {allowedRoots})` — script path AND
       cwd both checked against allowed roots; rejects `..` escapes
       and prefix-matching siblings (`/opt/projects-evil` vs
       `/opt/projects`).
     - `runNpmScript(script, {cwd, allowedRoots})` — script ∈
       {typecheck,lint,build,test,test:run,start}; cwd inside roots.
   17 unit tests cover every rejection path. Module added to the
   coverage gate (≥95% lines).

   Migrated highest-risk callers off template-literal `exec`:
     - `vm/repository.ts:restartContainer` → `dockerRestart`. Was
       previously `await execAsync(\`docker restart "${name}"\`)`
       with only a regex check; now goes through the wrapper.
     - `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
       + `execAllowed` for `docker system df`. Drops the array of
       template-literal command strings entirely.
     - `code-quality/repository.ts` → `runNpmScript` for every
       lifecycle invocation. cwd is now the resolved (normalised,
       `..`-collapsed) path, not the raw input.
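A minimal sketch of the allow-list pattern item 1 describes, assuming `lib/shell.ts` validates every argument before spawning and then hands a fixed argv to `execFile`. `dockerRestartArgs`, `dockerPruneArgs`, `npmScriptArgs`, `Invocation` and `run` are hypothetical names. The `--force` flag on prune is also an assumption about what the real wrapper passes. The container-name regex, the prune kinds and the npm script list are copied from the commit message. Root containment for `cwd` is a separate check (item 2) and is not repeated here.

```typescript
import { execFile } from 'node:child_process';
import path from 'node:path';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Error name taken from the commit message; its shape here is an assumption.
export class InvalidShellArgError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidShellArgError';
  }
}

// Pattern from the commit message, anchored so a metacharacter anywhere fails.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;
const PRUNE_KINDS: readonly string[] = ['container', 'image', 'volume', 'builder'];
const NPM_SCRIPTS: readonly string[] = ['typecheck', 'lint', 'build', 'test', 'test:run', 'start'];

// Hypothetical: a validated command ready for execFile.
export interface Invocation {
  file: string;
  args: string[];
  cwd?: string;
}

// Quote strings for the error message; report only the type for anything else.
function show(value: unknown): string {
  return typeof value === 'string' ? JSON.stringify(value) : typeof value;
}

// The builders below only validate and return argv; nothing is spawned.
export function dockerRestartArgs(name: unknown): Invocation {
  if (typeof name !== 'string' || !CONTAINER_NAME.test(name)) {
    throw new InvalidShellArgError(`invalid container name: ${show(name)}`);
  }
  return { file: 'docker', args: ['restart', name] };
}

export function dockerPruneArgs(kind: unknown, opts: { all?: boolean } = {}): Invocation {
  if (typeof kind !== 'string' || !PRUNE_KINDS.includes(kind)) {
    throw new InvalidShellArgError(`invalid prune kind: ${show(kind)}`);
  }
  if (opts.all && kind !== 'image') {
    throw new InvalidShellArgError('--all is only accepted for image prune');
  }
  return { file: 'docker', args: [kind, 'prune', '--force', ...(opts.all ? ['--all'] : [])] };
}

export function npmScriptArgs(script: unknown, cwd: string): Invocation {
  if (typeof script !== 'string' || !NPM_SCRIPTS.includes(script)) {
    throw new InvalidShellArgError(`npm script not allowed: ${show(script)}`);
  }
  // Hand the child the normalised path (`..` collapsed), not the raw input.
  return { file: 'npm', args: ['run', script], cwd: path.resolve(cwd) };
}

// execFile passes argv straight to the binary with no shell, so `;`,
// `$(...)` and backticks inside an argument are inert text.
export async function run(inv: Invocation) {
  return execFileAsync(inv.file, inv.args, { cwd: inv.cwd });
}
```

Keeping validation in pure builders means every rejection path can be unit-tested without Docker or npm on the test machine; only `run` touches the host.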

2. projectPath validation for /code-quality/check (P1)
   `runCodeQualityCheck` now calls
   `assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
   any subprocess spawns. `getAllowedRoots()` reads
   `CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to
   `/opt/bytelyst`). Rejections carry an error message listing the
   configured roots, so operators know what to allow.
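A minimal sketch of the root check in item 2. The names `getAllowedRoots` and `assertPathInAllowedRoots` come from the commit, but these bodies are assumptions, and `isInsideRoot` is a hypothetical helper. It compares paths with `path.relative` rather than a string-prefix test, so `/opt/projects-evil` is not mistaken for a child of `/opt/projects`, which is the sibling case item 1 calls out for `runBashScript`.

```typescript
import path from 'node:path';

// Default root taken from the commit message.
const DEFAULT_ROOTS = ['/opt/bytelyst'];

// Colon-separated list, like PATH; empty segments are ignored.
export function getAllowedRoots(env: Record<string, string | undefined> = process.env): string[] {
  const roots = (env.CODE_QUALITY_ALLOWED_ROOTS ?? '')
    .split(':')
    .map((r) => r.trim())
    .filter((r) => r.length > 0)
    .map((r) => path.resolve(r));
  return roots.length > 0 ? roots : [...DEFAULT_ROOTS];
}

// True when `candidate` is `root` itself or lies underneath it.
export function isInsideRoot(candidate: string, root: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === '') return true;
  // A leading `..` segment (or an absolute result on another drive) means escape;
  // a child literally named `..cache` is still inside.
  return rel !== '..' && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel);
}

// Returns the resolved path so callers spawn with the normalised value.
export function assertPathInAllowedRoots(projectPath: string, roots: string[]): string {
  const resolved = path.resolve(projectPath);
  if (!roots.some((root) => isInsideRoot(resolved, root))) {
    throw new Error(
      `projectPath ${resolved} is outside the allowed roots: ${roots.join(', ')} ` +
        '(configure CODE_QUALITY_ALLOWED_ROOTS to change them)',
    );
  }
  return resolved;
}
```

This compares lexical paths only: a symlink inside an allowed root that points elsewhere would still pass, so resolving with `fs.realpathSync` first is worth considering if the roots can contain untrusted symlinks.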

3. Audit-log every privileged shell-out (P2)
   `audit/types.ts` extended: `action` now includes `'shell-exec'`,
   `entityType` includes `'host'`. The migration is additive — old
   audit rows still validate.

   Three privileged routes now write a `shell-exec` audit row with
   actor (authUserId / authRole), entity id, and a sanitized details
   payload before responding:
     - `POST /docker/cleanup` — `entityId: docker-cleanup:<type>`,
       details include {type, force, freedSpace}.
     - `POST /vm/cleanup` — `entityId: vm-cleanup:<mode>`.
     - `POST /vm/containers/:name/restart` — `entityId:
       container-restart:<name>`, details include {success, message}.
       Audited even on failure so attempted privileged actions are
       still recorded.
   Audit writes are best-effort: a Cosmos hiccup logs a warning but
   never fails the request the operator was running.
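The best-effort, audited-even-on-failure behaviour in item 3 can be sketched as two small helpers. `auditShellExec`, `restartWithAudit`, `ShellExecEntry`, `AuditWriter` and `WarnLogger` are hypothetical. The writer stands in for `createAuditLog`, whose real signature may differ. The `shell-exec`/`host` values and the `container-restart:<name>` entity id come from the commit message. The sketch types them as plain literals and does not reproduce the widened unions in `audit/types.ts`.

```typescript
// Shape of one shell-exec row; field names follow the createAuditLog call in routes.ts.
export interface ShellExecEntry {
  action: 'shell-exec';
  entityType: 'host';
  entityId: string;
  userId: string;
  role: string;
  details: Record<string, unknown>;
}

export type AuditWriter = (entry: ShellExecEntry) => Promise<unknown>;
export type WarnLogger = (obj: { err: unknown }, msg: string) => void;

// Best-effort: a failed write is logged and swallowed, never rethrown.
export async function auditShellExec(
  write: AuditWriter,
  warn: WarnLogger,
  entry: Omit<ShellExecEntry, 'action' | 'entityType'>,
): Promise<void> {
  try {
    const row: ShellExecEntry = { ...entry, action: 'shell-exec', entityType: 'host' };
    await write(row);
  } catch (err) {
    warn({ err }, 'audit log write failed');
  }
}

export interface RestartOutcome {
  success: boolean;
  message: string;
}

// Audits the attempt whether the restart resolves or throws.
export async function restartWithAudit(
  name: string,
  restart: (name: string) => Promise<RestartOutcome>,
  write: AuditWriter,
  warn: WarnLogger,
  actor: { userId: string; role: string },
): Promise<RestartOutcome> {
  let outcome: RestartOutcome = { success: false, message: 'restart did not complete' };
  try {
    outcome = await restart(name);
    return outcome;
  } catch (err) {
    outcome = { success: false, message: err instanceof Error ? err.message : String(err) };
    throw err;
  } finally {
    // auditShellExec never rejects, so it cannot mask the restart's result or error.
    await auditShellExec(write, warn, {
      entityId: `container-restart:${name}`,
      ...actor,
      details: { ...outcome },
    });
  }
}
```

Awaiting the write inside `finally` records the attempt on both the success and the failure path, while the caller still sees the original return value or exception.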

Verified: backend typecheck passes, 74/74 unit tests pass (17 new for
shell.ts + audit changes), 7/7 E2E pass, lint reports 0 errors, and the
coverage gate holds at ≥95% lines on every gated file (which now
includes shell.ts).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 08:18:50 +00:00


import type { FastifyInstance } from 'fastify';
import { getSystemMetrics, getDockerStats, dockerCleanup } from './repository.js';
import { DockerCleanupParamsSchema } from './types.js';
import { requireAdmin } from '../../lib/auth.js';
import { createAuditLog } from '../audit/repository.js';
import { productId } from '../../lib/config.js';

export async function systemRoutes(fastify: FastifyInstance) {
  // Get system metrics (admin only)
  fastify.get('/system/metrics', {
    preHandler: async (req) => requireAdmin(req),
  }, async (req, reply) => {
    try {
      const metrics = await getSystemMetrics();
      return reply.send(metrics);
    } catch (error) {
      fastify.log.error(error, 'Failed to get system metrics');
      return reply.code(500).send({ error: 'Failed to get system metrics' });
    }
  });

  // Get Docker stats (admin only)
  fastify.get('/docker/stats', {
    preHandler: async (req) => requireAdmin(req),
  }, async (req, reply) => {
    try {
      const stats = await getDockerStats();
      return reply.send(stats);
    } catch (error) {
      fastify.log.error(error, 'Failed to get Docker stats');
      return reply.code(500).send({ error: 'Failed to get Docker stats' });
    }
  });

  // Docker cleanup (admin only). Privileged shell-out → audit-logged.
  fastify.post('/docker/cleanup', {
    preHandler: async (req) => requireAdmin(req),
  }, async (req, reply) => {
    try {
      const params = DockerCleanupParamsSchema.parse(req.body);
      const result = await dockerCleanup(params.type, params.force);
      // Phase 5 P2 mitigation: every privileged shell-out lands in the audit
      // log so a leaked admin token's actions are reconstructable from
      // Cosmos, not only from container stdout. Best-effort — don't fail
      // the request if audit write breaks.
      const authReq = req as unknown as { authUserId?: string; authRole?: string };
      await createAuditLog({
        action: 'shell-exec',
        entityType: 'host',
        entityId: `docker-cleanup:${params.type}`,
        userId: authReq.authUserId ?? 'unknown',
        role: authReq.authRole ?? 'unknown',
        productId,
        details: { type: params.type, force: params.force, freedSpace: result.freedSpace },
      }).catch((err) => fastify.log.warn({ err }, 'audit log write failed'));
      return reply.send(result);
    } catch (error: any) {
      fastify.log.error(error, 'Docker cleanup failed');
      return reply.code(500).send({ error: error.message || 'Docker cleanup failed' });
    }
  });
}