Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon) — both genuinely need
container/orchestration work and stay queued.

1. Allow-list shell wrapper (P1)

New `lib/shell.ts`:
- `execAllowed(cmd, args, opts)`: `execFile`-only, no shell, no
  interpolation. Single escape hatch for ad-hoc invocations.
- `dockerRestart(name)`: name validated against
  `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws `InvalidShellArgError`
  on anything else (including non-strings, shell metacharacters,
  command-substitution attempts). Tests cover all of these.
- `dockerPrune(kind, {all?})`: kind constrained to
  {container,image,volume,builder}; `--all` only valid for image.
- `runBashScript(path, args, {allowedRoots})`: script path AND
  cwd both checked against allowed roots; rejects `..` escapes
  and prefix-matching siblings (`/opt/projects-evil` vs
  `/opt/projects`).
- `runNpmScript(script, {cwd, allowedRoots})`: script ∈
  {typecheck,lint,build,test,test:run,start}; cwd inside roots.

17 unit tests cover every rejection path. Module added to the
coverage gate (≥95% lines).
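
A minimal sketch of the `dockerRestart` validation described above, assuming the quoted pattern is anchored to the whole string. `assertContainerName` is a hypothetical helper name, and the real `lib/shell.ts` may be structured differently:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Pattern quoted in the commit message, anchored so the whole name must match.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

export class InvalidShellArgError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidShellArgError';
  }
}

// Hypothetical helper: throws unless `name` is a string matching the pattern.
export function assertContainerName(name: unknown): asserts name is string {
  if (typeof name !== 'string' || !CONTAINER_NAME.test(name)) {
    throw new InvalidShellArgError(`invalid container name: ${String(name)}`);
  }
}

// Runs `docker restart <name>` with an argv array, so no shell ever parses the name.
export async function dockerRestart(name: unknown): Promise<string> {
  assertContainerName(name);
  const { stdout } = await execFileAsync('docker', ['restart', name]);
  return stdout.trim();
}
```

Because `execFile` receives the name as a single argv entry, a value such as `$(whoami)` would be treated as a literal string even if it slipped past validation; the pattern check is a second layer, not the only one.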

Migrated highest-risk callers off template-literal `exec`:
- `vm/repository.ts:restartContainer` → `dockerRestart`. Previously
  interpolated the name into a shell string, `docker restart "${name}"`,
  passed to `execAsync` with only a regex check; now goes through
  the wrapper.
- `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
  plus `execAllowed` for `docker system df`. Drops the array of
  template-literal command strings entirely.
- `code-quality/repository.ts` → `runNpmScript` for every
  lifecycle invocation. cwd is now the resolved (normalised,
  `..`-collapsed) path, not the raw input.
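
To illustrate what replacing template-literal command strings with a constrained argv can look like, here is a sketch of the argument building behind `dockerPrune`. `buildPruneArgs` is a hypothetical name, and passing `--force` (to skip the interactive confirmation) is an assumption rather than something the commit states:

```typescript
type PruneKind = 'container' | 'image' | 'volume' | 'builder';

const PRUNE_KINDS: readonly string[] = ['container', 'image', 'volume', 'builder'];

// Hypothetical helper: returns the argv for `docker <kind> prune`, or throws.
export function buildPruneArgs(kind: unknown, opts: { all?: boolean } = {}): string[] {
  if (typeof kind !== 'string' || !PRUNE_KINDS.includes(kind)) {
    throw new Error(`unsupported prune kind: ${String(kind)}`);
  }
  // Per the commit message, `--all` is only accepted for images.
  if (opts.all && (kind as PruneKind) !== 'image') {
    throw new Error(`--all is only valid for image, got ${kind}`);
  }
  // Assumption: --force is passed so the CLI does not prompt for confirmation.
  const args = [kind, 'prune', '--force'];
  if (opts.all) args.push('--all');
  return args;
}
```

The returned array would then be handed to an `execFile`-style call with `docker` as the command, so each element stays a separate argument and is never re-parsed by a shell.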

2. projectPath validation for /code-quality/check (P1)

`runCodeQualityCheck` now calls
`assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
any subprocess spawns. `getAllowedRoots()` reads
`CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env var, defaults to
`/opt/bytelyst`). A rejected path produces an error message that
lists the configured roots, so operators know what to allow.
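
The check can be sketched roughly as follows. The function names come from the commit message, but the bodies, the injectable `env` parameter, and the choice to return the resolved path are assumptions made for illustration:

```typescript
import * as path from 'node:path';

// Reads the colon-separated allow-list; falls back to the documented default.
export function getAllowedRoots(
  env: Record<string, string | undefined> = process.env,
): string[] {
  const raw = env.CODE_QUALITY_ALLOWED_ROOTS ?? '/opt/bytelyst';
  return raw
    .split(':')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => path.resolve(entry));
}

// Returns the normalised path if it lies inside one of the roots, else throws.
export function assertPathInAllowedRoots(candidate: string, roots: readonly string[]): string {
  const resolved = path.resolve(candidate); // collapses `..` segments
  const inside = roots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved);
    // Comparing via path.relative avoids the naive string-prefix bug where
    // `/opt/projects-evil` would pass a startsWith('/opt/projects') check.
    return rel === '' || (rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel));
  });
  if (!inside) {
    throw new Error(`path ${resolved} is outside allowed roots: ${roots.join(', ')}`);
  }
  return resolved;
}
```

This sketch only normalises the path lexically. It does not resolve symlinks, so a deployment that needs that guarantee would have to add a real-path lookup on top.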

3. Audit-log every privileged shell-out (P2)

`audit/types.ts` extended: `action` now includes `'shell-exec'`,
`entityType` includes `'host'`. The migration is additive, so old
audit rows still validate.

Three privileged routes now write a `shell-exec` audit row with
actor (authUserId / authRole), entity id, and a sanitized details
payload before responding:
- `POST /docker/cleanup`: `entityId: docker-cleanup:<type>`,
  details include {type, force, freedSpace}.
- `POST /vm/cleanup`: `entityId: vm-cleanup:<mode>`.
- `POST /vm/containers/:name/restart`: `entityId:
  container-restart:<name>`, details include {success, message}.
  Audited even on failure so attempted privileged actions are
  still recorded.

Audit writes are best-effort: a failed Cosmos write logs a warning
but never fails the request the operator was running.
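
As a rough sketch of the two behaviours described above (recording the outcome even on failure, and never letting an audit failure surface to the caller), one possible shape is shown below. `buildRestartAudit`, `writeAuditBestEffort`, and the local types are hypothetical names; the actual routes call `createAuditLog` directly:

```typescript
type Actor = { authUserId?: string; authRole?: string };

type ShellExecAudit = {
  action: 'shell-exec';
  entityType: 'host';
  entityId: string;
  userId: string;
  role: string;
  details: Record<string, unknown>;
};

// Builds the audit row for a container restart, whether it succeeded or not.
export function buildRestartAudit(
  actor: Actor,
  name: string,
  outcome: { success: boolean; message: string },
): ShellExecAudit {
  return {
    action: 'shell-exec',
    entityType: 'host',
    entityId: `container-restart:${name}`,
    userId: actor.authUserId ?? 'unknown',
    role: actor.authRole ?? 'unknown',
    details: { success: outcome.success, message: outcome.message },
  };
}

type WarnLogger = { warn: (obj: unknown, msg: string) => void };

// Awaits the write but converts any failure into a warning plus `false`.
export async function writeAuditBestEffort(
  write: (entry: ShellExecAudit) => Promise<unknown>,
  log: WarnLogger,
  entry: ShellExecAudit,
): Promise<boolean> {
  try {
    await write(entry);
    return true;
  } catch (err) {
    log.warn({ err }, 'audit log write failed');
    return false;
  }
}
```

A route handler would call `writeAuditBestEffort` in both its success and error branches before replying, so a failed restart attempt still leaves a row behind.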

Verified: backend typecheck ✅, 74/74 unit tests ✅ (17 new for
shell.ts + audit changes), 7/7 E2E ✅, lint 0 errors, coverage gate
≥95% lines on every gated file (which now includes shell.ts).

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
65 lines
2.5 KiB
TypeScript
import type { FastifyInstance } from 'fastify';
import { getSystemMetrics, getDockerStats, dockerCleanup } from './repository.js';
import { DockerCleanupParamsSchema } from './types.js';
import { requireAdmin } from '../../lib/auth.js';
import { createAuditLog } from '../audit/repository.js';
import { productId } from '../../lib/config.js';

export async function systemRoutes(fastify: FastifyInstance) {
  // Get system metrics (admin only)
  fastify.get('/system/metrics', {
    preHandler: async (req) => requireAdmin(req),
  }, async (req, reply) => {
    try {
      const metrics = await getSystemMetrics();
      return reply.send(metrics);
    } catch (error) {
      fastify.log.error(error, 'Failed to get system metrics');
      return reply.code(500).send({ error: 'Failed to get system metrics' });
    }
  });

  // Get Docker stats (admin only)
  fastify.get('/docker/stats', {
    preHandler: async (req) => requireAdmin(req),
  }, async (req, reply) => {
    try {
      const stats = await getDockerStats();
      return reply.send(stats);
    } catch (error) {
      fastify.log.error(error, 'Failed to get Docker stats');
      return reply.code(500).send({ error: 'Failed to get Docker stats' });
    }
  });

  // Docker cleanup (admin only). Privileged shell-out → audit-logged.
  fastify.post('/docker/cleanup', {
    preHandler: async (req) => requireAdmin(req),
  }, async (req, reply) => {
    try {
      const params = DockerCleanupParamsSchema.parse(req.body);
      const result = await dockerCleanup(params.type, params.force);

      // Phase 5 P2 mitigation: every privileged shell-out lands in the audit
      // log so a leaked admin token's actions are reconstructable from
      // Cosmos, not only from container stdout. Best-effort — don't fail
      // the request if audit write breaks.
      const authReq = req as unknown as { authUserId?: string; authRole?: string };
      await createAuditLog({
        action: 'shell-exec',
        entityType: 'host',
        entityId: `docker-cleanup:${params.type}`,
        userId: authReq.authUserId ?? 'unknown',
        role: authReq.authRole ?? 'unknown',
        productId,
        details: { type: params.type, force: params.force, freedSpace: result.freedSpace },
      }).catch((err) => fastify.log.warn({ err }, 'audit log write failed'));

      return reply.send(result);
    } catch (error: any) {
      fastify.log.error(error, 'Docker cleanup failed');
      return reply.code(500).send({ error: error.message || 'Docker cleanup failed' });
    }
  });
}