Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon); both genuinely need
container/orchestration work and stay queued.

1. Allow-list shell wrapper (P1)

   New `lib/shell.ts`:
   - `execAllowed(cmd, args, opts)`: `execFile`-only, no shell, no
     interpolation. Single escape hatch for ad-hoc invocations.
   - `dockerRestart(name)`: name validated against
     `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError
     on anything else (including non-strings, shell metacharacters,
     command-substitution attempts). Tests cover all of these.
   - `dockerPrune(kind, {all?})`: kind constrained to
     {container,image,volume,builder}; `--all` only valid for image.
   - `runBashScript(path, args, {allowedRoots})`: script path AND
     cwd both checked against allowed roots; rejects `..` escapes
     and prefix-matching siblings (`/opt/projects-evil` vs
     `/opt/projects`).
   - `runNpmScript(script, {cwd, allowedRoots})`: script ∈
     {typecheck,lint,build,test,test:run,start}; cwd inside roots.

   17 unit tests cover every rejection path. Module added to the
   coverage gate (≥95% lines).

   Migrated highest-risk callers off template-literal `exec`:
   - `vm/repository.ts:restartContainer` → `dockerRestart`. Was
     previously ``await execAsync(`docker restart "${name}"`)``
     with only a regex check; now goes through the wrapper.
   - `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
     + `execAllowed` for `docker system df`. Drops the array of
     template-literal command strings entirely.
   - `code-quality/repository.ts` → `runNpmScript` for every
     lifecycle invocation. cwd is now the resolved (normalised,
     `..`-collapsed) path, not the raw input.

2. projectPath validation for /code-quality/check (P1)

   `runCodeQualityCheck` now calls
   `assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
   any subprocess spawns. `getAllowedRoots()` reads
   `CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to
   `/opt/bytelyst`). Rejection happens with a clear error message
   listing the configured roots so operators know what to allow.

3. Audit-log every privileged shell-out (P2)

   `audit/types.ts` extended: `action` now includes `'shell-exec'`,
   `entityType` includes `'host'`. The migration is additive: old
   audit rows still validate.

   Three privileged routes now write a `shell-exec` audit row with
   actor (authUserId / authRole), entity id, and a sanitized details
   payload before responding:
   - `POST /docker/cleanup`: `entityId: docker-cleanup:<type>`,
     details include {type, force, freedSpace}.
   - `POST /vm/cleanup`: `entityId: vm-cleanup:<mode>`.
   - `POST /vm/containers/:name/restart`:
     `entityId: container-restart:<name>`, details include
     {success, message}. Audited even on failure so attempted
     privileged actions are still recorded.

   Audit writes are best-effort: a Cosmos hiccup logs a warn but
   never fails the request the operator was running.

Verified: backend typecheck ✅, 74/74 unit tests ✅ (17 new for
shell.ts + audit changes), 7/7 E2E ✅, lint 0 errors, coverage gate
≥95% lines on every gated file (which now includes shell.ts).

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
507 lines
22 KiB
Markdown
# DevOps & Admin Dashboard Deployment Guide

> Canonical deployment doc for `dashboard/`. The previous `DEPLOYMENT_GUIDE.md`
> has been folded into this file; it remains as a one-line redirect for
> backwards compatibility with `deploy.sh` and external links.

## Overview

This guide covers deploying both the DevOps Dashboard and Platform Admin Dashboard using the existing Traefik gateway infrastructure, following the same pattern as the trading dashboard (https://invttrdg.bytelyst.com).

## Public URLs

- **DevOps Dashboard**: `https://devops.bytelyst.com`
- **Admin Dashboard**: `https://admin.bytelyst.com`
- **API Gateway**: `https://api.bytelyst.com`
  - Platform API: `https://api.bytelyst.com/platform/api`
  - DevOps API: `https://api.bytelyst.com/api/devops`

## Ports: quick reference

The web container always listens on **3000** internally; what changes is what
the host exposes. Use the row for the deployment mode you're in:

| Mode                                | Web (host)         | Backend (host)    | Notes                                                              |
|-------------------------------------|--------------------|-------------------|--------------------------------------------------------------------|
| Local dev (`pnpm dev`)              | `localhost:3000`   | `localhost:4004`  | Next listens directly on 3000.                                     |
| Docker Compose (this repo)          | `localhost:3049`   | `localhost:4004`  | `docker-compose.yml` maps `127.0.0.1:3049:3000` (loopback only).   |
| Production (Traefik)                | `https://devops.bytelyst.com` | `https://api.bytelyst.com/api/devops` | Traefik label `loadbalancer.server.port=3000` targets the container port. |

Whenever a doc says "the dashboard runs on port 3000", it means the **container
port** seen by Traefik / Next dev mode, not the host port for the deployed
stack. Use the table above instead of relying on prose.

## Architecture

```
Internet → Traefik Gateway → Services
                              ├─ DevOps Web (container :3000, host :3049)
                              ├─ DevOps Backend (:4004)
                              ├─ Admin Web (:3001)
                              ├─ Platform Service (:4003)
                              └─ Trading Dashboard (:3085)
```

- **Traefik**: API gateway and reverse proxy.
- **Docker network**: All services connect via `learning_ai_common_plat_default`.
- **Domain routing**: Traefik routes by host header.
- **SSL/TLS**: Managed by Traefik with Let's Encrypt.

## Prerequisites

1. Platform stack running with Traefik gateway.
2. Docker and Docker Compose installed.
3. Domain names configured with DNS pointing to your server.
4. Azure Cosmos DB account (shared with platform-service).
5. Platform Service running and accessible.

## Quick Start

### 1. Start the platform stack (if not running)

```bash
cd /opt/bytelyst/learning_ai_common_plat
docker-compose up -d
```

### 2. Deploy the dashboards

```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
./deploy.sh
```

This will:
- Deploy the DevOps Dashboard (backend + web)
- Deploy the Admin Dashboard via the platform stack
- Run health checks
- Print deployment information

## Local development

If you only need a non-containerized iteration loop (no Traefik, no Docker):

```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard

# Resolve workspace deps
pnpm install:common-plat   # uses sibling learning_ai_common_plat checkout
# or
pnpm install:gitea         # uses local Gitea registry at localhost:3300

pnpm dev                   # backend on 4004, web on 3000 (NOT 3049)
```

Required env vars are documented under **Environment Configuration** below; for
local dev a minimal `.env` with `JWT_SECRET`, `COSMOS_*`, and
`PLATFORM_SERVICE_URL` is enough.

## Manual Docker deployment

### Deploy DevOps Dashboard

```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker-compose up -d --build
```

### Deploy Admin Dashboard

```bash
cd /opt/bytelyst/learning_ai_common_plat
docker-compose up -d admin-web
```

## Environment Configuration

### DevOps Dashboard (`.env`)

```bash
# Backend
PORT=4004
PLATFORM_SERVICE_URL=http://platform-service:4003
COSMOS_ENDPOINT=https://your-cosmos-account.documents.azure.com:443/
COSMOS_KEY=your-cosmos-primary-key
COSMOS_DATABASE=bytelyst-platform
JWT_SECRET=your-production-jwt-secret
CSRF_SECRET=your-production-csrf-secret
ENCRYPTION_KEY=your-production-encryption-key
PRODUCT_ID=bytelyst-devops
PRODUCT_NAME=ByteLyst DevOps Dashboard

# Azure Key Vault (optional)
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_KEY_VAULT_URL=https://your-keyvault.vault.azure.net/

# Frontend
NEXT_PUBLIC_DEVOPS_API_URL=https://api.bytelyst.com/devops
NEXT_PUBLIC_PLATFORM_URL=https://api.bytelyst.com/platform/api
NEXT_PUBLIC_ADMIN_WEB_URL=https://admin.bytelyst.com
NEXT_PUBLIC_PRODUCT_ID=bytelyst-devops
NEXT_PUBLIC_PRODUCT_NAME=ByteLyst DevOps Dashboard
```

### Platform Dashboard (`.env`)

Add to your platform `.env`:

```bash
# Admin Web Dashboard
NEXT_PUBLIC_PLATFORM_URL=https://api.bytelyst.com/platform/api
NEXT_PUBLIC_DEVOPS_WEB_URL=https://devops.bytelyst.com
```

## Traefik Configuration

Both dashboards use Traefik labels for routing.

### DevOps Web

```yaml
labels:
  - 'traefik.enable=true'
  - 'traefik.http.routers.devops-web.rule=Host(`devops.bytelyst.com`)'
  - 'traefik.http.services.devops-web.loadbalancer.server.port=3000'  # container port
```

### DevOps Backend API

```yaml
labels:
  - 'traefik.enable=true'
  - 'traefik.http.routers.devops-api.rule=PathPrefix(`/api/devops`)'
  - 'traefik.http.services.devops-api.loadbalancer.server.port=4004'
```

### Admin Web

```yaml
labels:
  - 'traefik.enable=true'
  - 'traefik.http.routers.admin-web.rule=Host(`admin.bytelyst.com`)'
  - 'traefik.http.services.admin-web.loadbalancer.server.port=3001'
```

## DNS Configuration

Add DNS records pointing to your Traefik gateway server:

```
devops.bytelyst.com    A    <your-server-ip>
admin.bytelyst.com     A    <your-server-ip>
api.bytelyst.com       A    <your-server-ip>
```

## SSL/TLS Configuration

Traefik can automatically handle SSL certificates with Let's Encrypt:

```yaml
command:
  - '--certificatesresolvers.myresolver.acme.tlschallenge=true'
  - '--certificatesresolvers.myresolver.acme.email=admin@bytelyst.com'
  - '--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json'
```

Then update router labels:

```yaml
labels:
  - 'traefik.http.routers.devops-web.tls=true'
  - 'traefik.http.routers.devops-web.tls.certresolver=myresolver'
```

## Cross-Navigation

### DevOps Dashboard → Admin Dashboard
- Header includes a "Platform Admin" link with Shield icon.
- Opens admin dashboard in a new tab.
- Uses `NEXT_PUBLIC_ADMIN_WEB_URL`.

### Admin Dashboard → DevOps Dashboard
- Sidebar includes a "DevOps Dashboard" link with Server icon.
- Opens devops dashboard in a new tab.
- Uses `NEXT_PUBLIC_DEVOPS_WEB_URL`.

## Shared Authentication

1. **Platform Service Auth**: Both dashboards authenticate against platform-service.
2. **JWT Tokens**: The same `JWT_SECRET` validates tokens across services.
3. **Per-Product Access**: Admin access is checked per-product via membership roles.
4. **Single Sign-On**: Users stay logged in across both dashboards.

### Granting Access

To grant a user access to both dashboards:

1. Ensure the user exists in platform-service.
2. Add an admin membership for both products:

   ```json
   {
     "memberships": [
       { "productId": "bytelyst-devops", "role": "admin", "plan": "pro" },
       { "productId": "bytelyst-platform", "role": "admin", "plan": "pro" }
     ]
   }
   ```

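
The per-product check described above can be sketched as a pure function over the membership list. This is a minimal illustration, not the platform-service implementation: the `Membership` shape is copied from the JSON above, while `isProductAdmin` and `canAccessBothDashboards` are hypothetical helper names.

```typescript
// Shape taken from the membership JSON above; no other fields are assumed.
interface Membership {
  productId: string;
  role: string;
  plan: string;
}

// Hypothetical helper: true when the user holds an admin membership for one product.
function isProductAdmin(memberships: readonly Membership[], productId: string): boolean {
  return memberships.some((m) => m.productId === productId && m.role === "admin");
}

// Hypothetical helper: both dashboards need an admin membership on both product ids.
function canAccessBothDashboards(memberships: readonly Membership[]): boolean {
  const required = ["bytelyst-devops", "bytelyst-platform"];
  return required.every((id) => isProductAdmin(memberships, id));
}
```

A user holding only the `bytelyst-devops` membership would pass `isProductAdmin` for that product but fail `canAccessBothDashboards`.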
## Health Checks

- DevOps Backend: `http://localhost:4004/health`
- DevOps Web: `http://localhost:3049` (Docker Compose host port; container :3000)
- Admin Web: `http://localhost:3001`
- Traefik Dashboard: `http://localhost:8080`

## Troubleshooting

### Network issues
```bash
# Check if the platform network exists
docker network inspect learning_ai_common_plat_default

# Check container connectivity
docker network inspect learning_ai_common_plat_default | grep devops
```

### Traefik routing
```bash
# Traefik dashboard (open in a browser; this is a URL, not a command)
# http://localhost:8080

# Traefik logs
docker logs $(docker ps -q -f name=gateway)

# Router config for the devops web container
docker inspect devops-web | grep -A 10 Labels
```

### Authentication failures
- Verify `JWT_SECRET` matches across all services.
- Check platform-service is accessible: `curl http://localhost:4003/health`.
- Ensure the user has the right product memberships.

### Service not starting
```bash
docker logs devops-backend
docker logs devops-web
docker logs admin-web
docker ps
docker inspect devops-backend | grep -A 5 Health
```

### Workspace dependency errors
```bash
pnpm install:common-plat   # local sibling checkout
pnpm install:gitea         # local Gitea registry
```

## Service Management

### Stop services
```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker-compose down

cd /opt/bytelyst/learning_ai_common_plat
docker-compose stop admin-web
```

### Restart services
```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker-compose restart

cd /opt/bytelyst/learning_ai_common_plat
docker-compose restart admin-web
```

### View logs
```bash
# DevOps
docker logs -f devops-backend
docker logs -f devops-web

# Admin
docker logs -f admin-web

# Traefik
docker logs -f gateway
```

## Comparison with Trading Dashboard

| Feature      | Trading              | DevOps                  | Admin                  |
|--------------|----------------------|-------------------------|------------------------|
| Domain       | invttrdg.bytelyst.com| devops.bytelyst.com     | admin.bytelyst.com     |
| Web Port     | 3085 (host)          | 3049 (host) / 3000 (ctr)| 3001 (host)            |
| Backend Port | 4018                 | 4004                    | N/A                    |
| Network      | platform_net         | platform_net            | default                |
| Traefik      | Yes                  | Yes                     | Yes                    |
| Auth         | Platform             | Platform                | Platform               |

## Privilege Surface (Docker socket + host mounts)

The `devops-backend` container has root-equivalent access to the host. This
section documents exactly what is mounted, which routes use each mount, and
what the blast radius looks like if an admin token leaks. It exists so reviewers
don't have to reverse-engineer this from `docker-compose.yml` and the route
handlers, and so any future change to the mount set is reviewed against this
list rather than slipped in.

### Mounts (from `docker-compose.yml`)

| Host path                          | Container path                    | Mode | Purpose                                                                 |
|------------------------------------|-----------------------------------|------|-------------------------------------------------------------------------|
| `/var/run/docker.sock`             | `/var/run/docker.sock`            | rw   | Allows `docker` CLI inside the container to control the host daemon. Used by the `system` and `vm` modules. **Equivalent to root on the host.** |
| `/opt/bytelyst/learning_ai_devops_tools/scripts` | `/vm-scripts`     | ro   | Bash scripts the `vm` module shells out to (`HostingerVM/*.sh`). Read-only mount; the container cannot modify the script set. |
| `/var/log/vm-cleanup.log`          | `/host-logs/vm-cleanup.log`       | rw   | The `vm` cleanup script appends here; backend reads it via `/api/vm/cleanup-log`. |
| `/var/log/vm-health-check.log`     | `/host-logs/vm-health-check.log`  | rw   | Health-check probe output; backend reads it via `/api/vm/health`. |
| `/var/log/docker-watchdog.log`     | `/host-logs/docker-watchdog.log`  | rw   | Watchdog tail used by the VM panel. |
| `extra_hosts: host-gateway`        | `host.docker.internal`-equivalent | n/a  | Lets the container reach `host:11434` (Ollama) and other host-only services. Not a filesystem mount, but a privilege-relevant capability: the container can talk to anything bound to `127.0.0.1` on the host. |

The container's listening port (`4004`) is bound to `127.0.0.1` only, so the
API is **not** exposed to the public internet by this compose file; access is
expected via Tailscale or an SSH tunnel. Any reverse proxy in front of it
(Traefik in production) is responsible for its own auth + TLS.

### What shells out + which routes (auth column = effective gate)

| Route | Handler module | What it executes | Auth |
|-------|----------------|------------------|------|
| `GET /system/metrics` | `system/repository.ts` | `df -h ...` | `requireAdmin` |
| `GET /docker/stats` | `system/repository.ts` | `docker images / ps / volume ls / system df` (read-only) | `requireAdmin` |
| `POST /docker/cleanup` | `system/repository.ts` | `docker container prune -f`, `docker image prune -a -f`, `docker volume prune -f`, `docker builder prune -f` (a fixed allow-list: the request body picks one of the four "types") | `requireAdmin` |
| `GET /vm/health` | `vm/repository.ts` | `bash $VM_SCRIPTS_PATH/vm-health-check.sh --json` | `requireAdmin` |
| `GET /vm/cleanup-log` | `vm/repository.ts` | reads `/host-logs/vm-cleanup.log` | `requireAdmin` |
| `GET /vm/cron-status` | `vm/repository.ts` | `crontab -l` | `requireAdmin` |
| `POST /vm/cleanup` | `vm/repository.ts` | `bash $VM_SCRIPTS_PATH/vm-cleanup.sh` | `requireAdmin` |
| `GET /vm/containers`, `.../unhealthy`, `.../:name/logs` | `vm/repository.ts` | `docker ps`, `docker inspect`, `docker stats`, `docker logs` | `requireAdmin` |
| `POST /vm/containers/:name/restart` | `vm/repository.ts` | `docker restart "<name>"` (name is a path param; see "Known sharp edges" below) | `requireAdmin` |
| `GET /vm/ollama/models`, `DELETE /vm/ollama/models/:name` | `vm/repository.ts` | HTTP-only (talks to host Ollama via `host-gateway`). No shell-out. | `requireAdmin` |
| `POST /code-quality/check` | `code-quality/repository.ts` | `npm run typecheck`, `npm run lint`, `npm run build`, `npm run test:run` in the request-supplied `projectPath`. | `requireAdmin` *(added concurrently with this doc; previously unauthenticated; see the Phase 5 P1 commit)* |
| `POST /deployments/trigger/:serviceId` | `deployments/orchestrator.ts` | `bash <service.scriptPath>` from the registered service registry (paths are stored at create-time, not request-time). | `requireAdmin` |
| `/hermes/ops` (snapshot) | `hermes-ops/repository.ts` | Read-only probes: `systemctl is-active/is-enabled`, `git status`, `du -sh`, `ps`, `tailscale ip`, `runuser -u uma -- systemctl --user ...`. No state-changing commands. | `requireAdmin` *(Phase 7, private-only)* |
| `/hermes/telemetry/:instance` | `hermes-telemetry/repository.ts` | Read-only: `runuser -u <user> -- hermes sessions/cron/memory/skills list --json`, `git -C <backup-repo> log`, tail of the watchdog log. No state-changing commands. | `requireAdmin` |

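
The fixed allow-list in the `POST /docker/cleanup` row can be illustrated with a small argv builder. This is a sketch under stated assumptions, not the code in `system/repository.ts`: `buildPruneArgv` and `PRUNE_KINDS` are hypothetical names, and the rule that `--all` applies only to image pruning is taken from this page's description of `dockerPrune`.

```typescript
// The four prune "types" listed in the table above.
const PRUNE_KINDS = ["container", "image", "volume", "builder"] as const;
type PruneKind = (typeof PRUNE_KINDS)[number];

// Narrow an untrusted request value to one of the four known kinds.
function isPruneKind(value: unknown): value is PruneKind {
  return typeof value === "string" && (PRUNE_KINDS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: returns the argument vector to hand to `docker`,
// so no request-supplied text is ever spliced into a shell string.
function buildPruneArgv(kind: unknown, opts: { all?: boolean } = {}): string[] {
  if (!isPruneKind(kind)) {
    throw new Error("unsupported prune kind: " + String(kind));
  }
  if (opts.all && kind !== "image") {
    throw new Error("--all is only valid for image prune");
  }
  const argv = [kind, "prune", "-f"];
  if (opts.all) argv.push("-a");
  return argv;
}
```

Because the function returns an array rather than a command string, the caller can pass it straight to an `execFile`-style API, and a value such as `"image; rm -rf /"` is rejected before anything is spawned.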
### Blast radius if an admin token is leaked

Anyone holding a valid admin JWT for this product can, today:

- Run any of the four pre-defined `docker prune` commands (data loss for
  containers/images/volumes), restart any container, read any container's logs.
- Trigger the host VM cleanup script and crontab listing.
- Trigger any deployment script registered in the service registry.
- Run `npm run` lifecycle scripts in any directory under the configured allowed
  roots (`code-quality/check` takes a caller-supplied `projectPath`, which is
  checked against `CODE_QUALITY_ALLOWED_ROOTS`, default `/opt/bytelyst`).
- Read the three host logs that are mounted in.

In other words, an admin token is **equivalent to a host shell**, modulo the
specific commands the codebase chooses to wrap. The highest-risk shell-outs
(`vm/restartContainer`, `system/dockerCleanup`, and the code-quality lifecycle
scripts) now go through the allow-list wrapper in `lib/shell.ts`; the
mitigation roadmap below tracks which paths have been migrated off `docker ...`
shell strings built directly with `execAsync`.

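
The difference between a shell string and an explicit argument vector can be sketched as follows. This is a minimal illustration rather than the contents of `lib/shell.ts`: the pattern is the one quoted in the mitigation roadmap, anchored here with `^...$` as an assumption, and `assertContainerName` plus this local `InvalidShellArgError` class are hypothetical stand-ins.

```typescript
import { execFile } from "node:child_process";

// Pattern quoted in the roadmap; the ^...$ anchors are an assumption of this sketch.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

// Local stand-in for the error type the wrapper is described as throwing.
class InvalidShellArgError extends Error {}

// Reject non-strings and anything containing quotes, spaces, `$`, `;`, and so on.
function assertContainerName(name: unknown): string {
  if (typeof name !== "string" || !CONTAINER_NAME.test(name)) {
    throw new InvalidShellArgError("invalid container name");
  }
  return name;
}

// execFile spawns `docker` directly with an argv array: no shell parses the
// name, so metacharacters could not be interpreted even if validation missed one.
function dockerRestart(name: unknown): Promise<string> {
  const safe = assertContainerName(name);
  return new Promise((resolve, reject) => {
    execFile("docker", ["restart", safe], (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}
```

Validation and `execFile` are two independent layers: the first limits which containers can be named at all, the second removes shell interpretation from the path entirely.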
### Known sharp edges (track and shrink)

1. **Container name is interpolated into a shell string.** `docker restart
   "${name}"` and similar paths in `vm/repository.ts` use `execAsync` with a
   template literal. The `:name` path parameter is admin-only but is not
   validated against a `^[a-zA-Z0-9._-]+$` allow-list. Lock this down before
   exposing the dashboard to a wider admin pool.
2. **`projectPath` for `/code-quality/check` is unvalidated.** The handler
   passes the caller-supplied path straight into `execAsync({ cwd })`. Even
   with `requireAdmin` added, this should be constrained to a known set of
   project roots (or rejected if it escapes the workspace).
3. **No per-route audit-log on shell-outs.** `audit/repository.ts` records
   deployment triggers but not `/docker/cleanup` or `/vm/cleanup`. A leaked
   token's actions are reconstructable only from container stdout + host logs.
4. **The container runs as root.** Both the backend `Dockerfile` and the
   bind-mounts assume root. A non-root user with `docker` group membership would
   shrink the in-container blast radius without losing functionality (the
   socket is still root on the host); revisit when ready.
5. **`fastify-rate-limit` is global, not per-route.** A leaked admin token
   currently isn't slowed down on the destructive endpoints any more than it
   is on read-only ones.

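
Constraining a caller-supplied path to a set of project roots (sharp edge 2) can be sketched as below. This is an illustration under assumptions, not the project's `assertPathInAllowedRoots`: the helper names are hypothetical, the colon-separated format and the `/opt/bytelyst` default come from the roadmap text, and POSIX path semantics are assumed. Symlinks are not resolved here, so a real check might also want `fs.realpath`.

```typescript
import * as path from "node:path";

// Parse a colon-separated list of roots, falling back to a single default root.
function parseAllowedRoots(raw: string | undefined, fallback = "/opt/bytelyst"): string[] {
  const roots = (raw ?? "")
    .split(":")
    .map((r) => r.trim())
    .filter((r) => r.length > 0);
  return (roots.length > 0 ? roots : [fallback]).map((r) => path.posix.resolve(r));
}

// Resolve the candidate first (collapsing `..`), then require it to be a root
// itself or to sit below a root. Comparing against `root + "/"` is what rejects
// prefix-matching siblings such as /opt/projects-evil for root /opt/projects.
function assertPathInRoots(candidate: string, roots: readonly string[]): string {
  const resolved = path.posix.resolve(candidate);
  const ok = roots.some((root) => resolved === root || resolved.indexOf(root + "/") === 0);
  if (!ok) {
    throw new Error("path " + resolved + " is outside allowed roots: " + roots.join(", "));
  }
  return resolved;
}
```

Returning the resolved path lets the caller use the normalised value as the working directory, so the string that was checked is the same string that is later used.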
### Mitigation roadmap (incremental, not all at once)

- [x] **P1:** Allow-list wrapper around shell-outs. *(`lib/shell.ts` ships with
      `execAllowed` (no shell, just `execFile` with an explicit argv) plus
      per-command helpers: `dockerRestart(name)` validates against
      `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`, `dockerPrune(kind, {all?})` validates
      kind ∈ {container,image,volume,builder} and rejects `--all` on non-image,
      `runBashScript(path, args, {allowedRoots})` and `runNpmScript(script,
      {cwd, allowedRoots})` lock both the script path and cwd to a configured
      set of roots. 17 unit tests cover the rejection paths; `vm/restartContainer`
      and `system/dockerCleanup` migrated. Module covered by the test:coverage
      gate (≥95% lines).)*
- [x] **P1:** Validate `/code-quality/check`'s `projectPath` against a
      configured set of allowed roots. *(`runCodeQualityCheck` now calls
      `assertPathInAllowedRoots(projectPath, getAllowedRoots())` before any
      lifecycle script runs; `getAllowedRoots()` reads
      `CODE_QUALITY_ALLOWED_ROOTS` (colon-separated) with a default of
      `/opt/bytelyst`. The path is also re-resolved (normalised, `..`
      collapsed) before being passed to `runNpmScript`, which lifts it to its
      own argv slot, with no shell interpolation.)*
- [x] **P2:** Audit-log every shell-out (command + arg vector + actor + result).
      *(Audit schema extended with `action: 'shell-exec'` + `entityType: 'host'`.
      `POST /docker/cleanup`, `POST /vm/cleanup`, `POST /vm/containers/:name/restart`
      now write a Cosmos audit row including the actor (`authUserId`/`authRole`),
      entity id (`docker-cleanup:<type>` etc.), and a sanitized details payload.
      Audit writes are best-effort: a Cosmos hiccup logs a warn but never
      fails the request.)*
- [ ] **P2:** Run the backend container as a non-root user with `docker` group
      membership; rebuild the Dockerfile accordingly.
- [ ] **P3:** Move from `docker.sock` to a thin daemon (`docker-proxy`-style)
      that exposes only the verbs the dashboard actually needs (`stats`,
      `restart`, `logs`, the four `prune` variants).

Operators reviewing whether to grant a new admin should read this whole section
before doing so. Adding a new shell-out path in code is a **privilege change**
and must update this table in the same commit.

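
The best-effort audit write from the P2 roadmap item can be sketched as a wrapper that swallows storage errors after logging them. This is a hedged illustration, not the code in `audit/repository.ts`: the `ShellExecAudit` shape is inferred from the fields named in the roadmap, and `auditBestEffort`, `write`, and `warn` are hypothetical names for the wrapper, the Cosmos writer, and the logger.

```typescript
// Fields named in the roadmap text; the exact schema in the codebase may differ.
interface ShellExecAudit {
  action: "shell-exec";
  entityType: "host";
  entityId: string; // e.g. "docker-cleanup:<type>"
  actor: { authUserId: string; authRole: string };
  details: Record<string, unknown>;
}

// Hypothetical wrapper: try to persist the row, and on failure log a warning
// instead of rethrowing, so the operator's request is never failed by auditing.
async function auditBestEffort(
  write: (row: ShellExecAudit) => Promise<void>,
  row: ShellExecAudit,
  warn: (msg: string) => void,
): Promise<void> {
  try {
    await write(row);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn("audit write failed for " + row.entityId + ": " + reason);
  }
}
```

The trade-off is deliberate: a storage outage can leave gaps in the audit trail, which is why the failure is logged as a warning rather than dropped silently.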
## Production Checklist

- [ ] Platform stack running with Traefik.
- [ ] DNS records configured.
- [ ] SSL/TLS certificates configured in Traefik.
- [ ] Environment variables set for production.
- [ ] Cosmos DB connection configured.
- [ ] `JWT_SECRET` matches across all services.
- [ ] User memberships configured for access.
- [ ] Health checks passing.
- [ ] Cross-navigation links working.
- [ ] Monitoring and logging configured.

## Features Implemented

### Backend (port 4004)
- ✅ CI/CD pipeline with Gitea Actions
- ✅ E2E tests with Playwright (gated; see `.gitea/workflows/ci.yml`)
- ✅ Telemetry integration
- ✅ Error boundary
- ✅ CSRF protection with token refresh
- ✅ Service CRUD operations
- ✅ Deployment log retrieval (JSON polling, no SSE; see backend README)
- ✅ Audit logging
- ✅ Structured logging
- ✅ Database migrations
- ✅ Backup/restore functionality
- ✅ Performance monitoring (APM)
- ✅ System metrics (CPU, memory, disk)
- ✅ Docker cleanup endpoints
- ✅ OpenAPI/Swagger documentation at `/docs`

### Frontend (container :3000, host :3049 under Compose)
- ✅ Service management UI
- ✅ Deployment monitoring
- ✅ Health dashboard
- ✅ Metrics/charts page
- ✅ System management page
- ✅ Log viewer (poll-based)
- ✅ Accessibility features (ARIA, keyboard nav)
- ✅ PWA manifest
- ✅ Responsive design