Closes the remaining tractable items from the carry-forward queue.
1. Drop-root scaffold for the backend container (P2 mitigation)
`backend/Dockerfile` adds a non-root `app` user (uid 1001) and a `docker`
group (gid via the `DOCKER_GID` build arg, default 999). The `BACKEND_USER`
build arg defaults to `root` so existing deployments keep working;
set it to `app` plus `DOCKER_GID=$(getent group docker | cut -d: -f3)`
to flip the runtime to non-root. `dashboard/DEPLOYMENT.md` gets a new
"Running non-root" section with the exact `chgrp`/`chmod` recipe
for the bind-mounted log files (the host-side prep that pairs with
the build flip). The DEPLOYMENT.md mitigation roadmap is updated to match.
2. Phase 6 trend cards
`lib/hermes-ops-history.ts` keeps the last 24 ops snapshots in
localStorage (de-duped on `generatedAt`, schema-guarded on read,
degrades silently on quota exceeded). Three trend cards in the
ops panel:
- Warning-volume sparkline + current count
- Healthy-instance count sparkline (X/2)
- Per-instance "minutes since last backup commit" with a 30m
stale threshold
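
A minimal sketch of how a bounded, de-duplicated snapshot history like the one described above could work. This is not the contents of `lib/hermes-ops-history.ts`: the `OpsSnapshot` fields other than `generatedAt`, the `KeyValueStore` interface, and the storage key are assumptions made for illustration.

```typescript
// Hypothetical snapshot shape; only `generatedAt` is named in the text above.
interface OpsSnapshot {
  generatedAt: string; // used as the de-dupe key
  warningCount: number;
  healthyInstances: number;
}

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "example.ops-history.v1"; // assumed key, not the real one
const MAX_SNAPSHOTS = 24;

// Schema guard: anything that does not look like a snapshot is dropped on read.
function isSnapshot(value: unknown): value is OpsSnapshot {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.generatedAt === "string" &&
    typeof v.warningCount === "number" &&
    typeof v.healthyInstances === "number"
  );
}

function readHistory(store: KeyValueStore): OpsSnapshot[] {
  try {
    const raw = store.getItem(HISTORY_KEY);
    if (raw === null) return [];
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter(isSnapshot) : [];
  } catch {
    return []; // corrupt JSON or unavailable storage is treated as empty history
  }
}

function pushSnapshot(store: KeyValueStore, snap: OpsSnapshot): OpsSnapshot[] {
  const history = readHistory(store);
  // De-dupe: a snapshot with an already-seen generatedAt is ignored.
  if (history.some((s) => s.generatedAt === snap.generatedAt)) return history;
  const next = [...history, snap].slice(-MAX_SNAPSHOTS); // keep the newest 24
  try {
    store.setItem(HISTORY_KEY, JSON.stringify(next));
  } catch {
    // Quota exceeded: degrade silently, the in-memory result is still returned.
  }
  return next;
}
```

In a browser, `window.localStorage` satisfies `KeyValueStore`, so it could be passed in directly.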
SVG polyline sparklines, no chart library — `<svg viewBox="0 0
100 100" preserveAspectRatio="none">` with `vector-effect:
non-scaling-stroke` so the line stays 2px regardless of the
parent's width.
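
For illustration, the `points` attribute of such a polyline could be computed roughly as follows. The function name and the min/max normalisation are assumptions, not the component's actual code; the 0..100 coordinate space matches the `viewBox` quoted above.

```typescript
// Map a numeric series onto "x,y" pairs for <polyline points="..."> in a 0..size box.
function sparklinePoints(values: number[], size = 100): string {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min || 1; // flat series: avoid dividing by zero
  const step = values.length > 1 ? size / (values.length - 1) : 0;
  return values
    .map((v, i) => {
      const x = i * step;
      const y = size - ((v - min) / span) * size; // SVG y grows downward
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}
```

Because the coordinates are relative to the `viewBox`, the same string works at any rendered width; `vector-effect: non-scaling-stroke` is what keeps the stroke width constant when the box is stretched.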
3. Phase 6 theme toggle
`components/theme-toggle.tsx` Sun/Moon button mounted in the
Hermes layout next to the instance switcher. Persists in
localStorage `bytelyst.theme.v1`. The design system already
defined `[data-theme="light"]` overrides in `styles/tokens.css`;
the toggle just sets the attribute. FOUC-prevention inline script
in the root layout reads the same key BEFORE React hydrates so
the first paint matches the user's last choice.
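
A sketch of the logic the pre-hydration script and the toggle could share. The storage key is the one named above; treating anything other than an explicit `"light"` as dark is an assumption, as are the function and interface names.

```typescript
type Theme = "light" | "dark";

const THEME_KEY = "bytelyst.theme.v1"; // key named in the text above

// Assumed default: anything other than an explicit "light" resolves to dark.
function resolveTheme(stored: string | null): Theme {
  return stored === "light" ? "light" : "dark";
}

// Element-like target so the sketch runs without a DOM.
interface ThemeTarget {
  setAttribute(name: string, value: string): void;
}

function applyTheme(target: ThemeTarget, read: () => string | null): Theme {
  let stored: string | null = null;
  try {
    stored = read();
  } catch {
    // Storage access can throw (for example in some privacy modes); use the default.
  }
  const theme = resolveTheme(stored);
  target.setAttribute("data-theme", theme);
  return theme;
}

// In a browser this might be called as:
//   applyTheme(document.documentElement, () => localStorage.getItem(THEME_KEY));
```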
4. Phase 3 partial close: Agents pane → telemetry inventory
`/hermes/agents` now renders a "Memory & Skills inventory (live)"
SectionCard backed by the Phase 3 telemetry endpoint per instance
— `hermes memory list` and `hermes skills list` rendered with
per-section probe-status badges (`up`/`unknown`), item counts,
and the first N entries each. Agent **health** statuses (latency,
failure rate, last-success/failure) stay seed-data — observability
for those needs a separate ingestion contract that the telemetry
endpoint doesn't provide today.
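
As a rough illustration of the per-section rendering described above, a summary could be derived like this. The `InventorySection` shape is an assumption; the real telemetry payload may differ.

```typescript
type ProbeStatus = "up" | "unknown";

// Hypothetical shape for one telemetry section (memory or skills).
interface InventorySection {
  status: ProbeStatus;
  items: string[];
}

interface SectionSummary {
  status: ProbeStatus; // drives the badge
  count: number;       // total item count
  preview: string[];   // the first N entries
  hiddenCount: number; // entries not shown in the preview
}

function summarizeSection(section: InventorySection, previewSize = 5): SectionSummary {
  const size = Math.max(0, Math.floor(previewSize));
  const preview = section.items.slice(0, size);
  return {
    status: section.status,
    count: section.items.length,
    preview,
    hiddenCount: section.items.length - preview.length,
  };
}
```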
5. Phase 0 reconfirmation
Roadmap Phase 0 ticked with explicit verification notes for each
guardrail (no public listener, manual approvals, secret hygiene,
Caddy review). Remains "must hold throughout" — the ticks reflect
today's verified state, not single-checkbox completion.
Verified: backend typecheck ✅, 74/74 backend unit tests ✅, web
typecheck ✅, 7/7 E2E ✅, lint 0 errors, build green, coverage gate
≥95% lines on every gated file.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
# DevOps & Admin Dashboard Deployment Guide

> Canonical deployment doc for `dashboard/`. The previous `DEPLOYMENT_GUIDE.md`
> has been folded into this file; it remains as a one-line redirect for
> backwards compatibility with `deploy.sh` and external links.

## Overview

This guide covers deploying both the DevOps Dashboard and Platform Admin Dashboard using the existing Traefik gateway infrastructure, following the same pattern as the trading dashboard (https://invttrdg.bytelyst.com).

## Public URLs

- **DevOps Dashboard**: `https://devops.bytelyst.com`
- **Admin Dashboard**: `https://admin.bytelyst.com`
- **API Gateway**: `https://api.bytelyst.com`
  - Platform API: `https://api.bytelyst.com/platform/api`
  - DevOps API: `https://api.bytelyst.com/api/devops`

## Ports: quick reference

The web container always listens on **3000** internally; what changes is what
the host exposes. Memorize the column for the deployment mode you're in:

| Mode | Web (host) | Backend (host) | Notes |
|------|------------|----------------|-------|
| Local dev (`pnpm dev`) | `localhost:3000` | `localhost:4004` | Next listens directly on 3000. |
| Docker Compose (this repo) | `localhost:3049` | `localhost:4004` | `docker-compose.yml` maps `127.0.0.1:3049:3000` (loopback only). |
| Production (Traefik) | `https://devops.bytelyst.com` | `https://api.bytelyst.com/api/devops` | Traefik label `loadbalancer.server.port=3000` targets the container port. |

Whenever a doc says "the dashboard runs on port 3000", it means the **container
port** seen by Traefik / Next dev mode, not the host port for the deployed
stack. Use the table above instead of relying on prose.

## Architecture

```
Internet → Traefik Gateway → Services
                              ├─ DevOps Web (container :3000, host :3049)
                              ├─ DevOps Backend (:4004)
                              ├─ Admin Web (:3001)
                              ├─ Platform Service (:4003)
                              └─ Trading Dashboard (:3085)
```

- **Traefik**: API gateway and reverse proxy.
- **Docker network**: All services connect via `learning_ai_common_plat_default`.
- **Domain routing**: Traefik routes by host header.
- **SSL/TLS**: Managed by Traefik with Let's Encrypt.

## Prerequisites

1. Platform stack running with Traefik gateway.
2. Docker and Docker Compose installed.
3. Domain names configured with DNS pointing to your server.
4. Azure Cosmos DB account (shared with platform-service).
5. Platform Service running and accessible.

## Quick Start

### 1. Start the platform stack (if not running)

```bash
cd /opt/bytelyst/learning_ai_common_plat
docker-compose up -d
```

### 2. Deploy the dashboards

```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
./deploy.sh
```

This will:
- Deploy the DevOps Dashboard (backend + web)
- Deploy the Admin Dashboard via the platform stack
- Run health checks
- Print deployment information

## Local development

If you only need a non-containerized iteration loop (no Traefik, no Docker):

```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard

# Resolve workspace deps
pnpm install:common-plat   # uses sibling learning_ai_common_plat checkout
# or
pnpm install:gitea         # uses local Gitea registry at localhost:3300

pnpm dev                   # backend on 4004, web on 3000 (NOT 3049)
```

Required env vars are documented under **Environment Configuration** below; for
local dev a minimal `.env` with `JWT_SECRET`, `COSMOS_*`, and
`PLATFORM_SERVICE_URL` is enough.

## Manual Docker deployment

### Deploy DevOps Dashboard

```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker-compose up -d --build
```

### Deploy Admin Dashboard

```bash
cd /opt/bytelyst/learning_ai_common_plat
docker-compose up -d admin-web
```

## Environment Configuration

### DevOps Dashboard (`.env`)

```bash
# Backend
PORT=4004
PLATFORM_SERVICE_URL=http://platform-service:4003
COSMOS_ENDPOINT=https://your-cosmos-account.documents.azure.com:443/
COSMOS_KEY=your-cosmos-primary-key
COSMOS_DATABASE=bytelyst-platform
JWT_SECRET=your-production-jwt-secret
CSRF_SECRET=your-production-csrf-secret
ENCRYPTION_KEY=your-production-encryption-key
PRODUCT_ID=bytelyst-devops
PRODUCT_NAME=ByteLyst DevOps Dashboard

# Azure Key Vault (optional)
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_KEY_VAULT_URL=https://your-keyvault.vault.azure.net/

# Frontend
NEXT_PUBLIC_DEVOPS_API_URL=https://api.bytelyst.com/devops
NEXT_PUBLIC_PLATFORM_URL=https://api.bytelyst.com/platform/api
NEXT_PUBLIC_ADMIN_WEB_URL=https://admin.bytelyst.com
NEXT_PUBLIC_PRODUCT_ID=bytelyst-devops
NEXT_PUBLIC_PRODUCT_NAME=ByteLyst DevOps Dashboard
```

### Platform Dashboard (`.env`)

Add to your platform `.env`:

```bash
# Admin Web Dashboard
NEXT_PUBLIC_PLATFORM_URL=https://api.bytelyst.com/platform/api
NEXT_PUBLIC_DEVOPS_WEB_URL=https://devops.bytelyst.com
```

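
A startup check along these lines can catch unset variables and left-over sample values before the backend serves traffic. This is a sketch, not the backend's actual config loader: which variables count as required is an assumption (the Key Vault block is marked optional above), and the `your-` placeholder heuristic simply mirrors the sample values in this file.

```typescript
// Assumed set of required backend variables, taken from the sample .env above.
const REQUIRED_BACKEND_VARS: readonly string[] = [
  "PORT",
  "PLATFORM_SERVICE_URL",
  "COSMOS_ENDPOINT",
  "COSMOS_KEY",
  "COSMOS_DATABASE",
  "JWT_SECRET",
  "CSRF_SECRET",
  "ENCRYPTION_KEY",
];

interface EnvReport {
  missing: string[];      // unset or blank
  placeholders: string[]; // still carrying a "your-..." sample value
}

function checkEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_BACKEND_VARS,
): EnvReport {
  const report: EnvReport = { missing: [], placeholders: [] };
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      report.missing.push(name);
    } else if (/\byour-/.test(value)) {
      report.placeholders.push(name);
    }
  }
  return report;
}

// In Node this might be called as: checkEnv(process.env)
```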
## Traefik Configuration

Both dashboards use Traefik labels for routing.

### DevOps Web

```yaml
labels:
  - 'traefik.enable=true'
  - 'traefik.http.routers.devops-web.rule=Host(`devops.bytelyst.com`)'
  - 'traefik.http.services.devops-web.loadbalancer.server.port=3000' # container port
```

### DevOps Backend API

```yaml
labels:
  - 'traefik.enable=true'
  - 'traefik.http.routers.devops-api.rule=PathPrefix(`/api/devops`)'
  - 'traefik.http.services.devops-api.loadbalancer.server.port=4004'
```

### Admin Web

```yaml
labels:
  - 'traefik.enable=true'
  - 'traefik.http.routers.admin-web.rule=Host(`admin.bytelyst.com`)'
  - 'traefik.http.services.admin-web.loadbalancer.server.port=3001'
```

## DNS Configuration

Add DNS records pointing to your Traefik gateway server:

```
devops.bytelyst.com  A  <your-server-ip>
admin.bytelyst.com   A  <your-server-ip>
api.bytelyst.com     A  <your-server-ip>
```

## SSL/TLS Configuration

Traefik can automatically handle SSL certificates with Let's Encrypt:

```yaml
command:
  - '--certificatesresolvers.myresolver.acme.tlschallenge=true'
  - '--certificatesresolvers.myresolver.acme.email=admin@bytelyst.com'
  - '--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json'
```

Then update router labels:

```yaml
labels:
  - 'traefik.http.routers.devops-web.tls=true'
  - 'traefik.http.routers.devops-web.tls.certresolver=myresolver'
```

## Cross-Navigation

### DevOps Dashboard → Admin Dashboard
- Header includes a "Platform Admin" link with Shield icon.
- Opens admin dashboard in a new tab.
- Uses `NEXT_PUBLIC_ADMIN_WEB_URL`.

### Admin Dashboard → DevOps Dashboard
- Sidebar includes a "DevOps Dashboard" link with Server icon.
- Opens devops dashboard in a new tab.
- Uses `NEXT_PUBLIC_DEVOPS_WEB_URL`.

## Shared Authentication

1. **Platform Service Auth**: Both authenticate against platform-service.
2. **JWT Tokens**: Same `JWT_SECRET` validates tokens across services.
3. **Per-Product Access**: Admin access is checked per-product via membership roles.
4. **Single Sign-On**: Users stay logged in across both dashboards.

### Granting Access

To grant a user access to both dashboards:

1. Ensure user exists in platform-service.
2. Add admin membership for both products:

```json
{
  "memberships": [
    { "productId": "bytelyst-devops", "role": "admin", "plan": "pro" },
    { "productId": "bytelyst-platform", "role": "admin", "plan": "pro" }
  ]
}
```

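
To make the per-product check concrete, here is a minimal sketch of how memberships shaped like the JSON above could be evaluated. The type and function names are hypothetical and do not come from platform-service.

```typescript
interface Membership {
  productId: string;
  role: string;
  plan: string;
}

interface UserRecord {
  memberships: Membership[];
}

// True when the user holds an admin membership for the given product.
function isAdminFor(user: UserRecord, productId: string): boolean {
  return user.memberships.some((m) => m.productId === productId && m.role === "admin");
}

// Products from the list for which the user still lacks an admin membership.
function missingAdminProducts(user: UserRecord, productIds: string[]): string[] {
  return productIds.filter((id) => !isAdminFor(user, id));
}
```

A user needs an empty result from `missingAdminProducts(user, ["bytelyst-devops", "bytelyst-platform"])` to be an admin on both dashboards.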
## Health Checks

- DevOps Backend: `http://localhost:4004/health`
- DevOps Web: `http://localhost:3049` (Docker Compose host port; container :3000)
- Admin Web: `http://localhost:3001`
- Traefik Dashboard: `http://localhost:8080`

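
The endpoints above can be polled from a small script. The sketch below is an assumption about how one might do that, not what `deploy.sh` does; the probe is injected so the logic can be exercised without a network.

```typescript
interface HealthTarget {
  name: string;
  url: string;
}

// Anything that resolves to an object with `ok` works, including the global fetch.
type Probe = (url: string) => Promise<{ ok: boolean }>;

// URLs taken from the list above (Docker Compose host ports).
const HEALTH_TARGETS: HealthTarget[] = [
  { name: "devops-backend", url: "http://localhost:4004/health" },
  { name: "devops-web", url: "http://localhost:3049" },
  { name: "admin-web", url: "http://localhost:3001" },
  { name: "traefik-dashboard", url: "http://localhost:8080" },
];

async function checkAll(targets: HealthTarget[], probe: Probe): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  await Promise.all(
    targets.map(async (t) => {
      try {
        const res = await probe(t.url);
        results[t.name] = res.ok;
      } catch {
        results[t.name] = false; // connection refused, DNS failure, timeout, ...
      }
    }),
  );
  return results;
}

// With a runtime that provides fetch (Node 18+ or a browser):
//   checkAll(HEALTH_TARGETS, (url) => fetch(url)).then(console.log);
```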
## Troubleshooting

### Network issues
```bash
# Check if the platform network exists
docker network inspect learning_ai_common_plat_default

# Check container connectivity
docker network inspect learning_ai_common_plat_default | grep devops
```

### Traefik routing
```bash
# Traefik dashboard: open http://localhost:8080 in a browser

# Traefik logs
docker logs $(docker ps -q -f name=gateway)

# Router config for the devops web container
docker inspect devops-web | grep -A 10 Labels
```

### Authentication failures
- Verify `JWT_SECRET` matches across all services.
- Check platform-service is accessible: `curl http://localhost:4003/health`.
- Ensure the user has the right product memberships.

### Service not starting
```bash
docker logs devops-backend
docker logs devops-web
docker logs admin-web
docker ps
docker inspect devops-backend | grep -A 5 Health
```

### Workspace dependency errors
```bash
pnpm install:common-plat   # local sibling checkout
pnpm install:gitea         # local Gitea registry
```

## Service Management

### Stop services
```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker-compose down

cd /opt/bytelyst/learning_ai_common_plat
docker-compose stop admin-web
```

### Restart services
```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker-compose restart

cd /opt/bytelyst/learning_ai_common_plat
docker-compose restart admin-web
```

### View logs
```bash
# DevOps
docker logs -f devops-backend
docker logs -f devops-web

# Admin
docker logs -f admin-web

# Traefik
docker logs -f gateway
```

## Comparison with Trading Dashboard

| Feature      | Trading               | DevOps                   | Admin              |
|--------------|-----------------------|--------------------------|--------------------|
| Domain       | invttrdg.bytelyst.com | devops.bytelyst.com      | admin.bytelyst.com |
| Web Port     | 3085 (host)           | 3049 (host) / 3000 (ctr) | 3001 (host)        |
| Backend Port | 4018                  | 4004                     | N/A                |
| Network      | platform_net          | platform_net             | default            |
| Traefik      | Yes                   | Yes                      | Yes                |
| Auth         | Platform              | Platform                 | Platform           |

## Privilege Surface (Docker socket + host mounts)

The `devops-backend` container has root-equivalent access to the host. This
section documents exactly what is mounted, which routes use each mount, and
what the blast radius looks like if an admin token leaks. It exists so reviewers
don't have to reverse-engineer this from `docker-compose.yml` and the route
handlers, and so any future change to the mount set is reviewed against this
list rather than slipped in.

### Mounts (from `docker-compose.yml`)

| Host path | Container path | Mode | Purpose |
|-----------|----------------|------|---------|
| `/var/run/docker.sock` | `/var/run/docker.sock` | rw | Allows `docker` CLI inside the container to control the host daemon. Used by the `system` and `vm` modules. **Equivalent to root on the host.** |
| `/opt/bytelyst/learning_ai_devops_tools/scripts` | `/vm-scripts` | ro | Bash scripts the `vm` module shells out to (`HostingerVM/*.sh`). Read-only mount; the container cannot modify the script set. |
| `/var/log/vm-cleanup.log` | `/host-logs/vm-cleanup.log` | rw | The `vm` cleanup script appends here; backend reads it via `/api/vm/cleanup-log`. |
| `/var/log/vm-health-check.log` | `/host-logs/vm-health-check.log` | rw | Health-check probe output; backend reads it via `/api/vm/health`. |
| `/var/log/docker-watchdog.log` | `/host-logs/docker-watchdog.log` | rw | Watchdog tail used by the VM panel. |
| `extra_hosts: host-gateway` | `host.docker.internal`-equivalent | n/a | Lets the container reach `host:11434` (Ollama) and other host-only services. Not a filesystem mount, but a privilege-relevant capability: the container can reach host services that listen on the Docker bridge gateway address or on all interfaces. Services bound only to the host's `127.0.0.1` are normally not reachable this way. |

The container's listening port (`4004`) is bound to `127.0.0.1` only, so the
API is **not** exposed to the public internet by this compose file; access is
expected via Tailscale or an SSH tunnel. Any reverse proxy in front of it
(Traefik in production) is responsible for its own auth + TLS.

### What shells out + which routes (auth column = effective gate)

| Route | Handler module | What it executes | Auth |
|-------|----------------|------------------|------|
| `GET /system/metrics` | `system/repository.ts` | `df -h ...` | `requireAdmin` |
| `GET /docker/stats` | `system/repository.ts` | `docker images / ps / volume ls / system df` (read-only) | `requireAdmin` |
| `POST /docker/cleanup` | `system/repository.ts` | `docker container prune -f`, `docker image prune -a -f`, `docker volume prune -f`, `docker builder prune -f` (a fixed allow-list: the request body picks one of the four "types") | `requireAdmin` |
| `GET /vm/health` | `vm/repository.ts` | `bash $VM_SCRIPTS_PATH/vm-health-check.sh --json` | `requireAdmin` |
| `GET /vm/cleanup-log` | `vm/repository.ts` | reads `/host-logs/vm-cleanup.log` | `requireAdmin` |
| `GET /vm/cron-status` | `vm/repository.ts` | `crontab -l` | `requireAdmin` |
| `POST /vm/cleanup` | `vm/repository.ts` | `bash $VM_SCRIPTS_PATH/vm-cleanup.sh` | `requireAdmin` |
| `GET /vm/containers`, `.../unhealthy`, `.../:name/logs` | `vm/repository.ts` | `docker ps`, `docker inspect`, `docker stats`, `docker logs` | `requireAdmin` |
| `POST /vm/containers/:name/restart` | `vm/repository.ts` | `docker restart "<name>"` (name is a path param; see "Known sharp edges" below) | `requireAdmin` |
| `GET /vm/ollama/models`, `DELETE /vm/ollama/models/:name` | `vm/repository.ts` | HTTP-only (talks to host Ollama via `host-gateway`). No shell-out. | `requireAdmin` |
| `POST /code-quality/check` | `code-quality/repository.ts` | `npm run typecheck`, `npm run lint`, `npm run build`, `npm run test:run` in the request-supplied `projectPath`. | `requireAdmin` *(added concurrently with this doc; previously unauthenticated; see the Phase 5 P1 commit)* |
| `POST /deployments/trigger/:serviceId` | `deployments/orchestrator.ts` | `bash <service.scriptPath>` from the registered service registry (paths are stored at create-time, not request-time). | `requireAdmin` |
| `/hermes/ops` (snapshot) | `hermes-ops/repository.ts` | Read-only probes: `systemctl is-active/is-enabled`, `git status`, `du -sh`, `ps`, `tailscale ip`, `runuser -u uma -- systemctl --user ...`. No state-changing commands. | `requireAdmin` *(Phase 7, private-only)* |
| `/hermes/telemetry/:instance` | `hermes-telemetry/repository.ts` | Read-only: `runuser -u <user> -- hermes sessions/cron/memory/skills list --json`, `git -C <backup-repo> log`, tail of the watchdog log. No state-changing commands. | `requireAdmin` |

### Blast radius if an admin token is leaked

Anyone holding a valid admin JWT for this product can, today:

- Run any of the four pre-defined `docker prune` commands (data loss for
  containers/images/volumes), restart any container, read any container's logs.
- Trigger the host VM cleanup script and crontab listing.
- Trigger any deployment script registered in the service registry.
- Run `npm run` lifecycle scripts in any directory the container can read
  (since `code-quality/check` takes a caller-supplied `projectPath`).
- Read the three host logs that are mounted in.

In other words, an admin token is **equivalent to a host shell**, modulo the
specific commands the codebase chooses to wrap. There is currently **no
allow-list wrapper** between the backend and the docker socket; the backend
constructs `docker ...` shell strings directly with `execAsync`.

### Known sharp edges (track and shrink)

1. **Container name is interpolated into a shell string.** `docker restart
   "${name}"` and similar paths in `vm/repository.ts` use `execAsync` with a
   template literal. The `:name` path parameter is admin-only but is not
   validated against a `^[a-zA-Z0-9._-]+$` allow-list. Lock this down before
   exposing the dashboard to a wider admin pool.
2. **`projectPath` for `/code-quality/check` is unvalidated.** The handler
   passes the caller-supplied path straight into `execAsync({ cwd })`. Even
   with `requireAdmin` added, this should be constrained to a known set of
   project roots (or rejected if it escapes the workspace).
3. **No per-route audit-log on shell-outs.** `audit/repository.ts` records
   deployment triggers but not `/docker/cleanup` or `/vm/cleanup`. A leaked
   token's actions are reconstructable only from container stdout + host logs.
4. **The container runs as root.** Both the backend `Dockerfile` and the
   bind-mounts assume root. A non-root user with `docker` group membership would
   shrink the in-container blast radius without losing functionality (the
   socket is still root on the host); revisit when ready.
5. **`fastify-rate-limit` is global, not per-route.** A leaked admin token
   currently isn't slowed down on the destructive endpoints any more than it
   is on read-only ones.

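
To illustrate the kind of fix item 1 calls for, the sketch below validates the name and passes it to `execFile` as its own argv entry, so no shell ever parses it. It is an assumption-laden example, not the project's code: the helper names are hypothetical, and the pattern is a slightly stricter variant of the one quoted in item 1 (leading alphanumeric, bounded length) so that a value such as `--help` cannot be read as a flag.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Leading alphanumeric, then up to 127 more characters from the allow-list.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

// Build the argv for `docker restart <name>`, rejecting anything off the allow-list.
function dockerRestartArgv(name: string): string[] {
  if (!CONTAINER_NAME.test(name)) {
    throw new Error(`invalid container name: ${JSON.stringify(name)}`);
  }
  return ["restart", name];
}

// execFile runs the binary directly with an explicit argv; there is no shell,
// so characters like `;`, `$(` or backticks in `name` are never interpreted.
async function restartContainer(name: string): Promise<string> {
  const { stdout } = await execFileAsync("docker", dockerRestartArgv(name));
  return stdout;
}
```

Keeping validation in a pure function (`dockerRestartArgv`) makes the rejection paths easy to unit-test without a Docker daemon.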
### Mitigation roadmap (incremental, not all at once)

- [x] **P1:** Allow-list wrapper around shell-outs. *(`lib/shell.ts` ships with
  `execAllowed` (no shell, just `execFile` with an explicit argv) plus
  per-command helpers: `dockerRestart(name)` validates against
  `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`, `dockerPrune(kind, {all?})` validates
  kind ∈ {container,image,volume,builder} and rejects `--all` on non-image,
  `runBashScript(path, args, {allowedRoots})` and
  `runNpmScript(script, {cwd, allowedRoots})` lock both the script path and cwd
  to a configured set of roots. 17 unit tests cover the rejection paths;
  `vm/restartContainer` and `system/dockerCleanup` migrated. Module covered by
  the test:coverage gate (≥95% lines).)*
- [x] **P1:** Validate `/code-quality/check`'s `projectPath` against a
  configured set of allowed roots. *(`runCodeQualityCheck` now calls
  `assertPathInAllowedRoots(projectPath, getAllowedRoots())` before any
  lifecycle script runs; `getAllowedRoots()` reads
  `CODE_QUALITY_ALLOWED_ROOTS` (colon-separated) with a default of
  `/opt/bytelyst`. The path is also re-resolved (normalised, `..`
  collapsed) before being passed to `runNpmScript`, which lifts it to its
  own argv slot, so there is no shell interpolation.)*
- [x] **P2:** Audit-log every shell-out (command + arg vector + actor + result).
  *(Audit schema extended with `action: 'shell-exec'` + `entityType: 'host'`.
  `POST /docker/cleanup`, `POST /vm/cleanup`, `POST /vm/containers/:name/restart`
  now write a Cosmos audit row including the actor (`authUserId`/`authRole`),
  entity id (`docker-cleanup:<type>` etc.), and a sanitized details payload.
  Audit writes are best-effort: a Cosmos hiccup logs a warn but never
  fails the request.)*
- [x] **P2:** Run the backend container as a non-root user with `docker` group
  membership; rebuild the Dockerfile accordingly. *(Dockerfile scaffolds
  a non-root `app` user (uid 1001) with `docker` group membership at a
  build-arg-configurable GID. Default `BACKEND_USER=root` preserves the
  current behaviour so existing deployments don't break; set
  `BACKEND_USER=app` and `DOCKER_GID=$(getent group docker | cut -d: -f3)`
  to flip it on. Requires host-side prep on the bind-mounted log files;
  see "Running non-root" below for the exact `chmod`/`chgrp` recipe.)*
- [ ] **P3:** Move from `docker.sock` to a thin daemon (`docker-proxy`-style)
  that exposes only the verbs the dashboard actually needs (`stats`,
  `restart`, `logs`, the four `prune` variants).

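
The allowed-roots check described in the second P1 item can be sketched as follows. This is a hypothetical re-implementation for illustration only: the function names here are not the ones in the codebase, and the real `assertPathInAllowedRoots` may behave differently. The colon-separated format and the `/opt/bytelyst` default are taken from the text above.

```typescript
import * as path from "node:path";

// Parse a colon-separated root list, falling back to the documented default.
function parseAllowedRoots(raw: string | undefined, fallback = "/opt/bytelyst"): string[] {
  const source = raw !== undefined && raw.trim() !== "" ? raw : fallback;
  return source
    .split(":")
    .map((r) => r.trim())
    .filter((r) => r !== "")
    .map((r) => path.posix.resolve(r));
}

// Normalise the candidate (collapsing `..`) and require it to sit inside a root.
function resolveInAllowedRoots(candidate: string, roots: string[]): string {
  const resolved = path.posix.resolve(candidate);
  const inside = roots.some((root) => {
    const prefix = root.endsWith("/") ? root : root + "/";
    // The trailing slash stops "/opt/bytelyst-evil" from matching "/opt/bytelyst".
    return resolved === root || resolved.startsWith(prefix);
  });
  if (!inside) {
    throw new Error(`path outside allowed roots: ${resolved}`);
  }
  return resolved;
}
```

Note that this compares paths lexically and does not follow symlinks; a check that must resist symlink escapes would also need to resolve the real path on disk (for example with `fs.realpath`) before comparing.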
### Running non-root

Concrete recipe to flip the backend off root:

```bash
# 1. Find the host's docker group GID
DOCKER_GID=$(getent group docker | cut -d: -f3)

# 2. Make the bind-mounted log files group-owned by docker and group-writable
#    so the in-container `app` user (gid=$DOCKER_GID) can read/write them.
sudo chgrp docker /var/log/vm-cleanup.log /var/log/vm-health-check.log /var/log/docker-watchdog.log
sudo chmod g+rw /var/log/vm-cleanup.log /var/log/vm-health-check.log /var/log/docker-watchdog.log

# 3. Confirm the VM scripts mount is world-readable (it's read-only inside
#    the container, so mode 0755 on the directory is enough).
sudo chmod -R o+rX /opt/bytelyst/learning_ai_devops_tools/scripts

# 4. Rebuild the backend image with BACKEND_USER=app and the host's GID.
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
docker compose build --build-arg BACKEND_USER=app --build-arg "DOCKER_GID=$DOCKER_GID" backend

# 5. Restart and verify
docker compose up -d backend
docker exec devops-backend whoami     # → app
docker exec devops-backend id         # uid=1001(app) gid=$DOCKER_GID(docker)
curl -fsS http://localhost:4004/health
```

If the backend can't reach the docker socket after the flip, double-check
that the docker gid reported by the in-container `id` matches the gid from
`getent group docker` on the host. The `docker.sock` bind-mount carries its
host ownership into the container, so the in-container gid must match.

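
The gid comparison can also be scripted. The helpers below are a sketch with hypothetical names; they only parse the usual text output of `getent group docker` and `id`, and are not part of the dashboard.

```typescript
// `getent group docker` prints a line like "docker:x:999:alice"; the GID is field 3.
function parseGetentGid(line: string): number | null {
  const field = line.trim().split(":")[2];
  if (field === undefined || !/^\d+$/.test(field)) return null;
  return Number(field);
}

// `id` prints something like "uid=1001(app) gid=999(docker) groups=999(docker)".
// Collect the primary gid plus every supplementary group id.
function idGids(idOutput: string): number[] {
  const gids = new Set<number>();
  const primary = /\bgid=(\d+)/.exec(idOutput);
  if (primary) gids.add(Number(primary[1]));
  const groups = /\bgroups=(\S+)/.exec(idOutput);
  if (groups) {
    (groups[1] ?? "").split(",").forEach((part) => {
      const m = /^(\d+)/.exec(part);
      if (m) gids.add(Number(m[1]));
    });
  }
  return Array.from(gids);
}

// True when the container user belongs to the group that owns the host socket.
function containerMatchesHostDockerGid(getentLine: string, idOutput: string): boolean {
  const hostGid = parseGetentGid(getentLine);
  return hostGid !== null && idGids(idOutput).indexOf(hostGid) !== -1;
}
```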
Operators reviewing whether to grant a new admin should read the whole
Privilege Surface section before doing so. Adding a new shell-out path in code
is a **privilege change** and must update the route table under "What shells
out + which routes" in the same commit.

## Production Checklist

- [ ] Platform stack running with Traefik.
- [ ] DNS records configured.
- [ ] SSL/TLS certificates configured in Traefik.
- [ ] Environment variables set for production.
- [ ] Cosmos DB connection configured.
- [ ] `JWT_SECRET` matches across all services.
- [ ] User memberships configured for access.
- [ ] Health checks passing.
- [ ] Cross-navigation links working.
- [ ] Monitoring and logging configured.

## Features Implemented

### Backend (port 4004)
- ✅ CI/CD pipeline with Gitea Actions
- ✅ E2E tests with Playwright (gated; see `.gitea/workflows/ci.yml`)
- ✅ Telemetry integration
- ✅ Error boundary
- ✅ CSRF protection with token refresh
- ✅ Service CRUD operations
- ✅ Deployment log retrieval (JSON polling, no SSE; see backend README)
- ✅ Audit logging
- ✅ Structured logging
- ✅ Database migrations
- ✅ Backup/restore functionality
- ✅ Performance monitoring (APM)
- ✅ System metrics (CPU, memory, disk)
- ✅ Docker cleanup endpoints
- ✅ OpenAPI/Swagger documentation at `/docs`

### Frontend (container :3000, host :3049 under Compose)
- ✅ Service management UI
- ✅ Deployment monitoring
- ✅ Health dashboard
- ✅ Metrics/charts page
- ✅ System management page
- ✅ Log viewer (poll-based)
- ✅ Accessibility features (ARIA, keyboard nav)
- ✅ PWA manifest
- ✅ Responsive design