checkpoint(dashboard): session 2026-05-30 — CORS env knob + state handoff

Captures the in-progress state of the long-running v2 dashboard session
so the next session (post `--permission-mode dangerous` relaunch) can
pick up without losing context. The full handoff narrative lives in
`docs/SESSION_CHECKPOINT_2026-05-30.md` — read it first.

Code change:
  - `backend/src/server.ts` CORS allow-list is now env-driven via
    `EXTRA_CORS_ORIGINS` (comma-separated). Originally added because
    the user's browser is hitting the deployed dashboard via a
    Tailscale-served hostname (`srv1491630.tailf85608.ts.net`), and
    the static built-in list only knew `localhost` + `devops.bytelyst.com`.
    Honours `*` as a wildcard for trusted-network deployments. Adds
    `Vary: Origin` so caches behave.
  - `backend/package-lock.json` regenerated to match `package.json`
    (was missing the Phase 5 ESLint deps added earlier this session).
    Note: the Dockerfile build is STILL broken with `tsc: not found`
    despite typescript being in devDeps — this is a separate
    dual-lockfile issue documented in the checkpoint. Untangle on
    resume.

Live infra carry-forward summarised in the checkpoint doc:
  - Real Azure Cosmos DB (`cosmos-mywisprai` / new `bytelyst` db)
    replaces the crash-looping local emulator.
  - `learning_ai_common_plat/docker-compose.yml` has uncommitted
    changes mirroring this; that repo is 15 commits behind origin/main
    and needs a rebase+commit pass separately.
  - Hot-patched the running `devops-backend` container's `dist/server.js`
    to allow the Tailscale origin (ephemeral; lost on next image build,
    superseded by the code change above once rebuild works).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Hermes VM 2026-05-30 09:55:44 +00:00
parent a075a6ff30
commit 2056883198
3 changed files with 770 additions and 3965 deletions

File diff suppressed because it is too large

@@ -165,17 +165,29 @@ fastify.setErrorHandler((error, request, reply) => {
   reply.code(500).send({ error: 'Internal server error' });
 });
-// CORS - more secure configuration
+// CORS - more secure configuration.
+//
+// Built-in defaults cover localhost dev + the production domain. Add extra
+// origins (comma-separated) via the `EXTRA_CORS_ORIGINS` env var — e.g. a
+// Tailscale MagicDNS hostname for private internal access — so the allow-list
+// can grow without a rebuild. A literal `*` is honoured for parity with the
+// preflight handler, but only set that in trusted networks.
+const EXTRA_CORS_ORIGINS = (process.env.EXTRA_CORS_ORIGINS ?? '')
+  .split(',')
+  .map((o) => o.trim())
+  .filter(Boolean);
 fastify.addHook('onSend', async (request, reply) => {
   const allowedOrigins = [
     'http://localhost:3000',
     'http://localhost:3001',
     'https://devops.bytelyst.com',
+    ...EXTRA_CORS_ORIGINS,
   ];
   const origin = request.headers.origin;
-  if (origin && allowedOrigins.includes(origin)) {
+  if (origin && (allowedOrigins.includes(origin) || allowedOrigins.includes('*'))) {
     reply.header('Access-Control-Allow-Origin', origin);
+    reply.header('Vary', 'Origin');
   }
   reply.header('Access-Control-Allow-Methods', 'GET,POST,PUT,DELETE,OPTIONS');

@@ -0,0 +1,218 @@
# Session Checkpoint — 2026-05-30
> Handoff snapshot for the next session. Read this top-to-bottom before
> touching anything — there's live infra state outside this repo that's
> material to the work in progress.
## TL;DR
Roadmap items shipped this session: all of Phases 1, 2, 3, 5, 6, and 7 of
the v2 dashboard roadmap, plus 4 of the 5 items in the Phase 5 P2
mitigation roadmap. Phases 4 and 8 are documented as delegation briefs
(VM ops, not code).

However, **the live deployed dashboard is still running the image from
before this session**. The rebuild ran into a pre-existing dual-lockfile
issue (drift between `pnpm-lock.yaml` and `backend/package-lock.json`).
That is the first thing to fix on resume so the rest of this session's
work actually ships.

There's also a **CORS hot-patch applied directly to the running
`devops-backend` container** to unblock the user's browser tour. That
patch disappears on the next image build or recreate.
## What's live right now (running infra)
| Resource | State | Notes |
|---|---|---|
| Tailscale serve | UP | `https://srv1491630.tailf85608.ts.net/` → `localhost:3049` |
| `devops-backend` container | Up + healthy | Pre-session image (built ~2026-05-29) + a hot-patch in `dist/server.js` adding `https://srv1491630.tailf85608.ts.net` to CORS allow-list |
| `devops-web` container | Up | Pre-session image |
| `learning_ai_common_plat-platform-service-1` | Up + healthy | Restarted with new env pointing at real Cosmos DB |
| `learning_ai_common_plat-cosmos-emulator-1` | **Stopped** | Was crash-looping; replaced with real Cosmos |
| Real Cosmos DB account `cosmos-mywisprai` | Live | New `bytelyst` database created in `rg-mywisprai` (West US 2) |

To check on resume:
```bash
docker ps --filter name=devops --filter name=platform-service --filter name=cosmos
tailscale serve status
```
## Credentials (this session's mint, change on first login)
- **Dashboard URL**: <https://srv1491630.tailf85608.ts.net/login>
- **Email**: `admin@bytelyst.local`
- **Password**: `cat /tmp/devin-mint-pw.txt` (random base64, 20 chars; rotate immediately)
- **Product ID**: `bytelyst-devops`
- **User ID** (in Cosmos): `usr_7fb3552c-3d8f-4fed-83e5-8461b018c345`

Backup minted JWT (24h, signed with the dashboard-backend `JWT_SECRET`;
never used in the end because the real auth flow took precedence):
`/tmp/devin-mint-jwt.txt`.

Both files are in `/tmp`: they survive shell exit but are lost on reboot.
## Cross-repo state
### `learning_ai_devops_tools` (this repo)
Branch `main`. Pushed commits this session — 18 in total, most recently:
| SHA | Phase | Title |
|---|---|---|
| `eaaa545` | 6 + P2 close | trend cards, theme toggle, drop-root scaffold, Agents inventory, Phase 0 reconfirm |
| `74a8ee0` | 5 P2 | allow-list shell wrapper + projectPath validation + audit-log shell-outs |
| `a8cf61a` | 8 | Telegram convention + delegation brief |
| `14c7a8f` | 6 | severity alerts + per-instance actions + URL-param deep links |
| `efdf41f` | 4 + 7 | Phase 4 brief + `/hermes/ops` requireAdmin |
| `62c0cd6` | 3.2 | Products pane on real service registry |
| `ad16b13` | 3.1 | hermes-telemetry contract + endpoint + 6 tests |
| `13e5e1c` | 5 P2 | Playwright E2E wired into Gitea CI |
| `1e64d75` | 5 P2 | structured pino logging + redaction |
| `c6ec1a0` | 5 P1 | privilege surface doc + `/code-quality/check` auth fix |
| `824f315` | 5 P1 | doc drift + dedupe deployment docs |
| `3fc471e` | 5 P1 | SSE TODO removed (dead `fastify-sse-v2`) |
| `8ba2dbd` | 5 P1 | 35 auth/csrf/health/orchestrator tests + coverage gate |
| `ecd1f20` | 2 | instance dimension across Mission Control |
| `1e64d75`, `c6ec1a0`, `824f315`, `3fc471e`, `8ba2dbd`, `cf5428a` | earlier in session | (see roadmap notes for full list) |

Uncommitted (will be in the same commit as this checkpoint):
- `dashboard/backend/src/server.ts` — CORS now env-driven via
`EXTRA_CORS_ORIGINS`. Source-correct, typechecks. **Not in the running
image** because the image rebuild is currently broken (see below).
- `dashboard/backend/package-lock.json` — regenerated to match
`package.json`. Was the source of the rebuild error.
### `learning_ai_common_plat` (sibling repo)
Branch `main`, **15 commits behind origin/main**. **Uncommitted, not pushed.**
Working tree changes:
- `docker-compose.yml` — Cosmos emulator service replaced/disabled, all
consuming services point at real Cosmos via `.env`. Long inline
comment explains why.
- `.env`: **gitignored, contains live Cosmos credentials**. Do not commit.
- `.env.bak-pre-real-cosmos`: backup of the env file from before I changed
it, also gitignored. Delete it once you're sure the real-Cosmos setup is
staying.

Suggested next action there: rebase + commit the docker-compose.yml diff
once you've verified other dashboards (`mindlyst`, `lysnrai`, etc.) still
work without the emulator. They reference `cosmos-emulator:8081` in
compose env vars and will need similar repointing.
## Real Cosmos DB layout
- Account: `cosmos-mywisprai` (West US 2, resource group `rg-mywisprai`)
- Existing databases: `mindlyst`, `lysnrai`, `mywisprai`, `invttrdg`
- **New database added today**: `bytelyst` (for platform-service)
- Collections in `bytelyst`: created automatically by platform-service's
`COSMOS_AUTO_INIT=true` on startup
- Auto-seeded so far: `bytelyst-devops` product + the admin user above.
**All other 12 products (`lysnrai`, `mindlyst`, etc.) need re-seeding**
if their respective dashboards/services are expected to work.
## What broke that needs fixing on resume
### 1. Backend Dockerfile build (BLOCKING the redeploy)
```
RUN npm ci --ignore-scripts # OK
RUN npm run build # fails: sh: tsc: not found
```
`typescript` is in `devDependencies` of `package.json` and present in
`package-lock.json`, but `npm ci` isn't installing it in the Alpine
builder stage. The cause is unknown. Candidates:

- `NODE_ENV=production` leaking into the build context
- An `.npmrc` setting that excludes devDependencies
- The npm bundled with the Alpine Node 20 image behaving differently

Investigation paths:

1. Run `docker run --rm -it node:20-alpine sh` and reproduce `npm ci` from
the lockfile manually
2. Check whether `BYTELYST_PACKAGE_SOURCE=vendor` (compose default) is
triggering a `.pnpmfile.cjs` hook that drops devDeps
3. Just switch the Dockerfile to pnpm to align with the workspace
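
One way to test the `NODE_ENV` hypothesis is to inspect the environment seen by the builder stage. The sketch below is a hypothetical helper (the name `devDepOmissionReasons` is not from this repo) that flags the environment variables npm documents as skipping devDependencies: `NODE_ENV=production` makes npm default `omit` to `dev`, and an `npm_config_omit` variable sets it directly. It only looks at environment variables, so an `.npmrc` cause would still need a separate check.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: list environment settings that would make `npm ci`
// skip devDependencies (and therefore `typescript` / `tsc`).
function devDepOmissionReasons(env: Env): string[] {
  const reasons: string[] = [];

  // npm defaults `omit` to "dev" when NODE_ENV is "production".
  if (env.NODE_ENV === "production") {
    reasons.push("NODE_ENV=production");
  }

  // npm reads config from npm_config_* variables, case-insensitively.
  // Splitting on commas/whitespace for multiple values is an assumption.
  for (const [key, value] of Object.entries(env)) {
    if (key.toLowerCase() !== "npm_config_omit") continue;
    if ((value ?? "").split(/[\s,]+/).includes("dev")) {
      reasons.push(`${key}=${value}`);
    }
  }
  return reasons;
}

// Example with a hand-built environment.
console.log(devDepOmissionReasons({ NODE_ENV: "production", PATH: "/usr/bin" }));
```

Inside the builder stage, the same function could be fed the real process environment; an empty result would point away from the `NODE_ENV` theory and toward the lockfile or `.npmrc`.
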
### 2. Web Dockerfile likely has the same dual-lockfile drift
Haven't verified — but `dashboard/web/package-lock.json` exists alongside
`dashboard/pnpm-lock.yaml`. Expect the same `npm ci` failure when web is
rebuilt. Worth checking in the same pass.
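
To find every place where this drift can occur, a small scan for npm lockfiles that live under a pnpm workspace root may help. The following is a sketch only: `findShadowedNpmLockfiles` is a hypothetical name, it works on a plain list of repo-relative paths rather than walking the disk, and the `tools/` path in the example is invented for contrast.

```typescript
// Hypothetical helper: given repo-relative file paths, return each
// package-lock.json that sits at or below a directory holding pnpm-lock.yaml.
function findShadowedNpmLockfiles(paths: string[]): string[] {
  const baseOf = (p: string): string => p.slice(p.lastIndexOf("/") + 1);
  const dirOf = (p: string): string =>
    p.includes("/") ? p.slice(0, p.lastIndexOf("/")) : "";

  const pnpmRoots = paths
    .filter((p) => baseOf(p) === "pnpm-lock.yaml")
    .map(dirOf);

  return paths
    .filter((p) => baseOf(p) === "package-lock.json")
    .filter((p) => {
      const dir = dirOf(p);
      // A root of "" means pnpm-lock.yaml is at the repository top level.
      return pnpmRoots.some(
        (root) => root === "" || dir === root || dir.startsWith(root + "/"),
      );
    })
    .sort();
}

console.log(
  findShadowedNpmLockfiles([
    "dashboard/pnpm-lock.yaml",
    "dashboard/backend/package-lock.json",
    "dashboard/web/package-lock.json",
    "tools/package-lock.json", // invented path outside the pnpm workspace
  ]),
);
```

Each path it reports is a candidate for the same `npm ci` failure, and for deletion if the Dockerfiles move to pnpm.
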
### 3. CORS allow-list is hot-patched, not built in
The running `devops-backend` container has a `sed`-applied edit to
`dist/server.js` to allow the Tailscale origin. **Lost on next image
build.** The source fix is committed (this commit) but won't take effect
until the rebuild works. Workaround: keep hot-patching until rebuild is
unblocked, OR set `EXTRA_CORS_ORIGINS=https://srv1491630.tailf85608.ts.net`
via env at runtime (the new code reads it).
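
For reference, the allow-list decision in the new `server.ts` code can be restated as two pure functions. This is a sketch that mirrors the diff in this commit, not the shipped code: the helper names `parseExtraOrigins` and `isOriginAllowed` are hypothetical, and `https://example.org` is a placeholder origin.

```typescript
// Built-in origins, copied from the diff in this commit.
const DEFAULT_ORIGINS: string[] = [
  "http://localhost:3000",
  "http://localhost:3001",
  "https://devops.bytelyst.com",
];

// Hypothetical helper: split the comma-separated env value, trim each entry,
// and drop empty entries (for example from a trailing comma).
function parseExtraOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((o) => o.trim())
    .filter(Boolean);
}

// Hypothetical helper: true when the request origin would be echoed back in
// Access-Control-Allow-Origin. A literal "*" entry allows any origin.
function isOriginAllowed(origin: string | undefined, extra: string[]): boolean {
  if (!origin) return false;
  const allowed = [...DEFAULT_ORIGINS, ...extra];
  return allowed.includes(origin) || allowed.includes("*");
}

const extraOrigins = parseExtraOrigins(" https://srv1491630.tailf85608.ts.net , ");
console.log(extraOrigins);
console.log(isOriginAllowed("https://srv1491630.tailf85608.ts.net", extraOrigins));
console.log(isOriginAllowed("https://example.org", extraOrigins));
console.log(isOriginAllowed("https://example.org", parseExtraOrigins("*")));
```

Matching is an exact string comparison, so an entry written with a trailing slash or a different scheme would not match the `Origin` header a browser sends.
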
### 4. The deployed dashboard is the OLD code
The user's "tour" of the dashboard right now shows none of this session's
work. After the rebuild is unblocked:
```bash
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
EXTRA_CORS_ORIGINS=https://srv1491630.tailf85608.ts.net docker compose up -d --build --force-recreate backend web
```
…or via build args / env file. New env var: `EXTRA_CORS_ORIGINS`.
## Open delegation work (not blockers for code)
- `docs/prompts/phase4-bheem-uma-parity.md` — VM ops: Uma backup repo +
watchdog + restore drill. Requires sudo + Uma GitHub PAT + Uma Telegram
bot. Closes 4 of 5 Bheem-only warnings in the ops panel.
- `docs/prompts/phase8-telegram-loop.md` — VM ops + bot tokens. Gated on
Phase 4. Closes the dashboard-warning → Telegram delivery loop.
## Carry-forward from Phase 5 P2 mitigation roadmap
In `dashboard/DEPLOYMENT.md` "Mitigation roadmap":
- ✅ Allow-list wrapper around shell-outs
- ✅ Validate `/code-quality/check`'s `projectPath`
- ✅ Audit-log every privileged shell-out
- ✅ Non-root backend container (scaffolded, default-off pending host file
permissions)
- ❌ **P3 still open**: replace raw `docker.sock` with verb-restricted
daemon. Worth a design doc before code.
## How to verify on resume
```bash
# 1. Confirm the dashboard URL still works
curl -fsS -o /dev/null -w "tailscale dashboard: %{http_code}\n" \
https://srv1491630.tailf85608.ts.net/login
# 2. Confirm platform-service is healthy on real Cosmos
docker exec learning_ai_common_plat-platform-service-1 \
node -e 'fetch("http://localhost:4003/health").then(r=>r.text()).then(console.log)'
# 3. Confirm the admin user still exists and login works
PW=$(cat /tmp/devin-mint-pw.txt)
docker exec -e PW="$PW" learning_ai_common_plat-platform-service-1 sh -c '
node -e "
fetch(\"http://localhost:4003/api/auth/login\",{
method:\"POST\",
headers:{\"content-type\":\"application/json\"},
body:JSON.stringify({email:\"admin@bytelyst.local\",password:process.env.PW,productId:\"bytelyst-devops\"})
}).then(async r => { console.log(\"login:\", r.status); })
"'
# 4. Confirm CORS hot-patch is still in place
docker exec devops-backend grep tailf85608 dist/server.js
# Expect: 'https://devops.bytelyst.com', 'https://srv1491630.tailf85608.ts.net',
```
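
Step 4 only proves that the hot-patch text is present in the compiled file. Once the rebuilt image reads the origin from `EXTRA_CORS_ORIGINS`, that string will likely no longer appear in `dist/server.js`, so checking the response headers may be a more durable test. The sketch below assumes a runtime with a global `Headers` class (Node 18+ or a browser); `checkCorsHeaders` is a hypothetical name, and the sample headers are hand-built rather than captured from the live backend.

```typescript
// Hypothetical helper: summarise what a response's CORS headers grant to
// `origin`, and whether caches are told to key on the Origin header.
function checkCorsHeaders(
  headers: Headers,
  origin: string,
): { allowed: boolean; variesOnOrigin: boolean } {
  const allow = headers.get("access-control-allow-origin");
  const vary = (headers.get("vary") ?? "")
    .split(",")
    .map((v) => v.trim().toLowerCase());
  return {
    allowed: allow === origin || allow === "*",
    variesOnOrigin: vary.includes("origin"),
  };
}

// Hand-built example; real headers would come from a response to a request
// that carries the Tailscale origin.
const sampleHeaders = new Headers({
  "Access-Control-Allow-Origin": "https://srv1491630.tailf85608.ts.net",
  Vary: "Origin",
});
console.log(checkCorsHeaders(sampleHeaders, "https://srv1491630.tailf85608.ts.net"));
```
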
## What I'd do first on the next session
1. **Fix the backend Dockerfile rebuild.** Probably switch to pnpm or
debug the npm ci devDep issue. Once that works:
2. Rebuild + redeploy backend + web with
`EXTRA_CORS_ORIGINS=https://srv1491630.tailf85608.ts.net`. This brings
all the Phase 1-7 work live and replaces the hot-patch.
3. Verify the user can use the dashboard end-to-end with the new UI.
4. Delete `/tmp/devin-mint-jwt.txt` (no longer needed once auth works).
5. Help the user rotate `admin@bytelyst.local`'s password via the new UI.
6. Then return to whatever was next — re-seed other products, work on
Phase 5 P3 (docker daemon proxy), or let the user drive.
— end checkpoint —