Captures the in-progress state of the long-running v2 dashboard session
so the next session (post `--permission-mode dangerous` relaunch) can
pick up without losing context. The full handoff narrative lives in
`docs/SESSION_CHECKPOINT_2026-05-30.md` — read it first.
Code change:
- `backend/src/server.ts` CORS allow-list is now env-driven via
`EXTRA_CORS_ORIGINS` (comma-separated). Originally added because
the user's browser is hitting the deployed dashboard via a
Tailscale-served hostname (`srv1491630.tailf85608.ts.net`), and
the static built-in list only knew `localhost` + `devops.bytelyst.com`.
Honours `*` as a wildcard for trusted-network deployments. Adds
`Vary: Origin` so caches behave.
- `backend/package-lock.json` regenerated to match `package.json`
(was missing the Phase 5 ESLint deps added earlier this session).
Note: the Dockerfile build is STILL broken with `tsc: not found`
despite typescript being in devDeps — this is a separate
dual-lockfile issue documented in the checkpoint. Untangle on
resume.
Live infra carry-forward summarised in the checkpoint doc:
- Real Azure Cosmos DB (`cosmos-mywisprai` / new `bytelyst` db)
replaces the crash-looping local emulator.
- `learning_ai_common_plat/docker-compose.yml` has uncommitted
changes mirroring this; that repo is 15 commits behind origin/main
and needs a rebase+commit pass separately.
- Hot-patched the running `devops-backend` container's `dist/server.js`
to allow the Tailscale origin (ephemeral; lost on next image build,
superseded by the code change above once rebuild works).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Session Checkpoint — 2026-05-30
Handoff snapshot for the next session. Read this top-to-bottom before touching anything — there's live infra state outside this repo that's material to the work in progress.
TL;DR
Roadmap items shipped this session: all of Phase 1, 2, 3, 5, 6, 7 of the v2 dashboard roadmap, plus 4 of 5 of the Phase 5 P2 mitigation roadmap. Phase 4 + Phase 8 are documented as delegation briefs (VM ops, not code).
But the live deployed dashboard is still running the pre-session image. Rebuilding it hit a pre-existing dual-lockfile issue (drift between pnpm-lock.yaml and backend/package-lock.json). That's the first thing to fix on resume so the rest of this session's work actually ships.
There's also a CORS hot-patch applied directly to the running
devops-backend container to unblock the user's browser tour. That
patch evaporates on the next image build/recreate.
What's live right now (running infra)
| Resource | State | Notes |
|---|---|---|
| Tailscale serve | UP | https://srv1491630.tailf85608.ts.net/ → localhost:3049 |
| `devops-backend` container | Up + healthy | Pre-session image (built ~2026-05-29) + a hot-patch in `dist/server.js` adding `https://srv1491630.tailf85608.ts.net` to CORS allow-list |
| `devops-web` container | Up | Pre-session image |
| `learning_ai_common_plat-platform-service-1` | Up + healthy | Restarted with new env pointing at real Cosmos DB |
| `learning_ai_common_plat-cosmos-emulator-1` | Stopped | Was crash-looping; replaced with real Cosmos |
| Real Cosmos DB account `cosmos-mywisprai` | Live | New `bytelyst` database created in `rg-mywisprai` (West US 2) |
To check on resume:
docker ps --filter name=devops --filter name=platform-service --filter name=cosmos
tailscale serve status
Credentials (this session's mint, change on first login)
- Dashboard URL: https://srv1491630.tailf85608.ts.net/login
- Email: `admin@bytelyst.local`
- Password: `cat /tmp/devin-mint-pw.txt` (random base64, 20 chars; rotate immediately)
- Product ID: `bytelyst-devops`
- User ID (in Cosmos): `usr_7fb3552c-3d8f-4fed-83e5-8461b018c345`
Backup minted JWT (24h, dashboard-backend JWT_SECRET, never used in the
end because the real auth flow took precedence): /tmp/devin-mint-jwt.txt.
Both files are in /tmp — survive shell exit, lost on reboot.
Cross-repo state
learning_ai_devops_tools (this repo)
Branch main. Pushed commits this session — 18 in total, most recently:
| SHA | Phase | Title |
|---|---|---|
| `eaaa545` | 6 + P2 close | trend cards, theme toggle, drop-root scaffold, Agents inventory, Phase 0 reconfirm |
| `74a8ee0` | 5 P2 | allow-list shell wrapper + projectPath validation + audit-log shell-outs |
| `a8cf61a` | 8 | Telegram convention + delegation brief |
| `14c7a8f` | 6 | severity alerts + per-instance actions + URL-param deep links |
| `efdf41f` | 4 + 7 | Phase 4 brief + /hermes/ops requireAdmin |
| `62c0cd6` | 3.2 | Products pane on real service registry |
| `ad16b13` | 3.1 | hermes-telemetry contract + endpoint + 6 tests |
| `13e5e1c` | 5 P2 | Playwright E2E wired into Gitea CI |
| `1e64d75` | 5 P2 | structured pino logging + redaction |
| `c6ec1a0` | 5 P1 | privilege surface doc + /code-quality/check auth fix |
| `824f315` | 5 P1 | doc drift + dedupe deployment docs |
| `3fc471e` | 5 P1 | SSE TODO removed (dead fastify-sse-v2) |
| `8ba2dbd` | 5 P1 | 35 auth/csrf/health/orchestrator tests + coverage gate |
| `ecd1f20` | 2 | instance dimension across Mission Control |
| `1e64d75`, `c6ec1a0`, `824f315`, `3fc471e`, `8ba2dbd`, `cf5428a` | earlier in session | (see roadmap notes for full list) |
Uncommitted (will be in the same commit as this checkpoint):
- `dashboard/backend/src/server.ts`: CORS now env-driven via `EXTRA_CORS_ORIGINS`. Source-correct, typechecks. Not in the running image because the image rebuild is currently broken (see below).
- `dashboard/backend/package-lock.json`: regenerated to match `package.json`. Was the source of the rebuild error.
learning_ai_common_plat (sibling repo)
Branch main, 15 commits behind origin/main. Uncommitted, not pushed.
Working tree changes:
- `docker-compose.yml`: Cosmos emulator service replaced/disabled, all consuming services point at real Cosmos via `.env`. Long inline comment explains why.
- `.env`: gitignored, contains live Cosmos credentials. Do not commit.
- `.env.bak-pre-real-cosmos`: backup of the env file before I changed it, covered by the same gitignore. Delete once you're sure the real-Cosmos setup is here to stay.
Suggested next action there: rebase + commit the docker-compose.yml diff
once you've verified other dashboards (mindlyst, lysnrai, etc.) still
work without the emulator. They reference cosmos-emulator:8081 in
compose env vars and will need similar repointing.
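Before repointing the other services, it may help to list where the emulator is still referenced. The sketch below is one hedged way to do that: `findEmulatorRefs` is a hypothetical helper (not part of either repo) that scans compose file text for the `cosmos-emulator:8081` host and reports line numbers. The `COSMOS_ENDPOINT` variable in the usage note is an assumed name for illustration only.

```typescript
interface EmulatorRef {
  line: number; // 1-based line number in the compose file
  text: string; // trimmed content of that line
}

/** List non-comment lines of a compose file that still mention the emulator host. */
function findEmulatorRefs(
  composeText: string,
  needle: string = "cosmos-emulator:8081",
): EmulatorRef[] {
  return composeText
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((entry) => entry.text.includes(needle) && !entry.text.startsWith("#"));
}
```

Fed a compose file where line 3 reads `COSMOS_ENDPOINT: https://cosmos-emulator:8081` (an assumed variable name), the helper would return a single entry with `line: 3`; commented-out references are skipped.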
Real Cosmos DB layout
- Account: `cosmos-mywisprai` (West US 2, resource group `rg-mywisprai`)
- Existing databases: `mindlyst`, `lysnrai`, `mywisprai`, `invttrdg`
- New database added today: `bytelyst` (for platform-service)
- Collections in `bytelyst`: created automatically by platform-service's `COSMOS_AUTO_INIT=true` on startup
- Auto-seeded so far: `bytelyst-devops` product + the admin user above. All other 12 products (`lysnrai`, `mindlyst`, etc.) need re-seeding if their respective dashboards/services are expected to work.
What broke that needs fixing on resume
1. Backend Dockerfile build (BLOCKING the redeploy)
RUN npm ci --ignore-scripts # OK
RUN npm run build # fails: sh: tsc: not found
typescript is in devDependencies of package.json and present in
package-lock.json, but npm ci isn't actually installing it in the
Alpine builder stage. Cause unknown — could be:
- A `NODE_ENV=production` leaking into the build context
- `.npmrc` somehow excluding devDeps
- The Alpine Node 20 image's npm having a different behaviour
Investigation paths:
- `docker run --rm -it node:20-alpine sh` and reproduce `npm ci` from the lockfile manually
- Check whether `BYTELYST_PACKAGE_SOURCE=vendor` (compose default) is triggering an `.pnpmfile.cjs` hook that drops devDeps
- Just switch the Dockerfile to pnpm to align with the workspace
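The first two suspected causes can be checked mechanically. The sketch below is a hypothetical diagnostic (`devDepOmissionReasons` is not an existing helper in this repo) that looks for the settings npm is documented to treat as "omit devDependencies": `NODE_ENV=production`, an `omit=dev` or legacy `production=true` config, whether set through `npm_config_*` environment variables or an `.npmrc`. It does not cover the `.pnpmfile.cjs` theory, and it is an assumption that one of these is the actual culprit.

```typescript
type Env = Record<string, string | undefined>;

/** Return reasons why `npm ci` might be skipping devDependencies (empty if none found). */
function devDepOmissionReasons(env: Env, npmrcText: string = ""): string[] {
  const reasons: string[] = [];

  // npm's `omit` config defaults to "dev" when NODE_ENV is "production".
  if (env["NODE_ENV"] === "production") {
    reasons.push("NODE_ENV=production");
  }

  // npm also reads config from npm_config_* environment variables (case-insensitive).
  for (const key of Object.keys(env)) {
    const name = key.toLowerCase();
    const value = env[key] ?? "";
    const omitsDev = value.split(",").map((v) => v.trim()).includes("dev");
    if (name === "npm_config_omit" && omitsDev) reasons.push(`${key}=${value}`);
    if (name === "npm_config_production" && value === "true") reasons.push(`${key}=true`);
  }

  // Scan .npmrc for key=value lines, skipping blanks and comments.
  for (const line of npmrcText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#") || trimmed.startsWith(";")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq).trim().toLowerCase();
    const value = trimmed.slice(eq + 1).trim();
    if ((key === "omit" && value === "dev") || (key === "production" && value === "true")) {
      reasons.push(`.npmrc: ${trimmed}`);
    }
  }
  return reasons;
}
```

Run inside the builder stage with the real environment and `.npmrc` contents, an empty result would point away from the first two causes and toward the lockfile or hook theories. If a reason does show up, `npm ci --include=dev` is one possible override worth trying.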
2. Web Dockerfile likely has the same dual-lockfile drift
Haven't verified — but dashboard/web/package-lock.json exists alongside
dashboard/pnpm-lock.yaml. Expect the same npm ci failure when web is
rebuilt. Worth checking in the same pass.
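To find every place this drift can occur in one pass, a small scan over the repo's file list is enough. The sketch below is a hypothetical helper (`findDualLockfileDirs` does not exist in the repo) that takes repo-relative paths, for example the output of `git ls-files`, and returns each directory whose `package-lock.json` sits at or below a directory containing `pnpm-lock.yaml`.

```typescript
/** True when `path` is exactly `name` or ends with `/name`. */
function hasBasename(path: string, name: string): boolean {
  return path === name || path.endsWith(`/${name}`);
}

/** Directory part of a repo-relative path ("." for top-level files). */
function dirOf(path: string): string {
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "." : path.slice(0, slash);
}

/** Directories with a package-lock.json that are covered by a pnpm-lock.yaml at or above them. */
function findDualLockfileDirs(paths: string[]): string[] {
  const pnpmRoots = paths.filter((p) => hasBasename(p, "pnpm-lock.yaml")).map(dirOf);
  const npmDirs = paths.filter((p) => hasBasename(p, "package-lock.json")).map(dirOf);

  const isInside = (dir: string, root: string): boolean =>
    root === "." || dir === root || dir.startsWith(`${root}/`);

  return npmDirs.filter((dir) => pnpmRoots.some((root) => isInside(dir, root))).sort();
}
```

With the three lockfile paths named in this checkpoint (`dashboard/pnpm-lock.yaml`, `dashboard/backend/package-lock.json`, `dashboard/web/package-lock.json`), the helper would flag both `dashboard/backend` and `dashboard/web`.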
3. CORS allow-list is hot-patched, not built in
The running devops-backend container has a sed-applied edit to
dist/server.js to allow the Tailscale origin. Lost on next image
build. The source fix is committed (this commit) but won't take effect
until the rebuild works. Workaround: keep hot-patching until rebuild is
unblocked, OR set EXTRA_CORS_ORIGINS=https://srv1491630.tailf85608.ts.net
via env at runtime (the new code reads it).
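For orientation, the env-driven behaviour described here could look roughly like the sketch below. This is not the code in `backend/src/server.ts`; the function names are hypothetical, the built-in origin URLs (including the `localhost` port) are assumptions, and echoing the request origin back when `*` is configured is one possible design choice rather than the confirmed one.

```typescript
// Built-in origins per the checkpoint; the exact URLs and port are assumptions.
const BUILT_IN_ORIGINS: readonly string[] = [
  "http://localhost:3049",
  "https://devops.bytelyst.com",
];

/** Split a comma-separated EXTRA_CORS_ORIGINS value into trimmed, non-empty entries. */
function parseExtraOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

/** Decide which CORS headers to send for a given request Origin (hypothetical helper). */
function corsHeadersFor(
  requestOrigin: string | undefined,
  extraOriginsRaw: string | undefined,
): Record<string, string> {
  // Vary: Origin is always set so shared caches key responses per origin.
  const headers: Record<string, string> = { Vary: "Origin" };
  if (!requestOrigin) return headers;

  const extras = parseExtraOrigins(extraOriginsRaw);
  const allowed =
    extras.includes("*") || // wildcard for trusted-network deployments
    BUILT_IN_ORIGINS.includes(requestOrigin) ||
    extras.includes(requestOrigin);

  if (allowed) {
    // Echo the concrete origin rather than "*" so credentialed requests keep working.
    headers["Access-Control-Allow-Origin"] = requestOrigin;
  }
  return headers;
}
```

Under these assumptions, `corsHeadersFor("https://srv1491630.tailf85608.ts.net", process.env.EXTRA_CORS_ORIGINS)` would include an `Access-Control-Allow-Origin` header only when the env var lists that origin or `*`.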
4. The deployed dashboard is the OLD code
The user's "tour" of the dashboard right now shows none of this session's work. After the rebuild is unblocked:
cd /opt/bytelyst/learning_ai_devops_tools/dashboard
EXTRA_CORS_ORIGINS=https://srv1491630.tailf85608.ts.net docker compose up -d --build --force-recreate backend web
…or via build args / env file. New env var: EXTRA_CORS_ORIGINS.
Open delegation work (not blockers for code)
- `docs/prompts/phase4-bheem-uma-parity.md`: VM ops: Uma backup repo + watchdog + restore drill. Requires sudo + Uma GitHub PAT + Uma Telegram bot. Closes 4 of 5 Bheem-only warnings in the ops panel.
- `docs/prompts/phase8-telegram-loop.md`: VM ops + bot tokens. Gated on Phase 4. Closes the dashboard-warning → Telegram delivery loop.
Carry-forward from Phase 5 P2 mitigation roadmap
In dashboard/DEPLOYMENT.md "Mitigation roadmap":
- ✅ Allow-list wrapper around shell-outs
- ✅ Validate `/code-quality/check`'s `projectPath`
- ✅ Audit-log every privileged shell-out
- ✅ Non-root backend container (scaffolded, default-off pending host file permissions)
- ❌ P3 still open: replace raw `docker.sock` with verb-restricted daemon. Worth a design doc before code.
How to verify on resume
# 1. Confirm the dashboard URL still works
curl -fsS -o /dev/null -w "tailscale dashboard: %{http_code}\n" \
https://srv1491630.tailf85608.ts.net/login
# 2. Confirm platform-service is healthy on real Cosmos
docker exec learning_ai_common_plat-platform-service-1 \
node -e 'fetch("http://localhost:4003/health").then(r=>r.text()).then(console.log)'
# 3. Confirm the admin user still exists and login works
PW=$(cat /tmp/devin-mint-pw.txt)
docker exec -e PW="$PW" learning_ai_common_plat-platform-service-1 sh -c '
node -e "
fetch(\"http://localhost:4003/api/auth/login\",{
method:\"POST\",
headers:{\"content-type\":\"application/json\"},
body:JSON.stringify({email:\"admin@bytelyst.local\",password:process.env.PW,productId:\"bytelyst-devops\"})
}).then(async r => { console.log(\"login:\", r.status); })
"'
# 4. Confirm CORS hot-patch is still in place
docker exec devops-backend grep tailf85608 dist/server.js
# Expect: 'https://devops.bytelyst.com', 'https://srv1491630.tailf85608.ts.net',
What I'd do first on the next session
- Fix the backend Dockerfile rebuild. Probably switch to pnpm or debug the `npm ci` devDep issue. Once that works:
- Rebuild + redeploy backend + web with `EXTRA_CORS_ORIGINS=https://srv1491630.tailf85608.ts.net`. This brings all the Phase 1-7 work live and replaces the hot-patch.
- Verify the user can use the dashboard end-to-end with the new UI.
- Delete `/tmp/devin-mint-jwt.txt` (no longer needed once auth works).
- Help the user rotate `admin@bytelyst.local`'s password via the new UI.
- Then return to whatever was next: re-seed other products, work on Phase 5 P3 (docker daemon proxy), or let the user drive.
— end checkpoint —