checkpoint(dashboard): session 2026-05-31 — Tier 1 closed, dashboard live
- Backend + web rebuilt from the previous session's Dockerfile fixes. - Phase 1-7 dashboard UI now actually live; the dist/server.js CORS hot-patch is retired (CORS is env-driven via EXTRA_CORS_ORIGINS). - Tailscale serve restored: caddy was bound to 0.0.0.0:443 and blocked tailscaled from claiming 100.87.53.10:443. Fixed via a one-line compose change in learning_ai_common_plat (commit c0db2901). - End-to-end login through real Cosmos verified at https://srv1491630.tailf85608.ts.net/login. Active-repo sweep results (clock, notes, flowmonk, invt_trdg) and HOLD repo triage are documented in the checkpoint. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This commit is contained in:
parent
254ef2704c
commit
4326001650
251
docs/SESSION_CHECKPOINT_2026-05-31.md
Normal file
251
docs/SESSION_CHECKPOINT_2026-05-31.md
Normal file
@ -0,0 +1,251 @@
|
||||
# Session Checkpoint — 2026-05-31
|
||||
|
||||
> Resumes from `SESSION_CHECKPOINT_2026-05-30.md`. The Tier-1 dashboard
|
||||
> blocker is now closed: the new image is live, the CORS hot-patch is
|
||||
> retired, and login through `https://srv1491630.tailf85608.ts.net/login`
|
||||
> works against real Cosmos.
|
||||
|
||||
## TL;DR
|
||||
|
||||
- Tier 1 ✅ — backend + web rebuilt and redeployed; Phase 1-7 dashboard
|
||||
UI is now actually live; CORS is env-driven (`EXTRA_CORS_ORIGINS`); the
|
||||
pre-session `dist/server.js` hot-patch is gone.
|
||||
- Tailscale serve was failing with `tlsv1 alert internal error` because
|
||||
Caddy was bound to `0.0.0.0:443`, blocking `tailscaled` from claiming
|
||||
`100.87.53.10:443`. Fixed by binding Caddy to the public eth0 IP only.
|
||||
- Active-repo sweep ✅ — `learning_ai_clock`, `learning_ai_notes`,
|
||||
`learning_ai_flowmonk`, `learning_ai_invt_trdg` all green
|
||||
(install / lint / typecheck / test / build).
|
||||
- HOLD-repo sweep ⚠️ — every HOLD repo fails at `pnpm install` due to
|
||||
registry/workspace issues, none of which are repo-level bugs. See
|
||||
details below; recommended single-fix listed.
|
||||
|
||||
## What's live right now
|
||||
|
||||
| Resource | State | Notes |
|
||||
|---|---|---|
|
||||
| Tailscale serve | UP | `https://srv1491630.tailf85608.ts.net/` → `localhost:3049` (web) |
|
||||
| `devops-backend` container | Up + healthy | **New image** `dashboard-backend:latest` `sha256:8a0c284f…`, built 2026-05-30T16:31Z. CORS is env-driven; `EXTRA_CORS_ORIGINS` is read at startup. **No** more `dist/server.js` hot-patch. |
|
||||
| `devops-web` container | Up | **New image** `dashboard-web:latest` `sha256:121f356f…`, built 2026-05-30T16:31Z. |
|
||||
| `caddy` container | Up | **Now bound to `187.124.159.82:80/443` only** (not `0.0.0.0`). Public api.bytelyst.com / devops.bytelyst.com routing intact. |
|
||||
| `learning_ai_common_plat-platform-service-1` | Up + healthy | Real Azure Cosmos (`cosmos-mywisprai`, db `bytelyst`). |
|
||||
| `learning_ai_common_plat-cosmos-emulator-1` | Started, then no longer in use | Started incidentally when ecosystem compose pulled it in (orphan); not consumed by platform-service. Safe to `docker stop` if you want it gone. |
|
||||
|
||||
Verify on resume:
|
||||
|
||||
```bash
|
||||
curl -fsS -o /dev/null -w "/login: %{http_code}\n" https://srv1491630.tailf85608.ts.net/login
|
||||
# Expect: 200
|
||||
|
||||
PW=$(cat /tmp/devin-mint-pw.txt)
|
||||
curl -sS -o /dev/null -w "auth: %{http_code}\n" \
|
||||
-X POST http://localhost:4003/api/auth/login \
|
||||
-H 'content-type: application/json' \
|
||||
-d "{\"email\":\"admin@bytelyst.local\",\"password\":\"$PW\",\"productId\":\"bytelyst-devops\"}"
|
||||
# Expect: 200 (JWT in body)
|
||||
|
||||
curl -sS -I -X OPTIONS http://localhost:4004/api/auth/login \
|
||||
-H "Origin: https://srv1491630.tailf85608.ts.net" \
|
||||
-H "Access-Control-Request-Method: POST" | grep -i access-control-allow-origin
|
||||
# Expect: access-control-allow-origin: https://srv1491630.tailf85608.ts.net
|
||||
```
|
||||
|
||||
## Credentials
|
||||
|
||||
- **Dashboard URL**: <https://srv1491630.tailf85608.ts.net/login>
|
||||
- **Email**: `admin@bytelyst.local`
|
||||
- **Password**: `cat /tmp/devin-mint-pw.txt` (still preserved; rotate via UI on first login)
|
||||
- **Product ID**: `bytelyst-devops`
|
||||
- **User ID** (Cosmos): `usr_7fb3552c-3d8f-4fed-83e5-8461b018c345`
|
||||
- `/tmp/devin-mint-jwt.txt` — already removed (was no longer needed).
|
||||
|
||||
## Commits pushed this session
|
||||
|
||||
### `learning_ai_common_plat`
|
||||
|
||||
| SHA | Title |
|
||||
|---|---|
|
||||
| `c0db2901` | `fix(infra): bind caddy to public eth0 IP only` |
|
||||
|
||||
### `learning_ai_devops_tools`
|
||||
|
||||
No new commits this session — the previous-session commit `254ef27`
|
||||
(`fix(dashboard): switch backend+web Dockerfiles to pnpm; add missing
|
||||
pino dep`) was already pushed and was sufficient. The build using that
|
||||
Dockerfile produced the now-running images.
|
||||
|
||||
### `learning_ai_flowmonk`
|
||||
|
||||
| SHA | Title |
|
||||
|---|---|
|
||||
| `4f68637` | `chore: refresh pnpm-lock.yaml` |
|
||||
|
||||
### `learning_ai_invt_trdg`
|
||||
|
||||
| SHA | Title |
|
||||
|---|---|
|
||||
| `33c4bb0` | `chore: fix lint regressions in secret-hygiene & security-guards` |
|
||||
|
||||
(Two stale lint guards: `backend check:secret-hygiene` was tripping on
|
||||
`KEY=${user.X}` template literals — fixed by quoting; `backend
|
||||
check:security-guards` regex was stale vs the new `makeAuthMiddleware`
|
||||
factory pattern — regex updated.)
|
||||
|
||||
### `learning_ai_clock`, `learning_ai_notes`
|
||||
|
||||
No commits — both already fully green (clock: 622 tests pass; notes:
|
||||
backend 380 + web 177 + mobile 97 = 654 tests pass), nothing to fix.
|
||||
|
||||
## Tier 1: dashboard rebuild + redeploy
|
||||
|
||||
What I actually had to do:
|
||||
|
||||
1. The Dockerfile fix was already in place from the previous session's
|
||||
final commit (`254ef27` — switch to pnpm, fix `tsc: not found`).
|
||||
2. `BYTELYST_PACKAGE_SOURCE=gitea docker compose build backend web` ran
|
||||
clean (entirely cached — confirms the Dockerfiles are correct).
|
||||
3. New image fingerprint check: `docker run --rm dashboard-backend:latest
|
||||
grep -c "EXTRA_CORS_ORIGINS" dist/server.js` → 3 hits ✅.
|
||||
4. `docker compose up -d --force-recreate backend web` — both healthy.
|
||||
5. Discovered `/login` was returning `TLS internal error` from
|
||||
tailscaled. Root cause: `caddy` was bound to `0.0.0.0:443`, so
|
||||
`tailscaled` couldn't claim `100.87.53.10:443`. (`journalctl -u
|
||||
tailscaled` showed `bind: address already in use`.)
|
||||
6. Fix: edited `learning_ai_common_plat/docker-compose.ecosystem.yml` to
|
||||
bind caddy on `187.124.159.82:80/443` (public eth0 IP) only. After
|
||||
`docker compose -f docker-compose.ecosystem.yml up -d caddy`,
|
||||
`tailscaled` claimed `100.87.53.10:443` and `/login` returned 200.
|
||||
7. Side effect: the ecosystem compose recreate also recreated
|
||||
`platform-service` with its ecosystem env (cosmos-emulator). Restored
|
||||
real-Cosmos by re-running `docker compose up -d --force-recreate
|
||||
platform-service` from `learning_ai_common_plat/` (the regular
|
||||
compose, which uses `.env` with the Azure Cosmos endpoint).
|
||||
|
||||
After all that, end-to-end login through real Cosmos works again.
|
||||
|
||||
## Active-repo sweep results
|
||||
|
||||
| Repo | Install | Lint | Typecheck | Test | Build | Commits |
|
||||
|---|---|---|---|---|---|---|
|
||||
| `learning_ai_clock` | ✅ | ✅ (1 unused-import warning) | ✅ | ✅ 622 tests | ✅ | none |
|
||||
| `learning_ai_notes` | ✅ | ✅ (React 19 warnings) | ✅ | ✅ 654 tests | ✅ | none |
|
||||
| `learning_ai_flowmonk` | ✅ | ✅ | ✅ | ✅ 412 tests | ✅ | `4f68637` |
|
||||
| `learning_ai_invt_trdg` | ✅ | ⚠️ web lint backlog (24 pre-existing errors) | ✅ | ✅ | ✅ | `33c4bb0` |
|
||||
| `learning_ai_common_plat` | (not re-swept — was already healthy and out of scope; only the caddy compose change touched) | | | | | `c0db2901` |
|
||||
| `learning_ai_devops_tools` | (no full sweep — focus was the dashboard build/deploy) | | | | | none |
|
||||
|
||||
### `learning_ai_invt_trdg` web lint backlog
|
||||
|
||||
24 pre-existing eslint errors in `@bytelyst/trading-web`. Categories
|
||||
(detail in subagent report, summary here):
|
||||
|
||||
- `react-hooks/immutability` (function-before-declaration) in
|
||||
`AuthContext.tsx:118`, `goals/GoalsAnalytics.tsx:11`,
|
||||
`contexts/AccountContext.tsx:85`, `hooks/useGlobalTradingControl.ts:10`
|
||||
- `react-hooks/preserve-manual-memoization` in `AccountContext.tsx:89`
|
||||
- `react-hooks/static-components` in `views/ScreenerView.tsx:258`
|
||||
- `@typescript-eslint/triple-slash-reference` in `web/vite.config.ts:1`
|
||||
- `@typescript-eslint/no-unused-vars` across many `web/e2e/*.spec.ts`
|
||||
files (mostly auto-fixable)
|
||||
|
||||
Failing command:
|
||||
```bash
|
||||
cd /opt/bytelyst/learning_ai_invt_trdg
|
||||
pnpm --filter @bytelyst/trading-web lint
|
||||
```
|
||||
|
||||
Recommendation: dedicated cleanup PR — start with
|
||||
`pnpm --filter @bytelyst/trading-web lint -- --fix` to clear the unused-vars
|
||||
backlog, then refactor the hoisting / memoization / component-in-render
|
||||
issues per file.
|
||||
|
||||
## HOLD-repo sweep results
|
||||
|
||||
**Global root cause: every HOLD repo fails at `pnpm install`.** Two
|
||||
intertwined issues:
|
||||
|
||||
1. The HOLD repos depend on `@bytelyst/*` packages and reference the
|
||||
shared workspace at relative path `../learning_ai_common_plat/...`.
|
||||
That relative path resolves to `/opt/bytelyst/HOLD/learning_ai_common_plat`,
|
||||
which doesn't exist (the real common_plat is at
|
||||
`/opt/bytelyst/learning_ai_common_plat`).
|
||||
2. The Gitea NPM registry returns either `401 Unauthorized` (token
|
||||
expired) or `ERR_PNPM_TARBALL_INTEGRITY` for the few packages that
|
||||
are reachable. Some lockfiles also reference a different registry
|
||||
owner (`bytelyst/npm/...`) than current `.npmrc`
|
||||
(`learning_ai_user/npm/...`).
|
||||
|
||||
| Repo | Stack | Install error |
|
||||
|---|---|---|
|
||||
| `learning_ai_efforise` | pnpm/TS (vite, React 19) | `ERR_PNPM_FETCH_401 @bytelyst/testing` |
|
||||
| `learning_ai_fastgap` | pnpm/TS (RN/Expo + backend) | `ERR_PNPM_LINKED_PKG_DIR_NOT_FOUND ../learning_ai_common_plat/packages/react-native-platform-sdk` |
|
||||
| `learning_ai_jarvis_jr` | pnpm/TS (web + backend) | `ENOENT .docker-deps/bytelyst-auth-0.1.5.tgz` (offline tarballs not staged) |
|
||||
| `learning_ai_local_llms` | pnpm/TS (dashboard) | `ERR_PNPM_WORKSPACE_PKG_NOT_FOUND @bytelyst/design-tokens@workspace:*` |
|
||||
| `learning_ai_local_memory_gpt` | pnpm/TS | `ERR_PNPM_TARBALL_INTEGRITY` (registry owner mismatch) |
|
||||
| `learning_ai_peakpulse` | pnpm/TS (backend + ios) | `ERR_PNPM_TARBALL_INTEGRITY` |
|
||||
| `learning_ai_trails` | pnpm/TS (web + backend + sdk) | `ERR_PNPM_FETCH_401 @bytelyst/testing` |
|
||||
| `learning_multimodal_memory_agents` | pnpm/TS (backend + mindlyst) | `ERR_PNPM_TARBALL_INTEGRITY` |
|
||||
| `learning_voice_ai_agent` | pnpm/TS + Python | pnpm: same registry pattern. Python: `pyaudio` build fails (needs `apt install portaudio19-dev`). |
|
||||
|
||||
No commits made — per the user's "totally acceptable to move on"
|
||||
guidance and given these are environment/relocation issues not in-repo
|
||||
bugs.
|
||||
|
||||
### Recommended single fix to unblock most HOLD repos
|
||||
|
||||
```bash
|
||||
# 1. Make the workspace path resolve from HOLD/.
|
||||
ln -s /opt/bytelyst/learning_ai_common_plat /opt/bytelyst/HOLD/learning_ai_common_plat
|
||||
|
||||
# 2. Refresh the Gitea NPM token; export it.
|
||||
export GITEA_NPM_TOKEN=<new-token>
|
||||
|
||||
# 3. Decide on a single registry owner (current .npmrc says
|
||||
# `learning_ai_user`, several lockfiles still say `bytelyst`).
|
||||
# Either re-publish under learning_ai_user, or revert the .npmrc
|
||||
# template to `bytelyst` for these repos. Then in each repo:
|
||||
pnpm install
|
||||
```
|
||||
|
||||
After that, retry the per-repo sweep. None of these repos showed
|
||||
in-repo bugs — they just need the surrounding env restored.
|
||||
|
||||
## Files / state changes outside git
|
||||
|
||||
- `/tmp/devin-mint-pw.txt` — preserved (still needed for first-login
|
||||
password change). Delete after the user has rotated the password
|
||||
through the dashboard UI.
|
||||
- `/tmp/devin-mint-jwt.txt` — already absent.
|
||||
- The previous session's `dist/server.js` CORS hot-patch is gone (the
|
||||
new image's source has env-driven CORS; we verified
|
||||
`grep -c tailf85608 dist/server.js` returns `0`, and
|
||||
`grep -c EXTRA_CORS_ORIGINS dist/server.js` returns `3`).
|
||||
|
||||
## Live dashboard verification
|
||||
|
||||
| Check | Command | Result |
|
||||
|---|---|---|
|
||||
| Tailscale `/login` | `curl -fsS https://srv1491630.tailf85608.ts.net/login` | **200 ✅** |
|
||||
| Backend health | `curl http://localhost:4004/health` | **200 ✅** |
|
||||
| Login (real Cosmos) | `POST http://localhost:4003/api/auth/login` with admin creds | **200 + valid JWT ✅** |
|
||||
| CORS preflight | `OPTIONS http://localhost:4004/api/auth/login Origin: https://srv1491630.tailf85608.ts.net` | **204 + `access-control-allow-origin: https://srv1491630.tailf85608.ts.net` ✅** |
|
||||
| Backend image origin | `docker inspect devops-backend` | sha matches the freshly-built `dashboard-backend:latest` ✅ |
|
||||
| Hot-patch retired | `docker exec devops-backend grep -c tailf85608 dist/server.js` | **0 ✅** (env-driven, not source-hardcoded) |
|
||||
|
||||
## Open / suggested next actions
|
||||
|
||||
1. **User**: log in at <https://srv1491630.tailf85608.ts.net/login> with
|
||||
the mint password and rotate it via the UI. Then `rm /tmp/devin-mint-pw.txt`.
|
||||
2. **`learning_ai_invt_trdg`**: dedicated lint-cleanup PR — first
|
||||
`pnpm --filter @bytelyst/trading-web lint -- --fix`, then refactor
|
||||
the hoisting/memoization/component-in-render issues.
|
||||
3. **HOLD repos**: run the recommended single-fix above, then re-sweep.
|
||||
4. **Phase 5 P3** (still open from last session's mitigation roadmap):
|
||||
replace raw `docker.sock` mount with a verb-restricted daemon proxy.
|
||||
5. **Phase 4 / Phase 8** delegations (VM ops + Telegram bot) — still
|
||||
open; documented at `docs/prompts/phase4-bheem-uma-parity.md` and
|
||||
`docs/prompts/phase8-telegram-loop.md`.
|
||||
6. The `cosmos-emulator` orphan container can be `docker stop`-ed if
|
||||
you want a clean `docker ps` (not consumed by anything currently).
|
||||
|
||||
— end checkpoint —
|
||||
Loading…
Reference in New Issue
Block a user