docs: record VM container health fix
Some checks failed
pre-commit / pre-commit (push) Failing after 33s
parent e2db92f3b1
commit 5a2d92f519
@@ -64,7 +64,7 @@ These listeners were bound on `0.0.0.0` and/or `[::]` during review.
 | `3040` | `flowmonk-web` | `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | none found in Caddy | `needs-decision` | Unhealthy; classify as private/admin or retire |
 | `3049` | `devops-web` | `/opt/bytelyst/bytelyst-devops-tools/dashboard/docker-compose.yml` | `devops.bytelyst.com` | `private-admin` with direct bypass | Fix old repo path drift, then bind loopback/private |
 | `3050` | `mindlyst-web` | `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | none found in Caddy | `needs-decision` | Unhealthy; classify as private/admin or retire |
-| `3055` | `nomgap-web` | `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | none found in Caddy | `needs-decision` | Unhealthy; classify as private/admin or retire |
+| `3055` | `nomgap-web` | orphan from older `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | none found in Caddy | `retire` | Retired on 2026-05-27; current Compose says Nomgap web is deployed to Vercel |
 | `3060` | `actiontrail-web` | `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | none found in Caddy | `needs-decision` | Unhealthy; classify as private/admin or retire |
 | `3070` | `localmemgpt-web` | `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | none found in Caddy | `needs-decision` | Unhealthy; classify as private/admin or retire |
 | `3075` | `llmlab-dashboard` | `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml` | `llmlab.bytelyst.com` | `private-admin` with direct bypass | Dashboard unhealthy; gate or retire |
@@ -113,6 +113,7 @@ These listeners were bound on `0.0.0.0` and/or `[::]` during review.
 
 ## Drift / Follow-Up Findings
 
+- `nomgap-web` was an orphan from an older Compose revision, had no Caddy route, and was retired on 2026-05-27.
 - `devops-backend` runs from `/opt/bytelyst/learning_ai_devops_tools/dashboard/docker-compose.yml`.
 - `devops-web` runs from `/opt/bytelyst/bytelyst-devops-tools/dashboard/docker-compose.yml`, an older path. Align this before changing devops dashboard port bindings.
 - `gitea-npm-registry` has no Compose labels in Docker inspect output. Find its systemd/compose owner before changing `3300`.
@@ -397,7 +397,7 @@ Effective `sshd -T` settings showed:
 
 ### Phase 2 — Operational correctness
 
-- [ ] Fix/retire unhealthy containers.
+- [x] Fix/retire unhealthy containers.
 - [x] Resolve `hermes-root-backup.service` failed state.
 - [x] Decide and document Gitea runner active/disabled state.
 - [x] Add missing-script checks. Stale root cron path was fixed on 2026-05-27.
@@ -515,6 +515,31 @@ Minimum post-checks for Phase 1:
 
 - The detector currently covers root crontab and failed systemd units. Full ownership inventory still needs `/etc/cron.d`, user crontabs, Hermes cron, Gitea schedules, owners, outputs, and alert channels.
 
+### 2026-05-27 — Phase 2 unhealthy containers
+
+**Changed:**
+
+- Added `HOSTNAME=0.0.0.0` to six managed Next.js web services in `/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml`: `jarvisjr-web`, `flowmonk-web`, `mindlyst-web`, `actiontrail-web`, `localmemgpt-web`, and `llmlab-dashboard`.
+- Recreated those six services from existing images with `docker compose ... up -d --no-build`.
+- Retired the orphan `learning_ai_common_plat-nomgap-web-1` container. Current Compose already documents `nomgap-web` as deployed to Vercel and not part of the Docker stack.
+
+**Verified:**
+
+- `docker compose -f docker-compose.ecosystem.yml --env-file .env.ecosystem config --quiet` passed.
+- The six recreated web containers report Docker health `healthy`.
+- `docker ps --filter health=unhealthy` returns no containers.
+- Host-level smoke checks returned HTTP `200` for `3035`, `3040`, `3050`, `3060`, `3070`, and `3075`; retired orphan port `3055` is closed.
+- Host-permission `vm-health-check.sh --json` reports `container_health=OK`, `container_loops=OK`, `failed_units=OK`, and `cron_missing_paths=OK`.
+
+**Committed/pushed:**
+
+- `learning_ai_common_plat`: `af035e7d` (`fix: bind ecosystem Next apps on all interfaces`) pushed to GitHub.
+
+**Residual risk:**
+
+- Local Gitea mirror push for `learning_ai_common_plat` failed at Git HTTP transport even though fetch and health checks work; retry/fix mirror push separately.
+- This fixed health state, not public exposure. Several direct published ports remain to be loopback-bound or blocked in Phase 1.
+
 ## Do Not Start With
 
 - Rootless Docker migration.
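
The host-level smoke checks recorded under **Verified** could be repeated with something like the following. This is only a sketch: the port lists are copied from the notes above, but the helper names (`http_status`, `smoke_check`), the choice of `GET /` on `127.0.0.1`, and the 3-second timeout are assumptions for illustration, not the procedure actually used on the VM.

```python
import http.client
import urllib.error
import urllib.request

# Ports copied from the "Verified" notes; everything else here is hypothetical.
EXPECTED_200 = [3035, 3040, 3050, 3060, 3070, 3075]
EXPECTED_CLOSED = [3055]


def http_status(port, host="127.0.0.1", timeout=3.0):
    """Return the HTTP status of GET / on the port, or None if nothing answers."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: treat as closed.
        return None


def smoke_check(probe=http_status, expect_200=EXPECTED_200,
                expect_closed=EXPECTED_CLOSED):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    for port in expect_200:
        status = probe(port)
        if status != 200:
            failures.append(f"port {port}: expected 200, got {status}")
    for port in expect_closed:
        status = probe(port)
        if status is not None:
            failures.append(f"port {port}: expected closed, got {status}")
    return failures
```

Calling `smoke_check()` on the host and checking for an empty list would reproduce the recorded result. The `probe` parameter exists so the logic can be exercised without touching real ports. Note that this only covers health, as the residual-risk notes say; it does not test whether a port is reachable from outside the host.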