Commit Graph

18 Commits

Author SHA1 Message Date
saravanakumardb1
f78d382d62 fix(infra): add Azurite + Loki to health check script
- Azurite blob storage (:10000) was missing from check-health.sh
- Loki log aggregation (:3100/ready) was missing from check-health.sh
- Now covers all 30 compose services + Gitea + Ollama = 32 endpoints
2026-03-24 13:08:12 -07:00
saravanakumardb1
1a1f7dd55c fix(infra): harden setup.sh — pre-flight checks, pipefail safety, RAM management
- Add pre-flight disk space + memory checks after root validation
- Add --batch --yes to gpg dearmor calls (idempotent on re-run)
- Fix jq abort on malformed Gitea token response (|| echo guard)
- Wrap pnpm install/build in if-blocks with explicit fail() messages
- Stop Ollama during Phase 7 Docker builds to free ~3 GB RAM
- Restart Ollama after Phase 7 builds complete (before Phase 8 health check)
2026-03-24 13:06:05 -07:00
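Note: the pre-flight disk and memory checks described in this commit could look roughly like the sketch below. The thresholds, the check_min helper, and the messages are assumptions for illustration; the commit does not state the real values or function names used in setup.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical thresholds; the commit message does not give the real ones.
MIN_DISK_KB=$((20 * 1024 * 1024))   # 20 GB free on /
MIN_MEM_KB=$((8 * 1024 * 1024))     # 8 GB total RAM

# check_min LABEL HAVE_KB NEED_KB
# Returns 0 when HAVE_KB >= NEED_KB, otherwise prints a reason and returns 1.
check_min() {
  local label="$1" have_kb="$2" need_kb="$3"
  if [ "$have_kb" -lt "$need_kb" ]; then
    echo "pre-flight: ${label} too low (${have_kb} KB < ${need_kb} KB)" >&2
    return 1
  fi
  echo "pre-flight: ${label} ok"
}

# Free space on the root filesystem, in KB (POSIX df output, 4th column).
disk_kb="$(df -Pk / | awk 'NR==2 {print $4}')"
check_min "disk space" "$disk_kb" "$MIN_DISK_KB" || echo "a real script would abort here"

# Total RAM in KB; /proc/meminfo exists only on Linux, so guard the read.
if [ -r /proc/meminfo ]; then
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  check_min "memory" "$mem_kb" "$MIN_MEM_KB" || echo "a real script would abort here"
fi
```

Keeping the comparison in a small function that takes plain numbers makes the check testable without needing a machine that is actually low on disk or RAM.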
saravanakumardb1
c2ca7f53b4 fix(infra): harden setup.sh from independent audit findings
- Replace deprecated NodeSource curl|bash with modern GPG key + apt source
- Add build-essential + python3 to apt deps (native addons: better-sqlite3)
- Add --if-present to pnpm -r build (defensive: skip workspace members without build script)
- Fix README: remove stale proxy stripping reference from Phase 3
- Add Known Limitations section: remote browser access, ARM VM, memory pressure
- Remove AUDIT_PROMPT.md (served its purpose)
2026-03-24 12:56:43 -07:00
saravanakumardb1
35021b67b9 docs(infra): fix stale service count (27→30), update prompt.md + README.md for Codex agent readiness
- prompt.md: mark tasks 1-3 as DONE, add 'Current State' section listing
  all implemented features, update bugs-fixed table (16 items), fix service
  count in architecture diagram, add CLI reference, remove stale --frozen-lockfile
- README.md: add Resume & Retry section with examples, add CLI Flags table,
  fix service count in title/phases, update build failure troubleshooting
  with build log paths and retry command
- setup.sh: fix '27 services' → '30 services' in header comment and banner
2026-03-24 12:35:59 -07:00
saravanakumardb1
acbab75aaa docs(infra): add complete CLI reference, examples, and phase docs to setup.sh + ECOSYSTEM_DEPLOYMENT.md
setup.sh header now includes:
- All 6 CLI flags (--resume, --resume-from, --phase, --reset, --status, --help)
- Phase descriptions (1-8)
- 6 usage examples (fresh install, retry, resume, jump, status, reset)
- Resume/retry explanation with state dir and build log paths

ECOSYSTEM_DEPLOYMENT.md now includes:
- Single-VM Bootstrap section with quick start
- Resume & Retry examples
- Phase table
- Per-service build & fallback explanation
- Health check script reference
2026-03-24 12:24:16 -07:00
saravanakumardb1
b634708da8 fix(infra): make ollama model pull non-fatal in setup.sh
ollama pull piped through tail with set -euo pipefail would abort the
entire 8-phase setup on a slow network or wrong model name. Only
LocalMemGPT needs the model — the other 9 products are unaffected.
2026-03-24 12:20:13 -07:00
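Note: the failure mode in this commit comes from set -euo pipefail treating a failed command at the head of a pipeline as a failure of the whole pipeline. A minimal sketch of one way to make such a step non-fatal follows. The run_nonfatal helper is hypothetical, and `false` stands in for the real `ollama pull` call so the example runs anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_nonfatal LABEL CMD [ARGS...]
# Runs CMD, shows only the last few lines of its output, and downgrades a
# failure to a warning. Using the pipeline as an `if` condition keeps
# `set -e` from aborting the script when pipefail reports CMD's failure.
run_nonfatal() {
  local label="$1"; shift
  if "$@" 2>&1 | tail -n 5; then
    echo "ok: ${label}"
  else
    echo "warn: ${label} failed; continuing" >&2
  fi
}

# `false` is a stand-in for a pull that fails (slow network, wrong model name).
run_nonfatal "model pull" false
echo "setup continues"
```

An `|| true` suffix on the pipeline would also keep the script alive, but the if/else form leaves room for a warning that names the step that failed.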
saravanakumardb1
a3f4c6facf fix(infra): fix sequential phase gap + add phase 7 guards
1. last_completed_phase now stops at first gap — prevents --resume from
   skipping phases when --phase=N created non-sequential markers
2. Phase 7 fails early if .env.ecosystem is missing (points to --phase=6)
3. Warns if compose config JSON cache fails — graceful degradation
2026-03-24 12:17:45 -07:00
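Note: item 1 above can be sketched as a loop that walks the phases in order and stops at the first missing marker. The marker file naming (phase-N.done) and the function signature are assumptions; only the name last_completed_phase and the phase range 1-8 come from the commit history.

```shell
#!/usr/bin/env bash
set -euo pipefail

# last_completed_phase STATE_DIR
# Prints the highest phase N such that markers 1..N all exist.
# A marker that sits after a gap (for example one left by --phase=7) is ignored.
last_completed_phase() {
  local state_dir="$1" last=0 n
  for n in 1 2 3 4 5 6 7 8; do
    [ -f "${state_dir}/phase-${n}.done" ] || break
    last="$n"
  done
  echo "$last"
}

# Demo: phases 1 and 2 finished, then --phase=7 was run on its own.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/phase-1.done" "${demo_dir}/phase-2.done" "${demo_dir}/phase-7.done"
echo "resume after phase $(last_completed_phase "$demo_dir")"   # prints: resume after phase 2
rm -rf "$demo_dir"
```

Taking the maximum marker number instead would report 7 here, and a resume would then skip phases 3 through 6.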
saravanakumardb1
a9414218ba fix(infra): fix 5 bugs in setup.sh per-service build + resume logic
1. set -e + pipefail: docker compose up piped through tail would abort
   script on partial startup failure before printing summary — add || true
2. Phase 7 marked done even with build failures, so --resume would skip
   it — now only marks done when all builds succeed
3. --phase=7 printed 'Phase 7 complete' even with failures — now exits
   with code 1 and points to build logs
4. docker compose config --format json called 30 times in build loop —
   now cached once (saves ~3s)
5. Build logs now saved per-service to STATE_DIR/builds/<svc>.log for
   post-failure debugging
2026-03-24 12:13:14 -07:00
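Note: items 2, 3, and 5 above describe a build loop that logs per service, keeps going after a failure, and marks the phase done only when every build succeeds. A minimal sketch under assumptions follows: build_all, the marker file name, and the fake_build stand-in are hypothetical, and a real script would pass a wrapper around `docker compose build <service>` as the build command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_all STATE_DIR BUILD_CMD SERVICE...
# Runs BUILD_CMD once per service, saving output to STATE_DIR/builds/<svc>.log.
# A failed build is recorded and skipped; the loop continues with the rest.
# The phase marker is written only when no build failed.
build_all() {
  local state_dir="$1" build_cmd="$2"; shift 2
  local failed=0 svc
  mkdir -p "${state_dir}/builds"
  for svc in "$@"; do
    if "$build_cmd" "$svc" > "${state_dir}/builds/${svc}.log" 2>&1; then
      echo "built: ${svc}"
    else
      echo "FAILED: ${svc} (see ${state_dir}/builds/${svc}.log)"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ] || return 1
  touch "${state_dir}/phase-7.done"   # assumed marker name
}

# Stand-in for a real build command; fails for one service only.
fake_build() { echo "building $1"; [ "$1" != "broken-svc" ]; }

demo_dir="$(mktemp -d)"
build_all "$demo_dir" fake_build api broken-svc web \
  || echo "phase 7 incomplete; marker not written"
rm -rf "$demo_dir"
```

Because the marker is absent after a partial failure, a later resume would run the phase again instead of skipping it.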
saravanakumardb1
8ff9e42817 feat(infra): add resume/retry, per-service build, and fallback to setup.sh
- --resume: auto-detect last completed phase and continue from there
- --resume-from=N: resume from a specific phase
- --phase=N: run only one phase (e.g. --phase=7 to retry deploy)
- --reset: clear phase markers and start fresh
- --status: show completed phases
- Phase 7 now builds each of 27 services individually with progress
- Failed builds are skipped; remaining services still start
- Phase completion markers stored in /opt/bytelyst/.setup-state/
- GITEA_NPM_TOKEN auto-restored from saved state on resume
2026-03-24 12:03:55 -07:00
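Note: the flags listed in this commit could be parsed along the lines of the sketch below. The parse_args function, its output format, and the exit code are assumptions for illustration; the flag names and the phase range 1-8 come from the commit history.

```shell
#!/usr/bin/env bash
set -euo pipefail

# parse_args [FLAG...]
# Prints the selected mode, followed by the phase number when the flag takes one.
# With no flags the mode is "fresh". Unknown flags and out-of-range phases
# return status 2.
parse_args() {
  local mode="fresh" n="" arg
  for arg in "$@"; do
    case "$arg" in
      --resume) mode="resume" ;;
      --reset)  mode="reset" ;;
      --status) mode="status" ;;
      --resume-from=*|--phase=*)
        mode="${arg%%=*}"; mode="${mode#--}"   # "--phase=7" -> "phase"
        n="${arg#*=}"                          # "--phase=7" -> "7"
        case "$n" in
          [1-8]) ;;
          *) echo "phase must be 1-8, got: '${n}'" >&2; return 2 ;;
        esac
        ;;
      *) echo "unknown flag: ${arg}" >&2; return 2 ;;
    esac
  done
  echo "${mode}${n:+ ${n}}"
}

parse_args --phase=7   # prints: phase 7
parse_args --resume    # prints: resume
```

Validating the phase number at parse time means a typo such as --phase=9 fails before any phase runs.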
saravanakumardb1
c0bc13e10a fix(infra): improve setup.sh publish error handling: distinguish real failures from 409 conflicts
2026-03-24 11:56:26 -07:00
saravanakumardb1
52b424937a refactor(infra): remove proxy-stripping sed from setup.sh; Dockerfiles are clean at source
2026-03-24 11:17:02 -07:00
saravanakumardb1
c8a196de58 docs(infra): add bugs-already-fixed section to Codex handoff prompt
2026-03-24 11:04:11 -07:00
saravanakumardb1
ddd2db848e fix(infra): 6 bugs in setup.sh: jfrog sed, apt source, token fallback, log file
2026-03-24 11:02:16 -07:00
saravanakumardb1
6abf13d983 docs(infra): add Codex agent handoff prompt for VM setup
2026-03-24 10:53:20 -07:00
saravanakumardb1
7c34cee0ab feat(infra): install Ollama + full raw-VM bootstrap in setup.sh
2026-03-24 10:47:20 -07:00
saravanakumardb1
2b9fd71740 fix(docker): make proxy optional in dashboard Dockerfiles, strip proxy in VM setup
2026-03-24 10:35:00 -07:00
saravanakumardb1
3b31709b47 fix(infra): add extra_hosts for Linux, improve env example docs, harden setup.sh
2026-03-24 10:26:47 -07:00
saravanakumardb1
2458a9d3b0 feat(infra): add single Azure VM bootstrap script + README
2026-03-24 10:14:16 -07:00