Commit Graph

988 Commits

Author SHA1 Message Date
saravanakumardb1
fc12a8eaa2 feat(devops): add Local LLM Lab to ecosystem deployment
- docker-compose.ecosystem.yml: add llmlab-dashboard service (port 3075)
- setup.sh: add learning_ai_local_llms as 12th repo
- README.md: update to 31 services, 11 products, add Docker vs K8s recommendation
- docker/README.md: update port map, phase descriptions
- prompt.md: update repo list and service counts
2026-03-27 00:10:40 -07:00
saravanakumardb1
e3f638d609 docs(AGENTS): update __LOCAL_LLMs references — extracted to learning_ai_local_llms repo 2026-03-26 23:29:59 -07:00
saravanakumardb1
70fdc6b279 feat(devops): add Gitea CI (act_runner) to Azure VM setup
- Phase 2: install act_runner binary, register with Gitea, create systemd service
- Phase 3: push all 11 repos to VM Gitea after cloning from GitHub
- Expanded Gitea API token scopes (write:repository, write:user)
- Runner config: host mode, capacity 2, GITEA_NPM_TOKEN injected
- Enables CI on the VM for NETWORK!=corp usage
2026-03-26 23:19:37 -07:00
saravanakumardb1
aa139d5021 feat(ci): add auto-publish job for @bytelyst/* packages + update migration doc
- Add publish-packages job to CI workflow (runs after build-and-test)
- Publish 13 remaining packages to Gitea (56 total, up from 43)
- Update act_runner token to read+write scope
- Fix package counts throughout migration doc (43 → 56)
- Update CI status: all 10/10 repos now have CI workflows
- Add package inventory section (§15.1)
2026-03-26 23:18:05 -07:00
saravanakumardb1
409144a2ef chore(scripts): add lint-infra, typecheck-all, test-all cross-repo scripts 2026-03-26 23:15:16 -07:00
saravanakumardb1
f8c0da5c2a chore: retrigger CI after act_runner restart 2026-03-26 23:03:46 -07:00
saravanakumardb1
b6348fd4fe fix(security): harden npm publish — add .npmrc + publishConfig to all 57 packages
- Created .npmrc with @bytelyst scoped registry pointing to local Gitea
- Added publishConfig.registry to all 57 @bytelyst/* package.json files
- Created scripts/harden-publish-config.sh for future re-runs
- Prevents accidental publish to npmjs.org or corporate JFrog registry
2026-03-26 21:51:05 -07:00
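A minimal sketch of the publishConfig hardening described in b6348fd4fe, written in Python rather than as the actual scripts/harden-publish-config.sh. The registry URL (including the `bytelyst` owner segment) and the package name `@bytelyst/example` are assumptions for illustration; only the `@bytelyst` scope and port 3300 appear in this log.

```python
import json

# Assumption: placeholder URL following Gitea's npm registry path layout;
# the real host and owner are not stated in the commit message.
ASSUMED_REGISTRY = "http://localhost:3300/api/packages/bytelyst/npm/"


def harden_publish_config(pkg: dict, registry: str = ASSUMED_REGISTRY) -> dict:
    """Return a copy of a package.json dict with publishConfig.registry pinned."""
    if not pkg.get("name", "").startswith("@bytelyst/"):
        return pkg  # packages outside the scope are left untouched
    hardened = dict(pkg)
    publish_config = dict(hardened.get("publishConfig", {}))
    publish_config["registry"] = registry  # overrides any other publish target
    hardened["publishConfig"] = publish_config
    return hardened


# Hypothetical package used only for demonstration.
example = harden_publish_config({"name": "@bytelyst/example", "version": "1.0.0"})
print(json.dumps(example, indent=2))
```

Pinning the registry in each package.json means `npm publish` targets it even when a user-level .npmrc points elsewhere, which matches the stated goal of preventing accidental publishes.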
saravanakumardb1
911539f228 fix(workflows): support all repos in agent doc sync 2026-03-24 16:10:41 -07:00
saravanakumardb1
e8d145a130 fix(workflows): normalize repo management coverage 2026-03-24 16:05:12 -07:00
saravanakumardb1
5ba9518722 docs: update Gitea registry docs for NETWORK-aware GITEA_NPM_HOST
- GITEA_NPM_REGISTRY_MIGRATION.md: update .npmrc examples, add home
  row to network topology table, note switch-network.sh sets the host
- SINGLE_VM_DEPLOYMENT.md: consolidate .npmrc example to show unified
  ${GITEA_NPM_HOST}:3300 pattern (host-side + Docker-side)
- GITEA_LOCAL_CI.md: add NPM registry host note to Key Settings
2026-03-24 15:57:20 -07:00
saravanakumardb1
f1793656f8 feat(scripts): make GITEA_NPM_HOST conditional on NETWORK
- NETWORK=corp → GITEA_NPM_HOST=localhost (local Gitea Docker)
- NETWORK=home → GITEA_NPM_HOST from ~/.gitea_vm_host (Azure VM)
- Fallback: localhost if ~/.gitea_vm_host doesn't exist

This enables all repo .npmrc files to use ${GITEA_NPM_HOST}:3300
instead of hardcoded localhost:3300, matching the existing
.npmrc.docker pattern used during Docker builds.
2026-03-24 15:45:59 -07:00
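The host selection in f1793656f8 can be sketched as follows. This is an illustrative Python rendering, not the shell script itself; treating an empty file or an unknown NETWORK value as `localhost` is an assumption beyond what the commit states.

```python
import tempfile
from pathlib import Path


def resolve_gitea_npm_host(network: str, vm_host_file: Path) -> str:
    """Pick the Gitea npm registry host for the given NETWORK value."""
    if network == "home" and vm_host_file.is_file():
        host = vm_host_file.read_text().strip()
        if host:
            return host  # Azure VM host saved in the file
    # NETWORK=corp, a missing file, or (assumed) any other case: local Gitea
    return "localhost"


# Demonstration with a temporary file standing in for ~/.gitea_vm_host.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / ".gitea_vm_host"
print(resolve_gitea_npm_host("home", demo_file))  # file missing: falls back
demo_file.write_text("vm.example.internal\n")     # hypothetical host name
print(resolve_gitea_npm_host("home", demo_file))
print(resolve_gitea_npm_host("corp", demo_file))
```

In real use the second argument would be `Path.home() / ".gitea_vm_host"`, and the result would fill the `${GITEA_NPM_HOST}:3300` pattern mentioned above.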
saravanakumardb1
6fbe8687ee fix(scripts): switch-network.sh — add NO_PROXY + GITEA_NPM_TOKEN management
- Add NO_PROXY/no_proxy/NPM_CONFIG_NOPROXY=localhost,127.0.0.1 when
  NETWORK=corp so local services (Gitea npm registry, Cosmos emulator,
  Azurite) bypass the corporate proxy. Previously NO_PROXY was only set
  in .zshrc line 5, making the script not self-contained.
- Add GITEA_NPM_TOKEN auto-load from ~/.gitea_npm_token file
  (regardless of NETWORK). Reads are public, but publish needs the
  token. This ensures local pnpm install resolves @bytelyst/* auth.
- Unset NO_PROXY/no_proxy/NPM_CONFIG_NOPROXY when NETWORK=home.
2026-03-24 15:36:46 -07:00
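The environment handling described in 6fbe8687ee might look like this when modelled as a pure function. The function name and the dict-in, dict-out shape are hypothetical; the variable names and values come from the commit message.

```python
from typing import Dict, Optional

BYPASS_VARS = ("NO_PROXY", "no_proxy", "NPM_CONFIG_NOPROXY")
LOCAL_HOSTS = "localhost,127.0.0.1"


def switch_network_env(
    env: Dict[str, str], network: str, saved_token: Optional[str]
) -> Dict[str, str]:
    """Return a new environment reflecting the NETWORK switch."""
    updated = dict(env)
    if network == "corp":
        # Local services must bypass the corporate proxy.
        for name in BYPASS_VARS:
            updated[name] = LOCAL_HOSTS
    elif network == "home":
        # No proxy at home, so the bypass list is removed.
        for name in BYPASS_VARS:
            updated.pop(name, None)
    if saved_token:
        # Loaded regardless of NETWORK (contents of ~/.gitea_npm_token).
        updated["GITEA_NPM_TOKEN"] = saved_token.strip()
    return updated


print(switch_network_env({}, "corp", "example-token\n"))
```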
saravanakumardb1
32522b218a fix(k8s): setup-k8s.sh — fail phase 3 on build errors, fix non-root crash
- Phase 3 now exits with error if any image builds fail, preventing
  mark_phase_done from running. Previously it just warned and continued,
  which could lead to phase 5 deploying with missing images.
- Moved mkdir from top-level scope into mark_phase_done(). The old
  top-level mkdir -p /opt/bytelyst/.setup-state-k8s crashed non-root
  invocations (--status, --help) due to set -e + permission denied.
- Fixed header comment: 'containerd' → 'Docker runtime' (we use --docker).
- Added --resume to header usage block (was supported but undocumented).
2026-03-24 14:52:53 -07:00
saravanakumardb1
a25d6f7847 fix(k8s): remove YAML anchors that break across document separators
YAML anchors (&name/*name) are scoped per document. In multi-document
files (separated by ---), anchors defined in one document cannot be
referenced from another. This caused all backends/webs after the first
to fail kubectl apply with unknown alias errors.

Fixed by inlining envFrom, resources, and labels in every Deployment.
2026-03-24 14:51:48 -07:00
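To illustrate the per-document anchor scoping behind a25d6f7847, here is a rough lint sketch. It is a regex heuristic, not a YAML parser: it ignores quoting, comments, and definition order, and the manifest below is a made-up example.

```python
import re
from typing import List, Tuple

ANCHOR = re.compile(r"&([A-Za-z0-9_-]+)")
ALIAS = re.compile(r"(?<![\w*])\*([A-Za-z0-9_-]+)")


def find_dangling_aliases(multi_doc_yaml: str) -> List[Tuple[int, str]]:
    """List (document index, alias) pairs whose anchor is not in the same document."""
    dangling = []
    documents = re.split(r"^---[ \t]*$", multi_doc_yaml, flags=re.MULTILINE)
    for index, doc in enumerate(documents):
        defined = set(ANCHOR.findall(doc))
        for alias in ALIAS.findall(doc):
            if alias not in defined:
                dangling.append((index, alias))
    return dangling


manifests = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  labels: &common-labels
    app: first
spec:
  template:
    metadata:
      labels: *common-labels
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels: *common-labels
"""

print(find_dangling_aliases(manifests))  # [(1, 'common-labels')]
```

Document indices are zero-based, so the result flags the second Deployment, which is the failure mode the commit describes for every backend and web after the first.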
saravanakumardb1
8a568932b4 feat(infra): add production-grade k3s Kubernetes setup for single VM
Complete K8s deployment alternative to Docker Compose, targeting
~50 beta users on a Standard_D8s_v5 Azure VM (8 vCPU, 32 GB RAM).

setup-k8s.sh (6 phases):
  1. Pre-flight: verify docker phases 1-5 ran, disk/RAM checks
  2. Install k3s: Docker runtime, NodePort range 1024-32767
  3. Build images: docker compose build + tag as bytelyst/<svc>
  4. Config: namespaces, ConfigMap (3 copies), Secrets (JWT + blob keys), Ollama
  5. Deploy: infra -> platform -> dashboards -> products (ordered)
  6. Health check: 32 endpoints + kubectl pod status

K8s manifests (18 files):
  - 4 namespaces (infra, platform, dashboards, products)
  - 6 infra (cosmos StatefulSet+PVC, azurite StatefulSet+PVC,
    mailpit, loki StatefulSet+PVC, grafana+PVC, ollama external)
  - 3 platform (Deployment+Service+NodePort each)
  - 2 dashboards (Deployment+Service+NodePort each)
  - 10 backends + 9 webs (all with readiness+liveness probes,
    resource limits, product-specific NEXT_PUBLIC_* env vars)

Design decisions:
  - k3s --docker: reuses existing Docker images, no containerd import
  - Same ports as Docker Compose (NodePort with extended range)
  - ConfigMap replaces .env.ecosystem, copied to 3 app namespaces
  - Blob storage keys injected at deploy time via Secret (not in YAML)
  - Cross-namespace DNS: <svc>.<ns>.svc for service discovery
  - Ollama as Endpoints+Service pointing to host node IP
  - Resource limits: ~19 Gi total, fits in 32 GB with 13 GB headroom
  - Teardown: --teardown flag deletes namespaces, keeps k3s
2026-03-24 14:47:17 -07:00
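The cross-namespace DNS pattern `<svc>.<ns>.svc` from 8a568932b4 can be sketched as a small helper. The four namespace names come from the commit; the assumption here is that the Loki Service is named `loki` and keeps port 3100 as listed for the health check.

```python
NAMESPACES = ("infra", "platform", "dashboards", "products")


def service_host(service: str, namespace: str) -> str:
    """Build the in-cluster host name for a Service in another namespace."""
    if namespace not in NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace}")
    return f"{service}.{namespace}.svc"


def service_url(service: str, namespace: str, port: int) -> str:
    """Build a plain HTTP URL for a Service (scheme is an assumption)."""
    return f"http://{service_host(service, namespace)}:{port}"


# Assumed Service name "loki"; the real manifests may use a different name.
print(service_url("loki", "infra", 3100))  # http://loki.infra.svc:3100
```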
saravanakumardb1
7d0c469858 refactor(infra): reorganize single_azure_vm into docker/ and k8s/ subfolders
- Move setup.sh, README.md, prompt.md into docker/ subfolder
- Create top-level README.md comparing both approaches
- Create k8s/README.md with full design doc: k3s architecture,
  namespace strategy, manifest structure, ConfigMap/Secret design,
  Cosmos emulator StatefulSet, Ollama host service, resource limits,
  5-phase implementation plan, and kubectl cheat sheet
2026-03-24 14:11:50 -07:00
saravanakumardb1
3b2d6391b9 fix(compose): add FlowMonk NEXT_PUBLIC_API_URL now that product-config supports it 2026-03-24 13:52:55 -07:00
saravanakumardb1
40731e06f4 docs(infra): update prompt.md with 15 new bug fixes and stale corrections
- Added 15 recent fixes to the Bugs Already Fixed table
- Fixed line count (~940 → ~990)
- Fixed stale lysnrai-web → lysnrai-dashboard in architecture diagram
- Fixed test plan service count (27+ → 30+)
- Updated constraint: compose/Dockerfile changes allowed with verification
2026-03-24 13:49:17 -07:00
saravanakumardb1
d64ea4fba7 fix(infra): add cd path to banner compose logs command
The banner showed bare COMPOSE_FILE filename without the directory,
making the command unusable via copy-paste. Now shows the cd first.
2026-03-24 13:48:05 -07:00
saravanakumardb1
01f2276aa8 fix(compose): correct NEXT_PUBLIC_* env var names per product code
Each product web app reads different env var names in product-config.ts.
The compose file was using generic NEXT_PUBLIC_BACKEND_URL and
NEXT_PUBLIC_PLATFORM_URL for all 9 web services, but most products
use different names. This caused SSR (server-side rendering) to miss
the correct backend/platform URLs.

Corrected per product:
- lysnrai-dashboard: PLATFORM_SERVICE_URL (server-side, not NEXT_PUBLIC)
- chronomind-web: NEXT_PUBLIC_BACKEND_URL + NEXT_PUBLIC_PLATFORM_SERVICE_URL
- jarvisjr-web: NEXT_PUBLIC_PLATFORM_SERVICE_URL (no backend client)
- flowmonk-web: NEXT_PUBLIC_PLATFORM_URL (backend is hardcoded)
- notelett-web: NEXT_PUBLIC_NOTES_API_URL + NEXT_PUBLIC_PLATFORM_SERVICE_URL
- mindlyst-web: NEXT_PUBLIC_PLATFORM_SERVICE_URL
- nomgap-web: NEXT_PUBLIC_NOMGAP_API_URL + NEXT_PUBLIC_PLATFORM_SERVICE_URL
- actiontrail-web: NEXT_PUBLIC_API_URL + NEXT_PUBLIC_PLATFORM_URL
- localmemgpt-web: already correct (unchanged)
2026-03-24 13:47:13 -07:00
saravanakumardb1
e928ec6025 fix(infra): audit round 2 — token guard, frozen-lockfile, build cache, docs
- Add require_gitea_token() guard — fail early with actionable message
  if GITEA_NPM_TOKEN is empty after restore (prevents silent failures
  in Phase 4/5/7)
- Wire require_gitea_token() into phase4_build and setup_compose_env
- Remove --frozen-lockfile from admin-web + tracker-web Dockerfiles
  (Docker context is missing services/ and scripts/ workspace members;
  Phase 4 reconciles lockfile so --frozen-lockfile is unnecessary)
- Add docker builder prune after Phase 7 builds (reclaim 20-40 GB)
- Update README: pre-flight thresholds, Ollama stop/restart behavior,
  Loki + Azurite in port map, updated memory pressure note
2026-03-24 13:37:21 -07:00
saravanakumardb1
1a8697d8ed fix(infra): fix last stale service count comment (27→30) in setup.sh 2026-03-24 13:18:12 -07:00
saravanakumardb1
f78d382d62 fix(infra): add Azurite + Loki to health check script
- Azurite blob storage (:10000) was missing from check-health.sh
- Loki log aggregation (:3100/ready) was missing from check-health.sh
- Now covers all 30 compose services + Gitea + Ollama = 32 endpoints
2026-03-24 13:08:12 -07:00
saravanakumardb1
1a1f7dd55c fix(infra): harden setup.sh — pre-flight checks, pipefail safety, RAM management
- Add pre-flight disk space + memory checks after root validation
- Add --batch --yes to gpg dearmor calls (idempotent on re-run)
- Fix jq abort on malformed Gitea token response (|| echo guard)
- Wrap pnpm install/build in if-blocks with explicit fail() messages
- Stop Ollama during Phase 7 Docker builds to free ~3 GB RAM
- Restart Ollama after Phase 7 builds complete (before Phase 8 health check)
2026-03-24 13:06:05 -07:00
saravanakumardb1
c2ca7f53b4 fix(infra): harden setup.sh from independent audit findings
- Replace deprecated NodeSource curl|bash with modern GPG key + apt source
- Add build-essential + python3 to apt deps (native addons: better-sqlite3)
- Add --if-present to pnpm -r build (defensive: skip workspace members without build script)
- Fix README: remove stale proxy stripping reference from Phase 3
- Add Known Limitations section: remote browser access, ARM VM, memory pressure
- Remove AUDIT_PROMPT.md (served its purpose)
2026-03-24 12:56:43 -07:00
saravanakumardb1
35021b67b9 docs(infra): fix stale service count (27→30), update prompt.md + README.md for Codex agent readiness
- prompt.md: mark tasks 1-3 as DONE, add 'Current State' section listing
  all implemented features, update bugs-fixed table (16 items), fix service
  count in architecture diagram, add CLI reference, remove stale --frozen-lockfile
- README.md: add Resume & Retry section with examples, add CLI Flags table,
  fix service count in title/phases, update build failure troubleshooting
  with build log paths and retry command
- setup.sh: fix '27 services' → '30 services' in header comment and banner
2026-03-24 12:35:59 -07:00
saravanakumardb1
acbab75aaa docs(infra): add complete CLI reference, examples, and phase docs to setup.sh + ECOSYSTEM_DEPLOYMENT.md
setup.sh header now includes:
- All 6 CLI flags (--resume, --resume-from, --phase, --reset, --status, --help)
- Phase descriptions (1-8)
- 6 usage examples (fresh install, retry, resume, jump, status, reset)
- Resume/retry explanation with state dir and build log paths

ECOSYSTEM_DEPLOYMENT.md now includes:
- Single-VM Bootstrap section with quick start
- Resume & Retry examples
- Phase table
- Per-service build & fallback explanation
- Health check script reference
2026-03-24 12:24:16 -07:00
saravanakumardb1
b634708da8 fix(infra): make ollama model pull non-fatal in setup.sh
ollama pull piped through tail with set -euo pipefail would abort the
entire 8-phase setup on a slow network or wrong model name. Only
LocalMemGPT needs the model — the other 9 products are unaffected.
2026-03-24 12:20:13 -07:00
saravanakumardb1
a3f4c6facf fix(infra): fix sequential phase gap + add phase 7 guards
1. last_completed_phase now stops at first gap — prevents --resume from
   skipping phases when --phase=N created non-sequential markers
2. Phase 7 fails early if .env.ecosystem is missing (points to --phase=6)
3. Warns if compose config JSON cache fails — graceful degradation
2026-03-24 12:17:45 -07:00
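The first fix in a3f4c6facf (stop at the first gap) can be sketched like this. The function name mirrors the one in the commit, but the signature and the set-based representation of markers are assumptions; the total of 8 phases is taken from other entries in this log.

```python
from typing import Set


def last_completed_phase(done: Set[int], total_phases: int = 8) -> int:
    """Return the highest phase N such that phases 1..N are all marked done."""
    last = 0
    for phase in range(1, total_phases + 1):
        if phase not in done:
            break  # first gap: later markers are ignored
        last = phase
    return last


# Phases 1-3 ran in order, then --phase=7 left an isolated marker.
print(last_completed_phase({1, 2, 3, 7}))  # 3
```

Taking the maximum marker instead would return 7 here, so a resume would skip phases 4 to 6, which is the behaviour the fix prevents.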
saravanakumardb1
a9414218ba fix(infra): fix 5 bugs in setup.sh per-service build + resume logic
1. set -e + pipefail: docker compose up piped through tail would abort
   script on partial startup failure before printing summary — add || true
2. Phase 7 marked done even with build failures, so --resume would skip
   it — now only marks done when all builds succeed
3. --phase=7 printed 'Phase 7 complete' even with failures — now exits
   with code 1 and points to build logs
4. docker compose config --format json called 30 times in build loop —
   now cached once (saves ~3s)
5. Build logs now saved per-service to STATE_DIR/builds/<svc>.log for
   post-failure debugging
2026-03-24 12:13:14 -07:00
saravanakumardb1
8ff9e42817 feat(infra): add resume/retry, per-service build, and fallback to setup.sh
- --resume: auto-detect last completed phase and continue from there
- --resume-from=N: resume from a specific phase
- --phase=N: run only one phase (e.g. --phase=7 to retry deploy)
- --reset: clear phase markers and start fresh
- --status: show completed phases
- Phase 7 now builds each of 27 services individually with progress
- Failed builds are skipped; remaining services still start
- Phase completion markers stored in /opt/bytelyst/.setup-state/
- GITEA_NPM_TOKEN auto-restored from saved state on resume
2026-03-24 12:03:55 -07:00
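One way to read the flag semantics in 8ff9e42817 is as a function from flags to the list of phases to run. This is a sketch under assumptions: the precedence between flags and the total of 8 phases are not stated in this commit, and `phases_to_run` is a hypothetical name.

```python
from typing import List, Optional

TOTAL_PHASES = 8  # assumption, based on the "8-phase setup" mentioned elsewhere in this log


def phases_to_run(
    last_done: int,
    resume: bool = False,
    resume_from: Optional[int] = None,
    only_phase: Optional[int] = None,
) -> List[int]:
    """Translate the CLI flags into an ordered list of phases."""
    if only_phase is not None:       # --phase=N: run a single phase
        return [only_phase]
    if resume_from is not None:      # --resume-from=N: start at N
        start = resume_from
    elif resume:                     # --resume: continue after the last completed phase
        start = last_done + 1
    else:                            # no flags: fresh run
        start = 1
    return list(range(start, TOTAL_PHASES + 1))


print(phases_to_run(last_done=6, resume=True))   # [7, 8]
print(phases_to_run(last_done=6, only_phase=7))  # [7]
```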
saravanakumardb1
c0bc13e10a fix(infra): improve setup.sh publish error handling — distinguish real failures from 409 conflicts 2026-03-24 11:56:26 -07:00
saravanakumardb1
85aca5534b fix(docker): sync all 3 service Dockerfiles with complete workspace package.json list
platform-service had 16/60, extraction-service had 14/60, mcp-server had 34/60.
All three now list all 57 packages + 4 services + 2 dashboards + scripts.
Required for pnpm install --frozen-lockfile to resolve the full workspace.
2026-03-24 11:55:47 -07:00
saravanakumardb1
52b424937a refactor(infra): remove proxy-stripping sed from setup.sh — Dockerfiles are clean at source 2026-03-24 11:17:02 -07:00
saravanakumardb1
c8a196de58 docs(infra): add bugs-already-fixed section to Codex handoff prompt 2026-03-24 11:04:11 -07:00
saravanakumardb1
ddd2db848e fix(infra): 6 bugs in setup.sh — jfrog sed, apt source, token fallback, log file 2026-03-24 11:02:16 -07:00
saravanakumardb1
6abf13d983 docs(infra): add Codex agent handoff prompt for VM setup 2026-03-24 10:53:20 -07:00
saravanakumardb1
7c34cee0ab feat(infra): install Ollama + full raw-VM bootstrap in setup.sh 2026-03-24 10:47:20 -07:00
saravanakumardb1
2b9fd71740 fix(docker): make proxy optional in dashboard Dockerfiles, strip proxy in VM setup 2026-03-24 10:35:00 -07:00
saravanakumardb1
3b31709b47 fix(infra): add extra_hosts for Linux, improve env example docs, harden setup.sh 2026-03-24 10:26:47 -07:00
saravanakumardb1
2458a9d3b0 feat(infra): add single Azure VM bootstrap script + README 2026-03-24 10:14:16 -07:00
saravanakumardb1
eac1ba3faf fix(dashboards): upgrade Dockerfiles from node:20 to node:22-alpine 2026-03-24 10:05:11 -07:00
saravanakumardb1
25a1bd5187 fix(infra): add BuildKit secrets + GITEA_NPM_HOST to ecosystem compose 2026-03-24 10:02:40 -07:00
saravanakumardb1
3a840572bf chore(infra): add .env.ecosystem.example for ecosystem compose 2026-03-24 09:08:30 -07:00
saravanakumardb1
ed17f52d14 chore(infra): add .env.ecosystem.example for ecosystem compose 2026-03-24 09:04:27 -07:00
saravanakumardb1
c93cc491ff feat(infra): add docker-compose.ecosystem.yml for full ByteLyst stack 2026-03-24 09:01:26 -07:00
saravanakumardb1
d466b8a7c4 docs: clean stale sections in GITEA_NPM_REGISTRY_MIGRATION.md 2026-03-24 08:44:29 -07:00
saravanakumardb1
4d050696c1 fix(dashboards): add typecheck script, remove stale engines.node 20.x 2026-03-24 08:39:58 -07:00
saravanakumardb1
19a1fd8aa2 docs(pnpm): add MindLyst to migration tracker, update Gitea registry status
- Add learning_multimodal_memory_agents to Wave 3 (commit e0461c7)
- Replace stale Follow-up Validation section with completed Gitea registry status
- Update Summary: all 10 product repos + common-plat on pnpm with Gitea registry
2026-03-24 08:28:11 -07:00
saravanakumardb1
fee5e87052 docs: remove versioning refs and stale transition language from deployment docs
- Remove 'Supersedes' and 'What Changed' section from enhanced plan
- Rewrite Package-Manager Strategy (transition complete, all repos on pnpm)
- Remove docker-prep.sh prerequisites, .tarballs/ references, npm variants
- Replace Dockerfile templates with current Gitea registry-backed pattern
- Remove §11.1 Package-Manager Migration Roadmap (migration complete)
- Clean up §11.2 Gitea section (remove 'Current pain', comparison table)
- Clean up §12 audit findings (remove tarball references)
- Simplify §10 Dockerization table (remove transition columns)
- Update §5.1/5.2 to reflect validated state, not open gaps
- Fix v2 tag in K3s exercise to use semver 1.1.0
- Update Summary table with current state
2026-03-24 08:10:17 -07:00