docs(devops): capture azure vm and scaling readiness gaps

saravanakumardb1 2026-03-23 16:10:02 -07:00
parent 661bc9953a
commit fa1adf829c
2 changed files with 143 additions and 0 deletions

@@ -182,6 +182,43 @@ That blocker must be resolved before we can claim full local E2E completion.
---
## 5.2 Remaining Gaps From Local Mac Validation
The local rehearsal on this Mac has proven enough to validate the **host-side** registry model, but it has **not** yet proven the full VM-ready or K8s-ready path.
### Gaps still open on this Mac
- Docker / BuildKit package installs from the local Homebrew Gitea instance still fail
- the Homebrew Gitea `ROOT_URL` / package tarball URL behavior is still fragile across host and Docker boundaries
- local Gitea Actions has not yet been validated end-to-end for the package build → publish → consumer flow
- the FlowMonk pilot is currently validated only for backend/web host-side consumption; Docker is still blocked and mobile remains outside the registry-backed pilot slice
- not all `@bytelyst/*` packages have been revalidated under the local Gitea registry with a fresh consumer install after the `pnpm pack` publish flow changes
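One way to make the tarball URL gap observable is to inspect the package metadata the registry serves and flag any version whose advertised tarball URL points at a loopback host, since such a URL resolves to the container itself during a Docker build. The sketch below assumes the standard npm metadata layout (`versions[<version>].dist.tarball`); the function names and the loopback list are illustrative, not part of the existing tooling.

```python
from urllib.parse import urlparse

# Hostnames that resolve to the container itself when used inside Docker.
# Assumption: extend this set if the local Gitea advertises another host-only name.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def tarball_urls(packument: dict) -> dict:
    """Map each published version to its advertised dist.tarball URL."""
    return {
        version: meta["dist"]["tarball"]
        for version, meta in packument.get("versions", {}).items()
        if "tarball" in meta.get("dist", {})
    }


def loopback_tarballs(packument: dict) -> dict:
    """Return only the versions whose tarball URL points at a loopback host."""
    return {
        version: url
        for version, url in tarball_urls(packument).items()
        if urlparse(url).hostname in LOOPBACK_HOSTS
    }
```

Running `loopback_tarballs` against the metadata document for each `@bytelyst/*` package would give a quick signal that `ROOT_URL` is still advertising a host-only address. An empty result does not by itself prove that Docker builds will succeed.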
### Why this matters for Azure
Azure should be a replication step, not a redesign step.
That means the Azure VM rollout should wait until the remaining local gaps are cleared for:
- package metadata correctness
- tarball URL correctness
- Docker consumer correctness
- local CI correctness
If we skip those validations locally, the first Azure VM trial becomes a debugging environment instead of a deployment environment.
### Concrete local-exit criteria before Azure
Before starting Azure VM rollout, we should be able to demonstrate all of the following on this Mac:
- one Gitea deployment shape whose package URLs work for both host installs and Docker builds
- one publish path for `@bytelyst/*` packages that works repeatably with `pnpm pack`
- one pilot repo that installs from the registry on the host and inside Docker without fallback tarballs
- one local Gitea Actions path that can build/publish/install with the same registry assumptions
- one documented rollback path that cleanly returns a pilot repo to tarball-based Docker consumption if needed
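The "same registry assumptions" criterion is easier to hold if the host, Docker, and CI all consume one generated `.npmrc` rather than hand-edited copies. The following is a minimal sketch under stated assumptions: the `/api/packages/<owner>/npm/` path follows Gitea's documented npm registry layout and should be verified against the installed version, while the owner name, base URL, and `GITEA_TOKEN` variable name are hypothetical.

```python
from urllib.parse import urlparse


def render_npmrc(base_url: str, owner: str, scope: str = "@bytelyst",
                 token_env: str = "GITEA_TOKEN") -> str:
    """Render scoped-registry .npmrc lines for a Gitea-style npm registry."""
    # Assumed registry path layout; confirm against the running Gitea version.
    registry = f"{base_url.rstrip('/')}/api/packages/{owner}/npm/"
    parsed = urlparse(registry)
    # Auth entries are keyed by the registry URL without its scheme.
    auth_key = f"//{parsed.netloc}{parsed.path}"
    lines = [
        f"{scope}:registry={registry}",
        # The ${VAR} placeholder is written literally and expanded by the
        # client at read time, so the token itself never lands in the file.
        f"{auth_key}:_authToken=${{{token_env}}}",
    ]
    return "\n".join(lines) + "\n"
```

Because `base_url` is the only input that differs between environments, a registry URL that works on the host but not inside Docker shows up as a single changed argument instead of divergent config files.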
---
## 6. Migration Strategy
### Stage A — Local registry rehearsal
@@ -463,6 +500,34 @@ After the local rehearsal is green, Azure should follow the same validated recip
Azure should be a replication step, not a redesign step.
### What Azure still needs beyond current local proof
Even though the codebase is already designed to be highly configurable, Azure VM rollout still requires a few validations that have not been fully cleared by the local Mac rehearsal yet:
- Gitea running in a deployment shape where package tarball URLs are stable for both host consumers and containerized consumers
- registry-backed Docker builds for the pilot repo without `docker-prep.sh`
- local Gitea Actions or equivalent host-runner proof for package build/publish/install
- a final decision on whether mobile-facing `@bytelyst/*` packages stay on local `file:` links during the pilot phase or join the registry migration in a later wave
- one Azure-ready secrets model for Gitea token handling, service envs, and registry auth that maps cleanly from local env vars to VM secrets/config files
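For the secrets item, one pattern that maps cleanly from local env vars to VM config files is to let every secret come either from a variable or from a file named by a companion `<NAME>_FILE` variable. This is a sketch of that convention only; the helper name and the `GITEA_TOKEN` example are assumptions, and it is not a description of how the services currently load secrets.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a secret from NAME, or from the file referenced by NAME_FILE."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        # Local development: the secret is set directly in the environment.
        return value
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # VM deployment: the secret lives in a root-readable file on disk.
        return Path(file_path).read_text(encoding="utf-8").strip()
    raise KeyError(f"neither {name} nor {name}_FILE is set")
```

On the Mac, `GITEA_TOKEN` could be exported directly; on the VM, `GITEA_TOKEN_FILE` could point at a mounted secret file, and the calling code would stay the same.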
### What should require minimal change when moving to Azure
Most of the ecosystem is already implemented so that scale-up is largely configuration-driven once the registry and image flows are proven.
Expected low-change areas:
- service code already externalizes most environment-specific values via env/config
- Compose service definitions already map cleanly to Kubernetes Deployments, Services, Ingress, ConfigMaps, and Secrets
- Traefik, readiness probes, service ports, and namespace separation are already documented in a K8s-friendly way
- scaling stateless services should mostly mean changing replica counts, resource requests/limits, and HPA settings
- moving from single-node K3s to multi-node K3s or managed Kubernetes should mostly reuse the same manifests with infra-level adjustments
What is **not** yet proven enough to call low-change:
- the package-distribution layer for Docker/K8s image builds
- the exact image build/publish flow for the full ecosystem after registry migration
- the complete repo-by-repo removal of tarball-based Docker prep
---
## 14. Definition Of Done

@@ -669,6 +669,84 @@ spec:
---
## 5.1 Scaling-Readiness Reality Check
The ecosystem code and infra design are already **mostly** aligned for a low-effort scaling path.
That does **not** mean the whole end-to-end deployment path is proven yet.
### What is already designed for low-change scaling
- service identity, ports, and inter-service URLs are already env/config driven in most backends and dashboards
- Compose service boundaries already map cleanly to Kubernetes Deployments + Services
- Traefik-based routing already maps cleanly to Kubernetes Ingress resources
- resource requests/limits and replica counts are already represented in a K8s-friendly way in the documented manifest examples
- the namespace split (`infra`, `platform`, `products`, `web`) already gives a usable organizational model for K8s
- moving from 1 replica to N replicas for stateless services should mostly be infra config work, not application rewrites
- moving from Docker Desktop K8s to K3s or later managed K8s should mostly reuse the same manifest model
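To illustrate what "infra config work, not application rewrites" means in practice, the sketch below builds a Deployment manifest as a plain dictionary in which the replica count is the only input that changes between a single-replica and an N-replica rollout. The service name, image, namespace default, and resource values are placeholders, not values taken from the real manifests.

```python
import json


def deployment(name: str, image: str, port: int, replicas: int = 1,
               namespace: str = "platform") -> dict:
    """Build a minimal apps/v1 Deployment for a stateless service."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": replicas,  # the only field that changes when scaling out
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Placeholder sizing; real values need load validation.
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "128Mi"},
                            "limits": {"cpu": "500m", "memory": "256Mi"},
                        },
                    }],
                },
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

Note that the image reference is still an input here, which is where the unproven registry and image build path described in the next section comes in.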
### What is not yet proven enough to call low-change
- the shared package distribution path for container builds after the Gitea npm registry migration
- the pilot Docker build path from the local Gitea registry
- the final removal of `docker-prep.sh` / tarball fallback across the wider ecosystem
- the complete image build/publish/deploy workflow for all repos under one consistent registry strategy
### Practical interpretation
If we solve the package registry + image build path cleanly, then scaling the running system should mostly mean:
- increasing replica counts
- applying HPAs where justified
- adjusting resource requests and limits
- moving from local-path storage to stronger persistent storage where needed
- adding worker nodes or moving to a managed cluster
That is the sense in which the ecosystem is already mostly configurable and scale-friendly.
The remaining work is concentrated more in the **build and package-distribution layer** than in the application service code.
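For the HPA item above, it helps to know the core calculation before choosing thresholds. The Kubernetes documentation gives the basic rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that rule plus min/max clamping; it deliberately ignores stabilization windows, missing metrics, and pod readiness, and the default bounds are arbitrary.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale by the metric ratio, then clamp to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas averaging 90% CPU against a 60% target yields 3 replicas, while 3 replicas at 63% against the same target stay at 3 because the ratio is within tolerance.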
---
## 5.2 Remaining Gaps Before Azure VM / K3s Rollout
### Local Mac rehearsal gaps still open
- local Homebrew Gitea package metadata and tarball URLs are not yet reliable across both host installs and Docker builds
- pilot Docker builds from the local Gitea registry are still blocked
- local Gitea Actions has not yet been fully validated for the package publish + consumer install path
- the pilot registry migration is only validated for FlowMonk backend/web host-side usage, not end-to-end across Docker and all surfaces
### Azure VM readiness gaps
Before calling the Azure VM rollout low-risk, we still need one validated answer for each of these:
- where Gitea runs and what `ROOT_URL` / package URL shape it advertises
- how Docker image builds authenticate to and consume the package registry
- whether package builds happen on the VM directly, inside CI runners, or via a dedicated package-publish pipeline
- how image publishing is handled for K3s / later multi-node rollout
- which repos are fully registry-backed versus still temporarily using tarball fallback during migration
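The last question, which repos are registry-backed versus still on fallback, can be answered mechanically from each repo's `package.json`. This sketch classifies `@bytelyst/*` dependencies by their version specifier; treating `file:` and `link:` specifiers as "local" is an assumption about how the fallback is expressed, and the package names used with it are hypothetical.

```python
# Specifier prefixes treated as non-registry sources (assumption).
LOCAL_PREFIXES = ("file:", "link:")


def classify_scoped_deps(package_json: dict, scope: str = "@bytelyst/") -> dict:
    """Split scoped dependencies into registry-backed and local sources."""
    result = {"registry": [], "local": []}
    for section in ("dependencies", "devDependencies"):
        for name, spec in package_json.get(section, {}).items():
            if not name.startswith(scope):
                continue  # third-party packages are out of scope here
            kind = "local" if spec.startswith(LOCAL_PREFIXES) else "registry"
            result[kind].append(name)
    # Sort for stable, diff-friendly output across runs.
    return {kind: sorted(names) for kind, names in result.items()}
```

A repo could be called fully registry-backed when its `local` list is empty, which gives the migration a per-repo status that can be checked in CI.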
### K3s / scaling readiness gaps
The application architecture is largely ready for scale-by-configuration, but these operational items still need proof or stronger artifacts:
- concrete K8s manifests or Helm values for the full ecosystem, not just examples
- image registry strategy for K3s and later multi-node or managed Kubernetes rollout
- persistent volume strategy for Gitea, Grafana, Loki, Azurite, and any stateful product workloads
- autoscaling thresholds and resource defaults validated under representative load
- deployment ordering and health-check gating for infra → platform → products → web surfaces
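The ordering and gating item can be sketched as a loop that deploys one tier at a time and stops at the first tier that fails its health check. The tier names mirror the infra → platform → products → web order above; the `deploy` and `healthy` callables are hypothetical hooks that a real rollout would back with `kubectl` or Helm calls and readiness polling with a timeout.

```python
from typing import Callable, List, Sequence

# Rollout order taken from the gap list above.
TIERS = ("infra", "platform", "products", "web")


def rollout(deploy: Callable[[str], None],
            healthy: Callable[[str], bool],
            tiers: Sequence[str] = TIERS) -> List[str]:
    """Deploy tiers in order; return the tiers that deployed and passed health."""
    passed: List[str] = []
    for tier in tiers:
        deploy(tier)
        if not healthy(tier):
            # Stop here so later tiers never start against a broken dependency.
            break
        passed.append(tier)
    return passed
```

If the returned list is shorter than `tiers`, the first missing tier is the one to investigate, and nothing after it has been touched.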
### Recommended order to close these gaps
1. clear the local Gitea package URL and Docker build problem
2. validate one full pilot path: package publish → Docker build → runtime start
3. validate the same path in local Gitea Actions
4. choose the image registry / image distribution pattern for Azure VM and K3s
5. generate and validate concrete K8s/Helm artifacts for the platform + one pilot product first
---
## 6. K3s Practice Exercises (on single VM)
These exercises simulate real production scenarios: