diff --git a/docs/runbooks/GITEA_VM_SETUP.md b/docs/runbooks/GITEA_VM_SETUP.md new file mode 100644 index 00000000..1b2996ea --- /dev/null +++ b/docs/runbooks/GITEA_VM_SETUP.md @@ -0,0 +1,405 @@ +# Gitea Cloud VM Setup — Runbook + +> **Status:** Active runbook · **Last verified:** 2026-05-27 +> **Use this when:** You have provisioned a cloud VM (Azure / wherever), Gitea +> is installed and running on `:3300`, repos are cloned, and you need to wire +> the npm registry end-to-end with your laptop. + +Assumes you're SSH'd into the VM (or running commands on the VM) and the +sibling `learning_ai_common_plat` repo is at `~/code/mygh/learning_ai_common_plat/` +on both the VM and your laptop. Adjust paths as needed. + +--- + +## Prerequisites checklist + +Before starting, confirm all of these on the **VM**: + +```bash +# 1. Gitea container is running and healthy +sudo docker ps | grep gitea +curl -fsS http://localhost:3300/api/v1/version +# Expected: {"version":"1.X.X"} + +# 2. Port 3300 is reachable from your laptop +# (run this FROM YOUR LAPTOP, not the VM) +# curl -fsS http://:3300/api/v1/version + +# 3. Repos cloned on the VM +ls ~/code/mygh/learning_ai_common_plat +# Expected: packages/ services/ scripts/ ... +``` + +If any of these fail, fix them first. 
Common gotchas: + +- **Port 3300 blocked:** Azure NSG → VM → Networking → "Add inbound port rule" → TCP 3300 from your home IP +- **Gitea registry disabled:** Edit `app.ini` inside `/var/lib/gitea/conf/`, add `[packages]\nENABLED = true`, then `sudo docker restart gitea` +- **`hostname` resolves to `localhost`** in the VM but you reach it via public DNS — note both, the API only needs `localhost` from inside the VM + +--- + +## Step 1 — Create Gitea admin user (skip if you already have one) + +Run **on the VM**: + +```bash +# Check if any admin exists +sudo docker exec gitea gitea admin user list + +# If empty (or you don't have admin creds), create one: +ADMIN_USER="gitea-admin" +ADMIN_PASS="$(openssl rand -base64 24 | tr -dc 'A-Za-z0-9' | head -c 24)" +echo " Admin password (save this!): $ADMIN_PASS" + +sudo docker exec gitea gitea admin user create \ + --username "$ADMIN_USER" \ + --password "$ADMIN_PASS" \ + --email "admin@bytelyst.local" \ + --admin \ + --must-change-password=false +``` + +**SAVE the admin password somewhere safe** (1Password / Bitwarden / macOS Keychain). +You'll need it only for token rotation and bootstrapping; day-to-day work uses the npm token. + +Optionally store in macOS Keychain on your laptop: + +```bash +# Run on your LAPTOP (Mac) +security add-generic-password \ + -s 'gitea-admin' \ + -a 'gitea-admin' \ + -w '' +``` + +Then `scripts/gitea/token.sh rotate` can auto-discover it later. + +--- + +## Step 2 — Create the npm owner user + +The npm registry is namespaced by owner. The canonical owner is `learning_ai_user`. 
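
Because the registry is namespaced by owner, the owner name becomes part of the registry URL. The sketch below shows how that URL is assembled, assuming Gitea's standard npm endpoint layout (`/api/packages/<owner>/npm/`) plus the plain-HTTP scheme and port 3300 used throughout this runbook. `gitea_npm_registry_url` is a hypothetical helper, not something the repo's scripts are known to define.

```bash
# Hypothetical helper: build the owner-scoped npm registry URL for a Gitea host.
gitea_npm_registry_url() {
  # $1 = host, $2 = owner, $3 = port (defaults to the runbook's 3300)
  printf 'http://%s:%s/api/packages/%s/npm/\n' "$1" "${3:-3300}" "$2"
}

gitea_npm_registry_url localhost learning_ai_user
# prints: http://localhost:3300/api/packages/learning_ai_user/npm/
```

A scoped `.npmrc` entry would typically point `@bytelyst:registry` at this URL; how `switch-network.sh` actually wires it up is not shown in this runbook.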
+ +Run **on the VM**: + +```bash +ADMIN_USER="gitea-admin" +ADMIN_PASS="" +NPM_USER="learning_ai_user" +NPM_PASS="$(openssl rand -base64 24 | tr -dc 'A-Za-z0-9' | head -c 24)" + +# Create the user via admin API +curl -fsS -u "$ADMIN_USER:$ADMIN_PASS" \ + -X POST "http://localhost:3300/api/v1/admin/users" \ + -H 'Content-Type: application/json' \ + -d "{ + \"username\": \"$NPM_USER\", + \"email\": \"npm@bytelyst.local\", + \"password\": \"$NPM_PASS\", + \"must_change_password\": false + }" +echo "" +echo " NPM user '$NPM_USER' created" +echo " NPM password (needed only to mint tokens): $NPM_PASS" +``` + +Save `NPM_PASS` temporarily — you need it for Step 3. After that the npm +token replaces it for all day-to-day use. + +If you see `{"message":"user already exists"}` — that's fine, skip ahead. +Use the existing NPM_PASS (or reset it via `sudo docker exec gitea gitea +admin user change-password --username learning_ai_user --password `). + +--- + +## Step 3 — Mint the npm token + +Run **on the VM**: + +```bash +NPM_USER="learning_ai_user" +NPM_PASS="" +TOKEN_NAME="npm-$(date +%Y%m%d-%H%M%S)-$(hostname -s)" + +RESPONSE=$(curl -fsS -u "$NPM_USER:$NPM_PASS" \ + -X POST "http://localhost:3300/api/v1/users/$NPM_USER/tokens" \ + -H 'Content-Type: application/json' \ + -d "{ + \"name\": \"$TOKEN_NAME\", + \"scopes\": [\"write:package\", \"read:package\"] + }") +echo "$RESPONSE" + +# Extract token (works for both newer "token" and older "sha1" field names) +TOKEN=$(echo "$RESPONSE" | grep -oE '"(sha1|token)":"[^"]+' | head -1 | sed 's/.*":"//') +echo "" +echo "════════════════════════════════════════" +echo " NPM TOKEN (copy this NOW):" +echo " $TOKEN" +echo "════════════════════════════════════════" +``` + +**Copy the token immediately.** Gitea never displays a token's secret value +after the first response. 
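
Since the secret cannot be retrieved later, it may be worth checking that the extraction actually produced a token before leaving the terminal. The following is a minimal sketch that reuses the same `grep`/`sed` pipeline as the Step 3 block and runs it against a made-up sample response. `extract_token` is a hypothetical helper, and the 40-character check relies on the token length quoted elsewhere in this runbook.

```bash
# Hypothetical helper: pull the token secret out of a token-creation response.
# Handles both the older "sha1" field and the newer "token" field.
extract_token() {
  printf '%s' "$1" | grep -oE '"(sha1|token)":"[^"]+' | head -n 1 | sed 's/.*":"//'
}

# Made-up sample response, for illustration only (not real Gitea output).
sample='{"id":1,"name":"npm-demo","sha1":"0123456789abcdef0123456789abcdef01234567","token_last_eight":"01234567"}'

token="$(extract_token "$sample")"
if [ "${#token}" -eq 40 ]; then
  echo "token looks plausible (40 chars)"
else
  echo "unexpected token length: ${#token} (was the request rejected?)" >&2
fi
```

An empty result usually means the response carried an error message instead of a token, in which case Step 3 should be re-run rather than saving a blank value.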
+ +--- + +## Step 4 — Wire the token into your laptop + +Run **on your LAPTOP** (not the VM): + +```bash +# Paste the token from Step 3 here: +TOKEN="" + +# Write to the home-network-specific token file +echo -n "$TOKEN" > ~/.gitea_npm_token_home +chmod 600 ~/.gitea_npm_token_home + +# Also update the catch-all file (used by some scripts as fallback) +echo -n "$TOKEN" > ~/.gitea_npm_token +chmod 600 ~/.gitea_npm_token + +ls -la ~/.gitea_npm_token* +``` + +Expected output: two files, both `-rw-------`, 40 chars. + +--- + +## Step 5 — Tell switch-network.sh about the VM hostname + +Run **on your LAPTOP**: + +```bash +# Replace with your VM's public DNS or IP +echo "" > ~/.gitea_vm_host +# Example: echo "bytelyst-vm.eastus.cloudapp.azure.com" > ~/.gitea_vm_host + +cat ~/.gitea_vm_host +``` + +Then refresh your shell environment so `GITEA_NPM_HOST` is exported: + +```bash +source ~/.zshrc +echo "NETWORK=$NETWORK HOST=$GITEA_NPM_HOST OWNER=$GITEA_NPM_OWNER" +``` + +If you're on home network, expected output: + +``` +NETWORK=home HOST= OWNER=learning_ai_user +``` + +If `NETWORK=corp`, the host will be `localhost` (SSH tunnel mode). That's +expected for corp; the VM workflow assumes you're on home network. + +--- + +## Step 6 — Pre-flight verification + +Run **on your LAPTOP**: + +```bash +bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/doctor.sh --probe @bytelyst/errors +``` + +Expected output: + +``` +✓ NETWORK=home +✓ GITEA_NPM_HOST= +✓ GITEA_NPM_OWNER=learning_ai_user +✓ Token consistent (env matches ~/.gitea_npm_token_home, 40 chars) +✓ Registry HTTP 200 on @bytelyst/errors +✗ @bytelyst/errors not found in registry (HTTP 404) +``` + +**The 404 on `@bytelyst/errors` is EXPECTED** at this point — the registry is +empty. We fix that in Step 7. 
+ +If you see any other failure (token rejected, registry unreachable, owner +404), debug before moving on: + +| Failure | Likely cause | Fix | +| ------------------------------------ | ------------------------------------- | ---------------------------------- | +| `Registry unreachable` | Port 3300 not open to your laptop | Open NSG inbound rule for TCP 3300 | +| `Token rejected (HTTP 401)` | Token typo or scope missing | Re-run Step 3 | +| `Owner 'learning_ai_user' not found` | Step 2 was skipped or used wrong name | Re-run Step 2 | +| `DNS does not resolve` | `~/.gitea_vm_host` has typo | Re-check value | + +--- + +## Step 7 — Publish `@bytelyst/*` packages to the new VM + +Run **on the VM** (so we publish from canonical sources, not your laptop): + +```bash +cd ~/code/mygh/learning_ai_common_plat + +# Build all packages first +pnpm install --frozen-lockfile +pnpm build + +# Set env so publish targets the local Gitea +export GITEA_NPM_HOST=localhost +export GITEA_NPM_OWNER=learning_ai_user +export GITEA_NPM_TOKEN="" + +# Publish every @bytelyst/* package +bash scripts/gitea/publish-local-packages.sh +``` + +Takes ~2-3 min for ~60 packages. You'll see one `+ @bytelyst/@` +line per package. + +If you see `npm ERR! 409 Conflict` lines, that's fine — those packages were +already published (idempotent). + +--- + +## Step 8 — End-to-end verification + +Run **on your LAPTOP**: + +```bash +# Re-run doctor — package probe should now succeed +bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/doctor.sh --probe @bytelyst/errors +``` + +Expected: + +``` +✓ @bytelyst/errors resolvable (latest versions: 0.1.10) +✅ All Gitea pre-flight checks passed +``` + +Then smoke-test a real product install: + +```bash +cd ~/code/mygh/learning_ai_notes +rm -rf node_modules backend/node_modules web/node_modules +pnpm install +``` + +Should complete in ~15-30s. If you see `ERR_PNPM_NO_MATCHING_VERSION`, you've +hit the historical-version gap — proceed to Step 9. 
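
To confirm the gap before starting a backfill, the pinned version can be compared against the versions the registry reports. This is a minimal sketch, assuming the output of `npm view <package> versions --json` has already been captured into a variable. `has_version` is a hypothetical helper and the version list is made-up sample data.

```bash
# Hypothetical helper: succeed if a version appears in a JSON list of versions.
# $1 = JSON from `npm view <package> versions --json`, $2 = version to look for.
# The surrounding quotes are part of the match, so 0.1.1 does not match 0.1.10.
has_version() {
  printf '%s' "$1" | grep -qF "\"$2\""
}

# Made-up sample data, for illustration only.
versions_json='["0.1.8","0.1.9","0.1.10"]'

if has_version "$versions_json" "0.1.5"; then
  echo "0.1.5 is published"
else
  echo "0.1.5 is missing: backfill needed"
fi
```

With the sample data this reports the version as missing, which is the situation Step 9 addresses.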
+ +--- + +## Step 9 — (Optional) Backfill historical versions + +Some `@bytelyst/*` packages pin older versions transitively (e.g. +`@bytelyst/auth@0.1.5` pins `@bytelyst/errors@0.1.5`). The publish script +only publishes the current version of each package; older versions need +backfilling. + +Run **on the VM**: + +```bash +cd ~/code/mygh/learning_ai_common_plat +export GITEA_NPM_HOST=localhost +export GITEA_NPM_OWNER=learning_ai_user +export GITEA_NPM_TOKEN="" + +bash scripts/gitea/publish-outdated-packages.sh +``` + +This walks `pnpm view versions --json` for every `@bytelyst/*` package, +checks out the matching git tag, builds, and publishes any version not yet in +the registry. Slow (~10-15 min for full backfill) but only runs once. + +After this, the `learning_ai_notes` install should complete without errors. + +--- + +## Step 10 — Persist environment for future shells + +Add to your laptop's `~/.zshrc` (or confirm these are already there from +`switch-network.sh` sourcing): + +```bash +# Already in switch-network.sh, but verify they're picked up +echo $GITEA_NPM_HOST # should be your VM hostname on home network +echo $GITEA_NPM_OWNER # should be learning_ai_user +echo $GITEA_NPM_TOKEN # should be the 40-char token +``` + +If any are missing, ensure your `~/.zshrc` has: + +```bash +export NETWORK=home # or corp; switch-network.sh keys on this +source "$HOME/code/mygh/learning_ai_common_plat/scripts/switch-network.sh" +``` + +--- + +## Troubleshooting + +### Doctor reports `STALE TOKEN: env GITEA_NPM_TOKEN ≠ file` + +Your shell has an old token cached. Fix: + +```bash +source ~/.zshrc +# Or to refresh just the token without sourcing everything: +eval "$(bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/token.sh print --export)" +``` + +### `pnpm install` fails with `EAI_AGAIN` or `ETIMEDOUT` + +DNS or network. 
Verify: + +```bash +ping +nslookup +curl -v http://:3300/api/v1/version +``` + +### Need to rotate the token + +```bash +# On laptop (assumes Keychain entry from Step 1): +bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/token.sh rotate +``` + +Or manually re-run Step 3 on the VM. + +### Need to start over (nuke the Gitea data) + +**On the VM** (destructive): + +```bash +sudo docker stop gitea +sudo rm -rf /var/lib/gitea/* # or wherever your data volume is +sudo docker start gitea +# Then re-run Steps 1-7 +``` + +--- + +## What's persistent vs. ephemeral + +| Item | Where | Survives VM reboot? | Survives VM rebuild? | +| ------------------------- | ------------------------------------------------ | ------------------- | -------------------------- | +| Gitea database | `/var/lib/gitea/data/gitea.db` | ✅ | ❌ (snapshot the disk) | +| Published packages | `/var/lib/gitea/data/packages/` | ✅ | ❌ (re-publish via Step 7) | +| Admin/npm users | inside Gitea DB | ✅ | ❌ (re-run Steps 1-2) | +| NPM tokens | inside Gitea DB + your `~/.gitea_npm_token_home` | ✅ | ❌ (re-run Step 3) | +| `~/.gitea_vm_host` | your laptop | ✅ | n/a | +| `~/.gitea_npm_token_home` | your laptop | ✅ | n/a | + +For VM rebuilds: snapshot `/var/lib/gitea` to Azure Disk Snapshot weekly, +restore on rebuild. Avoids re-running Steps 1-7. + +--- + +## See also + +- `scripts/gitea/doctor.sh` — pre-flight validation (run before every deploy) +- `scripts/gitea/token.sh` — token rotation helper +- `scripts/gitea/bootstrap-vm.sh` — automates Steps 1-3 on a fresh VM +- `scripts/switch-network.sh` — exports `GITEA_NPM_*` env vars per network +- `docker-build-optimization-roadmap.md` (in `learning_ai_devops_tools/docs/`) — + ecosystem-wide Docker build hardening that depends on this setup