Gitea Cloud VM Setup — Runbook
Status: Active runbook · Last verified: 2026-05-28
Use this when: you have provisioned a cloud VM (Azure or elsewhere), Gitea is installed and running on :3300, repos are cloned, and you need to wire the npm registry end-to-end with your laptop.
Assumes you're SSH'd into the VM (or running commands on the VM) and the
sibling learning_ai_common_plat repo is at ~/code/mygh/learning_ai_common_plat/
on both the VM and your laptop. Adjust paths as needed.
Prerequisites checklist
Before starting, confirm all of these on the VM:
# 1. Gitea container is running and healthy
sudo docker ps | grep gitea
curl -fsS http://localhost:3300/api/v1/version
# Expected: {"version":"1.X.X"}
# 2. Port 3300 is reachable from your laptop
# (run this FROM YOUR LAPTOP, not the VM)
# curl -fsS http://<VM_HOST>:3300/api/v1/version
# 3. Repos cloned on the VM
ls ~/code/mygh/learning_ai_common_plat
# Expected: packages/ services/ scripts/ ...
If any of these fail, fix them first. Common gotchas:
- Port 3300 blocked: Azure NSG → VM → Networking → "Add inbound port rule" → TCP 3300 from your home IP
- Gitea registry disabled: edit app.ini inside /var/lib/gitea/conf/, add a [packages] section with ENABLED = true, then sudo docker restart gitea
- hostname resolves to localhost in the VM, but you reach it via public DNS: note both. The API only needs localhost from inside the VM
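The app.ini edit in the list above can be scripted so that re-running it is harmless. A minimal sketch, assuming a POSIX shell with awk and write access to the file; ensure_ini_key is a hypothetical helper, not one of the repo's scripts:

```shell
# Hypothetical helper (not one of scripts/gitea/*): idempotently set
# "KEY = VALUE" under [SECTION] in an INI file such as Gitea's app.ini.
ensure_ini_key() {
  local file=$1 section=$2 key=$3 value=$4 tmp rc
  tmp=$(mktemp) || return 1
  awk -v sec="[$section]" -v key="$key" -v val="$value" '
    function emit() { print key " = " val; done = 1 }
    /^\[/ {
      # Leaving the target section without having seen KEY: add it here.
      if (in_sec && !done) emit()
      in_sec = ($0 == sec)
      if (in_sec) seen = 1
    }
    # Replace an existing KEY line in the target section (later duplicates are dropped).
    in_sec && $0 ~ ("^[ \t]*" key "[ \t]*=") { if (!done) emit(); next }
    { print }
    END {
      if (in_sec && !done) emit()                      # target section was last
      else if (!seen) { print ""; print sec; emit() }  # section missing: append it
    }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, from a shell that can write the file (such as after sudo -s): ensure_ini_key /var/lib/gitea/conf/app.ini packages ENABLED true, then sudo docker restart gitea. Writing through cat keeps the file's existing owner and mode.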
Step 1 — Create Gitea admin user (skip if you already have one)
Run on the VM:
# Check if any admin exists
sudo docker exec gitea gitea admin user list
# If empty (or you don't have admin creds), create one:
ADMIN_USER="gitea-admin"
ADMIN_PASS="$(openssl rand -base64 24 | tr -dc 'A-Za-z0-9' | head -c 24)"
echo " Admin password (save this!): $ADMIN_PASS"
sudo docker exec gitea gitea admin user create \
--username "$ADMIN_USER" \
--password "$ADMIN_PASS" \
--email "admin@bytelyst.local" \
--admin \
--must-change-password=false
SAVE the admin password somewhere safe (1Password / Bitwarden / macOS Keychain). You'll need it only for token rotation and bootstrapping; day-to-day work uses the npm token.
Optionally store in macOS Keychain on your laptop:
# Run on your LAPTOP (Mac)
security add-generic-password \
-s 'gitea-admin' \
-a 'gitea-admin' \
-w '<paste-the-password>'
Then scripts/gitea/token.sh rotate can auto-discover it later.
Step 2 — Create the npm owner user
The npm registry is namespaced by owner. The canonical owner is learning_ai_user.
Run on the VM:
ADMIN_USER="gitea-admin"
ADMIN_PASS="<paste-admin-password-from-step-1>"
NPM_USER="learning_ai_user"
NPM_PASS="$(openssl rand -base64 24 | tr -dc 'A-Za-z0-9' | head -c 24)"
# Create the user via admin API
curl -fsS -u "$ADMIN_USER:$ADMIN_PASS" \
-X POST "http://localhost:3300/api/v1/admin/users" \
-H 'Content-Type: application/json' \
-d "{
\"username\": \"$NPM_USER\",
\"email\": \"npm@bytelyst.local\",
\"password\": \"$NPM_PASS\",
\"must_change_password\": false
}"
echo ""
echo " NPM user '$NPM_USER' created"
echo " NPM password (needed only to mint tokens): $NPM_PASS"
Save NPM_PASS temporarily — you need it for Step 3. After that the npm
token replaces it for all day-to-day use.
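If the call fails because the user already exists, that's fine; skip ahead. Note that curl -f hides the response body on HTTP errors, so you will see a curl: (22) error rather than {"message":"user already exists"}. Re-run without -f to read the message.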
Use the existing NPM_PASS (or reset it via sudo docker exec gitea gitea admin user change-password --username learning_ai_user --password <new>).
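Because curl -f discards the body on HTTP errors, the "already exists" case is easier to recognise if the status and body are captured separately. A sketch under that approach; create_npm_user and classify_create_user are hypothetical names, and the classifier keys on the response text rather than on a specific status code:

```shell
# Hypothetical classifier: $1 = HTTP status code, $2 = response body.
classify_create_user() {
  case "$1" in
    2??) echo created ;;
    *)
      case "$2" in
        *"already exists"*) echo exists ;;
        *) echo error ;;
      esac ;;
  esac
}

# Hypothetical wrapper around the Step 2 call; uses ADMIN_USER, ADMIN_PASS,
# NPM_USER and NPM_PASS as set above. curl -w prints 000 if it cannot connect.
create_npm_user() {
  local body status
  body=$(mktemp) || return 1
  status=$(curl -sS -o "$body" -w '%{http_code}' \
    -u "$ADMIN_USER:$ADMIN_PASS" \
    -X POST "http://localhost:3300/api/v1/admin/users" \
    -H 'Content-Type: application/json' \
    -d "{\"username\":\"$NPM_USER\",\"email\":\"npm@bytelyst.local\",\"password\":\"$NPM_PASS\",\"must_change_password\":false}")
  classify_create_user "$status" "$(cat "$body")"
  rm -f "$body"
}
```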
Step 3 — Mint the npm token
Run on the VM:
NPM_USER="learning_ai_user"
NPM_PASS="<paste-from-step-2>"
TOKEN_NAME="npm-$(date +%Y%m%d-%H%M%S)-$(hostname -s)"
RESPONSE=$(curl -fsS -u "$NPM_USER:$NPM_PASS" \
-X POST "http://localhost:3300/api/v1/users/$NPM_USER/tokens" \
-H 'Content-Type: application/json' \
-d "{
\"name\": \"$TOKEN_NAME\",
\"scopes\": [\"write:package\", \"read:package\"]
}")
echo "$RESPONSE"
# Extract token (works for both newer "token" and older "sha1" field names)
TOKEN=$(echo "$RESPONSE" | grep -oE '"(sha1|token)":"[^"]+' | head -1 | sed 's/.*":"//')
echo ""
echo "════════════════════════════════════════"
echo " NPM TOKEN (copy this NOW):"
echo " $TOKEN"
echo "════════════════════════════════════════"
Copy the token immediately. Gitea never displays a token's secret value after the first response.
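The grep/sed extraction above relies on Gitea emitting compact JSON. A sturdier sketch parses the response with python3 (already used by the §11 one-liners) and rejects anything that is not 40 characters, the length this runbook expects elsewhere. extract_token and token_looks_valid are illustrative helpers:

```shell
# Sketch: read the token-creation response on stdin; prefer "sha1",
# fall back to "token", print an empty line if neither is present.
extract_token() {
  python3 -c 'import json, sys
d = json.load(sys.stdin)
print(d.get("sha1") or d.get("token") or "")'
}

# The runbook expects a 40-character token; anything else usually means
# the response was an error object rather than a token.
token_looks_valid() {
  [ "${#1}" -eq 40 ]
}
```

Usage: TOKEN=$(printf '%s' "$RESPONSE" | extract_token), then token_looks_valid "$TOKEN" || echo "unexpected response: $RESPONSE".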
Step 4 — Wire the token into your laptop
Run on your LAPTOP (not the VM):
# Paste the token from Step 3 here:
TOKEN="<paste-token-from-step-3>"
# Write to the home-network-specific token file
echo -n "$TOKEN" > ~/.gitea_npm_token_home
chmod 600 ~/.gitea_npm_token_home
# Also update the catch-all file (used by some scripts as fallback)
echo -n "$TOKEN" > ~/.gitea_npm_token
chmod 600 ~/.gitea_npm_token
ls -la ~/.gitea_npm_token*
Expected output: two files, both -rw------- and 40 bytes in size (the token, with no trailing newline).
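Writing with echo and then running chmod leaves a brief window where the file has your default permissions. A sketch that avoids it by setting umask in a subshell before the file is created, plus a check matching the expected output above. write_secret and check_token_file are hypothetical helpers:

```shell
# Create/overwrite a secret file that is never group/world-readable.
# umask covers a new file; chmod covers a file that already existed with looser bits.
write_secret() {
  # $1 = path, $2 = value (written without a trailing newline)
  ( umask 077 && printf '%s' "$2" > "$1" ) && chmod 600 "$1"
}

# Passes only for mode -rw------- and exactly 40 bytes.
check_token_file() {
  local mode size
  mode=$(ls -l "$1" | cut -c1-10)
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$mode" = "-rw-------" ] && [ "$size" -eq 40 ]
}
```

Usage: write_secret ~/.gitea_npm_token_home "$TOKEN", the same for ~/.gitea_npm_token, then check_token_file on each.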
Step 5 — Tell switch-network.sh about the VM hostname
Run on your LAPTOP:
# Replace with your VM's public DNS or IP
echo "<VM_HOST>" > ~/.gitea_vm_host
# Example: echo "bytelyst-vm.eastus.cloudapp.azure.com" > ~/.gitea_vm_host
cat ~/.gitea_vm_host
Then refresh your shell environment so GITEA_NPM_HOST is exported:
source ~/.zshrc
echo "NETWORK=$NETWORK HOST=$GITEA_NPM_HOST OWNER=$GITEA_NPM_OWNER"
If you're on home network, expected output:
NETWORK=home HOST=<VM_HOST> OWNER=learning_ai_user
If NETWORK=corp, the host will be localhost (SSH tunnel mode). That's
expected for corp; the VM workflow assumes you're on home network.
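The home/corp split above can be pictured as a small lookup. This is a hypothetical sketch of that behaviour only; the real logic lives in scripts/switch-network.sh and may differ:

```shell
# Hypothetical: pick GITEA_NPM_HOST from the network name.
resolve_gitea_host() {
  # $1 = NETWORK (home|corp), $2 = path to the VM host file
  case "$1" in
    home)
      [ -s "$2" ] || { echo "missing or empty $2" >&2; return 1; }
      head -n 1 "$2" | tr -d '[:space:]' ;;   # first line, whitespace stripped
    corp) echo localhost ;;                   # SSH tunnel mode
    *) echo "unknown NETWORK=$1" >&2; return 1 ;;
  esac
}
```

Usage: GITEA_NPM_HOST=$(resolve_gitea_host "$NETWORK" ~/.gitea_vm_host).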
Step 6 — Pre-flight verification
Run on your LAPTOP:
bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/doctor.sh --probe @bytelyst/errors
Expected output:
✓ NETWORK=home
✓ GITEA_NPM_HOST=<VM_HOST>
✓ GITEA_NPM_OWNER=learning_ai_user
✓ Token consistent (env matches ~/.gitea_npm_token_home, 40 chars)
✓ Registry HTTP 200 on @bytelyst/errors
✗ @bytelyst/errors not found in registry (HTTP 404)
The 404 on @bytelyst/errors is EXPECTED at this point — the registry is
empty. We fix that in Step 7.
If you see any other failure (token rejected, registry unreachable, owner 404), debug before moving on:
| Failure | Likely cause | Fix |
|---|---|---|
| Registry unreachable | Port 3300 not open to your laptop | Open NSG inbound rule for TCP 3300 |
| Token rejected (HTTP 401) | Token typo or scope missing | Re-run Step 3 |
| Owner 'learning_ai_user' not found | Step 2 was skipped or used wrong name | Re-run Step 2 |
| DNS does not resolve | ~/.gitea_vm_host has typo | Re-check value |
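The table boils down to reading one HTTP status. A sketch of such a probe, assuming Gitea's npm registry is served under /api/packages/<owner>/npm/ and that scoped names are requested with the slash encoded as %2f, as npm clients do. The helper names are illustrative, not doctor.sh internals:

```shell
# Build the metadata URL for a (possibly scoped) package.
registry_probe_url() {
  # $1 = host, $2 = owner, $3 = package (e.g. @bytelyst/errors)
  printf 'http://%s:3300/api/packages/%s/npm/%s\n' "$1" "$2" \
    "$(printf '%s' "$3" | sed 's|/|%2f|')"
}

# Map an HTTP status (from curl -w '%{http_code}') to the table's likely cause.
diagnose_probe() {
  case "$1" in
    200) echo "ok" ;;
    000) echo "registry unreachable: check NSG rule for TCP 3300 / DNS" ;;
    401|403) echo "token rejected: re-run Step 3" ;;
    404) echo "not found: package not published yet, or wrong owner" ;;
    *) echo "unexpected HTTP $1" ;;
  esac
}

# Usage:
# code=$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: token $GITEA_NPM_TOKEN" \
#   "$(registry_probe_url "$GITEA_NPM_HOST" "$GITEA_NPM_OWNER" @bytelyst/errors)")
# diagnose_probe "$code"
```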
Step 7 — Publish @bytelyst/* packages to the new VM
Run on the VM (so we publish from canonical sources, not your laptop):
cd ~/code/mygh/learning_ai_common_plat
# Build all packages first
pnpm install --frozen-lockfile
pnpm build
# Set env so publish targets the local Gitea
export GITEA_NPM_HOST=localhost
export GITEA_NPM_OWNER=learning_ai_user
export GITEA_NPM_TOKEN="<paste-token-from-step-3>"
# Publish every @bytelyst/* package
bash scripts/gitea/publish-local-packages.sh
Takes ~2-3 min for ~60 packages. You'll see one + @bytelyst/<name>@<version>
line per package.
If you see npm ERR! 409 Conflict lines, that's fine — those packages were
already published (idempotent).
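To keep a re-run idempotent, a publish loop can treat a conflict as "already there" rather than a failure. A sketch of that classification, assuming npm reports the conflict with 409 or Conflict in its output; classify_publish is hypothetical and not taken from publish-local-packages.sh:

```shell
# $1 = exit code of `npm publish`, $2 = its combined stdout/stderr.
classify_publish() {
  if [ "$1" -eq 0 ]; then
    echo published
  else
    case "$2" in
      *409*|*"Conflict"*) echo already-published ;;
      *) echo failed ;;
    esac
  fi
}

# Usage inside a package directory:
# out=$(npm publish 2>&1); rc=$?
# [ "$(classify_publish "$rc" "$out")" = failed ] && printf '%s\n' "$out"
```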
Step 8 — End-to-end verification
Run on your LAPTOP:
# Re-run doctor — package probe should now succeed
bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/doctor.sh --probe @bytelyst/errors
Expected:
✓ @bytelyst/errors resolvable (latest versions: 0.1.10)
✅ All Gitea pre-flight checks passed
Then smoke-test a real product install:
cd ~/code/mygh/learning_ai_notes
rm -rf node_modules backend/node_modules web/node_modules
pnpm install
Should complete in ~15-30s. If you see ERR_PNPM_NO_MATCHING_VERSION, you've
hit the historical-version gap — proceed to Step 9.
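pnpm finds the registry through .npmrc. If pnpm install goes to registry.npmjs.org for @bytelyst/* packages, or fails with HTTP 401, compare the project's .npmrc with the scoped layout from Gitea's npm registry docs. A sketch that renders it; render_npmrc is hypothetical, and ${GITEA_NPM_TOKEN} is deliberately left unexpanded so pnpm substitutes it at install time:

```shell
# Print a scoped .npmrc for @bytelyst/* pointing at the Gitea npm registry.
render_npmrc() {
  # $1 = host (e.g. $GITEA_NPM_HOST), $2 = owner (e.g. learning_ai_user)
  local base="//$1:3300/api/packages/$2/npm/"
  printf '@bytelyst:registry=http:%s\n' "$base"
  # Single quotes keep ${GITEA_NPM_TOKEN} literal in the output.
  printf '%s:_authToken=${GITEA_NPM_TOKEN}\n' "$base"
}
```

Usage: render_npmrc "$GITEA_NPM_HOST" "$GITEA_NPM_OWNER", then compare the output with the repo's existing .npmrc rather than overwriting it.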
Step 9 — (Optional) Backfill historical versions
Some @bytelyst/* packages pin older versions transitively (e.g.
@bytelyst/auth@0.1.5 pins @bytelyst/errors@0.1.5). The publish script
only publishes the current version of each package; older versions need
backfilling.
Run on the VM:
cd ~/code/mygh/learning_ai_common_plat
export GITEA_NPM_HOST=localhost
export GITEA_NPM_OWNER=learning_ai_user
export GITEA_NPM_TOKEN="<paste-token-from-step-3>"
bash scripts/gitea/publish-outdated-packages.sh
This walks pnpm view <pkg> versions --json for every @bytelyst/* package,
checks out the matching git tag, builds, and publishes any version not yet in
the registry. Slow (~10-15 min for full backfill) but only runs once.
After this, the learning_ai_notes install should complete without errors.
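The "publish any version not yet in the registry" step is a set difference between the versions that should exist and those the registry already has. A sketch with sort and comm over newline-separated lists; missing_versions is illustrative, publish-outdated-packages.sh may compute this differently, and the output is in lexical rather than semver order:

```shell
# $1 = wanted versions, $2 = published versions (newline-separated).
missing_versions() {
  local wanted published
  wanted=$(mktemp) || return 1
  published=$(mktemp) || { rm -f "$wanted"; return 1; }
  # comm needs both inputs sorted with the same collation; drop blank lines.
  printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u > "$wanted"
  printf '%s\n' "$2" | sed '/^$/d' | LC_ALL=C sort -u > "$published"
  LC_ALL=C comm -23 "$wanted" "$published"   # lines only in "wanted"
  rm -f "$wanted" "$published"
}
```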
Step 10 — Persist environment for future shells
Add to your laptop's ~/.zshrc (or confirm these are already there from
switch-network.sh sourcing):
# Already in switch-network.sh, but verify they're picked up
echo $GITEA_NPM_HOST # should be your VM hostname on home network
echo $GITEA_NPM_OWNER # should be learning_ai_user
echo $GITEA_NPM_TOKEN # should be the 40-char token
If any are missing, ensure your ~/.zshrc has:
export NETWORK=home # or corp; switch-network.sh keys on this
source "$HOME/code/mygh/learning_ai_common_plat/scripts/switch-network.sh"
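Echoing $GITEA_NPM_TOKEN puts the secret into terminal scrollback. A sketch that checks all three variables but reports only the token's length; check_gitea_env is a hypothetical helper:

```shell
# Report GITEA_NPM_* without printing the token value; non-zero if any is unset.
check_gitea_env() {
  local rc=0 v val
  for v in GITEA_NPM_HOST GITEA_NPM_OWNER GITEA_NPM_TOKEN; do
    eval "val=\${$v:-}"          # indirect lookup of the variable named in $v
    if [ -z "$val" ]; then
      echo "✗ $v is not set"; rc=1
    elif [ "$v" = GITEA_NPM_TOKEN ]; then
      echo "✓ $v is set (${#val} chars)"
    else
      echo "✓ $v=$val"
    fi
  done
  return "$rc"
}
```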
Step 11 — Gitea Actions runner (CI)
The npm registry (Steps 1–10) is independent of CI. To run the docker-lint
job (and the rest of each repo's .gitea/workflows/*.yml) you need an
Actions runner registered against Gitea. This section makes that
reproducible — the original runner was registered by hand.
11.1 — Enable Actions on the instance
In app.ini (inside the Gitea container/conf dir):
[actions]
ENABLED = true
Then sudo docker restart gitea. Confirm with
curl -fsS http://localhost:3300/api/v1/version and that the repo Settings →
Actions toggle is available.
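The version endpoint does not report whether Actions is available, but the version number does: Gitea Actions first shipped in 1.19. A sketch that parses the version from the API response and compares it against a minimum, assuming a sort that supports -V (GNU coreutils or compatible):

```shell
# Extract "1.21.11" from {"version":"1.21.11"} on stdin.
gitea_version() {
  sed -n 's/.*"version":"\([^"]*\)".*/\1/p'
}

# True if version $1 >= version $2 under version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
# v=$(curl -fsS http://localhost:3300/api/v1/version | gitea_version)
# version_ge "$v" 1.19 || echo "Gitea $v predates Actions; upgrade first"
```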
11.2 — Install and register the runner
# macOS host runner (laptop) — install
brew install act_runner
# Register reproducibly (fetches a registration token via the admin API,
# registers with the agreed labels + capacity). Host mode is the default.
GITEA_ADMIN_USER=gitea-admin GITEA_ADMIN_PASS='<admin-pass>' \
bash scripts/gitea/register-runner.sh --name bytelyst-mac --capacity 2
# Containerized runner (better isolation; requires Docker on the host):
GITEA_ADMIN_USER=gitea-admin GITEA_ADMIN_PASS='<admin-pass>' \
bash scripts/gitea/register-runner.sh --mode docker --capacity 2
register-runner.sh is idempotent: if a runner is already registered it
prints the current identity and exits. Pass --force to re-register (this
invalidates the old runner row in Gitea).
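The idempotency rule above can be sketched as a check on the .runner file that act_runner writes after a successful registration. This is a hypothetical illustration, not register-runner.sh itself; the name and address keys are assumptions about that file's JSON:

```shell
# Print "name @ address" from a .runner file, or fail if it does not exist.
runner_identity() {
  [ -f "$1" ] || return 1
  python3 -c 'import json, sys
r = json.load(open(sys.argv[1]))
print(r.get("name", "?"), "@", r.get("address", "?"))' "$1"
}

# Decide what a registration script would do: "skip" or "register".
registration_action() {
  # $1 = path to .runner, $2 = optional --force
  if runner_identity "$1" >/dev/null && [ "${2:-}" != --force ]; then
    echo skip
  else
    echo register
  fi
}
```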
11.3 — Host mode vs. containerized mode
| | Host mode (ubuntu-latest:host) | Docker mode (ubuntu-latest:docker://…) |
|---|---|---|
| Isolation | None: jobs run directly on macOS | Each job in a fresh container |
| Speed | Fast (no image pull) | Slower first run (pulls catthehacker/ubuntu) |
| Reproducibility | Depends on host toolchain | Pinned image, matches GitHub closely |
| Best for | Single-operator laptop / corp proxy | Shared/VM runners, untrusted PRs |
We run host mode on the laptop because the corp proxy + Docker-in-Docker is fragile, and the jobs are trusted (own repos). Prefer docker mode on the VM.
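The label is what ties a job to a mode: runs-on matches the part before the first colon, and the rest tells act_runner where to execute. A sketch of the two forms used in this runbook (ubuntu-latest:host, and docker:docker://<image> for the docker-mode runner in §11.5); runner_label and label_name are illustrative helpers:

```shell
# Build the label string for a mode.
runner_label() {
  # $1 = host|docker, $2 = image (docker mode only)
  case "$1" in
    host) echo "ubuntu-latest:host" ;;
    docker)
      [ -n "${2:-}" ] || { echo "docker mode needs an image" >&2; return 1; }
      echo "docker:docker://$2" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# The part a workflow's runs-on must match: everything before the first colon.
label_name() { printf '%s\n' "${1%%:*}"; }
```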
11.4 — Secrets: never inline the token
The npm token must not live inline in config.yaml. Externalise it into a
gitignored runner.env referenced by runner.env_file:
# /opt/homebrew/etc/act_runner/config.yaml
runner:
  capacity: 2 # parallel jobs
  env_file: '/opt/homebrew/etc/act_runner/runner.env'
  envs:
    # GITEA_NPM_TOKEN is loaded from env_file; never inline it here.
    NODE_ENV: test
# /opt/homebrew/etc/act_runner/runner.env (chmod 600, never committed)
GITEA_NPM_TOKEN=<token>
After editing config, reload the daemon:
brew services restart act_runner # or, if brew name mismatches:
launchctl kickstart -k gui/$(id -u)/homebrew.mxcl.act_runner
tail -f /opt/homebrew/var/log/act_runner.log # expect "declare successfully"
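Two cheap guards for the rules above: fail if config.yaml assigns GITEA_NPM_TOKEN a value, and fail if runner.env is not chmod 600. A sketch; assert_no_inline_token and assert_private are hypothetical, and the grep treats only whole-line # comments as comments:

```shell
# Fails if the config assigns GITEA_NPM_TOKEN a value outside a comment line.
assert_no_inline_token() {
  if grep -v '^[[:space:]]*#' "$1" | grep -q 'GITEA_NPM_TOKEN:[[:space:]]*[^[:space:]#]'; then
    echo "inline GITEA_NPM_TOKEN found in $1" >&2
    return 1
  fi
}

# Fails unless the file is mode -rw------- (chmod 600).
assert_private() {
  [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}

# Usage:
# assert_no_inline_token /opt/homebrew/etc/act_runner/config.yaml
# assert_private /opt/homebrew/etc/act_runner/runner.env
```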
11.5 — Concurrency
runner.capacity controls parallel jobs on one runner. With capacity: 1
the lightweight docker-lint job queues behind slow backend/web/mobile/E2E
jobs (observed: ~13 min wait). capacity: 2 lets docker-lint run alongside
one heavy job. For more parallelism, register additional runners rather
than pushing capacity high on a single laptop — each runner gets its own
workdir and process, so failures/timeouts stay isolated.
Add more host runners (reproducible):
# Stand up runners #2 and #3 (each capacity 2) as their own launchd services.
# Shares the canonical runner.env token; separate config/.runner/workdir.
bash scripts/gitea/add-host-runner.sh 2 2
bash scripts/gitea/add-host-runner.sh 3 2
add-host-runner.sh <N> [capacity] [docker]:
- derives a per-runner config.yaml from the canonical one (preserving the proxy env and env_file), overriding runner.file, runner.capacity, and a unique host.workdir_parent (~/.cache/act-<N>)
- fetches a one-time registration token via the admin API (~/.gitea_c5_pat)
- registers as $(hostname -s)-<N> (host-mode labels by default; docker mode is covered below)
- writes and loads ~/Library/LaunchAgents/com.bytelyst.act_runner-<N>.plist (RunAtLoad + KeepAlive)
- is idempotent: re-running just reloads the service
The Homebrew act_runner service is runner #1; add-host-runner.sh adds
#2, #3, … Current fleet: 3 host runners × capacity 3 ≈ 9 parallel host
slots. Verified: pushing a multi-job workflow distributes jobs across all
three runners simultaneously.
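The per-runner values listed above follow a simple naming scheme. A hypothetical sketch that derives them for runner N; the host name defaults to hostname -s, and the real add-host-runner.sh may derive more (for example the config path):

```shell
# Print the derived name, workdir and launchd identifiers for runner N.
runner_paths() {
  # $1 = runner number N (>= 2; the Homebrew service is runner #1), $2 = optional host name
  local n=$1 host=${2:-$(hostname -s)}
  case "$n" in
    ''|*[!0-9]*) echo "N must be a number" >&2; return 1 ;;
  esac
  [ "$n" -ge 2 ] || { echo "runner #1 is the Homebrew service" >&2; return 1; }
  echo "name=$host-$n"
  echo "workdir_parent=$HOME/.cache/act-$n"
  echo "plist=$HOME/Library/LaunchAgents/com.bytelyst.act_runner-$n.plist"
  echo "launchd_label=com.bytelyst.act_runner-$n"
}
```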
Add a docker-mode runner (stronger isolation):
# Pull the act image once (≈2.3 GB; works through the corp proxy):
docker pull catthehacker/ubuntu:act-latest
# Stand up runner #4 in docker mode (capacity 1):
bash scripts/gitea/add-host-runner.sh 4 1 docker
Docker mode advertises a dedicated docker label (not ubuntu-latest),
so it does not hijack the host-mode ubuntu-latest jobs. Opt a job in with
runs-on: docker. The generated config:
- sets runner.labels to docker:docker://<image> (act_runner reads labels from the config file, not register --labels)
- sets container.docker_host: "-", force_pull: false (use the locally pulled image), and options: --add-host=host.docker.internal:host-gateway
- adds host.docker.internal to NO_PROXY/no_proxy. Without this, containerized jobs inherit the corp proxy env and route host.docker.internal:3300 through the proxy, getting an HTTP 504. Jobs must reach Gitea via host.docker.internal:3300 (not localhost) from inside the container.
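The NO_PROXY fix amounts to appending a host to a comma-separated list only when it is missing. A sketch; add_no_proxy is illustrative, and both spellings are set because tools differ in which one they read:

```shell
# Append host $2 to the no-proxy list $1 unless it is already present.
add_no_proxy() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;        # already listed: unchanged
    ,,) printf '%s\n' "$2" ;;              # empty list
    *) printf '%s,%s\n' "$1" "$2" ;;
  esac
}

# Usage:
# NO_PROXY=$(add_no_proxy "${NO_PROXY:-}" host.docker.internal)
# no_proxy=$NO_PROXY
```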
Validated end-to-end: a runs-on: docker job runs in an Ubuntu 24.04
container and reaches Gitea (GET /api/v1/version → {"version":"…"}).
Migrating a real job onto the docker runner. The host-clone CI model
(working-directory: /Users/… + git pull) does not work in a container.
A containerized job must instead clone what it needs and reach Gitea via
host.docker.internal. Reference pattern (the docker-lint job in
learning_ai_clock):
docker-lint:
  runs-on: docker
  env:
    GITEA_NPM_HOST: host.docker.internal
    GITEA_NPM_OWNER: learning_ai_user
    GITEA_NPM_TOKEN: ${{ secrets.NPM_REGISTRY_TOKEN }} # GITEA_-prefixed names are reserved
    NO_PROXY: host.docker.internal,localhost,127.0.0.1
    no_proxy: host.docker.internal,localhost,127.0.0.1
  steps:
    - name: Fetch repo + canonical doctor scripts
      run: |
        G="http://host.docker.internal:3300/learning_ai_user"
        git clone --depth 1 "$G/${GITHUB_REPOSITORY##*/}.git" repo
        git clone --depth 1 "$G/learning_ai_common_plat.git" common-plat
    - name: gitea-doctor
      run: |
        printf '%s' "$GITEA_NPM_TOKEN" > "$HOME/.gitea_npm_token" # doctor.sh needs a token file
        bash common-plat/scripts/gitea/doctor.sh --quiet
    - name: docker-doctor
      run: bash common-plat/scripts/docker-doctor.sh --repo "$PWD/repo" --quiet
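The workflow reads secrets.NPM_REGISTRY_TOKEN, so that secret has to exist first. A sketch of creating it through the endpoint named in the gotchas below, with a name check that rejects the reserved prefixes up front. The {"data": ...} request body is an assumption based on Gitea's API reference, and the character rule is assumed to match GitHub's; confirm both against your instance's /api/swagger:

```shell
# Reject empty names, reserved prefixes, and (assumed) invalid characters.
valid_secret_name() {
  local upper
  upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$upper" in
    ''|GITEA_*|GITHUB_*) return 1 ;;   # empty or reserved (Gitea answers HTTP 400)
    [0-9]*|*[!A-Z0-9_]*) return 1 ;;   # assumed GitHub-style naming rule
  esac
  return 0
}

# $1 = owner, $2 = repo, $3 = secret name, $4 = value; authenticates with $PAT.
# The value is embedded in JSON unescaped, which is fine for a hex token.
put_repo_secret() {
  valid_secret_name "$3" || { echo "reserved or invalid secret name: $3" >&2; return 1; }
  curl -fsS -X PUT -H "Authorization: token $PAT" -H 'Content-Type: application/json' \
    -d "{\"data\":\"$4\"}" \
    "http://localhost:3300/api/v1/repos/$1/$2/actions/secrets/$3"
}

# Usage:
# PAT=$(cat ~/.gitea_c5_pat)
# put_repo_secret learning_ai_user learning_ai_clock NPM_REGISTRY_TOKEN "$(cat ~/.gitea_npm_token)"
```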
Gotchas that cost real debugging time (all now handled by add-host-runner.sh docker mode + this pattern):
- Secret, not file. Job containers do not see the runner's ~/.gitea_npm_token. Provide the registry token as a Gitea Actions secret (NPM_REGISTRY_TOKEN, set at repo or user level via PUT /api/v1/repos/{owner}/{repo}/actions/secrets/{name}). The GITEA_ and GITHUB_ prefixes are reserved, so creating a secret with either prefix returns HTTP 400.
- doctor.sh requires a token file. Its stale-shell check errors if ~/.gitea_npm_token is absent even when the env token is set, so the job writes the secret to the file first.
- Host envs leak into the container. The runner injects HOME, PATH, and PNPM_HOME (macOS paths), which override the workflow env: block and break $HOME-relative writes. add-host-runner.sh docker mode strips these so the container uses its image defaults (/root).
- env_file token overrides the secret. The GITEA_NPM_TOKEN in the runner's runner.env is injected into jobs and wins over ${{ secrets.* }}. Keep runner.env in sync with the current token (see §11.6); a stale value there causes registry HTTP 401 even with a correct secret.
- Proxy bypass. host.docker.internal must be in NO_PROXY/no_proxy (handled by the runner config), or the corp proxy returns HTTP 504.
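Each gotcha has a cheap in-container symptom, so a job can check for them before doctor.sh runs. A sketch of such a run: step; container_preflight is illustrative, and the /Users/ and /opt/homebrew patterns are assumed markers of leaked macOS paths:

```shell
# Run inside the job container; non-zero exit if any gotcha is detected.
container_preflight() {
  local rc=0
  case "$HOME:$PATH" in
    */Users/*|*/opt/homebrew*) echo "✗ macOS host paths leaked (HOME=$HOME)"; rc=1 ;;
  esac
  case ",${NO_PROXY:-}," in
    *,host.docker.internal,*) ;;
    *) echo "✗ host.docker.internal missing from NO_PROXY"; rc=1 ;;
  esac
  [ -n "${GITEA_NPM_TOKEN:-}" ] || { echo "✗ GITEA_NPM_TOKEN empty (is the secret set?)"; rc=1; }
  [ "$rc" -eq 0 ] && echo "✓ container env looks sane"
  return "$rc"
}
```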
Validated: docker-lint runs green on the docker runner (M-…-4) —
git clone over host.docker.internal, gitea-doctor registry probe → 200,
docker-doctor lint pass.
List + prune the fleet:
PAT=$(cat ~/.gitea_c5_pat)
curl -s -H "Authorization: token $PAT" \
http://localhost:3300/api/v1/admin/actions/runners \
| python3 -c "import json,sys; [print(r['id'], r['name'], r['status']) for r in json.load(sys.stdin)['runners']]"
# Delete a stale/offline runner by id:
curl -s -X DELETE -H "Authorization: token $PAT" \
http://localhost:3300/api/v1/admin/actions/runners/<id>
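To prune in bulk, the listing above can be filtered down to offline runners. A sketch that reuses the same runners/status/id fields, assuming offline runners report status "offline":

```shell
# Read the /admin/actions/runners JSON on stdin; print one offline runner id per line.
offline_runner_ids() {
  python3 -c 'import json, sys
for r in json.load(sys.stdin).get("runners") or []:
    if r.get("status") == "offline":
        print(r["id"])'
}

# Usage:
# curl -s -H "Authorization: token $PAT" http://localhost:3300/api/v1/admin/actions/runners \
#   | offline_runner_ids \
#   | while read -r id; do
#       curl -s -X DELETE -H "Authorization: token $PAT" \
#         "http://localhost:3300/api/v1/admin/actions/runners/$id"
#     done
```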
Remove an extra runner entirely:
launchctl bootout "gui/$(id -u)/com.bytelyst.act_runner-2" 2>/dev/null || true
rm -f ~/Library/LaunchAgents/com.bytelyst.act_runner-2.plist
rm -rf "$HOME/Library/Application Support/act_runner-2" ~/.cache/act-2
# then DELETE its row via the admin API (above)
11.6 — Runner token rotation
The registration token (used once at register time) is separate from the npm token (used by jobs). To rotate:
# Registration token — just re-register; old token is single-use anyway:
bash scripts/gitea/register-runner.sh --force --name bytelyst-mac
# npm token — rotate via the existing helper, then update runner.env:
bash scripts/gitea/token.sh rotate
TOKEN=$(cat ~/.gitea_npm_token)
printf 'GITEA_NPM_TOKEN=%s\n' "$TOKEN" > /opt/homebrew/etc/act_runner/runner.env
chmod 600 /opt/homebrew/etc/act_runner/runner.env
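brew services restart act_runner
# Extra host runners (§11.5) share this runner.env; restart them as well:
for n in 2 3; do launchctl kickstart -k "gui/$(id -u)/com.bytelyst.act_runner-$n"; done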
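A stale runner.env causes HTTP 401 despite a correct secret (the env_file gotcha in §11.5), so it is worth checking after every rotation. A sketch comparing runner.env with the canonical token file without printing either; runner_env_in_sync is a hypothetical helper:

```shell
# $1 = token file (e.g. ~/.gitea_npm_token), $2 = runner.env
# Exit 0 if in sync, 1 if stale, 2 if the token file cannot be read.
runner_env_in_sync() {
  local want have
  want=$(cat "$1") || return 2
  have=$(sed -n 's/^GITEA_NPM_TOKEN=//p' "$2" | tail -n 1)   # last assignment wins
  [ -n "$want" ] && [ "$want" = "$have" ]
}

# Usage:
# runner_env_in_sync ~/.gitea_npm_token /opt/homebrew/etc/act_runner/runner.env \
#   || echo "runner.env is stale; rewrite it and restart the runners"
```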
11.7 — Verify CI end-to-end
Push any repo that has a docker-lint job, then:
PAT=$(cat ~/.gitea_c5_pat)
R=learning_ai_clock
RID=$(curl -s -H "Authorization: token $PAT" \
"http://localhost:3300/api/v1/repos/learning_ai_user/$R/actions/runs?limit=1" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['workflow_runs'][0]['id'])")
curl -s -H "Authorization: token $PAT" \
"http://localhost:3300/api/v1/repos/learning_ai_user/$R/actions/runs/$RID/jobs" \
| python3 -c "import json,sys; [print(j['status'], j.get('conclusion'), j['name']) for j in json.load(sys.stdin)['jobs']]"
# Expect: completed success Docker lint — gitea-doctor + docker-doctor
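For a scripted smoke test, the jobs listing can be turned into an exit code. A sketch that assumes the same jobs/status/conclusion fields as the command above and treats an empty job list as a failure:

```shell
# Read the .../actions/runs/<id>/jobs JSON on stdin; exit 0 only if every job succeeded.
jobs_all_green() {
  python3 -c 'import json, sys
jobs = json.load(sys.stdin).get("jobs") or []
bad = [j.get("name", "?") for j in jobs
       if j.get("status") != "completed" or j.get("conclusion") != "success"]
for name in bad:
    print("not green:", name)
sys.exit(1 if bad or not jobs else 0)'
}
```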
Troubleshooting
Doctor reports STALE TOKEN: env GITEA_NPM_TOKEN ≠ file
Your shell has an old token cached. Fix:
source ~/.zshrc
# Or to refresh just the token without sourcing everything:
eval "$(bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/token.sh print --export)"
pnpm install fails with EAI_AGAIN or ETIMEDOUT
DNS or network. Verify:
ping <VM_HOST>
nslookup <VM_HOST>
curl -v http://<VM_HOST>:3300/api/v1/version
Need to rotate the token
# On laptop (assumes Keychain entry from Step 1):
bash ~/code/mygh/learning_ai_common_plat/scripts/gitea/token.sh rotate
Or manually re-run Step 3 on the VM.
Need to start over (nuke the Gitea data)
On the VM (destructive):
sudo docker stop gitea
sudo rm -rf /var/lib/gitea/* # or wherever your data volume is
sudo docker start gitea
# Then re-run Steps 1-7
What's persistent vs. ephemeral
| Item | Where | Survives VM reboot? | Survives VM rebuild? |
|---|---|---|---|
| Gitea database | /var/lib/gitea/data/gitea.db | ✅ | ❌ (snapshot the disk) |
| Published packages | /var/lib/gitea/data/packages/ | ✅ | ❌ (re-publish via Step 7) |
| Admin/npm users | inside Gitea DB | ✅ | ❌ (re-run Steps 1-2) |
| NPM tokens | inside Gitea DB + your ~/.gitea_npm_token_home | ✅ | ❌ (re-run Step 3) |
| ~/.gitea_vm_host | your laptop | ✅ | n/a |
| ~/.gitea_npm_token_home | your laptop | ✅ | n/a |
| Actions runner registration | Gitea DB + .runner file | ✅ | ❌ (re-run register-runner.sh) |
| Runner secrets | act_runner/runner.env (chmod 600) | ✅ | ❌ (recreate from token) |
For VM rebuilds: snapshot /var/lib/gitea to Azure Disk Snapshot weekly,
restore on rebuild. Avoids re-running Steps 1-7.
See also
- scripts/gitea/doctor.sh: pre-flight validation (run before every deploy)
- scripts/gitea/token.sh: token rotation helper
- scripts/gitea/register-runner.sh: reproducible Actions runner registration (Step 11)
- scripts/gitea/bootstrap-vm.sh: automates Steps 1-3 on a fresh VM
- scripts/switch-network.sh: exports GITEA_NPM_* env vars per network
- docker-build-optimization-roadmap.md (in learning_ai_devops_tools/docs/): ecosystem-wide Docker build hardening that depends on this setup