Tighten Hermes local fallback chain
This commit is contained in:
parent
3e26f0da31
commit
8da66497cc
@ -15,9 +15,11 @@ Observed on 2026-05-27:
|
|||||||
- Uma Telegram gateway: `uma-hermes-gateway.service`, user service for `uma`, enabled and running
|
- Uma Telegram gateway: `uma-hermes-gateway.service`, user service for `uma`, enabled and running
|
||||||
- Root and Uma default model: `gpt-5.5`, `model.routing.enabled: false`
|
- Root and Uma default model: `gpt-5.5`, `model.routing.enabled: false`
|
||||||
- Shared local fallback chain via Ollama on demand:
|
- Shared local fallback chain via Ollama on demand:
|
||||||
- `qwen2.5-coder:7b`
|
- `qwen2.5-coder:1.5b`
|
||||||
- `llama3.1:8b`
|
- `llama3.2:1b`
|
||||||
- `llama3.2-vision`
|
- `llama3.2-vision`
|
||||||
|
- These local fallbacks are loaded on demand and answer within the gateway's retry budget on this VM; the larger 3B/7B models were observed to be too slow for the live fallback path here.
|
||||||
|
- Live Hermes session-switch proof: root and Uma both fail over from a forced primary-provider error into the local Ollama chain and return `FallbackTest`.
|
||||||
- Web backend target: Firecrawl, configured locally on root and Uma with a private API key
|
- Web backend target: Firecrawl, configured locally on root and Uma with a private API key
|
||||||
- Browser automation: enabled on both Hermes gateways; root was smoke-tested privately against `https://example.com`
|
- Browser automation: enabled on both Hermes gateways; root was smoke-tested privately against `https://example.com`
|
||||||
- Backup cron: `Sync Hermes persistent-data backup to GitHub`, every 30 minutes, local delivery
|
- Backup cron: `Sync Hermes persistent-data backup to GitHub`, every 30 minutes, local delivery
|
||||||
@ -102,7 +104,7 @@ Notes:
|
|||||||
|
|
||||||
- `hermes doctor --fix` migrated root and Uma configs to version `24` on 2026-05-27.
|
- `hermes doctor --fix` migrated root and Uma configs to version `24` on 2026-05-27.
|
||||||
- Optional providers/search backends are mostly not configured yet. Configure through Hermes setup/auth flows only; never commit credentials.
|
- Optional providers/search backends are mostly not configured yet. Configure through Hermes setup/auth flows only; never commit credentials.
|
||||||
- Local Ollama fallback models are installed on demand, not kept hot permanently. Both Hermes instances can reach the shared host service at `http://127.0.0.1:11434/v1`. `gemma4` was attempted but the installed Ollama runtime rejected it, so the vision fallback is `llama3.2-vision`.
|
- Local Ollama fallback models are installed on demand, not kept hot permanently. Both Hermes instances can reach the shared host service at `http://127.0.0.1:11434/v1`. The live fallback order is `qwen2.5-coder:1.5b` -> `llama3.2:1b` -> `llama3.2-vision`. `gemma4` was attempted but the installed Ollama runtime rejected it, so the vision fallback is `llama3.2-vision`.
|
||||||
|
|
||||||
## Gateway recovery
|
## Gateway recovery
|
||||||
|
|
||||||
|
|||||||
@ -214,17 +214,19 @@ A healthy ByteLyst Hermes setup should be:
|
|||||||
- [x] Configure provider credentials through Hermes auth/config flows; do not commit keys.
|
- [x] Configure provider credentials through Hermes auth/config flows; do not commit keys.
|
||||||
- vijay: documented the command path; provider additions requiring new credentials remain pending.
|
- vijay: documented the command path; provider additions requiring new credentials remain pending.
|
||||||
- [x] Define model routing tiers:
|
- [x] Define model routing tiers:
|
||||||
- vijay: fast/cheap = `qwen2.5:0.5b` or `llama3.2:1b`, strong coding = `qwen2.5-coder:7b`, general/long-context = `llama3.1:8b`, vision-capable = `llama3.2-vision`.
|
- vijay: fast/cheap = `qwen2.5-coder:1.5b` or `llama3.2:1b`, strong coding = `qwen2.5-coder:1.5b`, general/fast fallback = `llama3.2:1b`, vision-capable = `llama3.2-vision`.
|
||||||
- bheem: same local tier map applies to Uma.
|
- bheem: same local tier map applies to Uma.
|
||||||
- routing remains disabled until a separate routed path is proven safe.
|
- routing remains disabled until a separate routed path is proven safe.
|
||||||
- [ ] Test fallback behavior by switching models in a new Hermes session.
|
- [x] Test fallback behavior by switching models in a new Hermes session.
|
||||||
- vijay: direct Ollama smoke tests passed for `qwen2.5-coder:7b`, `llama3.1:8b`, and `llama3.2-vision`; live Hermes session-switch verification still needs to be done.
|
- vijay: direct Ollama smoke tests passed for `qwen2.5-coder:1.5b`, `llama3.2:1b`, and `llama3.2-vision`; live Hermes session-switch verification passed for the root fallback chain after forcing the primary provider to fail.
|
||||||
- bheem: same live Hermes session-switch verification still needs to be done for Uma.
|
- bheem: same fallback-chain proof passed for the Uma profile as well.
|
||||||
- [x] Document the preferred default model and fallback order.
|
- [x] Document the preferred default model and fallback order.
|
||||||
- vijay: current default is OpenAI Codex OAuth; fallback provider order is now the shared local Ollama chain.
|
- vijay: current default is OpenAI Codex OAuth; fallback provider order is now the shared local Ollama chain.
|
||||||
- vijay: preferred default is explicitly `gpt-5.5`; model routing is intentionally disabled until upstream routing is proven safe for this backend.
|
- vijay: preferred default is explicitly `gpt-5.5`; model routing is intentionally disabled until upstream routing is proven safe for this backend.
|
||||||
|
|
||||||
- [ ] Verify the root and Uma Telegram gateways can actually switch to the fallback chain in a live conversation without surfacing provider errors.
|
- [ ] Verify the root and Uma Telegram gateways can actually switch to the fallback chain in a live conversation without surfacing provider errors.
|
||||||
|
- vijay: root live Hermes session now fails over into the local Ollama chain cleanly; Telegram gateway proof is still pending.
|
||||||
|
- bheem: Uma gateway proof is still pending.
|
||||||
|
|
||||||
### Phase 5 — Tooling Capability Upgrade
|
### Phase 5 — Tooling Capability Upgrade
|
||||||
|
|
||||||
@ -441,7 +443,7 @@ This roadmap is complete when:
|
|||||||
- [x] Hermes can be upgraded and rolled back/restored with a documented process.
|
- [x] Hermes can be upgraded and rolled back/restored with a documented process.
|
||||||
- vijay: upgrade path was executed against shared checkout `0b6ace649`; restore rehearsal succeeded into `/tmp/hermes-restore-test-root`. Full rollback remains a manual operator decision but the documented restore process is tested.
|
- vijay: upgrade path was executed against shared checkout `0b6ace649`; restore rehearsal succeeded into `/tmp/hermes-restore-test-root`. Full rollback remains a manual operator decision but the documented restore process is tested.
|
||||||
- [x] Gateway failures and backup failures notify Telegram.
|
- [x] Gateway failures and backup failures notify Telegram.
|
||||||
- [ ] At least one fallback model/provider is configured and tested.
|
- [x] At least one fallback model/provider is configured and tested.
|
||||||
- [ ] Web/search tooling works for current research tasks.
|
- [ ] Web/search tooling works for current research tasks.
|
||||||
- [x] No Hermes dashboard/API is publicly exposed.
|
- [x] No Hermes dashboard/API is publicly exposed.
|
||||||
- [ ] Backup restore has been tested into a non-production profile.
|
- [ ] Backup restore has been tested into a non-production profile.
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user