docs: record VM backup and cron fixes
Some checks failed
pre-commit / pre-commit (push) Has been cancelled
parent 90f6db2014
commit 1f2eea8268
@@ -295,7 +295,9 @@ Effective `sshd -T` settings showed:
 
 **Roadmap:**
 
-- [ ] Inspect `hermes-root-backup.service` logs and decide whether to fix, disable, or replace it with the cron-backed job.
+- [x] Inspect `hermes-root-backup.service` logs and decide whether to fix, disable, or replace it with the cron-backed job.
+- [x] Repair the root backup checkout divergence and verify a successful `hermes-root-backup.service` one-shot run.
+- [x] Update `/root/.hermes/scripts/sync_hermes_persistent_backup.py` so future generated-backup divergence preserves a safety branch and rejoins the remote backup stream instead of wedging on `git pull --ff-only`.
 - [ ] Document all backup mechanisms: Hermes, Gitea data, Docker volumes, app data, Caddy certs/config, environment/secrets escrow.
 - [ ] Run a restore drill into a non-production path/profile.
 - [ ] Verify no raw `.env`, OAuth tokens, private keys, SQLite WAL/SHM, or raw transcript DBs are committed.
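Note on the still-open roadmap item about raw `.env` files, keys, and SQLite WAL/SHM files: one possible way to check it is to match the names of tracked files against a deny-list. The sketch below is only an illustration. The pattern list and the helper names `flag_sensitive` and `tracked_files` are assumptions, not part of the repository, and the check looks only at file names in the current tree, not at file contents or history.

```python
import fnmatch
import subprocess
from pathlib import PurePosixPath

# Hypothetical deny-list derived from the roadmap wording; tune it for the real repo.
# Note that ".env.*" also flags harmless templates such as ".env.example".
SENSITIVE_PATTERNS = [
    ".env", ".env.*",
    "*.pem", "*.key", "id_rsa", "id_ed25519",
    "*.sqlite-wal", "*.sqlite-shm", "*.db-wal", "*.db-shm",
]


def flag_sensitive(paths):
    """Return the paths whose file name matches any sensitive pattern."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS):
            flagged.append(path)
    return flagged


def tracked_files(repo_dir):
    """List the files git tracks in repo_dir, using NUL-separated output."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "ls-files", "-z"],
        check=True, capture_output=True, text=True,
    )
    return [path for path in result.stdout.split("\0") if path]
```

A run against the backup checkout could look like `flag_sensitive(tracked_files("/root/repos/bytelyst_hostinger_hermes_vm"))`; an empty list means no tracked file name matched the deny-list.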
@@ -352,7 +354,7 @@ Effective `sshd -T` settings showed:
 **Roadmap:**
 
 - [ ] Inventory root/user crontabs, `/etc/cron.d`, systemd timers, Hermes cron, and Gitea Actions schedules.
-- [ ] Remove or update stale `/opt/bytelyst/bytelyst-devops-tools/...` references after confirming replacements.
+- [x] Remove or update stale `/opt/bytelyst/bytelyst-devops-tools/...` references after confirming replacements.
 - [ ] Add owner, purpose, expected output, and alert channel for every job.
 - [ ] Add a stale-job detector for missing script paths and failed systemd units.
 
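Note on the stale-job detector item: the missing-script half could be sketched roughly as follows. This is a minimal illustration, not the detector the roadmap will end up with. The function names `job_paths` and `missing_paths` are hypothetical, and the parsing is deliberately simple: it assumes five schedule fields or an `@` shortcut, skips comments and environment assignments, and stops reading a command at the first redirection or pipe so that log targets are not reported.

```python
import os
import re

# Lines such as MAILTO=root or PATH=/usr/bin set the environment, not a job.
ENV_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")


def job_paths(crontab_text):
    """Return (line_number, path) for absolute paths in each job's command."""
    found = []
    for number, line in enumerate(crontab_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ENV_ASSIGNMENT.match(stripped):
            continue
        fields = stripped.split()
        # "@daily cmd" has one schedule field; classic entries have five.
        command = fields[1:] if fields[0].startswith("@") else fields[5:]
        for token in command:
            if ">" in token or "<" in token or token == "|":
                break  # ignore redirection targets and piped commands
            if token.startswith("/"):
                found.append((number, token))
    return found


def missing_paths(crontab_text, exists=os.path.exists):
    """Return the (line_number, path) pairs whose path does not exist."""
    return [(number, path) for number, path in job_paths(crontab_text) if not exists(path)]
```

Feeding it the output of `crontab -l` returns the entries that point at paths no longer present, which is the kind of drift the stale `/opt/bytelyst/bytelyst-devops-tools/...` reference represented. Failed systemd units would still need a separate check such as `systemctl --failed --no-pager`.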
@@ -395,9 +397,9 @@ Effective `sshd -T` settings showed:
 ### Phase 2 — Operational correctness
 
 - [ ] Fix/retire unhealthy containers.
-- [ ] Resolve `hermes-root-backup.service` failed state.
+- [x] Resolve `hermes-root-backup.service` failed state.
 - [ ] Decide and document Gitea runner active/disabled state.
-- [ ] Remove stale cron paths and add missing-script checks.
+- [ ] Add missing-script checks. Stale root cron path was fixed on 2026-05-27.
 - [ ] Apply pending security/runtime updates in a maintenance window.
 
 **Exit criteria:** no unexpected failed units, no ignored unhealthy required containers, no stale cron paths, and runner state is intentional.
@@ -449,6 +451,31 @@ Minimum post-checks for Phase 1:
 - `sshd -T` after SSH changes
 - `systemctl --failed --no-pager`
 
+## Implementation Log
+
+### 2026-05-27 — Phase 2 backup and cron drift
+
+**Changed:**
+
+- Repointed the root Lucky25 monitor cron from `/opt/bytelyst/bytelyst-devops-tools/monitor-lucky25-execution.sh` to `/opt/bytelyst/learning_ai_devops_tools/scripts/monitor-lucky25-execution.sh`.
+- Saved the pre-change root crontab at `/tmp/root-crontab-before-vm-security-20260527.txt`.
+- Repaired `/root/repos/bytelyst_hostinger_hermes_vm`, which was `ahead 1, behind 11`; the obsolete local generated backup commit conflicted with newer remote snapshots and was skipped after rebase preserved the current remote stream.
+- Patched `/root/.hermes/scripts/sync_hermes_persistent_backup.py` to replace unconditional `git pull --ff-only` with explicit fetch/merge-base handling. Diverged generated snapshots now create a safety branch before attempting rebase and fall back to `origin/<branch>` if the generated files conflict.
+- Saved the pre-change backup script at `/tmp/sync_hermes_persistent_backup.py.before-vm-security-20260527`.
+
+**Verified:**
+
+- `crontab -l` now points the Lucky25 monitor at the current repo script.
+- `python3 -m py_compile /tmp/sync_hermes_persistent_backup.py` passed before deployment.
+- `systemctl start hermes-root-backup.service` succeeded twice after repair.
+- `systemctl status hermes-root-backup.service hermes-root-backup.timer --no-pager` showed the service exited `status=0/SUCCESS` and the timer remains active.
+- `/root/repos/bytelyst_hostinger_hermes_vm` is aligned with `origin/main` after successful backup commits `415e824` and `369e584`.
+
+**Residual risk:**
+
+- A restore drill is still required before the backup posture should be considered fully proven.
+- The backup sync script is runtime-managed under `/root/.hermes/scripts/`; add a tracked installer or source-of-truth copy so this hardening does not depend on manual VM state.
+
 ## Do Not Start With
 
 - Rootless Docker migration.
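Note on the fetch/merge-base handling described in the implementation log: the patched script itself is not part of this commit, so the following is only a rough sketch of what such handling might look like, not the actual contents of `sync_hermes_persistent_backup.py`. The names `classify`, `git`, and `sync`, the `safety/...` branch naming, and the use of `git reset --hard` as the "fall back to `origin/<branch>`" step are all assumptions.

```python
import subprocess
import time


def classify(head, remote, base):
    """Classify the local branch against its remote from three commit ids."""
    if head == remote:
        return "up-to-date"
    if base == head:
        return "behind"    # remote has new commits; fast-forward is possible
    if base == remote:
        return "ahead"     # only local commits; nothing to pull
    return "diverged"      # both sides have commits the other lacks


def git(repo, *args):
    """Run a git command in repo and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def sync(repo, branch="main"):
    """Fetch, then fast-forward, rebase, or fall back depending on the state."""
    git(repo, "fetch", "origin", branch)
    head = git(repo, "rev-parse", "HEAD")
    remote = git(repo, "rev-parse", f"origin/{branch}")
    # merge-base exits non-zero when the histories share no commit.
    base = git(repo, "merge-base", "HEAD", f"origin/{branch}")
    state = classify(head, remote, base)
    if state == "behind":
        git(repo, "merge", "--ff-only", f"origin/{branch}")
    elif state == "diverged":
        # Hypothetical naming scheme: keep the local commits reachable first.
        safety = f"safety/{branch}-{time.strftime('%Y%m%d-%H%M%S')}"
        git(repo, "branch", safety, "HEAD")
        try:
            git(repo, "rebase", f"origin/{branch}")
        except subprocess.CalledProcessError:
            # Assumed fallback: drop the conflicting local snapshot and
            # rejoin the remote stream; the safety branch still holds it.
            git(repo, "rebase", "--abort")
            git(repo, "reset", "--hard", f"origin/{branch}")
    return state
```

The point of deciding from the merge base rather than calling `git pull --ff-only` is that a diverged checkout such as the `ahead 1, behind 11` case becomes an expected state with a defined recovery path instead of a hard failure. Because the fallback discards local commits from the working branch, creating the safety branch before the rebase attempt is what keeps the operation recoverable.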