ByteLyst VM Security Blind Spots Roadmap
Review date: 2026-05-27
Reviewer: Hermes Agent
Scope: Hostinger ByteLyst VM, Docker-hosted product stack, Caddy ingress, Gitea/CI, Hermes backup/ops, VM maintenance posture.
Executive Summary
The VM is operational and has several good foundations already in place: UFW is active, fail2ban is running for SSH, unattended upgrades are enabled, Caddy config validates, disk/memory headroom is acceptable, and Hermes persistent-data backup cron is healthy.
The biggest blind spot is that the apparent firewall posture is misleading: UFW only allows SSH, but Docker-published ports create iptables rules that can expose many application, database/emulator, observability, registry, and development ports on 0.0.0.0 / IPv6. Several of those services should either be private-only, routed only through Caddy with auth, or bound to loopback/internal Docker networks.
Second-order risks are SSH hardening gaps, rootful Docker/container hardening gaps, unhealthy apps that can hide failed deploys, an inactive Gitea Actions runner, a failed Hermes backup systemd unit despite cron backup success, and incomplete evidence for restore drills, secret scans, and off-host recovery.
Evidence Snapshot
Collected on 2026-05-27 from this VM.
Host and patching
- Host: srv1491630
- OS: Ubuntu 25.10
- Kernel: 6.17.0-29-generic
- Uptime: about 14 hours at review time
- Root filesystem: 193G total, 71G used, 123G available, 37% used
- Memory: 15Gi total, about 10Gi available
- Swap: 4.0G total, about 1.3G used
- Reboot required: no
- Pending package upgrades included Docker CE/containerd/buildx/compose and security updates for libgcrypt20, libcaca0, and libssh2-1t64
- Unattended upgrades: active and configured for automatic reboot at 04:00 with users absent
Network and ingress
- UFW: active; default deny incoming; only 22/tcp allowed by UFW rules
- Docker iptables rules are present and publish many ports despite UFW's simple rule list
- Public/listening TCP ports bound on all interfaces included:
  - 22, 80, 443
  - app/frontend ports: 3000, 3002, 3003, 3030, 3035, 3040, 3049, 3050, 3055, 3060, 3070, 3075, 3085
  - backend/API ports: 4004, 4010, 4011, 4012, 4013, 4014, 4015, 4016, 4017, 4019, 4020, 4025
  - infra/dev ports: 1025, 1234, 3100, 3300, 8025, 8081, 10000, 11434
- Caddy source-of-truth config: /opt/bytelyst/Caddyfile, mounted read-only into the caddy container
- docker exec caddy caddy validate --config /etc/caddy/Caddyfile: valid config, formatting warning only
- Caddy public hostnames include:
  - api.bytelyst.com
  - gitea.bytelyst.com
  - admin.bytelyst.com
  - devops.bytelyst.com
  - tracker.bytelyst.com
  - llmlab.bytelyst.com
  - ollama.bytelyst.com
  - trading-api.bytelyst.com
  - invttrdg.bytelyst.com
  - notes.bytelyst.com
  - clock.bytelyst.com
SSH and account surface
Effective sshd -T settings showed:
- permitrootlogin yes
- passwordauthentication yes
- pubkeyauthentication yes
- kbdinteractiveauthentication no
- maxauthtries 6
- x11forwarding yes
- clientaliveinterval 0
fail2ban is active with one jail: sshd; no current bans at review time.
Docker runtime and containers
- Docker: client/server 29.4.2; newer Docker packages are available
- Docker daemon is rootful; security options showed AppArmor, seccomp builtin, and cgroup namespaces; live_restore=false
- Most product containers run with writable root filesystems and no explicit user configured
- cadvisor is privileged
- DOCKER-USER chain appears empty, so there is no central Docker firewall policy in front of published containers
- Multiple containers are unhealthy:
  - learning_ai_common_plat-llmlab-dashboard-1
  - learning_ai_common_plat-actiontrail-web-1
  - learning_ai_common_plat-jarvisjr-web-1
  - learning_ai_common_plat-localmemgpt-web-1
  - learning_ai_common_plat-nomgap-web-1
  - learning_ai_common_plat-flowmonk-web-1
  - learning_ai_common_plat-mindlyst-web-1
Gitea and CI
- Gitea public route: https://gitea.bytelyst.com
- Local Gitea container port: host 3300 -> container 3000, bound on 0.0.0.0 and IPv6
- gitea-act-runner.service: enabled but inactive/dead
- Runner user exists: gitea-runner, member of docker
- Runner config directory permissions look reasonable:
  - /home/gitea-runner/act_runner: 750, owned by gitea-runner:gitea-runner
  - /home/gitea-runner/act_runner/config.yaml: 600, owned by gitea-runner:gitea-runner
Backup and operations
- systemctl --failed showed one failed unit:
  - hermes-root-backup.service (Sync root Hermes persistent backup to GitHub)
- Hermes cron backup is active and healthy:
  - job 470832621b43, Sync Hermes persistent-data backup to GitHub, every 30 minutes, last run ok
- Existing VM maintenance cron entries exist for health check and cleanup under /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/
- A root crontab entry still references /opt/bytelyst/bytelyst-devops-tools/monitor-lucky25-execution.sh, which may be stale after repo relocation/renaming
Blind Spots and Risk Register
P0 — Internet-exposed Docker ports bypass the intended ingress model
Risk: UFW suggests only SSH is allowed, but Docker-published ports expose many services directly on all interfaces. This can bypass Caddy, TLS, auth, logging, rate limiting, and hostname/path controls.
Examples observed: 3300, 8025, 1025, 1234, 8081, 10000, 11434, many 30xx web ports, and many 40xx backend ports.
Impact: Direct access to dev/infra services, internal APIs, emulators, mail tooling, dashboards, or model endpoints if upstream firewall/provider rules do not block them.
Roadmap:
- Create a canonical exposure inventory: service, container, host port, public hostname, required audience, auth requirement.
- For each service, decide one of: public via Caddy, private via Tailscale/SSH, loopback-only host port, Docker-internal only, or remove.
- Bind non-public Compose ports to 127.0.0.1 or remove host port mapping entirely.
- Add a DOCKER-USER chain policy to drop unsolicited traffic to non-approved published ports before Docker's accept rules.
- Keep only 80/443 and intentionally public SSH exposed at the provider/firewall layer.
- Add a recurring check that compares ss -ltn and Docker published ports against the approved inventory.
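The recurring check in the last roadmap item could start from something like the sketch below. It is a minimal illustration, not the review's tooling: the function name unexpected_public_ports is hypothetical, and the approved list passed on the command line stands in for the exposure inventory that does not exist yet. It reads ss -ltnH output on stdin, so it can be exercised with sample data before being pointed at the live host.

```shell
# unexpected_public_ports APPROVED_PORT...
#   Reads `ss -ltnH` output on stdin and prints every TCP port that is bound
#   on a wildcard address (0.0.0.0, [::] or *) but is not in the approved list.
#   Limitation: a bind to a specific public IP is not flagged by this sketch.
unexpected_public_ports() {
  local approved=" $* "
  awk -v approved="$approved" '
    {
      local_addr = $4                       # e.g. 0.0.0.0:3300, [::]:3300, *:80
      n = split(local_addr, parts, ":")
      port = parts[n]                       # text after the last colon
      host = substr(local_addr, 1, length(local_addr) - length(port) - 1)
      if (host != "0.0.0.0" && host != "[::]" && host != "*") next
      if (index(approved, " " port " ") == 0) print port
    }
  ' | sort -n -u
}
```

A possible invocation, assuming 22, 80, and 443 end up as the only approved public ports, is `ss -ltnH | unexpected_public_ports 22 80 443`. Any output would be a candidate for loopback binding, removal, or an explicit inventory entry.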
P0 — SSH permits root login and password authentication
Risk: PermitRootLogin yes and PasswordAuthentication yes keep the primary admin surface broad. fail2ban helps, but password-enabled root SSH is still high-risk for an internet-facing VM.
Roadmap:
- Confirm all required admin users have working SSH keys and sudo access.
- Add a non-root break-glass admin path if one does not exist.
- Change SSH effective config to:
  - PermitRootLogin prohibit-password or no
  - PasswordAuthentication no
  - X11Forwarding no
  - lower MaxAuthTries, e.g. 3
  - set a sane ClientAliveInterval/ClientAliveCountMax
- Validate with a second session before restarting SSH.
- Record rollback commands and keep console/provider access available during rollout.
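The target settings above could be captured as an sshd drop-in file, sketched below. The function name write_sshd_hardening, the file name in the usage comment, and the ClientAlive values are assumptions for illustration, not settings taken from this VM. Because sshd keeps the first value it reads for each keyword, a drop-in only takes effect if it sorts before any other file that sets the same keyword, which is why the example name starts with 00.

```shell
# write_sshd_hardening FILE
#   Writes the hardened SSH settings from the roadmap to FILE.
write_sshd_hardening() {
  local target="$1"
  cat > "$target" <<'EOF'
# Hardened SSH settings from the VM security roadmap.
PermitRootLogin prohibit-password
PasswordAuthentication no
X11Forwarding no
MaxAuthTries 3
# Assumed values: probe idle clients every 300s, drop after 2 missed replies.
ClientAliveInterval 300
ClientAliveCountMax 2
EOF
}

# Possible rollout (as root, with a second SSH session and console access open):
#   write_sshd_hardening /etc/ssh/sshd_config.d/00-bytelyst-hardening.conf
#   sshd -t && systemctl reload ssh
#   sshd -T | grep -E '^(permitrootlogin|passwordauthentication|maxauthtries)'
```

Running sshd -t before the reload catches syntax errors, and the final sshd -T call confirms that the effective values actually changed rather than being overridden by another config file.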
P0 — Public/private boundary for dev and internal tooling is unclear
Risk: Caddy publishes ollama.bytelyst.com, llmlab.bytelyst.com, devops.bytelyst.com, admin.bytelyst.com, and Gitea. Some may be intended, but the roadmap lacks an explicit auth/access decision for each.
Roadmap:
- Document public hostnames, auth model, and data sensitivity.
- Require explicit approval before exposing new dashboards or model endpoints.
- Add Caddy auth/IP allowlist/Tailscale-only strategy for admin-like surfaces.
- Add security headers/auth checks to public UI health reviews.
- Confirm ollama.bytelyst.com should be publicly reachable at all; if not, move behind private network or auth gate.
P1 — Docker/container hardening is mostly default
Risk: Many containers run as default/root user, writable rootfs, broad capabilities by default, and rootful Docker. A compromised app gets more host-adjacent leverage than needed.
Roadmap:
- Create a per-service Docker hardening matrix: user, read-only rootfs, dropped capabilities, no-new-privileges, resource limits, healthcheck, restart policy, secrets handling.
- Start with public-facing/backend services and admin dashboards.
- Add security_opt: ["no-new-privileges:true"] where compatible.
- Add cap_drop: ["ALL"] and selectively add back capabilities only when needed.
- Convert app images to non-root users consistently.
- Use read_only: true plus explicit writable tmp/cache volumes where compatible.
- Review cadvisor privileged mode and replace/restrict if possible.
- Enable Docker live-restore if it fits maintenance operations.
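One low-risk way to trial these settings per service is a Compose override file, generated here by a small shell helper. This is a sketch under assumptions: write_hardening_override is a hypothetical name, the 1000:1000 user and the /tmp tmpfs are placeholder choices, and each image has to be checked for compatibility before the override is applied.

```shell
# write_hardening_override FILE SERVICE
#   Writes a Compose override for SERVICE that applies the hardening options
#   listed in the roadmap.
write_hardening_override() {
  local target="$1" service="$2"
  cat > "$target" <<EOF
services:
  ${service}:
    user: "1000:1000"        # assumed non-root UID:GID; the image must support it
    read_only: true
    tmpfs:
      - /tmp                 # writable scratch space on top of the read-only rootfs
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
EOF
}
```

The merged result can be inspected without starting anything, for example with `docker compose -f docker-compose.yml -f docker-compose.hardening.yml config`, where both file names are placeholders for the real stack files.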
P1 — Unhealthy containers can normalize broken deployments
Risk: Multiple app web containers are unhealthy while still running. If unhealthy states are ignored, deploy regressions and broken public pages can persist unnoticed.
Roadmap:
- Triage each unhealthy container and classify: real app failure, bad healthcheck, intentionally unused, or deprecated.
- Fix or remove bad healthchecks so Docker health state is trustworthy.
- Add alerting for sustained unhealthy containers.
- Make deployment scripts fail on unhealthy post-deploy state.
- Update dashboard/observability docs with current service ownership and expected state.
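The "fail on unhealthy post-deploy state" item could be approached with a small gate like the one below. It is a sketch, and unhealthy_containers is a hypothetical helper name. It reads name and status pairs from stdin in the same tab-separated format the verification commands already use, so it does not call Docker itself.

```shell
# unhealthy_containers
#   Reads "name<TAB>status" lines (as produced by docker ps --format) on stdin,
#   prints the names whose status contains "(unhealthy)", and returns a
#   non-zero exit code if at least one was found.
unhealthy_containers() {
  awk -F '\t' '
    index($2, "(unhealthy)") { print $1; found = 1 }
    END { exit found ? 1 : 0 }
  '
}
```

In a deploy script this might be used as `docker ps --format '{{.Names}}\t{{.Status}}' | unhealthy_containers || exit 1`. Containers still in the "health: starting" state are not flagged, so a real gate would need to wait for healthchecks to settle first.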
P1 — Gitea Actions runner is enabled but inactive
Risk: CI/deploy assumptions may be wrong. If a runner is expected to deploy or publish packages, inactive runner state blocks automation and may cause manual drift.
Roadmap:
- Decide whether the runner should be active or intentionally disabled.
- If active: restart and verify gitea-act-runner.service, runner labels, Docker access, and a smoke workflow.
- If disabled: disable the service and document the intentional state.
- Keep runner secrets separate from smoke/test workflows.
- Add a runner-health check to VM observability.
P1 — Backup/restore evidence is split and one backup unit is failed
Risk: Hermes cron backup works, but hermes-root-backup.service is failed. There is no recent full restore drill evidence in this review. A backup that cannot be restored is only an assumption.
Roadmap:
- Inspect hermes-root-backup.service logs and decide whether to fix, disable, or replace it with the cron-backed job.
- Document all backup mechanisms: Hermes, Gitea data, Docker volumes, app data, Caddy certs/config, environment/secrets escrow.
- Run a restore drill into a non-production path/profile.
- Verify no raw .env, OAuth tokens, private keys, SQLite WAL/SHM, or raw transcript DBs are committed.
- Add backup freshness and restore-drill status to the monthly VM review.
P1 — Patch management has pending security/runtime updates
Risk: Unattended upgrades are on, but Docker and security package updates were pending at review time. Docker updates may need controlled restart/redeploy planning.
Roadmap:
- Add a weekly patch review checkpoint that reports pending security and Docker updates separately.
- Define a Docker upgrade maintenance window with pre/post checks.
- Run apt list --upgradable and capture package classes without dumping noise.
- Verify apps after Docker/containerd upgrades.
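Capturing package classes "without dumping noise" might look like the summary below. It is a rough sketch: summarize_upgrades is a hypothetical name, Docker packages are recognised only by a name prefix, and security updates are recognised only by a suite name containing -security, which is a heuristic rather than an authoritative classification.

```shell
# summarize_upgrades
#   Reads `apt list --upgradable` output on stdin and prints one line with
#   counts for Docker runtime packages, security-suite packages, and the rest.
summarize_upgrades() {
  awk '
    index($1, "/") == 0 { next }            # skip "Listing..." and blank lines
    {
      split($1, f, "/")                     # f[1] = package, f[2] = suite list
      if (f[1] ~ /^(docker-|containerd)/) docker++
      else if (f[2] ~ /-security/)        security++
      else                                other++
    }
    END { printf "docker=%d security=%d other=%d\n", docker, security, other }
  '
}
```

A weekly checkpoint could run `apt list --upgradable 2>/dev/null | summarize_upgrades` and record the single summary line, keeping the full package list only when the Docker or security count is non-zero.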
P2 — Ubuntu 25.10 lifecycle risk needs explicit tracking
Risk: Ubuntu interim releases have short support windows. If this VM is long-lived production infrastructure, lifecycle tracking matters.
Roadmap:
- Record current Ubuntu 25.10 support/EOL date in ops docs.
- Decide whether to stay on interim releases or migrate to an LTS baseline.
- Add an OS lifecycle check to quarterly review.
P2 — Repository/config secret hygiene needs a repeatable scanner
Risk: The DevOps repo contains operational inputs and historical/deleted repo copies exist on disk. Manual review can miss tokens in old files, generated JSON, logs, backups, or abandoned directories.
Roadmap:
- Add a documented secret-scan command using gitleaks or trufflehog for tracked files and selected untracked ops directories.
- Scan historical directories such as DELETED_bytelyst-devops-tools separately before archiving or deleting.
- Add .gitignore patterns for generated scans, local account snapshots, and credential-shaped outputs.
- Keep examples as .example files only.
P2 — Cron/systemd ownership and drift are not fully inventoried
Risk: Root crontab references old repo paths and there are multiple cron/systemd sources. Stale jobs can fail silently or mutate production unexpectedly.
Roadmap:
- Inventory root/user crontabs, /etc/cron.d, systemd timers, Hermes cron, and Gitea Actions schedules.
- Remove or update stale /opt/bytelyst/bytelyst-devops-tools/... references after confirming replacements.
- Add owner, purpose, expected output, and alert channel for every job.
- Add a stale-job detector for missing script paths and failed systemd units.
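A first version of the missing-script half of the stale-job detector could be as simple as the sketch below. The function name missing_cron_scripts is hypothetical, and the path matching is a heuristic: it only considers absolute paths ending in .sh or .py that follow whitespace or start a line, so wrapped commands and other interpreters would need extra handling.

```shell
# missing_cron_scripts
#   Reads crontab lines on stdin, ignores comment lines, and prints every
#   absolute *.sh or *.py path that does not exist on this host.
missing_cron_scripts() {
  grep -v '^[[:space:]]*#' \
    | grep -oE '(^|[[:space:]])/[^[:space:]]+\.(sh|py)' \
    | sed 's/^[[:space:]]*//' \
    | sort -u \
    | while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
      done
}
```

It could be fed from `crontab -l` or from `cat /etc/cron.d/*`, and paired with `systemctl --failed --no-pager` for the failed-unit half of the check.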
P2 — Observability exists but needs security-focused SLOs
Risk: Prometheus/Grafana/Loki/exporters are present, but security-focused alerts are not yet proven from this review.
Roadmap:
- Add alerts for unexpected public ports, failed units, unhealthy containers, high disk/swap, backup staleness, Gitea runner inactive, and SSH auth spikes.
- Validate alert delivery to Telegram.
- Keep internal observability endpoints private; do not publish Prometheus/Loki/node-exporter/cAdvisor directly.
Execution Plan
Phase 0 — Freeze and inventory before changes
- Freeze new public hostnames/ports until the exposure inventory is complete.
- Generate docs/vm-exposure-inventory.md from Docker, Caddy, ss, and DNS.
- Mark each exposed service as public, private, internal-only, or retire.
- Review with S before changing public access for customer/user-facing apps.
Phase 1 — Immediate security hardening
- Close or loopback-bind non-public Docker host ports.
- Add DOCKER-USER default-deny rules for non-approved ports.
- Harden SSH root/password access after key-based access is verified.
- Put ollama.bytelyst.com, admin dashboards, and dev tooling behind private/auth-gated access unless explicitly approved as public.
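For the DOCKER-USER step, the sketch below prints a candidate rule set instead of applying it, so it can be reviewed first. Several things here are assumptions: the helper name print_docker_user_rules, the external interface name passed as the first argument, and the choice of 80 and 443 as the approved ports. Rules in DOCKER-USER see traffic after destination NAT, so the host port is matched with the conntrack original destination port rather than a plain destination port. The sketch covers IPv4 TCP only; UDP (for example HTTP/3) and IPv6 via ip6tables would need their own rules.

```shell
# print_docker_user_rules EXT_IF APPROVED_PORT...
#   Prints iptables commands that let replies and approved published TCP ports
#   through DOCKER-USER and drop everything else arriving on EXT_IF.
print_docker_user_rules() {
  local ext_if="$1" port
  shift
  # Start from a clean chain (the review found DOCKER-USER empty).
  echo "iptables -F DOCKER-USER"
  # Let return traffic for connections the containers opened pass through.
  echo "iptables -A DOCKER-USER -m conntrack --ctstate ESTABLISHED,RELATED -j RETURN"
  for port in "$@"; do
    # Match the host port the client originally connected to (pre-DNAT).
    echo "iptables -A DOCKER-USER -i $ext_if -p tcp -m conntrack --ctorigdstport $port --ctdir ORIGINAL -j RETURN"
  done
  # Anything else from the external interface never reaches a container.
  echo "iptables -A DOCKER-USER -i $ext_if -j DROP"
}
```

For example, `print_docker_user_rules eth0 80 443` emits five commands for review, where eth0 is a placeholder for the VM's real public interface. Applying them should happen in a maintenance window with console access available, and persistence across reboots is a separate decision.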
Phase 2 — Operational correctness
- Fix/retire unhealthy containers.
- Resolve hermes-root-backup.service failed state.
- Decide and document Gitea runner active/disabled state.
- Remove stale cron paths and add missing-script checks.
- Apply pending security/runtime updates in a maintenance window.
Phase 3 — Docker and app hardening
- Add non-root users, no-new-privileges, cap drops, and read-only rootfs by service.
- Add resource limits for noisy services and emulators.
- Move emulators/dev tools off public bindings.
- Review cAdvisor privilege and observability surface.
Phase 4 — Backup, restore, and incident readiness
- Define full backup map: Hermes, Gitea, Caddy, Docker volumes, app DB/state, secrets escrow.
- Perform restore drill to non-prod target.
- Add incident runbooks: compromised container, leaked token, SSH brute force, disk full, failed Docker upgrade.
- Add quarterly tabletop review.
Phase 5 — Continuous governance
- Monthly VM security review cron/checklist.
- Secret scan before DevOps repo pushes.
- OS lifecycle/EOL tracker.
- Drift detection for ports, Caddy routes, Docker health, systemd failures, and cron paths.
Suggested First Tickets
- P0: Build and review exposure inventory. Produce an exact approved/blocked list for all currently bound ports.
- P0: Lock Docker-published non-public ports. Bind to loopback/internal or enforce DOCKER-USER drops.
- P0: Harden SSH. Disable password/root login after confirming key-based admin access.
- P1: Triage unhealthy containers. Fix healthchecks/apps or retire dead services.
- P1: Resolve failed Hermes backup unit. Fix or disable the duplicate failed unit; keep the cron backup healthy.
- P1: Decide Gitea runner state. Either an active smoke-tested runner or a documented disabled service.
- P2: Add secret scanner and stale-job scanner. Prevent silent credential and automation drift.
Verification Commands for Future Runs
# Host/security baseline
date -Is
uname -a
. /etc/os-release && echo "$PRETTY_NAME"
apt-get -s upgrade | awk '/^Inst /{print}'
test -f /var/run/reboot-required && cat /var/run/reboot-required || echo no-reboot-required
# Firewall and public bind inventory
ufw status verbose
iptables -S DOCKER-USER
ss -ltnup
# SSH effective config
sshd -T | grep -E '^(permitrootlogin|passwordauthentication|pubkeyauthentication|kbdinteractiveauthentication|maxauthtries|x11forwarding|clientaliveinterval)'
fail2ban-client status sshd
# Docker health/security
docker ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}'
docker ps -q | xargs -r docker inspect --format '{{.Name}} user={{.Config.User}} privileged={{.HostConfig.Privileged}} readonly={{.HostConfig.ReadonlyRootfs}} ports={{json .NetworkSettings.Ports}}'
# Caddy and ingress
docker exec caddy caddy validate --config /etc/caddy/Caddyfile
sed -n '1,220p' /opt/bytelyst/Caddyfile
# Backup/cron/systemd drift
systemctl --failed --no-pager
hermes cron list
crontab -l
for f in /etc/cron.d/*; do echo "--- $f"; sed -n '1,80p' "$f"; done
Notes
- This review did not change firewall, SSH, Docker, Caddy, or service settings. It intentionally documents the risk and remediation order before making potentially disruptive security changes.
- Public exposure changes should be handled in small maintenance windows with pre/post health checks because this VM hosts multiple ByteLyst apps.
- The Caddyfile validates today, but Caddy formatting should be normalized in a separate low-risk docs/ops cleanup if desired.