bytelyst-devops-tools/ENGINEERING_REVIEW_SCORECARD.md

# Engineering Review & Scorecard

> Evidence-based, read-only review of the entire `~/code/mygh` workspace (~38 git
> repos) per `docs/prompts/engineering-review-scorecard.md`. Generated 2026-05-30.
>
> **Method:** static inspection only — file reads, `grep`, and read-only `git`.
> No builds, installs, or test runs were executed (that would mutate the trees),
> so dynamic results (pass/fail, coverage %) are inferred from config + test
> counts, not measured. See §9 for limits. Per-repo evidence was gathered by
> parallel read-only agents and spot-verified.

---

## 1. Executive Summary

**What this is:** a single developer running a surprisingly coherent *product
ecosystem*: ~12 product apps (clock, notes, fastgap, peakpulse, flowmonk,
efforise, jarvis_jr, trails, talk2obsidian, local-memory-gpt, voice-ai-agent,
multimodal/mindlyst) sharing one platform monorepo (`learning_ai_common_plat`,
36 `@bytelyst/*` packages, auth/Cosmos/design-tokens), orchestrated by a single
`docker-compose.ecosystem.yml` (~20 services) and driven heavily by AI agents
through a homegrown `agent-queue`. This is far more disciplined than a typical
"learning" folder.

**Overall maturity:** **Beta-quality ecosystem.** A core of genuinely
production-grade repos (`learning_ai_notes`, `learning_ai_trails`,
`oss/claw-code`/`claw-cowork`, `learning_ai_clock`, `learning_ai_fastgap`)
surrounded by a long tail of MVP/prototype repos with thin or zero tests and no
CI.

**Biggest strengths (top 3)**
1. **Strong platform discipline.** Shared `@bytelyst/*` packages, a repeated
   `types.ts → repository.ts → routes.ts` backend pattern, Cosmos partition-key
   conventions (`/userId`, `productId` on every doc), per-repo `AGENTS.md`,
   conventional commits, and field-level encryption (`field-encrypt.ts`) recur
   across the best repos.
2. **Clean security posture for a personal workspace.** Secret scans across all
   repos surfaced **no real committed production secrets** — only `.env.example`
   placeholders, the public Azure Cosmos emulator key, dev `JWT_SECRET=dev-...`
   values, and Azure Key Vault *references*. `.gitignore` is present nearly
   everywhere.
3. **Top repos are legitimately good.** `notes`, `trails`, and the two Rust
   `claw-*` repos show modular architecture, real test suites (28–80+ files),
   CI, multi-stage Docker, and strict typing (`0` `as any` in several backends).

**Biggest risks (top 3)**
1. **CI is the weak link.** GitHub Actions is **disabled (billing)** on the
   platform monorepo `learning_ai_common_plat` and on `voice_ai_agent`
   (`*.disabled` workflows); ~15 repos have **no CI at all**. The shared
   platform that everything depends on has no enforced automated gate, even
   though gitea CI is active there.
2. **Process churn dirties the repos.** A live `agent-queue` daemon + `devin`
   agents in `--permission-mode dangerous` were actively writing to repos; ~14
   repos were found dirty with uncommitted work, several behind `origin`. Work
   is at risk of being lost or silently diverging.
3. **Testing is bimodal.** Excellent in the flagship repos, **zero** in many
   others (`productivity_web`, `webui_copilot`, `pytorch_todo_predictor`,
   `server-survival`, `sidecar_setup`, `mac_tooling`). No portfolio-wide
   coverage signal.

**Is the dev style helping or hurting velocity?** **Net helping, but fraying at
the edges.** The platform/agent approach clearly lets one person ship a dozen
apps — that's the upside. The drag is operational: disabled CI, constantly-dirty
working trees, abandoned worktrees, and "AI-generated scaffolding smell" in a
few repos (e.g. `magic_clipboard_mgr`'s 50+ service files + phase-named test
buckets). Tightening the commit/CI loop would convert a lot of that churn back
into velocity.

---

## 2. Overall Score Sheet

Scores are 1–10 (1 = critical/broken, 10 = production-grade), aggregated across
the ~30 code repos (pure docs/usage repos excluded from category math).

| Category | Score | Justification (evidence) |
|---|---|---|
| A. Repository organization | **8** | Consistent `@bytelyst/*` + `types/repository/routes` pattern, per-repo `AGENTS.md`, clear monorepos; minus for ~14 dirty trees, stray worktrees, a few unstructured repos. |
| B. Code quality | **7** | Flagships: strict TS, `0` `as any`, no `console.log`, Zod validation. Tail: `print()`-heavy (`2nd_brain` 60+, `mac_tooling` 200+), `any` leaks, AI-scaffold smell (`magic_clipboard_mgr`). |
| C. Architecture | **8** | Genuinely strong: shared platform, datastore abstraction, deterministic engines (`flowmonk` scheduler), risk-scoring (`trails`), MCP integrations, clean native/web boundaries. |
| D. DevOps & deployment | **6** | Ecosystem compose orchestrates ~20 services and multi-stage Dockerfiles are common, but **GH Actions is disabled on the platform repo**, ~15 repos have no CI, and there are **0 healthchecks** in `docker-compose.ecosystem.yml`. |
| E. Testing | **6** | Bimodal: `notes`/`fastgap`/`clock`/`trails`/`claw-*` range from 28 test files (`trails`) to 700+ tests (`fastgap`); many repos have 0. E2E frequently `continue-on-error: true`. No measured coverage. |
| F. Security | **8** | No real committed secrets anywhere; field encryption + Key Vault refs in the mature repos; `.gitignore`/`.env.example` discipline. Minus for `NODE_TLS_REJECT_UNAUTHORIZED=0` in some Docker, thin input-validation in prototypes. |
| G. Product readiness | **7** | Several apps runnable end-to-end (web+backend); mobile/native surfaces often partial; CI-disabled + flaky E2E hold back true "launchable". |
| H. AI-agent practices | **6** | Impressive tooling (`agent-queue`, profiles, job briefs, `AGENTS.md`), but guardrails are weak: `--permission-mode dangerous`, agents dirtying live repos, duplicate work landing upstream, no enforced test-before-commit. |
| I. Personal workflow | **6** | Good: conventional commits, auto `backup-main-*` branches, `AGENTS.md`. Bad: ~14 dirty repos, branches behind `origin`, abandoned worktrees, no unified release/issue discipline. |
| **Weighted overall** | **≈ 7.0** | Beta-quality. See weighting below. |

**Weighting & rationale:** Security (F) and Product readiness (G) weighted ~1.5×,
Testing (E) and DevOps (D) ~1.25× (these gate real-world reliability);
A/B/C/H/I at 1.0×. The strong architecture/security pull the number up; the
weak CI/testing pull it back to a solid-but-not-shippable **~7.0**.
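
The weighting can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the approximate weights (~1.5×, ~1.25×) are taken as exact and that the overall figure is a plain weighted mean; the review does not state the exact formula used.

```python
# Category scores from the score sheet (A-I).
scores = {"A": 8, "B": 7, "C": 8, "D": 6, "E": 6, "F": 8, "G": 7, "H": 6, "I": 6}

# Assumption: the "~" weights are treated as exact values.
weights = {
    "A": 1.0, "B": 1.0, "C": 1.0,
    "D": 1.25, "E": 1.25,   # DevOps and Testing
    "F": 1.5, "G": 1.5,     # Security and Product readiness
    "H": 1.0, "I": 1.0,
}


def weighted_overall(scores: dict, weights: dict) -> float:
    """Weighted mean: sum(score * weight) / sum(weight)."""
    total = sum(scores[k] * weights[k] for k in scores)
    return total / sum(weights[k] for k in scores)


overall = weighted_overall(scores, weights)
print(f"{overall:.2f}")
```

Under these assumptions the weighted sum is 72.5 over a total weight of 10.5, which gives about 6.9 and is consistent with the reported ≈ 7.0.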

---

## 3. Per-Product / Per-Repo Breakdown

Maturity legend: **PROD** = production-grade, **BETA**, **MVP**, **PROTO** =
prototype/learning, **REF** = docs/reference (not code).

### Flagship products (platform-integrated)
| Repo | Stack | Tests | CI | Docker | Maturity |
|---|---|---|---|---|---|
| `learning_ai_notes` | Fastify5 + Next16 + Expo, Cosmos | 80+ files | ✓ gitea | ✓ | **BETA→PROD** |
| `learning_ai_trails` | Fastify5 + Next16 + SDK, Cosmos | 28 files | ✓ gitea | ✓ | **PROD** |
| `learning_ai_clock` | Next16 PWA + iOS/Android, Fastify | 662 total | ✓ gitea | ✓ | **BETA** |
| `learning_ai_fastgap` | Expo + Next16 + Fastify | 700+ total | ✓ gitea (7 jobs) | ✓ | **BETA** |
| `learning_ai_peakpulse` | SwiftUI + Fastify | 26 files | ✓ (backend) | ✓ | **BETA→PROD** |
| `learning_ai_flowmonk` | Next16 + Fastify + Expo | 102 backend | ✓ gitea | ✓ | **BETA** |
| `learning_ai_efforise` | React/Vite + Fastify + RN | ~9 backend | ✓ gitea | ✓ | **MVP** |
| `learning_ai_dev_intelli` | Fastify + Next16, GitHub API | 52 backend | ✓ gitea | ✓ | **MVP** |
| `learning_ai_local_memory_gpt` | Fastify + Next16, SQLite/Ollama | 122 | ✓ gitea | ✓ | **MVP** |
| `learning_ai_talk2obsidian` | Fastify + Vite, SQLite/Ollama | 8 | ✗ | ✓ | **BETA** |
| `learning_voice_ai_agent` | Python + Fastify + Next + KMP | 463+ | ⚠ disabled | ✓ | **BETA** |
| `learning_multimodal_memory_agents` (MindLyst) | KMP + Next + Fastify | 33 | ⚠ disabled | ✓ | **MVP** |
| `learning_ai_jarvis_jr` | SwiftUI + Next + Android | ~13 web | ✓ gitea | ✓ | **ALPHA/BETA** |
| `learning_ai_auth_app` | iOS/watchOS/Android (spec+UI) | 0 (here) | ✗ | ✗ | **MVP (spec)** |

### Platform & infra
| Repo | Stack | Notes | Maturity |
|---|---|---|---|
| `learning_ai_common_plat` | pnpm monorepo, 36 `@bytelyst/*`, Fastify, Cosmos | ~466k LOC; full auth (OAuth/MFA/passkeys/SAML); **GH Actions disabled (billing)**, gitea CI active | **PROD** |
| `learning_ai_devops_tools` | Bash + Python + Node (this repo) | GitHub admin scripts, `agent-queue`, Hermes dashboard; thin tests | **PROD (scripts) / MVP (dash)** |
| `learning_ai_k8s_streaming` | Python FastAPI + Helm | Use-case registry, HPA/probes, load tools | **BETA→PROD** |
| `learning_ai_local_llms` | Next16 dashboard + Python TTS | Ollama mission-control; 57 tests | **BETA** |

### Tools / OSS / native
| Repo | Stack | Notes | Maturity |
|---|---|---|---|
| `oss/learning_ai_claw-code-oss` | Rust workspace (10+ crates) | `unsafe forbid`, clippy pedantic, 40+ test files | **PROD** |
| `oss/learning_ai_claw-cowork` | Rust + Tauri + Python | 65+ test files, E2E, Docker | **PROD** |
| `learning_magic_terminal` | **Rust** | README+CI+many tests; command-blocks v2; dirty(5) | **BETA** |
| `learning_notif_scanr` | **Swift** (Package.swift) | tests present, **no CI**, no Docker | **MVP** |
| `ios/learning_swift_hourglass` | Swift/SwiftUI macOS | MVVM, 2 test files, no CI | **MVP** |
| `learning_ai_magic_clipboard_mgr` | Swift/macOS, GRDB | 24 tests but 50+ services + phase-named tests (AI-scaffold smell) | **MVP** |
| `learning_ai_mac_tooling` | Python FastAPI + React | forensics toolkit; **0 tests**, 200+ `print()`, 3k-line files | **PROTO** |
| `copilot/learning_ai_uxui_web` | Next16 + MSW + Playwright | component showcase, Lighthouse CI | **MVP** |
| `learning_ai_productivity_web` | Next15, client-only | clean registry pattern, **0 tests** | **MVP** |
| `learning_ai_webui_copilot` | Python FastAPI + LangChain | rules/policy engines, **0 tests, no Docker/CI** | **MVP** |
| `learning_agent_monitoring_fx` | npm monorepo + KMP | agent/ingest/web work, native WIP, 54 `console.log`, TODOs | **BETA** |
| `learning_agentic_tools_portal` | Python Flask + uv | minimal (1 endpoint, 1 test), has CI | **PROTO** |
| `learning_server-survival-devops-web` | Vanilla JS + Three.js | playable game, **0 tests** | **MVP** |
| `learning_pytorch_todo_predictor` | Python + PyTorch | educational, **0 tests**, **no upstream** | **PROTO** |
| `learning_sidecar_setup` | Next16 scaffold + py stub | scaffolding only, **no upstream**, dirty(8) | **PROTO** |
| `learning_claude_code_setup` | Bash + markdown | setup notes/scripts; dirty(1) | **REF** |
| `learning_github_copilot` | Markdown (CLI/SDK docs) | reference only | **REF** |
| `learning_python_sandbox` | Python | LeetCode/learning; dirty(1) | **PROTO** |
| `learning_ai_materials` | Docs | NBA handover package | **REF** |
| `learning_windsurf_setup` | Usage logs | not a codebase | **N/A** |

---

## 4. Findings by Dimension

### A. Repository organization
- **Fact:** Strong, repeated conventions — `AGENTS.md`/`CLAUDE.md` per repo, pnpm
  workspaces, `types→repository→routes` backend modules, `docs/` with PRD/ROADMAP.
- **Fact:** ~14 repos dirty at audit time; abandoned `worktrees/` (now cleaned);
  some repos behind `origin`. Two repos (`pytorch_todo_predictor`,
  `sidecar_setup`) have **no git upstream**.
- **Reco:** Adopt a "clean tree or it doesn't exist" rule (see §8). Add upstreams
  for the two orphan repos or mark them clearly local.

### B. Code quality
- **Fact:** Best repos enforce strict TS (`0` `as any` in `notes`, `trails`,
  `local_memory_gpt` backends), no `console.log` (Fastify logger), Zod validation.
- **Fact:** `learning_ai_2nd_brain` has 60+ `print()` calls; `mac_tooling` has
  200+ plus 3k+-line files (`network_transfer_audit.py` is 3521 lines);
  `magic_clipboard_mgr` shows AI-scaffold smell (50+ service files,
  `Phase5–8`/`RemainingQATests`).
- **Reco:** Lint-gate `print()`/`console.log` in the Python/TS repos; split the
  3k-line files; audit `magic_clipboard_mgr` for stubbed vs real services.
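
A `print()` gate does not need much machinery. The sketch below counts direct `print(...)` calls with the standard-library `ast` module; the function name and the sample source are hypothetical, and a real lint rule would normally replace this in CI.

```python
import ast


def count_print_calls(source: str) -> int:
    """Count direct calls to the name `print` in Python source text."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


# Hypothetical sample; it is only parsed, never executed.
sample = '''
import logging
log = logging.getLogger(__name__)

def run(x):
    print("starting")        # counted
    log.info("value=%s", x)  # logger call, not counted
    obj.print("x")           # attribute call, not counted
    print(x)                 # counted
'''

print(count_print_calls(sample))
```

Because it works on the syntax tree, the word `print` inside comments or strings is ignored, which a plain `grep` count would not do.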

### C. Architecture
- **Fact:** Clear separation and reuse: shared auth/datastore/design-tokens,
  deterministic scheduler (`flowmonk`), risk engine (`trails`), use-case registry
  (`k8s_streaming`), MCP tool servers, Rust crate boundaries (`claw-*`).
- **Reco:** This is the strongest dimension — protect it by keeping product
  domains out of `common_plat` and vice-versa.

### D. DevOps & deployment
- **Fact:** `docker-compose.ecosystem.yml` wires ~20 services (10 backends + 10
  webs) + infra (Cosmos emulator, Azurite, Traefik, Loki, Grafana, MCP); 30
  `restart:` policies, 24 `build:` contexts, but **0 `healthcheck:` blocks**.
- **Fact:** GH Actions is disabled on `common_plat` + `voice_ai_agent`; ~15 repos have no CI.
- **Reco (P1):** Add healthchecks + `depends_on: condition: service_healthy` to
  the ecosystem compose; re-enable or fully migrate CI to gitea self-hosted.
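
The healthcheck gap can be verified mechanically once the compose file has been parsed. This is a minimal sketch that assumes the YAML has already been loaded into a plain dict by a YAML parser of your choice; the service names, port, and health endpoint in `example` are hypothetical and are not taken from `docker-compose.ecosystem.yml`.

```python
def services_missing_healthcheck(compose: dict) -> list:
    """Names of services that define no `healthcheck:` block."""
    services = compose.get("services", {})
    return sorted(
        name for name, spec in services.items()
        if "healthcheck" not in (spec or {})
    )


def ungated_dependencies(compose: dict) -> dict:
    """Map each service to the dependencies it does not wait on as healthy."""
    result = {}
    for name, spec in compose.get("services", {}).items():
        deps = (spec or {}).get("depends_on", {})
        if isinstance(deps, list):
            # Short-form list: start order only, never waits for health.
            ungated = list(deps)
        else:
            ungated = [
                dep for dep, opts in deps.items()
                if (opts or {}).get("condition") != "service_healthy"
            ]
        if ungated:
            result[name] = sorted(ungated)
    return result


# Hypothetical parsed compose data for illustration only.
example = {
    "services": {
        "notes-backend": {
            "build": "./notes/backend",
            "healthcheck": {
                "test": ["CMD", "curl", "-f", "http://localhost:3000/health"],
                "interval": "10s",
                "timeout": "3s",
                "retries": 5,
            },
        },
        "notes-web": {
            "depends_on": {"notes-backend": {"condition": "service_healthy"}},
        },
        "clock-backend": {"build": "./clock/backend"},
        "clock-web": {"depends_on": ["clock-backend"]},
    }
}

print(services_missing_healthcheck(example))
print(ungated_dependencies(example))
```

Run against the real file, the first function should return every service today (0 healthchecks) and an empty list once the P1 item is done.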

### E. Testing
- **Fact:** `fastgap` (~700), `clock` (662), `notes` (80+ files), `voice_ai_agent`
  (463+), `claw-cowork` (65+ files) are excellent; ~8 repos have 0 tests.
- **Fact:** E2E often `continue-on-error: true` (`fastgap`, `flowmonk`,
  `jarvis_jr`, `local_memory_gpt`) — i.e. not actually gating.
- **Reco:** Set a per-repo minimum (smoke + happy-path) and stop masking E2E
  failures with `continue-on-error` once stabilized.
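
To keep track of which workflows still mask failures, a line scan is usually enough. The sketch below is a text-level check, not a YAML parser; the workflow content and commands in `sample` are hypothetical.

```python
import re

# Matches job-level or step-level `continue-on-error: true`,
# optionally followed by a trailing comment.
MASK = re.compile(r"^\s*(?:-\s+)?continue-on-error:\s*true\s*(?:#.*)?$")


def masked_lines(workflow_text: str) -> list:
    """1-based line numbers where failures are masked."""
    return [
        number
        for number, line in enumerate(workflow_text.splitlines(), start=1)
        if MASK.match(line)
    ]


# Hypothetical workflow text for illustration only.
sample = """\
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - run: pnpm test
  e2e:
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - run: pnpm exec playwright test
"""

print(masked_lines(sample))
```

A non-empty result for a job that is supposed to gate merges is the signal to stabilize that job and then remove the flag.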

### F. Security
- **Fact:** No real committed secrets across all repos. Matches were
  `.env.example` placeholders, the public Cosmos emulator key
  (`C2y6yDjf5/R...`), `dev-*` JWT secrets, and Azure Key Vault references.
- **Fact:** Field encryption (AES-256-GCM) in `clock`/`notes`/`dev_intelli`;
  `unsafe_code = "forbid"` in the Rust repos.
- **Watch:** `NODE_TLS_REJECT_UNAUTHORIZED=0` seen in some Docker setups; thin
  input validation / no rate-limiting in the prototype Python apps.

### G. Product readiness
- **Fact:** Web+backend pairs generally run end-to-end; native/mobile surfaces
  (iOS/Android/KMP) are frequently partial or scaffolded.
- **Reco:** Pick 2–3 flagships (`notes`, `trails`, `clock`) and drive them to a
  true launch checklist; treat the rest explicitly as experiments.

### H. AI-agent practices
- **Fact:** Sophisticated `agent-queue` (profiles, job briefs, lifecycle dirs,
  Node dashboard) — genuinely advanced for a solo setup.
- **Fact:** Guardrails are weak: agents run with `--permission-mode dangerous`,
  write to live working trees (the cause of the dirty-repo churn), and **landed
  duplicate work** (during this session a rebase auto-dropped 2 commits that
  were already pushed upstream).
- **Reco:** Standardize the agent task contract (§8): one task = one branch =
  clean tree → tests → commit → push; ignore runtime/queue state in git (already
  fixed in this repo this session).

### I. Personal engineering workflow
- **Fact:** Conventional commits, auto `backup-main-*` branches (nice safety net),
  `AGENTS.md` discipline.
- **Fact:** Too many long-lived dirty trees and behind-`origin` branches; no
  visible issue tracker or release cadence.
- **Reco:** A weekly "sync sweep" (rebase+push all clean repos, list dirty) — you
  effectively did this manually this session; automate it.
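
The read-only half of a sync sweep (listing dirty, behind, and orphan repos) can be sketched as below. It parses the output of `git status --porcelain=v2 --branch`; the `RepoState` type, the `classify` labels, and the sample output are hypothetical, and the rebase+push step is deliberately left out.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RepoState:
    branch: str = ""
    upstream: str = ""
    ahead: int = 0
    behind: int = 0
    changed: int = 0  # changed or untracked entries


def parse_status(output: str) -> RepoState:
    """Parse `git status --porcelain=v2 --branch` output."""
    state = RepoState()
    for line in output.splitlines():
        if line.startswith("# branch.head "):
            state.branch = line.split(" ", 2)[2]
        elif line.startswith("# branch.upstream "):
            state.upstream = line.split(" ", 2)[2]
        elif line.startswith("# branch.ab "):
            ahead, behind = line.split(" ")[2:4]  # e.g. "+1", "-2"
            state.ahead, state.behind = int(ahead), -int(behind)
        elif line and not line.startswith("#"):
            state.changed += 1
    return state


def classify(state: RepoState) -> str:
    """Label a repo for the sweep report; dirty takes priority."""
    if state.changed:
        return "dirty"
    if not state.upstream:
        return "no-upstream"
    if state.behind:
        return "behind"
    if state.ahead:
        return "ahead"
    return "clean"


def repo_state(repo: Path) -> RepoState:
    """Read-only query of one repository (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain=v2", "--branch"],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout)


# Hypothetical status output for illustration only.
sample = """\
# branch.oid 1234abcd
# branch.head main
# branch.upstream origin/main
# branch.ab +0 -3
1 .M N... 100644 100644 100644 aaaa bbbb src/app.ts
? notes.tmp
"""

state = parse_status(sample)
print(classify(state), state.behind, state.changed)
```

Looping `repo_state` over the workspace directories and grouping by `classify` yields the "list dirty" report; only repos labelled `clean`, `behind`, or `ahead` would be candidates for an automated rebase+push.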

---

## 5. Prioritized Action Plan

**P0 — now (correctness / risk)**
1. **Re-establish a working CI gate on `learning_ai_common_plat`** (everything
   depends on it). Either fix GH Actions billing or make gitea CI the enforced
   gate. *(M, common_plat)*
2. **Resolve the ~14 dirty repos**: review + commit or discard intentionally;
   add upstreams for `pytorch_todo_predictor` & `sidecar_setup`. *(M, workspace)*
3. **Decide the agent-queue daemon policy** so it doesn't write to live trees
   uncontrolled (it was running in `dangerous` mode). *(S, devops_tools)*

**P1 — this week**
4. Add **healthchecks** to `docker-compose.ecosystem.yml` (0 today) + ordered
   `depends_on`. *(M, common_plat/ecosystem)*
5. Stop masking E2E with `continue-on-error: true` once stabilized; make at least
   smoke E2E gating. *(M, fastgap/flowmonk/jarvis_jr)*
6. Replace `print()` with logging in `2nd_brain` (60+) and `mac_tooling` (200+).
   *(S–M)*

**P2 — this month**
7. Add minimum test suites to the 0-test repos that matter (`productivity_web`,
   `webui_copilot`, `agent_monitoring_fx`). *(M)*
8. Audit `magic_clipboard_mgr` for dead/stubbed services (50+ files). *(M)*
9. Split 3k-line files in `mac_tooling`. *(M)*
10. Remove `NODE_TLS_REJECT_UNAUTHORIZED=0` from Docker; add rate-limiting to the
    Python prototypes. *(S–M)*

**P3 — nice to have**
11. Portfolio-wide coverage reporting + dependency audit (`npm audit`/`pip-audit`)
    in CI. *(M)*
12. A lightweight issue/release cadence for the 2–3 flagships. *(S)*

---

## 6. Safe Auto-Fix Candidates
*(Low-risk; listed only — not applied. Each needs your approval.)*
- **Ecosystem compose healthchecks** — add `healthcheck:` to each backend/web
  service in `docker-compose.ecosystem.yml`. Safe: additive.
- **Add upstreams** for `learning_pytorch_todo_predictor` and
  `learning_sidecar_setup` (`git remote add origin … && git push -u`). Safe once
  remote exists.
- **Lint rule to ban `print()`** in `learning_ai_2nd_brain` (ruff `T20`) — flags
  only; you fix incrementally.
- **Drop `NODE_TLS_REJECT_UNAUTHORIZED=0`** from Docker envs where a real CA/host
  override is available. (Verify per service first.)
- **`.gitignore` audit** for the few repos still tracking runtime artifacts
  (pattern already fixed in `devops_tools` this session).

## 7. Delegate-to-Agent Queue
Ready-to-paste briefs (each self-contained, one branch, clean-tree rule):
1. **"Add healthchecks to ecosystem compose"** — repo `common_plat`; read
   `docker-compose.ecosystem.yml`; add `healthcheck` + ordered `depends_on` to
   all `*-backend`/`*-web` services; `docker compose config` must pass; no app
   code changes.
2. **"De-`print()` 2nd_brain"** — repo `learning_ai_2nd_brain`; replace `print()`
   with `typer.echo`/logging in `src/brain/**`; keep behavior identical; run
   `pytest`.
3. **"Bootstrap tests for webui_copilot"** — repo `learning_ai_webui_copilot`;
   add `pytest` smoke tests for `site_backend` rules/policy engines + a copilot
   happy-path; wire a `.github`/gitea CI job.
4. **"Service audit: magic_clipboard_mgr"** — repo `learning_ai_magic_clipboard_mgr`;
   produce a report of which of the 50+ services are wired vs stubbed; no code
   changes.
5. **"Stabilize E2E"** — repos `fastgap`/`flowmonk`; make smoke E2E reliable, then
   remove `continue-on-error: true` for that job only.

## 8. Recommended Standard Operating Procedure (for every agent task)
1. **One task = one branch** off latest `origin/main`; never work on a dirty tree.
2. **Scope it** with a job brief (you already do this in `agent-queue/docs/jobs/`).
3. **Test before commit**: typecheck + lint + unit must pass locally.
4. **Commit small**, conventional messages; **push the branch**, open a PR — don't
   let agents push straight to `main` of the shared platform.
5. **Never track runtime/queue state** (ignore `agent-queue/queue/*` lifecycle —
   fixed here this session).
6. **Prefer least-privilege** over `--permission-mode dangerous`; reserve dangerous
   mode for sandboxed/disposable checkouts.
7. **Weekly sync sweep**: rebase+push all clean repos, list dirty ones for review.

## 9. What I Could Not Inspect
- **No dynamic results.** I did not run `npm/pnpm install`, builds, `pytest`,
  `vitest`, Playwright, `cargo test`, or `docker compose up` (those mutate trees /
  need services). Test counts and CI configs are evidence of *intended* coverage,
  not measured pass rates or coverage.
- **No live `git` per-repo ahead/behind** inside the read-only agents (they lacked
  shell git); branch/dirty facts come from the orchestrator's own checks and may
  have shifted as the agent-queue daemon ran.
- **One agent batch misfired**: it reported 5 repos as "missing"
  (`claude_code_setup`, `github_copilot`, `magic_terminal`, `notif_scanr`,
  `python_sandbox`) due to a read-access issue; I re-scanned them directly —
  they exist (notably `magic_terminal` = Rust, `notif_scanr` = Swift).
- **Mobile/native depth** (iOS/Android/KMP/Tauri runtime behavior) was not
  exercised, and **secret *values*** were not decrypted; only presence/format
  was checked.
- **`.env.ecosystem`** holds dev-only values; production secret management
  (Key Vault wiring) was inferred from references, not verified live.

---

### TL;DR
- Coherent **beta-grade product ecosystem** (~38 repos) — far beyond "learning".
- **Architecture & security are strong; CI & testing are the weak links.**
- **P0:** restore a CI gate on `common_plat`, clean the ~14 dirty repos, and rein
  in the `dangerous`-mode agent-queue.
- A handful of flagships (`notes`, `trails`, `claw-*`, `clock`, `fastgap`) are
  genuinely production-grade; the long tail is MVP/prototype.
- Tighten the agent commit/CI loop (§8) and most of the operational churn
  converts back into velocity.