docs: add workspace engineering review & scorecard

Read-only, evidence-based review of the ~38-repo workspace produced from docs/prompts/engineering-review-scorecard.md: per-repo breakdown, 1-10 category scorecard (weighted overall ~7.0, beta-grade), prioritized P0-P3 action plan, safe auto-fix candidates, delegate-to-agent queue, and an agent SOP. No code changes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 20:53:44 -07:00 · 2026-05-30 20:53:44 -07:00 · 32162312a9
commit 32162312a9
parent 1bcea394f5
1 changed file with 335 additions and 0 deletions
--- a/ENGINEERING_REVIEW_SCORECARD.md
+++ b/ENGINEERING_REVIEW_SCORECARD.md
@@ -0,0 +1,335 @@
+# Engineering Review & Scorecard
+
+> Evidence-based, read-only review of the entire `~/code/mygh` workspace (~38 git
+> repos) per `docs/prompts/engineering-review-scorecard.md`. Generated 2026-05-30.
+>
+> **Method:** static inspection only — file reads, `grep`, and read-only `git`.
+> No builds, installs, or test runs were executed (that would mutate the trees),
+> so dynamic results (pass/fail, coverage %) are inferred from config + test
+> counts, not measured. See §9 for limits. Per-repo evidence was gathered by
+> parallel read-only agents and spot-verified.
+
+---
+
+## 1. Executive Summary
+
+**What this is:** a single developer running a surprisingly coherent *product
+ecosystem*: ~12 product apps (clock, notes, fastgap, peakpulse, flowmonk,
+efforise, jarvis_jr, trails, talk2obsidian, local-memory-gpt, voice-ai-agent,
+multimodal/mindlyst) sharing one platform monorepo (`learning_ai_common_plat`,
+36 `@bytelyst/*` packages, auth/Cosmos/design-tokens), orchestrated by a single
+`docker-compose.ecosystem.yml` (~20 services) and driven heavily by AI agents
+through a homegrown `agent-queue`. This is far more disciplined than a typical
+"learning" folder.
+
+**Overall maturity:** **Beta-quality ecosystem.** A core of production-grade or
+near-production repos (`learning_ai_notes`, `learning_ai_trails`,
+`oss/claw-code`/`claw-cowork`, `learning_ai_clock`, `learning_ai_fastgap`) is
+surrounded by a long tail of MVP/prototype repos with thin or zero tests and no
+CI.
+
+**Biggest strengths (top 3)**
+1. **Strong platform discipline.** Shared `@bytelyst/*` packages, a repeated
+   `types.ts → repository.ts → routes.ts` backend pattern, Cosmos partition-key
+   conventions (`/userId`, `productId` on every doc), per-repo `AGENTS.md`,
+   conventional commits, and field-level encryption (`field-encrypt.ts`) recur
+   across the best repos.
+2. **Clean security posture for a personal workspace.** Secret scans across all
+   repos surfaced **no real committed production secrets** — only `.env.example`
+   placeholders, the public Azure Cosmos emulator key, dev `JWT_SECRET=dev-...`
+   values, and Azure Key Vault *references*. `.gitignore` is present nearly
+   everywhere.
+3. **Top repos are legitimately good.** `notes`, `trails`, and the two Rust
+   `claw-*` repos show modular architecture, real test suites (28–80+ files),
+   CI, multi-stage Docker, and strict typing (`0` `as any` in several backends).
+
+**Biggest risks (top 3)**
+1. **CI is the weak link.** GitHub Actions is **disabled (billing)** on the
+   platform monorepo `learning_ai_common_plat` and on `voice_ai_agent`
+   (`*.disabled` workflows); ~15 repos have **no CI at all**. The shared
+   platform that everything depends on has no automated gate.
+2. **Process churn dirties the repos.** A live `agent-queue` daemon + `devin`
+   agents in `--permission-mode dangerous` were actively writing to repos; ~14
+   repos were found dirty with uncommitted work, several behind `origin`. Work
+   is at risk of being lost or silently diverging.
+3. **Testing is bimodal.** Excellent in the flagship repos, **zero** in many
+   others (`productivity_web`, `webui_copilot`, `pytorch_todo_predictor`,
+   `server-survival`, `sidecar_setup`, `mac_tooling`). No portfolio-wide
+   coverage signal.
+
+**Is the dev style helping or hurting velocity?** **Net helping, but fraying at
+the edges.** The platform/agent approach clearly lets one person ship a dozen
+apps — that's the upside. The drag is operational: disabled CI, constantly-dirty
+working trees, abandoned worktrees, and "AI-generated scaffolding smell" in a
+few repos (e.g. `magic_clipboard_mgr`'s 50+ service files + phase-named test
+buckets). Tightening the commit/CI loop would convert a lot of that churn back
+into velocity.
+
+---
+
+## 2. Overall Score Sheet
+
+Scores are 1–10 (1 = critical/broken, 10 = production-grade), aggregated across
+the ~30 code repos (pure docs/usage repos excluded from category math).
+
+| Category | Score | Justification (evidence) |
+|---|---|---|
+| A. Repository organization | **8** | Consistent `@bytelyst/*` + `types/repository/routes` pattern, per-repo `AGENTS.md`, clear monorepos; minus for ~14 dirty trees, stray worktrees, a few unstructured repos. |
+| B. Code quality | **7** | Flagships: strict TS, `0` `as any`, no `console.log`, Zod validation. Tail: `print()`-heavy (`2nd_brain` 60+, `mac_tooling` 200+), `any` leaks, AI-scaffold smell (`magic_clipboard_mgr`). |
+| C. Architecture | **8** | Genuinely strong: shared platform, datastore abstraction, deterministic engines (`flowmonk` scheduler), risk-scoring (`trails`), MCP integrations, clean native/web boundaries. |
+| D. DevOps & deployment | **6** | Ecosystem compose orchestrates ~20 services, multi-stage Dockerfiles common — but **CI disabled on the platform repo**, ~15 repos with no CI, and **0 healthchecks** in `docker-compose.ecosystem.yml`. |
+| E. Testing | **6** | Bimodal: `notes`/`fastgap`/`clock`/`trails`/`claw-*` range from 28 test files (`trails`) to 700+ tests (`fastgap`); many repos have 0. E2E frequently `continue-on-error: true`. No measured coverage. |
+| F. Security | **8** | No real committed secrets anywhere; field encryption + Key Vault refs in the mature repos; `.gitignore`/`.env.example` discipline. Minus for `NODE_TLS_REJECT_UNAUTHORIZED=0` in some Docker, thin input-validation in prototypes. |
+| G. Product readiness | **7** | Several apps runnable end-to-end (web+backend); mobile/native surfaces often partial; disabled CI and flaky E2E keep them short of truly launchable. |
+| H. AI-agent practices | **6** | Impressive tooling (`agent-queue`, profiles, job briefs, `AGENTS.md`), but guardrails are weak: `--permission-mode dangerous`, agents dirtying live repos, duplicate work landing upstream, no enforced test-before-commit. |
+| I. Personal workflow | **6** | Good: conventional commits, auto `backup-main-*` branches, `AGENTS.md`. Bad: ~14 dirty repos, branches behind `origin`, abandoned worktrees, no unified release/issue discipline. |
+| **Weighted overall** | **≈ 7.0** | Beta-quality. See weighting below. |
+
+**Weighting & rationale:** Security (F) and Product readiness (G) weighted ~1.5×,
+Testing (E) and DevOps (D) ~1.25× (these gate real-world reliability);
+A/B/C/H/I at 1.0×. Strong architecture/security scores pull the number up; weak
+CI/testing scores pull it back to a solid-but-not-shippable **~7.0**.
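
The weighting can be checked by hand. The sketch below is only an illustration: it assumes the approximate multipliers quoted above are exact (1.5, 1.25, 1.0), takes the category scores from the table, and uses hypothetical names (`scores`, `weights`, `weighted_overall`) that do not come from the review itself.

```python
# Category scores copied from the score sheet (A through I).
scores = {"A": 8, "B": 7, "C": 8, "D": 6, "E": 6, "F": 8, "G": 7, "H": 6, "I": 6}

# Assumption: the "~1.5x" and "~1.25x" multipliers are treated as exact values.
weights = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.25, "E": 1.25,
           "F": 1.5, "G": 1.5, "H": 1.0, "I": 1.0}


def weighted_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted mean: sum(score * weight) / sum(weight)."""
    total = sum(scores[k] * weights[k] for k in scores)
    return total / sum(weights[k] for k in scores)


overall = weighted_overall(scores, weights)
print(f"{overall:.2f}")  # prints 6.90
```

Under these assumptions the weighted sum is 72.5 over a total weight of 10.5, about 6.9, which rounds to the reported ~7.0. The unweighted mean of the nine scores (62 / 9) is also about 6.9, so the weighting barely moves the result.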
+
+---
+
+## 3. Per-Product / Per-Repo Breakdown
+
+Maturity legend: **PROD** = production-grade, **BETA**, **MVP**, **PROTO** =
+prototype/learning, **REF** = docs/reference (not code).
+
+### Flagship products (platform-integrated)
+| Repo | Stack | Tests | CI | Docker | Maturity |
+|---|---|---|---|---|---|
+| `learning_ai_notes` | Fastify5 + Next16 + Expo, Cosmos | 80+ files | ✓ gitea | ✓ | **BETA→PROD** |
+| `learning_ai_trails` | Fastify5 + Next16 + SDK, Cosmos | 28 files | ✓ gitea | ✓ | **PROD** |
+| `learning_ai_clock` | Next16 PWA + iOS/Android, Fastify | 662 total | ✓ gitea | ✓ | **BETA** |
+| `learning_ai_fastgap` | Expo + Next16 + Fastify | 700+ total | ✓ gitea (7 jobs) | ✓ | **BETA** |
+| `learning_ai_peakpulse` | SwiftUI + Fastify | 26 files | ✓ (backend) | ✓ | **BETA→PROD** |
+| `learning_ai_flowmonk` | Next16 + Fastify + Expo | 102 backend | ✓ gitea | ✓ | **BETA** |
+| `learning_ai_efforise` | React/Vite + Fastify + RN | ~9 backend | ✓ gitea | ✓ | **MVP** |
+| `learning_ai_dev_intelli` | Fastify + Next16, GitHub API | 52 backend | ✓ gitea | ✓ | **MVP** |
+| `learning_ai_local_memory_gpt` | Fastify + Next16, SQLite/Ollama | 122 | ✓ gitea | ✓ | **MVP** |
+| `learning_ai_talk2obsidian` | Fastify + Vite, SQLite/Ollama | 8 | ✗ | ✓ | **BETA** |
+| `learning_voice_ai_agent` | Python + Fastify + Next + KMP | 463+ | ⚠ disabled | ✓ | **BETA** |
+| `learning_multimodal_memory_agents` (MindLyst) | KMP + Next + Fastify | 33 | ⚠ disabled | ✓ | **MVP** |
+| `learning_ai_jarvis_jr` | SwiftUI + Next + Android | ~13 web | ✓ gitea | ✓ | **ALPHA/BETA** |
+| `learning_ai_auth_app` | iOS/watchOS/Android (spec+UI) | 0 (here) | ✗ | ✗ | **MVP (spec)** |
+
+### Platform & infra
+| Repo | Stack | Notes | Maturity |
+|---|---|---|---|
+| `learning_ai_common_plat` | pnpm monorepo, 36 `@bytelyst/*`, Fastify, Cosmos | ~466k LOC; full auth (OAuth/MFA/passkeys/SAML); **GH Actions disabled (billing)**, gitea CI active | **PROD** |
+| `learning_ai_devops_tools` | Bash + Python + Node (this repo) | GitHub admin scripts, `agent-queue`, Hermes dashboard; thin tests | **PROD (scripts) / MVP (dash)** |
+| `learning_ai_k8s_streaming` | Python FastAPI + Helm | Use-case registry, HPA/probes, load tools | **BETA→PROD** |
+| `learning_ai_local_llms` | Next16 dashboard + Python TTS | Ollama mission-control; 57 tests | **BETA** |
+
+### Tools / OSS / native
+| Repo | Stack | Notes | Maturity |
+|---|---|---|---|
+| `oss/learning_ai_claw-code-oss` | Rust workspace (10+ crates) | `unsafe_code = "forbid"`, clippy pedantic, 40+ test files | **PROD** |
+| `oss/learning_ai_claw-cowork` | Rust + Tauri + Python | 65+ test files, E2E, Docker | **PROD** |
+| `learning_magic_terminal` | **Rust** | README+CI+many tests; command-blocks v2; dirty(5) | **BETA** |
+| `learning_notif_scanr` | **Swift** (Package.swift) | tests present, **no CI**, no Docker | **MVP** |
+| `ios/learning_swift_hourglass` | Swift/SwiftUI macOS | MVVM, 2 test files, no CI | **MVP** |
+| `learning_ai_magic_clipboard_mgr` | Swift/macOS, GRDB | 24 tests but 50+ services + phase-named tests (AI-scaffold smell) | **MVP** |
+| `learning_ai_mac_tooling` | Python FastAPI + React | forensics toolkit; **0 tests**, 200+ `print()`, 3k-line files | **PROTO** |
+| `copilot/learning_ai_uxui_web` | Next16 + MSW + Playwright | component showcase, Lighthouse CI | **MVP** |
+| `learning_ai_productivity_web` | Next15, client-only | clean registry pattern, **0 tests** | **MVP** |
+| `learning_ai_webui_copilot` | Python FastAPI + LangChain | rules/policy engines, **0 tests, no Docker/CI** | **MVP** |
+| `learning_agent_monitoring_fx` | npm monorepo + KMP | agent/ingest/web work, native WIP, 54 `console.log`, TODOs | **BETA** |
+| `learning_agentic_tools_portal` | Python Flask + uv | minimal (1 endpoint, 1 test), has CI | **PROTO** |
+| `learning_server-survival-devops-web` | Vanilla JS + Three.js | playable game, **0 tests** | **MVP** |
+| `learning_pytorch_todo_predictor` | Python + PyTorch | educational, **0 tests**, **no upstream** | **PROTO** |
+| `learning_sidecar_setup` | Next16 scaffold + py stub | scaffolding only, **no upstream**, dirty(8) | **PROTO** |
+| `learning_claude_code_setup` | Bash + markdown | setup notes/scripts; dirty(1) | **REF** |
+| `learning_github_copilot` | Markdown (CLI/SDK docs) | reference only | **REF** |
+| `learning_python_sandbox` | Python | LeetCode/learning; dirty(1) | **PROTO** |
+| `learning_ai_materials` | Docs | NBA handover package | **REF** |
+| `learning_windsurf_setup` | Usage logs | not a codebase | **N/A** |
+
+---
+
+## 4. Findings by Dimension
+
+### A. Repository organization
+- **Fact:** Strong, repeated conventions — `AGENTS.md`/`CLAUDE.md` per repo, pnpm
+  workspaces, `types→repository→routes` backend modules, `docs/` with PRD/ROADMAP.
+- **Fact:** ~14 repos dirty at audit time; abandoned `worktrees/` (now cleaned);
+  some repos behind `origin`. Two repos (`pytorch_todo_predictor`,
+  `sidecar_setup`) have **no git upstream**.
+- **Reco:** Adopt a "clean tree or it doesn't exist" rule (see §8). Add upstreams
+  for the two orphan repos or mark them clearly local.
+
+### B. Code quality
+- **Fact:** Best repos enforce strict TS (`0` `as any` in `notes`, `trails`,
+  `local_memory_gpt` backends), no `console.log` (Fastify logger), Zod validation.
+- **Fact:** `learning_ai_2nd_brain` has 60+ `print()`; `mac_tooling` 200+ and
+  3k+-line files (`network_transfer_audit.py` 3521 lines); `magic_clipboard_mgr`
+  shows AI-scaffold smell (50+ service files, `Phase5–8`/`RemainingQATests`).
+- **Reco:** Lint-gate `print()`/`console.log` in the Python/TS repos; split the
+  3k-line files; audit `magic_clipboard_mgr` for stubbed vs real services.
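
As a rough stand-in for what a `print()` lint gate reports, the counts above could be reproduced with a small AST walk. This is a sketch, not the method used in the review: `count_print_calls` and the `sample` source are hypothetical, and it only catches direct calls to the name `print`.

```python
import ast


def count_print_calls(source: str) -> int:
    """Count direct calls to the built-in name `print` in Python source."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


# Hypothetical module text: two print() calls and one logger call.
sample = '''
import logging

log = logging.getLogger(__name__)

def sync(items):
    print("starting")              # would be flagged
    for item in items:
        log.info("item %s", item)  # preferred replacement
    print("done", len(items))      # would be flagged
'''

print(count_print_calls(sample))  # prints 2
```

A real gate would more likely use an existing linter rule than a custom script; the walk above is only meant to show what such a rule looks for.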
+
+### C. Architecture
+- **Fact:** Clear separation and reuse: shared auth/datastore/design-tokens,
+  deterministic scheduler (`flowmonk`), risk engine (`trails`), use-case registry
+  (`k8s_streaming`), MCP tool servers, Rust crate boundaries (`claw-*`).
+- **Reco:** This is the strongest dimension — protect it by keeping product
+  domains out of `common_plat` and vice-versa.
+
+### D. DevOps & deployment
+- **Fact:** `docker-compose.ecosystem.yml` wires ~20 services (10 backends + 10
+  webs) + infra (Cosmos emulator, Azurite, Traefik, Loki, Grafana, MCP); 30
+  `restart:` policies, 24 `build:` contexts, but **0 `healthcheck:` blocks**.
+- **Fact:** GH Actions disabled on `common_plat` + `voice_ai_agent`; ~15 repos no CI.
+- **Reco (P1):** Add healthchecks + `depends_on: condition: service_healthy` to
+  the ecosystem compose; re-enable or fully migrate CI to gitea self-hosted.
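
The healthcheck recommendation can be sketched as a small audit over the compose structure. This assumes the file has already been parsed into a plain dict (for example by a YAML loader, not shown); the service names, port, and `/health` path are hypothetical, and the `curl` probe assumes `curl` exists in the image.

```python
def services_missing_healthcheck(compose: dict) -> list[str]:
    """Return names of services that define no `healthcheck` block."""
    services = compose.get("services", {})
    return sorted(name for name, spec in services.items() if "healthcheck" not in spec)


def gate_on_health(compose: dict, service: str, dependency: str) -> None:
    """Rewrite `depends_on` in long form so `service` waits for a healthy `dependency`."""
    spec = compose["services"][service]
    current = spec.get("depends_on", {})
    if isinstance(current, list):  # short form, e.g. ["db", "cache"]
        current = {name: {"condition": "service_started"} for name in current}
    current[dependency] = {"condition": "service_healthy"}
    spec["depends_on"] = current


# Hypothetical two-service excerpt; names, port, and health path are assumptions.
compose = {
    "services": {
        "notes-backend": {
            "build": "./notes/backend",
            "healthcheck": {
                "test": ["CMD", "curl", "-f", "http://localhost:4000/health"],
                "interval": "10s",
                "timeout": "3s",
                "retries": 5,
            },
        },
        "notes-web": {"build": "./notes/web", "depends_on": ["notes-backend"]},
    }
}

print(services_missing_healthcheck(compose))  # prints ['notes-web']
gate_on_health(compose, "notes-web", "notes-backend")
print(compose["services"]["notes-web"]["depends_on"])
# prints {'notes-backend': {'condition': 'service_healthy'}}
```

The long-form `depends_on` with `condition: service_healthy` only has an effect once the dependency itself defines a `healthcheck`, so the two changes go together.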
+
+### E. Testing
+- **Fact:** `fastgap` (~700), `clock` (662), `notes` (80+ files), `voice_ai_agent`
+  (463+), `claw-cowork` (65+ files) are excellent; ~8 repos have 0 tests.
+- **Fact:** E2E often `continue-on-error: true` (`fastgap`, `flowmonk`,
+  `jarvis_jr`, `local_memory_gpt`) — i.e. not actually gating.
+- **Reco:** Set a per-repo minimum (smoke + happy-path) and stop masking E2E
+  failures with `continue-on-error` once stabilized.
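
To see where failures are being masked, workflow files can be scanned for `continue-on-error: true`. The sketch below assumes workflows sit in the conventional `.github/workflows` and `.gitea/workflows` directories; the function name and the temporary demo repo are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Matches `continue-on-error: true`, allowing a list dash, quotes, and a trailing comment.
MASK_PATTERN = re.compile(r"^\s*(-\s+)?continue-on-error:\s*['\"]?true['\"]?\s*(#.*)?$")

# Assumption: workflows live in the conventional GitHub / Gitea Actions directories.
WORKFLOW_DIRS = (".github/workflows", ".gitea/workflows")


def find_masked_failures(repo_root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) for each `continue-on-error: true` line."""
    hits = []
    for workflow_dir in WORKFLOW_DIRS:
        base = repo_root / workflow_dir
        if not base.is_dir():
            continue
        for path in sorted(base.glob("*.y*ml")):  # .yml and .yaml; skips *.disabled
            lines = path.read_text(encoding="utf-8").splitlines()
            for number, line in enumerate(lines, start=1):
                if MASK_PATTERN.match(line):
                    hits.append((path.relative_to(repo_root).as_posix(), number))
    return hits


# Demo against a throwaway directory so no real repo is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    workflows = root / ".gitea" / "workflows"
    workflows.mkdir(parents=True)
    (workflows / "ci.yml").write_text(
        "jobs:\n  e2e:\n    continue-on-error: true\n    runs-on: ubuntu-latest\n",
        encoding="utf-8",
    )
    demo_hits = find_masked_failures(root)

print(demo_hits)  # prints [('.gitea/workflows/ci.yml', 3)]
```

The scan is read-only and line-based, so it reports every occurrence; deciding which jobs are stable enough to become gating is still a manual call.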
+
+### F. Security
+- **Fact:** No real committed secrets across all repos. Matches were
+  `.env.example` placeholders, the public Cosmos emulator key
+  (`C2y6yDjf5/R...`), `dev-*` JWT secrets, and Azure Key Vault references.
+- **Fact:** Field encryption (AES-256-GCM) in `clock`/`notes`/`dev_intelli`;
+  `unsafe_code = "forbid"` in the Rust repos.
+- **Watch:** `NODE_TLS_REJECT_UNAUTHORIZED=0` seen in some Docker setups; thin
+  input validation / no rate-limiting in the prototype Python apps.
+
+### G. Product readiness
+- **Fact:** Web+backend pairs generally run end-to-end; native/mobile surfaces
+  (iOS/Android/KMP) are frequently partial or scaffolded.
+- **Reco:** Pick 2–3 flagships (`notes`, `trails`, `clock`) and drive them to a
+  true launch checklist; treat the rest explicitly as experiments.
+
+### H. AI-agent practices
+- **Fact:** Sophisticated `agent-queue` (profiles, job briefs, lifecycle dirs,
+  Node dashboard) — genuinely advanced for a solo setup.
+- **Fact:** Guardrails weak: agents run `--permission-mode dangerous`, write to
+  live working trees (caused the dirty-repo churn), and **landed duplicate work**
+  (during this session a rebase auto-dropped 2 commits already pushed upstream).
+- **Reco:** Standardize the agent task contract (§8): one task = one branch =
+  clean tree → tests → commit → push; ignore runtime/queue state in git (already
+  fixed in this repo this session).
+
+### I. Personal engineering workflow
+- **Fact:** Conventional commits, auto `backup-main-*` branches (nice safety net),
+  `AGENTS.md` discipline.
+- **Fact:** Too many long-lived dirty trees and behind-`origin` branches; no
+  visible issue tracker or release cadence.
+- **Reco:** A weekly "sync sweep" (rebase+push all clean repos, list dirty) — you
+  effectively did this manually this session; automate it.
+
+---
+
+## 5. Prioritized Action Plan
+
+**P0 — now (correctness / risk)**
+1. **Re-establish a working CI gate on `learning_ai_common_plat`** (everything
+   depends on it). Either fix GH Actions billing or make gitea CI the enforced
+   gate. *(M, common_plat)*
+2. **Resolve the ~14 dirty repos**: review + commit or discard intentionally;
+   add upstreams for `pytorch_todo_predictor` & `sidecar_setup`. *(M, workspace)*
+3. **Decide the agent-queue daemon policy** so it doesn't write to live trees
+   uncontrolled (it was running in `dangerous` mode). *(S, devops_tools)*
+
+**P1 — this week**
+4. Add **healthchecks** to `docker-compose.ecosystem.yml` (0 today) + ordered
+   `depends_on`. *(M, common_plat/ecosystem)*
+5. Stop masking E2E with `continue-on-error: true` once stabilized; make at least
+   smoke E2E gating. *(M, fastgap/flowmonk/jarvis_jr)*
+6. Replace `print()` with logging in `2nd_brain` (60+) and `mac_tooling` (200+).
+   *(S–M)*
+
+**P2 — this month**
+7. Add minimum test suites to the 0-test repos that matter (`productivity_web`,
+   `webui_copilot`, `agent_monitoring_fx`). *(M)*
+8. Audit `magic_clipboard_mgr` for dead/stubbed services (50+ files). *(M)*
+9. Split 3k-line files in `mac_tooling`. *(M)*
+10. Remove `NODE_TLS_REJECT_UNAUTHORIZED=0` from Docker; add rate-limiting to the
+    Python prototypes. *(S–M)*
+
+**P3 — nice to have**
+11. Portfolio-wide coverage reporting + dependency audit (`npm audit`/`pip-audit`)
+    in CI. *(M)*
+12. A lightweight issue/release cadence for the 2–3 flagships. *(S)*
+
+---
+
+## 6. Safe Auto-Fix Candidates
+*(Low-risk; listed only — not applied. Each needs your approval.)*
+- **Ecosystem compose healthchecks** — add `healthcheck:` to each backend/web
+  service in `docker-compose.ecosystem.yml`. Safe: additive.
+- **Add upstreams** for `learning_pytorch_todo_predictor` and
+  `learning_sidecar_setup` (`git remote add origin … && git push -u`). Safe once
+  remote exists.
+- **Lint rule to ban `print()`** in `learning_ai_2nd_brain` (ruff `T20`) — flags
+  only; you fix incrementally.
+- **Drop `NODE_TLS_REJECT_UNAUTHORIZED=0`** from Docker envs where a real CA/host
+  override is available. (Verify per service first.)
+- **`.gitignore` audit** for the few repos still tracking runtime artifacts
+  (pattern already fixed in `devops_tools` this session).
+
+## 7. Delegate-to-Agent Queue
+Ready-to-paste briefs (each self-contained, one branch, clean-tree rule):
+1. **"Add healthchecks to ecosystem compose"** — repo `common_plat`; read
+   `docker-compose.ecosystem.yml`; add `healthcheck` + ordered `depends_on` to
+   all `*-backend`/`*-web` services; `docker compose config` must pass; no app
+   code changes.
+2. **"De-`print()` 2nd_brain"** — repo `learning_ai_2nd_brain`; replace `print()`
+   with `typer.echo`/logging in `src/brain/**`; keep behavior identical; run
+   `pytest`.
+3. **"Bootstrap tests for webui_copilot"** — repo `learning_ai_webui_copilot`;
+   add `pytest` smoke tests for `site_backend` rules/policy engines + a copilot
+   happy-path; wire a `.github`/gitea CI job.
+4. **"Service audit: magic_clipboard_mgr"** — repo `learning_ai_magic_clipboard_mgr`;
+   produce a report of which of the 50+ services are wired vs stubbed; no code
+   changes.
+5. **"Stabilize E2E"** — repos `fastgap`/`flowmonk`; make smoke E2E reliable, then
+   remove `continue-on-error: true` for that job only.
+
+## 8. Recommended Standard Operating Procedure (for every agent task)
+1. **One task = one branch** off latest `origin/main`; never work on a dirty tree.
+2. **Scope it** with a job brief (you already do this in `agent-queue/docs/jobs/`).
+3. **Test before commit**: typecheck + lint + unit must pass locally.
+4. **Commit small**, conventional messages; **push the branch**, open a PR — don't
+   let agents push straight to `main` of the shared platform.
+5. **Never track runtime/queue state** (ignore `agent-queue/queue/*` lifecycle —
+   fixed here this session).
+6. **Prefer least-privilege** over `--permission-mode dangerous`; reserve dangerous
+   mode for sandboxed/disposable checkouts.
+7. **Weekly sync sweep**: rebase+push all clean repos, list dirty ones for review.
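
The reporting half of the weekly sync sweep (step 7) could look roughly like the sketch below. It is an illustration under assumptions: the function names are hypothetical, it only inspects repos directly under the workspace root (nested ones such as `oss/…` would need a deeper search), and it deliberately stops at reporting, leaving rebase and push to the operator.

```python
import subprocess
from pathlib import Path


def is_dirty(porcelain_output: str) -> bool:
    """A tree is dirty if `git status --porcelain` prints any entry."""
    return any(line.strip() for line in porcelain_output.splitlines())


def parse_behind_ahead(rev_list_output: str) -> tuple[int, int]:
    """Parse `git rev-list --left-right --count @{upstream}...HEAD` into (behind, ahead)."""
    behind, ahead = rev_list_output.split()
    return int(behind), int(ahead)


def git(repo: Path, *args: str) -> str:
    """Run a read-only git command in `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def sweep(workspace: Path) -> dict[str, list[str]]:
    """Sort each git repo directly under `workspace` into dirty / behind / no_upstream / clean."""
    report: dict[str, list[str]] = {"dirty": [], "behind": [], "no_upstream": [], "clean": []}
    for repo in sorted(p.parent for p in workspace.glob("*/.git")):
        if is_dirty(git(repo, "status", "--porcelain")):
            report["dirty"].append(repo.name)
            continue
        try:
            counts = git(repo, "rev-list", "--left-right", "--count", "@{upstream}...HEAD")
        except subprocess.CalledProcessError:
            report["no_upstream"].append(repo.name)  # no tracking branch configured
            continue
        behind, _ahead = parse_behind_ahead(counts)
        report["behind" if behind else "clean"].append(repo.name)
    return report

# Example (not executed here): sweep(Path.home() / "code" / "mygh")
```

Keeping the parsing in small pure functions makes the "never work on a dirty tree" check from step 1 reusable by an agent wrapper as well.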
+
+## 9. What I Could Not Inspect
+- **No dynamic results.** I did not run `npm/pnpm install`, builds, `pytest`,
+  `vitest`, Playwright, `cargo test`, or `docker compose up` (those mutate trees /
+  need services). Test counts and CI configs are evidence of *intended* coverage,
+  not measured pass/coverage.
+- **No live `git` per-repo ahead/behind** inside the read-only agents (they lacked
+  shell git); branch/dirty facts come from the orchestrator's own checks and may
+  have shifted as the agent-queue daemon ran.
+- **One agent batch misfired**: it reported 5 repos as "missing"
+  (`claude_code_setup`, `github_copilot`, `magic_terminal`, `notif_scanr`,
+  `python_sandbox`) due to a read-access issue; I re-scanned them directly —
+  they exist (notably `magic_terminal` = Rust, `notif_scanr` = Swift).
+- **Mobile/native depth** (iOS/Android/KMP/Tauri runtime behavior) was not
+  exercised, and **secret *values*** were not decrypted; only presence/format
+  was checked.
+- **`.env.ecosystem`** holds dev-only values; production secret management
+  (Key Vault wiring) was inferred from references, not verified live.
+
+---
+
+### TL;DR
+- Coherent **beta-grade product ecosystem** (~38 repos) — far beyond "learning".
+- **Architecture & security are strong; CI & testing are the weak links.**
+- **P0:** restore a CI gate on `common_plat`, clean the ~14 dirty repos, and rein
+  in the `dangerous`-mode agent-queue.
+- A handful of flagships (`notes`, `trails`, `claw-*`, `clock`, `fastgap`) are
+  production-grade or close to it; the long tail is MVP/prototype.
+- Tighten the agent commit/CI loop (§8) and most of the operational churn
+  converts back into velocity.