diff --git a/ENGINEERING_REVIEW_SCORECARD.md b/ENGINEERING_REVIEW_SCORECARD.md
new file mode 100644
index 0000000..1a685c3
--- /dev/null
+++ b/ENGINEERING_REVIEW_SCORECARD.md
@@ -0,0 +1,335 @@
+# Engineering Review & Scorecard
+
+> Evidence-based, read-only review of the entire `~/code/mygh` workspace (~38 git
+> repos) per `docs/prompts/engineering-review-scorecard.md`. Generated 2026-05-30.
+>
+> **Method:** static inspection only — file reads, `grep`, and read-only `git`.
+> No builds, installs, or test runs were executed (doing so would mutate the trees),
+> so dynamic results (pass/fail, coverage %) are inferred from config + test
+> counts, not measured. See §9 for limits. Per-repo evidence was gathered by
+> parallel read-only agents and spot-verified.
+
+---
+
+## 1. Executive Summary
+
+**What this is:** a single developer running a surprisingly coherent *product
+ecosystem* — a dozen product apps (clock, notes, fastgap, peakpulse, flowmonk,
+efforise, jarvis_jr, trails, talk2obsidian, local-memory-gpt, voice-ai-agent,
+multimodal/mindlyst) sharing one platform monorepo (`learning_ai_common_plat`,
+36 `@bytelyst/*` packages, auth/Cosmos/design-tokens), orchestrated by a single
+`docker-compose.ecosystem.yml` (~20 services) and driven heavily by AI agents
+through a homegrown `agent-queue`. This is far more disciplined than a typical
+"learning" folder.
+
+**Overall maturity:** **Beta-quality ecosystem.** A core of production-grade or
+near-production repos (`learning_ai_notes`, `learning_ai_trails`,
+`oss/claw-code`/`claw-cowork`, `learning_ai_clock`, `learning_ai_fastgap`)
+surrounded by a long tail of MVP/prototype repos with thin or zero tests and no
+CI.
+
+**Biggest strengths (top 3)**
+1. **Strong platform discipline.** Shared `@bytelyst/*` packages, a repeated
+   `types.ts → repository.ts → routes.ts` backend pattern, Cosmos partition-key
+   conventions (`/userId`, `productId` on every doc), per-repo `AGENTS.md`,
+   conventional commits, and field-level encryption (`field-encrypt.ts`) recur
+   across the best repos.
+2. **Clean security posture for a personal workspace.** Secret scans across all
+   repos surfaced **no real committed production secrets** — only `.env.example`
+   placeholders, the public Azure Cosmos emulator key, dev `JWT_SECRET=dev-...`
+   values, and Azure Key Vault *references*. `.gitignore` is present nearly
+   everywhere.
+3. **Top repos are legitimately good.** `notes`, `trails`, and the two Rust
+   `claw-*` repos show modular architecture, real test suites (28–80+ files),
+   CI, multi-stage Docker, and strict typing (`0` `as any` in several TS backends).
+
+**Biggest risks (top 3)**
+1. **CI is the weak link.** GitHub Actions is **disabled (billing)** on the
+   platform monorepo `learning_ai_common_plat` and on `voice_ai_agent`
+   (`*.disabled` workflows); ~15 repos have **no CI at all**. The shared
+   platform that everything depends on has no enforced CI gate; only the self-hosted gitea CI still runs.
+2. **Process churn dirties the repos.** A live `agent-queue` daemon + `devin`
+   agents in `--permission-mode dangerous` were actively writing to repos; ~14
+   repos were found dirty with uncommitted work, several behind `origin`. Work
+   is at risk of being lost or silently diverging.
+3. **Testing is bimodal.** Excellent in the flagship repos, **zero** in many
+   others (`productivity_web`, `webui_copilot`, `pytorch_todo_predictor`,
+   `server-survival`, `sidecar_setup`, `mac_tooling`). No portfolio-wide
+   coverage signal.
+
+**Is the dev style helping or hurting velocity?** **Net helping, but fraying at
+the edges.** The platform/agent approach clearly lets one person ship a dozen
+apps — that's the upside. The drag is operational: disabled CI, constantly dirty
+working trees, abandoned worktrees, and "AI-generated scaffolding smell" in a
+few repos (e.g. `magic_clipboard_mgr`'s 50+ service files + phase-named test
+buckets). Tightening the commit/CI loop would convert a lot of that churn back
+into velocity.
+
+---
+
+## 2. Overall Score Sheet
+
+Scores are 1–10 (1 = critical/broken, 10 = production-grade), aggregated across
+the ~30 code repos (pure docs/usage repos excluded from category math).
+
+| Category | Score | Justification (evidence) |
+|---|---|---|
+| A. Repository organization | **8** | Consistent `@bytelyst/*` + `types/repository/routes` pattern, per-repo `AGENTS.md`, clear monorepos; minus for ~14 dirty trees, stray worktrees, a few unstructured repos. |
+| B. Code quality | **7** | Flagships: strict TS, `0` `as any`, no `console.log`, Zod validation. Tail: `print()`-heavy (`2nd_brain` 60+, `mac_tooling` 200+), `any` leaks, AI-scaffold smell (`magic_clipboard_mgr`). |
+| C. Architecture | **8** | Genuinely strong: shared platform, datastore abstraction, deterministic engines (`flowmonk` scheduler), risk-scoring (`trails`), MCP integrations, clean native/web boundaries. |
+| D. DevOps & deployment | **6** | Ecosystem compose orchestrates ~20 services, multi-stage Dockerfiles common — but **GitHub Actions disabled on the platform repo**, ~15 repos with no CI, and **0 healthchecks** in `docker-compose.ecosystem.yml`. |
+| E. Testing | **6** | Bimodal: `notes`/`trails`/`claw-*` have 28–80+ test files and `clock`/`fastgap` 660–700+ tests; many repos have 0. E2E frequently `continue-on-error: true`. No measured coverage. |
+| F. Security | **8** | No real committed secrets anywhere; field encryption + Key Vault refs in the mature repos; `.gitignore`/`.env.example` discipline. Minus for `NODE_TLS_REJECT_UNAUTHORIZED=0` in some Docker setups, thin input validation in prototypes. |
+| G. Product readiness | **7** | Several apps run end-to-end (web+backend); mobile/native surfaces are often partial; disabled CI and flaky E2E keep them short of truly launchable. |
+| H. AI-agent practices | **6** | Impressive tooling (`agent-queue`, profiles, job briefs, `AGENTS.md`), but guardrails are weak: `--permission-mode dangerous`, agents dirtying live repos, duplicate work landing upstream, no enforced test-before-commit. |
+| I. Personal workflow | **6** | Good: conventional commits, auto `backup-main-*` branches, `AGENTS.md`. Bad: ~14 dirty repos, branches behind `origin`, abandoned worktrees, no unified release/issue discipline. |
+| **Weighted overall** | **≈ 6.9 (~7)** | Beta-quality. See weighting below. |
+
+**Weighting & rationale:** Security (F) and Product readiness (G) weighted ~1.5×,
+Testing (E) and DevOps (D) ~1.25× (these gate real-world reliability);
+A/B/C/H/I at 1.0×. The strong architecture/security scores pull the number up;
+the weak CI/testing scores pull it back to a solid-but-not-shippable **~7**.
+
+---
+
+## 3. Per-Product / Per-Repo Breakdown
+
+Maturity legend: **PROD** = production-grade, **BETA**, **MVP**, **PROTO** =
+prototype/learning, **REF** = docs/reference (not code); combined ratings such as **BETA→PROD** or **ALPHA/BETA** mark a repo between two levels.
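+
+Because the per-repo ratings below feed the §2 scores, it helps to make the §2
+roll-up explicit. Assuming the approximate multipliers act as a normalized
+weighted mean (§2 states the weights but not the formula), the nine category
+scores combine as:
+
+$$
+\text{overall} = \frac{1.0\,(A+B+C+H+I) + 1.25\,(D+E) + 1.5\,(F+G)}{5(1.0) + 2(1.25) + 2(1.5)}
+= \frac{1.0(35) + 1.25(12) + 1.5(15)}{10.5} = \frac{72.5}{10.5} \approx 6.9
+$$
+
+That rounds to the headline ~7. The unweighted mean, $62/9 \approx 6.9$, is
+essentially the same, so the weighting changes the result very little.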
+
+### Flagship products (platform-integrated)
+| Repo | Stack | Tests (files / cases) | CI | Docker | Maturity |
+|---|---|---|---|---|---|
+| `learning_ai_notes` | Fastify5 + Next16 + Expo, Cosmos | 80+ files | ✓ gitea | ✓ | **BETA→PROD** |
+| `learning_ai_trails` | Fastify5 + Next16 + SDK, Cosmos | 28 files | ✓ gitea | ✓ | **PROD** |
+| `learning_ai_clock` | Next16 PWA + iOS/Android, Fastify | 662 total | ✓ gitea | ✓ | **BETA** |
+| `learning_ai_fastgap` | Expo + Next16 + Fastify | 700+ total | ✓ gitea (7 jobs) | ✓ | **BETA** |
+| `learning_ai_peakpulse` | SwiftUI + Fastify | 26 files | ✓ (backend) | ✓ | **BETA→PROD** |
+| `learning_ai_flowmonk` | Next16 + Fastify + Expo | 102 backend | ✓ gitea | ✓ | **BETA** |
+| `learning_ai_efforise` | React/Vite + Fastify + RN | ~9 backend | ✓ gitea | ✓ | **MVP** |
+| `learning_ai_dev_intelli` | Fastify + Next16, GitHub API | 52 backend | ✓ gitea | ✓ | **MVP** |
+| `learning_ai_local_memory_gpt` | Fastify + Next16, SQLite/Ollama | 122 | ✓ gitea | ✓ | **MVP** |
+| `learning_ai_talk2obsidian` | Fastify + Vite, SQLite/Ollama | 8 | ✗ | ✓ | **BETA** |
+| `learning_voice_ai_agent` | Python + Fastify + Next + KMP | 463+ | ⚠ disabled | ✓ | **BETA** |
+| `learning_multimodal_memory_agents` (MindLyst) | KMP + Next + Fastify | 33 | ⚠ disabled | ✓ | **MVP** |
+| `learning_ai_jarvis_jr` | SwiftUI + Next + Android | ~13 web | ✓ gitea | ✓ | **ALPHA/BETA** |
+| `learning_ai_auth_app` | iOS/watchOS/Android (spec+UI) | 0 (here) | ✗ | ✗ | **MVP (spec)** |
+
+### Platform & infra
+| Repo | Stack | Notes | Maturity |
+|---|---|---|---|
+| `learning_ai_common_plat` | pnpm monorepo, 36 `@bytelyst/*`, Fastify, Cosmos | ~466k LOC; full auth (OAuth/MFA/passkeys/SAML); **GH Actions disabled (billing)**, gitea CI active | **PROD** |
+| `learning_ai_devops_tools` | Bash + Python + Node (this repo) | GitHub admin scripts, `agent-queue`, Hermes dashboard; thin tests | **PROD (scripts) / MVP (dash)** |
+| `learning_ai_k8s_streaming` | Python FastAPI + Helm | Use-case registry, HPA/probes, load tools | **BETA→PROD** |
+| `learning_ai_local_llms` | Next16 dashboard + Python TTS | Ollama mission-control; 57 tests | **BETA** |
+
+### Tools / OSS / native
+| Repo | Stack | Notes | Maturity |
+|---|---|---|---|
+| `oss/learning_ai_claw-code-oss` | Rust workspace (10+ crates) | `unsafe_code = "forbid"`, clippy pedantic, 40+ test files | **PROD** |
+| `oss/learning_ai_claw-cowork` | Rust + Tauri + Python | 65+ test files, E2E, Docker | **PROD** |
+| `learning_magic_terminal` | **Rust** | README+CI+many tests; command-blocks v2; dirty(5) | **BETA** |
+| `learning_notif_scanr` | **Swift** (`Package.swift`) | tests present, **no CI**, no Docker | **MVP** |
+| `ios/learning_swift_hourglass` | Swift/SwiftUI macOS | MVVM, 2 test files, no CI | **MVP** |
+| `learning_ai_magic_clipboard_mgr` | Swift/macOS, GRDB | 24 tests but 50+ services + phase-named tests (AI-scaffold smell) | **MVP** |
+| `learning_ai_mac_tooling` | Python FastAPI + React | forensics toolkit; **0 tests**, 200+ `print()`, 3k-line files | **PROTO** |
+| `copilot/learning_ai_uxui_web` | Next16 + MSW + Playwright | component showcase, Lighthouse CI | **MVP** |
+| `learning_ai_productivity_web` | Next15, client-only | clean registry pattern, **0 tests** | **MVP** |
+| `learning_ai_webui_copilot` | Python FastAPI + LangChain | rules/policy engines, **0 tests, no Docker/CI** | **MVP** |
+| `learning_agent_monitoring_fx` | npm monorepo + KMP | agent/ingest/web work, native WIP, 54 `console.log`, TODOs | **BETA** |
+| `learning_agentic_tools_portal` | Python Flask + uv | minimal (1 endpoint, 1 test), has CI | **PROTO** |
+| `learning_server-survival-devops-web` | Vanilla JS + Three.js | playable game, **0 tests** | **MVP** |
+| `learning_pytorch_todo_predictor` | Python + PyTorch | educational, **0 tests**, **no upstream** | **PROTO** |
+| `learning_sidecar_setup` | Next16 scaffold + py stub | scaffolding only, **no upstream**, dirty(8) | **PROTO** |
+| `learning_claude_code_setup` | Bash + markdown | setup notes/scripts; dirty(1) | **REF** |
+| `learning_github_copilot` | Markdown (CLI/SDK docs) | reference only | **REF** |
+| `learning_python_sandbox` | Python | LeetCode/learning; dirty(1) | **PROTO** |
+| `learning_ai_materials` | Docs | NBA handover package | **REF** |
+| `learning_windsurf_setup` | Usage logs | not a codebase | **N/A** |
+
+---
+
+## 4. Findings by Dimension
+
+### A. Repository organization
+- **Fact:** Strong, repeated conventions — `AGENTS.md`/`CLAUDE.md` per repo, pnpm
+  workspaces, `types→repository→routes` backend modules, `docs/` with PRD/ROADMAP.
+- **Fact:** ~14 repos dirty at audit time; abandoned `worktrees/` (now cleaned);
+  some repos behind `origin`. Two repos (`pytorch_todo_predictor`,
+  `sidecar_setup`) have **no git upstream**.
+- **Reco:** Adopt a "clean tree or it doesn't exist" rule (see §8). Add upstreams
+  for the two orphan repos or mark them clearly local.
+
+### B. Code quality
+- **Fact:** Best repos enforce strict TS (`0` `as any` in `notes`, `trails`,
+  `local_memory_gpt` backends), no `console.log` (Fastify's logger instead), Zod validation.
+- **Fact:** `learning_ai_2nd_brain` has 60+ `print()`; `mac_tooling` 200+ and
+  3k+-line files (`network_transfer_audit.py` 3521 lines); `magic_clipboard_mgr`
+  shows AI-scaffold smell (50+ service files, `Phase5–8`/`RemainingQATests`).
+- **Reco:** Lint-gate `print()`/`console.log` in the Python/TS repos; split the
+  3k-line files; audit `magic_clipboard_mgr` for stubbed vs real services.
+
+### C. Architecture
+- **Fact:** Clear separation and reuse: shared auth/datastore/design-tokens,
+  deterministic scheduler (`flowmonk`), risk engine (`trails`), use-case registry
+  (`k8s_streaming`), MCP tool servers, Rust crate boundaries (`claw-*`).
+- **Reco:** This is the strongest dimension — protect it by keeping product
+  domains out of `common_plat` and vice versa.
+
+### D. DevOps & deployment
+- **Fact:** `docker-compose.ecosystem.yml` wires ~20 services (10 backends + 10
+  webs) + infra (Cosmos emulator, Azurite, Traefik, Loki, Grafana, MCP); 30
+  `restart:` policies, 24 `build:` contexts, but **0 `healthcheck:` blocks**.
+- **Fact:** GitHub Actions disabled on `common_plat` and `voice_ai_agent`; ~15 repos have no CI.
+- **Reco (P1):** Add healthchecks + `depends_on: condition: service_healthy` to
+  the ecosystem compose; re-enable or fully migrate CI to gitea self-hosted.
+
+### E. Testing
+- **Fact:** `fastgap` (~700), `clock` (662), `notes` (80+ files), `voice_ai_agent`
+  (463+), `claw-cowork` (65+ files) are excellent; ~8 repos have 0 tests.
+- **Fact:** E2E jobs often set `continue-on-error: true` (`fastgap`, `flowmonk`,
+  `jarvis_jr`, `local_memory_gpt`) — i.e. not actually gating.
+- **Reco:** Set a per-repo minimum (smoke + happy-path) and stop masking E2E
+  failures with `continue-on-error` once stabilized.
+
+### F. Security
+- **Fact:** No real committed secrets across all repos. Matches were
+  `.env.example` placeholders, the public Cosmos emulator key
+  (`C2y6yDjf5/R...`), `dev-*` JWT secrets, and Azure Key Vault references.
+- **Fact:** Field encryption (AES-256-GCM) in `clock`/`notes`/`dev_intelli`;
+  `unsafe_code = "forbid"` in the Rust repos.
+- **Watch:** `NODE_TLS_REJECT_UNAUTHORIZED=0` seen in some Docker setups; thin
+  input validation and no rate limiting in the prototype Python apps.
+
+### G. Product readiness
+- **Fact:** Web+backend pairs generally run end-to-end; native/mobile surfaces
+  (iOS/Android/KMP) are frequently partial or scaffolded.
+- **Reco:** Pick 2–3 flagships (`notes`, `trails`, `clock`) and drive them to a
+  true launch checklist; treat the rest explicitly as experiments.
+
+### H. AI-agent practices
+- **Fact:** Sophisticated `agent-queue` (profiles, job briefs, lifecycle dirs,
+  Node dashboard) — genuinely advanced for a solo setup.
+- **Fact:** Guardrails are weak: agents run `--permission-mode dangerous`, write to
+  live working trees (the source of the dirty-repo churn), and **landed duplicate work**
+  (during this session a rebase auto-dropped 2 commits already pushed upstream).
+- **Reco:** Standardize the agent task contract (§8): one task = one branch =
+  clean tree → tests → commit → push; ignore runtime/queue state in git (already
+  fixed in this repo this session).
+
+### I. Personal engineering workflow
+- **Fact:** Conventional commits, auto `backup-main-*` branches (nice safety net),
+  `AGENTS.md` discipline.
+- **Fact:** Too many long-lived dirty trees and behind-`origin` branches; no
+  visible issue tracker or release cadence.
+- **Reco:** A weekly "sync sweep" (rebase+push all clean repos, list dirty) — you
+  effectively did this manually this session; automate it.
+
+---
+
+## 5. Prioritized Action Plan
+
+**P0 — now (correctness / risk)**
+1. **Re-establish a working CI gate on `learning_ai_common_plat`** (everything
+   depends on it). Either fix GitHub Actions billing or make gitea CI the enforced
+   gate. *(M, common_plat)*
+2. **Resolve the ~14 dirty repos**: review, then commit or discard intentionally;
+   add upstreams for `pytorch_todo_predictor` and `sidecar_setup`. *(M, workspace)*
+3. **Decide the agent-queue daemon policy** so it doesn't write to live trees
+   uncontrolled (it was running in `dangerous` mode). *(S, devops_tools)*
+
+**P1 — this week**
+4. Add **healthchecks** to `docker-compose.ecosystem.yml` (none today) plus ordered
+   `depends_on`. *(M, common_plat/ecosystem)*
+5. Stop masking E2E with `continue-on-error: true` once stabilized; make at least
+   smoke E2E gating. *(M, fastgap/flowmonk/jarvis_jr)*
+6. Replace `print()` with logging in `2nd_brain` (60+) and `mac_tooling` (200+).
+   *(S–M)*
+
+**P2 — this month**
+7. Add minimum test suites to the 0-test repos that matter (`productivity_web`,
+   `webui_copilot`, `agent_monitoring_fx`). *(M)*
+8. Audit `magic_clipboard_mgr` for dead/stubbed services (50+ files). *(M)*
+9. Split 3k-line files in `mac_tooling`. *(M)*
+10. Remove `NODE_TLS_REJECT_UNAUTHORIZED=0` from Docker; add rate limiting to the
+    Python prototypes. *(S–M)*
+
+**P3 — nice to have**
+11. Portfolio-wide coverage reporting + dependency audit (`npm audit`/`pip-audit`)
+    in CI. *(M)*
+12. A lightweight issue/release cadence for the 2–3 flagships. *(S)*
+
+---
+
+## 6. Safe Auto-Fix Candidates
+*(Low-risk; listed only — not applied. Each needs your approval.)*
+- **Ecosystem compose healthchecks** — add `healthcheck:` to each backend/web
+  service in `docker-compose.ecosystem.yml`. Safe: additive.
+- **Add upstreams** for `learning_pytorch_todo_predictor` and
+  `learning_sidecar_setup` (`git remote add origin … && git push -u`). Safe once
+  the remote exists.
+- **Lint rule to ban `print()`** in `learning_ai_2nd_brain` (ruff `T20`) — flags
+  only; you fix incrementally.
+- **Drop `NODE_TLS_REJECT_UNAUTHORIZED=0`** from Docker envs where a real CA/host
+  override is available. (Verify per service first.)
+- **`.gitignore` audit** for the few repos still tracking runtime artifacts
+  (pattern already fixed in `devops_tools` this session).
+
+## 7. Delegate-to-Agent Queue
+Ready-to-paste briefs (each self-contained, one branch, clean-tree rule):
+1. **"Add healthchecks to ecosystem compose"** — repo `common_plat`; read
+   `docker-compose.ecosystem.yml`; add `healthcheck` + ordered `depends_on` to
+   all `*-backend`/`*-web` services; `docker compose config` must pass; no app
+   code changes.
+2. **"De-`print()` 2nd_brain"** — repo `learning_ai_2nd_brain`; replace `print()`
+   with `typer.echo`/logging in `src/brain/**`; keep behavior identical; run
+   `pytest`.
+3. **"Bootstrap tests for webui_copilot"** — repo `learning_ai_webui_copilot`;
+   add `pytest` smoke tests for `site_backend` rules/policy engines + a copilot
+   happy-path; wire a `.github`/gitea CI job.
+4. **"Service audit: magic_clipboard_mgr"** — repo `learning_ai_magic_clipboard_mgr`;
+   produce a report of which of the 50+ services are wired vs stubbed; no code
+   changes.
+5. **"Stabilize E2E"** — repos `fastgap`/`flowmonk`; make smoke E2E reliable, then
+   remove `continue-on-error: true` for that job only.
+
+## 8. Recommended Standard Operating Procedure (for every agent task)
+1. **One task = one branch** off latest `origin/main`; never work on a dirty tree.
+2. **Scope it** with a job brief (you already do this in `agent-queue/docs/jobs/`).
+3. **Test before commit**: typecheck, lint, and unit tests must pass locally.
+4. **Commit small** with conventional messages; **push the branch**, open a PR — don't
+   let agents push straight to `main` of the shared platform.
+5. **Never track runtime/queue state** (ignore `agent-queue/queue/*` lifecycle —
+   fixed here this session).
+6. **Prefer least privilege** over `--permission-mode dangerous`; reserve dangerous
+   mode for sandboxed/disposable checkouts.
+7. **Weekly sync sweep**: rebase+push all clean repos, list dirty ones for review.
+
+## 9. What I Could Not Inspect
+- **No dynamic results.** I did not run `npm/pnpm install`, builds, `pytest`,
+  `vitest`, Playwright, `cargo test`, or `docker compose up` (those mutate trees or
+  need running services). Test counts and CI configs are evidence of *intended* coverage,
+  not measured pass rates or coverage.
+- **No live `git` per-repo ahead/behind** inside the read-only agents (they lacked
+  shell git); branch/dirty facts come from the orchestrator's own checks and may
+  have shifted as the agent-queue daemon ran.
+- **One agent batch misfired**: it reported 5 repos as "missing"
+  (`claude_code_setup`, `github_copilot`, `magic_terminal`, `notif_scanr`,
+  `python_sandbox`) due to a read-access issue; I re-scanned them directly —
+  they exist (notably `magic_terminal` = Rust, `notif_scanr` = Swift).
+- **Mobile/native depth** (iOS/Android/KMP/Tauri runtime behavior) and **secret
+  *values*** were not exercised or decrypted — only presence/format was checked.
+- **`.env.ecosystem`** holds dev-only values; production secret management
+  (Key Vault wiring) was inferred from references, not verified live.
+
+---
+
+### TL;DR
+- Coherent **beta-grade product ecosystem** (~38 repos) — far beyond "learning".
+- **Architecture & security are strong; CI & testing are the weak links.**
+- **P0:** restore a CI gate on `common_plat`, clean the ~14 dirty repos, and rein
+  in the `dangerous`-mode agent-queue.
+- A handful of flagships (`notes`, `trails`, `claw-*`, `clock`, `fastgap`) are
+  at or near production grade; the long tail is MVP/prototype.
+- Tighten the agent commit/CI loop (§8) and most of the operational churn
+  converts back into velocity.
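+
+The sketch below is a possible starting point for automating the weekly sync
+sweep from §8. It covers only the read-only "list dirty" half. It assumes the
+workspace root is `~/code/mygh`, with repos at most two directories deep (as with
+`oss/` and `ios/`), and that `git` is on `PATH`. `RepoStatus`, `parse_porcelain`
+and `sweep` are hypothetical names, not existing `devops_tools` code. The script
+never fetches, rebases, or pushes, so ahead/behind counts reflect the last fetch.
+
+```python
+import re
+import subprocess
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict, Optional
+
+# Branch header emitted by `git status --porcelain=v1 --branch`, e.g.
+# "## main...origin/main [ahead 1, behind 2]" or "## main" (no upstream).
+_HEADER = re.compile(
+    r"^## (?P<branch>\S+?)(?:\.\.\.(?P<upstream>\S+))?(?: \[(?P<track>[^\]]+)\])?$"
+)
+
+
+@dataclass
+class RepoStatus:
+    branch: Optional[str]    # None for detached HEAD or an unborn branch
+    upstream: Optional[str]  # None when no upstream is configured
+    ahead: int
+    behind: int
+    dirty_paths: int         # modified, staged, or untracked entries
+
+
+def parse_porcelain(output: str) -> RepoStatus:
+    """Summarize one repo's `git status --porcelain=v1 --branch` output."""
+    status = RepoStatus(branch=None, upstream=None, ahead=0, behind=0, dirty_paths=0)
+    for line in output.splitlines():
+        if line.startswith("## "):
+            match = _HEADER.match(line)
+            if match:  # headers like "## HEAD (no branch)" do not match
+                status.branch = match.group("branch")
+                status.upstream = match.group("upstream")
+                track = match.group("track") or ""
+                ahead = re.search(r"ahead (\d+)", track)
+                behind = re.search(r"behind (\d+)", track)
+                status.ahead = int(ahead.group(1)) if ahead else 0
+                status.behind = int(behind.group(1)) if behind else 0
+        elif line.strip():
+            status.dirty_paths += 1
+    return status
+
+
+def sweep(root: Path) -> Dict[str, RepoStatus]:
+    """Read-only status of every repo one or two levels under `root` (no fetch)."""
+    repos = sorted({p.parent for pat in ("*/.git", "*/*/.git") for p in root.glob(pat)})
+    report: Dict[str, RepoStatus] = {}
+    for repo in repos:
+        out = subprocess.run(
+            ["git", "-C", str(repo), "status", "--porcelain=v1", "--branch"],
+            capture_output=True, text=True, check=True,
+        ).stdout
+        report[str(repo.relative_to(root))] = parse_porcelain(out)
+    return report
+
+
+if __name__ == "__main__":
+    root = Path("~/code/mygh").expanduser()
+    for name, st in sweep(root).items():
+        # Flag anything that needs a human: uncommitted work, behind, or no upstream.
+        if st.dirty_paths or st.behind or st.upstream is None:
+            print(f"{name}: dirty={st.dirty_paths} ahead={st.ahead} "
+                  f"behind={st.behind} upstream={st.upstream}")
+```
+
+The parser is kept separate from the `git` call so the classification logic can be
+unit-tested without real repos. The rebase+push half could be layered on top, for
+repos that report zero dirty paths and a configured upstream.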