bytelyst-devops-tools/ENGINEERING_REVIEW_SCORECARD.md
saravanakumardb1 32162312a9 docs: add workspace engineering review & scorecard
Read-only, evidence-based review of the ~38-repo workspace produced from
docs/prompts/engineering-review-scorecard.md: per-repo breakdown, 1-10
category scorecard (weighted overall ~7.0, beta-grade), prioritized P0-P3
action plan, safe auto-fix candidates, delegate-to-agent queue, and an
agent SOP. No code changes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 20:53:44 -07:00


# Engineering Review & Scorecard
> Evidence-based, read-only review of the entire `~/code/mygh` workspace (~38 git
> repos) per `docs/prompts/engineering-review-scorecard.md`. Generated 2026-05-30.
>
> **Method:** static inspection only — file reads, `grep`, and read-only `git`.
> No builds, installs, or test runs were executed (that would mutate the trees),
> so dynamic results (pass/fail, coverage %) are inferred from config + test
> counts, not measured. See §9 for limits. Per-repo evidence was gathered by
> parallel read-only agents and spot-verified.
---
## 1. Executive Summary
**What this is:** a single developer running a surprisingly coherent *product
ecosystem*: ~12 product apps (clock, notes, fastgap, peakpulse, flowmonk,
efforise, jarvis_jr, trails, talk2obsidian, local-memory-gpt, voice-ai-agent,
multimodal/mindlyst) sharing one platform monorepo (`learning_ai_common_plat`,
36 `@bytelyst/*` packages, auth/Cosmos/design-tokens), orchestrated by a single
`docker-compose.ecosystem.yml` (~20 services) and driven heavily by AI agents
through a homegrown `agent-queue`. This is far more disciplined than a typical
"learning" folder.
**Overall maturity:** **Beta-quality ecosystem.** A core of repos at or near
production grade (`learning_ai_notes`, `learning_ai_trails`,
`oss/claw-code`/`claw-cowork`, `learning_ai_clock`, `learning_ai_fastgap`)
surrounded by a long tail of MVP/prototype repos with thin or zero tests and no
CI.
**Biggest strengths (top 3)**
1. **Strong platform discipline.** Shared `@bytelyst/*` packages, a repeated
`types.ts → repository.ts → routes.ts` backend pattern, Cosmos partition-key
conventions (`/userId`, `productId` on every doc), per-repo `AGENTS.md`,
conventional commits, and field-level encryption (`field-encrypt.ts`) recur
across the best repos.
2. **Clean security posture for a personal workspace.** Secret scans across all
repos surfaced **no real committed production secrets** — only `.env.example`
placeholders, the public Azure Cosmos emulator key, dev `JWT_SECRET=dev-...`
values, and Azure Key Vault *references*. `.gitignore` is present nearly
everywhere.
3. **Top repos are legitimately good.** `notes`, `trails`, and the two Rust
`claw-*` repos show modular architecture, real test suites (28-80+ files),
CI, multi-stage Docker, and strict typing (`0` `as any` in several backends).
**Biggest risks (top 3)**
1. **CI is the weak link.** GitHub Actions is **disabled (billing)** on the
platform monorepo `learning_ai_common_plat` and on `voice_ai_agent`
(`*.disabled` workflows); ~15 repos have **no CI at all**. The shared
platform that everything depends on has no enforced automated gate, even
though gitea CI is active there (§3).
2. **Process churn dirties the repos.** A live `agent-queue` daemon + `devin`
agents in `--permission-mode dangerous` were actively writing to repos; ~14
repos were found dirty with uncommitted work, several behind `origin`. Work
is at risk of being lost or silently diverging.
3. **Testing is bimodal.** Excellent in the flagship repos, **zero** in many
others (`productivity_web`, `webui_copilot`, `pytorch_todo_predictor`,
`server-survival`, `sidecar_setup`, `mac_tooling`). No portfolio-wide
coverage signal.
**Is the dev style helping or hurting velocity?** **Net helping, but fraying at
the edges.** The platform/agent approach clearly lets one person ship a dozen
apps — that's the upside. The drag is operational: disabled CI, constantly-dirty
working trees, abandoned worktrees, and "AI-generated scaffolding smell" in a
few repos (e.g. `magic_clipboard_mgr`'s 50+ service files + phase-named test
buckets). Tightening the commit/CI loop would convert a lot of that churn back
into velocity.
---
## 2. Overall Score Sheet
Scores are 1-10 (1 = critical/broken, 10 = production-grade), aggregated across
the ~30 code repos (pure docs/usage repos excluded from category math).
| Category | Score | Justification (evidence) |
|---|---|---|
| A. Repository organization | **8** | Consistent `@bytelyst/*` + `types/repository/routes` pattern, per-repo `AGENTS.md`, clear monorepos; minus for ~14 dirty trees, stray worktrees, a few unstructured repos. |
| B. Code quality | **7** | Flagships: strict TS, `0` `as any`, no `console.log`, Zod validation. Tail: `print()`-heavy (`2nd_brain` 60+, `mac_tooling` 200+), `any` leaks, AI-scaffold smell (`magic_clipboard_mgr`). |
| C. Architecture | **8** | Genuinely strong: shared platform, datastore abstraction, deterministic engines (`flowmonk` scheduler), risk-scoring (`trails`), MCP integrations, clean native/web boundaries. |
| D. DevOps & deployment | **6** | Ecosystem compose orchestrates ~20 services, multi-stage Dockerfiles common — but **CI disabled on the platform repo**, ~15 repos with no CI, and **0 healthchecks** in `docker-compose.ecosystem.yml`. |
| E. Testing | **6** | Bimodal: `notes`/`fastgap`/`clock`/`trails`/`claw-*` have 28-600+ tests; many repos have 0. E2E frequently `continue-on-error: true`. No measured coverage. |
| F. Security | **8** | No real committed secrets anywhere; field encryption + Key Vault refs in the mature repos; `.gitignore`/`.env.example` discipline. Minus for `NODE_TLS_REJECT_UNAUTHORIZED=0` in some Docker, thin input-validation in prototypes. |
| G. Product readiness | **7** | Several apps runnable end-to-end (web+backend); mobile/native surfaces often partial; CI-disabled + flaky E2E hold back true "launchable". |
| H. AI-agent practices | **6** | Impressive tooling (`agent-queue`, profiles, job briefs, `AGENTS.md`), but guardrails are weak: `--permission-mode dangerous`, agents dirtying live repos, duplicate work landing upstream, no enforced test-before-commit. |
| I. Personal workflow | **6** | Good: conventional commits, auto `backup-main-*` branches, `AGENTS.md`. Bad: ~14 dirty repos, branches behind `origin`, abandoned worktrees, no unified release/issue discipline. |
| **Weighted overall** | **≈ 7.0** | Beta-quality. See weighting below. |
**Weighting & rationale:** Security (F) and Product readiness (G) weighted ~1.5×,
Testing (E) and DevOps (D) ~1.25× (these gate real-world reliability);
A/B/C/H/I at 1.0×. The strong architecture/security pull the number up; the
weak CI/testing pull it back to a solid-but-not-shippable **~7.0**.
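
As a sanity check on the headline number, the weighting can be reproduced in a few lines. This is a minimal sketch that assumes the approximate multipliers above are exactly 1.5, 1.25, and 1.0; the function name `weighted_overall` is illustrative and not part of any repo.

```python
# Category scores copied from the score sheet in section 2.
scores = {"A": 8, "B": 7, "C": 8, "D": 6, "E": 6, "F": 8, "G": 7, "H": 6, "I": 6}

# Assumption: the "~1.5x" and "~1.25x" multipliers are taken as exact values.
weights = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.25, "E": 1.25,
           "F": 1.5, "G": 1.5, "H": 1.0, "I": 1.0}


def weighted_overall(scores, weights):
    """Return the weighted mean of the category scores."""
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight


overall = weighted_overall(scores, weights)
print(f"{overall:.2f}")  # 6.90
```

Under those assumptions the weighted mean is 72.5 / 10.5, which rounds to the reported ~7.0.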
---
## 3. Per-Product / Per-Repo Breakdown
Maturity legend: **PROD** = production-grade, **BETA**, **MVP**, **PROTO** =
prototype/learning, **REF** = docs/reference (not code).
### Flagship products (platform-integrated)
| Repo | Stack | Tests | CI | Docker | Maturity |
|---|---|---|---|---|---|
| `learning_ai_notes` | Fastify5 + Next16 + Expo, Cosmos | 80+ files | ✓ gitea | ✓ | **BETA→PROD** |
| `learning_ai_trails` | Fastify5 + Next16 + SDK, Cosmos | 28 files | ✓ gitea | ✓ | **PROD** |
| `learning_ai_clock` | Next16 PWA + iOS/Android, Fastify | 662 total | ✓ gitea | ✓ | **BETA** |
| `learning_ai_fastgap` | Expo + Next16 + Fastify | 700+ total | ✓ gitea (7 jobs) | ✓ | **BETA** |
| `learning_ai_peakpulse` | SwiftUI + Fastify | 26 files | ✓ (backend) | ✓ | **BETA→PROD** |
| `learning_ai_flowmonk` | Next16 + Fastify + Expo | 102 backend | ✓ gitea | ✓ | **BETA** |
| `learning_ai_efforise` | React/Vite + Fastify + RN | ~9 backend | ✓ gitea | ✓ | **MVP** |
| `learning_ai_dev_intelli` | Fastify + Next16, GitHub API | 52 backend | ✓ gitea | ✓ | **MVP** |
| `learning_ai_local_memory_gpt` | Fastify + Next16, SQLite/Ollama | 122 | ✓ gitea | ✓ | **MVP** |
| `learning_ai_talk2obsidian` | Fastify + Vite, SQLite/Ollama | 8 | ✗ | ✓ | **BETA** |
| `learning_voice_ai_agent` | Python + Fastify + Next + KMP | 463+ | ⚠ disabled | ✓ | **BETA** |
| `learning_multimodal_memory_agents` (MindLyst) | KMP + Next + Fastify | 33 | ⚠ disabled | ✓ | **MVP** |
| `learning_ai_jarvis_jr` | SwiftUI + Next + Android | ~13 web | ✓ gitea | ✓ | **ALPHA/BETA** |
| `learning_ai_auth_app` | iOS/watchOS/Android (spec+UI) | 0 (here) | ✗ | ✗ | **MVP (spec)** |
### Platform & infra
| Repo | Stack | Notes | Maturity |
|---|---|---|---|
| `learning_ai_common_plat` | pnpm monorepo, 36 `@bytelyst/*`, Fastify, Cosmos | ~466k LOC; full auth (OAuth/MFA/passkeys/SAML); **GH Actions disabled (billing)**, gitea CI active | **PROD** |
| `learning_ai_devops_tools` | Bash + Python + Node (this repo) | GitHub admin scripts, `agent-queue`, Hermes dashboard; thin tests | **PROD (scripts) / MVP (dash)** |
| `learning_ai_k8s_streaming` | Python FastAPI + Helm | Use-case registry, HPA/probes, load tools | **BETA→PROD** |
| `learning_ai_local_llms` | Next16 dashboard + Python TTS | Ollama mission-control; 57 tests | **BETA** |
### Tools / OSS / native
| Repo | Stack | Notes | Maturity |
|---|---|---|---|
| `oss/learning_ai_claw-code-oss` | Rust workspace (10+ crates) | `unsafe forbid`, clippy pedantic, 40+ test files | **PROD** |
| `oss/learning_ai_claw-cowork` | Rust + Tauri + Python | 65+ test files, E2E, Docker | **PROD** |
| `learning_magic_terminal` | **Rust** | README+CI+many tests; command-blocks v2; dirty(5) | **BETA** |
| `learning_notif_scanr` | **Swift** (Package.swift) | tests present, **no CI**, no Docker | **MVP** |
| `ios/learning_swift_hourglass` | Swift/SwiftUI macOS | MVVM, 2 test files, no CI | **MVP** |
| `learning_ai_magic_clipboard_mgr` | Swift/macOS, GRDB | 24 tests but 50+ services + phase-named tests (AI-scaffold smell) | **MVP** |
| `learning_ai_mac_tooling` | Python FastAPI + React | forensics toolkit; **0 tests**, 200+ `print()`, 3k-line files | **PROTO** |
| `copilot/learning_ai_uxui_web` | Next16 + MSW + Playwright | component showcase, Lighthouse CI | **MVP** |
| `learning_ai_productivity_web` | Next15, client-only | clean registry pattern, **0 tests** | **MVP** |
| `learning_ai_webui_copilot` | Python FastAPI + LangChain | rules/policy engines, **0 tests, no Docker/CI** | **MVP** |
| `learning_agent_monitoring_fx` | npm monorepo + KMP | agent/ingest/web work, native WIP, 54 `console.log`, TODOs | **BETA** |
| `learning_agentic_tools_portal` | Python Flask + uv | minimal (1 endpoint, 1 test), has CI | **PROTO** |
| `learning_server-survival-devops-web` | Vanilla JS + Three.js | playable game, **0 tests** | **MVP** |
| `learning_pytorch_todo_predictor` | Python + PyTorch | educational, **0 tests**, **no upstream** | **PROTO** |
| `learning_sidecar_setup` | Next16 scaffold + py stub | scaffolding only, **no upstream**, dirty(8) | **PROTO** |
| `learning_claude_code_setup` | Bash + markdown | setup notes/scripts; dirty(1) | **REF** |
| `learning_github_copilot` | Markdown (CLI/SDK docs) | reference only | **REF** |
| `learning_python_sandbox` | Python | LeetCode/learning; dirty(1) | **PROTO** |
| `learning_ai_materials` | Docs | NBA handover package | **REF** |
| `learning_windsurf_setup` | Usage logs | not a codebase | **N/A** |
---
## 4. Findings by Dimension
### A. Repository organization
- **Fact:** Strong, repeated conventions — `AGENTS.md`/`CLAUDE.md` per repo, pnpm
workspaces, `types→repository→routes` backend modules, `docs/` with PRD/ROADMAP.
- **Fact:** ~14 repos dirty at audit time; abandoned `worktrees/` (now cleaned);
some repos behind `origin`. Two repos (`pytorch_todo_predictor`,
`sidecar_setup`) have **no git upstream**.
- **Reco:** Adopt a "clean tree or it doesn't exist" rule (see §8). Add upstreams
for the two orphan repos or mark them clearly local.
### B. Code quality
- **Fact:** Best repos enforce strict TS (`0` `as any` in `notes`, `trails`,
`local_memory_gpt` backends), no `console.log` (Fastify logger), Zod validation.
- **Fact:** `learning_ai_2nd_brain` has 60+ `print()`; `mac_tooling` 200+ and
3k+-line files (`network_transfer_audit.py` 3521 lines); `magic_clipboard_mgr`
shows AI-scaffold smell (50+ service files, `Phase58`/`RemainingQATests`).
- **Reco:** Lint-gate `print()`/`console.log` in the Python/TS repos; split the
3k-line files; audit `magic_clipboard_mgr` for stubbed vs real services.
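
Counts like "60+ `print()`" can be made reproducible without running the target code. The following is a minimal sketch using only the standard-library `ast` module; the helper name `count_print_calls` and the sample source are hypothetical, and this is not the method used to produce the figures in this review (those came from `grep`).

```python
import ast


def count_print_calls(source: str) -> int:
    """Count calls to the bare name `print` in Python source text."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


# Hypothetical sample: two real print() calls, one mention inside a string.
sample = '''
import logging
log = logging.getLogger(__name__)

def run(items):
    print("starting")
    for item in items:
        log.info("item %s", item)
    print(f"done: {len(items)}")
    note = "print() inside a string is not a call"
    return note
'''

print(count_print_calls(sample))  # 2
```

Unlike a plain text search, parsing the file ignores `print` inside strings and comments, so the count only moves when real calls are added or removed.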
### C. Architecture
- **Fact:** Clear separation and reuse: shared auth/datastore/design-tokens,
deterministic scheduler (`flowmonk`), risk engine (`trails`), use-case registry
(`k8s_streaming`), MCP tool servers, Rust crate boundaries (`claw-*`).
- **Reco:** This is the strongest dimension — protect it by keeping product
domains out of `common_plat` and vice-versa.
### D. DevOps & deployment
- **Fact:** `docker-compose.ecosystem.yml` wires ~20 services (10 backends + 10
webs) + infra (Cosmos emulator, Azurite, Traefik, Loki, Grafana, MCP); 30
`restart:` policies, 24 `build:` contexts, but **0 `healthcheck:` blocks**.
- **Fact:** GH Actions disabled on `common_plat` + `voice_ai_agent`; ~15 repos no CI.
- **Reco (P1):** Add healthchecks + `depends_on: condition: service_healthy` to
the ecosystem compose; re-enable or fully migrate CI to gitea self-hosted.
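
The healthcheck gap can be tracked with a small check over the parsed compose file. This is a minimal sketch operating on an already-parsed mapping; the service names, build paths, and health URL in the sample are hypothetical and do not come from `docker-compose.ecosystem.yml`. Only the Compose keys (`healthcheck`, `depends_on`, `condition: service_healthy`) are real.

```python
def services_missing_healthcheck(compose: dict) -> list[str]:
    """Return the sorted names of services that define no healthcheck block."""
    services = compose.get("services") or {}
    return sorted(
        name for name, spec in services.items()
        if not (spec or {}).get("healthcheck")
    )


def deps_not_gated_on_health(compose: dict) -> list[tuple[str, str]]:
    """Return (service, dependency) pairs that do not wait for service_healthy."""
    pairs = []
    for name, spec in (compose.get("services") or {}).items():
        depends_on = (spec or {}).get("depends_on") or {}
        if isinstance(depends_on, list):
            # Short syntax: a plain list only orders startup, never waits for health.
            pairs.extend((name, dep) for dep in depends_on)
        else:
            pairs.extend(
                (name, dep) for dep, opts in depends_on.items()
                if (opts or {}).get("condition") != "service_healthy"
            )
    return sorted(pairs)


# Hypothetical three-service stack, shaped like a parsed compose file.
compose = {
    "services": {
        "cosmos-emulator": {"image": "example/cosmos-emulator"},
        "notes-backend": {
            "build": "./notes/backend",
            "depends_on": ["cosmos-emulator"],
            "healthcheck": {
                "test": ["CMD", "curl", "-f", "http://localhost:3000/health"],
                "interval": "10s",
            },
        },
        "notes-web": {
            "build": "./notes/web",
            "depends_on": {"notes-backend": {"condition": "service_healthy"}},
        },
    }
}

print(services_missing_healthcheck(compose))  # ['cosmos-emulator', 'notes-web']
print(deps_not_gated_on_health(compose))      # [('notes-backend', 'cosmos-emulator')]
```

To run this against the real file, the YAML would first have to be loaded into a dict with a YAML parser, which this sketch deliberately leaves out to stay dependency-free.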
### E. Testing
- **Fact:** `fastgap` (~700), `clock` (662), `notes` (80+ files), `voice_ai_agent`
(463+), `claw-cowork` (65+ files) are excellent; ~8 repos have 0 tests.
- **Fact:** E2E often `continue-on-error: true` (`fastgap`, `flowmonk`,
`jarvis_jr`, `local_memory_gpt`) — i.e. not actually gating.
- **Reco:** Set a per-repo minimum (smoke + happy-path) and stop masking E2E
failures with `continue-on-error` once stabilized.
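
Masked E2E jobs are easy to list mechanically. The sketch below scans workflow text line by line for `continue-on-error: true`; the workflow content is a hypothetical example, not copied from any repo, and the helper name `masked_lines` is illustrative.

```python
import re

# Matches a YAML line that sets continue-on-error to true, with an optional comment.
PATTERN = re.compile(r"^\s*continue-on-error:\s*true\s*(#.*)?$")


def masked_lines(workflow_text: str) -> list[int]:
    """Return 1-based line numbers where continue-on-error is set to true."""
    return [
        lineno
        for lineno, line in enumerate(workflow_text.splitlines(), start=1)
        if PATTERN.match(line)
    ]


# Hypothetical workflow: the e2e job is masked, one step explicitly is not.
workflow = """\
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - run: pnpm test
  e2e:
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - run: pnpm exec playwright test
        continue-on-error: false
"""

print(masked_lines(workflow))  # [8]
```

A line-based scan is an approximation: it does not tell job-level from step-level settings, so each hit still needs a quick manual look before the flag is removed.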
### F. Security
- **Fact:** No real committed secrets across all repos. Matches were
`.env.example` placeholders, the public Cosmos emulator key
(`C2y6yDjf5/R...`), `dev-*` JWT secrets, and Azure Key Vault references.
- **Fact:** Field encryption (AES-256-GCM) in `clock`/`notes`/`dev_intelli`;
`unsafe_code = "forbid"` in the Rust repos.
- **Watch:** `NODE_TLS_REJECT_UNAUTHORIZED=0` seen in some Docker setups; thin
input validation / no rate-limiting in the prototype Python apps.
### G. Product readiness
- **Fact:** Web+backend pairs generally run end-to-end; native/mobile surfaces
(iOS/Android/KMP) are frequently partial or scaffolded.
- **Reco:** Pick 2-3 flagships (`notes`, `trails`, `clock`) and drive them to a
true launch checklist; treat the rest explicitly as experiments.
### H. AI-agent practices
- **Fact:** Sophisticated `agent-queue` (profiles, job briefs, lifecycle dirs,
Node dashboard) — genuinely advanced for a solo setup.
- **Fact:** Guardrails weak: agents run `--permission-mode dangerous`, write to
live working trees (caused the dirty-repo churn), and **landed duplicate work**
(during this session a rebase auto-dropped 2 commits already pushed upstream).
- **Reco:** Standardize the agent task contract (§8): one task = one branch =
clean tree → tests → commit → push; ignore runtime/queue state in git (already
fixed in this repo this session).
### I. Personal engineering workflow
- **Fact:** Conventional commits, auto `backup-main-*` branches (nice safety net),
`AGENTS.md` discipline.
- **Fact:** Too many long-lived dirty trees and behind-`origin` branches; no
visible issue tracker or release cadence.
- **Reco:** A weekly "sync sweep" (rebase+push all clean repos, list dirty) — you
effectively did this manually this session; automate it.
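
The "list dirty" half of such a sweep could look like the sketch below. It parses `git status --porcelain=v1 --branch` output and only reports; it does not rebase, push, or fetch, so ahead/behind counts reflect the last fetch. The function names are hypothetical, and `sweep` only looks at direct children of the workspace, so nested repos such as `oss/...` or `ios/...` would need an extra pass.

```python
import re
import subprocess
from pathlib import Path


def summarize_status(porcelain: str) -> dict:
    """Summarize `git status --porcelain=v1 --branch` output for one repo."""
    lines = porcelain.splitlines()
    header = lines[0] if lines and lines[0].startswith("## ") else ""
    ahead = re.search(r"ahead (\d+)", header)
    behind = re.search(r"behind (\d+)", header)
    return {
        # Every non-header line is one changed or untracked path.
        "dirty": sum(1 for line in lines if not line.startswith("## ")),
        # The header reads "## branch...upstream" only when an upstream is set.
        "has_upstream": "..." in header,
        "ahead": int(ahead.group(1)) if ahead else 0,
        "behind": int(behind.group(1)) if behind else 0,
    }


def sweep(workspace: Path) -> dict:
    """Collect a status summary for every direct child repo of `workspace`."""
    report = {}
    for git_dir in sorted(workspace.glob("*/.git")):
        out = subprocess.run(
            ["git", "-C", str(git_dir.parent), "status", "--porcelain=v1", "--branch"],
            capture_output=True, text=True, check=True,
        ).stdout
        report[git_dir.parent.name] = summarize_status(out)
    return report


# Parsing demo on a hand-written sample, so no repository is needed to try it.
sample = "## main...origin/main [ahead 1, behind 2]\n M src/app.ts\n?? notes.txt\n"
print(summarize_status(sample))
# {'dirty': 2, 'has_upstream': True, 'ahead': 1, 'behind': 2}
```

Calling `sweep(Path.home() / "code" / "mygh")` would then give one summary per repo, which is enough to produce the weekly list of dirty, orphaned, or behind-`origin` trees.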
---
## 5. Prioritized Action Plan
**P0 — now (correctness / risk)**
1. **Re-establish a working CI gate on `learning_ai_common_plat`** (everything
depends on it). Either fix GH Actions billing or make gitea CI the enforced
gate. *(M, common_plat)*
2. **Resolve the ~14 dirty repos**: review + commit or discard intentionally;
add upstreams for `pytorch_todo_predictor` & `sidecar_setup`. *(M, workspace)*
3. **Decide the agent-queue daemon policy** so it doesn't write to live trees
uncontrolled (it was running in `dangerous` mode). *(S, devops_tools)*
**P1 — this week**
4. Add **healthchecks** to `docker-compose.ecosystem.yml` (0 today) + ordered
`depends_on`. *(M, common_plat/ecosystem)*
5. Stop masking E2E with `continue-on-error: true` once stabilized; make at least
smoke E2E gating. *(M, fastgap/flowmonk/jarvis_jr)*
6. Replace `print()` with logging in `2nd_brain` (60+) and `mac_tooling` (200+).
*(S-M)*
**P2 — this month**
7. Add minimum test suites to the 0-test repos that matter (`productivity_web`,
`webui_copilot`, `agent_monitoring_fx`). *(M)*
8. Audit `magic_clipboard_mgr` for dead/stubbed services (50+ files). *(M)*
9. Split 3k-line files in `mac_tooling`. *(M)*
10. Remove `NODE_TLS_REJECT_UNAUTHORIZED=0` from Docker; add rate-limiting to the
Python prototypes. *(S-M)*
**P3 — nice to have**
11. Portfolio-wide coverage reporting + dependency audit (`npm audit`/`pip-audit`)
in CI. *(M)*
12. A lightweight issue/release cadence for the 2-3 flagships. *(S)*
---
## 6. Safe Auto-Fix Candidates
*(Low-risk; listed only — not applied. Each needs your approval.)*
- **Ecosystem compose healthchecks** — add `healthcheck:` to each backend/web
service in `docker-compose.ecosystem.yml`. Safe: additive.
- **Add upstreams** for `learning_pytorch_todo_predictor` and
`learning_sidecar_setup` (`git remote add origin … && git push -u`). Safe once
remote exists.
- **Lint rule to ban `print()`** in `learning_ai_2nd_brain` (ruff `T20`) — flags
only; you fix incrementally.
- **Drop `NODE_TLS_REJECT_UNAUTHORIZED=0`** from Docker envs where a real CA/host
override is available. (Verify per service first.)
- **`.gitignore` audit** for the few repos still tracking runtime artifacts
(pattern already fixed in `devops_tools` this session).
## 7. Delegate-to-Agent Queue
Ready-to-paste briefs (each self-contained, one branch, clean-tree rule):
1. **"Add healthchecks to ecosystem compose"** — repo `common_plat`; read
`docker-compose.ecosystem.yml`; add `healthcheck` + ordered `depends_on` to
all `*-backend`/`*-web` services; `docker compose config` must pass; no app
code changes.
2. **"De-`print()` 2nd_brain"** — repo `learning_ai_2nd_brain`; replace `print()`
with `typer.echo`/logging in `src/brain/**`; keep behavior identical; run
`pytest`.
3. **"Bootstrap tests for webui_copilot"** — repo `learning_ai_webui_copilot`;
add `pytest` smoke tests for `site_backend` rules/policy engines + a copilot
happy-path; wire a `.github`/gitea CI job.
4. **"Service audit: magic_clipboard_mgr"** — repo `learning_ai_magic_clipboard_mgr`;
produce a report of which of the 50+ services are wired vs stubbed; no code
changes.
5. **"Stabilize E2E"** — repos `fastgap`/`flowmonk`; make smoke E2E reliable, then
remove `continue-on-error: true` for that job only.
## 8. Recommended Standard Operating Procedure (for every agent task)
1. **One task = one branch** off latest `origin/main`; never work on a dirty tree.
2. **Scope it** with a job brief (you already do this in `agent-queue/docs/jobs/`).
3. **Test before commit**: typecheck + lint + unit must pass locally.
4. **Commit small**, conventional messages; **push the branch**, open a PR — don't
let agents push straight to `main` of the shared platform.
5. **Never track runtime/queue state** (ignore `agent-queue/queue/*` lifecycle —
fixed here this session).
6. **Prefer least-privilege** over `--permission-mode dangerous`; reserve dangerous
mode for sandboxed/disposable checkouts.
7. **Weekly sync sweep**: rebase+push all clean repos, list dirty ones for review.
## 9. What I Could Not Inspect
- **No dynamic results.** I did not run `npm/pnpm install`, builds, `pytest`,
`vitest`, Playwright, `cargo test`, or `docker compose up` (those mutate trees /
need services). Test counts and CI configs are evidence of *intended* coverage,
not measured pass/coverage.
- **No live `git` per-repo ahead/behind** inside the read-only agents (they lacked
shell git); branch/dirty facts come from the orchestrator's own checks and may
have shifted as the agent-queue daemon ran.
- **One agent batch misfired**: it reported 5 repos as "missing"
(`claude_code_setup`, `github_copilot`, `magic_terminal`, `notif_scanr`,
`python_sandbox`) due to a read-access issue; I re-scanned them directly —
they exist (notably `magic_terminal` = Rust, `notif_scanr` = Swift).
- **Mobile/native depth** (iOS/Android/KMP/Tauri runtime behavior) was not
exercised, and **secret *values*** were not decrypted; only presence/format was
checked.
- **`.env.ecosystem`** holds dev-only values; production secret management
(Key Vault wiring) was inferred from references, not verified live.
---
### TL;DR
- Coherent **beta-grade product ecosystem** (~38 repos) — far beyond "learning".
- **Architecture & security are strong; CI & testing are the weak links.**
- **P0:** restore a CI gate on `common_plat`, clean the ~14 dirty repos, and rein
in the `dangerous`-mode agent-queue.
- A handful of flagships (`notes`, `trails`, `claw-*`, `clock`, `fastgap`) are
at or near production grade (see §3); the long tail is MVP/prototype.
- Tighten the agent commit/CI loop (§8) and most of the operational churn
converts back into velocity.