saravanakumardb1 32162312a9 docs: add workspace engineering review & scorecard

Read-only, evidence-based review of the ~38-repo workspace produced from
docs/prompts/engineering-review-scorecard.md: per-repo breakdown, 1-10
category scorecard (weighted overall ~7.0, beta-grade), prioritized P0-P3
action plan, safe auto-fix candidates, delegate-to-agent queue, and an
agent SOP. No code changes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

2026-05-30 20:53:44 -07:00


Engineering Review & Scorecard

Evidence-based, read-only review of the entire ~/code/mygh workspace (~38 git repos) per docs/prompts/engineering-review-scorecard.md. Generated 2026-05-30.

Method: static inspection only — file reads, grep, and read-only git. No builds, installs, or test runs were executed (that would mutate the trees), so dynamic results (pass/fail, coverage %) are inferred from config + test counts, not measured. See §9 for limits. Per-repo evidence was gathered by parallel read-only agents and spot-verified.

1. Executive Summary

What this is: a single developer running a surprisingly coherent product ecosystem of ~12 product apps (clock, notes, fastgap, peakpulse, flowmonk, efforise, jarvis_jr, trails, talk2obsidian, local_memory_gpt, voice_ai_agent, multimodal/mindlyst) that share one platform monorepo (learning_ai_common_plat, 36 @bytelyst/* packages, auth/Cosmos/design-tokens), are orchestrated by a single docker-compose.ecosystem.yml (~20 services), and are driven heavily by AI agents through a homegrown agent-queue. This is far more disciplined than a typical "learning" folder.

Overall maturity: Beta-quality ecosystem. A core of strong repos rated BETA to PROD in §3 (learning_ai_notes, learning_ai_trails, oss/learning_ai_claw-code-oss, oss/learning_ai_claw-cowork, learning_ai_clock, learning_ai_fastgap) is surrounded by a long tail of MVP/prototype repos with thin or zero tests and no CI.

Biggest strengths (top 3)

Strong platform discipline. Shared @bytelyst/* packages, a repeated types.ts → repository.ts → routes.ts backend pattern, Cosmos partition-key conventions (/userId, productId on every doc), per-repo AGENTS.md, conventional commits, and field-level encryption (field-encrypt.ts) recur across the best repos.
Clean security posture for a personal workspace. Secret scans across all repos surfaced no real committed production secrets — only .env.example placeholders, the public Azure Cosmos emulator key, dev JWT_SECRET=dev-... values, and Azure Key Vault references. .gitignore is present nearly everywhere.
Top repos are legitimately good. notes, trails, and the two Rust claw-* repos show modular architecture, real test suites (28–80+ files), CI, multi-stage Docker, and strict typing (0 as any in several backends).

Biggest risks (top 3)

CI is the weak link. GitHub Actions is disabled (billing) on the platform monorepo learning_ai_common_plat and on voice_ai_agent (*.disabled workflows); ~15 repos have no CI at all. The shared platform that everything depends on has no enforced automated gate: its gitea CI is active (§3), but making it the enforced gate is still an open P0 item (§5).
Process churn dirties the repos. A live agent-queue daemon + devin agents in --permission-mode dangerous were actively writing to repos; ~14 repos were found dirty with uncommitted work, several behind origin. Work is at risk of being lost or silently diverging.
Testing is bimodal. Excellent in the flagship repos, zero in many others (productivity_web, webui_copilot, pytorch_todo_predictor, server-survival, sidecar_setup, mac_tooling). No portfolio-wide coverage signal.

Is the dev style helping or hurting velocity? Net helping, but fraying at the edges. The platform/agent approach clearly lets one person ship a dozen apps — that's the upside. The drag is operational: disabled CI, constantly-dirty working trees, abandoned worktrees, and "AI-generated scaffolding smell" in a few repos (e.g. magic_clipboard_mgr's 50+ service files + phase-named test buckets). Tightening the commit/CI loop would convert a lot of that churn back into velocity.

2. Overall Score Sheet

Scores are 1–10 (1 = critical/broken, 10 = production-grade), aggregated across the ~30 code repos (pure docs/usage repos excluded from category math).

Category	Score	Justification (evidence)
A. Repository organization	8	Consistent `@bytelyst/*` + `types/repository/routes` pattern, per-repo `AGENTS.md`, clear monorepos; minus for ~14 dirty trees, stray worktrees, a few unstructured repos.
B. Code quality	7	Flagships: strict TS, `0` `as any`, no `console.log`, Zod validation. Tail: `print()`-heavy (`2nd_brain` 60+, `mac_tooling` 200+), `any` leaks, AI-scaffold smell (`magic_clipboard_mgr`).
C. Architecture	8	Genuinely strong: shared platform, datastore abstraction, deterministic engines (`flowmonk` scheduler), risk-scoring (`trails`), MCP integrations, clean native/web boundaries.
D. DevOps & deployment	6	Ecosystem compose orchestrates ~20 services, multi-stage Dockerfiles common — but CI disabled on the platform repo, ~15 repos with no CI, and 0 healthchecks in `docker-compose.ecosystem.yml`.
E. Testing	6	Bimodal: `notes`/`fastgap`/`clock`/`trails`/`claw-*` range from 28 test files (`trails`) to 700+ tests (`fastgap`); many repos have 0. E2E is frequently `continue-on-error: true`. No measured coverage.
F. Security	8	No real committed secrets anywhere; field encryption + Key Vault refs in the mature repos; `.gitignore`/`.env.example` discipline. Minus for `NODE_TLS_REJECT_UNAUTHORIZED=0` in some Docker setups and thin input validation in prototypes.
G. Product readiness	7	Several apps runnable end-to-end (web+backend); mobile/native surfaces often partial; CI-disabled + flaky E2E hold back true "launchable".
H. AI-agent practices	6	Impressive tooling (`agent-queue`, profiles, job briefs, `AGENTS.md`), but guardrails are weak: `--permission-mode dangerous`, agents dirtying live repos, duplicate work landing upstream, no enforced test-before-commit.
I. Personal workflow	6	Good: conventional commits, auto `backup-main-*` branches, `AGENTS.md`. Bad: ~14 dirty repos, branches behind `origin`, abandoned worktrees, no unified release/issue discipline.
Weighted overall	≈ 7.0	Beta-quality. See weighting below.

Weighting & rationale: Security (F) and Product readiness (G) weighted ~1.5×, Testing (E) and DevOps (D) ~1.25× (these gate real-world reliability); A/B/C/H/I at 1.0×. The strong architecture/security pull the number up; the weak CI/testing pull it back to a solid-but-not-shippable ~7.0.
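
The weighting can be checked with a few lines of Python. This is a sketch under two assumptions that the review does not state explicitly: the approximate multipliers (~1.5×, ~1.25×) are applied exactly, and the overall score is a plain weighted mean of the nine category scores.

```python
# Category scores copied from the score sheet above.
SCORES = {"A": 8, "B": 7, "C": 8, "D": 6, "E": 6, "F": 8, "G": 7, "H": 6, "I": 6}

# Assumed exact weights: F and G at 1.5x, D and E at 1.25x, the rest at 1.0x.
WEIGHTS = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.25, "E": 1.25,
           "F": 1.5, "G": 1.5, "H": 1.0, "I": 1.0}


def weighted_overall(scores, weights):
    """Weighted mean: sum(score * weight) / sum(weight)."""
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight


overall = weighted_overall(SCORES, WEIGHTS)
print(f"{overall:.2f}")  # 6.90
```

Under those assumptions the result is about 6.9 (72.5 / 10.5), which is consistent with the reported ≈ 7.0.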

3. Per-Product / Per-Repo Breakdown

Maturity legend: PROD = production-grade, BETA, MVP, PROTO = prototype/learning, REF = docs/reference (not code).

Flagship products (platform-integrated)

Repo	Stack	Tests	CI	Docker	Maturity
`learning_ai_notes`	Fastify5 + Next16 + Expo, Cosmos	80+ files	✓ gitea	✓	BETA→PROD
`learning_ai_trails`	Fastify5 + Next16 + SDK, Cosmos	28 files	✓ gitea	✓	PROD
`learning_ai_clock`	Next16 PWA + iOS/Android, Fastify	662 total	✓ gitea	✓	BETA
`learning_ai_fastgap`	Expo + Next16 + Fastify	700+ total	✓ gitea (7 jobs)	✓	BETA
`learning_ai_peakpulse`	SwiftUI + Fastify	26 files	✓ (backend)	✓	BETA→PROD
`learning_ai_flowmonk`	Next16 + Fastify + Expo	102 backend	✓ gitea	✓	BETA
`learning_ai_efforise`	React/Vite + Fastify + RN	~9 backend	✓ gitea	✓	MVP
`learning_ai_dev_intelli`	Fastify + Next16, GitHub API	52 backend	✓ gitea	✓	MVP
`learning_ai_local_memory_gpt`	Fastify + Next16, SQLite/Ollama	122	✓ gitea	✓	MVP
`learning_ai_talk2obsidian`	Fastify + Vite, SQLite/Ollama	8	✗	✓	BETA
`learning_voice_ai_agent`	Python + Fastify + Next + KMP	463+	⚠ disabled	✓	BETA
`learning_multimodal_memory_agents` (MindLyst)	KMP + Next + Fastify	33	⚠ disabled	✓	MVP
`learning_ai_jarvis_jr`	SwiftUI + Next + Android	~13 web	✓ gitea	✓	ALPHA/BETA
`learning_ai_auth_app`	iOS/watchOS/Android (spec+UI)	0 (here)	✗	✗	MVP (spec)

Platform & infra

Repo	Stack	Notes	Maturity
`learning_ai_common_plat`	pnpm monorepo, 36 `@bytelyst/*`, Fastify, Cosmos	~466k LOC; full auth (OAuth/MFA/passkeys/SAML); GH Actions disabled (billing), gitea CI active	PROD
`learning_ai_devops_tools`	Bash + Python + Node (this repo)	GitHub admin scripts, `agent-queue`, Hermes dashboard; thin tests	PROD (scripts) / MVP (dash)
`learning_ai_k8s_streaming`	Python FastAPI + Helm	Use-case registry, HPA/probes, load tools	BETA→PROD
`learning_ai_local_llms`	Next16 dashboard + Python TTS	Ollama mission-control; 57 tests	BETA

Tools / OSS / native

Repo	Stack	Notes	Maturity
`oss/learning_ai_claw-code-oss`	Rust workspace (10+ crates)	`unsafe forbid`, clippy pedantic, 40+ test files	PROD
`oss/learning_ai_claw-cowork`	Rust + Tauri + Python	65+ test files, E2E, Docker	PROD
`learning_magic_terminal`	Rust	README+CI+many tests; command-blocks v2; dirty(5)	BETA
`learning_notif_scanr`	Swift (Package.swift)	tests present, no CI, no Docker	MVP
`ios/learning_swift_hourglass`	Swift/SwiftUI macOS	MVVM, 2 test files, no CI	MVP
`learning_ai_magic_clipboard_mgr`	Swift/macOS, GRDB	24 tests but 50+ services + phase-named tests (AI-scaffold smell)	MVP
`learning_ai_mac_tooling`	Python FastAPI + React	forensics toolkit; 0 tests, 200+ `print()`, 3k-line files	PROTO
`copilot/learning_ai_uxui_web`	Next16 + MSW + Playwright	component showcase, Lighthouse CI	MVP
`learning_ai_productivity_web`	Next15, client-only	clean registry pattern, 0 tests	MVP
`learning_ai_webui_copilot`	Python FastAPI + LangChain	rules/policy engines, 0 tests, no Docker/CI	MVP
`learning_agent_monitoring_fx`	npm monorepo + KMP	agent/ingest/web work, native WIP, 54 `console.log`, TODOs	BETA
`learning_agentic_tools_portal`	Python Flask + uv	minimal (1 endpoint, 1 test), has CI	PROTO
`learning_server-survival-devops-web`	Vanilla JS + Three.js	playable game, 0 tests	MVP
`learning_pytorch_todo_predictor`	Python + PyTorch	educational, 0 tests, no upstream	PROTO
`learning_sidecar_setup`	Next16 scaffold + py stub	scaffolding only, no upstream, dirty(8)	PROTO
`learning_claude_code_setup`	Bash + markdown	setup notes/scripts; dirty(1)	REF
`learning_github_copilot`	Markdown (CLI/SDK docs)	reference only	REF
`learning_python_sandbox`	Python	LeetCode/learning; dirty(1)	PROTO
`learning_ai_materials`	Docs	NBA handover package	REF
`learning_windsurf_setup`	Usage logs	not a codebase	N/A

4. Findings by Dimension

A. Repository organization

Fact: Strong, repeated conventions — AGENTS.md/CLAUDE.md per repo, pnpm workspaces, types→repository→routes backend modules, docs/ with PRD/ROADMAP.
Fact: ~14 repos dirty at audit time; abandoned worktrees/ (now cleaned); some repos behind origin. Two repos (pytorch_todo_predictor, sidecar_setup) have no git upstream.
Reco: Adopt a "clean tree or it doesn't exist" rule (see §8). Add upstreams for the two orphan repos or mark them clearly local.

B. Code quality

Fact: Best repos enforce strict TS (0 as any in notes, trails, local_memory_gpt backends), no console.log (Fastify logger), Zod validation.
Fact: learning_ai_2nd_brain has 60+ print(); mac_tooling 200+ and 3k+-line files (network_transfer_audit.py 3521 lines); magic_clipboard_mgr shows AI-scaffold smell (50+ service files, Phase5–8/RemainingQATests).
Reco: Lint-gate print()/console.log in the Python/TS repos; split the 3k-line files; audit magic_clipboard_mgr for stubbed vs real services.
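
To illustrate what a print() lint gate looks for, here is a minimal standard-library sketch. It is not how ruff's T20 rule is implemented; the function name `find_print_calls` and the sample source are hypothetical and only show the idea of flagging bare `print(...)` calls while leaving logger calls alone.

```python
import ast


def find_print_calls(source: str) -> list[int]:
    """Return the line numbers of bare print(...) calls in Python source."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical sample module, not taken from any repo in the workspace.
sample = '''import logging
log = logging.getLogger(__name__)

def run():
    print("starting")
    log.info("started")
'''

print(find_print_calls(sample))  # [5]
```

A check like this only flags; converting each call to logging is still a manual, incremental step.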

C. Architecture

Fact: Clear separation and reuse: shared auth/datastore/design-tokens, deterministic scheduler (flowmonk), risk engine (trails), use-case registry (k8s_streaming), MCP tool servers, Rust crate boundaries (claw-*).
Reco: This is the strongest dimension — protect it by keeping product domains out of common_plat and vice-versa.

D. DevOps & deployment

Fact: docker-compose.ecosystem.yml wires ~20 services (10 backends + 10 webs) + infra (Cosmos emulator, Azurite, Traefik, Loki, Grafana, MCP); 30 restart: policies, 24 build: contexts, but 0 healthcheck: blocks.
Fact: GH Actions disabled on common_plat + voice_ai_agent; ~15 repos no CI.
Reco (P1): Add healthchecks + depends_on: condition: service_healthy to the ecosystem compose; re-enable or fully migrate CI to gitea self-hosted.
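
The healthcheck gap can be audited mechanically. The sketch below assumes the compose file has already been loaded into a dict (for example by parsing the JSON that `docker compose config --format json` emits); the service names in the sample are hypothetical and are not taken from docker-compose.ecosystem.yml.

```python
def services_missing_healthcheck(compose: dict) -> list[str]:
    """Names of services that declare no healthcheck block."""
    services = compose.get("services", {})
    return sorted(name for name, spec in services.items()
                  if "healthcheck" not in (spec or {}))


def ungated_dependencies(compose: dict) -> list[tuple[str, str]]:
    """(service, dependency) pairs that are not gated on service_healthy."""
    pairs = []
    for name, spec in compose.get("services", {}).items():
        deps = (spec or {}).get("depends_on", {})
        if isinstance(deps, list):  # short form carries no condition at all
            deps = {dep: {} for dep in deps}
        for dep, opts in deps.items():
            if (opts or {}).get("condition") != "service_healthy":
                pairs.append((name, dep))
    return sorted(pairs)


# Hypothetical three-service compose document.
sample = {
    "services": {
        "cosmos-emulator": {"image": "example/emulator",
                            "healthcheck": {"test": ["CMD", "true"]}},
        "notes-backend": {"build": {"context": "."},
                          "depends_on": {"cosmos-emulator": {"condition": "service_healthy"}}},
        "notes-web": {"build": {"context": "."},
                      "depends_on": ["notes-backend"]},
    }
}

print(services_missing_healthcheck(sample))  # ['notes-backend', 'notes-web']
print(ungated_dependencies(sample))          # [('notes-web', 'notes-backend')]
```

Both lists being empty would be one reasonable definition of done for the P1 item, though the right probe command per service still has to be chosen by hand.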

E. Testing

Fact: fastgap (~700), clock (662), notes (80+ files), voice_ai_agent (463+), claw-cowork (65+ files) are excellent; ~8 repos have 0 tests.
Fact: E2E often continue-on-error: true (fastgap, flowmonk, jarvis_jr, local_memory_gpt) — i.e. not actually gating.
Reco: Set a per-repo minimum (smoke + happy-path) and stop masking E2E failures with continue-on-error once stabilized.
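
Masked jobs are easy to list before deciding which ones to make gating. The following is a rough text scan, not a YAML parser; the workflow snippet, job names, and commands in it are hypothetical.

```python
import re

# Matches a job-level or step-level "continue-on-error: true" line.
MASK = re.compile(r"^\s*continue-on-error:\s*true\s*(#.*)?$")


def masked_lines(workflow_text: str) -> list[int]:
    """1-based line numbers where failures are being masked."""
    return [i for i, line in enumerate(workflow_text.splitlines(), start=1)
            if MASK.match(line)]


# Hypothetical workflow file content.
workflow = '''jobs:
  unit:
    runs-on: self-hosted
    steps:
      - run: pnpm test
  e2e:
    runs-on: self-hosted
    continue-on-error: true
    steps:
      - run: pnpm e2e
'''

print(masked_lines(workflow))  # [8]
```

Because it matches lines rather than parsed keys, it will miss unusual layouts (for example a key written inline in a flow mapping), so treat the output as a starting list rather than a guarantee.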

F. Security

Fact: No real committed secrets across all repos. Matches were .env.example placeholders, the public Cosmos emulator key (C2y6yDjf5/R...), dev-* JWT secrets, and Azure Key Vault references.
Fact: Field encryption (AES-256-GCM) in clock/notes/dev_intelli; unsafe_code = "forbid" in the Rust repos.
Watch: NODE_TLS_REJECT_UNAUTHORIZED=0 seen in some Docker setups; thin input validation / no rate-limiting in the prototype Python apps.

G. Product readiness

Fact: Web+backend pairs generally run end-to-end; native/mobile surfaces (iOS/Android/KMP) are frequently partial or scaffolded.
Reco: Pick 2–3 flagships (notes, trails, clock) and drive them to a true launch checklist; treat the rest explicitly as experiments.

H. AI-agent practices

Fact: Sophisticated agent-queue (profiles, job briefs, lifecycle dirs, Node dashboard) — genuinely advanced for a solo setup.
Fact: Guardrails weak: agents run --permission-mode dangerous, write to live working trees (caused the dirty-repo churn), and landed duplicate work (during this session a rebase auto-dropped 2 commits already pushed upstream).
Reco: Standardize the agent task contract (§8): one task = one branch = clean tree → tests → commit → push; ignore runtime/queue state in git (already fixed in this repo this session).

I. Personal engineering workflow

Fact: Conventional commits, auto backup-main-* branches (nice safety net), AGENTS.md discipline.
Fact: Too many long-lived dirty trees and behind-origin branches; no visible issue tracker or release cadence.
Reco: A weekly "sync sweep" (rebase+push all clean repos, list dirty) — you effectively did this manually this session; automate it.
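
The reporting half of such a sweep could look like the sketch below. It is read-only (no rebase or push), assumes `git` is on the PATH, and only looks at repos directly under the workspace directory, so nested repos such as those under oss/ would need an extra pass. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def classify(porcelain: str, behind: int, ahead: int) -> str:
    """Classify a repo from `git status --porcelain` output and upstream counts."""
    if porcelain.strip():
        return "dirty"
    if behind and ahead:
        return "diverged"
    if behind:
        return "behind"
    if ahead:
        return "ahead"
    return "in-sync"


def git(repo: Path, *args: str) -> str:
    """Run a git command in the given repo and return its stdout."""
    result = subprocess.run(["git", "-C", str(repo), *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def sweep(workspace: Path) -> dict[str, str]:
    """Map each top-level repo name to its state."""
    report = {}
    for git_dir in sorted(workspace.glob("*/.git")):
        repo = git_dir.parent
        porcelain = git(repo, "status", "--porcelain")
        try:
            # Left count = commits only on upstream (behind); right = only on HEAD (ahead).
            counts = git(repo, "rev-list", "--left-right", "--count",
                         "@{upstream}...HEAD")
            behind, ahead = (int(n) for n in counts.split())
        except subprocess.CalledProcessError:
            # No upstream configured for the current branch.
            report[repo.name] = "dirty" if porcelain.strip() else "no-upstream"
            continue
        report[repo.name] = classify(porcelain, behind, ahead)
    return report
```

Calling `sweep(Path.home() / "code" / "mygh")` would produce the "list dirty" part of the routine; the rebase+push step is deliberately left out so the sketch cannot modify any tree.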

5. Prioritized Action Plan

P0 — now (correctness / risk)

1. Re-establish a working CI gate on learning_ai_common_plat (everything depends on it). Either fix GH Actions billing or make gitea CI the enforced gate. (M, common_plat)
2. Resolve the ~14 dirty repos: review + commit or discard intentionally; add upstreams for pytorch_todo_predictor & sidecar_setup. (M, workspace)
3. Decide the agent-queue daemon policy so it doesn't write to live trees uncontrolled (it was running in dangerous mode). (S, devops_tools)

P1 — this week

4. Add healthchecks to docker-compose.ecosystem.yml (0 today) + ordered depends_on. (M, common_plat/ecosystem)
5. Stop masking E2E with continue-on-error: true once stabilized; make at least smoke E2E gating. (M, fastgap/flowmonk/jarvis_jr)
6. Replace print() with logging in 2nd_brain (60+) and mac_tooling (200+). (S–M)

P2 — this month

7. Add minimum test suites to the 0-test repos that matter (productivity_web, webui_copilot, agent_monitoring_fx). (M)
8. Audit magic_clipboard_mgr for dead/stubbed services (50+ files). (M)
9. Split 3k-line files in mac_tooling. (M)
10. Remove NODE_TLS_REJECT_UNAUTHORIZED=0 from Docker; add rate-limiting to the Python prototypes. (S–M)

P3 — nice to have

11. Portfolio-wide coverage reporting + dependency audit (npm audit/pip-audit) in CI. (M)
12. A lightweight issue/release cadence for the 2–3 flagships. (S)

6. Safe Auto-Fix Candidates

(Low-risk; listed only — not applied. Each needs your approval.)

Ecosystem compose healthchecks — add healthcheck: to each backend/web service in docker-compose.ecosystem.yml. Safe: additive.
Add upstreams for learning_pytorch_todo_predictor and learning_sidecar_setup (git remote add origin … && git push -u). Safe once remote exists.
Lint rule to ban print() in learning_ai_2nd_brain (ruff T20) — flags only; you fix incrementally.
Drop NODE_TLS_REJECT_UNAUTHORIZED=0 from Docker envs where a real CA/host override is available. (Verify per service first.)
.gitignore audit for the few repos still tracking runtime artifacts (pattern already fixed in devops_tools this session).

7. Delegate-to-Agent Queue

Ready-to-paste briefs (each self-contained, one branch, clean-tree rule):

"Add healthchecks to ecosystem compose" — repo common_plat; read docker-compose.ecosystem.yml; add healthcheck + ordered depends_on to all *-backend/*-web services; docker compose config must pass; no app code changes.
"De-print() 2nd_brain" — repo learning_ai_2nd_brain; replace print() with typer.echo/logging in src/brain/**; keep behavior identical; run pytest.
"Bootstrap tests for webui_copilot" — repo learning_ai_webui_copilot; add pytest smoke tests for site_backend rules/policy engines + a copilot happy-path; wire a .github/gitea CI job.
"Service audit: magic_clipboard_mgr" — repo learning_ai_magic_clipboard_mgr; produce a report of which of the 50+ services are wired vs stubbed; no code changes.
"Stabilize E2E" — repos fastgap/flowmonk; make smoke E2E reliable, then remove continue-on-error: true for that job only.

8. Recommended Standard Operating Procedure (for every agent task)

One task = one branch off latest origin/main; never work on a dirty tree.
Scope it with a job brief (you already do this in agent-queue/docs/jobs/).
Test before commit: typecheck + lint + unit must pass locally.
Commit small, conventional messages; push the branch, open a PR — don't let agents push straight to main of the shared platform.
Never track runtime/queue state (ignore agent-queue/queue/* lifecycle — fixed here this session).
Prefer least-privilege over --permission-mode dangerous; reserve dangerous mode for sandboxed/disposable checkouts.
Weekly sync sweep: rebase+push all clean repos, list dirty ones for review.
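
The ordering in this procedure (clean tree, then checks, then commit and push) can be expressed as a small fail-fast runner. This is only a sketch of the ordering idea: `run_contract` and the step names are hypothetical, and each lambda stands in for a real command that an agent wrapper would run.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_contract(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; stop at the first failure so later steps never run."""
    completed = []
    for name, action in steps:
        if not action():
            return False, completed
        completed.append(name)
    return True, completed


steps = [
    ("clean-tree", lambda: True),
    ("typecheck", lambda: True),
    ("lint", lambda: True),
    ("unit", lambda: False),  # simulate a failing unit test run
    ("commit", lambda: True),
    ("push", lambda: True),
]

ok, completed = run_contract(steps)
print(ok, completed)  # False ['clean-tree', 'typecheck', 'lint']
```

Because the runner returns as soon as a step fails, a failing test run means the commit and push steps are never invoked, which is the property the "test before commit" rule is after.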

9. What I Could Not Inspect

No dynamic results. I did not run npm/pnpm install, builds, pytest, vitest, Playwright, cargo test, or docker compose up (those mutate trees / need services). Test counts and CI configs are evidence of intended coverage, not measured pass rates or coverage.
No live per-repo ahead/behind data from the read-only agents (they lacked shell access to git); branch/dirty facts come from the orchestrator's own checks and may have shifted as the agent-queue daemon ran.
One agent batch misfired: it reported 5 repos as "missing" (claude_code_setup, github_copilot, magic_terminal, notif_scanr, python_sandbox) due to a read-access issue; I re-scanned them directly — they exist (notably magic_terminal = Rust, notif_scanr = Swift).
Mobile/native runtime behavior (iOS/Android/KMP/Tauri) was not exercised, and secret values were not decrypted; only their presence/format was checked.
.env.ecosystem holds dev-only values; production secret management (Key Vault wiring) was inferred from references, not verified live.

TL;DR

Coherent beta-grade product ecosystem (~38 repos) — far beyond "learning".
Architecture & security are strong; CI & testing are the weak links.
P0: restore a CI gate on common_plat, clean the ~14 dirty repos, and rein in the dangerous-mode agent-queue.
A handful of flagships (notes, trails, claw-*, clock, fastgap) are production-grade or close to it (BETA to PROD in §3); the long tail is MVP/prototype.
Tighten the agent commit/CI loop (§8) and most of the operational churn converts back into velocity.
