bytelyst-devops-tools/ENGINEERING_REVIEW_SCORECARD.md
saravanakumardb1 32162312a9 docs: add workspace engineering review & scorecard
Read-only, evidence-based review of the ~38-repo workspace produced from
docs/prompts/engineering-review-scorecard.md: per-repo breakdown, 1-10
category scorecard (weighted overall ~7.0, beta-grade), prioritized P0-P3
action plan, safe auto-fix candidates, delegate-to-agent queue, and an
agent SOP. No code changes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 20:53:44 -07:00


Engineering Review & Scorecard

Evidence-based, read-only review of the entire ~/code/mygh workspace (~38 git repos) per docs/prompts/engineering-review-scorecard.md. Generated 2026-05-30.

Method: static inspection only — file reads, grep, and read-only git. No builds, installs, or test runs were executed (that would mutate the trees), so dynamic results (pass/fail, coverage %) are inferred from config + test counts, not measured. See §9 for limits. Per-repo evidence was gathered by parallel read-only agents and spot-verified.


1. Executive Summary

What this is: a single developer running a surprisingly coherent product ecosystem — ~10 product apps (clock, notes, fastgap, peakpulse, flowmonk, efforise, jarvis_jr, trails, talk2obsidian, local-memory-gpt, voice-ai-agent, multimodal/mindlyst) sharing one platform monorepo (learning_ai_common_plat, 36 @bytelyst/* packages, auth/Cosmos/design-tokens), orchestrated by a single docker-compose.ecosystem.yml (~20 services) and driven heavily by AI agents through a homegrown agent-queue. This is far more disciplined than a typical "learning" folder.

Overall maturity: Beta-quality ecosystem. A core of genuinely production-grade repos (learning_ai_notes, learning_ai_trails, oss/claw-code/claw-cowork, learning_ai_clock, learning_ai_fastgap) surrounded by a long tail of MVP/prototype repos with thin or zero tests and no CI.

Biggest strengths (top 3)

  1. Strong platform discipline. Shared @bytelyst/* packages, a repeated types.ts → repository.ts → routes.ts backend pattern, Cosmos partition-key conventions (/userId, productId on every doc), per-repo AGENTS.md, conventional commits, and field-level encryption (field-encrypt.ts) recur across the best repos.
  2. Clean security posture for a personal workspace. Secret scans across all repos surfaced no real committed production secrets — only .env.example placeholders, the public Azure Cosmos emulator key, dev JWT_SECRET=dev-... values, and Azure Key Vault references. .gitignore is present nearly everywhere.
  3. Top repos are legitimately good. notes, trails, and the two Rust claw-* repos show modular architecture, real test suites (28-80+ files), CI, multi-stage Docker, and strict typing (0 as any in several backends).

Biggest risks (top 3)

  1. CI is the weak link. GitHub Actions is disabled (billing) on the platform monorepo learning_ai_common_plat and on voice_ai_agent (*.disabled workflows); ~15 repos have no CI at all. The shared platform that everything depends on has no automated gate.
  2. Process churn dirties the repos. A live agent-queue daemon + devin agents in --permission-mode dangerous were actively writing to repos; ~14 repos were found dirty with uncommitted work, several behind origin. Work is at risk of being lost or silently diverging.
  3. Testing is bimodal. Excellent in the flagship repos, zero in many others (productivity_web, webui_copilot, pytorch_todo_predictor, server-survival, sidecar_setup, mac_tooling). No portfolio-wide coverage signal.

Is the dev style helping or hurting velocity? Net helping, but fraying at the edges. The platform/agent approach clearly lets one person ship a dozen apps — that's the upside. The drag is operational: disabled CI, constantly-dirty working trees, abandoned worktrees, and "AI-generated scaffolding smell" in a few repos (e.g. magic_clipboard_mgr's 50+ service files + phase-named test buckets). Tightening the commit/CI loop would convert a lot of that churn back into velocity.


2. Overall Score Sheet

Scores are 1-10 (1 = critical/broken, 10 = production-grade), aggregated across the ~30 code repos (pure docs/usage repos excluded from category math).

| Category | Score | Justification (evidence) |
| --- | --- | --- |
| A. Repository organization | 8 | Consistent @bytelyst/* + types/repository/routes pattern, per-repo AGENTS.md, clear monorepos; minus for ~14 dirty trees, stray worktrees, a few unstructured repos. |
| B. Code quality | 7 | Flagships: strict TS, 0 as any, no console.log, Zod validation. Tail: print()-heavy (2nd_brain 60+, mac_tooling 200+), any leaks, AI-scaffold smell (magic_clipboard_mgr). |
| C. Architecture | 8 | Genuinely strong: shared platform, datastore abstraction, deterministic engines (flowmonk scheduler), risk-scoring (trails), MCP integrations, clean native/web boundaries. |
| D. DevOps & deployment | 6 | Ecosystem compose orchestrates ~20 services, multi-stage Dockerfiles common — but CI disabled on the platform repo, ~15 repos with no CI, and 0 healthchecks in docker-compose.ecosystem.yml. |
| E. Testing | 6 | Bimodal: notes/fastgap/clock/trails/claw-* have 28-600+ tests; many repos have 0. E2E frequently continue-on-error: true. No measured coverage. |
| F. Security | 8 | No real committed secrets anywhere; field encryption + Key Vault refs in the mature repos; .gitignore/.env.example discipline. Minus for NODE_TLS_REJECT_UNAUTHORIZED=0 in some Docker, thin input-validation in prototypes. |
| G. Product readiness | 7 | Several apps runnable end-to-end (web+backend); mobile/native surfaces often partial; CI-disabled + flaky E2E hold back true "launchable". |
| H. AI-agent practices | 6 | Impressive tooling (agent-queue, profiles, job briefs, AGENTS.md), but guardrails are weak: --permission-mode dangerous, agents dirtying live repos, duplicate work landing upstream, no enforced test-before-commit. |
| I. Personal workflow | 6 | Good: conventional commits, auto backup-main-* branches, AGENTS.md. Bad: ~14 dirty repos, branches behind origin, abandoned worktrees, no unified release/issue discipline. |
| Weighted overall | ≈ 7.0 | Beta-quality. See weighting below. |

Weighting & rationale: Security (F) and Product readiness (G) weighted ~1.5×, Testing (E) and DevOps (D) ~1.25× (these gate real-world reliability); A/B/C/H/I at 1.0×. The strong architecture/security pull the number up; the weak CI/testing pull it back to a solid-but-not-shippable ~7.0.
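
The weighting above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the approximate multipliers in the text are taken as exact (1.5 for F and G, 1.25 for D and E, 1.0 otherwise); the names `SCORES`, `WEIGHTS`, and `weighted_overall` are illustrative and not part of any tool in the workspace.

```python
# Category scores copied from the score sheet (A-I).
SCORES = {"A": 8, "B": 7, "C": 8, "D": 6, "E": 6, "F": 8, "G": 7, "H": 6, "I": 6}

# Assumption: the "~1.5x" and "~1.25x" multipliers are treated as exact values.
WEIGHTS = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.25, "E": 1.25,
           "F": 1.5, "G": 1.5, "H": 1.0, "I": 1.0}


def weighted_overall(scores, weights):
    """Return the weighted mean of the category scores."""
    total_weight = sum(weights[key] for key in scores)
    weighted_sum = sum(scores[key] * weights[key] for key in scores)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Weighted sum is 72.5 over a total weight of 10.5.
    print(f"{weighted_overall(SCORES, WEIGHTS):.2f}")
```

Under these assumed weights the result is about 6.9, which is consistent with the rounded ≈ 7.0 reported in the score sheet.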


3. Per-Product / Per-Repo Breakdown

Maturity legend: PROD = production-grade, BETA, MVP, PROTO = prototype/learning, REF = docs/reference (not code).

Flagship products (platform-integrated)

| Repo | Stack | Tests | CI / Docker | Maturity |
| --- | --- | --- | --- | --- |
| learning_ai_notes | Fastify5 + Next16 + Expo, Cosmos | 80+ files | ✓ gitea | BETA→PROD |
| learning_ai_trails | Fastify5 + Next16 + SDK, Cosmos | 28 files | ✓ gitea | PROD |
| learning_ai_clock | Next16 PWA + iOS/Android, Fastify | 662 total | ✓ gitea | BETA |
| learning_ai_fastgap | Expo + Next16 + Fastify | 700+ total | ✓ gitea (7 jobs) | BETA |
| learning_ai_peakpulse | SwiftUI + Fastify | 26 files | ✓ (backend) | BETA→PROD |
| learning_ai_flowmonk | Next16 + Fastify + Expo | 102 backend | ✓ gitea | BETA |
| learning_ai_efforise | React/Vite + Fastify + RN | ~9 backend | ✓ gitea | MVP |
| learning_ai_dev_intelli | Fastify + Next16, GitHub API | 52 backend | ✓ gitea | MVP |
| learning_ai_local_memory_gpt | Fastify + Next16, SQLite/Ollama | 122 | ✓ gitea | MVP |
| learning_ai_talk2obsidian | Fastify + Vite, SQLite/Ollama | 8 | | BETA |
| learning_voice_ai_agent | Python + Fastify + Next + KMP | 463+ | ⚠ disabled | BETA |
| learning_multimodal_memory_agents (MindLyst) | KMP + Next + Fastify | 33 | ⚠ disabled | MVP |
| learning_ai_jarvis_jr | SwiftUI + Next + Android | ~13 web | ✓ gitea | ALPHA/BETA |
| learning_ai_auth_app | iOS/watchOS/Android (spec+UI) | 0 (here) | | MVP (spec) |

Platform & infra

| Repo | Stack | Notes | Maturity |
| --- | --- | --- | --- |
| learning_ai_common_plat | pnpm monorepo, 36 @bytelyst/*, Fastify, Cosmos | ~466k LOC; full auth (OAuth/MFA/passkeys/SAML); GH Actions disabled (billing), gitea CI active | PROD |
| learning_ai_devops_tools | Bash + Python + Node (this repo) | GitHub admin scripts, agent-queue, Hermes dashboard; thin tests | PROD (scripts) / MVP (dash) |
| learning_ai_k8s_streaming | Python FastAPI + Helm | Use-case registry, HPA/probes, load tools | BETA→PROD |
| learning_ai_local_llms | Next16 dashboard + Python TTS | Ollama mission-control; 57 tests | BETA |

Tools / OSS / native

| Repo | Stack | Notes | Maturity |
| --- | --- | --- | --- |
| oss/learning_ai_claw-code-oss | Rust workspace (10+ crates) | unsafe forbid, clippy pedantic, 40+ test files | PROD |
| oss/learning_ai_claw-cowork | Rust + Tauri + Python | 65+ test files, E2E, Docker | PROD |
| learning_magic_terminal | Rust | README+CI+many tests; command-blocks v2; dirty(5) | BETA |
| learning_notif_scanr | Swift (Package.swift) | tests present, no CI, no Docker | MVP |
| ios/learning_swift_hourglass | Swift/SwiftUI macOS | MVVM, 2 test files, no CI | MVP |
| learning_ai_magic_clipboard_mgr | Swift/macOS, GRDB | 24 tests but 50+ services + phase-named tests (AI-scaffold smell) | MVP |
| learning_ai_mac_tooling | Python FastAPI + React | forensics toolkit; 0 tests, 200+ print(), 3k-line files | PROTO |
| copilot/learning_ai_uxui_web | Next16 + MSW + Playwright | component showcase, Lighthouse CI | MVP |
| learning_ai_productivity_web | Next15, client-only | clean registry pattern, 0 tests | MVP |
| learning_ai_webui_copilot | Python FastAPI + LangChain | rules/policy engines, 0 tests, no Docker/CI | MVP |
| learning_agent_monitoring_fx | npm monorepo + KMP | agent/ingest/web work, native WIP, 54 console.log, TODOs | BETA |
| learning_agentic_tools_portal | Python Flask + uv | minimal (1 endpoint, 1 test), has CI | PROTO |
| learning_server-survival-devops-web | Vanilla JS + Three.js | playable game, 0 tests | MVP |
| learning_pytorch_todo_predictor | Python + PyTorch | educational, 0 tests, no upstream | PROTO |
| learning_sidecar_setup | Next16 scaffold + py stub | scaffolding only, no upstream, dirty(8) | PROTO |
| learning_claude_code_setup | Bash + markdown | setup notes/scripts; dirty(1) | REF |
| learning_github_copilot | Markdown (CLI/SDK docs) | reference only | REF |
| learning_python_sandbox | Python | LeetCode/learning; dirty(1) | PROTO |
| learning_ai_materials | Docs | NBA handover package | REF |
| learning_windsurf_setup | Usage logs | not a codebase | N/A |

4. Findings by Dimension

A. Repository organization

  • Fact: Strong, repeated conventions — AGENTS.md/CLAUDE.md per repo, pnpm workspaces, types→repository→routes backend modules, docs/ with PRD/ROADMAP.
  • Fact: ~14 repos dirty at audit time; abandoned worktrees/ (now cleaned); some repos behind origin. Two repos (pytorch_todo_predictor, sidecar_setup) have no git upstream.
  • Reco: Adopt a "clean tree or it doesn't exist" rule (see §8). Add upstreams for the two orphan repos or mark them clearly local.

B. Code quality

  • Fact: Best repos enforce strict TS (0 as any in notes, trails, local_memory_gpt backends), no console.log (Fastify logger), Zod validation.
  • Fact: learning_ai_2nd_brain has 60+ print(); mac_tooling 200+ and 3k+-line files (network_transfer_audit.py 3521 lines); magic_clipboard_mgr shows AI-scaffold smell (50+ service files, Phase58/RemainingQATests).
  • Reco: Lint-gate print()/console.log in the Python/TS repos; split the 3k-line files; audit magic_clipboard_mgr for stubbed vs real services.
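
As a rough way to size the print() cleanup before wiring a real lint rule, a standard-library scan can count the calls per file. This is only a sketch: `count_print_calls` is a hypothetical helper, not something found in the reviewed repos, and it only sees direct `print(...)` calls, not aliases.

```python
import ast


def count_print_calls(source):
    """Count direct calls to the built-in print() in Python source text."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        # Only bare-name calls are matched; `p = print; p("x")` is not detected.
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


if __name__ == "__main__":
    sample = "import logging\nprint('a')\nlogging.info('b')\nprint('c')\n"
    print(count_print_calls(sample))  # prints 2
```

A dedicated linter rule remains the better gate in CI; a counter like this is mainly useful for tracking the number down over time.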

C. Architecture

  • Fact: Clear separation and reuse: shared auth/datastore/design-tokens, deterministic scheduler (flowmonk), risk engine (trails), use-case registry (k8s_streaming), MCP tool servers, Rust crate boundaries (claw-*).
  • Reco: This is the strongest dimension — protect it by keeping product domains out of common_plat and vice-versa.

D. DevOps & deployment

  • Fact: docker-compose.ecosystem.yml wires ~20 services (10 backends + 10 webs) + infra (Cosmos emulator, Azurite, Traefik, Loki, Grafana, MCP); 30 restart: policies, 24 build: contexts, but 0 healthcheck: blocks.
  • Fact: GH Actions disabled on common_plat + voice_ai_agent; ~15 repos no CI.
  • Reco (P1): Add healthchecks + depends_on: condition: service_healthy to the ecosystem compose; re-enable or fully migrate CI to gitea self-hosted.
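
The healthcheck gap can be audited mechanically once the compose file is parsed. The sketch below assumes the YAML has already been loaded into a plain dict (for example with a YAML library, which is not shown); the function names and the service names in the example are hypothetical.

```python
def services_missing_healthcheck(compose):
    """Return sorted names of services that define no healthcheck block."""
    services = compose.get("services") or {}
    return sorted(
        name for name, spec in services.items()
        if not (spec or {}).get("healthcheck")
    )


def ungated_dependencies(compose):
    """Return (service, dependency) pairs not using condition: service_healthy."""
    pairs = []
    for name, spec in (compose.get("services") or {}).items():
        depends = (spec or {}).get("depends_on") or {}
        if isinstance(depends, list):
            # The short list syntax cannot carry a condition at all.
            pairs.extend((name, dep) for dep in depends)
            continue
        for dep, opts in depends.items():
            if (opts or {}).get("condition") != "service_healthy":
                pairs.append((name, dep))
    return sorted(pairs)


# Hypothetical fragment shaped like a parsed compose file.
EXAMPLE = {
    "services": {
        "cosmos": {"image": "emulator", "healthcheck": {"test": ["CMD", "true"]}},
        "notes-backend": {"build": "./notes", "depends_on": ["cosmos"]},
        "notes-web": {
            "depends_on": {"notes-backend": {"condition": "service_healthy"}},
        },
    }
}

if __name__ == "__main__":
    print(services_missing_healthcheck(EXAMPLE))
    print(ungated_dependencies(EXAMPLE))
```

Running both checks together matters because `service_healthy` only has an effect when the dependency itself defines a healthcheck.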

E. Testing

  • Fact: fastgap (~700), clock (662), notes (80+ files), voice_ai_agent (463+), claw-cowork (65+ files) are excellent; ~8 repos have 0 tests.
  • Fact: E2E often continue-on-error: true (fastgap, flowmonk, jarvis_jr, local_memory_gpt) — i.e. not actually gating.
  • Reco: Set a per-repo minimum (smoke + happy-path) and stop masking E2E failures with continue-on-error once stabilized.
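
Finding the masked E2E jobs is a simple text scan over the workflow files. The following is a minimal sketch that works line by line rather than parsing YAML, so it reports where the flag is set but not which job owns it; `masked_lines` is an illustrative name.

```python
import re

# Matches a YAML line such as "continue-on-error: true" (optionally quoted,
# optionally followed by a comment). Expressions like ${{ ... }} are ignored.
_MASKED = re.compile(
    r"^\s*(?:-\s*)?continue-on-error:\s*['\"]?true['\"]?\s*(?:#.*)?$",
    re.IGNORECASE,
)


def masked_lines(workflow_text):
    """Return 1-based line numbers where continue-on-error is set to true."""
    return [
        lineno
        for lineno, line in enumerate(workflow_text.splitlines(), start=1)
        if _MASKED.match(line)
    ]


if __name__ == "__main__":
    sample = (
        "jobs:\n"
        "  e2e:\n"
        "    continue-on-error: true\n"
        "    steps:\n"
        "      - run: pnpm e2e\n"
        "        continue-on-error: false\n"
    )
    print(masked_lines(sample))  # prints [3]
```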

F. Security

  • Fact: No real committed secrets across all repos. Matches were .env.example placeholders, the public Cosmos emulator key (C2y6yDjf5/R...), dev-* JWT secrets, and Azure Key Vault references.
  • Fact: Field encryption (AES-256-GCM) in clock/notes/dev_intelli; unsafe_code = "forbid" in the Rust repos.
  • Watch: NODE_TLS_REJECT_UNAUTHORIZED=0 seen in some Docker setups; thin input validation / no rate-limiting in the prototype Python apps.
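
The TLS item on the watch list lends itself to a quick grep-style check over Dockerfiles, compose files, and env files. This sketch assumes the setting appears on a single line in one of the common forms (`KEY=0`, `KEY: "0"`, `ENV KEY 0`); the helper name is hypothetical.

```python
import re

# Matches NODE_TLS_REJECT_UNAUTHORIZED set to 0 at the end of a line,
# allowing "=", ":" or whitespace as the separator and optional quotes.
_TLS_OFF = re.compile(
    r"NODE_TLS_REJECT_UNAUTHORIZED\s*[=:\s]\s*['\"]?0['\"]?\s*(?:#.*)?$"
)


def tls_verification_disabled(text):
    """Return 1-based line numbers that disable Node TLS verification."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        # Skip lines that are entirely commented out.
        if not line.lstrip().startswith("#") and _TLS_OFF.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "ENV NODE_TLS_REJECT_UNAUTHORIZED=0\n"
        "ENV NODE_ENV=production\n"
        "      NODE_TLS_REJECT_UNAUTHORIZED: \"0\"\n"
        "# NODE_TLS_REJECT_UNAUTHORIZED=0\n"
        "NODE_TLS_REJECT_UNAUTHORIZED=1\n"
    )
    print(tls_verification_disabled(sample))  # prints [1, 3]
```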

G. Product readiness

  • Fact: Web+backend pairs generally run end-to-end; native/mobile surfaces (iOS/Android/KMP) are frequently partial or scaffolded.
  • Reco: Pick 2-3 flagships (notes, trails, clock) and drive them to a true launch checklist; treat the rest explicitly as experiments.

H. AI-agent practices

  • Fact: Sophisticated agent-queue (profiles, job briefs, lifecycle dirs, Node dashboard) — genuinely advanced for a solo setup.
  • Fact: Guardrails weak: agents run --permission-mode dangerous, write to live working trees (caused the dirty-repo churn), and landed duplicate work (during this session a rebase auto-dropped 2 commits already pushed upstream).
  • Reco: Standardize the agent task contract (§8): one task = one branch = clean tree → tests → commit → push; ignore runtime/queue state in git (already fixed in this repo this session).

I. Personal engineering workflow

  • Fact: Conventional commits, auto backup-main-* branches (nice safety net), AGENTS.md discipline.
  • Fact: Too many long-lived dirty trees and behind-origin branches; no visible issue tracker or release cadence.
  • Reco: A weekly "sync sweep" (rebase+push all clean repos, list dirty) — you effectively did this manually this session; automate it.
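
The reporting half of a sync sweep can stay read-only. The sketch below classifies each repo from the output of `git status --porcelain` and `git rev-list --left-right --count HEAD...@{upstream}`; it assumes repos sit directly under one root directory (nested ones such as oss/ would need a deeper glob) and it does not fetch, so "behind" is relative to the last fetch. Function names are illustrative.

```python
import subprocess
from pathlib import Path
from typing import Optional


def classify(porcelain: str, ahead_behind: Optional[str]) -> str:
    """Classify a repo from porcelain status and an 'ahead behind' count pair."""
    if porcelain.strip():
        return "dirty"
    if ahead_behind is None:
        return "no-upstream"
    ahead, behind = (int(n) for n in ahead_behind.split())
    if ahead and behind:
        return "diverged"
    if behind:
        return "behind"
    if ahead:
        return "ahead"
    return "in-sync"


def _git(repo: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a git command inside repo without raising on failure."""
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, check=False)


def sweep(root: Path) -> dict:
    """Read-only report mapping repo name to state for direct child repos."""
    report = {}
    for git_entry in sorted(root.glob("*/.git")):
        repo = git_entry.parent
        status = _git(repo, "status", "--porcelain").stdout
        counts = _git(repo, "rev-list", "--left-right", "--count",
                      "HEAD...@{upstream}")
        # A non-zero exit here usually means no upstream is configured.
        upstream = counts.stdout if counts.returncode == 0 else None
        report[repo.name] = classify(status, upstream)
    return report
```

Calling `sweep(Path.home() / "code" / "mygh")` would produce the dirty/behind list; any rebase or push step is deliberately left out so the report itself cannot alter a tree.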

5. Prioritized Action Plan

P0 — now (correctness / risk)

  1. Re-establish a working CI gate on learning_ai_common_plat (everything depends on it). Either fix GH Actions billing or make gitea CI the enforced gate. (M, common_plat)
  2. Resolve the ~14 dirty repos: review + commit or discard intentionally; add upstreams for pytorch_todo_predictor & sidecar_setup. (M, workspace)
  3. Decide the agent-queue daemon policy so it doesn't write to live trees uncontrolled (it was running in dangerous mode). (S, devops_tools)

P1 — this week

  4. Add healthchecks to docker-compose.ecosystem.yml (0 today) + ordered depends_on. (M, common_plat/ecosystem)
  5. Stop masking E2E with continue-on-error: true once stabilized; make at least smoke E2E gating. (M, fastgap/flowmonk/jarvis_jr)
  6. Replace print() with logging in 2nd_brain (60+) and mac_tooling (200+). (S-M)

P2 — this month

  7. Add minimum test suites to the 0-test repos that matter (productivity_web, webui_copilot, agent_monitoring_fx). (M)
  8. Audit magic_clipboard_mgr for dead/stubbed services (50+ files). (M)
  9. Split 3k-line files in mac_tooling. (M)
  10. Remove NODE_TLS_REJECT_UNAUTHORIZED=0 from Docker; add rate-limiting to the Python prototypes. (S-M)

P3 — nice to have

  11. Portfolio-wide coverage reporting + dependency audit (npm audit/pip-audit) in CI. (M)
  12. A lightweight issue/release cadence for the 2-3 flagships. (S)


6. Safe Auto-Fix Candidates

(Low-risk; listed only — not applied. Each needs your approval.)

  • Ecosystem compose healthchecks — add healthcheck: to each backend/web service in docker-compose.ecosystem.yml. Safe: additive.
  • Add upstreams for learning_pytorch_todo_predictor and learning_sidecar_setup (git remote add origin … && git push -u). Safe once remote exists.
  • Lint rule to ban print() in learning_ai_2nd_brain (ruff T20) — flags only; you fix incrementally.
  • Drop NODE_TLS_REJECT_UNAUTHORIZED=0 from Docker envs where a real CA/host override is available. (Verify per service first.)
  • .gitignore audit for the few repos still tracking runtime artifacts (pattern already fixed in devops_tools this session).

7. Delegate-to-Agent Queue

Ready-to-paste briefs (each self-contained, one branch, clean-tree rule):

  1. "Add healthchecks to ecosystem compose" — repo common_plat; read docker-compose.ecosystem.yml; add healthcheck + ordered depends_on to all *-backend/*-web services; docker compose config must pass; no app code changes.
  2. "De-print() 2nd_brain" — repo learning_ai_2nd_brain; replace print() with typer.echo/logging in src/brain/**; keep behavior identical; run pytest.
  3. "Bootstrap tests for webui_copilot" — repo learning_ai_webui_copilot; add pytest smoke tests for site_backend rules/policy engines + a copilot happy-path; wire a .github/gitea CI job.
  4. "Service audit: magic_clipboard_mgr" — repo learning_ai_magic_clipboard_mgr; produce a report of which of the 50+ services are wired vs stubbed; no code changes.
  5. "Stabilize E2E" — repos fastgap/flowmonk; make smoke E2E reliable, then remove continue-on-error: true for that job only.

8. Agent SOP

  1. One task = one branch off latest origin/main; never work on a dirty tree.
  2. Scope it with a job brief (you already do this in agent-queue/docs/jobs/).
  3. Test before commit: typecheck + lint + unit must pass locally.
  4. Commit small, conventional messages; push the branch, open a PR — don't let agents push straight to main of the shared platform.
  5. Never track runtime/queue state (ignore agent-queue/queue/* lifecycle — fixed here this session).
  6. Prefer least-privilege over --permission-mode dangerous; reserve dangerous mode for sandboxed/disposable checkouts.
  7. Weekly sync sweep: rebase+push all clean repos, list dirty ones for review.
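
One piece of this checklist that is easy to enforce automatically is the conventional commit subject. The sketch below is a minimal validator, assuming a commonly used set of commit types; the exact allowed list and the `is_conventional` name are assumptions, not a policy taken from the workspace.

```python
import re

# Assumption: a typical set of Conventional Commits types.
_TYPES = "build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test"

# type, optional (scope), optional "!" for breaking changes, then ": subject".
_HEADER = re.compile(rf"^(?:{_TYPES})(?:\([\w./-]+\))?!?: \S.*$")


def is_conventional(subject):
    """Return True if a commit subject line follows the conventional format."""
    return bool(_HEADER.match(subject))


if __name__ == "__main__":
    for line in (
        "docs: add workspace engineering review & scorecard",
        "feat(agent-queue)!: require sandboxed checkouts",
        "Update stuff",
    ):
        print(is_conventional(line), line)
```

A check like this could run in a commit-msg hook or as the first step of the agent task contract, before typecheck, lint, and unit tests.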

9. What I Could Not Inspect

  • No dynamic results. I did not run npm/pnpm install, builds, pytest, vitest, Playwright, cargo test, or docker compose up (those mutate trees / need services). Test counts and CI configs are evidence of intended coverage, not measured pass/coverage.
  • No live git per-repo ahead/behind inside the read-only agents (they lacked shell git); branch/dirty facts come from the orchestrator's own checks and may have shifted as the agent-queue daemon ran.
  • One agent batch misfired: it reported 5 repos as "missing" (claude_code_setup, github_copilot, magic_terminal, notif_scanr, python_sandbox) due to a read-access issue; I re-scanned them directly — they exist (notably magic_terminal = Rust, notif_scanr = Swift).
  • Mobile/native depth (iOS/Android/KMP/Tauri runtime behavior) and secret values were not executed/decrypted — only presence/format was checked.
  • .env.ecosystem holds dev-only values; production secret management (Key Vault wiring) was inferred from references, not verified live.

TL;DR

  • Coherent beta-grade product ecosystem (~38 repos) — far beyond "learning".
  • Architecture & security are strong; CI & testing are the weak links.
  • P0: restore a CI gate on common_plat, clean the ~14 dirty repos, and rein in the dangerous-mode agent-queue.
  • A handful of flagships (notes, trails, claw-*, clock, fastgap) are genuinely production-grade; the long tail is MVP/prototype.
  • Tighten the agent commit/CI loop (§8) and most of the operational churn converts back into velocity.