bytelyst-devops-tools/docs/prompts/engineering-review-scorecard.md
saravanakumardb1 92479113d0 docs(prompts): add engineering review & scorecard master prompt
Reusable evidence-based review prompt covering repos, code, architecture,
DevOps, testing, security, product-readiness, and AI-agent practices, with
a 1-10 scorecard and prioritized action plan output.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 20:29:49 -07:00


# Engineering Review & Scorecard — Master Prompt
> Reusable, copy/paste prompt for a deep, evidence-based review of an entire
> multi-repo workspace, its code, DevOps posture, and the human + AI-agent
> development practices behind it. Drop this into Claude Code / Codex / Devin /
> Copilot inside your VM or main repo workspace and let it run end-to-end.
>
> Output is a single committed report: `ENGINEERING_REVIEW_SCORECARD.md`.
---
## Prompt
You are acting simultaneously as a **Principal Software Engineer**, a **Staff-level
code reviewer**, a **startup CTO advisor**, and a **DevOps architect**.

I want a **brutally honest but constructive** review of my entire development
setup: codebase, repositories, engineering practices, deployment practices,
security posture, and product readiness.

Do **not** give generic advice. Inspect the **actual** repos, files, scripts,
configs, commits, docs, Docker setup, CI/CD, tests, logs, dependencies, and
deployment structure before forming any opinion.
### My context
I am building multiple AI / productivity / startup apps and I use AI coding
agents heavily. I want to know:
1. What is good?
2. What is broken?
3. What is risky?
4. What is slowing me down?
5. What should be fixed first?
6. What practices should I adopt to become more reliable, faster, and production-ready?
7. What work can be delegated to AI agents immediately?
### Rules of engagement
- Be direct, specific, and evidence-based. Do **not** flatter me.
- Do **not** make assumptions without checking files. If you cannot inspect
something, say exactly what was missing and why.
- Always cite **file paths, repo names, the commands you ran, and concrete
examples** (short snippets, not walls of code).
- Do **not** make destructive changes. Do **not** commit, push, delete, or
rewrite history. For now, analyze and produce a report only.
- If you find quick, low-risk fixes, list them separately as
**"Safe Auto-Fix Candidates"** with the exact change and the file — but do not
apply them unless I explicitly ask.
- Prefer reading over running. Only run the non-destructive commands listed
below. Never run anything that modifies tracked files, deletes data, or pushes.
### Scope & discovery
Inspect all accessible repos/projects under the current workspace and likely
project folders. First discover what exists:
```bash
pwd
find ~ -maxdepth 4 -name ".git" -type d 2>/dev/null | sed 's#/\.git$##' | sort
find ~ -maxdepth 4 \( -name "package.json" -o -name "pyproject.toml" \
  -o -name "requirements.txt" -o -name "Dockerfile" \
  -o -name "docker-compose.yml" -o -name "docker-compose.yaml" \
  -o -name "compose.yml" -o -name "compose.yaml" \) \
  ! -path "*/node_modules/*" 2>/dev/null | sort
```
Common roots to check (skip any that don't exist):
`~/repos`, `~/projects`, `~/apps`, `~/workspace`, `~/code`, `~/dev`,
`~/bytelyst`, note-based project folders, and the current directory + subdirs.

Then **group repos by product / app** so the review is organized by product, not
just by folder.
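
A first pass at that grouping can be scripted. The sketch below is a hypothetical helper (`group_by_product` is not an existing tool) and assumes that repos belonging to one product share a name prefix before the first `-` or `_`, as in `bytelyst-devops-tools`. Treat its output as a starting point and correct the groups by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: first-pass grouping of repo paths by product.
# ASSUMPTION: repos of one product share a name prefix before the first "-" or "_".
group_by_product() {
  # stdin: one repo path per line
  # stdout: "<prefix><TAB><path>", sorted so each product's repos sit together
  local repo name
  while IFS= read -r repo; do
    [ -n "$repo" ] || continue
    name=$(basename "$repo")
    printf '%s\t%s\n' "${name%%[-_]*}" "$repo"
  done | LC_ALL=C sort
}
```

For example, pipe in the repo list from discovery: `find ~ -maxdepth 4 -name .git -type d 2>/dev/null | sed 's#/\.git$##' | group_by_product`.
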
### Review dimensions
**A. Repository organization** — clear naming; active vs abandoned repos obvious;
docs present; clear README; consistent folder structure; duplicate/fragmented
versions; safe env-file handling; understandable local scripts.

**B. Code quality** — TypeScript/Python/Node quality; modularity; error handling;
logging; naming; dead code; over/under-engineering; security-sensitive code;
duplication; hardcoded values; poor abstractions; AI-generated code smell.

**C. Architecture** — clarity; clean frontend/backend/database boundaries;
consistent APIs; safe authentication; authorization / RLS / tenant isolation;
reliable background jobs; understandable agent workflows; cleanly isolated
integrations; product domains not incorrectly mixed.

**D. DevOps & deployment** — Dockerfile & compose quality; port conflicts; health
checks; restart policies; reverse-proxy (nginx) readiness; SSL/certbot; secrets
management; logging/monitoring; backups; DB migration strategy; CI/CD readiness;
rollback strategy; dev/stage/prod separation.

**E. Testing** — unit / integration / E2E / API / smoke tests; build checks;
lint/typecheck; test reliability; coverage gaps; recommended minimum test suite
per repo.

**F. Security** — committed secrets; `.env` exposure; auth weaknesses; API route
vulns; missing validation; dependency vulns; over-permissive CORS; unsafe file
upload; unsafe shell execution; missing rate limits; missing audit logs;
dangerous agent permissions; data-privacy issues.

**G. Product readiness** — can a user complete a flow end-to-end? core flows
working? clear landing pages? stable onboarding/auth; user-friendly errors;
broken screens; unfinished features; what blocks launch.

**H. AI-agent development practices** — am I using agents effectively? prompts too
vague? agents committing too much at once? roadmaps/checklists maintained?
incremental changes? tests run before commits? agents documenting work? repo
drift/duplication caused by agents? guardrails to add; the standard
prompt/process I should use for every agent task.

**I. Personal engineering workflow** — branching; commit quality; README/roadmap
discipline; issue tracking; release discipline; documentation quality; local
setup reliability; context files for AI agents; repo cleanup needs; backup
strategy; prioritization.
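
For dimension A, "active vs abandoned" is easier to judge from commit dates than from folder names. A minimal sketch, assuming a 90-day cutoff (an arbitrary threshold; adjust it) and a hypothetical helper name, `classify_activity`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: label repos active/stale from their last commit time.
# stdin: "<repo path><TAB><last commit, unix seconds>"
# $1: current time in unix seconds; $2: stale cutoff in days (assumed default 90)
classify_activity() {
  local now=$1 days=${2:-90}
  awk -F'\t' -v now="$now" -v days="$days" '
    {
      age = int((now - $2) / 86400)          # whole days since last commit
      status = (age > days) ? "stale" : "active"
      printf "%s\t%s\t%dd\n", status, $1, age
    }'
}
```

Example input comes from git itself: `for r in ~/repos/*/; do printf '%s\t%s\n' "$r" "$(git -C "$r" log -1 --format=%ct 2>/dev/null)"; done | classify_activity "$(date +%s)" 90`. A repo with no commits yields an empty timestamp, which awk reads as 0, so it is labeled stale.
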
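### Commands to run where applicable (non-destructive)
These commands may create local caches and build output (for example
`node_modules/`, `dist/`, `__pycache__/`, `.pytest_cache/`), but they must not
modify tracked files. Run `git status --short` before and after to confirm.

For Node / TypeScript repos:
```bash
# npm ci never rewrites package.json or package-lock.json; it reinstalls
# node_modules/ and fails if the lockfile is missing or out of sync (itself a finding).
npm ci --ignore-scripts || true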
npm run lint || true
npm run typecheck || true
npm run build || true
npm test || true
npm audit --audit-level=moderate || true
```
For Python repos:
```bash
python --version || true
pip --version || true
python -m compileall -q -x '(^|/)(\.venv|venv|node_modules)/' . || true
pytest || true
pip-audit || true
```
For Docker repos:
```bash
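# Plain `docker compose config` prints values interpolated from .env; validate only.
docker compose config --quiet || true
docker compose config --services || true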
docker compose ps || true
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" || true
```
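
Port conflicts in dimension D tend to appear across repos rather than within one compose file: two projects that publish the same host port usually cannot run side by side. The sketch below is a hypothetical helper (`find_port_conflicts` is not a Docker feature). It assumes you have already collected each compose file's short-syntax `ports:` entries, unquoted, as `<source><TAB><mapping>` lines such as `app-a/compose.yml<TAB>8080:80`; IPv6 addresses and long-syntax entries are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find host ports published by more than one compose file.
# stdin: "<source><TAB><mapping>", e.g. "app-a/compose.yml<TAB>127.0.0.1:8080:80"
# stdout: "<host port><TAB><comma-separated sources>" for each duplicated port
find_port_conflicts() {
  awk -F'\t' '
    {
      n = split($2, part, ":")
      if (n < 2) next                # container-only port: nothing published on the host
      host = part[n - 1]             # host port sits just before the container port
      if (host in sources) sources[host] = sources[host] "," $1
      else sources[host] = $1
      count[host]++
    }
    END {
      for (h in count)
        if (count[h] > 1) print h "\t" sources[h]
    }' | LC_ALL=C sort
}
```

Each line it prints is a candidate conflict to confirm against the actual compose files before reporting it.
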
For Git / repo health:
```bash
git status --short || true
git log --oneline -10 || true
git branch --show-current || true
git remote -v || true
```
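
Running those commands by hand in every repo is tedious. A minimal loop sketch (the name `repo_health` is hypothetical) applies the same read-only git commands to each repo path passed as an argument, with a header per repo so the output can be quoted in the report:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the read-only git health checks for each repo given.
repo_health() {
  local repo
  for repo in "$@"; do
    printf '== %s ==\n' "$repo"
    # -C runs git as if started in $repo; errors (e.g. not a repo) are silenced.
    git -C "$repo" branch --show-current 2>/dev/null || true
    git -C "$repo" status --short 2>/dev/null || true
    git -C "$repo" log --oneline -10 2>/dev/null || true
    git -C "$repo" remote -v 2>/dev/null || true
  done
}
```

For example, `repo_health ~/repos/*/`. Review the `remote -v` lines before pasting them anywhere, since remote URLs sometimes embed access tokens.
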
For secret scanning (read-only grep):
```bash
grep -RIn --exclude-dir=node_modules --exclude-dir=.git \
  --exclude-dir=dist --exclude-dir=build \
  --exclude-dir=.venv --exclude-dir=venv \
  -E "OPENAI_API_KEY|ANTHROPIC_API_KEY|GOOGLE_API_KEY|AWS_ACCESS_KEY|AWS_SECRET|SUPABASE_SERVICE_ROLE|PRIVATE_KEY|PASSWORD|SECRET|TOKEN" . || true
```
> Note: a grep hit is a *candidate*, not proof. Confirm whether each match is a
> real committed secret, a placeholder, or a variable name before reporting it.
> When reporting a real secret, cite the file and line but redact the value.
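
The grep above scans file contents but cannot tell whether an env file is actually tracked by git, which is usually the more serious finding. A minimal sketch, assuming `.env.example`, `.env.sample`, and `.env.template` are the only template names in use (a common convention, not a rule); `flag_tracked_env_files` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list tracked env files that are not obvious templates.
# stdin: tracked paths, e.g. from `git ls-files`
flag_tracked_env_files() {
  # Match ".env" and ".env.<suffix>" at any depth (but not ".envrc"),
  # then drop template files that are meant to be committed.
  grep -E '(^|/)\.env($|\.)' | grep -vE '\.(example|sample|template)$' || true
}
```

For example, `git -C "$repo" ls-files | flag_tracked_env_files`. Each path it prints is a candidate for the Security findings; open the file to confirm what it contains, and report the path without the values.
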
### Required output
Create a single report named **`ENGINEERING_REVIEW_SCORECARD.md`** with the
following sections, in order.
#### 1. Executive Summary
A direct, high-level opinion:
- Overall maturity.
- Biggest strengths (top 3).
- Biggest risks (top 3).
- Is this **prototype**, **MVP**, **beta**, or **production** quality? Justify it.
- Is the current repo/development style **helping or hurting velocity**? Why?
#### 2. Overall Score Sheet
Score each category **1-10** (1 = critical/broken, 10 = excellent/production-grade).
Show the evidence behind each score in one line.

| Category | Score (1-10) | Justification (evidence) |
|---|---|---|
| A. Repository organization | | |
| B. Code quality | | |
| C. Architecture | | |
| D. DevOps & deployment | | |
| E. Testing | | |
| F. Security | | |
| G. Product readiness | | |
| H. AI-agent practices | | |
| I. Personal workflow | | |
| **Weighted overall** | | |

State the weighting you used for the overall score (e.g. Security and Product
readiness weighted higher), and give a one-paragraph rationale.
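
The weighted overall score is a weighted mean, `sum(weight * score) / sum(weight)`. A minimal sketch with a hypothetical helper and purely illustrative numbers: Security (F) and Product readiness (G) get weight 2 and the other seven categories weight 1, for a total weight of 11.

```bash
#!/usr/bin/env bash
# Hypothetical helper: weighted mean of category scores.
# stdin: "<category>,<weight>,<score 1-10>" per line; prints the result to one decimal.
weighted_overall() {
  awk -F',' '
    NF == 3 { total += $2 * $3; weights += $2 }   # accumulate weight*score and weight
    END {
      if (weights > 0) printf "%.1f\n", total / weights
      else print "n/a"
    }'
}
# Illustrative input only: F (Security) and G (Product readiness) weighted 2.
printf '%s\n' A,1,6 B,1,5 C,1,6 D,1,4 E,1,3 F,2,4 G,2,5 H,1,6 I,1,5 | weighted_overall
```

With these illustrative scores the weighted result is 4.8 (53 / 11), while the plain average of the nine scores would be 4.9 (44 / 9): the below-average Security and Product readiness scores pull the overall down, which is the point of weighting them.
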
#### 3. Per-Product / Per-Repo Breakdown
For each product group: repos involved, stack, what works, what's broken, top
risks, and a maturity label (prototype / MVP / beta / prod).
#### 4. Findings by Dimension (A-I)
For each dimension: concrete findings with **file paths + repo names + examples**,
ordered by severity. Separate **facts** (what you observed) from
**recommendations** (what to change).
#### 5. Prioritized Action Plan
A single ranked list across all repos:
- **P0 — Fix now** (security, data loss, launch blockers).
- **P1 — This week.**
- **P2 — This month.**
- **P3 — Nice to have.**

Each item: what, why it matters, rough effort (S/M/L), and which repo/file.
#### 6. Safe Auto-Fix Candidates
Low-risk changes you could make immediately *if I approve* — with the exact file,
the exact change, and why it's safe. Do not apply them.
#### 7. Delegate-to-Agent Queue
Tasks ready to hand to an AI agent right now. For each: a tight, self-contained
task brief (repo, files to read first, objective, constraints, definition of
done) so I can paste it straight into an agent.
#### 8. Recommended Standard Operating Procedure
The repeatable process + guardrails I should adopt for every future AI-agent task
(branching, scoping, test-before-commit, documentation, review gates).
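
As an example of the kind of guardrail this section might recommend, here is a sketch of a test-before-commit gate. It is hypothetical (`run_guardrails` is not an existing tool) and assumes Node repos define `lint` / `typecheck` / `test` npm scripts and Python repos use pytest; `npm run --if-present` skips any script a repo does not define.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit gate: run checks, block commits of real env files.
# $1: repo root (defaults to the current directory). Non-zero exit = block the commit.
run_guardrails() {
  local root=${1:-.}
  (
    cd "$root" || exit 1
    if [ -f package.json ]; then
      # --if-present: skip scripts this repo does not define instead of failing
      npm run --if-present lint &&
        npm run --if-present typecheck &&
        npm run --if-present test || exit 1
    fi
    if [ -f pyproject.toml ] || [ -f requirements.txt ]; then
      python -m pytest -q || exit 1
    fi
    # Refuse staged env files; templates such as .env.example are allowed.
    staged_env=$(git diff --cached --name-only 2>/dev/null |
      grep -E '(^|/)\.env($|\.)' | grep -vE '\.(example|sample|template)$')
    if [ -n "$staged_env" ]; then
      printf 'guardrails: refusing to commit env file(s):\n%s\n' "$staged_env" >&2
      exit 1
    fi
    echo "guardrails: ok"
  )
}
```

To use it as a git hook, put the function in `.git/hooks/pre-commit`, end the file with `run_guardrails "$(git rev-parse --show-toplevel)"`, and make it executable; git aborts the commit when the hook exits non-zero. pytest exits with status 5 when it collects no tests, which this sketch treats as a failure.
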
#### 9. What You Could Not Inspect
Explicitly list anything inaccessible, skipped, or assumed, and what I'd need to
provide for a complete review.

---
### Final instruction
Work methodically: discover → group → inspect → score → recommend. When you are
done, print the path to `ENGINEERING_REVIEW_SCORECARD.md` and a 5-bullet TL;DR.
Do not commit or push it — leave it for me to review.