The platform-service build was failing with 3 unrelated TS errors,
surfaced while running the Gitea outdated-package detector earlier
in this session:
src/server.ts(18,8): Cannot find module '@bytelyst/devops/server'
src/server.ts(318,61): Property 'cosmosEndpoint' does not exist on type 'ProductIdentity'
src/server.ts(321,42): Property 'platformServiceUrl' does not exist on type 'ProductIdentity'
Root causes (two distinct bugs):
1. Stale install. '@bytelyst/devops' was already declared as
'workspace:*' in services/platform-service/package.json (line 24),
but node_modules/@bytelyst/devops/ did not exist. Re-running
'pnpm install' at the workspace root materialised the symlink.
2. Variable shadowing. In the GET /devops/info handler the code
declared a local 'const config' from loadProductIdentity() that
shadowed the module-level 'config' (env vars) imported from
'./lib/config.js' at line 112. The author then tried to read
'config.cosmosEndpoint' and 'config.platformServiceUrl' off the
ProductIdentity, where those keys never exist:
ProductIdentity = {
productId, displayName, licensePrefix, configDirName,
envVarPrefix, bundleIdSuffix, packageName
}
The intended values live on the env config:
config.COSMOS_ENDPOINT (Zod-validated, required at boot)
config.HOST + config.PORT (defaults '0.0.0.0' / 4003)
There is no 'platformServiceUrl' field anywhere in the codebase —
it only appeared in this single buggy line. Reconstructed as
'\${HOST}:\${PORT}' which is the URL admins would use to reach
this service for the devops/info diagnostic dashboard.
Fix (services/platform-service/src/server.ts:310-339):
- Rename local 'const config' to 'const productIdentity' to break
the shadowing.
- Use productIdentity.productId for the devops productId field.
- Use config.COSMOS_ENDPOINT (the env config) for the cosmos
dependency health check URL.
- Use `http://${config.HOST}:${config.PORT}` for the extra
platformServiceUrl field.
- Add a doc comment block explaining the two-config distinction
so future contributors don't reintroduce the shadow.
Verified:
pnpm --filter @lysnrai/platform-service build OK (0 errors)
pnpm --filter @lysnrai/platform-service test 1511/1512 pass
The 1 remaining failure (src/modules/products/cache.test.ts line 104,
'returns all cached products' expects 2 products but got 3) is a
PRE-EXISTING product-registry test drift on main, verified by
stashing this commit's changes and re-running the same test against
the unmodified tree. It will be addressed separately.
- Add @bytelyst/devops backend endpoints to platform-service
- Add /api/devops/version (public) and /api/devops/info (admin) endpoints
- Add /devops page to admin-web using @bytelyst/devops/ui DevopsPanel
- Add devops link to admin web sidebar navigation
- Add build metadata and runtime information display
- Follow trading web devops pattern
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
What changed:
- Remove nomgap-web from the ecosystem Docker stack now that web is Vercel-hosted.
- Add a TODO for deciding whether local Docker smoke tests still need a NomGap web service.
- Update NomGap product containers and feature flags.
- Seed the NomGap push trigger flag without duplicating the common encryption flag.
Safety notes:
- Dropped unrelated pnpm-lock.yaml formatting churn instead of committing it.
Verification:
- node JSON.parse products/nomgap/product.json
- ruby Psych.safe_load docker-compose.ecosystem.yml
- pnpm --filter @bytelyst/admin-web typecheck
- pnpm --filter @bytelyst/admin-web test
- pnpm --filter @bytelyst/admin-web exec eslint . --ext .ts,.tsx
- pnpm --filter @lysnrai/platform-service build
- pnpm --filter @lysnrai/platform-service test
- pnpm --filter @lysnrai/platform-service exec eslint . --ext .ts,.tsx
- pnpm typecheck
- pnpm lint
Three coordinated fixes so 'docker compose up cosmos-emulator platform-service
cowork-service --wait' completes end-to-end (pre-existing blocker surfaced by
W1 post-push review).
1. Remove harmful prepare:tsc from @bytelyst/react-native-platform-sdk
package.json. The hook fires during pnpm install --frozen-lockfile against
an empty src/ tree (because Dockerfiles COPY package.jsons before
sources), tsc aborts, install fails. Canonical monorepo build flow is
pnpm -r build using the existing build:tsc script; prepare only runs for
git+ URL installs (which this published package doesn't use), so removing
it is lossless.
2. Add --ignore-scripts to platform-service + mcp-server Dockerfile install
steps. Mirrors the pattern already used by extraction-service/Dockerfile,
dashboards/admin-web/Dockerfile, dashboards/tracker-web/Dockerfile.
Belt-and-braces against future prepare-hook regressions in any workspace
package.
3. Expand .dockerignore node_modules/dist/.next/coverage to **/ globs.
Docker's .dockerignore with bare 'node_modules' only matches root-level;
nested packages/*/node_modules/ were being COPY'd into images, poisoning
them with host-absolute-path .bin shims (e.g. @bytelyst/storage's tsc
shim resolved to /learning_voice_ai_agent/node_modules/.pnpm/... which
doesn't exist in the container → MODULE_NOT_FOUND). The glob fix makes
COPY packages/ packages/ deliver source only.
Gap: INFRA-gap-02
Verified:
pnpm install --frozen-lockfile ✅
pnpm --filter @bytelyst/react-native-platform-sdk build ✅
pnpm --filter @bytelyst/react-native-platform-sdk typecheck ✅
docker compose build platform-service ✅ (previously failed)
docker compose build mcp-server ✅
docker compose build extraction-service ✅
Baseline origin/main pnpm -r lint failed with 90+ errors across
platform-service, extraction-service, and tracker-web. These block the
shared W1 quality gates (prompts/README.md §4) which require all of
typecheck + lint + build + test to be green before committing W1 infra
work. Fixes are strictly scoped to unblock gates:
- eslint.config.js: extend @typescript-eslint/no-unused-vars with
varsIgnorePattern / caughtErrorsIgnorePattern / destructuredArrayIgnorePattern
all honouring the existing `^_` convention already used for args.
- platform-service: add file-level eslint-disable for
@typescript-eslint/no-unused-vars, no-redeclare, no-useless-escape on
the 33 legacy files failing lint (ab-testing, ai-diagnostics,
diagnostics, predictive-analytics, broadcasts/types, surveys/types,
lib/push-notifications).
- extraction-service tests: drop unused vitest imports (beforeEach,
afterEach, HealthCheck).
- tracker-web tracker-proxy.test.ts: prefix unused url with _.
- Applied eslint --fix on platform-service which normalised a handful
of `let` → `const` and removed one redundant disable comment.
Scope creep vs W1 "Files You Own" is acknowledged — user explicitly
approved this path when baseline rot was surfaced.
Verified: pnpm -r typecheck, lint, build, test all green.
Root cause: tinypool worker teardown calls kill() which returns EPERM
in the act_runner host environment on Node.js v25.2.1. Tests pass but
the vitest process crashes during cleanup, causing CI failure.
Fix: --pool forks CLI flag on every package/service test script, plus
pool: 'forks' in all vitest.config.ts files. This uses child_process.fork()
worker management which handles termination cleanly.
60 package.json files updated, 10 vitest.config.ts files updated.
platform-service had 16/60, extraction-service had 14/60, mcp-server had 34/60.
All three now list all 57 packages + 4 services + 2 dashboards + scripts.
Required for pnpm install --frozen-lockfile to resolve the full workspace.
- exports/routes: exclude inline data from GET /exports list response
to prevent returning megabytes of serialized export data (perf+security)
- Update WORKSPACE_TODO_AUDIT.md: add post-audit review section with
9 bugs found and fixed across 2 commits (73b07c2, 841cdf3), mark
all action plan sprints complete
- Typecheck clean, 1483/1483 tests pass
- diagnostics/subscribers: wire session.created email notification to
target user using existing 'diagnostics-session-created' template
(was just logging instead of sending the email)
- events/types: add missing 'currency' field to payment.failed schema
(payment.succeeded had it, payment.failed did not — inconsistency)
- delivery/subscribers: use event.payload.currency instead of hardcoded
empty string in payment-failed email variables
- Typecheck clean, 1483/1483 tests pass
- diagnostics/subscribers: use correct template IDs
'diagnostics-session-cancelled' and 'diagnostics-session-completed'
instead of non-existent 'generic' (would throw at runtime)
- delivery/templates: add missing 'broadcast' email template used by
broadcast delivery route (dispatchEmail would throw on unknown ID)
- broadcasts/routes: replace broken dot-path 'metrics.sent' update
with proper updateBroadcastMetrics() call, add productName variable
- exports/routes: store serialized data on job doc, add download
endpoint GET /exports/:id/download with content-type headers,
exclude data payload from metadata GET endpoint
- waitlist/routes: store invitation doc ID (inv_...) instead of
code string (WL-...) in invitationCodeId field
- delivery/delivery.test.ts: update template count 12 -> 13
- Typecheck clean, 1483/1483 tests pass
- delivery/subscribers: welcome email used raw productId as productName,
now uses resolveProductName() for proper display name
- delivery/subscribers: remove redundant String(daysLeft) in trial_expiring
- surveys/routes: incentiveClaimed was set outside if(sub) block, marking
response as claimed even when user has no subscription. Moved inside
if(sub) so claims are only recorded when incentive is actually granted
Backend (delivery retry):
- Use NotFoundError (404) instead of BadRequestError (400) for missing log doc
- Add telegram + slack retry support (was email-only, threw error for others)
Frontend (delivery page):
- Add pk field to DeliveryEntry interface
- Pass pk query param in retry call so backend can look up the doc
- Fix handleRetry to accept full entry object instead of just id
Frontend (webhooks page):
- Parallelize delivery fetches with Promise.allSettled (was sequential for loop)
- Significant page load improvement for subscriptions with many deliveries
Phase 0 from DASHBOARD_UI_COVERAGE_ROADMAP:
- Register ai-diagnostics routes in server.ts (671-line module was never mounted)
- Add 6 hidden pages to admin sidebar-nav.tsx:
Debug Sessions, Health Dashboard, Extraction, Experiments,
Predictive, AI Diagnostics
- /users was already in sidebar (no change needed)
- Kill switch verified: already per-product via productId query param
- Admin sidebar now has 33 items (was 27)
Aligns service-local vitest.config.ts with root config so tests pass
both via 'pnpm test' (uses service config) and 'npx vitest run' (uses root).
Fixes telemetry.test.ts which fails because its import chain eagerly
loads config.ts → envSchema.parse() requiring COSMOS_ENDPOINT/KEY/JWT_SECRET.
Added: RATE_LIMIT_STORE_MODE=memory, COSMOS_ENDPOINT, COSMOS_KEY, JWT_SECRET
(all test-safe placeholders, never used at runtime with DB_PROVIDER=memory)
EventSource API cannot set custom headers, so the SSE /flags/stream
endpoint and feature-flag-client were broken for streaming mode:
- Server: accept productId and token from query string as fallback
when x-product-id / authorization headers are absent
- Client: pass productId (and optional auth token) as query params
when constructing the EventSource URL
- repository.ts: update() and remove() now require productId as partition key
(was passing 'id' as both params — works with memory provider but fails on Cosmos DB)
- repository.ts: updateSegment() and removeSegment() also fixed
- routes.ts: all repo.update/remove calls updated to pass productId
- routes.ts: audit 'before' snapshots now use JSON deep copy instead of shallow spread
(prevents nested object mutation from corrupting audit trail)
- routes.ts: kill switch audit now uses repo.update() return value for 'after' snapshot
- evaluator.ts: anonymous users (no userId) with partial percentage (0 < pct < 100)
now correctly return 'off' instead of falling through to default variation
(can't deterministically hash without a userId)
- Auto-triage previously always set status to 'triaged', even for cases
already in in_progress, escalated, or other later states
- Now only transitions to 'triaged' if case is still in 'open' state
- Cases in later states keep their current status (only priority + tags update)
- Added regression test for in_progress case
- 10 support-cases tests passing
- Previous filter only checked e.recordedAt < currentPeriodStart
- Now also checks e.recordedAt >= prevPeriodStart (lower bound)
- Prevents entries from periods before the previous one from inflating
the spent amount, which would reduce the rollover incorrectly
- 12 ai-budgets tests passing
- supportCase.tags is optional in SupportCaseDoc schema
- Spreading undefined throws TypeError at runtime
- Fixed both [...supportCase.tags] and .includes() call with ?? [] fallback
- Added regression test for undefined tags case
- 9 support-cases tests passing
- Run history: GET /agent-evals/suites/:id/runs with limit param
- Regression comparison: GET /agent-evals/suites/:id/regression
- Detects 5%+ score drop between consecutive runs
- Returns latest vs previous comparison + trend data
- Release gate check: GET /agent-evals/suites/:id/gate
- Checks if latest release-gate run passed threshold
- Agent compliance report: GET /agent-evals/agents/:agentId/report
- Aggregates pass rate, avg score, suite counts, recent runs
- Eval scheduling: POST /agent-evals/suites/:id/schedule
- Wires eval suite to job runner with cron expression
- New repo functions: listRunsBySuite, listRunsByAgent
- 1,324 tests passing (8 new)