- Fleet overview page with factory cards + recent jobs polling
- Job table with stage filter tabs
- Job detail page with events timeline, runs, artifacts, DAG subtree, SHIP action
- Budget page with usage bar, pause/resume controls
- API proxy route forwarding /api/fleet/* to platform-service
- Typed fleet-client.ts with graceful 404 degradation
- 16 unit tests for fleet-client (198 total tracker-web tests green)
- Added Fleet nav item to dashboard layout
- Full monorepo build + test green
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Caddy was binding 0.0.0.0:443, which prevented tailscaled from claiming
100.87.53.10:443 for `tailscale serve --https=443`. Restricting Caddy to
the public eth0 IP (187.124.159.82) keeps the public api.bytelyst.com /
devops.bytelyst.com routing intact while freeing the Tailscale IP so the
tailnet-only dashboard URL (https://srv1491630.tailf85608.ts.net) is
reachable again.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
docker-compose:
- Drop the cosmos-emulator service block. Both image variants we
tried were unfit for the prototype: `:vnext-preview` returned
plain-text PGCosmosError strings that crashed @azure/cosmos at
JSON.parse, and `:latest` core-dumped under load. The container
has been Exited(255) for weeks and was blocking depends_on chains.
- Real Azure Cosmos account `cosmos-mywisprai` (db `bytelyst`,
West US 2) is now the single source of truth; all services pick
up COSMOS_ENDPOINT/KEY/DATABASE from `.env` (already mounted via
`env_file: .env`).
- extraction-service: drop hardcoded `COSMOS_ENDPOINT=…cosmos-emulator…`,
`NODE_TLS_REJECT_UNAUTHORIZED=0`, and `depends_on: cosmos-emulator`.
- cowork-service: same cleanup.
cowork-service IPC bridge:
- Add `error` listeners to the spawned child's stdin/stdout/stderr.
Without them, an EPIPE on stdin (child died mid-write) or a
teardown-time stream error surfaced as an unhandled error and
crashed vitest after all 140 tests had passed.
- Fixes the only failure in the recursive (workspace-wide) test run.
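
A minimal sketch of the listener wiring described in the IPC bridge bullet, written against a small structural interface instead of Node's real stream types so it stays self-contained. `guardChildStreams`, `ErrorSource`, `ChildStdio`, and the `onError` callback are hypothetical names, not the actual cowork-service code.

```typescript
// Structural stand-in for a Node stream: anything that can emit 'error'.
interface ErrorSource {
  on(event: "error", listener: (err: Error) => void): unknown;
}

// Hypothetical shape of a spawned child's stdio handles (any may be null).
interface ChildStdio {
  stdin: ErrorSource | null;
  stdout: ErrorSource | null;
  stderr: ErrorSource | null;
}

// Attach an 'error' listener to every available stdio stream so an EPIPE or a
// teardown-time stream error is reported through onError instead of surfacing
// as an unhandled 'error' event. Returns how many listeners were attached.
function guardChildStreams(
  child: ChildStdio,
  onError: (stream: string, err: Error) => void,
): number {
  const streams: Array<[string, ErrorSource | null]> = [
    ["stdin", child.stdin],
    ["stdout", child.stdout],
    ["stderr", child.stderr],
  ];
  let attached = 0;
  for (const [name, stream] of streams) {
    if (stream === null) continue;
    stream.on("error", (err) => onError(name, err));
    attached += 1;
  }
  return attached;
}
```

The handles on a child returned by Node's `child_process.spawn` should satisfy this shape, so a helper like this would be called right after spawning, before the first write to stdin.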
Test status after this commit:
- 94 workspace packages, all green
- cowork-service: 19 test files passed | 1 skipped (140 tests)
- platform-service: 131 test files passed
- extraction-service: 13 test files passed
- All other packages: passing
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The vnext-preview (Postgres-backed) image returned PGCosmosError plaintext
for cross-partition queryFeed calls, crashing @azure/cosmos at JSON.parse.
:latest is HTTPS-only with a self-signed cert, so consumers must set
NODE_TLS_REJECT_UNAUTHORIZED=0 (dev-prototype only). platform-service now
points at the real Azure Cosmos account (per .env), so its dependency on
the local emulator service is removed.
In-process tracker<->fleet bridge — no shell hop. Closes the §10 "direct
tracker->module calls" box.
- tracker-bridge.ts (new):
* ingestItemAsJob(productId, itemId, opts?) — reads the Item via the items
repository (foreign/unknown → NotFoundError), maps title/description → bodyMd
(verbatim) + labels (engine-class:/profile:/priority:/cap:) → manifest hints,
sets trackerItemId + a stable idempotency-key `tracker-<itemId>`, and submits
through coordinator.submitJob — so re-ingest dedupes and the job is scheduled by
the §7 router via the unchanged claim path.
* echoJobToItem(productId, jobId, log?) — mirrors stage → Item status
(queued/assigned/building/review/testing → in_progress; shipped → done;
failed/dead_letter → wont_fix) + a metrics-ONLY comment (attempts/duration/
tokens/cost — never the prompt body/secrets). Idempotent via the job's
`trackerEchoedStatus`; best-effort + non-fatal (items-write failure →
{ echoed: null, error }, never thrown into the job lifecycle). productId-scoped.
- Auto-echo wired into the PATCH + lease/release transitions, GATED by
FLEET_TRACKER_ECHO (default OFF → behavior byte-for-byte unchanged); never blocks
or fails the transition.
- Routes (additive): POST /fleet/tracker/ingest, POST /fleet/tracker/echo
(auth + getRequestProductId, productId-scoped).
- types.ts: optional FleetJobDoc.trackerEchoedStatus (reuses the existing
trackerItemId field; no parallel schema) + Ingest/Echo request schemas.
- repository.ts: setTrackerEchoedStatus (no rev bump — never interferes with the
fenced claim CAS).
Reuses the items + comments contracts directly (no HTTP). Does not touch
claimNextJob or the scheduler. productId on every doc; no any/console.log.
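
The stage-to-status mirror and the idempotency key described for tracker-bridge.ts can be sketched as pure functions. The function names are hypothetical, and because `blocked` is not listed in the mapping above, this sketch assumes it produces no echo.

```typescript
type FleetStage =
  | "queued" | "blocked" | "assigned" | "building" | "review"
  | "testing" | "shipped" | "failed" | "dead_letter";

type ItemStatus = "in_progress" | "done" | "wont_fix";

// Stable key so re-ingesting the same Item dedupes at submit time.
function trackerIdempotencyKey(itemId: string): string {
  return `tracker-${itemId}`;
}

// Stage -> Item status, following the mapping in the commit message.
// ASSUMPTION: `blocked` is unlisted there, so it maps to null (no echo).
function stageToItemStatus(stage: FleetStage): ItemStatus | null {
  switch (stage) {
    case "queued":
    case "assigned":
    case "building":
    case "review":
    case "testing":
      return "in_progress";
    case "shipped":
      return "done";
    case "failed":
    case "dead_letter":
      return "wont_fix";
    case "blocked":
      return null;
  }
}

// Idempotent echo: return the status to write, or null when there is
// nothing to do because the job already recorded the same status.
function nextEcho(
  stage: FleetStage,
  trackerEchoedStatus?: ItemStatus,
): ItemStatus | null {
  const status = stageToItemStatus(stage);
  return status === null || status === trackerEchoedStatus ? null : status;
}
```

Keeping the mapping pure makes the best-effort write path the only place that can fail, which matches the non-fatal behavior described above.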
Some keyboard events (dead keys, modifier-only presses) can arrive with
e.key undefined, and malformed shortcut definitions may lack a key.
Added early-return guards to prevent the resulting TypeError.
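
A sketch of what such early-return guards might look like. The `KeyLikeEvent` and `ShortcutDef` shapes and `matchesShortcut` are hypothetical stand-ins, not the package's real types.

```typescript
// Minimal structural types; `key` is optional to model the cases above.
interface KeyLikeEvent {
  key?: string;
  ctrlKey?: boolean;
}

interface ShortcutDef {
  key?: string;
  ctrl?: boolean;
}

function matchesShortcut(e: KeyLikeEvent, def: ShortcutDef): boolean {
  // Early-return guards: calling .toLowerCase() on an undefined key
  // would otherwise throw a TypeError.
  if (typeof e.key !== "string") return false;
  if (typeof def.key !== "string") return false;

  if (Boolean(def.ctrl) !== Boolean(e.ctrlKey)) return false;
  return e.key.toLowerCase() === def.key.toLowerCase();
}
```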
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- packages/llm: use nullish coalescing (??) in GeminiProvider constructor
so explicit empty-string apiKey is not overridden by env var
- dashboards/admin-web,tracker-web: exclude .next/ from vitest test glob
to prevent Next.js internal test files from being picked up
- services/cowork-service: use platform-safe .kill() instead of SIGTERM,
which is invalid on Windows
- packages/use-keyboard-shortcuts: add @testing-library/react devDep
- scripts/npmrc.template: use https:// for Gitea registry
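
The `??` change in the packages/llm bullet can be illustrated in isolation. The env variable name `GEMINI_API_KEY` and both function names are assumptions for this sketch, and a plain record stands in for `process.env`.

```typescript
// Hypothetical env lookup table standing in for process.env.
type Env = Record<string, string | undefined>;

// `||` treats "" as missing, so an explicit empty string is overridden
// by the env var.
function resolveKeyWithOr(apiKey: string | undefined, env: Env): string | undefined {
  return apiKey || env["GEMINI_API_KEY"];
}

// `??` falls through only on null/undefined, so an explicit "" is kept.
function resolveKeyWithNullish(apiKey: string | undefined, env: Env): string | undefined {
  return apiKey ?? env["GEMINI_API_KEY"];
}
```

With `env = { GEMINI_API_KEY: "from-env" }`, passing `""` yields `"from-env"` from the first function and `""` from the second; passing `undefined` yields `"from-env"` from both.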
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds factory enrollment + a scoped, rotatable credential model for the fleet
coordinator (trust boundary, §12/§18). Tokens are stored HASHED at rest (sha256 —
the same primitive the auth module uses for verify/magic-link tokens); the
high-entropy plaintext is returned exactly once at enroll/rotate and never persisted.
- enrollment.ts: enrollFactory (create/link factory + issue token), rotateToken
(new active token; prior marked `rotating` with a grace overlap so an in-flight
worker isn't cut off), revokeToken (immediate), verifyToken (constant-time hash
compare; revoked/expired-grace → null; updates lastUsedAt). Scope = {productId,
factoryId, capabilities[]}.
- Gated enforcement: enforceFactoryToken() on POST /fleet/factories/heartbeat and
POST /fleet/claim, active only when FLEET_REQUIRE_FACTORY_TOKEN is on (default
OFF — existing behavior/tests unchanged). When on: missing/invalid/revoked → 401;
out-of-scope productId/capability/factory → 403; and the claim is CONSTRAINED to
the verified token scope. Does not touch scheduler scoring or the claim CAS.
- types.ts: FleetFactoryTokenDoc + Enroll/Rotate/Revoke request schemas.
- repository.ts: fleet_factory_tokens collection + CRUD + findByHash.
- routes.ts (additive): POST /fleet/factories/enroll, /:id/token/rotate,
/:id/token/revoke (user auth + productId + Zod).
- cosmos-init.ts: register fleet_factory_tokens (/productId).
Also hardens the artifact routes (review fixes): listArtifactsByJob is now
productId-scoped (GET /fleet/jobs/:id/artifacts threads the request productId), and
artifact upload uses the request/auth productId authoritatively (a spoofed
body.productId no longer overrides it).
Tokens hashed at rest; plaintext shown once; no new crypto schemes; productId on
every doc; no any/console.log; enforcement default OFF.
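
The verify and enforcement rules above (revoked or expired-grace tokens rejected, 401 versus 403, scope checks) can be sketched without any crypto dependency. All field and function names are hypothetical, hashing of the presented token is assumed to happen before this code runs, and the comparison loop is only a dependency-free stand-in for what Node's `crypto.timingSafeEqual` would normally do.

```typescript
type TokenStatus = "active" | "rotating" | "revoked";

interface TokenRecord {
  tokenHash: string;        // hex sha256 of the plaintext (computed elsewhere)
  status: TokenStatus;
  graceExpiresAt?: number;  // epoch ms; only meaningful while `rotating`
  productId: string;
  factoryId: string;
  capabilities: string[];
}

interface ScopeRequest {
  productId: string;
  factoryId: string;
  capability?: string;
}

// Compares equal-length strings without exiting early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Revoked tokens never verify; rotating tokens verify only inside the grace window.
function isUsable(rec: TokenRecord, now: number): boolean {
  if (rec.status === "revoked") return false;
  if (rec.status === "rotating") {
    return rec.graceExpiresAt !== undefined && now < rec.graceExpiresAt;
  }
  return true;
}

// Returns the HTTP status the gated enforcement would produce.
function enforce(
  presentedHash: string | undefined,
  records: TokenRecord[],
  req: ScopeRequest,
  now: number,
): 200 | 401 | 403 {
  if (!presentedHash) return 401; // missing token
  const rec = records.find((r) => constantTimeEqual(r.tokenHash, presentedHash));
  if (!rec || !isUsable(rec, now)) return 401; // invalid, revoked, or grace expired
  if (rec.productId !== req.productId || rec.factoryId !== req.factoryId) return 403;
  if (req.capability !== undefined && !rec.capabilities.includes(req.capability)) {
    return 403; // capability outside the token scope
  }
  return 200;
}
```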
The Gitea instance runs on default port 80, not 3300. Updated the
npmrc template and AGENTS.md references accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Include Gitea npm registry variables (token, host, owner) so
developers know which env vars to set for @bytelyst package access.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The foundation's revUpdateJob/revUpdateLease did a read -> rev-check -> write with
await points between them, so two CONCURRENT claims could both read the same rev,
both pass the check, and both write — a double-assignment the old (sequential) race
test could not catch.
Rewire revUpdateJob/revUpdateLease to delegate to the datastore's updateIfMatch,
which performs the compare and the write as one indivisible operation (Cosmos
If-Match; synchronous compare-set on memory). The coordinator's tryClaimJob keeps
identical external behavior (ok/conflict) but is now genuinely single-winner.
Upgrades the coordinator tests to prove atomicity under TRUE concurrency:
- two contenders via Promise.all -> exactly one ok, one conflict; assigned once;
one run; one lease; leaseEpoch 1.
- N-claimer (15) stress via Promise.all -> one ok, N-1 conflicts, no double-assignment.
- N concurrent claimNextJob for one job -> exactly one non-null claim.
- N concurrent lease renewals -> exactly one wins.
Verified these concurrent tests FAIL against the old read-check-write (double-assign)
and pass after the fix.
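
The difference between the old read-check-write and the atomic compare-and-set can be reproduced with a toy in-memory store. This is a sketch under stated assumptions: `ToyStore`, `racyClaim`, `atomicClaim`, and `countWinners` are hypothetical, and `await Promise.resolve()` stands in for real I/O latency.

```typescript
type Job = { id: string; rev: number; assignedTo?: string };
type CasResult =
  | { ok: true; doc: Job }
  | { ok: false; reason: "conflict" | "not_found" };

class ToyStore {
  private docs = new Map<string, Job>();

  seed(job: Job): void {
    this.docs.set(job.id, { ...job });
  }

  async read(id: string): Promise<Job | undefined> {
    await Promise.resolve(); // simulated I/O: lets other callers run
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined;
  }

  async write(job: Job): Promise<void> {
    await Promise.resolve(); // simulated I/O
    this.docs.set(job.id, { ...job });
  }

  async updateIfMatch(id: string, expectedRev: number, patch: { assignedTo: string }): Promise<CasResult> {
    await Promise.resolve(); // simulated I/O
    // Compare and set below run with no await between them, so no other
    // caller can interleave on the single-threaded event loop.
    const cur = this.docs.get(id);
    if (!cur) return { ok: false, reason: "not_found" };
    if (cur.rev !== expectedRev) return { ok: false, reason: "conflict" };
    const next: Job = { ...cur, assignedTo: patch.assignedTo, rev: cur.rev + 1 };
    this.docs.set(id, next);
    return { ok: true, doc: next };
  }
}

// Old shape: an await sits between the rev check and the write.
async function racyClaim(store: ToyStore, id: string, worker: string, expectedRev: number): Promise<boolean> {
  const doc = await store.read(id);
  if (!doc || doc.rev !== expectedRev) return false;
  await store.write({ ...doc, rev: doc.rev + 1, assignedTo: worker });
  return true;
}

// New shape: delegate the compare and the write to one indivisible call.
async function atomicClaim(store: ToyStore, id: string, worker: string, expectedRev: number): Promise<boolean> {
  const res = await store.updateIfMatch(id, expectedRev, { assignedTo: worker });
  return res.ok;
}

// Runs n contenders concurrently against a fresh job and counts the winners.
async function countWinners(
  claim: (store: ToyStore, worker: string) => Promise<boolean>,
  n: number,
): Promise<number> {
  const store = new ToyStore();
  store.seed({ id: "job-1", rev: 0 });
  const results = await Promise.all(
    Array.from({ length: n }, (_, i) => claim(store, `w${i}`)),
  );
  return results.filter(Boolean).length;
}
```

Running `countWinners` with `racyClaim` reports more than one winner, while `atomicClaim` reports exactly one, which is the property the upgraded coordinator tests assert.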
Adds an additive, backward-compatible conditional write to the datastore abstraction
so consumers can do true single-winner compare-and-swap:
updateIfMatch(id, partitionKey, expected: { etag?, rev? }, patch)
-> { ok: true, doc } | { ok: false, reason: 'conflict' | 'not_found' }
- types: ConcurrencyToken + UpdateIfMatchResult; optional _etag on BaseDocument
(provider-managed, surfaced on reads); new method on DocumentCollection; exported.
- memory provider: get -> compare -> set in ONE synchronous block (no await/yield),
so two concurrent callers cannot interleave under the single-threaded event loop —
the first wins and bumps rev + _etag, the rest get conflict. True in-process atomicity.
- cosmos provider: conditional replace with accessCondition { type: 'IfMatch',
condition: _etag }; translates Cosmos 412 -> conflict, 404 -> not_found; also
compares/bumps rev for parity.
Existing method signatures are unchanged (additive only). Tests: memory match/stale/
missing + an N-concurrent Promise.all atomicity proof; cosmos If-Match mapping via a
fake @azure/cosmos (match writes, stale etag -> 412 conflict, missing -> not_found).
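
The status translation described for the cosmos provider can be sketched as a small pure function. The `statusCode` property read here is an assumption about the thrown error's shape, and `translateWriteError` is a hypothetical name; the real provider may read a different field.

```typescript
type UpdateIfMatchFailure = { ok: false; reason: "conflict" | "not_found" };

// Extracts a numeric status from an unknown thrown value, if present.
// ASSUMPTION: the error carries its HTTP status in `statusCode`.
function statusOf(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const status = (err as { statusCode?: unknown }).statusCode;
  return typeof status === "number" ? status : undefined;
}

// 412 (precondition failed: stale etag) -> conflict; 404 -> not_found.
// Anything else is not a concurrency outcome and is rethrown unchanged.
function translateWriteError(err: unknown): UpdateIfMatchFailure {
  const status = statusOf(err);
  if (status === 412) return { ok: false, reason: "conflict" };
  if (status === 404) return { ok: false, reason: "not_found" };
  throw err;
}
```

Rethrowing unknown failures keeps throttling or network errors from being misreported as a lost race.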
Foundation review: 50 fleet tests green, build clean, no regressions. NOTE: the
atomic claim uses an in-module rev-CAS over an unconditional datastore write, so
true cross-process atomicity requires an If-Match/_etag-conditional update in
@bytelyst/datastore (tracked as a P0 hardening follow-up before P2-S3).
Guarded REST under /api (auth + productId, like items): POST /fleet/jobs (idempotent
submit), GET /fleet/jobs (by stage/idempotencyKey), GET /fleet/jobs/:id, PATCH
/fleet/jobs/:id (fenced transition), POST /fleet/claim, lease renew/release,
factories/heartbeat, and runs/events streams. Every body validated with the Zod
schemas; fenced/conflict map to 409, missing to 404, invalid to 400. Registers
fleetRoutes in server.ts next to itemRoutes. Routes tested via Fastify inject on
the memory provider (real coordinator).
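
The outcome-to-status mapping above is small enough to sketch as a lookup. `FleetOutcome` and `statusFor` are hypothetical names, and the 200 for success is an assumption (a submit route could equally return 201).

```typescript
type FleetOutcome = "ok" | "fenced" | "conflict" | "not_found" | "invalid";

function statusFor(outcome: FleetOutcome): number {
  switch (outcome) {
    case "ok":
      return 200; // ASSUMPTION: success code is not stated in the commit
    case "fenced":
    case "conflict":
      return 409;
    case "not_found":
      return 404;
    case "invalid":
      return 400;
  }
}
```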
The concurrency core (§4/§7/§8/§18/§25):
- claimNextJob: priority+age selection over queued/dep-satisfied jobs whose caps
are a subset of the factory's, then tryClaimJob does a rev CAS to flip to
assigned + acquire the lease — exactly one contender wins, no double-assignment.
- leases + fencing: acquire/reclaim bumps leaseEpoch; patchJobFenced/renew/release
reject a call whose leaseEpoch < job.leaseEpoch (zombie worker can't overwrite).
- heartbeat + isFactoryStale for factory liveness.
- reapExpiredLeases: returns expired-lease jobs to queued/blocked, bumps the epoch
(fencing the dead holder), preserves the checkpoint pointer (resume), marks the
lease expired; idempotent. Documents why Cosmos TTL cannot do this.
- submit: idempotent (dedup/supersede/409) + submit-time dependency cycle
detection; deps gating (shipped, or testing when depsMode:soft).
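
The selection and fencing rules in the bullets above can be sketched as pure functions. The `QueuedJob` shape and function names are hypothetical, and the ordering direction is an assumption: a larger `priority` is treated as more urgent, with ties broken by the oldest `createdAt`.

```typescript
interface QueuedJob {
  id: string;
  priority: number;
  createdAt: number; // epoch ms
  caps: string[];
}

// A job is eligible when every capability it needs is offered by the factory.
function capsSatisfied(jobCaps: string[], factoryCaps: string[]): boolean {
  const have = new Set(factoryCaps);
  return jobCaps.every((c) => have.has(c));
}

// ASSUMPTION: higher priority first, then oldest first.
// filter() returns a new array, so the caller's list is not reordered.
function pickNextJob(jobs: QueuedJob[], factoryCaps: string[]): QueuedJob | undefined {
  return jobs
    .filter((j) => capsSatisfied(j.caps, factoryCaps))
    .sort((a, b) => b.priority - a.priority || a.createdAt - b.createdAt)[0];
}

// Fencing: a call carrying an epoch older than the job's current leaseEpoch
// comes from a superseded holder and must be rejected.
function isFenced(callerEpoch: number, jobLeaseEpoch: number): boolean {
  return callerEpoch < jobLeaseEpoch;
}
```

In the flow described above, the job returned by selection is only a candidate; the rev CAS in tryClaimJob still decides which contender actually wins it.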
Tests drive the atomic-claim race, fencing, and reaper deterministically via the
rev CAS (no real threads).
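
Submit-time dependency cycle detection, mentioned in the submit bullet, could be done with a depth-first walk over the dependency edges. This is one possible technique under assumed data shapes, not necessarily the coordinator's implementation; `wouldCreateCycle` and the `deps` map are hypothetical.

```typescript
// deps maps a job id to the ids it depends on.
// Returns true when adding `newJobId` with `newDeps` would close a cycle,
// i.e. when newJobId is reachable from any of its own dependencies.
function wouldCreateCycle(
  deps: Map<string, string[]>,
  newJobId: string,
  newDeps: string[],
): boolean {
  const seen = new Set<string>();
  const stack = [...newDeps];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (id === newJobId) return true; // walked back to the job being submitted
    if (seen.has(id)) continue;       // already explored this branch
    seen.add(id);
    stack.push(...(deps.get(id) ?? []));
  }
  return false;
}
```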
One repository per fleet_* container on the @bytelyst/datastore abstraction
(memory + cosmos): create/getById/list (by productId, stage, idempotencyKey),
partition-aware single-partition queries, ordered append-only appendEvent, and
runs/leases/factories/profiles/artifacts CRUD. Adds revUpdateJob/revUpdateLease —
a `rev`-token compare-and-swap that writes only when the stored rev still matches
(the optimistic-concurrency primitive for atomic claim + fenced transitions;
maps to Cosmos _etag/If-Match in production).
Adds the agent-gigafactory fleet data model (modules/fleet/types.ts): Zod schemas
as the source of truth with inferred types (no `any`) for the 7 durable containers
— FleetJobDoc, FleetRunDoc, FleetLeaseDoc, FleetFactoryDoc, FleetProfileDoc,
FleetEventDoc, FleetArtifactDoc — each carrying productId. Lifecycle stages mirror
the agent-queue gigafactory spec (queued|blocked|assigned|building|review|testing|
shipped|failed|dead_letter). Registers fleet_* containers with their partition keys
(/productId for jobs/factories/profiles, /jobId for runs/leases/events/artifacts).
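
A dependency-free sketch of the lifecycle stages and partition-key layout listed above. The commit itself uses Zod schemas; this version uses a const tuple and a type guard instead, and the concrete container names (`fleet_jobs` and so on) are assumptions extrapolated from the `fleet_*` pattern.

```typescript
const FLEET_STAGES = [
  "queued", "blocked", "assigned", "building", "review",
  "testing", "shipped", "failed", "dead_letter",
] as const;

// Union type inferred from the tuple, so the list stays the source of truth.
type FleetStage = (typeof FLEET_STAGES)[number];

function isFleetStage(value: unknown): value is FleetStage {
  return (
    typeof value === "string" &&
    (FLEET_STAGES as readonly string[]).indexOf(value) !== -1
  );
}

// ASSUMPTION: container names are illustrative; only the key paths
// (/productId vs /jobId) come from the commit message.
const FLEET_PARTITION_KEYS = {
  fleet_jobs: "/productId",
  fleet_factories: "/productId",
  fleet_profiles: "/productId",
  fleet_runs: "/jobId",
  fleet_leases: "/jobId",
  fleet_events: "/jobId",
  fleet_artifacts: "/jobId",
} as const;
```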
deploy.resources.limits.memory applied per roadmap table.
Limits derived from a 2-day RSS baseline (2026-05-27 to 2026-05-29).
Takes effect on next docker compose up — no running containers affected.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- UX-6 system banners DEFERRED: platform-service (:4003) is unreachable in this
environment, so there is no real broadcasts/maintenance feed to surface.
Per the wave's explicit condition, banners are not added against an empty feed.
Recorded in the waves list + Deferrals table with a follow-up.
- CC.1-CC.6 ticked: suite/build green every wave; dark-mode parity via the bridge;
zero new color literals; a11y labels on all new controls; charts/palette/motion
code-split via next/dynamic (chart chunk ~3.8 KB gzip); size:check has no
bundlesize config in-repo so gzip sizes recorded inline (follow-up logged).
- Add token-bridge guard test (CC.2/CC.3): asserts every --bl-* maps to an admin
var that flips under .dark and that the bridge contains no raw color literals.
Verify: typecheck+lint+build green (123 routes); vitest 22 files / 183 tests;
format:check no new failures (29 pre-existing); e2e 11 passed / 80 failed
(unchanged vs UX-1 baseline — environmental, no backend).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- @bytelyst/motion added as a workspace:* dependency (importer-only lockfile
change; --frozen-lockfile clean).
- Dashboard overview only: KPI cards grid wrapped in StaggerList (from up,
50ms stagger); the Model-Usage / Recent-Users table row wrapped in Reveal.
- Primitives honor prefers-reduced-motion and resolve to opacity 1, so no
element is stranded transparent (no contrast/a11y regression); prefersReduced
is SSR-safe. Motion is confined to the auth-gated dashboard, not the public
e2e surfaces, per tracker-web's axe/opacity caution.
- vitest.config: inline @bytelyst/motion + react dedupe for the render test.
Tests: happy-dom asserts Reveal/StaggerList end visible and render all children.
Verify: typecheck+lint+build green (123 routes); vitest 21 files / 170 tests
(+2); format:check no new failures; e2e 11 passed / 80 failed (unchanged vs
UX-1 baseline — environmental).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- error.tsx -> ErrorPage (keep telemetry on mount; retry wired to Next reset).
- (dashboard)/loading.tsx -> LoadingSpinner inside the existing skeleton.
- not-found.tsx already used NotFoundPage (confirmed, unchanged).
- dashboard overview page.tsx header -> PageHeader (Refresh as actions; the
subtitle/last-updated line preserved directly below).
Rich detail headers (e.g. users/[id] back-button + plan/status badges) left
bespoke on purpose: PageHeader has no subtitle/badge slot, so forcing it would
regress them (additive-only rule). dashboard-components reads --color-* which
admin maps via @theme inline, so it themes in light + dark.
Verify: typecheck+lint+build green (123 routes); vitest 20 files / 168 tests
(+3 happy-dom chrome render tests); format:check no new failures; e2e 11 passed
/ 80 failed (unchanged vs UX-1 baseline — environmental).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>