feat(appview): backfill & repo sync — ATB-13 (#54)
* docs: add backfill and repo sync design (ATB-13)
Approved design for gap detection, collection-based repo sync via
existing Indexer handlers, DB-backed progress tracking with resume,
and async admin API for manual backfill triggers.
* docs: add backfill implementation plan (ATB-13)
12-task TDD plan covering DB schema, gap detection, repo sync,
orchestration with progress tracking, firehose integration,
admin API endpoints, and AppContext wiring.
* feat(db): add backfill_progress and backfill_errors tables (ATB-13)
Add two tables to support crash-resilient backfill:
- backfill_progress: tracks job state, DID counts, and resume cursor
- backfill_errors: per-DID error log with FK to backfill_progress
* feat(appview): add backfill configuration fields (ATB-13)
Add three new optional config fields with sensible defaults:
- backfillRateLimit (default 10): max XRPC requests/sec per PDS
- backfillConcurrency (default 10): max DIDs processed concurrently
- backfillCursorMaxAgeHours (default 48): cursor age threshold for CatchUp
Declare env vars in turbo.json so Turbo passes them through to tests.
Update test helpers (app-context.test.ts, test-context.ts) for new fields.
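
A minimal sketch of how the three defaults above could be resolved. The env var names and the loadBackfillConfig helper are hypothetical and not taken from the repo; only the field names and default values come from this commit.

```typescript
// Sketch only: env var names below are assumptions, not the repo's actual names.
interface BackfillConfig {
  backfillRateLimit: number; // max XRPC requests/sec per PDS
  backfillConcurrency: number; // max DIDs processed concurrently
  backfillCursorMaxAgeHours: number; // cursor age threshold for CatchUp
}

// Parse a positive integer, falling back to the default when unset or invalid.
function intOrDefault(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadBackfillConfig(
  env: Record<string, string | undefined>,
): BackfillConfig {
  return {
    backfillRateLimit: intOrDefault(env["BACKFILL_RATE_LIMIT"], 10),
    backfillConcurrency: intOrDefault(env["BACKFILL_CONCURRENCY"], 10),
    backfillCursorMaxAgeHours: intOrDefault(
      env["BACKFILL_CURSOR_MAX_AGE_HOURS"],
      48,
    ),
  };
}
```

Passing the environment in as a plain record keeps the helper easy to exercise from tests without touching process state.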
* feat(appview): add getCursorAgeHours to CursorManager (ATB-13)
Add method to calculate cursor age in hours from microsecond Jetstream
timestamps. Used by BackfillManager gap detection to determine if
backfill is needed when cursor is too old.
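
The age calculation can be sketched as below. This is a standalone illustration, not the CursorManager method itself: the function name, the nullable parameter, and the injected clock are assumptions. Microsecond timestamps for current dates are still well inside the safe integer range of a number.

```typescript
const MICROSECONDS_PER_HOUR = 60 * 60 * 1_000_000;

// Hypothetical helper: returns the cursor age in hours, or null when no cursor exists.
// cursorUs is a Jetstream-style timestamp in microseconds since the Unix epoch.
function cursorAgeHours(
  cursorUs: number | null,
  nowMs: number = Date.now(),
): number | null {
  if (cursorUs === null) return null;
  const nowUs = nowMs * 1000; // milliseconds to microseconds
  return (nowUs - cursorUs) / MICROSECONDS_PER_HOUR;
}
```

Injecting nowMs keeps the calculation deterministic in unit tests.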
* feat(appview): add BackfillManager with gap detection (ATB-13)
- Adds BackfillManager class with checkIfNeeded() and getIsRunning()
- BackfillStatus enum: NotNeeded, CatchUp, FullSync
- Gap detection logic: null cursor → FullSync, empty DB → FullSync,
stale cursor (>backfillCursorMaxAgeHours) → CatchUp, fresh → NotNeeded
- Structured JSON logging for all decision paths
- 4 unit tests covering all decision branches
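
The decision branches above can be sketched as a pure function. The status is modelled here as a string union rather than the BackfillStatus enum, the literal values and input shape are assumptions, and "empty DB" is taken to mean that no forum row exists.

```typescript
// Assumed string values; the real code uses a BackfillStatus enum.
type BackfillDecision = "not_needed" | "catch_up" | "full_sync";

interface GapInput {
  cursorAgeHours: number | null; // null when no cursor has been stored
  forumExists: boolean; // false when the DB has no forum row (assumed meaning of "empty DB")
  maxAgeHours: number; // backfillCursorMaxAgeHours
}

function decideBackfill(input: GapInput): BackfillDecision {
  if (input.cursorAgeHours === null) return "full_sync"; // null cursor
  if (!input.forumExists) return "full_sync"; // empty DB
  if (input.cursorAgeHours > input.maxAgeHours) return "catch_up"; // stale cursor
  return "not_needed"; // fresh cursor
}
```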
* fix(appview): add DB error handling and fix null guard in BackfillManager (ATB-13)
- Wrap forums DB query in try-catch; return FullSync on error (fail safe)
- Replace destructuring with results[0] so forum is in scope after try block
- Use non-null assertion on getCursorAgeHours since cursor is proven non-null at that point
- Remove redundant null ternary in NotNeeded log payload (ageHours is always a number)
- Add test: returns FullSync when DB query fails (fail safe)
* feat(appview): add syncRepoRecords with event adapter (ATB-13)
* fix(appview): correct event adapter shape and add guard logging in BackfillManager (ATB-13)
* feat(appview): add performBackfill orchestration with progress tracking (ATB-13)
* fix(appview): mark backfill as failed on error, fix type and concurrent mutation (ATB-13)
* fix(appview): resolve TypeScript closure narrowing with const capture (ATB-13)
TypeScript cannot narrow let variables through async closure boundaries.
Replace backfillId! non-null assertions inside batch.map closures with
a const resolvedBackfillId captured immediately after the insert.
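
A reduced illustration of the const-capture pattern. insertProgressRow and tagDids are hypothetical stand-ins; only the resolvedBackfillId naming follows this commit.

```typescript
// Hypothetical stand-in for the DB insert that returns the new row ID.
async function insertProgressRow(): Promise<number> {
  return 42;
}

async function tagDids(dids: string[]): Promise<string[]> {
  let backfillId: number | null = null;
  backfillId = await insertProgressRow();

  // Capture the narrowed value in a const right after the insert. Inside an
  // async callback, TypeScript may treat the `let` as number | null again,
  // because a `let` could be reassigned before the callback runs.
  const resolvedBackfillId: number = backfillId;

  return Promise.all(
    dids.map(async (did) => `${resolvedBackfillId}:${did}`),
  );
}
```

The const carries its narrowed type into every closure, so no non-null assertion is needed.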
* test(appview): add CatchUp path coverage for performBackfill (ATB-13)
Add two tests exercising the Phase 2 (CatchUp) branch:
- Aggregates counts correctly across 2 users × 2 collections × 1 record
- Rejected batch callbacks (backfillErrors insert failure) increment
totalErrors via the allSettled rejected branch instead of being silently swallowed
Phase 1 mocks now explicitly return empty pages for all 5 forum-owned
collections so counts are isolated to Phase 2 user records.
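
A minimal sketch of the counting behaviour these tests exercise, assuming each batch callback resolves with a record count. processBatch and syncOne are illustrative names, not the repo's.

```typescript
interface BatchTotals {
  totalIndexed: number;
  totalErrors: number;
}

// Run one batch of DIDs; a rejected callback is counted as an error
// instead of being dropped.
async function processBatch(
  dids: string[],
  syncOne: (did: string) => Promise<number>,
): Promise<BatchTotals> {
  const results = await Promise.allSettled(dids.map((did) => syncOne(did)));
  const totals: BatchTotals = { totalIndexed: 0, totalErrors: 0 };
  for (const result of results) {
    if (result.status === "fulfilled") {
      totals.totalIndexed += result.value;
    } else {
      totals.totalErrors += 1; // rejected branch
    }
  }
  return totals;
}
```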
* feat(appview): add interrupted backfill resume (ATB-13)
- Add checkForInterruptedBackfill() to query backfill_progress for any in_progress row
- Add resumeBackfill() to continue a CatchUp from lastProcessedDid without re-running Phase 1
- Add gt to drizzle-orm imports for the WHERE did > lastProcessedDid predicate
- Cover both methods with 6 new tests (null result, found row, resume counts, no-op complete, isRunning cleanup, concurrency guard)
* feat(appview): integrate backfill check into FirehoseService.start() (ATB-13)
- Add BackfillManager setter/getter to FirehoseService for DI wiring
- Run checkForInterruptedBackfill and resumeBackfill before Jetstream starts
- Fall back to gap detection (checkIfNeeded/performBackfill) when no interrupted backfill
- Expose getIndexer() for BackfillManager wiring in Task 10
- Add 5 Backfill Integration tests covering CatchUp, NotNeeded, resume, no-manager, and getIndexer()
- Add missing handleBoard/handleRole handlers to Indexer mock
* feat(appview): add admin backfill endpoints (ATB-13)
- POST /api/admin/backfill: trigger backfill (202), check if needed (200), or force with ?force=catch_up|full_sync
- GET /api/admin/backfill/:id: fetch progress row with error count
- GET /api/admin/backfill/:id/errors: list per-DID errors for a backfill
- Add backfillManager field to AppContext (null until Task 10 wires it up)
- Add backfillProgress/backfillErrors cleanup to test-context for isolation
- Fix health.test.ts to include backfillManager: null in mock AppContext
- 16 tests covering auth, permissions, 409 conflict, 503 unavailable, 200/202 success cases, 404/400 errors
* feat(appview): wire BackfillManager into AppContext and startup (ATB-13)
* docs: add backfill Bruno collection and update plan (ATB-13)
- Add bruno/AppView API/Admin/ with three .bru files:
  - Trigger Backfill (POST /api/admin/backfill, ?force param docs)
  - Get Backfill Status (GET /api/admin/backfill/:id)
  - Get Backfill Errors (GET /api/admin/backfill/:id/errors)
- Mark ATB-13 complete in docs/atproto-forum-plan.md (Phase 3 entry)
- Resolve "Backfill" item in Key Risks & Open Questions
* fix(appview): address PR review feedback for ATB-13 backfill
Critical fixes:
- Wrap firehose startup backfill block in try-catch so a transient DB error
doesn't crash the entire process; stale firehose data is better than no data
- Bind error in handleReconnect bare catch{} so root cause is never silently lost
- Add isProgrammingError re-throw to per-record catch in syncRepoRecords so
code bugs (TypeError, ReferenceError) surface instead of being counted as data errors
- Add try-catch to checkForInterruptedBackfill; returns null on runtime errors
- Mark interrupted FullSync backfills as failed instead of silently no-oping;
FullSync has no checkpoint to resume from and must be re-triggered
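
The per-record re-throw can be sketched as follows. This isProgrammingError is an assumed stand-in covering only the two error types named above, and syncRecords is a simplified loop rather than the real syncRepoRecords.

```typescript
// Assumed stand-in: treats TypeError and ReferenceError as code bugs.
function isProgrammingError(error: unknown): boolean {
  return error instanceof TypeError || error instanceof ReferenceError;
}

interface SyncStats {
  recordsIndexed: number;
  errors: number;
}

function syncRecords(
  records: unknown[],
  indexOne: (record: unknown) => void,
): SyncStats {
  const stats: SyncStats = { recordsIndexed: 0, errors: 0 };
  for (const record of records) {
    try {
      indexOne(record);
      stats.recordsIndexed += 1;
    } catch (error) {
      if (isProgrammingError(error)) throw error; // surface code bugs
      stats.errors += 1; // count everything else as a data error
    }
  }
  return stats;
}
```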
Important fixes:
- Remove yourPriority/targetRolePriority from 403 response (CLAUDE.md: no internal details)
- Add isProgrammingError re-throw to GET /roles and GET /members catch blocks
- Wrap cursor load + checkIfNeeded in try-catch in POST /api/admin/backfill
- Replace parseInt with regex-validated BigInt parsing of the backfill ID to prevent silent precision loss
- Wrap batch checkpoint updates in separate try-catch so a failed checkpoint
logs a warning but does not abort the entire backfill run
- Add DID to batch failure logs for debuggability
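
A sketch of the ID validation idea from the list above, assuming the ID arrives as a raw path parameter string. parseBackfillId is a hypothetical name. For comparison, parseInt("5.9", 10) returns 5 and silently discards the fractional part.

```typescript
// Accept only plain decimal digits, then parse as BigInt so large IDs
// are not rounded the way a floating-point number would be.
function parseBackfillId(raw: string): bigint | null {
  if (!/^\d+$/.test(raw)) return null; // rejects "5.9", "-1", "12abc", ""
  return BigInt(raw);
}
```

A null result maps naturally onto a 400 response in the route handler.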
API improvement:
- Surface backfill ID in 202 response via prepareBackfillRow; the progress row
is created synchronously so the ID can be used immediately for status polling
- performBackfill now accepts optional existingRowId to skip duplicate row creation
Tests added:
- resumeBackfill with full_sync type marks row as failed (not completed)
- checkForInterruptedBackfill returns null on DB failure
- syncRepoRecords returns error stats when indexer is not set
- 403 tests for GET /backfill/:id and GET /backfill/:id/errors
- 500 error tests for both GET endpoints
- in_progress status response test for GET /backfill/:id
- Decimal backfill ID rejected (5.9 → 400)
- Invalid ?force falls through to gap detection
- 202 response now asserts id field and correct performBackfill call signature
* fix(backfill): address follow-up review feedback on ATB-13
High priority:
- firehose.ts: add isInitialStart guard to prevent backfill re-running
on Jetstream reconnects; flag cleared before try block so reconnects
are skipped even when the initial backfill throws
- firehose.test.ts: replace stub expect(true).toBe(true) with real
graceful-degradation test; add reconnect guard test
- admin.ts: switch GET /backfill/:id and GET /backfill/:id/errors catch
blocks to handleReadError for consistent error classification
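
The reconnect guard can be illustrated with a stripped-down class. This assumes reconnects re-enter start(); the class name, counters, and errors array are hypothetical, and only the isInitialStart flag comes from this commit.

```typescript
class FirehoseStartupSketch {
  private isInitialStart = true;
  private readonly runBackfill: () => Promise<void>;
  backfillAttempts = 0;
  connections = 0;
  errors: unknown[] = [];

  constructor(runBackfill: () => Promise<void>) {
    this.runBackfill = runBackfill;
  }

  async start(): Promise<void> {
    if (this.isInitialStart) {
      // Clear the flag before the try block so that a throwing backfill
      // is still skipped on later reconnects.
      this.isInitialStart = false;
      try {
        this.backfillAttempts += 1;
        await this.runBackfill();
      } catch (error) {
        this.errors.push(error); // degrade gracefully; keep the firehose up
      }
    }
    this.connections += 1; // stand-in for connecting to Jetstream
  }
}
```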
Medium priority:
- route-errors.ts: tighten safeParseJsonBody catch to re-throw anything
that is not a SyntaxError (malformed user JSON), preventing silent
swallowing of programming bugs
- packages/atproto/src/errors.ts: replace broad "query" substring with
"failed query" — the exact prefix DrizzleQueryError uses when wrapping
failed DB queries, avoiding false positives on unrelated messages
- backfill-manager.ts: persist per-collection errors to backfillErrors
table during Phase 1 (forum-owned collections) to match Phase 2 behaviour
- admin.ts GET /members: add isTruncated field to response when result
set is truncated at 100 rows
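
The tightened catch described for safeParseJsonBody can be sketched on a plain string input. The real helper's signature is not shown in this commit, so safeParseJson and its result type are illustrative assumptions.

```typescript
type ParseResult = { ok: true; value: unknown } | { ok: false };

// Only malformed JSON (SyntaxError) is treated as a user error;
// anything else is re-thrown so programming bugs are not swallowed.
function safeParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    if (error instanceof SyntaxError) return { ok: false };
    throw error;
  }
}
```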