Atlas Plan
Plan 012 (2026-02-23)

Pipeline Workflow Unification

Overview

Unify the scattered pipeline workflow into a frequency-based command structure: seed (bootstrap), sync (monthly data refresh), publish (dashboard data), and report (generate outputs). Add missing transform and publish stages, implement metadata tracking, and standardize CLI flags.

Goals

  • Reorganize CLI commands by usage frequency: seed, sync, publish, report
  • Add transform command as dbt wrapper (uv run dbt run)
  • Add publish command for DuckDB → LibSQL data sync
  • Combine format + present into unified report command with --type flag
  • Support month ranges (--month 1-3) for quarterly reports
  • Normalize --entity and --unit to uppercase (case-insensitive input)
  • Track pipeline runs in DuckDB _pipeline_runs table
  • Add staleness check (info-only warning) before report generation
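
As an illustration of the transform goal above, the dbt wrapper could be as thin as the following sketch. The helper names, the `cwd` parameter, and the pass-through `extraArgs` are assumptions made for the example; only the `uv run dbt run` command itself comes from this plan.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical shape describing a command to execute.
interface CommandSpec {
  command: string;
  args: string[];
}

// Builds the `uv run dbt run` invocation named in the plan.
// `extraArgs` is an assumed way to pass additional flags through to dbt.
function buildTransformCommand(extraArgs: string[] = []): CommandSpec {
  return { command: "uv", args: ["run", "dbt", "run", ...extraArgs] };
}

// Runs dbt synchronously and returns its exit code.
// `cwd` (the dbt project directory) is an assumed parameter.
function runTransform(cwd: string, extraArgs: string[] = []): number {
  const { command, args } = buildTransformCommand(extraArgs);
  const result = spawnSync(command, args, { cwd, stdio: "inherit" });
  if (result.error) throw result.error; // e.g. uv is not installed
  return result.status ?? 1; // a null status means the process was killed by a signal
}
```

Keeping the command builder separate from the process call makes the wrapper testable without uv or dbt installed.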

Non-Goals

  • Yearly report templates (future scope)
  • Weekly report type (future scope)
  • Interactive prompts for stale data (warn only; --force skips the check)
  • Combined flags (--publish on sync, --report on sync) — too magical
  • Upsert publish strategy (using delete + insert per month instead)

Phases

  • Phase 1: CLI restructure — new commands, flag parsing, case normalization
  • Phase 2: Transform & Publish — implement missing pipeline stages
  • Phase 3: Report unification — combine format + present, add --type
  • Phase 4: Metadata tracking — _pipeline_runs table and staleness checks
  • Phase 5: Documentation — update AGENTS.md, architecture.md, workflow docs

Success

  • pnpm seed --entity ions seeds lookup tables (case-insensitive)
  • pnpm sync --entity IONS --year 2026 runs extract → load → validate → transform
  • pnpm publish --entity IONS --year 2026 --month 2 pushes data to LibSQL
  • pnpm report --entity IONS --year 2026 --month 2 --type monthly generates report.json + PPTX/PDF
  • pnpm report --entity IONS --year 2026 --month 1-3 --type quarterly generates Q1 report
  • Pipeline runs tracked in _pipeline_runs table with timestamps
  • Staleness warning shown before report if data is stale
  • --force flag skips staleness check
  • Documentation updated (AGENTS.md, architecture.md)
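
The report invocations listed above suggest a small, strict flag parser. The sketch below uses Node's built-in `parseArgs`; the `ReportFlags` shape, the required-flag rule, and the default of `monthly` when `--type` is omitted are assumptions, not decisions recorded in this plan.

```typescript
import { parseArgs } from "node:util";

type ReportType = "monthly" | "quarterly";

// Hypothetical parsed-flag shape for the report command.
interface ReportFlags {
  entity: string;
  year: number;
  month: string; // raw value such as "2" or "1-3"; range parsing happens elsewhere
  type: ReportType;
  force: boolean;
}

function parseReportFlags(argv: string[]): ReportFlags {
  // parseArgs is strict by default, so unknown flags throw.
  const { values } = parseArgs({
    args: argv,
    options: {
      entity: { type: "string" },
      year: { type: "string" },
      month: { type: "string" },
      type: { type: "string" },
      force: { type: "boolean" },
    },
  });

  // Assumption: all three flags are mandatory for report.
  if (!values.entity || !values.year || !values.month) {
    throw new Error("--entity, --year and --month are required");
  }
  const year = Number(values.year);
  if (!Number.isInteger(year)) throw new Error(`Invalid --year: ${values.year}`);

  const type = values.type ?? "monthly"; // assumed default
  if (type !== "monthly" && type !== "quarterly") {
    throw new Error(`Unsupported --type: ${type}`);
  }

  return {
    entity: values.entity.toUpperCase(), // normalize at parse time
    year,
    month: values.month,
    type,
    force: values.force ?? false,
  };
}
```

In a real command the input would typically be `process.argv.slice(2)`.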

Requirements

  • Existing @packages/pipeline package structure
  • DuckDB node API for _pipeline_runs table
  • @libsql/client for publish operations (already a dependency)
  • uv installed for dbt execution
  • Drizzle schema for target LibSQL tables (commerce_order, finance_transaction, etc.)

Context

Why This Approach

  • Frequency-based grouping matches the user's mental model (bootstrap vs recurring vs on-demand)
  • Separating sync from publish gives control over when the dashboard updates
  • Delete + insert per month is simpler and handles record deletions automatically
  • Metadata in DuckDB keeps pipeline self-contained (no external dependencies)
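
A minimal sketch of the delete + insert strategy for one month follows. The `commerce_order` table is named in this plan, but its columns (`entity`, `year`, `month`, `total`) are invented for illustration. The `BatchClient` interface is a local stand-in that mirrors the shape of `batch(statements, "write")` in `@libsql/client`, which runs the statements in a single transaction; treat the exact wiring as an assumption.

```typescript
// Local structural types so the sketch needs no imports.
interface Statement {
  sql: string;
  args: Array<string | number | null>;
}
interface BatchClient {
  batch(stmts: Statement[], mode: "write"): Promise<unknown>;
}

interface MonthScope { entity: string; year: number; month: number; }
interface OrderRow { id: string; total: number; } // hypothetical columns

// Returns one DELETE for the month followed by one INSERT per row.
// An empty row set yields no statements, so publishing is a no-op.
function buildPublishStatements(scope: MonthScope, rows: OrderRow[]): Statement[] {
  if (rows.length === 0) return [];
  const del: Statement = {
    sql: "DELETE FROM commerce_order WHERE entity = ? AND year = ? AND month = ?",
    args: [scope.entity, scope.year, scope.month],
  };
  const inserts = rows.map((row): Statement => ({
    sql: "INSERT INTO commerce_order (id, entity, year, month, total) VALUES (?, ?, ?, ?, ?)",
    args: [row.id, scope.entity, scope.year, scope.month, row.total],
  }));
  return [del, ...inserts];
}

// Sends all statements as one batch; returns the number of rows published.
async function publishMonth(client: BatchClient, scope: MonthScope, rows: OrderRow[]): Promise<number> {
  const statements = buildPublishStatements(scope, rows);
  if (statements.length === 0) return 0;
  await client.batch(statements, "write");
  return rows.length;
}
```

Because the month is deleted before it is re-inserted, rows that disappeared upstream vanish from the target without any diffing.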

Key Constraints

  • dbt runs all models (no per-month filtering) — transform is always full refresh
  • LibSQL publish requires FK resolution (denormalized dbt output → normalized LibSQL schema)
  • Present service uses pptxgenjs constructor interop (existing quirk from Plan 011)

Edge Cases

  • Month range --month 1-3 should validate start <= end
  • Quarterly report with missing month data should warn, not fail
  • Entity/unit normalization should handle mixed case (ions, IONS, Ions)
  • Publish with no data for month should be a no-op (not error)
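
The month-range and normalization cases above can be sketched as two small pure functions. The names `parseMonthRange` and `normalizeCode` are hypothetical, and trimming whitespace is an added assumption.

```typescript
interface MonthRange {
  start: number;
  end: number;
}

// Accepts "2" (single month) or "1-3" (inclusive range).
function parseMonthRange(input: string): MonthRange {
  const match = /^(\d{1,2})(?:-(\d{1,2}))?$/.exec(input.trim());
  if (!match) throw new Error(`Invalid --month value: "${input}"`);

  const start = Number(match[1]);
  const end = match[2] === undefined ? start : Number(match[2]);

  for (const month of [start, end]) {
    if (month < 1 || month > 12) throw new Error(`Month out of range: ${month}`);
  }
  if (start > end) throw new Error(`Month range start (${start}) must be <= end (${end})`);

  return { start, end };
}

// Uppercases --entity / --unit values so ions, IONS and Ions are equivalent.
function normalizeCode(value: string): string {
  return value.trim().toUpperCase();
}
```

A single month is represented as a range whose start and end are equal, so downstream code only has to handle one shape.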

Tradeoffs

  • Transform always runs all models (acceptable — dbt is fast, idempotent)
  • Delete + insert may briefly show incomplete data (acceptable — operation is fast)
  • Staleness check is info-only (user requested no interactive prompts for now)
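
One possible shape for the info-only staleness check is shown below. The plan does not define what counts as stale, so the `maxAgeDays` threshold and the message wording are assumptions; the function returns a warning string or `null` and never blocks.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface StalenessOptions {
  maxAgeDays: number; // hypothetical threshold, not specified in the plan
  force: boolean;     // --force skips the check entirely
}

// Returns a warning message when data looks stale, otherwise null.
// `lastSyncAt` would come from the most recent sync row in _pipeline_runs.
function stalenessWarning(
  lastSyncAt: Date | null,
  now: Date,
  options: StalenessOptions,
): string | null {
  if (options.force) return null;
  if (lastSyncAt === null) {
    return "No sync run recorded; report data may be missing.";
  }
  const ageDays = Math.floor((now.getTime() - lastSyncAt.getTime()) / DAY_MS);
  if (ageDays > options.maxAgeDays) {
    return `Data was last synced ${ageDays} days ago; consider running sync first.`;
  }
  return null;
}
```

The caller would print any returned message and then continue with report generation.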

Skills

  • None required — this is core pipeline TypeScript work

Boundaries

  • Always: Run publish in a transaction (atomic delete + insert)
  • Always: Normalize entity/unit to uppercase at parse time
  • Always: Log pipeline runs to _pipeline_runs table
  • Ask first: Schema changes to LibSQL tables (may need migrations)
  • Ask first: Changes to existing root package.json scripts
  • Never: Delete existing commands without deprecation path
  • Never: Change dbt model outputs (only read from them)
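
The "always log pipeline runs" boundary could be enforced by wrapping every stage in a helper like the sketch below. The `PipelineRun` fields and the injected `RunRecorder` are assumptions; the actual `_pipeline_runs` columns and the DuckDB write are left to the caller.

```typescript
type RunStatus = "success" | "failed";

// Hypothetical record shape; the real _pipeline_runs columns are not defined here.
interface PipelineRun {
  command: string;
  entity: string;
  startedAt: Date;
  finishedAt: Date;
  status: RunStatus;
}

// Injected writer, e.g. a function that inserts one row into _pipeline_runs.
type RunRecorder = (run: PipelineRun) => Promise<void>;

// Runs a stage and records exactly one row whether it succeeds or throws.
async function withRunTracking<T>(
  command: string,
  entity: string,
  record: RunRecorder,
  stage: () => Promise<T>,
): Promise<T> {
  const startedAt = new Date();
  let status: RunStatus = "failed";
  try {
    const result = await stage();
    status = "success";
    return result;
  } finally {
    // Recorded on both paths; an error from the stage still propagates.
    await record({ command, entity, startedAt, finishedAt: new Date(), status });
  }
}
```

Injecting the recorder keeps the wrapper independent of the database client and easy to test with an in-memory array.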

Questions

  • Should clean-report command be kept as-is or folded into report? → Keep as-is (different purpose: QA workbook vs presentation)
  • Should run command be deprecated or kept as alias? → Deprecate with warning (backward compatible, guides users to sync)
