Plans002 2026 02 20 Sync and Source Config
Set Up @packages/sync + Source Config
Overview
Scaffold the @packages/sync package and establish the per-year YAML source config system. The sync layer reads config files from source/config/, resolves file paths and sheet names per year, and loads raw data into DuckDB using native read_xlsx. Also seeds LibSQL lookup tables from registry values.
Result: running bun run sync --source xlsx --entity IONS --year 2026 populates atlas.db with all raw tables (raw_transactions, raw_students, raw_organizations, raw_targets, raw_marketing_activity).
Goals
- Create
source/config/ions-YYYY.yamlfiles for 2023–2026 encoding all per-year file/sheet/column variations - Scaffold
@packages/syncas a Bun workspace package with correct tsconfig, eslint, and build config - Implement DuckDB connection + raw table creation (
duck.ts) - Implement config loader (
config.ts) that reads YAML and resolves paths - Implement xlsx sync runner (
xlsx.ts) that callsread_xlsxfor each configured sheet - Implement
seed.tsto seed LibSQL lookup tables from registry constants - Wire CLI (
index.ts) with--source,--entity,--year,--monthflags - Verify:
bun run sync --source xlsx --entity IONS --year 2026completes without error and raw tables exist inatlas.db
Non-Goals
- Running dbt transforms (Plan 003)
- Populating the LibSQL operational tables with order/person data (Plan 005)
- Odoo adapter (future)
- Syncing years other than 2023–2026 (config files only cover known years)
- CI/CD pipeline
Phases
- Phase 1: Source config files —
source/config/ions-YYYY.yamlfor all 4 years - Phase 2: Package scaffold —
@packages/syncconfig files (package.json, tsconfig, eslint) - Phase 3: Core sync code — duck.ts, config.ts, xlsx.ts, seed.ts
- Phase 4: CLI + integration — index.ts wired, root scripts updated, type-check + run verification
Success
-
source/config/ions-2026.yamlexists with correct file, sheet, column, and header_row entries -
source/config/ions-2025.yaml,ions-2024.yaml,ions-2023.yamlexist with their year-specific variations -
@packages/syncadded to workspace packages in rootpackage.json -
bun run test:type --filter @packages/syncpasses -
bun run sync --source xlsx --entity IONS --year 2026exits 0 -
atlas.dbcontainsraw_transactions,raw_students,raw_organizations,raw_targets,raw_marketing_activityafter sync
Requirements
- DuckDB node bindings available (
duckdb-asyncor@duckdb/node-api) js-yamlor equivalent for YAML parsing in Bun@libsql/clientfor LibSQL seeding (seed.ts)@repo/typescript/libtsconfig preset@repo/lint/baseESLint configsource/directory with the 8 xlsx files already present@plan/registry.mdas source of truth for seed values
Context
Why This Approach
- Per-year YAML config files decouple column name variations from code — adding a new year only requires a new config file, no code changes
- DuckDB
read_xlsxhandles extraction natively; no parsing code required header_rowis configurable per file+year because 2023 LAPORAN EVALUASI starts at row 6 vs row 7 in other years- Config-driven sheet name resolution handles the
FEBRUARI 23(2023, space) vsFEBRUARI26(2026, no space) naming differences - LibSQL seeding in this layer (seed.ts) because lookup table values come from
registry.mdwhich is independent of dbt
Key Constraints
- Raw layer preserves source column names (Indonesian) exactly — no renaming in sync layer; all renaming happens in dbt staging
read_xlsxin DuckDB requiresheader := trueand aheader_rowoffset for files where data doesn't start at row 1- 2023 DATABASE SISWA: header row is row 7; monthly sheet names include a space (
JANUARI 23) - 2026 LAPORAN EVALUASI: adds
NO. KWITANSI(receipt number) andNAMA KELAScolumns — 17 cols vs 15 in 2024/2025 - 2026 DATABASE SISWA:
BULAN/INTAKEsplit intoBULAN+INTAKE(separate columns);TANGGAL/HARI/JAMrenamed toTANGGAL COMBINEDsheets in LAPORAN EVALUASI 2024–2026 contain#REF!errors — do not load these sheetsOLAH *,SUMMARY *,Sheet*tabs are internal computation sheets — do not load
Edge Cases
- 2026 DATABASE SISWA: OLAH/summary tabs for months 3–12 still carry
25suffix (copy error) — only load the data sheets (MARET26, etc.), not the OLAH tabs Summary2026sheet label says2025(copy error) — ignore, not loaded- If a monthly sheet is missing (e.g. future months), skip gracefully with a warning
--monthflag should limit loading to a single month sheet when provided (faster for incremental updates)
Tradeoffs
- YAML over TypeScript config: YAML is more readable for non-developers editing year configs; TypeScript config would give type safety but requires a build step to edit
- Loading all months per year (default): complete historical data in one run;
--monthflag available for incremental seed.tsis idempotent (upsert): safe to run multiple times without duplicating lookup data
Skills
plan— plan file format and conventions
Boundaries
- Always: Update Progress.md after each task completion
- Always: Preserve Indonesian source column names in raw tables — never rename in sync layer
- Always: Use
catalog:references in package.json, never pin versions directly - Always: Use
workspace:*for@repo/lintand@repo/typescript - Ask first: Any deviation from YAML config file format
- Ask first: Adding dependencies not discussed in planning
- Never: Apply business logic or column renaming in the sync layer
- Never: Load
COMBINED,OLAH *,SUMMARY *, orSheet*tabs - Never: Run dbt or write to LibSQL operational tables (beyond seed.ts lookups)
Questions
- Config file format? → YAML in
source/config/ions-YYYY.yaml - Which DuckDB node binding? →
duckdb-async(already used in architecture examples) - YAML parser? →
js-yaml(lightweight, Bun-compatible) - Should
seed.tsbe idempotent (upsert) or error on duplicate? → Upsert/replace - Should the CLI also accept
--monthfor single-month incremental sync? → Yes