Atlas Plan

Analytics

Analytics

The analytics layer for Atlas. Covers the DuckDB analytical database, the dbt transform pipeline, and all mart definitions.

For the overall pipeline context see architecture.md. For the operational database schema see model.md.


Overview

The analytics layer is a Modern Data Stack pipeline built on DuckDB and dbt-core. It implements the ELT pattern — raw source data is loaded first, then transformed through layered SQL models into analysis-ready marts that power the Format layer and the dashboard.


DuckDB

Role

DuckDB is the analytical engine. It runs in-process — no server required. All four transform layers (raw, staging, intermediate, mart) live in a single DuckDB file (atlas.db).

Why DuckDB

CapabilityUsage in Atlas
Native xlsx readingread_xlsx() loads source files directly — no parsing code needed
In-process OLAPRuns embedded in Bun scripts and TanStack Start server functions
Fast aggregationsRevenue totals, year-over-year comparisons, cohort counts — vectorized execution
Full SQLWindow functions, CTEs, PIVOT, UNPIVOT, QUALIFY — used extensively in mart models
dbt compatibledbt-duckdb adapter is the official DuckDB dbt backend
Local fileatlas.db — no network dependency for local development

Connection

import { Database } from 'duckdb-async'

const db = await Database.create('atlas.db')
const conn = await db.connect()

Raw layer setup

Each xlsx source file maps to one or more raw tables. Tables are created or replaced on each sync run — the raw layer always reflects the latest source data.

-- Transactions ledger
CREATE OR REPLACE TABLE raw_transactions AS
SELECT * FROM read_xlsx(
  'source/LAPORAN EVALUASI IONs 2026 TM-WLC.xlsx',
  sheet := 'FEBRUARI',
  header := true
);

-- Student database
CREATE OR REPLACE TABLE raw_students AS
SELECT * FROM read_xlsx(
  'source/DATABASE SISWA IONS 2026 TM-WLC.xlsx',
  sheet := 'FEBRUARI26',
  header := true
);

dbt Project

Setup

The dbt project lives at @packages/transform/. It uses the dbt-duckdb adapter and is managed via uv.

# Install
cd @packages/transform
uv venv
uv pip install -r requirements.txt

# Run all models
uv run dbt run

# Run specific layer
uv run dbt run --select staging
uv run dbt run --select intermediate
uv run dbt run --select marts

# Test
uv run dbt test

# Generate and serve docs
uv run dbt docs generate
uv run dbt docs serve

profiles.yml

atlas:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: ../../atlas.db
      threads: 4

dbt_project.yml

name: atlas
version: '1.0.0'
profile: atlas

model-paths: ['models']
test-paths: ['tests']

models:
  atlas:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table

Staging and intermediate models are materialized as views (no storage cost, always fresh). Mart models are materialized as tables (pre-aggregated, fast reads for the dashboard and Format layer).


Transform Layers

Layer 1 — Raw (raw_*)

Exact copy of source data. No column renaming, no type casting, no filtering. Created directly by the sync layer via DuckDB's read_xlsx().

TableSource fileSource sheet
raw_transactionsLAPORAN EVALUASI IONs [year] TM-WLC.xlsxMonthly ledger sheets (JANUARI, FEBRUARI, …)
raw_studentsDATABASE SISWA IONS [year] TM-WLC.xlsxMonthly enrollment sheets (JANUARI26, FEBRUARI26, …)
raw_organizationsDATABASE SISWA IONS [year] TM-WLC.xlsxREFERENSI sheet
raw_targetsLAPORAN EVALUASI IONs [year] TM-WLC.xlsxEVALUASI DATA SISWA sheet
raw_marketing_activityLAPORAN EVALUASI IONs [year] TM-WLC.xlsxREKAP [month] sheets

Layer 2 — Staging (stg_*)

Light cleaning only. Column names translated from Indonesian to English (matching glossary.md terms). Types cast. Rows deduplicated. No joins, no business logic.

Naming rule: one staging model per raw table. raw_transactionsstg_transactions.

stg_transactions

SELECT
    "NO. KWITANSI"              AS receipt_number,
    TRY_CAST("TANGGAL" AS DATE) AS date,
    "TENANT"                    AS unit_code,
    "PROGRAM"                   AS product_name,
    "NAMA KELAS"                AS variant_name,
    "INTAKE"                    AS period_label,
    "NAMA SISWA"                AS student_name,
    TRY_CAST("JUMLAH" AS DECIMAL(18,2)) AS amount,
    "PEMBAYARAN VIA"            AS payment_channel,
    "CASH/TRANSFER/EDC/QRIS"    AS payment_method_name,
    "BARU/LANJUT/ALUMNI"        AS customer_type_name,
    "LUNAS/ANGSUR"              AS payment_status_name,
    "KATEGORI PRODUCT"          AS product_category,
    "BARSHEET"                  AS batch_ref
FROM raw_transactions
WHERE "NO. KWITANSI" IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY "NO. KWITANSI" ORDER BY "TANGGAL") = 1

stg_students

SELECT
    "NAMA"                      AS full_name,
    "TANGGAL"                   AS enrollment_date,
    "ORGANISASI"                AS organization_name,
    "DIVISI/PRODUCT"            AS unit_code,
    "KLUSTER USIA"              AS age_cluster,
    "TIPE KURSUS"               AS course_type_name,
    "PROGRAM/LAYANAN"           AS product_name,
    "KELAS/LEVEL"               AS variant_name,
    "INTAKE"                    AS period_label,
    "SUMBER INFORMASI"          AS channel_name,
    "BARU/LANJUT/ALUMNI"        AS customer_type_name,
    "CARA BAYAR"                AS payment_channel,
    "JENIS KELAMIN"             AS sex_code
FROM raw_students
WHERE "NAMA" IS NOT NULL

stg_targets

SELECT
    "TENANT"                              AS unit_code,
    TRY_CAST("TARGET REVENUE" AS DECIMAL(18,2))  AS revenue_target,
    TRY_CAST("TARGET SISWA STANDAR" AS DECIMAL)  AS student_target,
    TRY_CAST("REALISASI REVENUE" AS DECIMAL(18,2)) AS revenue_actuals,
    TRY_CAST("REALISASI SISWA RIIL" AS INTEGER)  AS student_actuals
FROM raw_targets
WHERE "TENANT" IS NOT NULL

stg_marketing_activity

SELECT
    "TENANT"                        AS unit_code,
    period_label,
    week_number,
    "IG"                            AS leads_ig,
    "WA"                            AS leads_wa,
    "EMAIL"                         AS leads_email,
    "TELEMARKETING"                 AS leads_telemarketing,
    "GTS"                           AS leads_gts,
    "BRANDING"                      AS leads_branding,
    "FREE CLASS"                    AS leads_free_class
FROM raw_marketing_activity
WHERE "TENANT" IS NOT NULL

Layer 3 — Intermediate (int_*)

Business logic applied. Joins across staging tables. FK resolution. Customer type classification. No aggregation.

int_orders

Joins transactions with students to produce one row per order with all relevant dimensions.

WITH base AS (
    SELECT
        t.receipt_number,
        t.date,
        t.unit_code,
        t.product_name,
        t.variant_name,
        t.amount,
        t.payment_channel,
        t.payment_method_name,
        t.payment_status_name,
        t.customer_type_name,
        t.product_category,
        t.batch_ref,
        s.organization_name,
        s.age_cluster,
        s.course_type_name,
        s.channel_name,
        s.sex_code,
        s.full_name AS student_name,
        -- Extract period year and month from period_label
        CAST(SPLIT_PART(t.period_label, ' ', 2) AS INTEGER) AS period_year,
        CASE SPLIT_PART(t.period_label, ' ', 1)
            WHEN 'JANUARI'   THEN 1  WHEN 'FEBRUARI'  THEN 2
            WHEN 'MARET'     THEN 3  WHEN 'APRIL'     THEN 4
            WHEN 'MEI'       THEN 5  WHEN 'JUNI'      THEN 6
            WHEN 'JULI'      THEN 7  WHEN 'AGUSTUS'   THEN 8
            WHEN 'SEPTEMBER' THEN 9  WHEN 'OKTOBER'   THEN 10
            WHEN 'NOVEMBER'  THEN 11 WHEN 'DESEMBER'  THEN 12
        END AS period_month,
        -- Normalize customer type
        CASE customer_type_name
            WHEN 'B' THEN 'New'
            WHEN 'L' THEN 'Renewal'
            WHEN 'A' THEN 'Alumni'
        END AS customer_type,
        -- Normalize unit code
        CASE unit_code
            WHEN 'WLC INGGRIS'     THEN 'WLC_ENGLISH'
            WHEN 'WLC NON INGGRIS' THEN 'WLC_NON_ENGLISH'
            ELSE unit_code
        END AS unit_code_normalized
    FROM stg_transactions t
    LEFT JOIN stg_students s
        ON t.receipt_number = s.receipt_number
)
SELECT * FROM base
WHERE receipt_number IS NOT NULL

int_enrollments

Filters int_orders to Enrollment-type orders only (excludes book sales, event fees, etc.).

SELECT *
FROM int_orders
WHERE product_category NOT IN ('BUKU REAL', 'BUKU WLC', 'BUKU MANDARIN', 'EVENT - WLC',
                                'EVENT - ICC', 'SIMULASI JLPT', 'SIMULASI HSK',
                                'KIRIM - REAL', 'KIRIM - TM', 'TOP UP POIN')
  AND amount > 0

Layer 4 — Marts (mart_*)

Pre-aggregated, analysis-ready tables. One mart per report section. All mart models are materialized as tables for fast dashboard and Format layer reads.


mart_revenue

Powers the Revenue Comparison slide and dashboard revenue view.

SELECT
    unit_code_normalized                                    AS unit_code,
    period_year,
    period_month,
    SUM(amount)                                             AS revenue_actuals,
    COUNT(DISTINCT receipt_number)                          AS transaction_count,
    -- Period labels for presentation layer
    period_year || '-' || LPAD(period_month::VARCHAR, 2, '0') AS period
FROM int_orders
GROUP BY unit_code_normalized, period_year, period_month

Joined with targets at query time (in the Format layer):

SELECT
    r.*,
    t.revenue_target,
    t.revenue_target - r.revenue_actuals AS gap_to_target,
    ROUND(r.revenue_actuals / t.revenue_target * 100, 2) AS achievement_pct
FROM mart_revenue r
LEFT JOIN stg_targets t
    ON r.unit_code = t.unit_code
    AND r.period_year = {{ var("year") }}
    AND r.period_month = {{ var("month") }}
WHERE r.period_year = {{ var("year") }}
  AND r.period_month = {{ var("month") }}
  AND r.unit_code = '{{ var("unit") }}'

mart_program_progress

Powers the Key Comparison and Program Progress slides.

SELECT
    unit_code_normalized                AS unit_code,
    product_name,
    product_category,
    course_type_name,
    period_year,
    period_month,
    customer_type,
    COUNT(DISTINCT receipt_number)      AS order_count,
    COUNT(DISTINCT student_name)        AS student_count,
    SUM(amount)                         AS revenue
FROM int_enrollments
GROUP BY
    unit_code_normalized,
    product_name,
    product_category,
    course_type_name,
    period_year,
    period_month,
    customer_type

mart_channel_marketing

Powers the Channel Marketing slide and dashboard marketing view.

SELECT
    unit_code_normalized                AS unit_code,
    channel_name,
    period_year,
    period_month,
    customer_type,
    COUNT(DISTINCT receipt_number)      AS closings,
    COUNT(DISTINCT student_name)        AS students,
    ROUND(
        COUNT(DISTINCT receipt_number) * 100.0 /
        SUM(COUNT(DISTINCT receipt_number)) OVER (
            PARTITION BY unit_code_normalized, period_year, period_month
        ), 2
    )                                   AS contribution_pct
FROM int_enrollments
WHERE channel_name IS NOT NULL
GROUP BY
    unit_code_normalized,
    channel_name,
    period_year,
    period_month,
    customer_type

mart_school_progress

Powers the School Progress slide — shows which organizations produced students each year.

SELECT
    unit_code_normalized                AS unit_code,
    organization_name,
    period_year,
    COUNT(DISTINCT student_name)        AS student_count
FROM int_enrollments
WHERE organization_name IS NOT NULL
GROUP BY
    unit_code_normalized,
    organization_name,
    period_year

Pivoted at query time in the Format layer to produce the year × organization matrix shown in the presentation.


dbt Tests

Each model has built-in dbt tests defined in schema.yml files.

# models/staging/schema.yml
models:
  - name: stg_transactions
    columns:
      - name: receipt_number
        tests:
          - not_null
          - unique
      - name: amount
        tests:
          - not_null
      - name: unit_code
        tests:
          - accepted_values:
              values: ['REAL', 'WLC INGGRIS', 'WLC NON INGGRIS', 'TM', 'NP', 'ICC', 'ICA', 'DTC', 'IONs']

  - name: stg_targets
    columns:
      - name: unit_code
        tests:
          - not_null
          - unique

Lineage

Full data lineage from source to mart: