Analytics
Analytics
The analytics layer for Atlas. Covers the DuckDB analytical database, the dbt transform pipeline, and all mart definitions.
For the overall pipeline context see architecture.md. For the operational database schema see model.md.
Overview
The analytics layer is a Modern Data Stack pipeline built on DuckDB and dbt-core. It implements the ELT pattern — raw source data is loaded first, then transformed through layered SQL models into analysis-ready marts that power the Format layer and the dashboard.
DuckDB
Role
DuckDB is the analytical engine. It runs in-process — no server required. All four transform layers (raw, staging, intermediate, mart) live in a single DuckDB file (atlas.db).
Why DuckDB
| Capability | Usage in Atlas |
|---|---|
| Native xlsx reading | read_xlsx() loads source files directly — no parsing code needed |
| In-process OLAP | Runs embedded in Bun scripts and TanStack Start server functions |
| Fast aggregations | Revenue totals, year-over-year comparisons, cohort counts — vectorized execution |
| Full SQL | Window functions, CTEs, PIVOT, UNPIVOT, QUALIFY — used extensively in mart models |
| dbt compatible | dbt-duckdb adapter is the official DuckDB dbt backend |
| Local file | atlas.db — no network dependency for local development |
Connection
import { Database } from 'duckdb-async'
const db = await Database.create('atlas.db')
const conn = await db.connect()Raw layer setup
Each xlsx source file maps to one or more raw tables. Tables are created or replaced on each sync run — the raw layer always reflects the latest source data.
-- Transactions ledger
CREATE OR REPLACE TABLE raw_transactions AS
SELECT * FROM read_xlsx(
'source/LAPORAN EVALUASI IONs 2026 TM-WLC.xlsx',
sheet := 'FEBRUARI',
header := true
);
-- Student database
CREATE OR REPLACE TABLE raw_students AS
SELECT * FROM read_xlsx(
'source/DATABASE SISWA IONS 2026 TM-WLC.xlsx',
sheet := 'FEBRUARI26',
header := true
);dbt Project
Setup
The dbt project lives at @packages/transform/. It uses the dbt-duckdb adapter and is managed via uv.
# Install
cd @packages/transform
uv venv
uv pip install -r requirements.txt
# Run all models
uv run dbt run
# Run specific layer
uv run dbt run --select staging
uv run dbt run --select intermediate
uv run dbt run --select marts
# Test
uv run dbt test
# Generate and serve docs
uv run dbt docs generate
uv run dbt docs serveprofiles.yml
atlas:
target: dev
outputs:
dev:
type: duckdb
path: ../../atlas.db
threads: 4dbt_project.yml
name: atlas
version: '1.0.0'
profile: atlas
model-paths: ['models']
test-paths: ['tests']
models:
atlas:
staging:
+materialized: view
intermediate:
+materialized: view
marts:
+materialized: tableStaging and intermediate models are materialized as views (no storage cost, always fresh). Mart models are materialized as tables (pre-aggregated, fast reads for the dashboard and Format layer).
Transform Layers
Layer 1 — Raw (raw_*)
Exact copy of source data. No column renaming, no type casting, no filtering. Created directly by the sync layer via DuckDB's read_xlsx().
| Table | Source file | Source sheet |
|---|---|---|
raw_transactions | LAPORAN EVALUASI IONs [year] TM-WLC.xlsx | Monthly ledger sheets (JANUARI, FEBRUARI, …) |
raw_students | DATABASE SISWA IONS [year] TM-WLC.xlsx | Monthly enrollment sheets (JANUARI26, FEBRUARI26, …) |
raw_organizations | DATABASE SISWA IONS [year] TM-WLC.xlsx | REFERENSI sheet |
raw_targets | LAPORAN EVALUASI IONs [year] TM-WLC.xlsx | EVALUASI DATA SISWA sheet |
raw_marketing_activity | LAPORAN EVALUASI IONs [year] TM-WLC.xlsx | REKAP [month] sheets |
Layer 2 — Staging (stg_*)
Light cleaning only. Column names translated from Indonesian to English (matching glossary.md terms). Types cast. Rows deduplicated. No joins, no business logic.
Naming rule: one staging model per raw table. raw_transactions → stg_transactions.
stg_transactions
SELECT
"NO. KWITANSI" AS receipt_number,
TRY_CAST("TANGGAL" AS DATE) AS date,
"TENANT" AS unit_code,
"PROGRAM" AS product_name,
"NAMA KELAS" AS variant_name,
"INTAKE" AS period_label,
"NAMA SISWA" AS student_name,
TRY_CAST("JUMLAH" AS DECIMAL(18,2)) AS amount,
"PEMBAYARAN VIA" AS payment_channel,
"CASH/TRANSFER/EDC/QRIS" AS payment_method_name,
"BARU/LANJUT/ALUMNI" AS customer_type_name,
"LUNAS/ANGSUR" AS payment_status_name,
"KATEGORI PRODUCT" AS product_category,
"BARSHEET" AS batch_ref
FROM raw_transactions
WHERE "NO. KWITANSI" IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY "NO. KWITANSI" ORDER BY "TANGGAL") = 1stg_students
SELECT
"NAMA" AS full_name,
"TANGGAL" AS enrollment_date,
"ORGANISASI" AS organization_name,
"DIVISI/PRODUCT" AS unit_code,
"KLUSTER USIA" AS age_cluster,
"TIPE KURSUS" AS course_type_name,
"PROGRAM/LAYANAN" AS product_name,
"KELAS/LEVEL" AS variant_name,
"INTAKE" AS period_label,
"SUMBER INFORMASI" AS channel_name,
"BARU/LANJUT/ALUMNI" AS customer_type_name,
"CARA BAYAR" AS payment_channel,
"JENIS KELAMIN" AS sex_code
FROM raw_students
WHERE "NAMA" IS NOT NULLstg_targets
SELECT
"TENANT" AS unit_code,
TRY_CAST("TARGET REVENUE" AS DECIMAL(18,2)) AS revenue_target,
TRY_CAST("TARGET SISWA STANDAR" AS DECIMAL) AS student_target,
TRY_CAST("REALISASI REVENUE" AS DECIMAL(18,2)) AS revenue_actuals,
TRY_CAST("REALISASI SISWA RIIL" AS INTEGER) AS student_actuals
FROM raw_targets
WHERE "TENANT" IS NOT NULLstg_marketing_activity
SELECT
"TENANT" AS unit_code,
period_label,
week_number,
"IG" AS leads_ig,
"WA" AS leads_wa,
"EMAIL" AS leads_email,
"TELEMARKETING" AS leads_telemarketing,
"GTS" AS leads_gts,
"BRANDING" AS leads_branding,
"FREE CLASS" AS leads_free_class
FROM raw_marketing_activity
WHERE "TENANT" IS NOT NULLLayer 3 — Intermediate (int_*)
Business logic applied. Joins across staging tables. FK resolution. Customer type classification. No aggregation.
int_orders
Joins transactions with students to produce one row per order with all relevant dimensions.
WITH base AS (
SELECT
t.receipt_number,
t.date,
t.unit_code,
t.product_name,
t.variant_name,
t.amount,
t.payment_channel,
t.payment_method_name,
t.payment_status_name,
t.customer_type_name,
t.product_category,
t.batch_ref,
s.organization_name,
s.age_cluster,
s.course_type_name,
s.channel_name,
s.sex_code,
s.full_name AS student_name,
-- Extract period year and month from period_label
CAST(SPLIT_PART(t.period_label, ' ', 2) AS INTEGER) AS period_year,
CASE SPLIT_PART(t.period_label, ' ', 1)
WHEN 'JANUARI' THEN 1 WHEN 'FEBRUARI' THEN 2
WHEN 'MARET' THEN 3 WHEN 'APRIL' THEN 4
WHEN 'MEI' THEN 5 WHEN 'JUNI' THEN 6
WHEN 'JULI' THEN 7 WHEN 'AGUSTUS' THEN 8
WHEN 'SEPTEMBER' THEN 9 WHEN 'OKTOBER' THEN 10
WHEN 'NOVEMBER' THEN 11 WHEN 'DESEMBER' THEN 12
END AS period_month,
-- Normalize customer type
CASE customer_type_name
WHEN 'B' THEN 'New'
WHEN 'L' THEN 'Renewal'
WHEN 'A' THEN 'Alumni'
END AS customer_type,
-- Normalize unit code
CASE unit_code
WHEN 'WLC INGGRIS' THEN 'WLC_ENGLISH'
WHEN 'WLC NON INGGRIS' THEN 'WLC_NON_ENGLISH'
ELSE unit_code
END AS unit_code_normalized
FROM stg_transactions t
LEFT JOIN stg_students s
ON t.receipt_number = s.receipt_number
)
SELECT * FROM base
WHERE receipt_number IS NOT NULLint_enrollments
Filters int_orders to Enrollment-type orders only (excludes book sales, event fees, etc.).
SELECT *
FROM int_orders
WHERE product_category NOT IN ('BUKU REAL', 'BUKU WLC', 'BUKU MANDARIN', 'EVENT - WLC',
'EVENT - ICC', 'SIMULASI JLPT', 'SIMULASI HSK',
'KIRIM - REAL', 'KIRIM - TM', 'TOP UP POIN')
AND amount > 0Layer 4 — Marts (mart_*)
Pre-aggregated, analysis-ready tables. One mart per report section. All mart models are materialized as tables for fast dashboard and Format layer reads.
mart_revenue
Powers the Revenue Comparison slide and dashboard revenue view.
SELECT
unit_code_normalized AS unit_code,
period_year,
period_month,
SUM(amount) AS revenue_actuals,
COUNT(DISTINCT receipt_number) AS transaction_count,
-- Period labels for presentation layer
period_year || '-' || LPAD(period_month::VARCHAR, 2, '0') AS period
FROM int_orders
GROUP BY unit_code_normalized, period_year, period_monthJoined with targets at query time (in the Format layer):
SELECT
r.*,
t.revenue_target,
t.revenue_target - r.revenue_actuals AS gap_to_target,
ROUND(r.revenue_actuals / t.revenue_target * 100, 2) AS achievement_pct
FROM mart_revenue r
LEFT JOIN stg_targets t
ON r.unit_code = t.unit_code
AND r.period_year = {{ var("year") }}
AND r.period_month = {{ var("month") }}
WHERE r.period_year = {{ var("year") }}
AND r.period_month = {{ var("month") }}
AND r.unit_code = '{{ var("unit") }}'mart_program_progress
Powers the Key Comparison and Program Progress slides.
SELECT
unit_code_normalized AS unit_code,
product_name,
product_category,
course_type_name,
period_year,
period_month,
customer_type,
COUNT(DISTINCT receipt_number) AS order_count,
COUNT(DISTINCT student_name) AS student_count,
SUM(amount) AS revenue
FROM int_enrollments
GROUP BY
unit_code_normalized,
product_name,
product_category,
course_type_name,
period_year,
period_month,
customer_typemart_channel_marketing
Powers the Channel Marketing slide and dashboard marketing view.
SELECT
unit_code_normalized AS unit_code,
channel_name,
period_year,
period_month,
customer_type,
COUNT(DISTINCT receipt_number) AS closings,
COUNT(DISTINCT student_name) AS students,
ROUND(
COUNT(DISTINCT receipt_number) * 100.0 /
SUM(COUNT(DISTINCT receipt_number)) OVER (
PARTITION BY unit_code_normalized, period_year, period_month
), 2
) AS contribution_pct
FROM int_enrollments
WHERE channel_name IS NOT NULL
GROUP BY
unit_code_normalized,
channel_name,
period_year,
period_month,
customer_typemart_school_progress
Powers the School Progress slide — shows which organizations produced students each year.
SELECT
unit_code_normalized AS unit_code,
organization_name,
period_year,
COUNT(DISTINCT student_name) AS student_count
FROM int_enrollments
WHERE organization_name IS NOT NULL
GROUP BY
unit_code_normalized,
organization_name,
period_yearPivoted at query time in the Format layer to produce the year × organization matrix shown in the presentation.
dbt Tests
Each model has built-in dbt tests defined in schema.yml files.
# models/staging/schema.yml
models:
- name: stg_transactions
columns:
- name: receipt_number
tests:
- not_null
- unique
- name: amount
tests:
- not_null
- name: unit_code
tests:
- accepted_values:
values: ['REAL', 'WLC INGGRIS', 'WLC NON INGGRIS', 'TM', 'NP', 'ICC', 'ICA', 'DTC', 'IONs']
- name: stg_targets
columns:
- name: unit_code
tests:
- not_null
- uniqueLineage
Full data lineage from source to mart: