Test suite
34 pytest cases covering data integrity, leakage prevention, chronological
splits, metric correctness, per-unit model training, ensemble weighting, and
feature-column composition. Every push to main runs the
data-free subset (23 cases) in GitHub Actions; the full suite runs
locally with the gitignored ADT export.
| Stat | Meaning | Detail |
| --- | --- | --- |
| 34 / 34 | Full suite passing | Last local run: 2026-05-05 |
| 23 / 23 | CI subset passing | Runs on every push |
| 7 | Test classes | Grouped by concern |
| 7.2 s | Full suite runtime | End-to-end on the local box |
| 0 | Failures | No skips, no errors |
Why these tests, in this order
The suite is organized by failure mode, not by source file. Each class isolates
a different way the pipeline could silently produce a wrong answer — leakage from
future data into the feature set, accidental shuffling across the temporal split,
a metric that divides by zero, a model that predicts a negative census. Tests
that need the gitignored 38 MB ADT export (data/raw/postsql.csv)
are marked requires_data and skipped in CI; the rest are pure unit
tests against config and arithmetic, and they run on every push.
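For concreteness, here is a minimal sketch of how the requires_data split could be declared. The marker name and the data path are quoted from this page; the marker registration and the example test body are illustrative assumptions, not the project's actual code.

```python
# pyproject.toml (assumed registration so `-m "not requires_data"` deselects cleanly):
# [tool.pytest.ini_options]
# markers = ["requires_data: needs the gitignored ADT export at data/raw/postsql.csv"]

import pandas as pd
import pytest


@pytest.mark.requires_data
def test_uses_raw_export():
    # Only runs locally, where the 38 MB export exists; CI deselects it with the -m filter.
    df = pd.read_csv("data/raw/postsql.csv")
    assert not df.empty
```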
Confirms the raw ADT export loads with the expected schema and types — catches breakage in upstream SQL exports before any feature work begins.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_shape | Dataset has > 100,000 rows and at least 60 columns. | Pass | 0.20 ms |
| test_required_columns | The datetime, unit, and census columns from the config are all present. | Pass | 0.13 ms |
| test_datetime_parsed | The datetime column is parsed as datetime64, not left as a string. | Pass | 0.21 ms |
| test_no_unnamed_column | The SQL export's index column (Unnamed: 0) was dropped in cleaning. | Pass | 0.13 ms |
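A condensed sketch of what the integrity checks above might look like. The loader name (load_raw) and the config constants (DATETIME_COL, UNIT_COL, CENSUS_COL) are assumptions standing in for the project's real imports.

```python
import pandas as pd
import pytest

from src.config import CENSUS_COL, DATETIME_COL, UNIT_COL  # assumed module path
from src.data import load_raw                               # assumed loader


@pytest.mark.requires_data
def test_required_columns():
    df = load_raw("data/raw/postsql.csv")
    for col in (DATETIME_COL, UNIT_COL, CENSUS_COL):
        assert col in df.columns, f"missing required column: {col}"


@pytest.mark.requires_data
def test_datetime_parsed():
    df = load_raw("data/raw/postsql.csv")
    # Cleaning should hand back a real datetime64 column, not raw strings.
    assert pd.api.types.is_datetime64_any_dtype(df[DATETIME_COL])
```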
The single most important class in the suite. Forecasting at horizon H must not use any feature whose lag is < H, and must never see any TARGET_* column. Parametrized across all eight horizons.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_no_short_lags_for_horizon[1] | At H=1, no lag feature with lag < 1 leaks into the feature set. | Pass | 0.14 ms |
| test_no_short_lags_for_horizon[2] | At H=2, no lag feature with lag < 2 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[3] | At H=3, no lag feature with lag < 3 leaks into the feature set. | Pass | 0.12 ms |
| test_no_short_lags_for_horizon[4] | At H=4, no lag feature with lag < 4 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[12] | At H=12, no lag feature with lag < 12 leaks into the feature set. | Pass | 0.12 ms |
| test_no_short_lags_for_horizon[24] | At H=24, no lag feature with lag < 24 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[48] | At H=48, no lag feature with lag < 48 leaks into the feature set. | Pass | 0.11 ms |
| test_no_short_lags_for_horizon[72] | At H=72, no lag feature with lag < 72 leaks into the feature set. | Pass | 0.12 ms |
| test_no_target_in_features[1] | No TARGET_CENSUS_* column appears in the H=1 feature list. | Pass | 0.12 ms |
| test_no_target_in_features[12] | No TARGET_CENSUS_* column appears in the H=12 feature list. | Pass | 0.13 ms |
| test_no_target_in_features[24] | No TARGET_CENSUS_* column appears in the H=24 feature list. | Pass | 0.15 ms |
| test_no_target_in_features[72] | No TARGET_CENSUS_* column appears in the H=72 feature list. | Pass | 0.14 ms |
| test_no_unit_encoded_in_features | Per-unit models do not include unit_encoded as a feature. | Pass | 0.13 ms |
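A sketch of the leakage checks, under two assumptions not stated on this page: a get_feature_columns(horizon) helper that returns the feature list for a horizon, and lag features named with a trailing _lag_&lt;n&gt;. The TARGET_CENSUS_ prefix and the horizon list are taken from the table.

```python
import re

import pytest

from src.features import get_feature_columns  # assumed helper

HORIZONS = [1, 2, 3, 4, 12, 24, 48, 72]


@pytest.mark.parametrize("horizon", HORIZONS)
def test_no_short_lags_for_horizon(horizon):
    for col in get_feature_columns(horizon):
        m = re.search(r"_lag_(\d+)$", col)  # assumed lag-naming convention
        if m:
            assert int(m.group(1)) >= horizon, f"{col} would leak future data at H={horizon}"


@pytest.mark.parametrize("horizon", [1, 12, 24, 72])
def test_no_target_in_features(horizon):
    assert not any(c.startswith("TARGET_CENSUS_") for c in get_feature_columns(horizon))
```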
Forecasting on a shuffled split would invalidate every accuracy number on the site. These tests fail loudly if anyone ever swaps in a random split or a stratified resample.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_no_temporal_overlap | train.max ≤ val.min and val.max ≤ test.min — splits are strictly ordered in time. | Pass | 0.71 ms |
| test_split_sizes | Train is the largest split; validation and test are both non-empty. | Pass | 0.13 ms |
| test_no_shuffling | Within each unit and split, timestamps are monotonically non-decreasing. | Pass | 5.8 ms |
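The ordering invariant reduces to two comparisons per boundary. The split helper (chronological_split) and the raw_df fixture below are assumed names; the check itself is the one stated in the first row of the table.

```python
import pytest

from src.config import DATETIME_COL           # assumed constant
from src.split import chronological_split     # assumed helper


@pytest.mark.requires_data
def test_no_temporal_overlap(raw_df):         # raw_df: assumed fixture loading the ADT export
    train, val, test = chronological_split(raw_df)
    assert train[DATETIME_COL].max() <= val[DATETIME_COL].min()
    assert val[DATETIME_COL].max() <= test[DATETIME_COL].min()
```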
Hand-checked numerical examples for every metric quoted on the dashboards (MAE, RMSE, MAPE, ±2-patient accuracy). Anchors model-comparison numbers against ground truth, not against themselves.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_mae_known | MAE on [10,20,30] vs [12,18,33] equals the hand-computed 2.333. | Pass | 0.17 ms |
| test_rmse_perfect | RMSE = 0 when predictions equal actuals exactly. | Pass | 0.15 ms |
| test_mape_no_zero_div | MAPE returns a finite number when actuals contain a zero (no divide-by-zero). | Pass | 0.15 ms |
| test_within_n_perfect | ±2-patient accuracy = 100% on a perfectly matched series. | Pass | 0.14 ms |
| test_within_n_partial | ±2-patient accuracy ≈ 66.67% on a known 2-of-3-match series. | Pass | 0.14 ms |
| test_evaluate_model_keys | evaluate_model returns exactly {mae, rmse, mape, within_2_patients_pct}. | Pass | 0.19 ms |
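The fixtures above are small enough to verify by hand. The snippet below reproduces the two hand-computed cases with the standard formulas; the project's own metric functions may differ in signature.

```python
import numpy as np

actual = np.array([10, 20, 30])
pred = np.array([12, 18, 33])

mae = np.mean(np.abs(actual - pred))                  # (2 + 2 + 3) / 3 = 2.333...
within_2 = np.mean(np.abs(actual - pred) <= 2) * 100  # 2 of 3 within +/-2 -> 66.67%

assert round(float(mae), 3) == 2.333
assert round(float(within_2), 2) == 66.67
```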
End-to-end smoke tests for the per-unit tabular models — fit, predict, and sanity-check that outputs are usable as a forecast (right shape, never negative).
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_rf_per_unit | Random Forest fits on one unit and produces predictions of the validation shape. | Pass | 2,981.9 ms |
| test_lgbm_per_unit | LightGBM (with early stopping) fits and produces predictions of the validation shape. | Pass | 309.1 ms |
| test_predictions_non_negative | A trained model never predicts a negative census — patient counts are bounded below. | Pass | 1,246.9 ms |
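A sketch of the non-negativity smoke test. The training helper (train_per_unit_model) and the unit_train_val fixture are assumed names; the invariant, that a census forecast is never negative, is the one documented above.

```python
import numpy as np
import pytest

from src.models import train_per_unit_model  # assumed helper


@pytest.mark.requires_data
def test_predictions_non_negative(unit_train_val):  # assumed fixture: one unit's train/val split
    X_train, y_train, X_val, _y_val = unit_train_val
    model = train_per_unit_model(X_train, y_train)
    preds = model.predict(X_val)
    assert preds.shape[0] == X_val.shape[0]          # forecast matches the validation shape
    assert np.all(preds >= 0), "model predicted a negative patient census"
```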
The ensemble blends model predictions with weights inversely proportional to validation MAPE. These tests pin down the weighting invariants the dashboards rely on.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_weights_sum_to_one | Inverse-MAPE weights across 3 models sum to 1.0 (within float tolerance). | Pass | 0.17 ms |
| test_weights_positive | All ensemble weights are strictly positive — no model gets zeroed out. | Pass | 0.13 ms |
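Inverse-MAPE weighting in miniature: each model's weight is proportional to 1 / MAPE, normalized to sum to one. The function name and the example MAPEs are illustrative; the two invariants are the ones the tests pin down.

```python
import numpy as np


def inverse_mape_weights(mapes):
    """Weight each model by 1 / validation MAPE, normalized to sum to 1."""
    inv = 1.0 / np.asarray(mapes, dtype=float)
    return inv / inv.sum()


weights = inverse_mape_weights([4.0, 8.0, 16.0])  # hypothetical validation MAPEs
assert np.isclose(weights.sum(), 1.0)             # test_weights_sum_to_one
assert np.all(weights > 0)                        # test_weights_positive
```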
Sanity checks on the feature-set composition that every model consumes. Catches regressions in the leakage filter and in cyclical encoding.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_more_features_at_short_horizon | The H=1 model has strictly more features than the H=72 model (leakage filter is active). | Pass | 0.15 ms |
| test_cyclical_features_present | sin_hour, cos_hour, sin_day, cos_day are all in the feature set. | Pass | 0.14 ms |
| test_filter_unit_returns_single_unit | filter_unit returns a non-empty DataFrame containing exactly one unit ID. | Pass | 8.2 ms |
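A sketch of the composition checks, reusing the same assumed get_feature_columns helper as in the leakage sketch; the cyclical column names are quoted from the table.

```python
from src.features import get_feature_columns  # assumed helper


def test_more_features_at_short_horizon():
    # Longer horizons drop more short lags, so H=1 should see strictly more features than H=72.
    assert len(get_feature_columns(1)) > len(get_feature_columns(72))


def test_cyclical_features_present():
    cols = set(get_feature_columns(1))
    assert {"sin_hour", "cos_hour", "sin_day", "cos_day"} <= cols
```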
How to reproduce locally
With data/raw/postsql.csv in place (or any export with the same
schema), run the full suite:
python -m pytest tests/test_pipeline.py -v
To match what GitHub Actions runs (no data file needed):
python -m pytest tests/test_pipeline.py -m "not requires_data" -v
Test definitions: tests/test_pipeline.py · Workflow: .github/workflows/tests.yml