Test suite

34 pytest cases covering data integrity, leakage prevention, chronological splits, metric correctness, per-unit model training, ensemble weighting, and feature-column composition. Every push to main runs the data-free subset (23 cases) in GitHub Actions; the full suite runs locally with the gitignored ADT export.

- 34 / 34 · full suite passing · last local run 2026-05-05
- 23 / 23 · CI subset passing · runs on every push
- 7 · test classes · grouped by concern
- 7.2 s · full suite runtime · end-to-end on the local box
- 0 · failures · no skips, no errors
Why these tests, in this order

The suite is organized by failure mode, not by source file. Each class isolates a different way the pipeline could silently produce a wrong answer — leakage from future data into the feature set, accidental shuffling across the temporal split, a metric that divides by zero, a model that predicts a negative census. Tests that need the gitignored 38 MB ADT export (data/raw/postsql.csv) are marked requires_data and skipped in CI; the rest are pure unit tests against config and arithmetic and run on every push.
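The requires_data marker described above can be sketched roughly as follows — a minimal, hypothetical conftest-style snippet (the marker name and data path are taken from this page; the exact implementation in tests/test_pipeline.py may differ):

```python
# Hypothetical sketch: tests needing the gitignored ADT export are
# skipped automatically when the file is absent (as in CI).
import os
import pytest

DATA_PATH = "data/raw/postsql.csv"  # gitignored; present only locally

requires_data = pytest.mark.skipif(
    not os.path.exists(DATA_PATH),
    reason="ADT export not available (CI runs the data-free subset)",
)

@requires_data
def test_shape():
    # real schema assertions would go here
    ...
```

With this pattern, `pytest -m "not requires_data"` deselects the data-dependent cases entirely, while a plain `pytest` run skips them with a visible reason if the export is missing.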

1 · TestDataLoading

4 cases · requires_data
Confirms the raw ADT export loads with the expected schema and types — catches breakage in upstream SQL exports before any feature work begins.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_shape | Dataset has > 100,000 rows and at least 60 columns. | Pass | 0.20 ms |
| test_required_columns | The datetime, unit, and census columns from the config are all present. | Pass | 0.13 ms |
| test_datetime_parsed | The datetime column is parsed as datetime64, not left as a string. | Pass | 0.21 ms |
| test_no_unnamed_column | The SQL export's index column (Unnamed: 0) was dropped in cleaning. | Pass | 0.13 ms |
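The schema checks above can be illustrated against a tiny stand-in frame — a hedged sketch only, since the real tests load the full ADT export and the exact config column names are an assumption here:

```python
# Hypothetical sketch of the TestDataLoading schema checks, run against
# a small stand-in DataFrame instead of data/raw/postsql.csv.
import pandas as pd

REQUIRED_COLUMNS = ["datetime", "unit", "census"]  # assumed config names

def check_schema(df: pd.DataFrame) -> None:
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    assert not missing, f"missing columns: {missing}"
    # datetime must be parsed, not left as a string
    assert pd.api.types.is_datetime64_any_dtype(df["datetime"])
    # the SQL export's index column should have been dropped in cleaning
    assert "Unnamed: 0" not in df.columns

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"]),
    "unit": ["ICU", "ICU"],
    "census": [12, 13],
})
check_schema(df)  # passes on the stand-in frame
```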

2 · TestNoDataLeakage

13 cases · pure config — runs in CI
The single most important class in the suite. Forecasting at horizon H must not use any feature whose lag is < H, and must never see any TARGET_* column. Parametrized across all eight horizons.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_no_short_lags_for_horizon[1] | At H=1, no lag feature with lag < 1 leaks into the feature set. | Pass | 0.14 ms |
| test_no_short_lags_for_horizon[2] | At H=2, no lag feature with lag < 2 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[3] | At H=3, no lag feature with lag < 3 leaks into the feature set. | Pass | 0.12 ms |
| test_no_short_lags_for_horizon[4] | At H=4, no lag feature with lag < 4 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[12] | At H=12, no lag feature with lag < 12 leaks into the feature set. | Pass | 0.12 ms |
| test_no_short_lags_for_horizon[24] | At H=24, no lag feature with lag < 24 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[48] | At H=48, no lag feature with lag < 48 leaks into the feature set. | Pass | 0.11 ms |
| test_no_short_lags_for_horizon[72] | At H=72, no lag feature with lag < 72 leaks into the feature set. | Pass | 0.12 ms |
| test_no_target_in_features[1] | No TARGET_CENSUS_* column appears in the H=1 feature list. | Pass | 0.12 ms |
| test_no_target_in_features[12] | No TARGET_CENSUS_* column appears in the H=12 feature list. | Pass | 0.13 ms |
| test_no_target_in_features[24] | No TARGET_CENSUS_* column appears in the H=24 feature list. | Pass | 0.15 ms |
| test_no_target_in_features[72] | No TARGET_CENSUS_* column appears in the H=72 feature list. | Pass | 0.14 ms |
| test_no_unit_encoded_in_features | Per-unit models do not include unit_encoded as a feature. | Pass | 0.13 ms |
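The leakage invariant these cases enforce can be sketched in a few lines. The `_lag_N` naming convention and the helper names below are assumptions for illustration; only the rule itself (no lag < H, no TARGET_* columns) comes from this page:

```python
# Hypothetical sketch of the leakage filter checked by TestNoDataLeakage:
# a lag feature may feed horizon H only if its lag >= H, and target
# columns never appear in any feature list.
import re

HORIZONS = [1, 2, 3, 4, 12, 24, 48, 72]

def lag_of(feature: str):
    """Return the lag encoded in a name like census_lag_24, else None."""
    m = re.search(r"_lag_(\d+)$", feature)
    return int(m.group(1)) if m else None

def features_for_horizon(features, horizon):
    safe = []
    for f in features:
        if f.startswith("TARGET_"):
            continue                      # never a feature
        lag = lag_of(f)
        if lag is not None and lag < horizon:
            continue                      # would leak future information
        safe.append(f)
    return safe

features = ["census_lag_1", "census_lag_24", "sin_hour", "TARGET_CENSUS_24"]
for h in HORIZONS:
    safe = features_for_horizon(features, h)
    assert all(not f.startswith("TARGET_") for f in safe)
    assert all(lag_of(f) is None or lag_of(f) >= h for f in safe)
```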

3 · TestChronologicalSplit

3 cases · requires_data
Forecasting on a shuffled split would invalidate every accuracy number on the site. These tests fail loudly if anyone ever swaps in a random split or a stratified resample.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_no_temporal_overlap | train.max ≤ val.min and val.max ≤ test.min — splits are strictly ordered in time. | Pass | 0.71 ms |
| test_split_sizes | Train is the largest split; validation and test are both non-empty. | Pass | 0.13 ms |
| test_no_shuffling | Within each unit and split, timestamps are monotonically non-decreasing. | Pass | 5.8 ms |
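The ordering invariants above reduce to a handful of comparisons. A minimal sketch on a synthetic timestamp index (the real tests run on the ADT export's actual splits):

```python
# Hypothetical sketch of the chronological-split invariants: the three
# splits must be strictly ordered in time and sensibly sized.
import pandas as pd

ts = pd.date_range("2024-01-01", periods=100, freq="h")
train, val, test = ts[:70], ts[70:85], ts[85:]

# train.max <= val.min and val.max <= test.min
assert train.max() <= val.min() and val.max() <= test.min()
# train is the largest split; validation and test are non-empty
assert len(train) > len(val) and len(train) > len(test)
assert len(val) > 0 and len(test) > 0
# within each split, timestamps are monotonically non-decreasing
assert train.is_monotonic_increasing
```

A random or stratified split would fail the first assertion immediately, which is exactly the loud failure the class is designed to produce.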

4 · TestMetrics

6 cases · pure config — runs in CI
Hand-checked numerical examples for every metric quoted on the dashboards (MAE, RMSE, MAPE, ±2-patient accuracy). Anchors model-comparison numbers against ground truth, not against themselves.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_mae_known | MAE on [10,20,30] vs [12,18,33] equals the hand-computed 2.333. | Pass | 0.17 ms |
| test_rmse_perfect | RMSE = 0 when predictions equal actuals exactly. | Pass | 0.15 ms |
| test_mape_no_zero_div | MAPE returns a finite number when actuals contain a zero (no divide-by-zero). | Pass | 0.15 ms |
| test_within_n_perfect | ±2-patient accuracy = 100% on a perfectly matched series. | Pass | 0.14 ms |
| test_within_n_partial | ±2-patient accuracy ≈ 66.67% on a known 2-of-3-match series. | Pass | 0.14 ms |
| test_evaluate_model_keys | evaluate_model returns exactly {mae, rmse, mape, within_2_patients_pct}. | Pass | 0.19 ms |
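The hand-checked examples in this class can be re-derived directly. The implementations below are a sketch, not the pipeline's metrics module; the epsilon guard in MAPE is one plausible way to satisfy the no-divide-by-zero case:

```python
# Hypothetical re-derivation of the hand-checked metric examples.
import math

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred, eps=1e-8):
    # guard the denominator so a zero actual cannot divide by zero
    return 100 * sum(abs(a - p) / max(abs(a), eps)
                     for a, p in zip(actual, pred)) / len(actual)

def within_n_pct(actual, pred, n=2):
    hits = sum(abs(a - p) <= n for a, p in zip(actual, pred))
    return 100 * hits / len(actual)

# MAE on [10,20,30] vs [12,18,33] is (2 + 2 + 3) / 3 = 2.333...
assert abs(mae([10, 20, 30], [12, 18, 33]) - 7 / 3) < 1e-9
assert rmse([5, 5], [5, 5]) == 0.0
assert math.isfinite(mape([0, 10], [1, 9]))      # zero actual stays finite
# errors of 2, 2, 4 -> two of three within ±2 -> 66.67 %
assert abs(within_n_pct([10, 20, 30], [12, 18, 34]) - 200 / 3) < 1e-9
```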

5 · TestModelTrainPredict

3 cases · requires_data
End-to-end smoke tests for the per-unit tabular models — fit, predict, and sanity-check that outputs are usable as a forecast (right shape, never negative).
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_rf_per_unit | Random Forest fits on one unit and produces predictions of the validation shape. | Pass | 2,981.9 ms |
| test_lgbm_per_unit | LightGBM (with early stopping) fits and produces predictions of the validation shape. | Pass | 309.1 ms |
| test_predictions_non_negative | A trained model never predicts a negative census — patient counts are bounded below. | Pass | 1,246.9 ms |
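The non-negativity invariant can be enforced in one line. Whether the pipeline relies on tree ensembles averaging non-negative targets or clips explicitly is not stated here; this sketch shows the explicit-clipping variant:

```python
# Minimal sketch of the census >= 0 invariant: a patient count is
# bounded below, so any raw negative output would be clipped at zero.
import numpy as np

raw_preds = np.array([14.2, -0.3, 7.8, 0.0])  # hypothetical raw model output
preds = np.clip(raw_preds, 0, None)           # enforce non-negative census

assert (preds >= 0).all()
```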

6 · TestEnsemble

2 cases · pure config — runs in CI
The ensemble blends model predictions with weights inversely proportional to validation MAPE. These tests pin down the weighting invariants the dashboards rely on.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_weights_sum_to_one | Inverse-MAPE weights across 3 models sum to 1.0 (within float tolerance). | Pass | 0.17 ms |
| test_weights_positive | All ensemble weights are strictly positive — no model gets zeroed out. | Pass | 0.13 ms |
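The inverse-MAPE weighting has a short closed form: each model's weight is (1/MAPE_i) / Σ_j (1/MAPE_j). Both invariants the tests pin down fall out of it directly. A sketch with made-up validation MAPEs:

```python
# Hypothetical sketch of inverse-MAPE ensemble weighting: better
# (lower-MAPE) models get larger weights, and weights sum to one.
def inverse_mape_weights(mapes):
    inv = [1.0 / m for m in mapes]      # every weight is strictly positive
    total = sum(inv)
    return [v / total for v in inv]     # normalize so they sum to 1.0

weights = inverse_mape_weights([4.0, 8.0, 8.0])  # validation MAPEs in %
assert abs(sum(weights) - 1.0) < 1e-12
assert all(w > 0 for w in weights)
assert weights[0] == max(weights)       # lowest MAPE, largest weight
```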

7 · TestFeatureColumns

3 cases · 2 in CI · 1 requires_data
Sanity checks on the feature-set composition that every model consumes. Catches regressions in the leakage filter and in cyclical encoding.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_more_features_at_short_horizon | The H=1 model has strictly more features than the H=72 model (leakage filter is active). | Pass | 0.15 ms |
| test_cyclical_features_present | sin_hour, cos_hour, sin_day, cos_day are all in the feature set. | Pass | 0.14 ms |
| test_filter_unit_returns_single_unit | filter_unit returns a non-empty DataFrame containing exactly one unit ID. | Pass | 8.2 ms |
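The cyclical features checked above are the standard sine/cosine encoding of periodic time values; the periods (24 hours, 7 days) are an assumption matching the feature names. A sketch:

```python
# Hypothetical sketch of the cyclical encoding behind sin_hour/cos_hour
# and sin_day/cos_day: map a periodic value onto the unit circle so
# hour 23 sits next to hour 0 instead of 23 units away.
import math

def cyclical(value, period):
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

sin_hour, cos_hour = cyclical(23, 24)   # hour of day
sin_day, cos_day = cyclical(6, 7)       # day of week

# hour 23 and hour 0 are close on the circle, unlike in raw hour space
s0, c0 = cyclical(0, 24)
dist = math.hypot(sin_hour - s0, cos_hour - c0)
assert dist < 0.3
```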
How to reproduce locally

With data/raw/postsql.csv in place (or any export with the same schema), run the full suite:

python -m pytest tests/test_pipeline.py -v

To match what GitHub Actions runs (no data file needed):

python -m pytest tests/test_pipeline.py -m "not requires_data" -v

Test definitions: tests/test_pipeline.py · Workflow: .github/workflows/tests.yml