Test suite
34 pytest cases covering data integrity, leakage prevention, chronological
splits, metric correctness, per-unit model training, ensemble weighting, and
feature-column composition. Every push to main runs the
data-free subset (23 cases) in GitHub Actions; the full suite runs
locally with the gitignored ADT export.
| Stat | Meaning | Detail |
| --- | --- | --- |
| 34 / 34 | Full suite passing | Last local run: 2026-05-05 |
| 23 / 23 | CI subset passing | Runs on every push |
| 7 | Test classes | Grouped by concern |
| 7.2 s | Full suite runtime | End-to-end on the local box |
| 0 | Failures | No skips, no errors |
Why these tests, in this order
The suite is organized by failure mode, not by source file. Each class isolates
a different way the pipeline could silently produce a wrong answer — leakage from
future data into the feature set, accidental shuffling across the temporal split,
a metric that divides by zero, a model that predicts a negative census. Tests
that need the gitignored 38 MB ADT export (data/raw/postsql.csv)
are marked requires_data and skipped in CI; the rest are pure unit
tests against config and arithmetic, and they run on every push.
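For concreteness, here is a minimal sketch of how the requires_data split could be declared. The marker name and the data path are quoted from this page; the marker registration and the example test body are illustrative assumptions, not the project's actual code.

```python
# pyproject.toml (assumed registration so `-m "not requires_data"` deselects cleanly):
# [tool.pytest.ini_options]
# markers = ["requires_data: needs the gitignored ADT export at data/raw/postsql.csv"]

import pandas as pd
import pytest


@pytest.mark.requires_data
def test_uses_raw_export():
    # Only runs locally, where the 38 MB export exists; CI deselects it with the -m filter.
    df = pd.read_csv("data/raw/postsql.csv")
    assert not df.empty
```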
Confirms the raw ADT export loads with the expected schema and types — catches breakage in upstream SQL exports before any feature work begins.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_shape | Dataset has > 100,000 rows and at least 60 columns. | Pass | 0.20 ms |
| test_required_columns | The datetime, unit, and census columns from the config are all present. | Pass | 0.13 ms |
| test_datetime_parsed | The datetime column is parsed as datetime64, not left as a string. | Pass | 0.21 ms |
| test_no_unnamed_column | The SQL export's index column (Unnamed: 0) was dropped in cleaning. | Pass | 0.13 ms |
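A condensed sketch of what the integrity checks above might look like. The loader name (load_raw) and the config constants (DATETIME_COL, UNIT_COL, CENSUS_COL) are assumptions standing in for the project's real imports.

```python
import pandas as pd
import pytest

from src.config import CENSUS_COL, DATETIME_COL, UNIT_COL  # assumed module path
from src.data import load_raw                               # assumed loader


@pytest.mark.requires_data
def test_required_columns():
    df = load_raw("data/raw/postsql.csv")
    for col in (DATETIME_COL, UNIT_COL, CENSUS_COL):
        assert col in df.columns, f"missing required column: {col}"


@pytest.mark.requires_data
def test_datetime_parsed():
    df = load_raw("data/raw/postsql.csv")
    # Cleaning should hand back a real datetime64 column, not raw strings.
    assert pd.api.types.is_datetime64_any_dtype(df[DATETIME_COL])
```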
The single most important class in the suite. Forecasting at horizon H must not use any feature whose lag is < H, and must never see any TARGET_* column. Parametrized across all eight horizons.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_no_short_lags_for_horizon[1] | At H=1, no lag feature with lag < 1 leaks into the feature set. | Pass | 0.14 ms |
| test_no_short_lags_for_horizon[2] | At H=2, no lag feature with lag < 2 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[3] | At H=3, no lag feature with lag < 3 leaks into the feature set. | Pass | 0.12 ms |
| test_no_short_lags_for_horizon[4] | At H=4, no lag feature with lag < 4 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[12] | At H=12, no lag feature with lag < 12 leaks into the feature set. | Pass | 0.12 ms |
| test_no_short_lags_for_horizon[24] | At H=24, no lag feature with lag < 24 leaks into the feature set. | Pass | 0.13 ms |
| test_no_short_lags_for_horizon[48] | At H=48, no lag feature with lag < 48 leaks into the feature set. | Pass | 0.11 ms |
| test_no_short_lags_for_horizon[72] | At H=72, no lag feature with lag < 72 leaks into the feature set. | Pass | 0.12 ms |
| test_no_target_in_features[1] | No TARGET_CENSUS_* column appears in the H=1 feature list. | Pass | 0.12 ms |
| test_no_target_in_features[12] | No TARGET_CENSUS_* column appears in the H=12 feature list. | Pass | 0.13 ms |
| test_no_target_in_features[24] | No TARGET_CENSUS_* column appears in the H=24 feature list. | Pass | 0.15 ms |
| test_no_target_in_features[72] | No TARGET_CENSUS_* column appears in the H=72 feature list. | Pass | 0.14 ms |
| test_no_unit_encoded_in_features | Per-unit models do not include unit_encoded as a feature. | Pass | 0.13 ms |
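A sketch of the leakage checks, under two assumptions not stated on this page: a get_feature_columns(horizon) helper that returns the feature list for a horizon, and lag features named with a trailing _lag_&lt;n&gt;. The TARGET_CENSUS_ prefix and the horizon list are taken from the table.

```python
import re

import pytest

from src.features import get_feature_columns  # assumed helper

HORIZONS = [1, 2, 3, 4, 12, 24, 48, 72]


@pytest.mark.parametrize("horizon", HORIZONS)
def test_no_short_lags_for_horizon(horizon):
    for col in get_feature_columns(horizon):
        m = re.search(r"_lag_(\d+)$", col)  # assumed lag-naming convention
        if m:
            assert int(m.group(1)) >= horizon, f"{col} would leak future data at H={horizon}"


@pytest.mark.parametrize("horizon", [1, 12, 24, 72])
def test_no_target_in_features(horizon):
    assert not any(c.startswith("TARGET_CENSUS_") for c in get_feature_columns(horizon))
```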
Forecasting on a shuffled split would invalidate every accuracy number on the site. These tests fail loudly if anyone ever swaps in a random split or a stratified resample.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_no_temporal_overlap | train.max ≤ val.min and val.max ≤ test.min — splits are strictly ordered in time. | Pass | 0.71 ms |
| test_split_sizes | Train is the largest split; validation and test are both non-empty. | Pass | 0.13 ms |
| test_no_shuffling | Within each unit and split, timestamps are monotonically non-decreasing. | Pass | 5.8 ms |
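The ordering invariant reduces to two comparisons per boundary. The split helper (chronological_split) and the raw_df fixture below are assumed names; the check itself is the one stated in the first row of the table.

```python
import pytest

from src.config import DATETIME_COL           # assumed constant
from src.split import chronological_split     # assumed helper


@pytest.mark.requires_data
def test_no_temporal_overlap(raw_df):         # raw_df: assumed fixture loading the ADT export
    train, val, test = chronological_split(raw_df)
    assert train[DATETIME_COL].max() <= val[DATETIME_COL].min()
    assert val[DATETIME_COL].max() <= test[DATETIME_COL].min()
```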
Hand-checked numerical examples for every metric quoted on the dashboards (MAE, RMSE, MAPE, ±2-patient accuracy). Anchors model-comparison numbers against ground truth, not against themselves.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_mae_known | MAE on [10,20,30] vs [12,18,33] equals the hand-computed 2.333. | Pass | 0.17 ms |
| test_rmse_perfect | RMSE = 0 when predictions equal actuals exactly. | Pass | 0.15 ms |
| test_mape_no_zero_div | MAPE returns a finite number when actuals contain a zero (no divide-by-zero). | Pass | 0.15 ms |
| test_within_n_perfect | ±2-patient accuracy = 100% on a perfectly matched series. | Pass | 0.14 ms |
| test_within_n_partial | ±2-patient accuracy ≈ 66.67% on a known 2-of-3-match series. | Pass | 0.14 ms |
| test_evaluate_model_keys | evaluate_model returns exactly {mae, rmse, mape, within_2_patients_pct}. | Pass | 0.19 ms |
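The fixtures above are small enough to verify by hand. The snippet below reproduces the two hand-computed cases with the standard formulas; the project's own metric functions may differ in signature.

```python
import numpy as np

actual = np.array([10, 20, 30])
pred = np.array([12, 18, 33])

mae = np.mean(np.abs(actual - pred))                  # (2 + 2 + 3) / 3 = 2.333...
within_2 = np.mean(np.abs(actual - pred) <= 2) * 100  # 2 of 3 within +/-2 -> 66.67%

assert round(float(mae), 3) == 2.333
assert round(float(within_2), 2) == 66.67
```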
End-to-end smoke tests for the per-unit tabular models — fit, predict, and sanity-check that outputs are usable as a forecast (right shape, never negative).
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_rf_per_unit | Random Forest fits on one unit and produces predictions of the validation shape. | Pass | 2,981.9 ms |
| test_lgbm_per_unit | LightGBM (with early stopping) fits and produces predictions of the validation shape. | Pass | 309.1 ms |
| test_predictions_non_negative | A trained model never predicts a negative census — patient counts are bounded below. | Pass | 1,246.9 ms |
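A sketch of the non-negativity smoke test. The training helper (train_per_unit_model) and the unit_train_val fixture are assumed names; the invariant, that a census forecast is never negative, is the one documented above.

```python
import numpy as np
import pytest

from src.models import train_per_unit_model  # assumed helper


@pytest.mark.requires_data
def test_predictions_non_negative(unit_train_val):  # assumed fixture: one unit's train/val split
    X_train, y_train, X_val, _y_val = unit_train_val
    model = train_per_unit_model(X_train, y_train)
    preds = model.predict(X_val)
    assert preds.shape[0] == X_val.shape[0]          # forecast matches the validation shape
    assert np.all(preds >= 0), "model predicted a negative patient census"
```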
The ensemble blends model predictions with weights inversely proportional to validation MAPE. These tests pin down the weighting invariants the dashboards rely on.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_weights_sum_to_one | Inverse-MAPE weights across 3 models sum to 1.0 (within float tolerance). | Pass | 0.17 ms |
| test_weights_positive | All ensemble weights are strictly positive — no model gets zeroed out. | Pass | 0.13 ms |
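Inverse-MAPE weighting in miniature: each model's weight is proportional to 1 / MAPE, normalized to sum to one. The function name and the example MAPEs are illustrative; the two invariants are the ones the tests pin down.

```python
import numpy as np


def inverse_mape_weights(mapes):
    """Weight each model by 1 / validation MAPE, normalized to sum to 1."""
    inv = 1.0 / np.asarray(mapes, dtype=float)
    return inv / inv.sum()


weights = inverse_mape_weights([4.0, 8.0, 16.0])  # hypothetical validation MAPEs
assert np.isclose(weights.sum(), 1.0)             # test_weights_sum_to_one
assert np.all(weights > 0)                        # test_weights_positive
```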
Sanity checks on the feature-set composition that every model consumes. Catches regressions in the leakage filter and in cyclical encoding.
| Test | What it verifies | Status | Time |
| --- | --- | --- | --- |
| test_more_features_at_short_horizon | The H=1 model has strictly more features than the H=72 model (leakage filter is active). | Pass | 0.15 ms |
| test_cyclical_features_present | sin_hour, cos_hour, sin_day, cos_day are all in the feature set. | Pass | 0.14 ms |
| test_filter_unit_returns_single_unit | filter_unit returns a non-empty DataFrame containing exactly one unit ID. | Pass | 8.2 ms |
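A sketch of the composition checks, reusing the same assumed get_feature_columns helper as in the leakage sketch; the cyclical column names are quoted from the table.

```python
from src.features import get_feature_columns  # assumed helper


def test_more_features_at_short_horizon():
    # Longer horizons drop more short lags, so H=1 should see strictly more features than H=72.
    assert len(get_feature_columns(1)) > len(get_feature_columns(72))


def test_cyclical_features_present():
    cols = set(get_feature_columns(1))
    assert {"sin_hour", "cos_hour", "sin_day", "cos_day"} <= cols
```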
How to reproduce locally
With data/raw/postsql.csv in place (or any export with the same
schema), run the full suite:
python -m pytest tests/test_pipeline.py -v
To match what GitHub Actions runs (no data file needed):
python -m pytest tests/test_pipeline.py -m "not requires_data" -v
Test definitions: tests/test_pipeline.py · Workflow: .github/workflows/tests.yml