Model cards

Six models: five base learners spanning three families (tabular: Random Forest and LightGBM; recurrent: LSTM; statistical: ARIMA and Prophet), plus an inverse-MAPE weighted ensemble of all five. All are trained per nurse unit; the tabular and recurrent models are additionally trained per forecast horizon. The cards below follow the Mitchell et al. (2019) model card convention, adapted for capstone scope.

Live model performance (Tableau Public)

An interactive model-performance dashboard is published on Tableau Public; its headline numbers are summarized below.

Best model per horizon (validation ±2 patient accuracy, i.e. share of forecasts within two patients of the actual census)

Horizon  Best model    MAE   Accuracy
1h       RandomForest  0.27  99.7%
2h       LightGBM      0.50  97.8%
3h       LightGBM      0.63  95.8%
4h       LightGBM      0.72  94.5%
12h      LSTM          0.95  90.5%
24h      LSTM          0.97  89.0%
48h      LSTM          1.03  87.5%
72h      LSTM          1.20  87.6%

Individual model cards

Random Forest

Tree-based ensemble (bagging) · scikit-learn 1.3+

Non-parametric ensemble of 200 deep decision trees, each trained on a bootstrap sample with random feature subsets. Captures non-linear interactions across the 61-feature input space without requiring scaling or distributional assumptions.
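
Training sketch (Python, scikit-learn). A minimal illustration of one (unit, horizon) fit using the hyperparameters listed below; n_jobs and random_state are illustrative assumptions, not confirmed settings, and the leakage filtering is assumed to happen upstream of this call.

  from sklearn.ensemble import RandomForestRegressor

  def fit_rf(X_train, y_train):
      """Fit one Random Forest for a single (unit, horizon) pair."""
      model = RandomForestRegressor(
          n_estimators=200,       # 200 deep trees, bagged
          max_depth=20,
          min_samples_split=5,
          min_samples_leaf=2,
          n_jobs=-1,              # assumed: fit trees in parallel
          random_state=42,        # assumed: reproducibility seed
      )
      # X_train must already pass the horizon-dependent leakage filter
      model.fit(X_train, y_train)
      return model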

Architecture & hyperparameters

  • 200 trees (n_estimators=200)
  • max_depth=20, min_samples_split=5, min_samples_leaf=2
  • Per-unit, per-horizon — separate model per (unit, horizon)
  • Feature inputs respect horizon-dependent leakage filter

Strengths

  • Handles non-linear feature interactions natively
  • Built-in feature importance for interpretability
  • Robust to outliers and missing values
  • Best in class at the 1-hour horizon (99.7% ±2 patient accuracy, overall 1h winner with MAE 0.27)

Limitations

  • No native uncertainty quantification (point predictions)
  • Memory-heavy at inference (200-tree ensemble per (unit, horizon))
  • Cannot extrapolate beyond training-data range

Validation ±2 patient accuracy

Horizon    1h      2h      3h      4h      12h     24h     48h     72h
Accuracy   99.7%*  97.4%   95.3%   93.8%   88.1%   85.9%   82.4%   82.1%

An asterisk marks a horizon where this model is the overall winner.

LightGBM

Gradient-boosted decision trees · lightgbm 4+

Microsoft's leaf-wise gradient boosting with histogram-based splits and early stopping on the validation set. Produces strong tabular predictions at modest compute cost; the workhorse model for short-to-medium horizons in this pipeline.
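
Training sketch (Python, lightgbm 4+). A minimal illustration of one (unit, horizon) fit with the hyperparameters listed below; early stopping uses the lightgbm 4 callback API, and the 50-round patience is an assumption the card does not state.

  import lightgbm as lgb

  def fit_lgbm(X_train, y_train, X_val, y_val):
      """Fit one LightGBM regressor for a single (unit, horizon) pair."""
      model = lgb.LGBMRegressor(
          n_estimators=500,        # cap; early stopping usually fires first
          max_depth=8,
          num_leaves=31,
          learning_rate=0.05,
          subsample=0.8,           # row subsampling
          colsample_bytree=0.8,    # column subsampling
      )
      model.fit(
          X_train, y_train,
          eval_set=[(X_val, y_val)],
          # stopping_rounds=50 is an assumed patience value
          callbacks=[lgb.early_stopping(stopping_rounds=50)],
      )
      return model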

Architecture & hyperparameters

  • 500 estimators max with early stopping on validation
  • max_depth=8, num_leaves=31, learning_rate=0.05
  • Subsampling: 0.8 (rows), 0.8 (columns)
  • Per-unit, per-horizon

Strengths

  • Best-in-class accuracy on tabular features (94–98% ±2 at 2–4h horizons)
  • Fast training and inference vs. RF for similar accuracy
  • Handles categorical features natively, NaN-safe
  • Built-in feature importance (split-count and gain)

Limitations

  • Hyperparameter sensitive (tuning matters)
  • No native uncertainty quantification
  • Sequential boosting limits parallelism within one tree fit

Validation ±2 patient accuracy

Horizon    1h      2h      3h      4h      12h     24h     48h     72h
Accuracy   99.7%   97.8%*  95.8%*  94.5%*  89.6%   88.5%   86.5%   86.4%

An asterisk marks a horizon where this model is the overall winner.

LSTM

Recurrent neural network (sequence model) · PyTorch 2+

Two-layer stacked Long Short-Term Memory network reading 168-hour (7-day) sequences of feature vectors. Best at long horizons where short-term lag features become unavailable due to leakage filtering and the model must rely on captured temporal structure.
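
Architecture sketch (Python, PyTorch 2+). A minimal module matching the hyperparameters below; the training loop, per-horizon scaler, and data pipeline are omitted, and forecasting from the final time step is an assumption about the output head.

  import torch.nn as nn

  class CensusLSTM(nn.Module):
      """2-layer LSTM over 168-hour feature sequences -> one census value."""

      def __init__(self, n_features: int, hidden: int = 64):
          super().__init__()
          self.lstm = nn.LSTM(
              input_size=n_features,
              hidden_size=hidden,
              num_layers=2,
              dropout=0.2,        # dropout between the stacked layers
              batch_first=True,   # input shape (batch, 168, n_features)
          )
          self.head = nn.Linear(hidden, 1)  # dense output head

      def forward(self, x):
          out, _ = self.lstm(x)            # (batch, 168, hidden)
          return self.head(out[:, -1])     # forecast from final time step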

Architecture & hyperparameters

  • 2-layer stacked LSTM, 64 hidden units per layer
  • Dropout 0.2 between layers, dense output head
  • Sequence length 168 hours (7 days), batch size 64
  • Adam optimizer, lr=0.001, MSE loss, early stopping patience 10
  • Per-unit, per-horizon (with per-horizon scaler)

Strengths

  • Best at long horizons (12h–72h: 87–91% ±2 accuracy)
  • Captures long-term temporal patterns missed by tabular models
  • Sequence input naturally encodes recent history

Limitations

  • Computationally expensive (PyTorch + per-horizon scaler)
  • Less interpretable than tree models
  • Requires sufficient history per unit (sparse units underperform)
  • Per-horizon training inflates artifact count

Validation ±2 patient accuracy

Horizon    1h      2h      3h      4h      12h     24h     48h     72h
Accuracy   98.1%   95.6%   88.9%   94.2%   90.5%*  89.0%*  87.5%*  87.6%*

An asterisk marks a horizon where this model is the overall winner.

ARIMA / SARIMA

Statistical time series (Box-Jenkins) · pmdarima · statsmodels

Univariate seasonal ARIMA fit per unit via auto-selection over (p,d,q) and seasonal (P,D,Q) orders with daily seasonality. Trained once per unit; a single fit forecasts all eight horizons (an 8× speedup over per-horizon training).
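
Fitting sketch (Python, pmdarima). One auto_arima search per unit with the bounds listed below; the forecast call shown uses pmdarima's predict with return_conf_int, which is illustrative (the card notes the pipeline reads intervals via statsmodels' get_forecast()).

  import pmdarima as pm

  def fit_sarima(census_hourly):
      """Auto-select (p,d,q)(P,D,Q)[24] orders for one unit's census."""
      model = pm.auto_arima(
          census_hourly,              # ~3 months of hourly census
          max_p=5, max_d=2, max_q=5,
          seasonal=True, m=24,        # daily seasonality on hourly data
          suppress_warnings=True,
      )
      return model

  # One fit serves every horizon: forecast 72 steps with intervals, e.g.
  # preds, conf_int = model.predict(n_periods=72, return_conf_int=True)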

Architecture & hyperparameters

  • auto_arima search: max_p=5, max_d=2, max_q=5
  • Seasonal period = 24 (daily seasonality)
  • 3-month rolling training window per unit (Kalman-filter overhead makes full-history fits impractical)
  • JSON parameter serialization (avoids multi-GB SARIMAX pickles)

Strengths

  • Native confidence intervals from get_forecast()
  • Statistically interpretable (clear order semantics)
  • Robust baseline; ~82% ±2 accuracy across all horizons

Limitations

  • Univariate — ignores ED, surgery, ADT flow features
  • Slow to fit on long histories (Kalman filter overhead)
  • Near-identical accuracy across horizons (cannot exploit horizon-specific features)

Validation ±2 patient accuracy

Horizon    1h      2h      3h      4h      12h     24h     48h     72h
Accuracy   82.0%   82.0%   82.0%   82.0%   82.0%   82.1%   82.2%   82.1%

Not the overall winner at any horizon.

Prophet

Decomposable additive time series · prophet 1.1+

Facebook's piecewise-linear trend model with explicit yearly, weekly, and daily seasonality plus US holiday effects. Trained once per unit; forecasts at horizons by shifting the future dataframe.
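
Fitting sketch (Python, prophet 1.1+). One fit per unit with the settings listed below; the 72-hour future frame is an assumption matching the longest horizon, and the input column names follow Prophet's required ds/y convention.

  from prophet import Prophet

  def fit_prophet(history):
      """history: DataFrame with columns ds (timestamp) and y (census)."""
      m = Prophet(
          yearly_seasonality=True,
          weekly_seasonality=True,
          daily_seasonality=True,
          changepoint_prior_scale=0.05,   # library default, kept explicit
      )
      m.add_country_holidays(country_name='US')  # built-in US holidays
      m.fit(history)
      future = m.make_future_dataframe(periods=72, freq='h')
      return m.predict(future)   # yhat plus native uncertainty intervals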

Architecture & hyperparameters

  • yearly_seasonality, weekly_seasonality, daily_seasonality all enabled
  • changepoint_prior_scale=0.05 (default)
  • country_holidays='US' for built-in holiday effects
  • Per-unit, train-once

Strengths

  • Robust to missing data and outliers
  • Native uncertainty intervals
  • Explicit, interpretable seasonality components
  • Holiday-aware out of the box

Limitations

  • Univariate (CENSUS only)
  • Doesn't capture short-term shocks driven by ADT flow
  • ~86% ±2 accuracy across horizons — solid baseline but bested by ML models

Validation ±2 patient accuracy

Horizon    1h      2h      3h      4h      12h     24h     48h     72h
Accuracy   86.3%   86.3%   86.3%   86.3%   86.3%   86.3%   86.2%   86.1%

Not the overall winner at any horizon.

Ensemble (inverse-MAPE weighted)

Model averaging · Custom

Per-unit, per-horizon weighted average of all available individual model predictions. Weights are computed from validation MAPE — better-performing models get higher weight — then normalized to sum to 1.
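
Weighting sketch (Python). A direct implementation of the rules listed below; the function name, dict-based inputs, and None return convention are illustrative.

  import numpy as np

  def ensemble_predict(preds, val_mape):
      """Inverse-MAPE weighted average for one (unit, horizon).

      preds    : dict, model name -> point prediction
      val_mape : dict, model name -> validation MAPE
      """
      # Skip models whose validation MAPE is NaN
      usable = {m: p for m, p in preds.items()
                if m in val_mape and not np.isnan(val_mape[m])}
      if len(usable) < 2:          # needs >= 2 component models to fire
          return None
      raw = {m: 1.0 / val_mape[m] for m in usable}
      total = sum(raw.values())
      weights = {m: w / total for m, w in raw.items()}  # normalize to sum to 1
      return sum(weights[m] * usable[m] for m in usable)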

Architecture & hyperparameters

  • weight_i = (1 / MAPE_i) normalized across enabled models
  • Per-unit, per-horizon weights stored as JSON
  • Skips models with NaN MAPE on the validation set
  • Requires ≥2 component models per (unit, horizon) to fire

Strengths

  • Hedges against single-model failure modes
  • Smoother performance across horizons (no cliff between RF dominance and LSTM dominance)
  • MAPE-based weighting downweights poor performers automatically

Limitations

  • Weights frozen from validation set — won't adapt to drift
  • Inherits the failure modes of all components
  • Not always the winner: tree models or LSTM often beat the ensemble at their respective horizons

Validation ±2 patient accuracy

Horizon    1h      2h      3h      4h      12h     24h     48h     72h
Accuracy   97.0%   94.1%   91.6%   91.3%   87.7%   86.6%   85.3%   85.2%

Not the overall winner at any horizon.

References