Model cards
Five individual models across three families: tabular (Random Forest, LightGBM), recurrent (LSTM), and statistical (ARIMA, Prophet). An inverse-MAPE weighted ensemble brings the total to six. All models are trained per nurse unit; the tabular and recurrent models are additionally trained per forecast horizon.
Cards below follow the Mitchell et al. (2019) ML model card convention, adapted for capstone scope.
Individual model cards
Random Forest
Non-parametric ensemble of 200 deep decision trees, each trained on a bootstrap sample with random feature subsets. Captures non-linear interactions across the 61-feature input space without requiring feature scaling or distributional assumptions.
Architecture & hyperparameters
- 200 trees (n_estimators=200)
- max_depth=20, min_samples_split=5, min_samples_leaf=2
- Per-unit, per-horizon — separate model per (unit, horizon)
- Feature inputs respect horizon-dependent leakage filter
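Under the settings above, the per-(unit, horizon) estimator could be constructed roughly as follows. This is a sketch using scikit-learn; the factory name, `random_state`, and `n_jobs` are assumptions, not card settings.

```python
from sklearn.ensemble import RandomForestRegressor

# Hypothetical factory mirroring the card's settings; one such model
# is fit per (unit, horizon) pair.
def make_rf() -> RandomForestRegressor:
    return RandomForestRegressor(
        n_estimators=200,     # 200 trees
        max_depth=20,
        min_samples_split=5,
        min_samples_leaf=2,
        n_jobs=-1,            # fit trees in parallel (assumption)
        random_state=42,      # assumed seed for reproducibility
    )
```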
Strengths
- Handles non-linear feature interactions natively
- Built-in feature importance for interpretability
- Robust to outliers and missing values
- Best in class at the 1-hour horizon (99.7% ±2 accuracy, tied with LightGBM)
Limitations
- No native uncertainty quantification (point predictions)
- Memory-heavy at inference (200-tree ensemble per (unit, horizon))
- Cannot extrapolate beyond training-data range
Validation ±2 patient accuracy
| 1h | 2h | 3h | 4h | 12h | 24h | 48h | 72h |
|----|----|----|----|-----|-----|-----|-----|
| **99.7%** | 97.4% | 95.3% | 93.8% | 88.1% | 85.9% | 82.4% | 82.1% |

Bold cells mark horizons where this model is the top performer across all models (1h is a tie with LightGBM).
LightGBM
Microsoft's leaf-wise gradient-boosting library, with histogram-based splits and early stopping on the validation set. Produces strong tabular predictions at modest compute cost; the workhorse model for short-to-medium horizons in this pipeline.
Architecture & hyperparameters
- 500 estimators max with early stopping on validation
- max_depth=8, num_leaves=31, learning_rate=0.05
- Subsampling: 0.8 (rows), 0.8 (columns)
- Per-unit, per-horizon
Strengths
- Best-in-class accuracy on tabular features (94–98% ±2 at 2–4h horizons)
- Faster training and inference than Random Forest at comparable accuracy
- Handles categorical features natively, NaN-safe
- Built-in feature importance (split-count and gain)
Limitations
- Hyperparameter sensitive (tuning matters)
- No native uncertainty quantification
- Boosting is sequential across iterations, so trees cannot be trained in parallel
Validation ±2 patient accuracy
| 1h | 2h | 3h | 4h | 12h | 24h | 48h | 72h |
|----|----|----|----|-----|-----|-----|-----|
| **99.7%** | **97.8%** | **95.8%** | **94.5%** | 89.6% | 88.5% | 86.5% | 86.4% |

Bold cells mark horizons where this model is the top performer across all models (1h is a tie with Random Forest).
LSTM
Two-layer stacked Long Short-Term Memory network reading 168-hour (7-day) sequences of feature vectors. Strongest at long horizons, where short-term lag features are removed by the leakage filter and the model must rely on learned temporal structure.
Architecture & hyperparameters
- 2-layer stacked LSTM, 64 hidden units per layer
- Dropout 0.2 between layers, dense output head
- Sequence length 168 hours (7 days), batch size 64
- Adam optimizer, lr=0.001, MSE loss, early stopping patience 10
- Per-unit, per-horizon (with per-horizon scaler)
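The architecture above can be sketched in PyTorch roughly as follows. The class name, the feature count of 61, and reading the forecast from the last timestep are illustrative assumptions, not confirmed by the card.

```python
import torch
import torch.nn as nn

class CensusLSTM(nn.Module):
    """Sketch of the card's architecture: 2 stacked LSTM layers of 64
    hidden units, dropout 0.2 between them, and a dense head producing
    a single census forecast."""
    def __init__(self, n_features: int = 61, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden,
            num_layers=2,
            dropout=0.2,       # applied between the two layers
            batch_first=True,
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 168, n_features) -- a 7-day hourly window
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # forecast from the last step

# Training setup from the card: Adam at lr=0.001 with MSE loss.
model = CensusLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
```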
Strengths
- Best at long horizons (12h–72h: 87–91% ±2 accuracy)
- Captures long-term temporal patterns missed by tabular models
- Sequence input naturally encodes recent history
Limitations
- Computationally expensive (PyTorch + per-horizon scaler)
- Less interpretable than tree models
- Requires sufficient history per unit (sparse units underperform)
- Per-horizon training inflates artifact count
Validation ±2 patient accuracy
| 1h | 2h | 3h | 4h | 12h | 24h | 48h | 72h |
|----|----|----|----|-----|-----|-----|-----|
| 98.1% | 95.6% | 88.9% | 94.2% | **90.5%** | **89.0%** | **87.5%** | **87.6%** |

Bold cells mark horizons where this model is the top performer across all models.
ARIMA
Univariate seasonal ARIMA fit per unit via auto-selection over (p,d,q) and seasonal (P,D,Q) orders with daily seasonality. Each model is trained once per unit and forecasts all horizons from that single fit (an 8× speedup over per-horizon training).
Architecture & hyperparameters
- auto_arima search: max_p=5, max_d=2, max_q=5
- Seasonal period = 24 (daily seasonality)
- 3-month rolling training window per unit (full-history fits are impractical due to Kalman filter state cost)
- JSON parameter serialization (avoids multi-GB SARIMAX pickles)
Strengths
- Native confidence intervals from get_forecast()
- Statistically interpretable (clear order semantics)
- Robust baseline; ~82% ±2 accuracy across all horizons
Limitations
- Univariate — ignores ED, surgery, ADT flow features
- Slow to fit on long histories (Kalman filter overhead)
- Near-identical accuracy across horizons (a single fit cannot exploit horizon-specific features)
Validation ±2 patient accuracy
| 1h | 2h | 3h | 4h | 12h | 24h | 48h | 72h |
|----|----|----|----|-----|-----|-----|-----|
| 82.0% | 82.0% | 82.0% | 82.0% | 82.0% | 82.1% | 82.2% | 82.1% |
Prophet
Facebook's piecewise-linear trend model with explicit yearly, weekly, and daily seasonality plus US holiday effects. Trained once per unit; forecasts at each horizon are produced by shifting the future dataframe.
Architecture & hyperparameters
- yearly_seasonality, weekly_seasonality, daily_seasonality all enabled
- changepoint_prior_scale=0.05 (default)
- country_holidays='US' for built-in holiday effects
- Per-unit, train-once
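A configuration sketch of the train-once, per-unit model under the settings above. The factory name is illustrative, and holidays are added here via Prophet's `add_country_holidays` API.

```python
from prophet import Prophet

# Hypothetical per-unit factory; all values mirror the card's settings.
def make_prophet() -> Prophet:
    m = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=True,
        changepoint_prior_scale=0.05,  # library default, kept explicit
    )
    m.add_country_holidays(country_name="US")  # built-in US holidays
    return m
```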
Strengths
- Robust to missing data and outliers
- Native uncertainty intervals
- Explicit, interpretable seasonality components
- Holiday-aware out of the box
Limitations
- Univariate (CENSUS only)
- Doesn't capture short-term shocks driven by ADT flow
- ~86% ±2 accuracy across horizons — solid baseline but bested by ML models
Validation ±2 patient accuracy
| 1h | 2h | 3h | 4h | 12h | 24h | 48h | 72h |
|----|----|----|----|-----|-----|-----|-----|
| 86.3% | 86.3% | 86.3% | 86.3% | 86.3% | 86.3% | 86.2% | 86.1% |
Inverse-MAPE weighted ensemble
Per-unit, per-horizon weighted average of all available individual model predictions. Weights are computed from validation MAPE, so better-performing models get higher weight, and are then normalized to sum to 1.
Architecture & hyperparameters
- weight_i = (1 / MAPE_i) normalized across enabled models
- Per-unit, per-horizon weights stored as JSON
- Skips models with NaN MAPE on the validation set
- Requires ≥2 component models per (unit, horizon) to produce an ensemble prediction
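The weighting rule above can be sketched in plain Python; the function name and the skip condition for non-positive scores are illustrative.

```python
import math

def inverse_mape_weights(val_mape):
    """Sketch of the card's rule: w_i = (1 / MAPE_i), normalized to
    sum to 1 across the usable models. Models with NaN (or assumed
    non-positive) validation MAPE are skipped, and fewer than two
    usable components yields no ensemble."""
    usable = {name: s for name, s in val_mape.items()
              if s is not None and not math.isnan(s) and s > 0}
    if len(usable) < 2:
        return {}
    inverse = {name: 1.0 / s for name, s in usable.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}
```

For example, validation MAPEs of 2.0 and 4.0 yield weights of 2/3 and 1/3, and a model with NaN MAPE is dropped before normalization.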
Strengths
- Hedges against single-model failure modes
- Smoother performance across horizons (no sharp handoff between tree-model dominance at short horizons and LSTM dominance at long ones)
- MAPE-based weighting downweights poor performers automatically
Limitations
- Weights frozen from validation set — won't adapt to drift
- Inherits the failure modes of all components
- Not always the winner: tree models or LSTM often beat the ensemble at their respective horizons
Validation ±2 patient accuracy
| 1h | 2h | 3h | 4h | 12h | 24h | 48h | 72h |
|----|----|----|----|-----|-----|-----|-----|
| 97.0% | 94.1% | 91.6% | 91.3% | 87.7% | 86.6% | 85.3% | 85.2% |