Model Explorer — Veteran Retention Logit

Split

Tier 4.1

Calibration by subgroup

Each point is one subgroup. Horizontal axis = model's mean predicted retention for that group; vertical axis = weighted observed rate. Dots on the 45° line are perfectly calibrated. Gaps above or below are honest miscalibration — disclosed, not corrected.

Dimension

Group mean-pred vs observed 45° line = perfect calibration |gap| ≥ 4 pp flagged

Dimension	Group	n	mean pred	observed	gap

Tier 4.2

Decision curve — net benefit vs threshold

Pick an operating threshold t. The chart shows model net benefit at every threshold between 0.50 and 0.90, plus the treat-all reference (the prevalence ceiling). Wherever the model's curve sits below treat-all, the model adds nothing over selecting everyone; wherever it sits above, selection is beneficial at that threshold.

Threshold 0.70

Model net benefit Treat-all reference Selected threshold

Net benefit is a weighted combination of true-positive rate and false-positive rate at a given threshold, expressed in the same units as prevalence. Under a synthetic target the absolute scale should be read directionally, not as an operational guide.

Tier 4.4

Disparate-impact screen — selection rates & four-fifths rule

Pick a protected dimension and a decision threshold. Bars show the weighted selection rate (share of group predicted above threshold) per subgroup. A subgroup passes the four-fifths rule if its selection rate is at least 80% of the highest-rate group on that dimension.

Dimension Threshold

Passes four-fifths (ratio ≥ 0.80) Fails four-fifths (ratio < 0.80) 80% reference line

Group	selection rate	ratio to max	four-fifths

Reading guide

How to read this page

The three panels are not independent — they describe different faces of the same small-effect-size model. A few interpretive hooks:

Calibration without discrimination is real. The AUC is modest (test AUC ~0.55–0.57), but the calibration scatter hugs the 45° line for most subgroups. The model is well-calibrated even where it cannot rank individuals sharply.
Decision curves translate AUC into action. If the model's net benefit sits below treat-all at your operating threshold, the model adds nothing over selecting everyone in the cohort at that threshold. That is a scope check, not a negative finding.
The four-fifths screen catches disability-band disparities first. This is expected — the model captures a large retention penalty for higher DRAT ratings. The screen is published so any future reuse of the scores carries that knowledge on the record.

Artifacts behind this page are in reports/phase6_v17/ (stratified) and reports/phase6_v17_temporal/ (temporal). Regeneration scripts: scripts/fit_retention_model_v2.py and scripts/validate_phase6_v17.py.