VRSI 2026.1 — A County-Grain Composite
of Veteran Regional Outcomes
Eight federal sources → twenty KPIs → five domains → one composite. Built around partial-publish discipline and proxy-flagging, so coverage ratchets up as schemas stabilize rather than waiting for perfection. Methodology, weighting, and every fragility boundary disclosed on the page.
Why this index exists
Most public reporting on veteran outcomes stops at the state level. The veteran living in a rural Tennessee county does not see the same labor market, the same cost-burden structure, or the same clinical access as the veteran living in Nashville — and yet state-level dashboards fold both into a single number. State-level aggregation hides the places where the outcomes actually live.
The Veteran Regional Stability Index (VRSI) closes that gap. It pulls from eight federal data sources that already publish at the county grain, binds them into a single reproducible composite across five domains of regional outcome, and surfaces the composite at the level where policy actually operates.
VRSI is built on partial-publish discipline: when a federal source breaks schema or rate-limits the pull, the KPI it feeds drops to a disclosed proxy flag rather than falsifying a number or blocking the entire release. The live index publishes ten KPIs today. Day 2.1 widened the Socrata-style column matchers and added a Facilities API fallback, which is expected to lift the next run to thirteen of twenty. The end-of-month target is seventeen of twenty at stable fidelity.
The problem: state-level reporting hides where veterans actually live
Public-facing dashboards publish veteran outcomes in state-sized slabs, but the county is where employment programs, housing assistance, medical access, and benefits infrastructure are actually administered.
Federal agencies already publish most of the data needed for a county-grain view. They just publish it in eight different places, on eight different cadences, under eight different schema conventions. Nothing connects them. The consequence is that the veteran-focused analyst who wants to ask a simple county-level question — is the benefits pipeline in this county as healthy as the labor market? — has to build the integration themselves every time, from first principles.
VRSI is that integration. The composite does not invent new data; it disciplines the existing public record. Every input is a public federal series. Every transformation is logged, versioned, and reproducible. The value added is the composite structure, the fragility disclosure, and the partial-publish contract.
Methodology: eight sources, twenty KPIs, five domains, one composite
Each KPI is pulled, min-max scaled against the live universe, and equally weighted within its domain. Domains then combine into the composite at equal 20% weights each.
The five domains
Economic Stability · 20%
- Unemployment rate — veterans · BLS LAUS · county-annual
- Labor-force participation — veterans · ACS S2101 5-year
- Regional price parity (cost of living) · BEA RPP · county-annual
- Per-capita personal income · BEA CAINC1 · county-annual
Health & Clinical Access · 20%
- Disability prevalence — veterans · ACS S2101 / B21100
- VA facility density · VA Facilities API (authenticated)
- Primary-care HPSA score · HRSA Data Warehouse
- Mental-health HPSA score · HRSA Data Warehouse
Education & Human Capital · 20%
- Veteran educational attainment (bachelor+) · ACS S2101
- Education-to-income ratio · ACS × BEA derived
- GI Bill beneficiary density · VA NCVAS county tables
- Approved GI Bill institution density · VA GI Bill Comparison Tool
Benefits Access · 20%
- VA compensation recipient density · VA NCVAS county · proxy-flagged
- VA pension recipient density · VA NCVAS county · proxy-flagged
- DOL VETS claim closure rate · DOL VETS state-to-county broadcast
- VR&E participant density · VA NCVAS · proxy-flagged
Community Infrastructure · 20%
- Veteran non-profit density per 1k vets · IRS Exempt Organizations · NTEE-filtered
- Cost-burdened veteran households · HUD CHAS 5-year
- Fair Market Rent vs. county income · HUD FMR · annual
- USPS vacancy rate · HUD USPS · quarterly
How the composite is built
- 1. Scale — min-max each KPI on the live universe, flip sign where lower-is-better
- 2. Weight within domain — equal 25% to each of the four KPIs
- 3. Weight across domains — equal 20% to each of the five
- 4. Publish — emit composite + per-domain subscores + fragility flags + manifest
Unequal weights imply a prior about which domain matters most. For a first-release composite with a 21-county seed universe, that prior is not earned. Equal weighting is the most defensible null — users who disagree can reweight downstream from the per-domain subscores, which are exposed in the release manifest.
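The four steps above reduce to a few lines of arithmetic. The sketch below is a minimal illustration, not the pipeline's actual code: the KPI values, domain grouping, and the degenerate-column fallback of 0.5 are illustrative assumptions.

```python
def min_max_scale(values, lower_is_better=False):
    """Scale raw KPI values to [0, 1] over the live universe,
    flipping sign where a lower raw value means a better outcome."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate column: every county identical
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled

def composite(domains):
    """domains: {domain_name: [scaled KPI values for one county]}.
    Equal weight within each domain; equal weight across domains
    (20% each when all five domains are present)."""
    domain_scores = {name: sum(kpis) / len(kpis) for name, kpis in domains.items()}
    score = sum(domain_scores.values()) / len(domain_scores)
    return score, domain_scores
```

Because weights are equal at both levels, a downstream user who disagrees with the prior only needs the per-domain subscores to reweight: the composite is a plain average of them.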
Composite results — partial-publish discipline, not vaporware
The live index covers half of its twenty KPIs at full fidelity today. The coverage ladder is the release plan, and it is visible.
Floor coverage
Partial-publish flag TRUE. Ten KPIs at stable fidelity; ten KPIs proxy-flagged with disclosed fallback.
Expected lift
Day 2.1 widened Socrata column matchers (+2) and rerouted VA Facilities to the authenticated API (+1).
End-of-sprint
HRSA HPSA integration, IRS EO refresh, and VA NCVAS proxy uplift close the remaining gaps.
The partial-publish contract
A composite that waits until all twenty KPIs are perfect never ships. One that publishes fabricated values to hit a coverage number is worse — it launders data quality into a false headline. VRSI splits the difference. Each KPI either resolves cleanly from its federal source, or it emits a disclosed proxy and a fragility flag. The release manifest carries the live coverage ratio, a per-KPI status vector, and a human-readable rationale for every proxy in play. A downstream consumer can choose to include or exclude proxy-flagged KPIs from their own view.
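The contract can be made concrete with a small sketch of the manifest shape. The field names here (`status`, `proxy_rationale`, `coverage`, `partial_publish`) are illustrative assumptions, not the release's actual schema:

```python
def build_manifest(kpi_results):
    """kpi_results: {kpi_name: (value_or_None, rationale_or_None)}.
    A KPI that fails to resolve cleanly ships as a disclosed proxy
    with a human-readable rationale, never as a fabricated value."""
    entries, live = {}, 0
    for name, (value, rationale) in kpi_results.items():
        if value is not None:
            entries[name] = {"status": "live", "value": value}
            live += 1
        else:
            entries[name] = {"status": "proxy", "proxy_rationale": rationale}
    return {
        "coverage": f"{live}/{len(kpi_results)}",
        "partial_publish": live < len(kpi_results),
        "kpis": entries,
    }
```

A downstream consumer filtering on `status == "live"` gets the strict view; keeping both statuses reproduces the headline composite.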
The automated test suite covers source parsers, schema-drift guards, column-name widening, state-to-county broadcast math, min-max scaling, weighted aggregation, manifest shape, proxy-flag propagation, and regression fixtures for every live KPI. It is CI-gated — no release proceeds on a red suite.
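The state-to-county broadcast math exercised by the suite is the one step that manufactures county rows from a state series. One plausible implementation, under the assumption that rates broadcast unchanged while counts are share-weighted by county veteran population (the source does not specify the rule):

```python
def broadcast_rate(state_rate, county_vet_pop):
    """A state-level rate (e.g. DOL VETS claim closure) applies
    unchanged to every county in the state."""
    return {county: state_rate for county in county_vet_pop}

def broadcast_count(state_count, county_vet_pop):
    """A state-level count is apportioned to counties by their
    share of the state's veteran population."""
    total = sum(county_vet_pop.values())
    return {c: state_count * pop / total for c, pop in county_vet_pop.items()}
```

Either way the output rows carry a broadcast provenance flag upstream of the composite, so the county grain is never misread as native resolution.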
Per-domain signal — where the variance actually lives
Each domain carries its own geography and its own fragility envelope. The composite is only as useful as the domains that feed it, so they are reported individually before they are combined.
Economic Stability
The four economic KPIs track demand-side (unemployment, LFP) and supply-side (RPP, per-capita income) conditions. Seed-universe variance is dominated by the urban-rural gradient: metropolitan counties carry higher RPP and higher per-capita income but similar LFP, while non-metro counties carry lower unemployment rates with substantially lower income — a common sign of labor-force detachment rather than labor-market tightness. A veteran-specific LFP series (ACS S2101) separates genuine tightness from attrition.
Health & Clinical Access
Disability prevalence correlates only loosely with economic strength. What drives the Health subscore in the seed universe is clinical access — VA facility density and HPSA scores — not the underlying disability rate. The Day 1.9c reroute of VA facility data from the 404-ing Socrata CSV to the authenticated VA Facilities API (api.va.gov/services/va_facilities/v1/facilities, paginated JSON envelope, lowercase apikey header) restored full-fidelity county coverage on this KPI without relaxing the test suite.
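The reroute's pagination loop can be sketched as follows. The envelope field names (`data`, `links.next`) and the `state` query parameter are assumptions modeled on a typical JSON:API response; `fetch_json` is injected so the loop can run without network access or a live key:

```python
BASE_URL = "https://api.va.gov/services/va_facilities/v1/facilities"

def fetch_all_facilities(fetch_json, api_key, state="TN"):
    """Walk the paginated JSON envelope until `links.next` is exhausted.
    The key travels in a lowercase `apikey` header, per the reroute."""
    headers = {"apikey": api_key}
    url = f"{BASE_URL}?state={state}"
    facilities = []
    while url:
        page = fetch_json(url, headers=headers)
        facilities.extend(page.get("data", []))
        url = page.get("links", {}).get("next")  # None terminates the walk
    return facilities
```

Injecting the transport is also what lets the regression fixtures replay recorded envelopes, keeping the suite green without hitting the authenticated endpoint.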
Education & Human Capital
GI Bill density and approved-institution density carry the heaviest educational signal because attainment alone is sensitive to the underlying cohort’s age distribution. Counties near large approved institutions show predictably higher GI Bill beneficiary density; the signal is geographic more than it is behavioral, and the composite treats it that way.
Benefits Access
This is the single most fragile domain. Three of the four KPIs depend on VA NCVAS county-level tables, which update on a once-a-year cadence and are historically the most prone to format drift on the source side. Day 2.1 widened _BENEFITS_OPTIONAL_COLUMNS to 19/19/15 Socrata-style matchers per KPI and added state-column detection. Expected next-run outcome: two Benefits KPIs move off proxy. DOL VETS is the only non-VA series here and drives a stabilizing effect on the domain subscore.
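The widened matching can be sketched as a substring search over normalized headers. The candidate needles below are illustrative stand-ins; the real `_BENEFITS_OPTIONAL_COLUMNS` carries 19/19/15 needles per KPI:

```python
# Hypothetical needle lists -- the production lists are longer and disclosed.
CANDIDATE_NEEDLES = {
    "compensation_recipients": ["comp_recipients", "compensation", "comp_total"],
    "pension_recipients": ["pension_recipients", "pension", "pens_total"],
}
STATE_NEEDLES = ["state_fips", "state"]

def normalize(col):
    return col.strip().lower().replace(" ", "_")

def match_column(header, needles):
    """Return the first header column containing any candidate needle,
    or None -- which downstream becomes a proxy flag, not a crash."""
    cols = [normalize(c) for c in header]
    for needle in needles:
        for raw, col in zip(header, cols):
            if needle in col:
                return raw
    return None
```

A `None` on every needle list is exactly the "KPI drops to proxy" path; a hit on `STATE_NEEDLES` is what lets the loader confirm county-level data even when the county column is renamed between releases.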
Community Infrastructure
The Community domain is carried by HUD: CHAS cost-burden shares, Fair Market Rent ratios, and USPS vacancy rates. IRS Exempt Organizations contributes the veteran-non-profit-density KPI, NTEE-filtered. The HUD series are unusually clean at the county grain, which is part of why Community ended up with the highest live coverage of any domain at Day 2.0.
Cross-domain insights — what the composite is telling us
Patterns visible once the five domains are aligned to a single geographic grain.
| Cross-domain pattern | Direction |
|---|---|
| Economic Stability ↔ Community Infrastructure | Strongly positive |
| Economic Stability ↔ Health & Clinical Access | Weak / noisy |
| Education & Human Capital → Benefits Access | Positive |
| Health & Clinical Access ↔ Benefits Access | Non-monotonic by county type |
Economic strength and community infrastructure move together cleanly — counties with lower veteran unemployment and higher per-capita income also tend to have lower cost-burden shares and lower vacancy. That is not surprising, but it is rare to see it at the county grain.
Economic strength does not predict clinical access. The HPSA and VA-facility signals respond to federal infrastructure decisions, not to local labor markets. A wealthy county can still be a mental-health desert, and the composite does not blur that distinction.
PCA on the normalized KPI matrix places the first principal component at roughly 40% of explained variance, loading most heavily on the Economic and Community domains. The second component separates counties on the Health / Benefits axis. That two-axis structure shows up again in k-means clustering with k = 4: Urban Benefited (high on both axes), Rural Resilient (moderate-economic, strong-community), Transition Hubs (high-education, mixed-benefits), and At-Risk (low across the board). These are the four archetypes that emerged in Day 2.1 unsupervised analysis.
Modeling insights — unsupervised, descriptive
PCA and k-means run on the twenty-KPI matrix after imputation of proxy-flagged cells. The intent is exploratory, not predictive.
The PCA biplot separates the seed counties along two meaningful axes. PC1 (~40% variance) is a general “regional prosperity” axis — labor market + cost of living + community infrastructure load together. PC2 (~18% variance) captures the health-and-benefits axis — counties with strong VA facility access and active compensation pipelines push in one direction, counties with high HPSA scores and thin benefits throughput push in the other.
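The explained-variance figures behind that reading come from standard PCA. A numpy-only sketch, using a toy matrix in place of the normalized twenty-KPI matrix (the real run operates on the seed universe after proxy-cell imputation):

```python
import numpy as np

def explained_variance_ratio(X):
    """Center the KPI matrix and return each component's share of
    variance via SVD -- equivalent to PCA on the covariance matrix."""
    Xc = X - X.mean(axis=0)
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return var / var.sum()
```

On the live matrix this is the computation that puts PC1 at roughly 40% and PC2 at roughly 18%; on a rank-1 toy matrix it puts everything on PC1, which makes it easy to fixture-test.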
K-means at k = 4 produces four archetypes with face-valid county membership in the seed universe:
| Archetype | PC1 loading | PC2 loading | Characterization |
|---|---|---|---|
| Urban Benefited | High | High | Strong on both axes. Metropolitan, high VA density, stable labor. |
| Rural Resilient | Medium | Medium–High | Moderate economics, strong community infrastructure, active non-profits. |
| Transition Hubs | Medium–High | Medium | High education KPIs, benefits access lagging expectations. |
| At-Risk | Low | Low | Weak across all five domains. Candidate for policy attention. |
Unsupervised structure is not a causal claim. The archetypes describe where counties sit in the seed-universe KPI space at a single point in time; they do not predict outcomes, and they do not support individual-level inference. Any claim that “being an At-Risk county causes outcome X” is outside what VRSI can defend.
Risks and fragility — disclosed up front
Every composite is fragile somewhere. VRSI publishes where.
The Benefits Access domain is the most fragile, and it is fragile in a specific, reproducible way: three of its four KPIs depend on VA NCVAS county-level releases that historically ship with inconsistent column headers between years. The Day 2.1 fix widened the Socrata-style column matchers to 19/19/15 candidate needles per KPI and added a state-column fallback that lets the loader detect county-level data even when the column naming changes. That reduces the failure mode from “pipeline breaks” to “KPI drops to proxy flag.”
External API fragility is the second risk class. The Day 1.9c VA Facilities reroute caught a real failure: the prior Socrata CSV endpoint started returning 404, which would have silently zeroed the facility-density KPI. The fallback is the authenticated VA Facilities JSON API with key rotation, paginated envelopes, and a format-dispatch pattern that lets the loader accept either Socrata-style or API-style responses without code duplication.
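The format-dispatch pattern can be sketched as a single entry point that sniffs the payload shape. The JSON field names (`data`, `attributes`) are assumptions consistent with the envelope described above; the CSV branch assumes a header row:

```python
import csv
import io
import json

def load_facility_rows(payload):
    """Accept either an API-style JSON envelope or Socrata-style CSV text
    and normalize both into a list of per-facility dicts."""
    try:
        doc = json.loads(payload)
    except (ValueError, TypeError):
        doc = None  # not JSON: fall through to the CSV branch
    if isinstance(doc, dict) and "data" in doc:
        # JSON:API-style items may nest fields under `attributes`.
        return [item.get("attributes", item) for item in doc["data"]]
    return list(csv.DictReader(io.StringIO(payload)))
```

Because both branches emit the same row shape, the downstream density KPI never has to know which source format survived that quarter.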
HRSA HPSA scores and IRS Exempt Organizations refreshes sit on quarterly update cadences; the pipeline respects their cadence and does not attempt to synthesize intra-quarter values. USPS vacancy and HUD FMR are the most reliable series in the stack — they have not required any schema repair across the build log.
Recommendations — three tiers
Policy readers, data-investment readers, and pipeline-maturation readers each get a distinct actionable ask.
Tier 1 — Policy
County-grain visibility is the lever that unlocks targeted action. VA and DOL VETS pipelines administered at the state level mask within-state heterogeneity by design. The At-Risk archetype identified in Chapter Six is the operational target for resource reallocation — these counties show low subscores across all five domains simultaneously, which is the signature that state-level averages smooth away.
Tier 2 — Data investment
The single highest-leverage investment is the VA NCVAS county-level release cadence and schema stability. Every proxy flag in the Benefits domain resolves the moment that release is consistent year-over-year. A secondary investment is HRSA HPSA primary-care and mental-health score integration with the veteran-specific population denominator rather than the total-population denominator currently exposed.
Tier 3 — Pipeline maturation
The Day 2.1 schema-widening pattern generalizes. Any federal source that ships semi-structured CSVs with unstable column headers (NCVAS, some BLS LAUS partitions, parts of the HUD CHAS release) benefits from a Socrata-style matcher with disclosed candidate-column lists. The same pattern is a candidate for open-sourcing as a standalone utility package.
Roadmap
Days 1.9c through 2.1 are done. Days 2.2 through 2.4 are next. Days 2.5 through 2.8 are queued.
Done
- Day 1.9c — VA Facilities reroute to authenticated API
- Day 2.0 — Partial-publish floor at 10/20
- Day 2.1 — Socrata column-matcher widening + state-column detection
Next
- Day 2.2 — Live rerun with 2.1 widening — target 13/20
- Day 2.3 — HRSA HPSA integration
- Day 2.4 — Tableau embed prep · website iframe
Queued
- Day 2.5 — IRS EO refresh, NTEE-filter hardening
- Day 2.6 — VA NCVAS proxy uplift · year-over-year check
- Day 2.7 — Full-universe scale (3,143 counties)
- Day 2.8 — Coverage target 17/20, v2026.1 release candidate
Download and further reading
- VRSI 2026.1 Executive Deck · sixteen-slide executive walkthrough: problem, methodology, composite results, per-domain deep dives, modeling insights, roadmap, and full twenty-KPI appendix dictionary. (Download .pptx)
- Workforce Stability case study · the first flagship analytics case study: ACS PUMS-based, PWGTP-weighted, with an illustrative retention model. (Read Case Study)
- Investigations index · all active case studies, drafts, and next-up items. (Back to Investigations)