ForesightEval: a quality standard for strategic foresight
When AI writes a scenario analysis for your board, how do you know it's any good? ForesightEval is the protocol we built to answer that question — seven measurable dimensions that separate foresight you can stake a decision on from analysis that merely reads well.
The problem
Looking right is not being right
Fluency masks failure
Models produce authoritative prose that reads like strategy. But fluency is a surface property — it tells you nothing about whether the causal reasoning holds up.
Benchmarks test the wrong thing
Existing benchmarks score isolated predictions. Foresight is a different discipline — its value lies in stress-testing strategy against multiple futures, not calculating the probability of one.
Alignment kills honesty
Modern AI models are trained to be helpful. That training teaches them to agree, avoid discomfort, and default to consensus. For risk management, where the entire point is naming uncomfortable truths, this is a structural failure.
Our approach
Three principles, built into every score
Measure what matters
It is simple to score whether a model’s probability estimate was correct. It is hard to score whether a scenario is coherent, whether it surfaces the disruption a board hasn’t considered, or whether it translates into action inside ninety days. ForesightEval does the hard version, because the easy version is not what strategy teams actually need.
Penalize comfort, reward courage
The most dangerous AI foresight is the kind that quietly agrees with the strategy already on the table. ForesightEval explicitly scores whether a model named the uncomfortable scenario, challenged the assumption, or blinked. Analysis that only confirms what leadership already believes does not pass the bar.
Every score, fully decomposable
A quality metric you cannot audit is not a quality metric. Every ForesightEval score breaks down to its seven dimensions, each dimension to its evidence, each piece of evidence to its source. No black box: asking teams to trust a black-box evaluator of AI would recreate the very problem ForesightEval was built to solve.
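To make the decomposition concrete, here is a minimal sketch in Python of how such a score could be represented as nested, auditable records. The names (`Evidence`, `DimensionScore`, `ForesightEvalScore`, `composite`) and the unweighted-mean aggregation are illustrative assumptions, not DSGHT.ai's actual schema or scoring rule.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """A single piece of evidence behind a dimension score, traceable to its source."""
    claim: str
    source: str  # URL, document reference, or dataset identifier

@dataclass
class DimensionScore:
    """Score for one of the seven ForesightEval dimensions, with supporting evidence."""
    name: str
    score: float  # 0-10
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class ForesightEvalScore:
    """Top-level score that decomposes into dimensions, evidence, and sources."""
    dimensions: list[DimensionScore]

    def composite(self) -> float:
        # Assumption: unweighted mean of the dimension scores;
        # the real aggregation rule is not documented here.
        return round(sum(d.score for d in self.dimensions) / len(self.dimensions), 1)

    def audit_trail(self) -> dict[str, list[str]]:
        # Map each dimension to the sources behind its evidence,
        # so every claim feeding the composite can be traced back.
        return {d.name: [e.source for e in d.evidence] for d in self.dimensions}
```

The point of the structure is that no number exists without a parent: the composite points to dimensions, dimensions point to evidence, and evidence points to a source that can be checked.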
In practice
Every Future Space carries a ForesightEval score
ForesightEval currently runs as the internal quality layer on every Future Space DSGHT.ai publishes. The score is calculated before release, visible on the analysis page, and decomposable to the per-dimension level — so the quality claim can be audited against the evidence.
This is not yet a cross-model benchmark — that track opens with the first retrospective backtests later in 2026. What follows is the standard DSGHT.ai holds its own production work to, published openly rather than kept internal.
AI-Driven Public Sector 2030
Strategic Anticipation Quotient
8.6 / 10

| Dimension | Score |
|---|---|
| Scenario Quality | 9.0 |
| Epistemic Grounding | 10.0 |
| Unpalatable Truths | 10.0 |
| Weak Signal Detection | 7.8 |
| Actionability | 9.0 |
| Living Foresight | 7.5 |
| Explainability | 7.0 |
Scored by the DSGHT.ai internal pipeline. Cross-model scoring, human-vs-AI comparison, and retrospective backtests are on the 2026 roadmap.
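As a quick sanity check on the published numbers, the 8.6 quotient is consistent with an unweighted mean of the seven dimension scores above. The equal weighting is an assumption made for illustration; the production pipeline may aggregate differently.

```python
# Dimension scores from the AI-Driven Public Sector 2030 table above.
scores = {
    "Scenario Quality": 9.0,
    "Epistemic Grounding": 10.0,
    "Unpalatable Truths": 10.0,
    "Weak Signal Detection": 7.8,
    "Actionability": 9.0,
    "Living Foresight": 7.5,
    "Explainability": 7.0,
}

# Assumption: the Strategic Anticipation Quotient is an unweighted mean.
saq = sum(scores.values()) / len(scores)
print(round(saq, 1))  # 8.6
```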