ForesightEval: a quality standard for strategic foresight

When AI writes a scenario analysis for your board, how do you know it's any good? ForesightEval is the protocol we built to answer that question — seven measurable dimensions that separate foresight you can stake a decision on from analysis that merely reads well.

The problem

Looking right is not being right

Fluency masks failure

Models produce authoritative prose that reads like strategy. But fluency is a surface property — it tells you nothing about whether the causal reasoning holds up.

Benchmarks test the wrong thing

Existing benchmarks score isolated predictions. Foresight is a different discipline — its value lies in stress-testing strategy against multiple futures, not calculating the probability of one.

Alignment kills honesty

Modern AI models are trained to be helpful. That training teaches them to agree, avoid discomfort, and default to consensus. For risk management, where the entire point is naming uncomfortable truths, this is a structural failure.

Our approach

Three principles, built into every score

Measure what matters

It is simple to score whether a model’s probability estimate was correct. It is hard to score whether a scenario is coherent, whether it surfaces the disruption a board hasn’t considered, or whether it translates into action inside ninety days. ForesightEval does the hard version, because the easy version is not what strategy teams actually need.

Penalize comfort, reward courage

The most dangerous AI foresight is the kind that quietly agrees with the strategy already on the table. ForesightEval explicitly scores whether a model named the uncomfortable scenario, challenged the assumption, or blinked. Analysis that only confirms what leadership already believes does not pass the bar.

Every score, fully decomposable

A quality metric you cannot audit is not a quality metric. Every ForesightEval score breaks down to its seven dimensions, each dimension to its evidence, each piece of evidence to its source. No black box, because trusting a black-box evaluator of AI is the same problem ForesightEval was built to solve.
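The score-to-dimension-to-evidence-to-source chain described above can be sketched as a nested data structure. This is a hypothetical illustration, not DSGHT.ai's actual schema: the class names, fields, and the unweighted-mean aggregation are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    claim: str
    source: str  # citation or URL backing the claim

@dataclass
class Dimension:
    name: str
    score: float
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class ForesightScore:
    dimensions: list[Dimension]

    @property
    def overall(self) -> float:
        # Assumes an unweighted mean; the real aggregation rule
        # is not specified in the source.
        return round(sum(d.score for d in self.dimensions) / len(self.dimensions), 1)

    def audit_trail(self):
        # Walk score -> dimension -> evidence -> source, so any
        # headline number can be traced to what supports it.
        for d in self.dimensions:
            for e in d.evidence:
                yield (d.name, d.score, e.claim, e.source)
```

The point of the structure is that nothing is opaque: every level the overall number aggregates over remains reachable for audit.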

In practice

Every Future Space carries a ForesightEval score

ForesightEval currently runs as the internal quality layer on every Future Space DSGHT.ai publishes. The score is calculated before release, visible on the analysis page, and decomposable to the per-dimension level — so the quality claim can be audited against the evidence.

This is not yet a cross-model benchmark — that track opens with the first retrospective backtests later in 2026. What follows is the standard DSGHT.ai holds its own production work to, published openly rather than kept internal.

AI-Driven Public Sector 2030

CEE · 2030 horizon · Completed April 2026

Strategic Anticipation Quotient

8.6 / 10
Dimension               Score
Scenario Quality          9.0
Epistemic Grounding      10.0
Unpalatable Truths       10.0
Weak Signal Detection     7.8
Actionability             9.0
Living Foresight          7.5
Explainability            7.0
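An unweighted mean of the seven dimension scores reproduces the published 8.6; the actual aggregation rule is not stated in the source, so treat this as a plausible reconstruction rather than the confirmed formula.

```python
# Dimension scores from the AI-Driven Public Sector 2030 Future Space.
scores = {
    "Scenario Quality": 9.0,
    "Epistemic Grounding": 10.0,
    "Unpalatable Truths": 10.0,
    "Weak Signal Detection": 7.8,
    "Actionability": 9.0,
    "Living Foresight": 7.5,
    "Explainability": 7.0,
}

# Assumption: the Strategic Anticipation Quotient is the unweighted
# mean, rounded to one decimal. Sum is 60.3 over 7 dimensions.
overall = round(sum(scores.values()) / len(scores), 1)
print(overall)  # 8.6
```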

Scored by the DSGHT.ai internal pipeline. Cross-model scoring, human-vs-AI comparison, and retrospective backtests are on the 2026 roadmap.

© 2026 DSGHT.ai. All rights reserved.