If you rate a set of decisions at 80% confidence, how often should they succeed? About 80% of the time. If your 80%-confidence decisions succeed 95% of the time, you are underconfident. If they succeed 55% of the time, you are overconfident. The gap between your assigned confidence and your actual success rate, measured across many decisions, is your calibration score, and it is one of the most useful indicators of long-term decision quality.
Most executives have never measured it.
What confidence calibration actually measures
Confidence calibration is not about how confident you feel. It measures how accurately your confidence predicts outcomes. A perfectly calibrated decision-maker who says "I'm 70% confident" succeeds 70% of the time across a large sample of such decisions. Their internal probability estimates are honest representations of uncertainty — not anchored to ego, optimism, or social expectation.
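As a rough illustration of the arithmetic, the gap at a single confidence level can be computed as stated confidence minus observed success rate. The function name `calibration_gap`, the sign convention (positive means overconfident), and the sample data are assumptions made for this sketch, not a standard definition.

```python
def calibration_gap(stated_confidence, outcomes):
    """Return stated confidence minus observed success rate.

    Positive gap: overconfident. Negative gap: underconfident.
    `outcomes` holds one boolean per decision made at this confidence level.
    """
    if not outcomes:
        raise ValueError("need at least one resolved decision")
    success_rate = sum(outcomes) / len(outcomes)
    return stated_confidence - success_rate


# Hypothetical data: 20 decisions logged at 80% confidence, 11 succeeded (55%).
outcomes = [True] * 11 + [False] * 9
gap = calibration_gap(0.80, outcomes)
print(f"gap = {gap:+.2f}")  # a positive value means overconfident at this level
```

A gap near zero at every confidence level is what "perfectly calibrated" means in practice.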
Research on calibration in professional contexts points to a recurring pattern: most professionals are significantly overconfident. Studies of physicians, lawyers, investment managers, and executives report that professionals assign 90% confidence to predictions that turn out correct roughly 70% of the time. The gap is systematic, and it does not close with years of experience.
Counterintuitively, experience often makes calibration worse, because experienced professionals develop stronger confidence in their intuitions without necessarily developing better intuitions.
Why calibration matters more than outcomes
Outcomes are noisy. A well-made decision can produce a bad outcome through bad luck. A poorly made decision can produce a good outcome through good luck. Over a large sample, this noise averages out, but any single outcome tells you very little about the quality of the decision behind it.
Calibration cuts through this noise. A leader with good calibration is demonstrably translating uncertainty into accurate probabilities. They know what they know and know what they don't. This produces consistently better decisions at the portfolio level, even when individual outcomes vary.
How to measure your calibration
Measuring calibration requires three things: a record of decisions with confidence scores assigned at the time of the decision, a record of outcomes, and enough volume to calculate meaningful averages.
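One way to turn those three ingredients into numbers is to group resolved decisions by stated confidence and compare each group's success rate with the confidence assigned. The sketch below assumes records are simple `(stated_confidence, succeeded)` pairs and that confidence is bucketed to the nearest 10%. Both choices, along with the function name `calibration_table` and the data, are hypothetical.

```python
from collections import defaultdict


def calibration_table(records):
    """Group resolved decisions by stated confidence.

    `records` is an iterable of (stated_confidence, succeeded) pairs.
    Returns {confidence: (n, success_rate, gap)}, where
    gap = confidence - success_rate (positive means overconfident).
    """
    buckets = defaultdict(list)
    for confidence, succeeded in records:
        # Bucket to the nearest 10% so similar confidence levels pool together.
        buckets[round(confidence, 1)].append(bool(succeeded))

    table = {}
    for confidence in sorted(buckets):
        results = buckets[confidence]
        rate = sum(results) / len(results)
        table[confidence] = (len(results), rate, confidence - rate)
    return table


# Hypothetical journal, kept small for readability; far too few decisions
# to draw real conclusions from.
records = (
    [(0.9, True)] * 7 + [(0.9, False)] * 3
    + [(0.6, True)] * 6 + [(0.6, False)] * 4
)
table = calibration_table(records)
for confidence, (n, rate, gap) in table.items():
    print(f"{confidence:.0%} stated, {rate:.0%} actual, gap {gap:+.0%} (n={n})")
```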
The minimum useful sample is around 30–50 decisions in a single category. Above 100 decisions, calibration data becomes highly actionable — precise enough to identify specific categories where confidence is systematically miscalibrated.
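To see why volume matters, note that an observed success rate is itself an estimate with sampling error. The sketch below uses the standard error of a binomial proportion. That formula is an assumption brought in for illustration; the thresholds above are not derived from it here, and `success_rate_standard_error` is a hypothetical helper name.

```python
import math


def success_rate_standard_error(success_rate, n):
    """Standard error of an observed success rate over n decisions,
    treating each decision as an independent success or failure."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(success_rate * (1 - success_rate) / n)


# How uncertain is an observed 70% success rate at different sample sizes?
for n in (10, 30, 50, 100):
    se = success_rate_standard_error(0.70, n)
    print(f"n={n:>3}: standard error of about {se:.3f}")
```

With only a handful of decisions, the uncertainty in the observed rate can be as large as the calibration gap you are trying to detect. It shrinks with the square root of the sample size, so quadrupling the number of decisions halves it.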
What miscalibration looks like in practice
The most common pattern is category-specific overconfidence. An executive might be well-calibrated on operational decisions — they have made hundreds of them and developed accurate intuitions — but significantly overconfident on strategic decisions, which are less frequent and involve more novel uncertainty.
Investment managers often discover they are overconfident on early-stage investments but well-calibrated on follow-on decisions where they have more data. The pattern is predictable once you have the data.
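Spotting this kind of pattern comes down to splitting the same comparison by decision category. A minimal sketch, assuming each record carries a category label; the category names, the data, and the function name `gap_by_category` are all hypothetical.

```python
from collections import defaultdict


def gap_by_category(records):
    """Summarise calibration per category.

    `records` is an iterable of (category, stated_confidence, succeeded).
    Returns {category: (n, mean_confidence, success_rate, gap)}, where
    gap = mean_confidence - success_rate (positive means overconfident).
    """
    grouped = defaultdict(list)
    for category, confidence, succeeded in records:
        grouped[category].append((confidence, bool(succeeded)))

    summary = {}
    for category, rows in grouped.items():
        n = len(rows)
        mean_confidence = sum(c for c, _ in rows) / n
        success_rate = sum(s for _, s in rows) / n
        summary[category] = (n, mean_confidence, success_rate,
                             mean_confidence - success_rate)
    return summary


# Hypothetical data: identical stated confidence, different hit rates.
records = (
    [("operational", 0.8, True)] * 8 + [("operational", 0.8, False)] * 2
    + [("strategic", 0.8, True)] * 5 + [("strategic", 0.8, False)] * 5
)
summary = gap_by_category(records)
for category, (n, conf, rate, gap) in summary.items():
    print(f"{category}: {conf:.0%} stated, {rate:.0%} actual, gap {gap:+.0%} (n={n})")
```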
Improving your calibration
Three reliable techniques:
Reference class forecasting
Before assigning confidence, identify the base rate for similar decisions. How often do new product launches in your category succeed? Anchor your confidence to the base rate before adjusting for specific factors.
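The anchoring step can be sketched as starting from the base rate and allowing only bounded adjustments for case-specific factors. The additive adjustments, the `max_shift` cap, and every number below are illustrative assumptions, not an established formula.

```python
def anchored_confidence(base_rate, adjustments, max_shift=0.20):
    """Start from the reference-class base rate, then apply bounded adjustments.

    `adjustments` maps a reason to a signed shift in probability (e.g. +0.05).
    `max_shift` caps the total move away from the base rate; the 0.20 default
    is an arbitrary illustrative choice.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    shift = sum(adjustments.values())
    shift = max(-max_shift, min(max_shift, shift))  # stay anchored to the base rate
    return max(0.0, min(1.0, base_rate + shift))    # keep a valid probability


# Hypothetical launch: assumed 40% base rate, two case-specific factors.
estimate = anchored_confidence(
    0.40,
    {"existing distribution channel": +0.10, "team new to the segment": -0.05},
)
print(f"anchored estimate: {estimate:.0%}")
```

Writing each adjustment down with its reason makes it easier to check later which factors actually mattered.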
Pre-mortem analysis
Before finalising your confidence score, spend 10 minutes imagining the decision failed. What went wrong? This surfaces risks that optimism suppresses and often produces more realistic estimates.
Track and review consistently
Calibration only improves with feedback. Without a structured record of predictions and outcomes, the natural human tendency is to remember predictions as closer to outcomes than they actually were. A decision journal with confidence scores is the minimum viable system for building calibration awareness over time.
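A decision journal of this kind needs very little structure. The sketch below shows one possible shape for an entry; the class name `JournalEntry`, its fields, and the example decisions are hypothetical and do not describe any particular product's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class JournalEntry:
    decision: str
    category: str
    confidence: float                 # stated at decision time, 0.0 to 1.0
    decided_on: date
    succeeded: Optional[bool] = None  # filled in later, at review time

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def resolve(self, succeeded: bool) -> None:
        """Record the outcome without touching the original confidence."""
        self.succeeded = succeeded


def resolved_pairs(entries):
    """Return (confidence, succeeded) pairs for entries with a known outcome."""
    return [(e.confidence, e.succeeded) for e in entries if e.succeeded is not None]


# Hypothetical entries: one reviewed, one still open.
journal = [
    JournalEntry("Hire a second sales lead", "operational", 0.8, date(2024, 1, 15)),
    JournalEntry("Enter a new market", "strategic", 0.7, date(2024, 2, 1)),
]
journal[0].resolve(True)
pairs = resolved_pairs(journal)
print(pairs)
```

The point of the design is that confidence is written down before the outcome is known and never edited afterwards, which is what protects the record from hindsight.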
Start tracking your decisions with Reflect OS
Log decisions in under 60 seconds. Review at 30, 90, and 180 days. See exactly where your judgement is strong — and where it costs you.