Prediction Performance: 6-Month Track Record
Monthly accuracy by method across all 6,910 resolved factory predictions. Oct 2025 – Mar 2026.
MONTHLY ACCURACY — RANGE CONTAINMENT / MOMENTUM CONTINUATION / OVERALL
Range Containment (solid gold)
Momentum Continuation (solid dim)
Overall Accuracy (dashed)
Analyst Insight
Range containment accuracy has remained above 86% for 5 consecutive months (Oct–Feb), peaking at 92.1% in December. Momentum continuation has degraded from 61.4% (Oct) to 43.2% (Mar), an 18.2pp decline over 6 months. Alpha decay is the primary hypothesis: the 7-day and 30-day momentum signals were likely over-fit to the Oct 2025 regime. The method is under active review for replacement or isotonic recalibration. The March 2026 overall figure (68.2%) reflects only n=22 resolved predictions; treat it as preliminary.
Probability Calibration Assessment
A well-calibrated model's stated probability should equal its observed frequency. Deviations indicate systematic bias. Transparency here is a feature, not a flaw — it is the basis for the Platt scaling correction currently in training.
CALIBRATION CURVE — STATED PROBABILITY vs OBSERVED FREQUENCY — 6,910 RESOLVED PREDICTIONS
Dot size proportional to N. Gold = error <5pp (good). White = error 5–15pp (moderate). Red = error >15pp (critical). Dashed diagonal = perfect calibration. Gray band = ±5pp acceptable zone.
| Probability Bucket | Stated Probability | Observed Frequency | N Predictions | Absolute Error | Status |
|---|---|---|---|---|---|
| 30% | 0.300 | 0.585 | 644 | 0.285 | CRITICAL |
| 40% | 0.400 | 0.504 | 276 | 0.104 | MODERATE |
| 60% | 0.600 | 0.470 | 266 | 0.130 | MODERATE |
| 70% | 0.700 | 0.726 | 2,067 | 0.026 | GOOD |
| 80% | 0.800 | 0.857 | 1,582 | 0.057 | CAUTION |
| 90% | 0.900 | 0.928 | 2,075 | 0.028 | GOOD |
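The Error column can be reproduced directly from the table. The sketch below copies the six bucket rows and computes the absolute gap per bucket, plus two summary figures: an unweighted mean and a mean weighted by prediction count. The function names are illustrative and are not taken from the factory codebase.

```python
# Bucket rows copied from the calibration table: (stated, observed, n).
BUCKETS = [
    (0.300, 0.585, 644),
    (0.400, 0.504, 276),
    (0.600, 0.470, 266),
    (0.700, 0.726, 2067),
    (0.800, 0.857, 1582),
    (0.900, 0.928, 2075),
]


def calibration_errors(buckets):
    """Absolute gap between stated probability and observed frequency."""
    return [abs(observed - stated) for stated, observed, _ in buckets]


def mean_calibration_error(buckets):
    """Unweighted mean of the per-bucket absolute errors."""
    errors = calibration_errors(buckets)
    return sum(errors) / len(errors)


def weighted_calibration_error(buckets):
    """Mean absolute error weighted by the number of predictions per bucket."""
    total_n = sum(n for _, _, n in buckets)
    return sum(abs(o - s) * n for s, o, n in buckets) / total_n


if __name__ == "__main__":
    print(f"unweighted: {mean_calibration_error(BUCKETS):.3f}")
    print(f"weighted:   {weighted_calibration_error(BUCKETS):.3f}")
```

The unweighted mean comes out at 0.105, which matches the 10.5% mean calibration error quoted in this section. Weighting by N gives roughly 0.065, because the large 70% and 90% buckets are the best calibrated.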
Calibration Correction In Progress
The 30% probability bucket shows systematic underconfidence: the observed rate is 58.5% against a stated 30.0%, a 28.5pp gap across 644 predictions.
Root cause: the Geometric Brownian Motion (GBM) model's base rates are not adjusted for selection bias in low-probability predictions. The momentum model assigns 30% probability to range predictions where the GBM drift is negative, but range-containment predictions stated at 30% actually resolve correctly 58.5% of the time. The model is systematically over-penalizing these cases.
Fix: a Platt scaling calibration layer is being trained on all 6,910 resolved factory predictions. Isotonic regression is applied separately to the momentum model output to handle the non-monotonic calibration in the 0.4–0.6 range (observed frequency falls from 50.4% in the 40% bucket to 47.0% in the 60% bucket).
Expected improvement: mean calibration error falls from 10.5pp (the unweighted mean of the six bucket errors) to below 3pp. The 60% bucket (currently over-confident, observed 47.0%) will also be corrected. This section updates automatically when the calibration layer is deployed.
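To illustrate what isotonic regression does to a non-monotonic calibration curve, here is a minimal sketch of the pool-adjacent-violators algorithm applied to the six pooled buckets from the table. This is an assumption-laden illustration: the factory applies isotonic regression to the momentum model output specifically, and the Platt scaling layer (a parametric sigmoid fit) is a different technique that is not shown here.

```python
def pool_adjacent_violators(values, weights):
    """Weighted isotonic (non-decreasing) fit via pool-adjacent-violators."""
    # Each block holds [weighted mean, total weight, number of points merged].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the newest block breaks monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            merged_w = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / merged_w, merged_w, c1 + c2])
    # Expand each block back to one fitted value per original point.
    fitted = []
    for v, _, count in blocks:
        fitted.extend([v] * count)
    return fitted


# Observed frequencies and counts for the 30%..90% buckets, from the table.
OBSERVED = [0.585, 0.504, 0.470, 0.726, 0.857, 0.928]
COUNTS = [644, 276, 266, 2067, 1582, 2075]

if __name__ == "__main__":
    for stated, fit in zip([0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
                           pool_adjacent_violators(OBSERVED, COUNTS)):
        print(f"stated {stated:.1f} -> calibrated {fit:.3f}")
```

On these inputs the 30%, 40%, and 60% buckets are pooled into a single calibrated value of about 0.540, while the 70%, 80%, and 90% buckets keep their observed frequencies.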
Prediction Method Performance
Two methods are currently in the factory. Range containment is production-grade with 89.4% accuracy. Momentum continuation is under accuracy review: its overall accuracy is 52.3%, following a 6-month decline in monthly accuracy from 61.4% to 43.2%.
Production Grade
Range Containment
Accuracy
89.4%
Brier Score
0.101
Predictions
4,876
Status
PRODUCTION
Coverage
BTC, ETH, SPX, NASDAQ, GOLD, OIL, DXY, VIX, AAPL, MSFT, GOOGL, META, AMZN, NVDA, COPPER, SILVER
Methodology
Predicts whether an asset will stay within a ±N% range over H days. Uses a Geometric Brownian Motion (GBM) model trained on historical volatility regimes. Range widths: ±5%, ±10%, ±15%, ±20%. Horizons: 3, 7, 14, 21 days. Ensemble across 4 volatility regimes.
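A range-containment probability under GBM can be estimated by simulation. The sketch below is not the factory model: it uses a single volatility input rather than a 4-regime ensemble, checks the range only at daily closes, and assumes 365 trading days per year. The function name, parameters, and the volatility value in the usage line are all hypothetical.

```python
import math
import random


def range_containment_prob(width, horizon_days, annual_vol, drift=0.0,
                           n_paths=20000, seed=0):
    """Monte Carlo estimate of P(price stays within +/-width for all days).

    Assumptions (not from the source): daily monitoring only, constant
    volatility, 365 days per year.
    """
    rng = random.Random(seed)
    dt = 1.0 / 365.0
    # Log-return increments of GBM: (mu - sigma^2 / 2) dt + sigma sqrt(dt) Z.
    step_drift = (drift - 0.5 * annual_vol ** 2) * dt
    step_vol = annual_vol * math.sqrt(dt)
    lower, upper = math.log(1.0 - width), math.log(1.0 + width)

    contained_paths = 0
    for _ in range(n_paths):
        log_return = 0.0
        contained = True
        for _ in range(horizon_days):
            log_return += step_drift + step_vol * rng.gauss(0.0, 1.0)
            # No early exit, so every path consumes the same random draws
            # and results stay comparable across widths for a fixed seed.
            if not lower <= log_return <= upper:
                contained = False
        contained_paths += contained
    return contained_paths / n_paths


if __name__ == "__main__":
    # Hypothetical input: a +/-10% range over 3 days at 60% annualized vol.
    print(range_containment_prob(0.10, 3, 0.60))
```

Because the seed fixes the simulated paths, widening the range can only raise the estimate, which makes the function convenient for comparing the ±5%, ±10%, ±15%, and ±20% widths on equal footing.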
Performance Range
Best: Dec 2025 — 92.1% | Worst: Jan 2026 — 86.3%
Live Sample Prediction
"Will Bitcoin stay between $59,633 and $72,885 (±10% of $66,259) over the next 3 days?"
P = 79.8%
Under Review
Momentum Continuation
Accuracy
52.3%
Brier Score
0.269
Predictions
2,034
Status
UNDER REVIEW
Coverage
BTC, ETH, SPX, NASDAQ, GOLD, VIX, OIL, DXY
Methodology
Predicts whether an asset's current momentum direction will persist over H days. Uses 7-day and 30-day momentum signals. Positive momentum predicts continuation; negative predicts reversal persistence. Horizons: 7, 14 days.
Accuracy Trend
Degrading: Oct 2025 (61.4%) → Nov (57.2%) → Dec (54.8%) → Jan (48.6%) → Feb (46.1%) → Mar (43.2%)
Deprecation Status
VIX regime persistence (21 predictions) was deprecated at 47.6% accuracy. Momentum continuation is now below the 55% accuracy threshold and faces deprecation by May 2026 unless accuracy recovers. Replacement under evaluation: rolling-window trend following with GARCH volatility conditioning.
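The monthly figures in the accuracy trend can be checked against the 55% threshold with a few lines of arithmetic. This is a small sketch using only the numbers quoted above; the helper names are illustrative.

```python
# Monthly momentum continuation accuracy (%), copied from the accuracy trend.
MONTHLY_ACCURACY = [
    ("Oct 2025", 61.4),
    ("Nov 2025", 57.2),
    ("Dec 2025", 54.8),
    ("Jan 2026", 48.6),
    ("Feb 2026", 46.1),
    ("Mar 2026", 43.2),
]
DEPRECATION_THRESHOLD = 55.0  # accuracy threshold stated in the text


def total_decline_pp(series):
    """Drop in percentage points from the first month to the last."""
    return round(series[0][1] - series[-1][1], 1)


def months_below(series, threshold):
    """Months whose accuracy is strictly below the threshold."""
    return [month for month, accuracy in series if accuracy < threshold]


if __name__ == "__main__":
    print(total_decline_pp(MONTHLY_ACCURACY))
    print(months_below(MONTHLY_ACCURACY, DEPRECATION_THRESHOLD))
```

On these figures the decline is 18.2pp, and monthly accuracy has been below 55% in every month since Dec 2025.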
Accuracy by Asset / Entity
Sorted by accuracy, descending. All 16 tracked assets. Reference line at 78.5% overall average. Color encodes accuracy tier.
ENTITY ACCURACY — ALL 16 ASSETS — 6,910 RESOLVED PREDICTIONS
>85% (gold tier)
75–85% (standard)
65–75% (watch)
<65% (review)
Analyst Insight
DXY (95.7%), GOLD (90.5%), SPX (89.7%), and COPPER (89.2%) lead on range containment, reflecting low intraday volatility relative to predicted range width. Large-cap equities (GOOGL 85.2%, NASDAQ 84.2%, AAPL 83.4%) cluster tightly in the 83–86% band. BTC (70.8%) and ETH (62.8%) show elevated volatility, which makes range containment harder. VIX (57.2%) is among the weakest: high-volatility regime assets are structurally harder to range-contain, and the momentum model performed poorly here. SILVER (56.0%), the lowest figure cited here, is near the deprecation threshold; review is scheduled for Q2 2026.
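The 78.5% overall average used as the reference line is consistent with pooling the two methods by prediction count. The sketch below reproduces it from the per-method counts and accuracies reported in the method performance cards; the dictionary layout and function name are illustrative.

```python
# (resolved predictions, accuracy) per method, from the method cards.
METHODS = {
    "Range Containment": (4876, 0.894),
    "Momentum Continuation": (2034, 0.523),
}


def pooled_accuracy(methods):
    """Accuracy across all predictions, weighting each method by its count."""
    total = sum(n for n, _ in methods.values())
    correct = sum(n * accuracy for n, accuracy in methods.values())
    return correct / total


if __name__ == "__main__":
    print(f"{pooled_accuracy(METHODS):.3f}")
```

The pooled figure is about 0.785 across 6,910 predictions, so the reference line is dominated by range containment, which contributes roughly 70% of the resolved predictions.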
Verified Causal Relationships
Granger causality tests on 1,825 daily data points per pair. Significance threshold: p < 0.05. The ARMIS prediction engine uses these edges for downstream effect modeling and causal inference chains.
| Cause → Effect | F-Statistic | p-Value | Lag (days) | Confidence | Interpretation |
|---|---|---|---|---|---|
Granger causality tests whether past values of X improve the prediction of Y beyond Y's own history. The F-statistic measures that improvement. A significant result (p < 0.05) means a Granger-causal relationship exists in the statistical sense, not necessarily physical or economic causation. All tests use 1,825 daily observations, a 5-lag VAR specification, and heteroskedasticity-robust standard errors.
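The F-statistic described above compares two regressions of Y: one on Y's own lags (restricted) and one that adds lags of X (unrestricted). The following is a minimal pure-Python sketch of the classical version of that test with 5 lags. It does not implement the heteroskedasticity-robust standard errors mentioned above and does not compute a p-value, and `simulated_pair` is a hypothetical helper that generates toy data for the demonstration.

```python
import random


def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    a = [row + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(c + 1, k):
            factor = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= factor * a[c][j]
    # Back substitution for the coefficients.
    beta = [0.0] * k
    for c in reversed(range(k)):
        tail = sum(a[c][j] * beta[j] for j in range(c + 1, k))
        beta[c] = (a[c][k] - tail) / a[c][c]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))


def granger_f(x, y, lags=5):
    """Classical F-statistic for 'x Granger-causes y' with the given lags."""
    restricted, unrestricted, target = [], [], []
    for t in range(lags, len(y)):
        y_lags = [y[t - k] for k in range(1, lags + 1)]
        x_lags = [x[t - k] for k in range(1, lags + 1)]
        restricted.append([1.0] + y_lags)
        unrestricted.append([1.0] + y_lags + x_lags)
        target.append(y[t])
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


def simulated_pair(n=500, seed=0):
    """Toy data (hypothetical): y depends on yesterday's x plus noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 0.5)]
    y += [0.8 * x[t - 1] + rng.gauss(0.0, 0.5) for t in range(1, n)]
    return x, y


if __name__ == "__main__":
    x, y = simulated_pair()
    print("x -> y:", granger_f(x, y))
    print("y -> x:", granger_f(y, x))
```

In the toy data the x → y statistic should be far larger than the y → x statistic, since only the first direction carries predictive information. A production test would normally rely on an established statistics library rather than hand-rolled least squares.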
Key Correlation Relationships (90-Day Rolling, Decay-Weighted)
Exponentially decay-weighted 90-day rolling correlations compared with full-history correlations. The Divergence column (90d correlation minus full-history correlation) highlights regime shifts. Computed from daily price returns across all tracked assets.
| Pair | 90d Correlation | Full-History | Divergence | Interpretation |
|---|---|---|---|---|
| CPI ↔ M2 | +0.984 | +0.433 | +0.551 | Money supply driving current inflation cycle |
| NASDAQ ↔ SPX | +0.968 | +0.959 | +0.009 | Effectively same asset — treat as one position |
| ETH ↔ SOL | +0.935 | +0.713 | +0.222 | Crypto unity regime — treat as one position (90d) |
| SPX ↔ VIX | -0.850 | -0.754 | -0.096 | Fear index: extremely stable inverse relationship |
| NASDAQ ↔ VIX | -0.833 | -0.714 | -0.119 | Same fear-equity mechanism as SPX/VIX |
| BTC ↔ ETH | +0.876 | +0.784 | +0.092 | Crypto unity — correlated risk-on/risk-off |
| Credit Spread ↔ VIX | +0.723 | +0.612 | +0.111 | Risk-off unified signal: credit + vol moving together |
| DXY ↔ Yield 10Y | +0.404 | +0.301 | +0.103 | Rate differential driving dollar strength |
| Oil ↔ Yield 10Y | +0.385 | +0.183 | +0.202 | Inflation expectations embedded in both |
| BTC ↔ SPX | +0.381 | +0.289 | +0.092 | Risk asset correlation — moderate, not dominant |
| Fed Rate ↔ M2 | -0.452 | -0.078 | -0.374 | QT regime dominant — quantitative tightening suppressing M2 |
| Oil ↔ VIX | +0.240 | -0.106 | +0.346 | Geopolitical risk premium — regime flip from historical inverse |
| DXY ↔ Gold | -0.246 | -0.381 | +0.135 | Dollar-gold inverse weakening in current regime |
| M2 ↔ SPX | +0.267 | +0.512 | -0.245 | M2-equity link weakening as QT compresses money supply |
| SPX ↔ Gold | +0.142 | +0.088 | +0.054 | Weak positive — both supported by risk appetite currently |
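One way to compute an exponentially decay-weighted correlation and the Divergence column is sketched below. The 30-day half-life is an assumption made for illustration (the decay rate is not stated here), and both function names are hypothetical.

```python
import math


def decay_weighted_corr(x, y, half_life=30.0):
    """Pearson correlation with exponentially decaying weights.

    The last element is treated as the most recent observation and gets
    weight 1; an observation half_life steps older gets weight 0.5.
    """
    n = len(x)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y)
              for w, a, b in zip(weights, x, y)) / total
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x)) / total
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y)) / total
    return cov / math.sqrt(var_x * var_y)


def divergence(corr_90d, corr_full):
    """Divergence as used in the table: 90d minus full-history correlation."""
    return round(corr_90d - corr_full, 3)


if __name__ == "__main__":
    # CPI/M2 and Oil/VIX rows from the table.
    print(divergence(0.984, 0.433))
    print(divergence(0.240, -0.106))
```

Applied to the table values, `divergence` returns 0.551 for CPI ↔ M2 and 0.346 for Oil ↔ VIX, matching the Divergence column. In practice `decay_weighted_corr` would be called on the most recent 90 daily returns of each pair.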
Regime Divergence Analysis
When the 90d correlation diverges sharply from the full-history correlation, it signals a regime shift. CPI/M2 (90d: +0.984 vs history: +0.433) indicates the current regime is unusually money-supply driven, consistent with post-COVID monetary expansion and QT unwinding. The strengthened negative Fed Rate/M2 correlation (90d: -0.452 vs history: -0.078) is consistent with quantitative tightening being the dominant macro force. The Oil/VIX sign flip (90d: +0.240 vs history: -0.106) reflects a geopolitical risk premium from the Russia-Ukraine conflict elevating both simultaneously.
Live Forward Predictions
132 active predictions generated by the forward engine. All unresolved. Updated daily at 20:25 UTC. Deadline colors: gold = urgent (≤3 days), dim = soon (≤7), faint = far (>7).
| Asset | Question | Probability | Method | Deadline | Days Left |
|---|---|---|---|---|---|
P > 75%
50–75%
P < 50%