ARMIS — Prediction Analytics

Prediction Performance: 6-Month Track Record

Monthly accuracy by method across all 6,910 resolved factory predictions. Oct 2025 – Mar 2026.

Figure: Monthly accuracy by method. Range Containment = solid gold line; Momentum Continuation = solid dim line; Overall Accuracy = dashed line.

Analyst Insight

Range containment accuracy has remained above 86% for 5 consecutive months (Oct–Feb), peaking at 92.1% in December. Momentum continuation has degraded from 61.4% (Oct) to 43.2% (Mar), an 18.2pp decline over the 6-month window. Alpha decay is the primary hypothesis: the 7-day and 30-day momentum signals were likely overfit to the Oct 2025 regime. The method is under active review for replacement or isotonic recalibration. The March 2026 overall figure (68.2%) reflects only n = 22 resolved predictions; treat it as preliminary.
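Why n = 22 counts as preliminary can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval at z = 1.96 (roughly 95%); both choices are assumptions for illustration, not the dashboard's stated method, and the success counts are back-computed from the published rates (15 of 22 ≈ 68.2%; about 1,064 of 2,034 ≈ 52.3% for momentum continuation).

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# March 2026 overall: 15 of 22 correct (~68.2%), back-computed from the published rate.
lo, hi = wilson_interval(15, 22)
print(f"Mar 2026, n=22:   {lo:.3f} to {hi:.3f}")

# Momentum continuation, 6-month aggregate: ~1,064 of 2,034 (~52.3%).
lo2, hi2 = wilson_interval(1064, 2034)
print(f"Momentum, n=2034: {lo2:.3f} to {hi2:.3f}")
```

The March interval spans roughly 47% to 84%, so the 68.2% point estimate says little on its own, while the 2,034-prediction momentum aggregate is pinned to within about ±2pp.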

Probability Calibration Assessment

A well-calibrated model's stated probability should match its observed frequency: of all predictions stated at 70%, about 70% should resolve true. Deviations indicate systematic bias. Transparency here is a feature, not a flaw; it is the basis for the Platt scaling correction currently in training.
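A minimal sketch of how a reliability table like the one below can be built from raw predictions. It assumes each stated probability is snapped to a 0.1-wide bucket (the snapping via `round(p, 1)` is a hypothetical choice) and each outcome is recorded as 1 (resolved true) or 0.

```python
from collections import defaultdict

def reliability_table(stated, outcomes):
    """Group predictions by stated probability and compare with observed frequency.

    stated:   stated probabilities (snapped to the nearest 0.1 bucket here)
    outcomes: 1 if the prediction resolved true, else 0
    Returns {bucket: (n, observed_frequency, absolute_gap)}.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for p, y in zip(stated, outcomes):
        bucket = round(p, 1)
        counts[bucket] += 1
        hits[bucket] += y
    table = {}
    for bucket in sorted(counts):
        n = counts[bucket]
        observed = hits[bucket] / n
        table[bucket] = (n, observed, abs(observed - bucket))
    return table

# Toy data: 10 predictions stated at 0.3, of which 6 resolved true.
print(reliability_table([0.3] * 10, [1] * 6 + [0] * 4))
```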

Figure: Calibration curve, stated probability vs observed frequency, 6,910 resolved predictions. Dot size is proportional to N. Gold = error <5pp (good); white = error 5–15pp (moderate); red = error >15pp (critical). Dashed diagonal = perfect calibration; gray band = ±5pp acceptable zone.

Probability Bucket	Stated	Observed	N Predictions	Abs. Error	Status
30%	0.300	0.585	644	0.285	CRITICAL
40%	0.400	0.504	276	0.104	MODERATE
60%	0.600	0.470	266	0.130	MODERATE
70%	0.700	0.726	2,067	0.026	GOOD
80%	0.800	0.857	1,582	0.057	MODERATE
90%	0.900	0.928	2,075	0.028	GOOD
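The summary figures quoted in the correction note can be checked from this table. The sketch below recomputes each bucket's gap, assigns the legend's tiers (treating exactly 15pp as moderate, an assumption), and reports both the unweighted mean gap and the N-weighted mean, often called expected calibration error (ECE).

```python
# (stated, observed, n) rows copied from the calibration table above.
BUCKETS = [
    (0.3, 0.585, 644),
    (0.4, 0.504, 276),
    (0.6, 0.470, 266),
    (0.7, 0.726, 2067),
    (0.8, 0.857, 1582),
    (0.9, 0.928, 2075),
]

def status(gap_pp: float) -> str:
    """Tier per the chart legend: <5pp good, 5-15pp moderate, >15pp critical."""
    if gap_pp < 5:
        return "GOOD"
    if gap_pp <= 15:
        return "MODERATE"
    return "CRITICAL"

gaps = [abs(obs - stated) for stated, obs, _ in BUCKETS]
total_n = sum(n for _, _, n in BUCKETS)

unweighted_mean = sum(gaps) / len(gaps)  # each bucket counts once
ece = sum(g * n for g, (_, _, n) in zip(gaps, BUCKETS)) / total_n  # weighted by N

for (stated, obs, n), g in zip(BUCKETS, gaps):
    print(f"{stated:.1f}  obs={obs:.3f}  n={n:5d}  gap={g * 100:4.1f}pp  {status(g * 100)}")
print(f"total n={total_n}, unweighted mean gap={unweighted_mean:.3f}, ECE={ece:.3f}")
```

The unweighted mean comes to 0.105, matching the 10.5% in the correction note; weighting by N gives about 0.065, because the well-calibrated 0.7 and 0.9 buckets hold most of the predictions. Under the legend's thresholds, the 0.8 bucket's 5.7pp gap falls in the moderate band.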

Calibration Correction In Progress

The 0.3 probability bucket shows systematic underconfidence: actual observed rate 58.5% vs stated 30.0% — a 28.5pp gap across 644 predictions.

Root cause: GBM base rates are not adjusted for selection bias in low-probability predictions. The momentum model assigns a 30% probability to range predictions where the GBM drift is negative, yet range containment stated at 30% actually resolves correctly 58.5% of the time. The model is systematically over-penalizing these predictions.

Fix: a Platt scaling calibration layer is being trained on all 6,910 resolved factory predictions. Isotonic regression is applied separately to the momentum model output to handle the non-monotonic calibration in the 0.4–0.6 range, where the observed rate falls from 50.4% at the 0.4 bucket to 47.0% at the 0.6 bucket.
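Platt scaling fits a two-parameter sigmoid, calibrated = sigmoid(a · logit(stated) + b), to resolved outcomes. The sketch below is an illustration, not the production layer: it assumes every prediction in a bucket carries exactly the bucket's stated probability, so the six rows of the calibration table (observed frequency as a fractional target, weighted by N) give the same log-likelihood as the 6,910 individual 0/1 outcomes. The function names are hypothetical.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

def fit_platt(stated, observed, counts, iters=50):
    """Fit calibrated = sigmoid(a * logit(stated) + b) by weighted logistic regression.

    Each bucket contributes its observed frequency as a fractional target, weighted
    by its count. Solved with Newton's method on the two parameters (a, b).
    """
    a, b = 1.0, 0.0  # start from the identity map: "stated is already calibrated"
    xs = [logit(p) for p in stated]
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y, w in zip(xs, observed, counts):
            q = sigmoid(a * x + b)
            r = w * (q - y)          # derivative of the weighted log-loss w.r.t. the logit
            ga += r * x
            gb += r
            s = w * q * (1 - q)      # second derivative (Hessian weight)
            haa += s * x * x
            hab += s * x
            hbb += s
        det = haa * hbb - hab * hab
        a -= (hbb * ga - hab * gb) / det   # Newton step: [a, b] -= H^-1 @ gradient
        b -= (haa * gb - hab * ga) / det
    return a, b

# Rows of the calibration table: stated bucket, observed frequency, N.
stated = [0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [0.585, 0.504, 0.470, 0.726, 0.857, 0.928]
counts = [644, 276, 266, 2067, 1582, 2075]
a, b = fit_platt(stated, observed, counts)
for p in stated:
    print(f"stated {p:.1f} -> calibrated {sigmoid(a * logit(p) + b):.3f}")
```

At convergence the N-weighted mean of the calibrated probabilities equals the N-weighted observed rate, a standard property of logistic regression with an intercept. Because the fitted map is a single sigmoid, it is monotone in the stated probability and cannot reproduce the dip between the 0.3 and 0.6 buckets; that is the gap the separate isotonic step is meant to address.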

Expected improvement: mean absolute calibration error (unweighted across the six buckets) from 10.5pp to under 3pp. The 60% bucket (currently overconfident, observed 47.0%) will also be corrected. This section updates automatically when the calibration layer is deployed.
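Isotonic regression fits the best non-decreasing step function to the observed rates, usually via the pool-adjacent-violators (PAV) algorithm. It does not reorder buckets; it merges adjacent buckets that break monotonicity into their N-weighted average. The sketch below applies PAV to the combined calibration table for illustration only; the text applies isotonic regression to the momentum model's output, whose per-method table is not published here.

```python
def pav(values, weights):
    """Pool-adjacent-violators: weighted isotonic (non-decreasing) fit to `values`."""
    blocks = []  # each block: [weighted mean, total weight, number of points pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the newest block breaks monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted

observed = [0.585, 0.504, 0.470, 0.726, 0.857, 0.928]  # buckets 0.3, 0.4, 0.6, 0.7, 0.8, 0.9
counts = [644, 276, 266, 2067, 1582, 2075]
print([round(v, 3) for v in pav(observed, counts)])  # → [0.54, 0.54, 0.54, 0.726, 0.857, 0.928]
```

The 0.3, 0.4 and 0.6 buckets pool to a single value of about 54.0%, so the overconfident 60% bucket and the underconfident 30% bucket both map to roughly 0.54.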

Prediction Method Performance

Two methods are currently in the factory. Range containment is production-grade at 89.4% accuracy. Momentum continuation is under accuracy review after a 6-month decline from 61.4% (Oct 2025) to 43.2% (Mar 2026); its 6-month aggregate accuracy is 52.3%.
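Both method cards below report a Brier score. Assuming the dashboard uses the standard binary form, the Brier score is the mean squared difference between the stated probability and the 0/1 outcome, so lower is better. A useful reference point: a constant 50% forecast scores exactly 0.25 on any set of outcomes, which puts momentum continuation's 0.269 slightly worse than that baseline and range containment's 0.101 well below it.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between stated probability and the 0/1 outcome (lower is better)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# A constant 50% forecast scores exactly 0.25 regardless of outcomes.
print(brier_score([0.5] * 4, [1, 0, 0, 1]))  # → 0.25
print(brier_score([0.9, 0.8, 0.3], [1, 1, 0]))
```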

Range Containment (Production Grade)

Accuracy: 89.4% | Brier Score: 0.101 | Predictions: 4,876 | Status: PRODUCTION

Coverage: BTC, ETH, SPX, NASDAQ, GOLD, OIL, DXY, VIX, AAPL, MSFT, GOOGL, META, AMZN, NVDA, COPPER, SILVER

Methodology

Predicts whether an asset will stay within a ±N% range over H days, using a Geometric Brownian Motion (GBM) model parameterized from historical volatility regimes. Range widths: ±5%, ±10%, ±15%, ±20%. Horizons: 3, 7, 14, 21 days. Outputs are ensembled across 4 volatility regimes.
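A minimal Monte Carlo sketch of the range question under GBM. It assumes constant drift and volatility, daily steps, and containment checked at each daily close (the text does not say whether intraday extremes count); `sigma_daily` and `mu_daily` are hypothetical inputs, and the sketch uses a single regime rather than the 4-regime ensemble.

```python
import math
import random

def range_containment_prob(sigma_daily, width, horizon_days, mu_daily=0.0,
                           n_paths=20_000, seed=0):
    """Estimate P(price stays within ±width of today's price at every daily close).

    Log-price follows GBM: each day adds (mu - sigma^2 / 2) + sigma * N(0, 1).
    """
    rng = random.Random(seed)
    lo, hi = math.log(1 - width), math.log(1 + width)  # bounds in log-return space
    drift = mu_daily - 0.5 * sigma_daily ** 2
    stayed_count = 0
    for _ in range(n_paths):
        log_ret = 0.0
        stayed = True
        for _ in range(horizon_days):
            log_ret += drift + sigma_daily * rng.gauss(0.0, 1.0)
            if log_ret < lo or log_ret > hi:
                stayed = False  # keep drawing so every path uses the same number of draws
        stayed_count += stayed
    return stayed_count / n_paths

# Hypothetical inputs: 3% daily volatility, ±10% range, 3-day horizon.
print(range_containment_prob(sigma_daily=0.03, width=0.10, horizon_days=3))
```

Drawing every step even after a breach keeps the random stream aligned across calls with the same seed, so estimates for different range widths are directly comparable.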

Performance Range

Best: Dec 2025 — 92.1% | Worst: Jan 2026 — 86.3%

Live Sample Prediction

"Will Bitcoin stay between $59,633 and $72,885 (±10% of $66,259) over the next 3 days?"

P = 79.8%
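The price bounds in the sample question are the spot price ±10%. A one-line check, assuming rounding to the nearest whole dollar (which matches the displayed values); reproducing P = 79.8% itself would require the volatility inputs, which the card does not state.

```python
def range_bounds(spot: float, width: float) -> tuple[int, int]:
    """Lower/upper price bounds for a ±width range, rounded to whole dollars."""
    return round(spot * (1 - width)), round(spot * (1 + width))

print(range_bounds(66_259, 0.10))  # → (59633, 72885)
```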

Momentum Continuation (Under Review)

Accuracy: 52.3% | Brier Score: 0.269 | Predictions: 2,034 | Status: UNDER REVIEW

Coverage: BTC, ETH, SPX, NASDAQ, GOLD, VIX, OIL, DXY

Methodology

Predicts whether an asset's current momentum direction will persist over H days, using 7-day and 30-day momentum signals. Positive momentum predicts continued gains; negative momentum predicts that the decline persists. Horizons: 7, 14 days.
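The text does not say how the 7-day and 30-day signals are combined. One hypothetical rule, shown only as a sketch, makes a call when both windows agree in sign and scores a call as correct when the H-day return has the same sign.

```python
def momentum_call(prices, short=7, long=30):
    """Hypothetical rule: +1 (continuation up) or -1 (continuation down) when the
    7-day and 30-day returns agree in sign, else 0 (no prediction)."""
    if len(prices) <= long:
        raise ValueError("need more than `long` daily prices")
    r_short = prices[-1] / prices[-1 - short] - 1
    r_long = prices[-1] / prices[-1 - long] - 1
    if r_short > 0 and r_long > 0:
        return 1
    if r_short < 0 and r_long < 0:
        return -1
    return 0

def resolved_correct(call, price_at_call, price_at_deadline):
    """A continuation call is correct when the H-day return has the call's sign."""
    return call * (price_at_deadline / price_at_call - 1) > 0

rising = [100 + i for i in range(40)]
call = momentum_call(rising)
print(call, resolved_correct(call, rising[-1], rising[-1] * 1.02))  # → 1 True
```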

Accuracy Trend

Degrading: Oct 2025 (61.4%) → Nov (57.2%) → Dec (54.8%) → Jan (48.6%) → Feb (46.1%) → Mar (43.2%)
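The decline can be summarized as a least-squares slope over the six monthly readings. This sketch weights every month equally because monthly sample sizes are not published, so treat the slope as a rough gauge; it also lists the months below the 55% threshold cited under Deprecation Status.

```python
def ols_slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ... (units of ys per step)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

# Monthly momentum-continuation accuracy, Oct 2025 to Mar 2026.
months = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar"]
monthly = [61.4, 57.2, 54.8, 48.6, 46.1, 43.2]
slope = ols_slope(monthly)
print(f"{slope:.2f} pp per month")  # → -3.73 pp per month
below = [m for m, acc in zip(months, monthly) if acc < 55.0]
print(below)  # → ['Dec', 'Jan', 'Feb', 'Mar']
```

The slope comes to about -3.7pp per month, and the monthly figures have been below 55% since December.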

Deprecation Status

VIX regime persistence (21 predictions) was deprecated at 47.6% accuracy. Momentum continuation is now below the 55% accuracy threshold and faces deprecation by May 2026 unless accuracy recovers. Replacement under evaluation: rolling-window trend following with GARCH volatility conditioning.

Accuracy by Asset / Entity

Sorted by accuracy, descending. All 16 tracked assets. Reference line at 78.5% overall average. Color encodes accuracy tier.

Figure: Entity accuracy, all 16 assets, 6,910 resolved predictions. Color tiers: >85% gold tier; 75–85% standard; 65–75% watch; <65% review.

Analyst Insight

DXY (95.7%), GOLD (90.5%), SPX (89.7%), and COPPER (89.2%) lead on range containment, reflecting low intraday volatility relative to the predicted range width. Large-cap equities (GOOGL 85.2%, NASDAQ 84.2%, AAPL 83.4%) cluster tightly in the 83–86% band. BTC (70.8%) and ETH (62.8%) show elevated volatility, which makes range containment harder. VIX (57.2%) is among the weakest: high-volatility regime assets are structurally harder to range-contain, and the momentum model performed poorly here. SILVER (56.0%), lower still, sits near the deprecation threshold; a review is scheduled for Q2 2026.
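The color tiers in the entity chart reduce to a simple threshold map. The sketch applies it to the accuracies quoted in the insight above; how exact boundary values (85%, 75%, 65%) are assigned is an assumption, and the five assets not quoted in the text are omitted.

```python
def accuracy_tier(acc_pct: float) -> str:
    """Map accuracy (%) to the chart's color tiers; boundary handling is assumed."""
    if acc_pct > 85:
        return "gold tier"
    if acc_pct >= 75:
        return "standard"
    if acc_pct >= 65:
        return "watch"
    return "review"

# Accuracies quoted in the insight text (OIL, MSFT, META, AMZN, NVDA not listed there).
quoted = {"DXY": 95.7, "GOLD": 90.5, "SPX": 89.7, "COPPER": 89.2, "GOOGL": 85.2,
          "NASDAQ": 84.2, "AAPL": 83.4, "BTC": 70.8, "ETH": 62.8, "VIX": 57.2,
          "SILVER": 56.0}
for asset, acc in sorted(quoted.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{asset:7s} {acc:5.1f}%  {accuracy_tier(acc)}")
```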

Verified Causal Relationships

Granger causality tests on 1,825 daily data points per pair. Significance threshold: p < 0.05. The ARMIS prediction engine uses these edges for downstream effect modeling and causal inference chains.

Cause → Effect	F-Statistic	p-Value	Lag (days)	Confidence	Interpretation

Granger causality tests whether past values of X improve the prediction of Y beyond Y's own history; the F-statistic measures that improvement. A significant result (p < 0.05) means X Granger-causes Y, which is not necessarily physical or economic causation. All tests use 1,825 daily observations, a 5-lag VAR specification, and heteroskedasticity-robust standard errors.
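A minimal sketch of the F-test described above, using ordinary least squares on the restricted model (Y's own lags) and the unrestricted model (Y's lags plus X's lags). It is the classic homoskedastic F-test, not the heteroskedasticity-robust version the dashboard uses, and the helper names and synthetic data are hypothetical. A p-value would come from the F distribution with (p, n - 2p - 1) degrees of freedom, for example via `scipy.stats.f.sf`.

```python
import random

def _solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def _rss(rows, target):
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((t - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
               for r, t in zip(rows, target))

def granger_f(cause, effect, p=5):
    """F statistic for H0: lags 1..p of `cause` add nothing to predicting `effect`
    beyond `effect`'s own lags 1..p (classic homoskedastic F-test)."""
    ts = range(p, len(effect))
    target = [effect[t] for t in ts]
    restricted = [[1.0] + [effect[t - i] for i in range(1, p + 1)] for t in ts]
    unrestricted = [row + [cause[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, ts)]
    rss_r = _rss(restricted, target)
    rss_u = _rss(unrestricted, target)
    df_denom = len(target) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / df_denom)

# Synthetic check: y depends on yesterday's x, but x does not depend on y.
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(600)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 600)]
print(f"x -> y: F = {granger_f(x, y):.1f}")
print(f"y -> x: F = {granger_f(y, x):.1f}")
```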

Key Correlation Relationships (90-Day Rolling, Decay-Weighted)

Exponentially decay-weighted 90-day rolling correlations compared with full-history correlations, computed from daily price returns across all tracked assets. The Divergence column (90d minus full-history) highlights regime shifts.
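A minimal sketch of an exponentially decay-weighted Pearson correlation over a 90-observation window. The 30-day half-life is a hypothetical parameter (the text does not state the decay rate), and this weighting scheme is one common choice rather than the dashboard's confirmed method. The Divergence column in the table is the 90d value minus the full-history value, which holds for every row.

```python
import math

def simple_returns(prices):
    """Daily simple returns from a price series."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]

def ew_corr(a, b, window=90, half_life=30.0):
    """Exponentially decay-weighted Pearson correlation over the last `window` points.

    The newest observation gets weight 1; weights halve every `half_life` steps.
    """
    a, b = a[-window:], b[-window:]
    n = len(a)
    lam = 0.5 ** (1.0 / half_life)
    w = [lam ** (n - 1 - i) for i in range(n)]
    sw = sum(w)
    ma = sum(wi * ai for wi, ai in zip(w, a)) / sw
    mb = sum(wi * bi for wi, bi in zip(w, b)) / sw
    cov = sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b))
    va = sum(wi * (ai - ma) ** 2 for wi, ai in zip(w, a))
    vb = sum(wi * (bi - mb) ** 2 for wi, bi in zip(w, b))
    return cov / math.sqrt(va * vb)

def divergence(corr_90d, corr_full):
    """Divergence column: 90d correlation minus full-history correlation."""
    return corr_90d - corr_full

print(round(divergence(0.984, 0.433), 3))  # CPI <-> M2 row → 0.551
```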

Pair	90d Correlation	Full-History	Divergence	Interpretation
CPI ↔ M2	+0.984	+0.433	+0.551	Money supply driving current inflation cycle
NASDAQ ↔ SPX	+0.968	+0.959	+0.009	Effectively same asset — treat as one position
ETH ↔ SOL	+0.935	+0.713	+0.222	Crypto unity regime — treat as one position (90d)
SPX ↔ VIX	-0.850	-0.754	-0.096	Fear index: extremely stable inverse relationship
NASDAQ ↔ VIX	-0.833	-0.714	-0.119	Same fear-equity mechanism as SPX/VIX
BTC ↔ ETH	+0.876	+0.784	+0.092	Crypto unity — correlated risk-on/risk-off
Credit Spread ↔ VIX	+0.723	+0.612	+0.111	Risk-off unified signal: credit + vol moving together
DXY ↔ Yield 10Y	+0.404	+0.301	+0.103	Rate differential driving dollar strength
Oil ↔ Yield 10Y	+0.385	+0.183	+0.202	Inflation expectations embedded in both
BTC ↔ SPX	+0.381	+0.289	+0.092	Risk asset correlation — moderate, not dominant
Fed Rate ↔ M2	-0.452	-0.078	-0.374	QT regime dominant — quantitative tightening suppressing M2
Oil ↔ VIX	+0.240	-0.106	+0.346	Geopolitical risk premium — regime flip from historical inverse
DXY ↔ Gold	-0.246	-0.381	+0.135	Dollar-gold inverse weakening in current regime
M2 ↔ SPX	+0.267	+0.512	-0.245	M2-equity link weakening as QT compresses money supply
SPX ↔ Gold	+0.142	+0.088	+0.054	Weak positive — both supported by risk appetite currently

Regime Divergence Analysis

When the 90d correlation diverges sharply from the full-history correlation, it signals a regime shift. CPI/M2 (90d: +0.984 vs history: +0.433) indicates the current regime is unusually money-supply driven, consistent with post-COVID monetary expansion and QT unwinding. The much stronger inverse Fed Rate/M2 relationship (90d: -0.452 vs history: -0.078) confirms quantitative tightening is the dominant macro force. The Oil/VIX sign flip (+0.240 vs historical -0.106) reflects a geopolitical risk premium from the Russia-Ukraine conflict elevating both simultaneously.
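The regime-shift reading can be expressed as two checks on each row: the size of the divergence and whether the sign flipped. The 0.2 divergence threshold below is a hypothetical choice, and the five pairs are a subset of the correlation table.

```python
# (pair, 90d correlation, full-history correlation), a subset of the correlation table.
PAIRS = [
    ("CPI/M2", 0.984, 0.433),
    ("Fed Rate/M2", -0.452, -0.078),
    ("Oil/VIX", 0.240, -0.106),
    ("NASDAQ/SPX", 0.968, 0.959),
    ("DXY/Gold", -0.246, -0.381),
]

def regime_flags(pairs, min_divergence=0.2):
    """Flag pairs whose 90d correlation departs from full history.

    'shift'     : |90d - full| >= min_divergence (threshold is a hypothetical choice)
    'sign flip' : the two correlations have opposite signs
    """
    flags = {}
    for name, c90, cfull in pairs:
        labels = []
        if abs(c90 - cfull) >= min_divergence:
            labels.append("shift")
        if c90 * cfull < 0:
            labels.append("sign flip")
        flags[name] = labels
    return flags

for name, labels in regime_flags(PAIRS).items():
    print(f"{name:12s} {', '.join(labels) or '-'}")
```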

Live Forward Predictions

132 active predictions generated by the forward engine. All unresolved. Updated daily at 20:25 UTC. Deadline colors: gold = urgent (≤3 days), dim = soon (≤7), faint = far (>7).

Asset	Question	Probability	Method	Deadline	Days Left

Probability bands: P > 75%, 50–75%, P < 50%.
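The deadline colors and probability bands described for this table map to small threshold functions. A sketch with hypothetical dates; treating exactly 50% and 75% as part of the 50–75% band is an assumption, since the legend does not specify boundaries.

```python
from datetime import date

def days_left(deadline: date, today: date) -> int:
    """Whole days remaining until the deadline."""
    return (deadline - today).days

def deadline_color(days: int) -> str:
    """Deadline colors as described: gold <= 3 days, dim <= 7, faint > 7."""
    if days <= 3:
        return "gold"
    if days <= 7:
        return "dim"
    return "faint"

def probability_band(p: float) -> str:
    """Probability legend bands; boundary handling at 0.50 and 0.75 is assumed."""
    if p > 0.75:
        return "P > 75%"
    if p >= 0.50:
        return "50-75%"
    return "P < 50%"

d = days_left(date(2026, 3, 20), date(2026, 3, 18))  # hypothetical dates
print(d, deadline_color(d), probability_band(0.798))  # → 2 gold P > 75%
```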