Machine Learning Pipeline
Three base learners, stacked meta-learning, isotonic calibration, and online monitoring. Production-grade ML for prediction market probability estimation.
Three Learners, One Calibrated Signal
LightGBM, XGBoost, and Logistic Regression each see the same features but learn different decision boundaries. A Ridge stacking meta-learner combines their calibrated outputs. Isotonic regression enforces monotonic probability mapping on every base model.
lgb_params = {
    "objective": "binary",
    "n_estimators": 500,
    "learning_rate": 0.03,
    "max_depth": 4,
    "num_leaves": 15,
    "min_child_samples": 30,
    "reg_alpha": 0.5,   # L1 regularization
    "reg_lambda": 1.0,  # L2 regularization
}

# XGBoost -- mirrors the LightGBM regularization
xgb_params = {
    "objective": "binary:logistic",
    "n_estimators": 500,
    "learning_rate": 0.03,
    "max_depth": 4,
    "min_child_weight": 5,
    "gamma": 0.1,
}

# Isotonic calibration on a held-out calibration set
cal = IsotonicRegression(
    y_min=0.001, y_max=0.999,
    out_of_bounds="clip",
)
36 Engineered Features Across 7 Families
Each market snapshot is transformed into a rich feature vector spanning microstructure, momentum, volume, cross-market, external signals, time encoding, and market quality. Tree models handle NaN natively; the logistic-regression path imputes missing values with medians learned during training.
Features are computed by the FeatureEngineering class with rolling
windows at 5-minute, 15-minute, and 1-hour horizons. Cyclical time features
use sin/cos encoding. Stability selection and mRMR pruning remove noisy inputs.
- NaN-safe: tree models learn optimal split directions for missing data
- Winsorized preprocessing prevents outlier distortion
- Feature hash guard detects schema misalignment on model load
- Bayesian hyperopt via Optuna for automated tuning
Momentum (6)
- return_5m
- return_15m
- return_1h
- volatility_1h
- price_vs_twap
- price_acceleration
Volume (4)
- volume_24h_log
- volume_zscore
- volume_rate_of_change
- open_interest_log
Order Flow (7)
- spread_cents
- spread_pct
- spread_velocity
- bid_depth_log
- ask_depth_log
- imbalance
- microprice_vs_mid
Cross-Market (3)
- polymarket_spread
- polymarket_spread_zscore
- polymarket_momentum
Time (7)
- hour_of_day_sin / cos
- day_of_week_sin / cos
- is_weekend
- market_age_days
- time_to_resolution_days
Sentiment (4)
- news_sentiment
- expert_forecast
- expert_confidence
- signal_agreement
Microstructure (5)
- price_bucket
- distance_from_50
- is_extreme_price
- efficiency_score
- liquidity_score
# [train | embargo | cal | embargo | stack]

class PurgedKFold:
    """Purged K-Fold CV (de Prado, 2018, Ch. 7)."""
    n_splits = 5
    embargo_pct = 0.01  # 1% embargo at every fold boundary

class WalkForwardCV:
    """Expanding-window forward test."""
    n_splits = 5
    min_train_pct = 0.3
    embargo_pct = 0.01

# Ticker-group purging: if a ticker appears
# in a test fold, ALL of its samples are
# removed from the training fold.
# The preprocessor is fitted INSIDE each fold
# (prevents winsor-bounds leakage).
No Lookahead, No Leakage
Standard k-fold cross-validation gives inflated metrics on financial data due to autocorrelation and same-ticker label leakage. The pipeline uses Purged K-Fold with 1% embargo periods at every boundary, following de Prado (2018).
Walk-Forward CV simulates actual deployment: train on the past, predict the future, advance the window, repeat. This is the gold standard for evaluating prediction market models -- the only evaluation that matches production conditions.
- Embargo periods prevent autocorrelation leakage
- Ticker-group purging removes all samples of test tickers from train
- Preprocessor fitted inside each fold -- no test data leakage
- Promotion gating: model must beat baseline Brier on held-out 15%
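The purging-plus-embargo mechanics can be sketched in a simplified, time-ordered form. This is a hedged illustration, not the production splitter: real purged k-fold (de Prado, 2018) also purges by label-overlap intervals and, as noted above, by ticker group.

```python
# Simplified sketch of Purged K-Fold with embargo, assuming samples are
# strictly time-ordered. Train samples inside the test fold, plus an
# embargo buffer on both sides, are dropped to block autocorrelation leakage.
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, embargo_pct=0.01):
    embargo = int(n_samples * embargo_pct)
    bounds = np.linspace(0, n_samples, n_splits + 1, dtype=int)
    for start, end in zip(bounds[:-1], bounds[1:]):
        test_idx = np.arange(start, end)
        keep = np.ones(n_samples, dtype=bool)
        # remove the test block and the embargo zone around it from training
        keep[max(0, start - embargo):min(n_samples, end + embargo)] = False
        yield np.where(keep)[0], test_idx

# every fold's train and test sets are disjoint by construction
for train_idx, test_idx in purged_kfold_indices(1000):
    assert not set(train_idx) & set(test_idx)
```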
Continuous Brier Monitoring
Every resolved market outcome is fed back to the OnlineEnsemble.
A rolling window of the last 50 predictions tracks Brier score in real time.
When the score degrades past 0.30, the system auto-retrains on the full Parquet
history -- not just the in-memory buffer.
The feature hash guard prevents silent misalignment: a SHA-256 hash of the feature schema is saved alongside every model. On load, the hash is compared against the live codebase. If features have been added or removed since training, a warning fires and retrain is recommended.
- Rolling Brier on last 50 observations triggers retrain
- PSI per-feature drift detection (critical at 0.25+)
- Concept drift: base-rate shift + calibration degradation
- Hyperopt-derived config preserved across retrains
def retrain_if_needed(self,
                      min_samples=200,
                      brier_threshold=0.30):
    recent = mean(brier[-50:])
    if recent > brier_threshold:
        return True  # trigger retrain

# Feature hash guard on model load
saved_hash = meta["feature_hash"]
live_hash = sha256(FeatureEngineering.FEATURE_NAMES)
if live_hash != saved_hash:
    warn("feature_hash_mismatch")

# PSI drift thresholds
#   < 0.10     → ok
#   0.10-0.25  → investigate
#   >= 0.25    → retrain
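The per-feature PSI check referenced above can be sketched like this; the binning scheme (training-set deciles) and sample data are assumptions for illustration:

```python
# Hedged sketch of PSI (Population Stability Index) drift detection for one
# feature. Bins are the deciles of the training distribution; live values are
# clipped into the training range so every observation lands in a bin.
import numpy as np

def psi(expected, actual, n_bins=10):
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    expected = np.clip(expected, edges[0], edges[-1])
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) in empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_feat = rng.normal(0, 1, 5000)
psi_same = psi(train_feat, rng.normal(0, 1, 5000))       # below 0.10 → ok
psi_shifted = psi(train_feat, rng.normal(1.5, 1, 5000))  # above 0.25 → retrain
```

An unshifted sample scores near zero, while a 1.5-sigma mean shift lands well past the 0.25 retrain threshold.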
Four Market Regimes, Automatic Adjustment
A 5-component efficiency score (spread, volume, age, cross-market, news) classifies each market into a regime. Kelly fraction, minimum edge threshold, and execution urgency are all adjusted per-regime. The classifier uses weighted thresholds at 0.25, 0.50, and 0.75 boundaries.
- Highly Efficient
- Efficient
- Inefficient
- Highly Inefficient
spread_weight = 0.30
volume_weight = 0.25
age_weight = 0.15
cross_market_weight = 0.15
news_weight = 0.15
# Regime boundaries
highly_efficient = 0.75
efficient = 0.50
inefficient = 0.25
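Putting the weights and boundaries above together, the regime classifier can be sketched as a weighted sum plus threshold checks. The component scores (each assumed to lie in [0, 1]) and the function name are illustrative assumptions; the weights and boundaries are the config shown above:

```python
# Illustrative sketch: weighted 5-component efficiency score → regime label.
# Component scores in [0, 1] are hypothetical inputs for the sketch.
WEIGHTS = {"spread": 0.30, "volume": 0.25, "age": 0.15,
           "cross_market": 0.15, "news": 0.15}

def classify_regime(components):
    score = sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
    if score >= 0.75:
        return "highly_efficient"
    if score >= 0.50:
        return "efficient"
    if score >= 0.25:
        return "inefficient"
    return "highly_inefficient"

regime = classify_regime({"spread": 0.9, "volume": 0.8, "age": 0.7,
                          "cross_market": 0.6, "news": 0.5})  # score 0.74
```

A market with tight spreads and deep volume but weak news coverage still scores 0.74 here, so it is treated as merely "efficient" rather than "highly efficient".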
ML Feeds the Bayesian Estimator
The ensemble's calibrated probability is not used in isolation. It flows into
the FairValueEstimator alongside cross-market prices, polling data,
expert forecasts, and historical base rates. All signals are combined via
weighted Bayesian updating in log-odds space.
Satopaa et al. (2014) extremization pushes the aggregated probability away from
50% with a tunable parameter d=1.3, correcting for the known
underconfidence of averaged forecasts. Calibration-curve adjustments then correct
for any remaining systematic bias in specific probability buckets.
- Log-odds aggregation for numerical stability
- Satopaa extremization (d=1.3) corrects underconfidence
- Dynamic prior weight shrinks as signal count grows
- Bucket-level calibration bias correction
def extremize(p, d=1.3):
    p_d = p ** d
    q_d = (1 - p) ** d
    return p_d / (p_d + q_d)
# Bayesian estimator config
FairValueEstimator(
    prior_weight=0.3,
    extremization_d=1.3,
    consensus_bonus=0.1,
    prior_weight_min=0.1,
    prior_weight_max=0.5,
)

# Signal sources combined in log-odds space:
#   - ML ensemble probability
#   - Cross-market prices
#   - Polling / expert forecasts
#   - Historical base rates
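The log-odds aggregation step can be sketched end to end. This is a minimal illustration under assumptions: the signal list and per-signal weights are invented for the example, and the real estimator's dynamic prior weighting and consensus bonus are omitted.

```python
# Hedged sketch: weighted aggregation of probability signals in log-odds
# space, followed by Satopaa et al. (2014) extremization with d = 1.3.
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def aggregate(signals, prior=0.5, prior_weight=0.3, d=1.3):
    """signals: list of (probability, weight) pairs."""
    total_w = prior_weight + sum(w for _, w in signals)
    log_odds = (prior_weight * logit(prior)
                + sum(w * logit(p) for p, w in signals)) / total_w
    p = sigmoid(log_odds)
    # Extremization pushes the weighted average away from 0.5,
    # correcting the underconfidence of averaged forecasts.
    return p ** d / (p ** d + (1 - p) ** d)

p = aggregate([(0.70, 1.0),    # e.g. ML ensemble probability
               (0.65, 0.8),    # e.g. cross-market price
               (0.72, 0.5)])   # e.g. expert forecast
```

Working in log-odds keeps the blend numerically stable near 0 and 1, and a signal at exactly 0.5 contributes nothing, which is the behavior one wants from an uninformative source.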
Put the Pipeline to Work
Production-grade ML that calibrates, monitors, and retrains itself. Every probability, backed by data.