The Debate
In financial time-series forecasting, there's an ongoing tension between classical statistical models (ARIMA, GARCH) and modern deep learning approaches (LSTM, Transformers). Each has fundamental strengths and weaknesses that make them suited to different regimes and data characteristics.
Classical Models: ARMA/ARIMA/GARCH
ARIMA (AutoRegressive Integrated Moving Average) models the conditional mean of a time series through three components: autoregression (AR), differencing (I), and moving average (MA). GARCH extends this by modeling the conditional variance of the residuals, which captures volatility clustering (large moves tend to be followed by large moves).
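To make the conditional-variance idea concrete, here is a minimal sketch of the GARCH(1,1) recursion, sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1], in plain Python. The parameter values and residuals are illustrative assumptions, not fitted estimates; in practice the parameters would be estimated by maximum likelihood with a dedicated library.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Return the conditional variance path of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]

    The result has len(residuals) + 1 entries; the last entry is the
    one-step-ahead variance forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Start the recursion from the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Illustrative (not fitted) parameters; residuals stand in for demeaned
# daily returns in percent and are made up.
residuals = [0.1, -0.2, 3.0, -2.5, 0.3]
path = garch11_variance(residuals, omega=0.05, alpha=0.10, beta=0.85)
```

After the two large shocks (3.0 and -2.5) the variance path rises and then decays only slowly, because beta is close to 1. That persistence is the volatility clustering described above.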
Strengths
- Statistically rigorous with well-understood theoretical properties
- Interpretable parameters with economic meaning
- Works well with small datasets: a handful of parameters carries far less overfitting risk than a network with millions
- Excellent for volatility forecasting (GARCH family)
Weaknesses
- Linear assumptions limit expressiveness
- Cannot capture complex nonlinear dependencies
- Stationarity assumption often demands heavy preprocessing (differencing, detrending, log transforms)
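The preprocessing behind the stationarity point often comes down to differencing. A minimal sketch, assuming a list of strictly positive closing prices (the values are made up): log returns are the first difference of log prices, and repeated differencing is what the "I" in ARIMA does.

```python
import math


def log_returns(prices):
    """First difference of log prices: r[t] = ln(p[t]) - ln(p[t-1])."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


prices = [100.0, 101.0, 99.5, 102.0]  # made-up closing prices
returns = log_returns(prices)
```

Each round of differencing shortens the series by one observation, so `returns` has three entries for four prices.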
LSTM Networks
Long Short-Term Memory networks use gated recurrent cells to learn long-range temporal dependencies. They process sequences element-by-element, maintaining a hidden state that can theoretically capture patterns spanning hundreds of time steps.
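The gating mechanism can be sketched with a single scalar LSTM cell (one input feature, one hidden unit) in plain Python. This follows the standard LSTM equations; the weights below are arbitrary illustrative values, not trained ones, and a real model would use a framework with vector-valued states.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each gate name to (w_x, w_h, b):
    input weight, recurrent weight, and bias.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # share of the old cell state to keep
    i = sigmoid(pre("input"))         # share of the candidate to write
    o = sigmoid(pre("output"))        # share of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state
    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (the output)
    return h, c


# Arbitrary weights, for illustration only.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:  # process the sequence element by element
    h, c = lstm_step(x, h, c, params)
```

The cell state `c` is updated additively, which is what lets information survive across many steps when the forget gate stays close to 1.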
Strengths
- Can capture nonlinear patterns that linear classical models miss
- Can incorporate multivariate features (technical indicators, sentiment, volume)
- No formal stationarity requirement: can learn from raw data, though scaled inputs usually train more reliably
Weaknesses
- Requires large training datasets (5+ years of daily data minimum)
- Black-box: no interpretability of learned features
- Prone to overfitting on noisy financial data
- Computationally expensive to train and tune
Empirical Results
In my experiments on daily Euro STOXX 50 returns (2010-2025), results were nuanced:
- Short-term (1-5 days): ARIMA-GARCH marginally outperformed LSTM in terms of RMSE and directional accuracy.
- Medium-term (5-20 days): LSTM showed 8-12% improvement in RMSE, likely due to capturing nonlinear momentum effects.
- Regime-dependent: GARCH excelled during high-volatility regimes; LSTM was better during trending markets.
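For reference, the two metrics used in these comparisons can be computed as follows. This is a minimal sketch with made-up numbers, not the experiment's data, and treating a zero return as "not up" is an assumption of this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between realised and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def directional_accuracy(actual, predicted):
    """Share of periods where the forecast and the realised value agree on
    direction. A value of exactly zero counts as 'not up' (an assumption)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return hits / len(actual)


# Made-up returns in percent, for illustration only.
actual = [0.5, -0.2, 0.1, -0.4]
predicted = [0.3, 0.1, 0.2, -0.1]

error = rmse(actual, predicted)
hit_rate = directional_accuracy(actual, predicted)
```

The two metrics can disagree: a model with a lower RMSE may still call the direction wrong more often, which is why both are reported.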
My Take: Ensemble Everything
The key insight is that these approaches are complementary, not competing. An ensemble combining GARCH volatility forecasts with LSTM mean forecasts consistently outperformed either model individually by 15-20% on risk-adjusted metrics. The statistical rigor of classical models provides guardrails that prevent the deep learning component from making extreme predictions in data-sparse regimes.
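One way such guardrails could work is to clip the mean forecast to a band of k conditional standard deviations. This is a hypothetical combination rule for illustration only; the post does not specify how the ensemble combines the two forecasts, and the function names and the default k = 2 are assumptions.

```python
import math


def guarded_forecast(mean_forecast, variance_forecast, k=2.0):
    """Clip a mean forecast to +/- k conditional standard deviations.

    mean_forecast: e.g. an LSTM return forecast
    variance_forecast: e.g. a GARCH conditional variance forecast
    """
    if variance_forecast < 0 or k < 0:
        raise ValueError("variance_forecast and k must be non-negative")
    band = k * math.sqrt(variance_forecast)
    return max(-band, min(band, mean_forecast))


def risk_scaled_signal(mean_forecast, variance_forecast):
    """Forecast return per unit of forecast volatility."""
    if variance_forecast <= 0:
        raise ValueError("variance_forecast must be positive")
    return mean_forecast / math.sqrt(variance_forecast)


# Made-up forecasts: an extreme mean forecast is pulled back to the band.
clipped = guarded_forecast(5.0, 1.0, k=2.0)
signal = risk_scaled_signal(clipped, 1.0)
```

Forecasts inside the band pass through unchanged, so the volatility model only intervenes when the mean forecast is extreme relative to the current volatility level.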