Architecture Overview
A production risk engine must compute Value-at-Risk (VaR) and Conditional VaR (CVaR/Expected Shortfall) for potentially thousands of positions with sub-second latency. This article walks through the design decisions, infrastructure choices, and mathematical foundations behind the risk engine I built.
The Math: VaR and CVaR
Value at Risk (VaR) answers: “What loss will I not exceed, at a given confidence level, over a given time horizon?” (It is a loss threshold, not a true maximum — losses beyond it occur with probability up to 1 − α.) Formally, VaR at confidence level α is the α-quantile of the portfolio loss distribution:
VaR_α = inf{ x ∈ ℝ : P(L ≤ x) ≥ α }
CVaR (Expected Shortfall) goes further — it answers: “Given that we exceed VaR, what is the average loss?”
CVaR_α = E[ L | L ≥ VaR_α ]
This captures tail severity, not just tail frequency.
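As a concrete sketch, both quantities can be estimated directly from a sample of losses; the distribution and numbers below are illustrative, not the engine's actual data:

```python
import numpy as np

# Hypothetical daily loss sample (losses positive). In practice this would
# come from historical returns or simulated scenarios applied to positions.
rng = np.random.default_rng(42)
losses = rng.standard_normal(100_000)

alpha = 0.99  # confidence level

# VaR: the alpha-quantile of the loss distribution.
var = np.quantile(losses, alpha)

# CVaR / Expected Shortfall: the average loss beyond VaR.
cvar = losses[losses >= var].mean()
```

Note that CVaR always dominates VaR at the same confidence level, since it averages only over losses at or beyond the VaR threshold.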
Computation Approaches
- Historical Simulation: Replay actual historical return scenarios. Simple, non-parametric, but limited by the data window.
- Parametric (Variance-Covariance): Assume returns follow a specific distribution. Fast, but fragile under fat tails.
- Monte Carlo Simulation: Generate thousands of random scenarios from a fitted stochastic process. Most flexible, but computationally intensive.
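A minimal Monte Carlo sketch for a small portfolio, assuming (for illustration only) a multivariate-normal return model — the weights, means, and covariance below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-asset portfolio: weights, expected daily returns, covariance.
weights = np.array([0.5, 0.3, 0.2])
mu = np.array([2e-4, 1e-4, 3e-4])
cov = np.array([[4.0e-4, 1.0e-4, 5.0e-5],
                [1.0e-4, 2.5e-4, 2.0e-5],
                [5.0e-5, 2.0e-5, 9.0e-4]])

# Draw correlated return scenarios from the fitted distribution.
n_paths = 10_000
scenarios = rng.multivariate_normal(mu, cov, size=n_paths)

# Portfolio loss per path (loss = negative portfolio return).
losses = -(scenarios @ weights)

alpha = 0.99
var_mc = np.quantile(losses, alpha)
cvar_mc = losses[losses >= var_mc].mean()
```

Swapping the normal for a fat-tailed process (e.g. Student-t innovations) changes only the scenario-generation line, which is what makes Monte Carlo the most flexible of the three approaches.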
System Design
The engine consists of four layers:
- Data Ingestion: Streaming tick data via WebSocket, stored in a time-series database (TimescaleDB).
- Risk Computation Core: NumPy/SciPy-based engine computing portfolio-level VaR/CVaR using vectorized operations across 10,000+ Monte Carlo paths.
- API Layer: FastAPI endpoints serving real-time risk metrics with sub-200ms response times.
- Visualization: React + Recharts frontend displaying loss distributions, VaR waterfall charts, and stress-test scenarios.
Performance Optimizations
Key optimization decisions that brought computation time from 12 seconds to 180ms:
- Vectorized NumPy operations instead of Python loops (100x speedup)
- Cholesky decomposition for correlated random variable generation
- LRU caching of covariance matrix calculations
- Async FastAPI endpoints with background task queuing for heavy computations
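The Cholesky trick from the list above is worth spelling out: instead of calling a multivariate sampler per path, factor the covariance once as cov = L·Lᵀ and transform i.i.d. standard normals in a single vectorized matrix multiply. The two-asset covariance below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative covariance matrix for two assets.
cov = np.array([[4.0e-4, 1.2e-4],
                [1.2e-4, 2.5e-4]])

# Cholesky factor: cov = L @ L.T.
L = np.linalg.cholesky(cov)

# Transform i.i.d. standard normals into correlated draws in one
# vectorized step: rows of z @ L.T have covariance cov.
z = rng.standard_normal((100_000, 2))
x = z @ L.T
```

Since L only changes when the covariance matrix is re-estimated, it is a natural candidate for the LRU caching mentioned above, and the transform itself is a single BLAS-backed matmul rather than a Python loop over paths.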