Building a High-Performance Risk Engine

Sep 28, 2025 · 8 min read

Architecture Overview

A production risk engine must compute Value-at-Risk (VaR) and Conditional VaR (CVaR/Expected Shortfall) for potentially thousands of positions with sub-second latency. This article walks through the design decisions, infrastructure choices, and mathematical foundations behind the risk engine I built.

The Math: VaR and CVaR

Value at Risk (VaR) answers: “What loss should I not exceed, at a given confidence level, over a given time horizon?” Formally, VaR at confidence level α is the α-quantile of the portfolio loss distribution:

VaR_α = inf{ x ∈ ℝ : P(L ≤ x) ≥ α }

CVaR (Expected Shortfall) goes further — it answers: “Given that we exceed VaR, what is the average loss?” This captures tail severity, not just tail frequency.
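
As a concrete illustration (not the engine’s actual code), both measures can be estimated empirically from a loss sample; the standard-normal losses and 99% confidence level below are assumptions chosen for the example:

```python
import numpy as np

def var_cvar(losses: np.ndarray, alpha: float = 0.99) -> tuple[float, float]:
    """Empirical VaR and CVaR at confidence level alpha, from a sample of losses."""
    var = np.quantile(losses, alpha)        # alpha-quantile of the loss distribution
    cvar = losses[losses >= var].mean()     # average loss in the tail beyond VaR
    return float(var), float(cvar)

# Toy example: 100,000 standard-normal daily losses (positive = loss)
rng = np.random.default_rng(42)
sample = rng.normal(0.0, 1.0, 100_000)
var99, cvar99 = var_cvar(sample, 0.99)
```

By construction CVaR ≥ VaR, which is why it is the more conservative of the two measures.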

Computation Approaches

  • Historical Simulation: Replay actual historical return scenarios. Simple, non-parametric, but limited by the data window.
  • Parametric (Variance-Covariance): Assume returns follow a specific distribution. Fast, but fragile under fat tails.
  • Monte Carlo Simulation: Generate thousands of random scenarios from a fitted stochastic process. Most flexible, but computationally intensive.
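
The three approaches can be compared side by side on the same loss history. The fat-tailed Student-t sample below is purely hypothetical, chosen to show where the parametric method struggles:

```python
import numpy as np
from scipy import stats

alpha = 0.99
rng = np.random.default_rng(0)
# Hypothetical fat-tailed daily loss history (Student-t, df=4)
losses = stats.t.rvs(df=4, scale=0.01, size=2_000, random_state=rng)

# 1) Historical simulation: empirical quantile of observed losses
var_hist = np.quantile(losses, alpha)

# 2) Parametric (variance-covariance): assume normally distributed losses
var_param = losses.mean() + losses.std(ddof=1) * stats.norm.ppf(alpha)

# 3) Monte Carlo: fit a Student-t, then resample a large scenario set
df, loc, scale = stats.t.fit(losses)
sim = stats.t.rvs(df, loc=loc, scale=scale, size=100_000, random_state=1)
var_mc = np.quantile(sim, alpha)
```

On fat-tailed data the parametric estimate typically understates the historical and Monte Carlo figures, which is exactly the fragility noted above.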

System Design

The engine consists of four layers:

  1. Data Ingestion: Streaming tick data via WebSocket, stored in time-series database (TimescaleDB).
  2. Risk Computation Core: NumPy/SciPy-based engine computing portfolio-level VaR/CVaR using vectorized operations across 10,000+ Monte Carlo paths.
  3. API Layer: FastAPI endpoints serving real-time risk metrics with sub-200ms response times.
  4. Visualization: React + Recharts frontend displaying loss distributions, VaR waterfall charts, and stress-test scenarios.
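
A sketch of what the computation core (layer 2) might look like; the three-asset weights and covariance matrix are illustrative placeholders, not figures from the actual system:

```python
import numpy as np

def portfolio_var(weights, mean, cov, n_paths=10_000, alpha=0.99, seed=0):
    """Monte Carlo portfolio VaR: one vectorized draw, no Python-level loops."""
    rng = np.random.default_rng(seed)
    scenarios = rng.multivariate_normal(mean, cov, size=n_paths)  # (n_paths, n_assets)
    losses = -(scenarios @ weights)                               # loss per path
    return float(np.quantile(losses, alpha))

# Hypothetical 3-asset book: daily vols of 2%, 1.5%, 1% with mild correlation
weights = np.array([0.5, 0.3, 0.2])
mean = np.zeros(3)
cov = np.array([[4.00e-4, 1.00e-4, 0.50e-4],
                [1.00e-4, 2.25e-4, 0.75e-4],
                [0.50e-4, 0.75e-4, 1.00e-4]])
var99 = portfolio_var(weights, mean, cov)
```

Because the scenario matrix is generated and aggregated in a single vectorized pass, path count scales cheaply: 10,000 paths is one matrix multiply, not 10,000 loop iterations.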

Performance Optimizations

Key optimization decisions that brought computation time from 12 seconds to 180 ms:

  • Vectorized NumPy operations instead of Python loops (100x speedup)
  • Cholesky decomposition for correlated random variable generation
  • LRU caching of covariance matrix calculations
  • Async FastAPI endpoints with background task queuing for heavy computations
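
The Cholesky and caching bullets can be combined in one small helper. Caching the factor pays off because the covariance matrix changes far less often than the scenario draws; the byte-key trick below is one way to make a NumPy array hashable for `functools.lru_cache` (an assumption of this sketch, not necessarily the engine’s exact approach):

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=128)
def cholesky_factor(cov_bytes: bytes, n: int) -> np.ndarray:
    """Cache L such that L @ L.T == cov, keyed on the matrix's raw bytes."""
    cov = np.frombuffer(cov_bytes, dtype=np.float64).reshape(n, n)
    return np.linalg.cholesky(cov)

def correlated_normals(cov: np.ndarray, n_paths: int, seed: int = 0) -> np.ndarray:
    """Draw n_paths correlated normal vectors via the cached Cholesky factor."""
    L = cholesky_factor(cov.tobytes(), cov.shape[0])
    z = np.random.default_rng(seed).standard_normal((n_paths, cov.shape[0]))
    return z @ L.T   # each row now has covariance approximately equal to cov
```

Repeated calls with the same covariance matrix hit the cache and skip the O(n³) factorization, leaving only the cheap matrix multiply on the hot path.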