Building a High-Performance Risk Engine

Sep 28, 2025 · 8 min read

Architecture Overview

A production risk engine must compute Value-at-Risk (VaR) and Conditional VaR (CVaR/Expected Shortfall) for potentially thousands of positions with sub-second latency. This article walks through the design decisions, infrastructure choices, and mathematical foundations behind the risk engine I built.

The Math: VaR and CVaR

Value at Risk (VaR) answers: “What loss should I not exceed, at a given confidence level, over a given time horizon?” Formally, VaR at confidence level α is the α-quantile of the portfolio loss distribution:

VaR_α = inf{ x ∈ ℝ : P(L ≤ x) ≥ α }

CVaR (Expected Shortfall) goes further — it answers: “Given that we exceed VaR, what is the average loss?” This captures tail severity, not just tail frequency.
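
As a concrete illustration (not the engine’s actual code), both measures can be estimated empirically from a loss sample; the standard-normal losses and 99% confidence level below are assumptions chosen for the example:

```python
import numpy as np

def var_cvar(losses: np.ndarray, alpha: float = 0.99) -> tuple[float, float]:
    """Empirical VaR and CVaR at confidence level alpha, from a sample of losses."""
    var = np.quantile(losses, alpha)        # alpha-quantile of the loss distribution
    cvar = losses[losses >= var].mean()     # average loss in the tail beyond VaR
    return float(var), float(cvar)

# Toy example: 100,000 standard-normal daily losses (positive = loss)
rng = np.random.default_rng(42)
sample = rng.normal(0.0, 1.0, 100_000)
var99, cvar99 = var_cvar(sample, 0.99)
```

By construction CVaR ≥ VaR, which is why it is the more conservative of the two measures.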

Computation Approaches

  • Historical Simulation: Replay actual historical return scenarios. Simple, non-parametric, but limited by the data window.
  • Parametric (Variance-Covariance): Assume returns follow a specific distribution. Fast, but fragile under fat tails.
  • Monte Carlo Simulation: Generate thousands of random scenarios from a fitted stochastic process. Most flexible, but computationally intensive.
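
The three approaches can be compared side by side on the same loss history. The fat-tailed Student-t sample below is purely hypothetical, chosen to show where the parametric method struggles:

```python
import numpy as np
from scipy import stats

alpha = 0.99
rng = np.random.default_rng(0)
# Hypothetical fat-tailed daily loss history (Student-t, df=4)
losses = stats.t.rvs(df=4, scale=0.01, size=2_000, random_state=rng)

# 1) Historical simulation: empirical quantile of observed losses
var_hist = np.quantile(losses, alpha)

# 2) Parametric (variance-covariance): assume normally distributed losses
var_param = losses.mean() + losses.std(ddof=1) * stats.norm.ppf(alpha)

# 3) Monte Carlo: fit a Student-t, then resample a large scenario set
df, loc, scale = stats.t.fit(losses)
sim = stats.t.rvs(df, loc=loc, scale=scale, size=100_000, random_state=1)
var_mc = np.quantile(sim, alpha)
```

On fat-tailed data the parametric estimate typically understates the historical and Monte Carlo figures, which is exactly the fragility noted above.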

System Design

The engine consists of four layers:

  1. Data Ingestion: Streaming tick data via WebSocket, stored in time-series database (TimescaleDB).
  2. Risk Computation Core: NumPy/SciPy-based engine computing portfolio-level VaR/CVaR using vectorized operations across 10,000+ Monte Carlo paths.
  3. API Layer: FastAPI endpoints serving real-time risk metrics with sub-200ms response times.
  4. Visualization: React + Recharts frontend displaying loss distributions, VaR waterfall charts, and stress-test scenarios.
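
A sketch of what the computation core (layer 2) might look like; the three-asset weights and covariance matrix are illustrative placeholders, not figures from the actual system:

```python
import numpy as np

def portfolio_var(weights, mean, cov, n_paths=10_000, alpha=0.99, seed=0):
    """Monte Carlo portfolio VaR: one vectorized draw, no Python-level loops."""
    rng = np.random.default_rng(seed)
    scenarios = rng.multivariate_normal(mean, cov, size=n_paths)  # (n_paths, n_assets)
    losses = -(scenarios @ weights)                               # loss per path
    return float(np.quantile(losses, alpha))

# Hypothetical 3-asset book: daily vols of 2%, 1.5%, 1% with mild correlation
weights = np.array([0.5, 0.3, 0.2])
mean = np.zeros(3)
cov = np.array([[4.00e-4, 1.00e-4, 0.50e-4],
                [1.00e-4, 2.25e-4, 0.75e-4],
                [0.50e-4, 0.75e-4, 1.00e-4]])
var99 = portfolio_var(weights, mean, cov)
```

Because the scenario matrix is generated and aggregated in a single vectorized pass, path count scales cheaply: 10,000 paths is one matrix multiply, not 10,000 loop iterations.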

Performance Optimizations

Key optimization decisions that brought computation time from 12 seconds to 180 ms:

  • Vectorized NumPy operations instead of Python loops (100x speedup)
  • Cholesky decomposition for correlated random variable generation
  • LRU caching of covariance matrix calculations
  • Async FastAPI endpoints with background task queuing for heavy computations
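
The Cholesky and caching bullets can be combined in one small helper. Caching the factor pays off because the covariance matrix changes far less often than the scenario draws; the byte-key trick below is one way to make a NumPy array hashable for `functools.lru_cache` (an assumption of this sketch, not necessarily the engine’s exact approach):

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=128)
def cholesky_factor(cov_bytes: bytes, n: int) -> np.ndarray:
    """Cache L such that L @ L.T == cov, keyed on the matrix's raw bytes."""
    cov = np.frombuffer(cov_bytes, dtype=np.float64).reshape(n, n)
    return np.linalg.cholesky(cov)

def correlated_normals(cov: np.ndarray, n_paths: int, seed: int = 0) -> np.ndarray:
    """Draw n_paths correlated normal vectors via the cached Cholesky factor."""
    L = cholesky_factor(cov.tobytes(), cov.shape[0])
    z = np.random.default_rng(seed).standard_normal((n_paths, cov.shape[0]))
    return z @ L.T   # each row now has covariance approximately equal to cov
```

Repeated calls with the same covariance matrix hit the cache and skip the O(n³) factorization, leaving only the cheap matrix multiply on the hot path.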