Summing Uniform Distributions

The distribution of the sum of N uniformly distributed variables came up in a recent RdTools PR. The approach taken there was to just sample each distribution and sum the samples, but I wondered how it could be done analytically. This notebook is inspired by this StackExchange post and extends the derivation from just summing two uniform distributions on $[0,1]$ to two distributions with arbitrary bounds.

Given two independent random variables $A$ and $B$ with probability densities $f_a(x)$ and $f_b(x)$, the probability density $f_c(x)$ of their sum $A + B = C$ is given by the convolution $f_a * f_b$:

$$ f_c(x) = \int_{-\infty}^{\infty} f_a(x’) f_b(x - x’) \mathrm{d}x’ $$

For the case where $A$ and $B$ are uniformly distributed, we have:

$$ f_a(x) = \begin{cases} \frac{1}{a_2 - a_1} & a_1 \le x \le a_2 \ 0 & \mathrm{otherwise} \end{cases} $$

$$ f_b(x) = \begin{cases} \frac{1}{b_2 - b_1} & b_1 \le x \le b_2 \ 0 & \mathrm{otherwise} \end{cases} $$

Using these definitions, let’s visualize in the $X’-X$ plane where the integrand is nonzero:

import matplotlib.pyplot as plt
import numpy as np
a1 = 0.2
a2 = 1.0
b1 = 1.0
b2 = 2.5

Xlim = [1, 4]
Xplim = [0, 1.2]
Xp, X = np.meshgrid(np.linspace(*Xplim, 1000), np.linspace(*Xlim, 1000))
nonzero = (a1 <= Xp) & (Xp <= a2) & (b1 <= X - Xp) & (X - Xp <= b2)

plt.imshow(nonzero, extent=Xplim + Xlim, aspect='auto', origin='lower')

plt.axhline(b1 + a1, c='cyan')
plt.axhline(b1 + a2, c='cyan')
plt.axhline(b2 + a1, c='cyan')
plt.axhline(b2 + a2, c='cyan');

The cross-section of the parallelogram is the range of $x’$ where the integrand is nonzero, so the lower and upper bounds of each cross-section define the limits of integration as a function of $x$. Notice that it can be divided into three domains with nonzero probability, drawn here separated by the cyan lines. Going up from the bottom, they are defined by:


           Domain bounds           

   Integration bounds   


$b_1 + a_1 \le x \le b_1 + a_2$

$a_1 \le x’ \le x - b_1$


$b_1 + a_2 \le x \le b_2 + a_1$

$a_1 \le x’ \le a_2$


$b_2 + a_1 \le x \le b_2 + a_2$

$x - b_2 \le x’ \le a_2$

Finally, note that with the restricted bounds of integration $x_1$ and $x_2$, the integrand becomes the constant $(a_2-a_1)^{-1}(b_2-b_1)^{-1}$ and so the integral simplifies:

$$ \begin{align} f_c(x) &= \frac{1}{(a_2-a_1)(b_2-b_1)} \int_{x’_1}^{x’_2} \mathrm{d}x’ \ &= \frac{x’_2 - x’_1}{(a_2-a_1)(b_2-b_1)} \end{align} $$

where $x’_1$ and $x’_2$ are given by the integration bounds in the above table as a function of $x$.

Note that we have assumed $b_1 + a_2 \le b_2 + a_1$. If that is not the case, swap the $a$s and $b$s.

# Numerical approach
N = 10000000
A = np.random.uniform(a1, a2, N)
bd = 1/(b2-b1)
B = np.random.uniform(b1, b2, N)
C = A + B

plt.hist(C, bins=100, density=True, label='numerical PDF')

# Analytical approach
scale = 1/(a2-a1)/(b2-b1)

def f_c(x, a1, a2, b1, b2):
    if a1 + b2 < b1 + a2:
        a1, a2, b1, b2 = b1, b2, a1, a2

    y = np.zeros_like(x)
    region1 = (b1 + a1 <= x) & (x <= b1 + a2)
    region2 = (b1 + a2 <= x) & (x <= b2 + a1)
    region3 = (b2 + a1 <= x) & (x <= b2 + a2)
    y[region1] = scale * (x[region1] - b1 - a1)
    y[region2] = scale * (a2 - a1)
    y[region3] = scale * (a2 - (x[region3] - b2))
    return y

x = np.linspace(a1+b1, a2+b2, 1000)
y = f_c(x, a1, a2, b1, b2)
plt.plot(x, y, label='analytical PDF')

Going back to the original problem of summing not just two random variables but $N$ of them, it seems reasonable to expect that increasing $N$ will increase the order of the polynomials involved, and visualizing it for $N=3$ does show some higher-order behavior:

A = np.random.uniform(a1, a2, N)
bd = 1/(b2-b1)
B = np.random.uniform(b1, b2, N)
C = np.random.uniform(1.6, 2.0, N)
D = A + B + C

plt.hist(D, bins=100, density=True, label='numerical PDF')