Summing Uniform Distributions#

The distribution of the sum of N uniformly distributed variables came up in a recent RdTools PR. The approach taken there was to just sample each distribution and sum the samples, but I wondered how it could be done analytically. This notebook is inspired by this StackExchange post and extends the derivation from just summing two uniform distributions on \([0,1]\) to two distributions with arbitrary bounds.

Given two independent random variables \(A\) and \(B\) with probability densities \(f_a(x)\) and \(f_b(x)\), the probability density \(f_c(x)\) of their sum \(A + B = C\) is given by the convolution \(f_a * f_b\):

\[ f_c(x) = \int_{-\infty}^{\infty} f_a(x') f_b(x - x') \mathrm{d}x' \]

For the case where \(A\) and \(B\) are uniformly distributed, we have:

\[\begin{split} f_a(x) = \begin{cases} \frac{1}{a_2 - a_1} & a_1 \le x \le a_2 \\ 0 & \mathrm{otherwise} \end{cases} \end{split}\]
\[\begin{split} f_b(x) = \begin{cases} \frac{1}{b_2 - b_1} & b_1 \le x \le b_2 \\ 0 & \mathrm{otherwise} \end{cases} \end{split}\]

Using these definitions, let’s visualize in the \(X'-X\) plane where the integrand is nonzero:

import matplotlib.pyplot as plt
import numpy as np
a1 = 0.2
a2 = 1.0
b1 = 1.0
b2 = 2.5

Xlim = [1, 4]
Xplim = [0, 1.2]
Xp, X = np.meshgrid(np.linspace(*Xplim, 1000), np.linspace(*Xlim, 1000))
nonzero = (a1 <= Xp) & (Xp <= a2) & (b1 <= X - Xp) & (X - Xp <= b2)

plt.imshow(nonzero, extent=Xplim + Xlim, aspect='auto', origin='lower')
plt.xlabel("x'")
plt.ylabel("x")

plt.axhline(b1 + a1, c='cyan')
plt.axhline(b1 + a2, c='cyan')
plt.axhline(b2 + a1, c='cyan')
plt.axhline(b2 + a2, c='cyan');
../_images/f855b259c1c3cb5b4237806809a1db8a8f63a324d9dd537a73b2d97f9bc75a58.png

The cross-section of the parallelogram is the range of \(x'\) where the integrand is nonzero, so the lower and upper bounds of each cross-section define the limits of integration as a function of \(x\). Notice that it can be divided into three domains with nonzero probability, drawn here separated by the cyan lines. Going up from the bottom, they are defined by:

Domain

           Domain bounds           

   Integration bounds   

1

\(b_1 + a_1 \le x \le b_1 + a_2\)

\(a_1 \le x' \le x - b_1\)

2

\(b_1 + a_2 \le x \le b_2 + a_1\)

\(a_1 \le x' \le a_2\)

3

\(b_2 + a_1 \le x \le b_2 + a_2\)

\(x - b_2 \le x' \le a_2\)

Finally, note that with the restricted bounds of integration \(x_1\) and \(x_2\), the integrand becomes the constant \((a_2-a_1)^{-1}(b_2-b_1)^{-1}\) and so the integral simplifies:

\[\begin{split} \begin{align} f_c(x) &= \frac{1}{(a_2-a_1)(b_2-b_1)} \int_{x'_1}^{x'_2} \mathrm{d}x' \\ &= \frac{x'_2 - x'_1}{(a_2-a_1)(b_2-b_1)} \end{align} \end{split}\]

where \(x'_1\) and \(x'_2\) are given by the integration bounds in the above table as a function of \(x\).

Note that we have assumed \(b_1 + a_2 \le b_2 + a_1\). If that is not the case, swap the \(a\)s and \(b\)s.

# Numerical approach
N = 10000000
A = np.random.uniform(a1, a2, N)
bd = 1/(b2-b1)
B = np.random.uniform(b1, b2, N)
C = A + B

plt.hist(C, bins=100, density=True, label='numerical PDF')


# Analytical approach
scale = 1/(a2-a1)/(b2-b1)

def f_c(x, a1, a2, b1, b2):
    if a1 + b2 < b1 + a2:
        a1, a2, b1, b2 = b1, b2, a1, a2

    y = np.zeros_like(x)
    region1 = (b1 + a1 <= x) & (x <= b1 + a2)
    region2 = (b1 + a2 <= x) & (x <= b2 + a1)
    region3 = (b2 + a1 <= x) & (x <= b2 + a2)
    y[region1] = scale * (x[region1] - b1 - a1)
    y[region2] = scale * (a2 - a1)
    y[region3] = scale * (a2 - (x[region3] - b2))
    return y

x = np.linspace(a1+b1, a2+b2, 1000)
y = f_c(x, a1, a2, b1, b2)
plt.plot(x, y, label='analytical PDF')
plt.xlabel('x')
plt.legend();
../_images/464b07149de9c01982fd2c476a584d874a4adf36be99988ebab6819b15d69a1d.png

Going back to the original problem of summing not just two random variables but \(N\) of them, it seems reasonable to expect that increasing \(N\) will increase the order of the polynomials involved, and visualizing it for \(N=3\) does show some higher-order behavior:

A = np.random.uniform(a1, a2, N)
bd = 1/(b2-b1)
B = np.random.uniform(b1, b2, N)
C = np.random.uniform(1.6, 2.0, N)
D = A + B + C

plt.hist(D, bins=100, density=True, label='numerical PDF')
plt.legend();
../_images/2251da5cc746bfcf4ecb9aa3e17189bc535f1a2eab92564f895e6b57c3cfdce1.png