Hypergeometric distribution calculator
The hypergeometric distribution gives the probability of drawing k successes in n draws, without replacement, from a population of size N that contains K successes. The exact probability is P(X = k) = C(K, k) · C(N−K, n−k) / C(N, n). For a 5-card hand from a standard 52-card deck with 13 hearts, P(exactly 2 hearts) = 78 × 9,139 / 2,598,960 ≈ 0.2743, or 27.43%.
Use it whenever you sample without replacement from a finite population and care about exact probabilities. Card games, quality-control batch inspection, audit sampling, lottery analysis, and capture-recapture population estimates all rely on the hypergeometric. When N is large compared with n (rule of thumb: n ≤ 5% of N), the simpler binomial distribution gives nearly identical answers and is often used as a convenient approximation.
What is the hypergeometric distribution?
The hypergeometric distribution is a discrete probability distribution for the number of successes in n draws without replacement from a finite population of size N that contains K successes and N − K failures. Each draw changes the composition of the remaining pool, so the trials are dependent. The probability of any particular sample composition follows directly from counting how many such samples exist.
The distribution is parameterised by three values: N, K, and n. The random variable X counts successes in the sample. X can range from a minimum of max(0, n − (N − K)) to a maximum of min(n, K). The minimum is non-zero when the sample size n is so large relative to N − K that some successes must be drawn.
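The support bounds can be computed directly. This is a minimal sketch; `support` is a hypothetical helper name, not part of any library.

```python
def support(N: int, K: int, n: int) -> range:
    """Achievable values of X for a population of N with K successes and n draws."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lo = max(0, n - (N - K))  # once the N - K failures run out, successes are forced
    hi = min(n, K)            # cannot exceed the draws or the available successes
    return range(lo, hi + 1)

# 10 draws from 12 items with 8 successes: only 4 failures exist, so X >= 6.
print(list(support(12, 8, 10)))  # → [6, 7, 8]
```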
The hypergeometric distribution formula
The probability mass function (PMF) is a single elegant expression in binomial coefficients.
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \qquad \text{(PMF)}$$
$$P(X \le k) = \sum_{i=m}^{k} P(X = i), \qquad m = \max\bigl(0,\ n - (N - K)\bigr) \qquad \text{(CDF)}$$
$$E(X) = \frac{nK}{N} \qquad \text{(mean)}$$
$$\operatorname{Var}(X) = n \cdot \frac{K}{N}\left(1 - \frac{K}{N}\right)\frac{N - n}{N - 1} \qquad \text{(variance with FPC)}$$
The denominator C(N, n) counts the total number of ways to choose any sample of size n from N. The numerator counts samples with exactly k successes: pick k of the K successes and n − k of the N − K failures. Their ratio is the probability. The mean matches intuition: sample size times the population proportion of successes.
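The four formulas translate almost line for line into Python using the standard library's `math.comb`. This is a sketch, not a reference implementation; the function names are illustrative, and the variance assumes N > 1.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): choose k of the K successes and n - k of the N - K failures."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # k is outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k): sum the PMF from the smallest achievable value m up to k."""
    m = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(m, k + 1))

def hypergeom_mean_var(N: int, K: int, n: int) -> tuple[float, float]:
    """Mean nK/N and variance: binomial variance times the FPC (N - n)/(N - 1)."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return mean, var

print(round(hypergeom_pmf(2, 52, 13, 5), 4))  # → 0.2743
```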
Hypergeometric distribution example
Draw 5 cards from a 52-card deck. What is the probability of exactly 2 hearts?
- N = 52, total cards in the deck.
- K = 13, hearts (the "successes").
- n = 5, cards drawn.
- k = 2, hearts observed.
- C(13, 2) = 78 — ways to pick 2 hearts from 13.
- C(39, 3) = 9,139 — ways to pick 3 non-hearts from 39.
- C(52, 5) = 2,598,960 — total 5-card hands.
- P(X = 2) = 78 × 9,139 / 2,598,960 ≈ 0.2743 or 27.43%.
The full distribution for k = 0 through 5 gives 22.15%, 41.14%, 27.43%, 8.15%, 1.07%, and 0.05%. The mean is E(X) = 5 × 13/52 = 1.25 hearts. The mode is at k = 1. The probability of at least one heart is 1 − P(X = 0) = 1 − 0.2215 = 0.7785 or 77.85%.
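A short script reproduces the whole worked example, including the mean, mode, and P(at least one heart). It is a sketch using only `math.comb`; the variable names are illustrative.

```python
from math import comb

N, K, n = 52, 13, 5  # deck size, hearts in the deck, cards drawn
total = comb(N, n)   # 2,598,960 possible five-card hands
dist = {k: comb(K, k) * comb(N - K, n - k) / total for k in range(n + 1)}

for k, p in dist.items():
    print(f"P(X = {k}) = {p:.2%}")

mean = sum(k * p for k, p in dist.items())
mode = max(dist, key=dist.get)  # value of k with the largest probability
at_least_one = 1 - dist[0]      # complement of drawing no hearts at all
print(f"mean = {mean:.2f}, mode = {mode}, P(X >= 1) = {at_least_one:.2%}")
```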
Hypergeometric vs binomial distribution
Both distributions count successes in a fixed number of trials. The only difference is replacement.
For the binomial Bin(n, p), each trial has the same fixed probability p of success. For the hypergeometric, the success probability shifts after every draw because the pool changes. When N is much larger than n, these shifts are tiny and the two distributions agree closely. A common rule of thumb: if n ≤ 0.05N, the binomial approximation is typically within about 1–2% of the hypergeometric.
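To see the rule of thumb in action, the sketch below compares the two PMFs at every k for a hypothetical lot where the sample is exactly 5% of the population (N = 1,000, K = 100, n = 50) and prints the largest absolute gap.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N, K, n = 1000, 100, 50  # hypothetical lot: n is exactly 5% of N
gaps = [abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, K / N)) for k in range(n + 1)]
max_gap = max(gaps)
print(f"largest PMF difference: {max_gap:.4f}")
```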
For card-game probabilities, sample-survey calculations from small populations, and any quality-control situation where the inspected sample is a significant fraction of the lot, use the hypergeometric. For large-N applications like polling national populations or testing failure rates of mass-produced parts where n is tiny compared with N, the binomial is fine.
Hypergeometric distribution applications
The capture-recapture method for estimating wildlife populations is built on the hypergeometric distribution. Tag K individuals, release them, later capture a sample of n, and count the k tagged animals in the recapture. The Lincoln-Petersen estimator gives the population estimate N̂ = n × K / k; the exact hypergeometric maximum-likelihood estimate is the integer part of n × K / k (when that ratio is not a whole number). Confidence intervals for N can also be built from the hypergeometric PMF. Conservation biologists use this approach to track everything from bear populations to coral reef fish.
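A minimal sketch of the estimate, assuming a hypothetical survey (100 tagged, 60 recaptured, 13 of them tagged). It computes the Lincoln-Petersen value and then scans the hypergeometric likelihood over candidate population sizes, using exact `fractions.Fraction` arithmetic so that neighbouring likelihoods, which differ only slightly, are compared without rounding.

```python
from fractions import Fraction
from math import comb

def likelihood(N: int, K: int, n: int, k: int) -> Fraction:
    """Probability of recapturing k tagged animals if the population is N."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

K, n, k = 100, 60, 13  # hypothetical: 100 tagged, 60 recaptured, 13 carried tags
lincoln_petersen = n * K / k  # ≈ 461.5

# N must hold at least the K tagged animals plus the n - k untagged ones seen.
candidates = range(K + n - k, 3000)
mle = max(candidates, key=lambda N: likelihood(N, K, n, k))
print(round(lincoln_petersen, 1), mle)  # → 461.5 461
```

The scan lands on the integer part of the Lincoln-Petersen value, which is why the two estimates are usually reported interchangeably.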
Lottery analysis: in a 6/49 lottery (pick 6 numbers from 49), the number of your numbers that match the draw follows the hypergeometric distribution. P(match exactly k of 6) = C(6, k) × C(43, 6−k) / C(49, 6). P(jackpot) = 1 / C(49, 6) = 1 / 13,983,816 ≈ 7.15 × 10⁻⁸, about 1 in 13.98 million.
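The full match table for a 6/49 ticket follows from the same formula. This sketch assumes a single ticket and ignores bonus balls, which some 6/49 games add.

```python
from math import comb

picks, pool = 6, 49          # 6 winning numbers drawn from 49
tickets = comb(pool, picks)  # 13,983,816 equally likely combinations

probs = []
for k in range(picks + 1):
    # choose k of the 6 winning numbers and 6 - k of the 43 losing ones
    p = comb(picks, k) * comb(pool - picks, picks - k) / tickets
    probs.append(p)
    print(f"match {k}: {p:.3e}")

print(f"jackpot: 1 in {tickets:,}")
```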
Quality control: a batch of 1,000 widgets contains an unknown number of defects. Sample 50 and find 2 defective. The hypergeometric likelihood tells you which number of defects in the full batch best explains that sample; here the maximum-likelihood estimate is 40. Acceptance-sampling standards such as MIL-STD-105E and ISO 2859 tabulate acceptance thresholds for batch inspection, and for small lots the hypergeometric gives the exact probability of accepting a lot under such a plan.
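A sketch of that likelihood scan for the batch above: try every possible defect count D from 0 to 1,000 and keep the one that makes "2 defectives in a sample of 50" most probable. The function name is illustrative.

```python
from math import comb

def likelihood(D: int, N: int, n: int, k: int) -> float:
    """P(k defectives in the sample | the batch of N contains D defectives)."""
    if k > D or n - k > N - D:
        return 0.0  # this D cannot produce the observed sample
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

N, n, k = 1000, 50, 2  # batch size, sample size, defectives found
mle = max(range(N + 1), key=lambda D: likelihood(D, N, n, k))
print(mle)  # → 40
```

The result matches the closed form floor(k(N + 1)/n) = floor(40.04) = 40, close to the naive scaling 1,000 × 2/50.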
Card game probabilities: poker (including Texas hold'em) and bridge rely on hypergeometric calculations. P(flush in 5-card poker) = 4 × C(13, 5) / C(52, 5) = 5,148 / 2,598,960 ≈ 0.00198, or about 1 in 505 hands (this count includes straight flushes). Many pre-flop and post-flop probability tables in poker strategy books are built from the same kind of calculation.
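The flush figure is four times the hypergeometric probability of drawing all 5 cards from one 13-card suit. This sketch counts every same-suit hand, so it includes the 40 straight flushes; subtracting them leaves 5,108 plain flushes.

```python
from math import comb

hands = comb(52, 5)        # 2,598,960 five-card hands
flushes = 4 * comb(13, 5)  # all five cards from any one of the four suits
p_flush = flushes / hands
print(flushes, round(p_flush, 5), round(hands / flushes))  # → 5148 0.00198 505
```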
Mean and variance of the hypergeometric
E(X) = nK/N is the expected number of successes — the sample size times the population proportion of successes. This is the same as the binomial mean np if p = K/N. The variance differs by the finite-population correction (FPC):
Var(X) = n × (K/N) × (1 − K/N) × (N − n)/(N − 1).
The first three factors are exactly the binomial variance np(1 − p). The fourth factor, (N − n)/(N − 1), shrinks the variance whenever n is a meaningful fraction of N. Concretely, if you sample 60% of the population (n = 0.6N), the FPC is about 0.4 for large N, so the variance is roughly 60% lower than the with-replacement equivalent. At the extreme n = N you sample everyone: the variance is zero, and the count equals K with certainty.
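The shrinkage is easy to check numerically. This sketch assumes a hypothetical population of 1,000 with 300 successes and prints the ratio of the without-replacement variance to the with-replacement (binomial) variance for a few sample sizes; that ratio is exactly the FPC.

```python
def variance(N: int, K: int, n: int, replace: bool = False) -> float:
    p = K / N
    base = n * p * (1 - p)  # binomial variance np(1 - p)
    return base if replace else base * (N - n) / (N - 1)  # apply the FPC

N, K = 1000, 300  # hypothetical population with 30% successes
for n in (50, 600, 1000):
    ratio = variance(N, K, n) / variance(N, K, n, replace=True)
    print(n, round(ratio, 4))
```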
Common hypergeometric mistakes
Always check that k is achievable. The valid range is max(0, n − (N − K)) ≤ k ≤ min(n, K). Asking for P(X = 6) with n = 5 makes no sense, because you cannot have more successes than draws. The calculator returns 0 outside the valid range.
If you sample with replacement (each draw goes back before the next), the trials are independent and the distribution is binomial, not hypergeometric. Conversely, if you sample without replacement but treat it as binomial, you slightly overstate variance — especially when n is a large fraction of N. Match the distribution to the actual sampling protocol.
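A quick Monte Carlo run makes the difference between the two protocols concrete. This sketch uses the standard library's `random.sample` (draws without replacement) and `random.choices` (draws with replacement) on a 52-card deck and estimates the probability of exactly 2 hearts in 5 cards; the seed and trial count are arbitrary choices.

```python
import random

random.seed(1)              # fixed seed so the run is repeatable
deck = [1] * 13 + [0] * 39  # 1 = heart, 0 = any other card
trials = 100_000

def estimate(draw) -> float:
    """Fraction of simulated 5-card draws that contain exactly 2 hearts."""
    return sum(sum(draw()) == 2 for _ in range(trials)) / trials

without = estimate(lambda: random.sample(deck, 5))       # no card can repeat
with_repl = estimate(lambda: random.choices(deck, k=5))  # each card is put back
print(f"without replacement: {without:.4f} (hypergeometric 0.2743)")
print(f"with replacement:    {with_repl:.4f} (binomial 0.2637)")
```

The estimates should land near the exact values shown in each line, give or take sampling noise of roughly 0.001 to 0.002.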
The most common error is using binomial when you should use hypergeometric. For small populations, the two give meaningfully different answers. Inspecting 10 widgets from a lot of 50 with 5 defects: hypergeometric P(X = 1) = 0.4313, binomial P(X = 1) = 0.3874. The 4-percentage-point gap matters for acceptance-sampling decisions.
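The widget comparison can be reproduced in a few lines; this sketch uses only `math.comb` and the numbers from the example.

```python
from math import comb

N, K, n, k = 50, 5, 10, 1  # lot of 50 with 5 defects; inspect 10; observe 1 defect
hyper = comb(K, k) * comb(N - K, n - k) / comb(N, n)
binom = comb(n, k) * (K / N) ** k * (1 - K / N) ** (n - k)  # treats p = K/N as fixed
print(f"hypergeometric {hyper:.4f}, binomial {binom:.4f}, gap {hyper - binom:.4f}")
```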
The second mistake is confusing K (successes in population) with k (successes in sample). They are different quantities. K is a parameter of the distribution; k is the random variable's value. The notation N, K, n, k is standard but easy to mix up — define each one explicitly before plugging into the formula.
The third error is using approximations when exact calculation is feasible. Modern computers can evaluate hypergeometric probabilities for N up to 10⁶ or more using log-gamma functions to avoid overflow. There is rarely a reason to settle for the binomial approximation when the exact calculation is one function call away.
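A sketch of the log-gamma approach with the standard library's `math.lgamma`: each binomial coefficient is handled as a logarithm, ln C(a, b) = ln a! − ln b! − ln (a − b)!, and only the final probability is exponentiated. The population of one million is a hypothetical example.

```python
from math import exp, lgamma

def log_comb(a: int, b: int) -> float:
    """ln C(a, b) via log-gamma, since lgamma(x + 1) = ln x!."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def hypergeom_pmf_log(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) computed in log space so the huge factorials never overflow."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)

# Hypothetical large population: one million items, 10% successes, 1,000 draws.
N, K, n = 1_000_000, 100_000, 1_000
total = sum(hypergeom_pmf_log(k, N, K, n) for k in range(n + 1))
print(f"P(X = 100) = {hypergeom_pmf_log(100, N, K, n):.5f}, total mass = {total:.9f}")
```

Working in floating point trades exactness for speed; for small inputs, exact integer arithmetic with `math.comb` remains the safer choice.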