Hypergeometric Distribution Calculator

Find the probability of k successes in n draws without replacement.

Hypergeometric distribution

P(X = k) = C(K, k) · C(N−K, n−k) / C(N, n)

Instructions — Hypergeometric Distribution Calculator

  1. Enter N — total population size.
  2. Enter K — number of "success" elements in the population.
  3. Enter n — sample size drawn without replacement.
  4. Enter k — number of successes observed in the sample.
  5. Read P(X = k), the cumulative probabilities, mean, variance, and standard deviation.

Examples include a deck of cards (52 cards, 13 hearts, draw 5), quality control (100 parts with 5 defects, sample 10), and lottery (49 numbers, 6 winners, draw 6).
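As a rough illustration of what the calculator reports, the sketch below evaluates the PMF for the three examples above using Python's standard `math.comb`. The function name `hypergeom_pmf` and the k chosen for each example are assumptions made for illustration; they are not taken from the calculator itself.

```python
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """P(X = k): k successes in n draws without replacement."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# (label, N, K, n, k); the k values are illustrative choices
examples = [
    ("cards: 2 hearts in 5 draws", 52, 13, 5, 2),
    ("quality control: 0 defects in 10 parts", 100, 5, 10, 0),
    ("lottery: all 6 numbers matched", 49, 6, 6, 6),
]
for label, N, K, n, k in examples:
    print(f"{label}: {hypergeom_pmf(N, K, n, k):.6g}")
```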

Formulas

Probability mass function (PMF):

$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$

Cumulative distribution (CDF):

$$P(X \leq k) = \sum_{i=m}^{k} \frac{\binom{K}{i} \binom{N-K}{n-i}}{\binom{N}{n}}$$

where m = max(0, n − (N − K)).
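A minimal sketch of that sum, assuming Python and only the standard library. The names `pmf` and `cdf` are illustrative. Exact fractions are used so no rounding happens until the final conversion to a float.

```python
from fractions import Fraction
from math import comb

def pmf(N, K, n, i):
    return Fraction(comb(K, i) * comb(N - K, n - i), comb(N, n))

def cdf(N, K, n, k):
    """P(X <= k), summing from the smallest achievable count m."""
    m = max(0, n - (N - K))
    upper = min(k, n, K)
    return sum((pmf(N, K, n, i) for i in range(m, upper + 1)), Fraction(0))

print(float(cdf(52, 13, 5, 2)))   # at most 2 hearts in a 5-card hand
print(float(cdf(10, 7, 5, 2)))    # here m = 2, so the sum has a single term
```

Starting the sum at m rather than 0 skips terms that are zero anyway, so it is a convenience rather than a correction.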

Mean and variance:

$$E(X) = \frac{nK}{N}, \quad \text{Var}(X) = n \cdot \frac{K}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1}$$

The final term (N−n)/(N−1) is the finite population correction — it shrinks variance below the binomial value when n is a meaningful fraction of N.
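One way to convince yourself of these formulas is to compute the moments directly from the PMF and compare. The sketch below does that with exact fractions; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def moments_from_pmf(N, K, n):
    """Mean and variance obtained by summing over the PMF directly."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    total = comb(N, n)
    probs = {k: Fraction(comb(K, k) * comb(N - K, n - k), total)
             for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in probs.items())
    var = sum((k - mean) ** 2 * p for k, p in probs.items())
    return mean, var

def moments_closed_form(N, K, n):
    """Mean and variance from the formulas above (requires N > 1)."""
    p = Fraction(K, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)

print(moments_from_pmf(52, 13, 5))
print(moments_closed_form(52, 13, 5))
```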

Reference

  • Sampling without replacement from a finite population with two outcome classes.
  • Quality control — defects in a batch sample.
  • Card games — hand composition in deck-based games.
  • Audit sampling — irregularities in a finite ledger.
  • Capture-recapture — wildlife population estimation.
  • Drug screening — disease in a tested subpopulation.
  • Hypergeometric → binomial: as N → ∞ with K/N fixed, the hypergeometric converges to the binomial Bin(n, K/N).
  • Rule of thumb: use binomial approximation when n ≤ 5% of N — finite-population correction is negligible.
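The convergence to the binomial can be seen numerically. The sketch below holds K/N at 0.25 (an arbitrary choice for illustration), fixes n = 10, and reports the largest absolute difference between the two PMFs as N grows. All names are illustrative.

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_gap(N, n, p=0.25):
    """Largest |hypergeometric - binomial| over k, with K/N held at p."""
    K = round(N * p)
    return max(abs(hypergeom_pmf(N, K, n, k) - binom_pmf(n, p, k))
               for k in range(n + 1))

n = 10
for N in (40, 200, 1000, 10000):
    print(f"N = {N:>6}, n/N = {n / N:.3f}, max gap = {max_abs_gap(N, n):.5f}")
```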

Article — Hypergeometric Distribution Calculator

Hypergeometric distribution calculator

The hypergeometric distribution describes drawing k successes in n draws without replacement from a population of size N containing K successes. The exact probability is P(X = k) = C(K, k) · C(N−K, n−k) / C(N, n). For a 5-card hand from a standard deck of 52 with 13 hearts, P(exactly 2 hearts) = 78 × 9,139 / 2,598,960 ≈ 0.2743 or 27.43%.

Use it whenever you sample without replacement from a finite population and care about exact probabilities. Card games, quality-control batch inspection, audit sampling, lottery analysis, and capture-recapture population estimates all rely on the hypergeometric. When N is large compared with n (rule of thumb: n ≤ 5% of N), the simpler binomial distribution gives nearly identical answers and is often used as a convenient approximation.

What is the hypergeometric distribution?

The hypergeometric distribution is a discrete probability distribution for the number of successes in n draws without replacement from a finite population of size N that contains K successes and N − K failures. Each draw changes the composition of the remaining pool, so the trials are dependent. The probability of any particular sample composition follows directly from counting how many such samples exist.

The distribution is parameterised by three values: N, K, and n. The random variable X counts successes in the sample. X can range from a minimum of max(0, n − (N − K)) to a maximum of min(n, K). The minimum is non-zero only when n > N − K: the sample is then larger than the number of failures available, so at least n − (N − K) successes must be drawn.
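The range of X can be written as a small helper. This is a sketch with a hypothetical function name, `support`, and the second and third calls show cases where the minimum is forced above zero.

```python
def support(N: int, K: int, n: int) -> range:
    """All achievable success counts for population N, successes K, sample n."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lowest = max(0, n - (N - K))   # forced successes once failures run out
    highest = min(n, K)            # cannot exceed the draws or the successes
    return range(lowest, highest + 1)

print(list(support(52, 13, 5)))   # [0, 1, 2, 3, 4, 5]
print(list(support(10, 7, 5)))    # [2, 3, 4, 5]: only 3 failures exist
print(list(support(10, 7, 10)))   # [7]: the whole population is sampled
```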

The hypergeometric distribution formula

The probability mass function (PMF) is a single elegant expression in binomial coefficients.

Hypergeometric distribution formulas
PMF: P(X = k) = C(K, k) · C(N−K, n−k) / C(N, n)
CDF: P(X ≤ k) = Σ P(X = i) for i = m, …, k, where m = max(0, n − (N − K))
Mean: E(X) = nK/N (sample size × proportion)
Variance: Var(X) = n(K/N)(1 − K/N)(N − n)/(N − 1) (includes the finite-population correction, FPC)

The denominator C(N, n) counts the total number of ways to choose any sample of size n from N. The numerator counts samples with exactly k successes — pick k from the K successes, and n − k from the N − K failures. Their ratio is the probability. The mean matches intuition: sample size times population success-proportion.
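The counting argument can be checked by brute force on a population small enough to enumerate. The sketch below lists every possible sample, tallies how many contain k successes, and prints the tally next to C(K, k) · C(N−K, n−k). The values N = 8, K = 3, n = 4 are arbitrary illustrative choices.

```python
from itertools import combinations
from math import comb

N, K, n = 8, 3, 4                      # small enough to enumerate every sample
population = [1] * K + [0] * (N - K)   # 1 marks a success, 0 a failure

# Tally samples by their number of successes. Items are chosen by position,
# so identical-looking items still count as distinct.
counts = {}
for sample in combinations(range(N), n):
    k = sum(population[i] for i in sample)
    counts[k] = counts.get(k, 0) + 1

for k in sorted(counts):
    formula = comb(K, k) * comb(N - K, n - k)
    print(k, counts[k], formula, counts[k] / comb(N, n))
```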

Hypergeometric distribution example

Draw 5 cards from a 52-card deck. What is the probability of exactly 2 hearts?

  • N = 52, total cards in the deck.
  • K = 13, hearts (the "successes").
  • n = 5, cards drawn.
  • k = 2, hearts observed.
  • C(13, 2) = 78 — ways to pick 2 hearts from 13.
  • C(39, 3) = 9,139 — ways to pick 3 non-hearts from 39.
  • C(52, 5) = 2,598,960 — total 5-card hands.
  • P(X = 2) = 78 × 9,139 / 2,598,960 ≈ 0.2743 or 27.43%.

The full distribution for k = 0 through 5 gives 22.15%, 41.14%, 27.43%, 8.15%, 1.07%, and 0.05%. The mean is E(X) = 5 × 13/52 = 1.25 hearts. The mode is at k = 1. The probability of at least one heart is 1 − P(X = 0) = 1 − 0.2215 = 0.7785 or 77.85%.
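The figures in this example can be reproduced with a few lines of Python. This is a sketch using only `math.comb`; the variable names are illustrative.

```python
from math import comb

N, K, n = 52, 13, 5
pmf = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]

for k, p in enumerate(pmf):
    print(f"P(X = {k}) = {p:.4f}")

mean = sum(k * p for k, p in enumerate(pmf))
mode = max(range(n + 1), key=lambda k: pmf[k])
at_least_one = 1 - pmf[0]
print(f"mean = {mean:.2f}, mode = {mode}, P(X >= 1) = {at_least_one:.4f}")
```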

Hypergeometric vs binomial distribution

Both distributions count successes in a fixed number of trials. The only difference is replacement.

Binomial: p fixed (with replacement, large N)
Hypergeometric: p changes (without replacement, finite N)

For the binomial Bin(n, p), each trial has the same fixed probability p of success. For the hypergeometric, the success probability shifts as draws happen because the pool changes. When N is much larger than n, the shifts are tiny and the two distributions agree closely. A common rule of thumb: if n ≤ 0.05N, the binomial approximation is within about 1–2% of the hypergeometric.
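To see the shifting success probability directly, the sketch below multiplies the draw-by-draw conditional probabilities for one specific order of outcomes, then multiplies by the number of orderings. The result equals the hypergeometric PMF for the hearts example. The function name `ordered_sequence_prob` is a hypothetical helper for illustration.

```python
from fractions import Fraction
from math import comb

def ordered_sequence_prob(N, K, outcomes):
    """Probability of one specific success/failure order, draw by draw."""
    prob = Fraction(1)
    successes_left, total_left = K, N
    for is_success in outcomes:
        favourable = successes_left if is_success else total_left - successes_left
        prob *= Fraction(favourable, total_left)   # this ratio shifts each draw
        successes_left -= is_success
        total_left -= 1
    return prob

# Two hearts followed by three non-hearts from a 52-card deck.
one_order = ordered_sequence_prob(52, 13, [True, True, False, False, False])
# Every ordering of 2 hearts among 5 draws has the same probability.
p_two_hearts = comb(5, 2) * one_order
print(one_order, float(p_two_hearts))
```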

For card-game probabilities, sample-survey calculations from small populations, and any quality-control situation where the inspected sample is a significant fraction of the lot, use the hypergeometric. For large-N applications like polling national populations or testing failure rates of mass-produced parts where n is tiny compared with N, the binomial is fine.

Hypergeometric distribution applications

Did you know

The capture-recapture method for estimating wildlife populations is built on the hypergeometric distribution. Tag K individuals, release them, sample n later, and count k tagged in the recapture. The Lincoln-Petersen estimator is N̂ = n × K / k; the maximum-likelihood population estimate is its integer part. Confidence intervals can be derived from the hypergeometric PMF. Conservation biologists use this to track everything from bear populations to coral reef fish.
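A sketch of the capture-recapture calculation, using hypothetical field counts (K = 100 tagged, n = 60 recaptured, k = 13 of them tagged). It computes n × K / k and, separately, scans candidate population sizes for the one that makes the observed recapture most probable. The upper scan limit of 3000 is an arbitrary assumption.

```python
from math import comb

def likelihood(N, K, n, k):
    """Hypergeometric probability of k tagged animals in a recapture of n."""
    if N < K + (n - k):            # population too small to hold the sample
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

K, n, k = 100, 60, 13              # hypothetical field counts
lincoln_petersen = n * K / k

# Scan candidate population sizes and keep the most likely one.
candidates = range(K + n - k, 3000)
best_N = max(candidates, key=lambda N: likelihood(N, K, n, k))
print(f"Lincoln-Petersen: {lincoln_petersen:.1f}, likelihood peak: N = {best_N}")
```

With these counts the likelihood peaks at the integer part of n × K / k.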

Lottery analysis: the number of matches in a 6/49 lottery (pick 6 numbers from 49) follows the hypergeometric distribution with N = 49, K = 6, and n = 6. P(match exactly k of 6) = C(6, k) × C(43, 6−k) / C(49, 6). P(jackpot) = 1 / C(49, 6) = 1 / 13,983,816 ≈ 7.15 × 10⁻⁸, about 1 in 13.98 million.
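The full table of match probabilities follows from the same formula. A minimal sketch, with `match_prob` as an illustrative name and exact fractions to keep the odds precise:

```python
from fractions import Fraction
from math import comb

def match_prob(k, picks=6, pool=49):
    """P(exactly k of the player's picks are among the drawn numbers)."""
    return Fraction(comb(picks, k) * comb(pool - picks, picks - k),
                    comb(pool, picks))

for k in range(7):
    p = match_prob(k)
    print(f"match {k}: {float(p):.3e}  (1 in {float(1 / p):,.1f})")
```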

Quality control: a batch of 1,000 widgets contains an unknown number of defects. Sample 50 and find 2 defective. Treating the hypergeometric PMF as a likelihood in K shows the most likely number of defects in the full batch given that sample. Acceptance-sampling plans (MIL-STD-105E, ISO 2859) tabulate hypergeometric-derived acceptance thresholds for batch inspection.
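For the batch example, one simple approach is to evaluate the hypergeometric probability of the observed sample for every possible defect count K and pick the largest. The sketch below does that with no prior assumption about K, so it is a maximum-likelihood calculation rather than a full Bayesian one.

```python
from math import comb

N, n, k = 1000, 50, 2              # lot size, sample size, defects found

def likelihood(K):
    """P(k defects in the sample) if the lot holds K defects in total."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# K must cover the defects seen and leave room for the good parts seen.
candidates = range(k, N - (n - k) + 1)
best_K = max(candidates, key=likelihood)
print(f"most likely defect count: {best_K}")
print(f"naive scale-up k/n * N:   {k / n * N:.0f}")
```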

Card game probabilities: poker, bridge, and Texas hold'em all rely on hypergeometric calculations. P(flush in 5-card poker) = 4 × C(13, 5) / C(52, 5) = 5,148 / 2,598,960 ≈ 0.00198, about 1 in 505 hands (this count includes straight flushes). Pre-flop and post-flop probability tables in poker strategy books are precomputed hypergeometric PMFs.

Mean and variance of the hypergeometric

E(X) = nK/N is the expected number of successes — the sample size times the population proportion of successes. This is the same as the binomial mean np if p = K/N. The variance differs by the finite-population correction (FPC):

Var(X) = n × (K/N) × (1 − K/N) × (N − n)/(N − 1).

The first three factors are exactly the binomial variance np(1 − p). The fourth factor, (N − n)/(N − 1), shrinks variance whenever n is a meaningful fraction of N. Concretely, if you sample 60% of the population (n = 0.6N), the FPC is about 0.4, so the variance is about 60% lower than the with-replacement equivalent. At the extreme n = N you sample everyone, the variance is zero, and the count equals K with certainty.
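The effect of the sampling fraction on the correction factor is easy to tabulate. The sketch below uses N = 1000 as an arbitrary illustrative population size.

```python
def fpc(N: int, n: int) -> float:
    """Finite-population correction (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

N = 1000
for fraction in (0.01, 0.05, 0.20, 0.60, 1.00):
    n = round(fraction * N)
    print(f"n = {n:>4} ({fraction:.0%} of N): FPC = {fpc(N, n):.4f}")
```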

Common hypergeometric mistakes

Tip

Always check that k is achievable. The valid range is max(0, n − (N − K)) ≤ k ≤ min(n, K). For example, P(X = 6) with n = 5 describes an impossible outcome, because you cannot have more successes than draws. The calculator returns 0 outside the valid range.
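If you compute the PMF yourself, an explicit range check is worth adding. In Python, `math.comb` raises `ValueError` for a negative argument, which is what n − k becomes when k exceeds n. The sketch below is one possible guard, not the calculator's own code, and `safe_pmf` is a hypothetical name.

```python
from math import comb

def safe_pmf(N: int, K: int, n: int, k: int) -> float:
    """Hypergeometric PMF that returns 0 for unachievable k."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0                 # outside the valid range
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

print(safe_pmf(52, 13, 5, 6))      # more successes than draws
print(safe_pmf(10, 7, 5, 1))       # fewer than the 2 forced successes
print(safe_pmf(52, 13, 5, 2))
```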

Don't use hypergeometric for with-replacement sampling

If you sample with replacement (each draw goes back before the next), the trials are independent and the distribution is binomial, not hypergeometric. Conversely, if you sample without replacement but treat it as binomial, you slightly overstate variance — especially when n is a large fraction of N. Match the distribution to the actual sampling protocol.

The most common error is using binomial when you should use hypergeometric. For small populations, the two give meaningfully different answers. Inspecting 10 widgets from a lot of 50 with 5 defects: hypergeometric P(X = 1) = 0.4313, binomial P(X = 1) = 0.3874. The gap of roughly 4.4 percentage points matters for acceptance-sampling decisions.
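The widget comparison can be reproduced directly. This is a minimal sketch in which the binomial uses p = K/N = 0.1:

```python
from math import comb

N, K, n, k = 50, 5, 10, 1          # lot, defects in lot, sample, defects seen

hyper = comb(K, k) * comb(N - K, n - k) / comb(N, n)
p = K / N
binom = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"hypergeometric: {hyper:.4f}")
print(f"binomial:       {binom:.4f}")
print(f"gap:            {hyper - binom:.4f}")
```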

The second mistake is confusing K (successes in population) with k (successes in sample). They are different quantities. K is a parameter of the distribution; k is the random variable's value. The notation N, K, n, k is standard but easy to mix up — define each one explicitly before plugging into the formula.

The third error is using approximations when exact calculation is feasible. Modern computers can evaluate hypergeometric probabilities for N up to 10⁶ or more using log-gamma functions to avoid overflow. There is rarely a reason to settle for the binomial approximation when the exact calculation is one function call away.
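A sketch of the log-gamma approach, assuming Python's standard `math.lgamma`. Python integers do not overflow, so here the technique mainly saves time on very large inputs; in languages with fixed-width numbers it is what prevents overflow. The function names and the large-N inputs are illustrative.

```python
from math import exp, lgamma

def log_comb(a: int, b: int) -> float:
    """Natural log of C(a, b), computed without forming huge integers."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def hypergeom_pmf_log(N: int, K: int, n: int, k: int) -> float:
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)

print(hypergeom_pmf_log(52, 13, 5, 2))
print(hypergeom_pmf_log(1_000_000, 20_000, 5_000, 100))
```

Working in logs costs a little floating-point accuracy, so results agree with the exact integer calculation to many digits but not to the last one.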

FAQ

What is the hypergeometric distribution?
The hypergeometric distribution describes the probability of obtaining exactly k successes in n draws without replacement from a population of size N containing K successes. Each draw changes the composition of the remaining population, so the outcomes are not independent. That distinguishes it from the binomial distribution, which models draws with replacement.

What is the hypergeometric formula?
P(X = k) = C(K, k) × C(N−K, n−k) / C(N, n), where C(a, b) is the binomial coefficient (a choose b). The numerator counts ways to pick k successes from K and n−k failures from N−K; the denominator counts ways to pick any sample of size n from N. The ratio gives the probability.

What is an example of a hypergeometric calculation?
Draw 5 cards from a standard 52-card deck. What is the probability of getting exactly 2 hearts? Here N = 52, K = 13 (hearts), n = 5, k = 2. P(X = 2) = C(13,2) × C(39,3) / C(52,5) = 78 × 9,139 / 2,598,960 ≈ 0.2743 or 27.43%.

What is the difference between the binomial and hypergeometric distributions?
Binomial: draws are independent (sampling with replacement, or infinite population). Each trial has the same probability of success. Hypergeometric: draws are without replacement from a finite population. Each draw changes the composition. When n is small relative to N (rule of thumb: n ≤ 5% of N), the two distributions give nearly identical results.

What is the mean of the hypergeometric distribution?
E(X) = nK/N. Intuitively, the population proportion of successes (K/N) times the sample size gives the expected number of successes. For 5 cards from a 52-card deck with 13 hearts, E(X) = 5 × 13/52 = 1.25 hearts on average.

What is the variance of the hypergeometric distribution?
Var(X) = n × (K/N) × (1 − K/N) × (N − n)/(N − 1). The last factor is the finite-population correction, which is always ≤ 1. It reduces variance compared with the binomial Var = np(1−p), reflecting that without-replacement sampling has lower variability as n approaches N.

How do I calculate cumulative probabilities?
Sum P(X = i) over the relevant range. For P(X ≤ k) sum from i = max(0, n − (N − K)) up to k. For P(X ≥ k) sum from k up to min(n, K). For example, P(at least 1 ace in a 5-card hand) = 1 − P(0 aces) = 1 − C(48,5)/C(52,5) = 1 − 0.6588 = 0.3412.

When should I use the hypergeometric distribution?
Use hypergeometric whenever you sample without replacement from a finite population and care about exact probabilities. Use binomial when the population is effectively infinite (large N relative to n) or when sampling with replacement. Rule of thumb: if n/N ≤ 0.05, the binomial approximation is within about 1–2% of the exact hypergeometric probability.