Allele Frequency Calculator (Hardy-Weinberg)

Compute allele frequencies p and q from observed counts of AA, Aa, and aa genotypes.

Instructions — Allele Frequency Calculator (Hardy-Weinberg)

  1. Count individuals in your sample by genotype: AA (homozygous dominant), Aa (heterozygous), aa (homozygous recessive).
  2. Enter each count. The total N = AA + Aa + aa.
  3. Read the allele frequencies p (allele A) and q (allele a). The calculator also computes the expected genotype frequencies under Hardy-Weinberg and a chi-square goodness-of-fit test.

Each diploid individual has two alleles at the locus. So AA contributes two A alleles, Aa contributes one of each, and aa contributes two a alleles. The total allele count is 2N.

Formulas

From genotype counts

p = (2 × n_AA + n_Aa) / (2N)
q = (2 × n_aa + n_Aa) / (2N)
p + q = 1
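As a minimal sketch of the two counting formulas above (the function name `allele_frequencies` is illustrative and not part of the calculator), genotype counts can be turned into p and q like this:

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Return (p, q) for a two-allele locus from diploid genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    total_alleles = 2 * n                      # each diploid carries two alleles
    p = (2 * n_AA + n_Aa) / total_alleles      # A alleles: two per AA, one per Aa
    q = (2 * n_aa + n_Aa) / total_alleles      # a alleles: two per aa, one per Aa
    return p, q


# Example counts: 80 AA, 96 Aa, 24 aa (N = 200)
p, q = allele_frequencies(80, 96, 24)
print(f"p = {p:.2f}, q = {q:.2f}, p + q = {p + q:.2f}")
```

Computing q from its own counts rather than as 1 − p gives a cheap consistency check: if the two do not sum to 1, a genotype was miscounted.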

Hardy-Weinberg expected genotypes

P(AA) = p²   P(Aa) = 2pq   P(aa) = q²
p² + 2pq + q² = 1

Chi-square test for HWE

χ² = Σ (O − E)² / E

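Compare against the χ² critical value at α = 0.05 with df = 1, which is 3.841. There is one degree of freedom because three genotype classes lose one degree for the fixed total N and one for estimating p from the data (3 − 1 − 1 = 1). If your χ² is below 3.841, the population does not deviate significantly from HWE.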
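The comparison with the critical value can also be expressed as a p-value. The sketch below assumes one degree of freedom and uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for df = 1. The names `chi_square_p_value_df1` and `deviates_from_hwe` are hypothetical helpers, not functions of the calculator.

```python
import math

CRITICAL_DF1_ALPHA_05 = 3.841  # critical value quoted in the text (df = 1, alpha = 0.05)


def chi_square_p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if chi2 < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For df = 1 the statistic is a squared standard normal, so the tail
    # probability reduces to the complementary error function.
    return math.erfc(math.sqrt(chi2 / 2.0))


def deviates_from_hwe(chi2: float, critical: float = CRITICAL_DF1_ALPHA_05) -> bool:
    """True when the statistic exceeds the critical value (reject HWE)."""
    return chi2 > critical


print(f"p-value at the critical value: {chi_square_p_value_df1(3.841):.3f}")
```

A statistic right at 3.841 gives a p-value of about 0.05, which is what the threshold encodes.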

Five assumptions of Hardy-Weinberg

  • No mutation
  • Random mating (no assortative mating, no inbreeding)
  • No natural selection at the locus
  • No migration in or out
  • Infinite (or very large) population — no genetic drift

Reference

Worked example: PTC tasting

PTC (phenylthiocarbamide) tasting is controlled by a dominant T allele. In a sample of 200 students: 80 are TT, 96 are Tt, and 24 are tt. Compute:

  • p (T) = (2 × 80 + 96) / 400 = 256 / 400 = 0.64
  • q (t) = (2 × 24 + 96) / 400 = 144 / 400 = 0.36
  • p + q = 1.00
  • Expected TT = 0.64² × 200 = 81.9; observed = 80
  • Expected Tt = 2 × 0.64 × 0.36 × 200 = 92.2; observed = 96
  • Expected tt = 0.36² × 200 = 25.9; observed = 24
  • χ² = 0.045 + 0.160 + 0.142 = 0.347 (well under 3.841, so no significant deviation from HWE)
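The arithmetic in this example can be checked with a short script. This is a sketch rather than the calculator's own code: `hwe_chi_square` is a hypothetical helper, and it keeps expected counts unrounded, so the statistic (about 0.35) may differ slightly in the later decimals from a hand calculation that rounds each expected count first.

```python
def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int):
    """Return (p, q, expected_counts, chi2) for observed genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("both alleles must be present to test HWE")
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)   # p^2 N, 2pq N, q^2 N
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


# PTC sample from the text: 80 TT, 96 Tt, 24 tt
p, q, expected, chi2 = hwe_chi_square(80, 96, 24)
print(f"p = {p:.2f}, q = {q:.2f}")
print("expected counts:", [round(e, 2) for e in expected])
print(f"chi-square = {chi2:.3f}")
```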

Common allele frequencies in human populations

Trait / Allele           | Allele frequency       | Notes
Cystic fibrosis ΔF508    | 0.02 (Europeans)       | Carrier rate ~1 in 25
Sickle cell HbS          | 0.05–0.10 (W. Africa)  | Heterozygote malaria advantage
Tay-Sachs                | 0.017 (Ashkenazi)      | Carrier rate ~1 in 30
PKU                      | 0.01 (Europeans)       | Newborn screening universal
Lactase persistence      | 0.7–0.9 (N. Europe)    | Dominant allele; strong recent selection

Carrier frequency from q

For a rare recessive disease with affected frequency q² (e.g., 1 in 10,000), the carrier frequency is 2pq ≈ 2q (since p ≈ 1). So q² = 1/10,000 → q = 0.01 → carrier frequency ≈ 0.02 or 1 in 50.

Article — Allele Frequency Calculator (Hardy-Weinberg)

Allele frequency calculator — Hardy-Weinberg population genetics

Allele frequency is the proportion of a specific allele in the gene pool. For a two-allele locus, p is the frequency of the dominant allele and q is the frequency of the recessive allele, with p + q = 1. From genotype counts, p = (2 × n_AA + n_Aa) / 2N, where N is the total individual count. Under Hardy-Weinberg equilibrium, genotype frequencies follow p² + 2pq + q² = 1.

The Hardy-Weinberg principle is the null model of population genetics. It predicts genotype frequencies in a population where mutation, selection, migration, drift, and non-random mating are all absent. A real population that deviates significantly from Hardy-Weinberg expectations is usually affected by at least one of those five forces (or by sampling or genotyping error), which makes Hardy-Weinberg the standard baseline in the field.

What is allele frequency?

An allele is one of two or more alternative forms of a gene at a single locus. Allele frequency is the fraction of all copies of that gene in a population that are a particular allele. If 60 percent of all gene copies at a locus are allele A and 40 percent are allele a, then p = 0.60 and q = 0.40.

Counting allele frequency requires counting alleles, not individuals. A diploid organism has two copies at each locus, so a population of 100 individuals has 200 alleles at any given locus. A homozygous AA individual contributes two A alleles; a heterozygous Aa contributes one of each; a homozygous aa contributes two a alleles.

Did you know

The Hardy-Weinberg principle was published in 1908 by two scientists working independently: G.H. Hardy (a Cambridge mathematician who thought the result was "trivial") and Wilhelm Weinberg (a German physician who saw it in clinical contexts). The relationship derived in their separate papers became the founding equation of population genetics.

The allele frequency formula

From observed genotype counts, the formulas for p and q are short. Each homozygous AA individual contributes 2 A alleles; each heterozygote contributes 1 A and 1 a; each homozygous aa contributes 2 a alleles. Sum up and divide by the total allele count (2N).

Allele frequency math
p = (2·n_AA + n_Aa) / 2N
q = (2·n_aa + n_Aa) / 2N
p + q = 1
Expected AA = p² × N
Expected Aa = 2pq × N
Expected aa = q² × N

For a sample of 200 individuals with 80 AA, 96 Aa, 24 aa: p = (160 + 96) / 400 = 0.64. q = 1 − p = 0.36. Cross-check by the recessive formula: q = (48 + 96) / 400 = 0.36.

Hardy-Weinberg expected genotype frequencies

Once you know p and q, the Hardy-Weinberg expectation gives the genotype frequencies that the population should show under five conditions: no mutation, random mating, no selection at the locus, no migration, and infinite (or very large) population size. Expected frequencies are p² for AA, 2pq for Aa, and q² for aa.

The 2 in the heterozygote term comes from two pathways. An Aa individual can be made by an A egg meeting an a sperm, or an a egg meeting an A sperm. Both are equally likely under random mating, so heterozygote frequency is 2 × p × q rather than p × q.
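The two-pathway argument can be made concrete by enumerating every egg and sperm pairing under random mating. This is an illustrative sketch; `genotype_frequencies` is a hypothetical name, and the value p = 0.64 is only an example.

```python
from itertools import product


def genotype_frequencies(p: float) -> dict[str, float]:
    """Sum the probability of each egg x sperm pairing into genotype classes."""
    q = 1.0 - p
    gametes = {"A": p, "a": q}
    freqs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for (egg, p_egg), (sperm, p_sperm) in product(gametes.items(), repeat=2):
        # Sorting puts 'A' before 'a', so both "Aa" and "aA" land in the "Aa" class.
        genotype = "".join(sorted(egg + sperm))
        freqs[genotype] += p_egg * p_sperm
    return freqs


freqs = genotype_frequencies(0.64)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.4f}")
```

The heterozygote class receives two of the four pairings (A egg with a sperm, a egg with A sperm), each with probability p × q, which is where the factor of 2 comes from.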

Chi-square test for allele frequency equilibrium

Real populations rarely match Hardy-Weinberg expectations exactly. The chi-square test asks whether the deviation between observed and expected counts is large enough to reject the null. The formula sums (observed − expected)² ÷ expected across all three genotypes. The result is compared against the critical value for one degree of freedom at the chosen significance level (3.841 at α = 0.05).

Why not use percent for p and q

Allele frequencies are expressed as decimal fractions between 0 and 1, not percentages. A frequency of 0.60 means 60 percent of alleles are the dominant form, but writing it as "60 %" in the formula breaks the math: p × q = 60 × 40 = 2,400 instead of 0.24.

From allele frequency to carrier rate

For a rare recessive disease, the affected (homozygous recessive) frequency equals q². The carrier (heterozygous) frequency is 2pq. When q is small, p is close to 1, so the carrier frequency simplifies to roughly 2q. This is the standard back-calculation in genetic counseling.

  • Cystic fibrosis: affected rate 1 in 2,500 (Europeans). q = √(1/2500) = 0.02. Carrier rate ≈ 2q = 0.04 (1 in 25).
  • Sickle cell disease: affected rate 1 in 400 (US African-Americans). q = 0.05. Carrier rate ≈ 0.10 (1 in 10).
  • Tay-Sachs: affected rate 1 in 3,500 (Ashkenazi Jewish). q = 0.017. Carrier rate ≈ 0.034 (1 in 30).
  • Phenylketonuria (PKU): affected rate 1 in 10,000. q = 0.01. Carrier rate ≈ 0.02 (1 in 50).
  • Albinism: affected rate 1 in 17,000 worldwide. q = 0.0077. Carrier rate ≈ 1 in 65.
  • Galactosemia: affected rate 1 in 60,000. q = 0.0041. Carrier rate ≈ 1 in 122.
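The back-calculation in the list above can be sketched in a few lines. The affected rates are the ones quoted in the list; `carrier_rates` is a hypothetical helper, and the calculation assumes Hardy-Weinberg proportions hold for each condition.

```python
import math

# Affected rates (1 in N) as quoted in the list above.
AFFECTED_ONE_IN = {
    "Cystic fibrosis": 2_500,
    "Sickle cell disease": 400,
    "Tay-Sachs": 3_500,
    "PKU": 10_000,
    "Albinism": 17_000,
    "Galactosemia": 60_000,
}


def carrier_rates(affected_one_in: float) -> tuple[float, float, float]:
    """Return (q, exact carrier frequency 2pq, approximate carrier frequency 2q)."""
    q = math.sqrt(1.0 / affected_one_in)   # affected frequency is q^2
    p = 1.0 - q
    return q, 2.0 * p * q, 2.0 * q


for name, one_in in AFFECTED_ONE_IN.items():
    q, exact, approx = carrier_rates(one_in)
    print(f"{name:20s} q = {q:.4f}  carriers: 1 in {1 / exact:.0f} (2pq), "
          f"1 in {1 / approx:.0f} (2q)")
```

The 2q shortcut overstates the exact 2pq value by a relative margin of q/p, so it is tightest for the rarest alleles and loosest for the commonest one in the list (sickle cell, q = 0.05).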

Allele frequencies in human populations

Some alleles vary dramatically by population. The lactase persistence allele LCT-13910T reaches a frequency of 0.7–0.9 in Northern Europeans, 0.2–0.4 in Middle Eastern populations, and below 0.05 in East Asians. Because persistence is a dominant trait, this is the frequency of the dominant allele (p in the notation used here), not q. The pattern is one of the strongest signatures of recent natural selection in the human genome, attributed to roughly 7,000 years of dairy farming.

Tip

When comparing populations, a principal component analysis (PCA) of individual genotypes across many variants, plotted on the first two components, is a common summary. Individuals tend to cluster by geographic ancestry, and the allele frequency differences between clusters track migration history. The 1000 Genomes Project provides allele frequency data from more than 2,500 individuals in 26 populations and is a widely used reference panel.

What changes allele frequency over time

Four evolutionary forces shift allele frequencies, and a fifth process changes how alleles are combined into genotypes. Mutation introduces new alleles at low rates (~10⁻⁸ per base per generation in humans). Natural selection raises beneficial alleles and lowers harmful ones, sometimes by 10–50 percent per generation under strong pressure. Migration (gene flow) pulls the frequencies of connected populations toward each other. Genetic drift causes random fluctuations, with stronger effects in small populations. Non-random mating (assortative mating or inbreeding) shifts genotype frequencies without changing allele frequencies.
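The claim that drift is stronger in small populations can be explored with a simple simulation. The sketch below uses the Wright-Fisher model, a standard textbook idealization that the text does not name: each generation, 2N allele copies are drawn at random from the previous generation's frequency. The function name, population sizes, and seed are illustrative assumptions.

```python
import random


def wright_fisher(p0: float, n_individuals: int, generations: int, seed: int = 0) -> list[float]:
    """Simulate drift alone: return the frequency of allele A in each generation."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    n_alleles = 2 * n_individuals      # diploid population
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each allele copy in the next generation is A with probability p.
        count_A = sum(rng.random() < p for _ in range(n_alleles))
        p = count_A / n_alleles
        trajectory.append(p)
    return trajectory


small = wright_fisher(p0=0.5, n_individuals=20, generations=50, seed=1)
large = wright_fisher(p0=0.5, n_individuals=2000, generations=50, seed=1)
print(f"N = 20:   final p = {small[-1]:.3f}")
print(f"N = 2000: final p = {large[-1]:.3f}")
```

With no mutation, selection, or migration in the model, any change in p here is drift alone. Repeating the run with different seeds typically shows the small population wandering much further from 0.5 than the large one.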

Common allele frequency mistakes

Three errors repeat in introductory genetics. First, confusing allele frequency with genotype frequency: p is per allele, not per individual. Second, assuming the dominant allele must be the more common one. Dominance describes phenotype, not frequency: many dominant alleles are rare (Huntington disease, polydactyly), and some recessive alleles are common (O is the most frequent ABO blood group allele in most populations). Third, applying Hardy-Weinberg to admixed populations: if your sample mixes two ancestry groups with different allele frequencies, the combined sample shows a deficit of heterozygotes and an excess of homozygotes (the Wahlund effect) even when each subgroup is in HWE separately. A practical fix is to stratify by self-reported ancestry before running the test, then combine the per-stratum results, for example with a Mantel-Haenszel approach. Small sample sizes also bite: when N is below about 50, or any expected genotype count falls below 5, the chi-square approximation becomes unreliable, and an exact test (Guo and Thompson 1992) is preferred.
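The Wahlund effect is easy to reproduce numerically. In this sketch, two hypothetical subpopulations of 500 individuals each sit exactly at Hardy-Weinberg proportions with p = 0.9 and p = 0.1. The numbers and the name `wahlund_demo` are invented for illustration.

```python
def hwe_counts(p: float, n: int) -> tuple[float, float, float]:
    """Genotype counts (AA, Aa, aa) for a population exactly in HWE."""
    q = 1.0 - p
    return (p * p * n, 2 * p * q * n, q * q * n)


def wahlund_demo(p1: float, p2: float, n_each: int) -> tuple[float, float]:
    """Pool two HWE subpopulations; return (observed, expected) heterozygote counts."""
    pooled = [a + b for a, b in zip(hwe_counts(p1, n_each), hwe_counts(p2, n_each))]
    n = sum(pooled)
    p_pooled = (2 * pooled[0] + pooled[1]) / (2 * n)
    expected_het = 2 * p_pooled * (1.0 - p_pooled) * n   # HWE expectation for the pool
    return pooled[1], expected_het


observed_het, expected_het = wahlund_demo(0.9, 0.1, 500)
print(f"observed heterozygotes: {observed_het:.0f}")
print(f"expected under HWE:     {expected_het:.0f}")
```

Each subgroup has exactly the heterozygotes HWE predicts (90 of 500), yet the pooled sample has 180 where the pooled allele frequency of 0.5 predicts 500. A chi-square test on the pooled counts would reject HWE even though neither subgroup deviates.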

FAQ

Allele frequency is the proportion of one specific allele in the gene pool. It is counted per allele, so there are 2N alleles in N diploid individuals. Genotype frequency is the proportion of individuals carrying a particular genotype combination (e.g., Aa). They are linked under Hardy-Weinberg by the equation p² + 2pq + q² = 1.
For two alleles A and a: p = (2 × homozygous-dominant count + heterozygous count) / (2 × total individuals). Each diploid contributes two alleles, so heterozygotes split between p and q. The recessive frequency q follows from q = 1 − p.
The chi-square HWE test checks whether your observed genotype counts match the expectations under HWE (p², 2pq, q²). A χ² value above the critical 3.841 (df=1, α=0.05) signals that at least one of the five HWE assumptions is violated (typically selection, non-random mating, or population structure) or that genotypes were miscalled.
The sum p + q can never exceed 1. For a two-allele locus, p + q = 1 by definition. For three alleles, p + q + r = 1. Any sum greater than 1 means a counting error, usually double-counting heterozygotes.
If the affected (homozygous recessive) frequency is q² = X, then q = √X. The carrier (heterozygous) frequency is 2pq. For a 1-in-10,000 disease: q = 0.01, carrier rate ≈ 2 × 0.99 × 0.01 = 0.0198, or about 1 in 50.
Hardy-Weinberg matters because it predicts genotype frequencies in a non-evolving population. Any deviation points to an evolutionary force: selection, mutation, migration, drift, or non-random mating. HWE is the baseline you compare real populations against, the same way ecology uses null distributions.
Hardy-Weinberg does not apply directly to X-linked loci. Allele frequencies can differ between sexes, and HWE expectations adjust accordingly. Females (XX) follow p² + 2pq + q² for the X-linked alleles, while males (XY) carry only one allele, so the affected frequency among males equals q rather than q².
Inbreeding (or any deviation from random mating) increases homozygosity. Allele frequencies (p, q) stay the same, but observed AA and aa counts rise above p² and q², while Aa drops below 2pq. The inbreeding coefficient F quantifies the gap: f(Aa) = 2pq(1 − F).
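The relationship f(Aa) = 2pq(1 − F) can be used in both directions. The sketch below is an illustration under stated assumptions: the homozygote terms p² + Fpq and q² + Fpq are the standard companions of the heterozygote formula (they are not written out in the text), and F is estimated as 1 − H_obs / H_exp. Both function names are hypothetical.

```python
def genotype_freqs_with_inbreeding(p: float, F: float) -> tuple[float, float, float]:
    """Genotype frequencies (AA, Aa, aa) given allele frequency p and inbreeding coefficient F."""
    q = 1.0 - p
    return (p * p + F * p * q, 2 * p * q * (1.0 - F), q * q + F * p * q)


def estimate_F(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Estimate F from observed counts as 1 - observed / expected heterozygosity."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    h_exp = 2 * p * (1.0 - p)
    if h_exp == 0:
        raise ValueError("F is undefined when only one allele is present")
    return 1.0 - (n_Aa / n) / h_exp


# Hypothetical sample with a strong heterozygote deficit.
print(f"F = {estimate_F(410, 180, 410):.2f}")
# The sample with 80 AA, 96 Aa, 24 aa has a slight heterozygote excess, so F is slightly negative.
print(f"F = {estimate_F(80, 96, 24):.3f}")
```

F near 0 is consistent with random mating, positive F indicates fewer heterozygotes than expected, and negative F indicates more.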