Z-Test Calculator

Run a one-sample z-test.

One-sample z-test

z = (x̄ − μ₀) / (σ / √n)

Instructions — Z-Test Calculator

  1. Enter the sample mean (x̄) — the average value observed in your sample.
  2. Enter the hypothesised population mean (μ₀) — the value you are testing against.
  3. Enter the population standard deviation (σ) — required for a z-test. If σ is unknown, use a t-test instead.
  4. Enter the sample size (n). The z-test assumes n ≥ 30 unless the data are normally distributed.
  5. Pick a tail: two-tailed (H₁: μ ≠ μ₀), right-tailed (H₁: μ > μ₀), or left-tailed (H₁: μ < μ₀).
  6. Read the z-statistic, p-value, 95% confidence interval, and decision at three significance levels.

Formulas

Test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Standard error of the mean:

$$SE = \frac{\sigma}{\sqrt{n}}$$

Two-tailed p-value:

$$p = 2 \cdot P(Z > |z|) = 2 \cdot \left[1 - \Phi(|z|)\right]$$

95% confidence interval:

$$\bar{x} \pm 1.96 \cdot SE$$

Two-sided critical values for the standard normal: ±1.6449 (α = 0.10), ±1.9600 (α = 0.05), ±2.5758 (α = 0.01).
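The formulas above translate directly into code. Below is a minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`, which supplies Φ (`cdf`) and its inverse (`inv_cdf`). The function name `z_test` and its dictionary return value are illustrative choices, not the calculator's own implementation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test(x_bar: float, mu0: float, sigma: float, n: int) -> dict:
    """One-sample z-test (sigma assumed known): SE, z, two-tailed p and 95% CI."""
    se = sigma / sqrt(n)                        # standard error of the mean
    z = (x_bar - mu0) / se                      # test statistic
    p_two = 2 * (1 - STD_NORMAL.cdf(abs(z)))    # p = 2 * [1 - Phi(|z|)]
    z_crit = STD_NORMAL.inv_cdf(0.975)          # about 1.95996 (1.96 in the text)
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    return {"se": se, "z": z, "p": p_two, "ci95": ci}


result = z_test(x_bar=102, mu0=100, sigma=15, n=36)
print(f"SE = {result['se']:.2f}, z = {result['z']:.2f}, p = {result['p']:.4f}")
print(f"95% CI = [{result['ci95'][0]:.1f}, {result['ci95'][1]:.1f}]")
```

With the page's example (x̄ = 102, μ₀ = 100, σ = 15, n = 36) it prints SE = 2.50, z = 0.80, p = 0.4237 and a 95% CI of [97.1, 106.9]. Using the exact quantile instead of 1.96 does not change the interval at one decimal place.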

Reference

  • Assumptions: population σ known, observations independent, data approximately normal (or n ≥ 30 by CLT).
  • One-sample z-test compares sample mean to a hypothesised population mean.
  • Two-tailed critical z values: ±1.6449 (α=0.10), ±1.9600 (α=0.05), ±2.5758 (α=0.01).
  • One-tailed critical z values: 1.2816 (α=0.10), 1.6449 (α=0.05), 2.3263 (α=0.01).
  • Cohen's effect size d = (x̄ − μ₀)/σ: small d ≈ 0.2, medium d ≈ 0.5, large d ≈ 0.8.
  • Use t-test instead when σ is unknown or n < 30 with non-normal data.

Article — Z-Test Calculator

Z-test calculator

A z-test compares a sample mean to a hypothesised population mean μ₀ when the population standard deviation σ is known, using the standard normal distribution. The test statistic is z = (x̄ − μ₀) / (σ/√n). With a sample mean of 102, hypothesised mean 100, σ = 15, and n = 36, the z-statistic is 0.80, giving a two-tailed p-value of 0.4237: not significant at any conventional α level.

The z-test belongs to the family of parametric hypothesis tests. It assumes you know the population standard deviation in advance, which is rare in practice but common in textbook problems, quality control with established process variation, and large-sample survey work where σ is well-estimated. When σ is unknown, the t-test takes over — and for n ≥ 30 the two tests give nearly identical results.

What is a z-test?

A one-sample z-test is a statistical procedure for deciding whether a sample mean is consistent with a hypothesised population mean. The null hypothesis H₀ states that the true population mean equals μ₀. The alternative H₁ states that it differs, either in one direction specified in advance (one-tailed) or in either direction (two-tailed). The z-statistic measures how many standard errors the sample mean lies from μ₀.

The standard normal distribution provides the reference. A z of ±1.96 corresponds to the two-tailed 5% significance level; ±2.576 corresponds to 1%. Any value of |z| larger than the critical value at your chosen α leads to rejecting H₀. The equivalent p-value approach computes the probability of seeing |z| or larger under H₀ and rejects when p < α.
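To make the equivalence concrete, here is a short Python sketch (the function names are hypothetical) that applies both two-tailed rules to a few z values and checks that they reach the same decision.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_by_critical_value(z: float, alpha: float) -> bool:
    """Two-tailed rule: reject H0 when |z| exceeds the critical value z_(1 - alpha/2)."""
    return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)


def reject_by_p_value(z: float, alpha: float) -> bool:
    """Two-tailed rule: reject H0 when p = 2 * [1 - Phi(|z|)] is below alpha."""
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p < alpha


# The two rules agree (apart from floating-point ties exactly at the boundary).
for z in (0.80, 1.70, 2.10, -2.70):
    for alpha in (0.10, 0.05, 0.01):
        assert reject_by_critical_value(z, alpha) == reject_by_p_value(z, alpha)

print(reject_by_critical_value(2.10, 0.05), reject_by_critical_value(2.10, 0.01))
```

For z = 2.10 it prints `True False`: 2.10 clears 1.96 but not 2.576.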

Z-test formula and procedure

The full procedure has five steps. Steps one and two are setup, step three is the formula, and steps four and five are interpretation.

Z-test formula and steps

  1. State H₀ and H₁: H₀: μ = μ₀; H₁: μ ≠ μ₀ (or one-tailed).
  2. Choose α: typically 0.05, sometimes 0.01 or 0.10.
  3. Compute z: z = (x̄ − μ₀) / (σ/√n).
  4. Find the p-value: two-tailed p = 2 × [1 − Φ(|z|)]; right-tailed p = 1 − Φ(z); left-tailed p = Φ(z).
  5. Decide: reject H₀ if p < α.
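Steps 3 to 5 can be sketched in Python as follows; steps 1 and 2 become the `tail` and `alpha` arguments. The function and argument names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_decision(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Compute z, the p-value for the chosen tail, and the decision.

    tail: "two" (H1: mu != mu0), "right" (H1: mu > mu0) or "left" (H1: mu < mu0).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))            # step 3: test statistic
    if tail == "two":                                # step 4: p-value
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    elif tail == "right":
        p = 1 - STD_NORMAL.cdf(z)
    elif tail == "left":
        p = STD_NORMAL.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    decision = "reject H0" if p < alpha else "fail to reject H0"   # step 5
    return z, p, decision


print(z_test_decision(102, 100, 15, 36))                 # two-tailed
print(z_test_decision(102, 100, 15, 36, tail="right"))   # right-tailed
```

Note that the one-tailed p-value in the direction of the data (0.2119 here) is exactly half the two-tailed one.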

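Critical z values for two-tailed tests are 1.6449 (α=0.10), 1.9600 (α=0.05), and 2.5758 (α=0.01). For one-tailed tests, all of α sits in a single tail instead of being split in half, so the critical values are smaller: 1.2816 (α=0.10), 1.6449 (α=0.05), 2.3263 (α=0.01). These are the standard normal quantiles you compare against: |z| against the two-tailed value, or, for a one-tailed test, z against the positive value (right-tailed) or its negative (left-tailed).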
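These values are standard normal quantiles. As a sketch (assuming Python 3.8+), they can be regenerated with `statistics.NormalDist().inv_cdf`; `critical_values` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_values(alpha: float) -> tuple:
    """Return (two-tailed, one-tailed) standard normal critical values for alpha."""
    two_tailed = STD_NORMAL.inv_cdf(1 - alpha / 2)   # alpha split across both tails
    one_tailed = STD_NORMAL.inv_cdf(1 - alpha)       # all of alpha in one tail
    return two_tailed, one_tailed


for alpha in (0.10, 0.05, 0.01):
    two, one = critical_values(alpha)
    print(f"alpha={alpha:.2f}: two-tailed ±{two:.4f}, one-tailed {one:.4f}")
```

The output reproduces the six values in the paragraph. It also shows that the one-tailed value at α equals the two-tailed value at 2α (1.6449 appears in both columns). For a left-tailed test, use the negative, e.g. reject at α = 0.05 when z < −1.6449.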

Z-test example, step by step

A factory produces metal rods with target length 100 cm and known historical standard deviation 15 cm. A quality control sample of 36 rods averages 102 cm. Is the production line drifting?

  • H₀: μ = 100 cm (process on target).
  • H₁: μ ≠ 100 cm (process drifted, two-tailed).
  • α: 0.05.
  • SE: 15 / √36 = 2.50 cm.
  • z: (102 − 100) / 2.50 = 0.80.
  • p (two-tailed): 2 × (1 − Φ(0.80)) = 2 × 0.21186 = 0.4237.
  • Decision: 0.4237 > 0.05, so fail to reject H₀.
  • 95% CI: 102 ± 1.96 × 2.50 = [97.1, 106.9] cm. Contains 100, consistent with H₀.
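The last bullet reflects a general duality: for a two-tailed test at α = 0.05, μ₀ lies inside the 95% CI exactly when p ≥ 0.05. The Python sketch below checks this for the rod setup (σ = 15 cm, n = 36, μ₀ = 100 cm). The sample means other than 102 are hypothetical values chosen for illustration, as are the helper names.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_975 = STD_NORMAL.inv_cdf(0.975)   # two-tailed critical value for alpha = 0.05


def ci_contains(x_bar, sigma, n, value):
    """True if value lies inside the 95% CI x_bar ± z * sigma/sqrt(n)."""
    se = sigma / sqrt(n)
    return x_bar - Z_975 * se <= value <= x_bar + Z_975 * se


def two_tailed_p(x_bar, mu0, sigma, n):
    """Two-tailed z-test p-value."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Observed mean 102 plus hypothetical alternatives; sigma = 15 cm, n = 36, mu0 = 100 cm.
for x_bar in (102.0, 104.0, 105.5, 95.0):
    inside = ci_contains(x_bar, 15, 36, 100)
    p = two_tailed_p(x_bar, 100, 15, 36)
    print(f"x_bar={x_bar}: CI contains 100? {inside}; p={p:.4f}")
    assert inside == (p >= 0.05)   # duality (ignoring float ties at the boundary)
```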

The 2 cm observed deviation is plausible random variation given the process variability and sample size. To detect a real 2 cm drift at α = 0.05 with 80% power, you would need n ≈ 441 rods — much larger than the n = 36 sample.
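The power claim can be checked with the normal approximation. In this sketch (the helper name `power_two_tailed` is hypothetical), power is the probability that the test rejects when the true mean is μ₀ + Δ. It includes the tiny chance of rejecting in the wrong tail.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_tailed(delta, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test when the true mean is mu0 + delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma   # mean of the z-statistic under the alternative
    # Reject in the upper tail (z > z_crit) or the lower tail (z < -z_crit).
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


power_36 = power_two_tailed(delta=2, sigma=15, n=36)
power_441 = power_two_tailed(delta=2, sigma=15, n=441)
print(f"power with n = 36:  {power_36:.3f}")
print(f"power with n = 441: {power_441:.3f}")
```

The 36-rod sample has only about 13% power (0.126) to detect a 2 cm drift. That is why the non-significant result says little about whether the line has drifted. At n = 441 the power reaches 0.800.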

Z-test vs t-test

The choice between z-test and t-test hinges on whether σ is known. The z-test uses the known population σ in the denominator. The t-test substitutes the sample standard deviation s and uses the Student t-distribution, which has heavier tails to compensate for the extra uncertainty.

  • Z-test: σ known; large n or normal data.
  • T-test: σ estimated by the sample standard deviation s; any n (with approximately normal data when n is small), more general.

For n ≥ 30, t and z give nearly identical p-values because the t-distribution converges to the normal as n grows. For n < 30 with unknown σ, the t-test is the right choice: running a z-test with s in place of σ in that regime underestimates p-values and inflates the false-positive rate. In practice, statistical software generally treats the t-test as the standard one-sample comparison of means.

Z-test p-value and significance

The p-value is the probability of observing a z-statistic at least as extreme as the calculated one, assuming H₀ is true. Small p-values (p < α) lead to rejecting H₀. The conventional thresholds are α = 0.05 in most fields, α = 0.01 or lower in stricter contexts (medical trials, for example; particle physics goes much further, requiring a 5σ result, a one-sided p of about 3 × 10⁻⁷), and α = 0.10 in exploratory or social-science work.

Did you know

The 5% significance threshold has no theoretical basis. Ronald Fisher chose it as a convenient round number in 1925's Statistical Methods for Research Workers, writing that "the value for which P = 0.05... is convenient to take this point as a limit in judging whether a deviation is to be considered significant." Almost a century later, replication crises in psychology and biomedicine have prompted calls to lower the default to 0.005 or to abandon fixed thresholds entirely.

Z-test effect size and power

Statistical significance and practical significance are not the same thing. With a sufficiently large sample, even tiny differences become statistically significant — but the effect may be too small to care about. Always report effect size alongside the p-value.

Cohen's d for a one-sample test is (x̄ − μ₀)/σ. Conventional benchmarks are d = 0.2 (small), 0.5 (medium), 0.8 (large). Sample size for 80% power at α = 0.05 two-tailed is approximately n = (1.96 + 0.84)² / d² ≈ 7.84 / d², rounded up to the next whole observation. So d = 0.5 needs n ≈ 32; d = 0.2 needs n ≈ 196; d = 0.8 needs n ≈ 13.
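A minimal Python sketch of these formulas follows. It uses the same normal approximation but with exact quantiles (0.8416 rather than 0.84 for 80% power) and always rounds n up. `cohens_d` and `n_for_80_power` are hypothetical helper names. Because of the exact quantile and the rounding, its answers can exceed hand-rounded figures by one observation.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cohens_d(x_bar, mu0, sigma):
    """One-sample Cohen's d: the mean difference in units of the population SD."""
    return (x_bar - mu0) / sigma


def n_for_80_power(d, alpha=0.05):
    """n = ((z_(1-alpha/2) + z_0.80) / d)^2, rounded up (two-tailed, normal approximation)."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(0.80)
    return ceil((z_sum / abs(d)) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = {n_for_80_power(d)}")
print(f"rod example: d = {cohens_d(102, 100, 15):.3f}")
```

The rod example's d of 0.133 is well below even the "small" benchmark. This is consistent with the large sample the article says is needed to detect it.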

Significant ≠ important

A clinical trial with n = 50,000 may flag a 0.1 mm drug-induced height change as "highly significant" (p < 0.001) even though no one cares about a 0.1 mm effect. Conversely, a small pilot study (n = 10) may miss a 30% effect (p = 0.20) and report no significant difference even though the effect is real and important. Always interpret p-values in light of effect size and sample size.

Common z-test mistakes

Tip

Decide one-tailed vs two-tailed before looking at the data. Switching to one-tailed after seeing the direction of your sample is a form of p-hacking that doubles your false-positive rate. The choice should follow from the hypothesis, not the result.
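A quick Monte Carlo check shows the doubling. Under H₀ the z-statistic is standard normal. Pointing a one-tailed test at whichever side the data fell on rejects whenever |z| > 1.645, twice as often as the two-tailed rule. This is a sketch: the seed and trial count are arbitrary choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
Z_TWO = STD_NORMAL.inv_cdf(1 - ALPHA / 2)   # about 1.96
Z_ONE = STD_NORMAL.inv_cdf(1 - ALPHA)       # about 1.645

rng = random.Random(42)   # fixed seed so the run is reproducible
trials = 200_000
honest = post_hoc = 0
for _ in range(trials):
    z = rng.gauss(0.0, 1.0)   # under H0 the z-statistic is standard normal
    if abs(z) > Z_TWO:        # two-tailed test chosen in advance
        honest += 1
    if abs(z) > Z_ONE:        # one-tailed test aimed at whichever side z landed on
        post_hoc += 1

print(f"two-tailed false-positive rate:          {honest / trials:.3f}")
print(f"post-hoc one-tailed false-positive rate: {post_hoc / trials:.3f}")
```

The two rates come out close to 0.05 and 0.10 respectively.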

The most common mistake is treating the sample standard deviation s as if it were the known population σ. Plugging s in place of σ and running a z-test gives anti-conservative (too small) p-values. The error is minor for large samples but substantial for small ones. The correct move is the t-test. Most textbook problems pretend σ is known to keep things simple, but real data rarely cooperate.
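A small simulation makes the size of the error visible. This sketch assumes normal data with n = 5 and H₀ true; the values μ₀ = 100, σ = 15 and the seed are arbitrary. The "z" computed with s actually follows a t distribution with 4 degrees of freedom, for which P(|t| > 1.96) is about 0.12. The simulated false-positive rate should therefore land near 12% rather than the nominal 5%.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)   # two-tailed z critical value, alpha = 0.05

rng = random.Random(1)                 # fixed seed for reproducibility
n, trials, mu0, sigma = 5, 50_000, 100.0, 15.0
false_positives = 0
for _ in range(trials):
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]       # H0 true: data centred on mu0
    m = sum(sample) / n
    s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))    # sample SD, not the population sigma
    z = (m - mu0) / (s / sqrt(n))      # really a t statistic with n - 1 degrees of freedom
    if abs(z) > Z_CRIT:
        false_positives += 1

rate = false_positives / trials
print(f"false-positive rate using s with n = {n}: {rate:.3f} (nominal 0.05)")
```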

The second common mistake is multiple testing without correction. Running 20 z-tests on the same data at α = 0.05 produces one false positive on average (20 × 0.05 = 1) even if every H₀ is true. The Bonferroni correction (divide α by the number of tests) is conservative but easy; the Benjamini–Hochberg false-discovery-rate procedure is more powerful for large numbers of tests.
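Both corrections are short enough to sketch in Python. The p-values below are hypothetical, and the function names are illustrative. The Benjamini–Hochberg step compares the k-th smallest of m p-values against k·q/m and rejects every hypothesis up to the largest k that passes.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i < alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:   # k-th smallest p against k * q / m
            k_max = rank                  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:k_max]:               # reject the k_max smallest p-values
        rejected[i] = True
    return rejected


# Ten hypothetical p-values from ten z-tests.
p_values = [0.039, 0.001, 0.216, 0.008, 0.041, 0.060, 0.205, 0.042, 0.074, 0.212]
print("Bonferroni rejects tests:", [i for i, r in enumerate(bonferroni(p_values)) if r])
print("BH rejects tests:        ", [i for i, r in enumerate(benjamini_hochberg(p_values)) if r])
```

Bonferroni keeps only the test with p = 0.001 (threshold 0.005), while Benjamini–Hochberg also keeps p = 0.008. Three uncorrected p-values between 0.039 and 0.042 are rejected by neither.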

A subtler trap is interpreting a non-significant p as "no effect." Failing to reject H₀ is not evidence for H₀ — it is consistent with no effect, a small effect, or insufficient power to detect a real effect. To support H₀ you need a different framework, such as a confidence interval that excludes meaningful effect sizes or an equivalence test.

FAQ

A z-test is a parametric hypothesis test that compares a sample mean to a hypothesised population mean μ₀ when the population standard deviation σ is known. It uses the standard normal distribution to compute a p-value. The test statistic is z = (x̄ − μ₀) / (σ/√n). It is most appropriate for normally distributed data or for large samples (n ≥ 30), where the central limit theorem justifies the normality assumption.
z = (x̄ − μ₀) / (σ/√n), where x̄ is the sample mean, μ₀ is the hypothesised population mean, σ is the population standard deviation, and n is the sample size. The denominator σ/√n is called the standard error of the mean. Larger values of |z| indicate stronger evidence against the null hypothesis.
Use a z-test when the population σ is known and either the data are normally distributed or n ≥ 30. Use a t-test when σ is unknown and must be replaced by the sample standard deviation s, which is the more common real-world situation. As n grows the t-distribution converges to the standard normal, so for n ≥ 30 the two tests give nearly identical results.
The p-value is the probability of observing a test statistic at least as extreme as the one obtained, assuming the null hypothesis is true. If p < α (your chosen significance level, typically 0.05), reject H₀ and conclude there is statistically significant evidence against it. If p ≥ α, fail to reject H₀ — the data are consistent with the null.
A two-tailed test (H₁: μ ≠ μ₀) checks for any deviation from the null mean in either direction. A one-tailed test checks one specific direction: right-tailed for H₁: μ > μ₀, left-tailed for H₁: μ < μ₀. Two-tailed tests are more conservative because they split α between two tails; one-tailed tests have more power if you correctly predict the direction in advance.
The 95% CI is x̄ ± 1.96 × σ/√n. It is the range of plausible values for the population mean. If you repeated the experiment many times and computed a 95% CI each time, about 95% of those intervals would contain the true population mean. If μ₀ falls outside this interval, the two-tailed z-test rejects H₀ at α = 0.05.
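The repeated-sampling interpretation can be checked by simulation. The sketch below uses assumed values borrowed from the rod example (true mean 100, σ = 15, n = 36) and an arbitrary seed. It draws many samples, builds x̄ ± 1.96·σ/√n for each, and counts how often the interval covers the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)   # about 1.96

rng = random.Random(7)                # fixed seed for reproducibility
true_mu, sigma, n, repeats = 100.0, 15.0, 36, 20_000
se = sigma / sqrt(n)
covered = 0
for _ in range(repeats):
    x_bar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n   # one simulated study
    if x_bar - Z_975 * se <= true_mu <= x_bar + Z_975 * se:
        covered += 1

print(f"share of 95% CIs containing the true mean: {covered / repeats:.3f}")
```

The printed share comes out close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure.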
Convention is n ≥ 30 for the central limit theorem to make the sampling distribution of the mean approximately normal for most underlying distributions (heavily skewed data may need more). For known-normal data, the z-test works at any sample size. The sample size required for a given power depends on the effect size: a one-sample test needs about 32 observations to detect d = 0.5 at 80% power, two-tailed, α = 0.05. The often-quoted n = 64 is the per-group figure for a two-sample comparison.
Effect size measures the magnitude of the difference, independent of sample size. Cohen's d = (x̄ − μ₀)/σ classifies effects as small (0.2), medium (0.5), or large (0.8). A z-test can return statistical significance with a tiny effect if n is large, so always report both p-value and effect size. A significant p with d = 0.05 is rarely practically meaningful.