P-Value Calculator

P-value calculator for z-tests and t-tests.

z-test and t-test · one- or two-tailed · significance at α 0.05 / 0.01 / 0.001

Instructions — P-Value Calculator

1. Pick the test type

Use a z-test if you know the population standard deviation, or if your sample is large enough (n ≥ 30) for the central limit theorem to make the normal approximation accurate. Use a t-test when you estimate the standard deviation from the sample, especially for n < 30. The two distributions converge as df grows: at df = 100 they are nearly identical.

2. Set the tail

Pick two-tailed if your alternative hypothesis is "different from H₀" — the most common default. Pick one-tailed if you predicted a direction (greater or less) before collecting data. Switching to one-tailed after seeing the data is p-hacking and is not allowed in registered research.

3. Enter the statistic, get the p-value

Type the test statistic. For a t-test, also type the degrees of freedom (typically n - 1 for a one-sample t-test or n₁ + n₂ - 2 for an independent two-sample t-test). The calculator returns the p-value and reports whether it falls below your chosen α.

Critical values to remember (all two-tailed): z = 1.960 gives p = 0.05, z = 2.576 gives p = 0.01, and z = 3.291 gives p = 0.001.
For t-tests: the critical value at df = 30 is t = 2.042 for p = 0.05 two-tailed — close to z = 1.96, and the gap shrinks further as df grows.
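
The z critical values quoted above can be recovered by inverting the standard normal CDF. The sketch below uses `statistics.NormalDist` from Python's standard library; the helper name `z_critical` is illustrative and is not taken from the calculator's own code.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Return the z value whose tail area equals alpha.

    For a two-tailed test, alpha is split evenly between both tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        print(f"alpha = {alpha}: two-tailed z critical = {z_critical(alpha):.3f}")
```

Passing `two_tailed=False` gives the one-tailed cutoffs instead, for example the 1.645 value used for a one-tailed test at α = 0.05.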

Formulas

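The p-value is the probability of observing a test statistic at least as extreme as the one you got, assuming the null hypothesis is true. Geometrically, it is the area under the null distribution's density from your statistic out into the tail; the calculator obtains that area from the cumulative distribution function (CDF) rather than by integrating numerically.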

Definition
$$ p = P(T \geq t_{obs} \mid H_0) $$
The probability of seeing the observed test statistic or one more extreme, computed under the null distribution.
z-test (two-tailed)
$$ p = 2 \, \bigl(1 - \Phi(|z|)\bigr) $$
Φ is the standard normal CDF. For z = 1.96, this gives p = 0.05. The calculator uses the Hastings approximation from Abramowitz & Stegun, accurate to 7.5 × 10⁻⁸.
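
As an illustration of the two-tailed z formula, here is a minimal sketch of the Hastings polynomial approximation using the coefficients of Abramowitz & Stegun 26.2.17. It is not the calculator's actual source; the function names are chosen for this example, and `math.erfc` serves only as an independent reference.

```python
import math

# Coefficients of Abramowitz & Stegun formula 26.2.17.
_P = 0.2316419
_B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)


def phi_hastings(x: float) -> float:
    """Approximate the standard normal CDF with the Hastings polynomial."""
    if x < 0:
        return 1.0 - phi_hastings(-x)  # the normal density is symmetric
    t = 1.0 / (1.0 + _P * x)
    # Horner evaluation of b1*t + b2*t^2 + b3*t^3 + b4*t^4 + b5*t^5.
    poly = 0.0
    for b in reversed(_B):
        poly = (poly + b) * t
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return 1.0 - density * poly


def z_p_value_two_tailed(z: float) -> float:
    """Two-tailed p-value: p = 2 * (1 - Phi(|z|))."""
    return 2.0 * (1.0 - phi_hastings(abs(z)))


def phi_reference(x: float) -> float:
    """Reference normal CDF from the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


if __name__ == "__main__":
    print(f"p(z = 1.96) = {z_p_value_two_tailed(1.96):.4f}")
    worst = max(abs(phi_hastings(i / 100) - phi_reference(i / 100)) for i in range(601))
    print(f"largest absolute error on [0, 6]: {worst:.1e}")
```

The quoted error bound is an absolute error on Φ, so doubling for a two-tailed p-value doubles the bound as well. For typical reporting precision that is still negligible.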
t-test (two-tailed)
$$ p = 2 \, \bigl(1 - F_t(|t|, df)\bigr) $$
F_t is the Student t CDF. The calculator evaluates it through the regularized incomplete beta function, the standard approach in R, Python (scipy.stats), and Numerical Recipes.
One-tailed forms
$$ p_{right} = 1 - F(t_{obs}) \quad p_{left} = F(t_{obs}) $$
Use the right tail if H₁ predicts T > t₀, the left tail if H₁ predicts T < t₀. One-tailed p-values are half the two-tailed value (when the result lies in the predicted direction).
Beta-function relation
$$ F_t(t, df) = 1 - \tfrac{1}{2} I_{x}(df/2, 1/2), \;\; x = \tfrac{df}{df + t^2}, \;\; t \geq 0 $$
The regularized incomplete beta I_x(a,b) is the workhorse function for computing t-distribution probabilities. The relation as written holds for t ≥ 0; for negative t, use the symmetry F_t(−t, df) = 1 − F_t(t, df). Numerical Recipes §6.4 covers the continued-fraction implementation used here.
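
The beta-function relation can be sketched in a few dozen lines. The code below follows the general continued-fraction scheme described in Numerical Recipes (modified Lentz iteration), but it is an independent illustration rather than the calculator's source; the function names, iteration cap, and tolerance are assumptions made for this example.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 1000, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even_term = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd_term = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even_term, odd_term):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            return h
    raise RuntimeError("continued fraction did not converge")


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t: float, df: float) -> float:
    """Student t CDF via F_t = 1 - I_x(df/2, 1/2) / 2 for t >= 0, plus symmetry."""
    upper_tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - upper_tail if t >= 0 else upper_tail


def t_p_value_two_tailed(t: float, df: float) -> float:
    """Two-tailed p-value; equals 2 * (1 - F_t(|t|, df)) = I_x(df/2, 1/2)."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


if __name__ == "__main__":
    print(f"p(t = 2.042, df = 30) = {t_p_value_two_tailed(2.042, 30):.4f}")
```

Returning I_x directly for the two-tailed case avoids computing 1 − F_t and then subtracting from 1 again, which would lose precision for very small p-values.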
Significance decision
$$ \text{Reject } H_0 \iff p \leq \alpha $$
The cutoff α is set before the test. Conventional values: 0.05 in social science, 0.01 in biology and medicine, 0.001 (or 5σ ≈ 3 × 10⁻⁷) in particle physics.

Reference

Z-test critical values
| z | One-tailed p | Two-tailed p | Common label |
|---|---|---|---|
| 1.000 | 0.1587 | 0.3173 | 1σ deviation |
| 1.282 | 0.1000 | 0.2000 | p = 0.10 cutoff (one-tailed) |
| 1.645 | 0.0500 | 0.1000 | p = 0.05 (one-tailed) |
| 1.960 | 0.0250 | 0.0500 | p = 0.05 (two-tailed) |
| 2.000 | 0.0228 | 0.0455 | 2σ deviation |
| 2.326 | 0.0100 | 0.0200 | p = 0.01 (one-tailed) |
| 2.576 | 0.0050 | 0.0100 | p = 0.01 (two-tailed) |
| 3.000 | 0.00135 | 0.0027 | 3σ deviation |
| 3.291 | 0.0005 | 0.0010 | p = 0.001 (two-tailed) |
| 5.000 | 2.9e-7 | 5.7e-7 | 5σ (particle physics) |

Student t critical values (two-tailed)

Critical values of the Student t distribution at the most common significance levels. As df grows, these approach the z-distribution values in the table above.

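t @ α = 0.05

| df | t crit |
|---|---|
| 5 | 2.571 |
| 10 | 2.228 |
| 15 | 2.131 |
| 20 | 2.086 |
| 30 | 2.042 |
| 50 | 2.009 |
| 100 | 1.984 |
| ∞ | 1.960 |

t @ α = 0.01

| df | t crit |
|---|---|
| 5 | 4.032 |
| 10 | 3.169 |
| 15 | 2.947 |
| 20 | 2.845 |
| 30 | 2.750 |
| 50 | 2.678 |
| 100 | 2.626 |
| ∞ | 2.576 |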

Note: the calculator computes p-values to better than 4 decimal places using the Abramowitz & Stegun Hastings approximation for the normal CDF and, for Student t, the regularized incomplete beta function that scipy.stats and R's base statistics package also rely on.

Article — P-Value Calculator

The p-value calculator, with the math that runs underneath

A p-value is the probability of observing a test statistic at least as extreme as the one you got, assuming the null hypothesis is true. For a two-tailed z-test, p = 2 × (1 − Φ(|z|)). For a two-tailed t-test, p = 2 × (1 − F_t(|t|, df)). The smaller the value, the stronger the evidence against the null hypothesis — but the p-value alone is never the whole story.

The calculator above evaluates both formulas with standard numerical methods: the Abramowitz & Stegun Hastings approximation for the normal CDF, and the regularized incomplete beta function (the approach scipy.stats and R also use) for the Student t CDF. Results match R's pnorm() and pt() to at least four decimal places across the working range.

What a p-value actually is

Imagine the null hypothesis is true. Under that assumption, your test statistic follows a known distribution — the standard normal for a z-test, the Student t for a t-test. The p-value is the probability that this random variable lands at least as far out as your observed statistic, in the direction your alternative hypothesis specifies.

Ronald Fisher popularized the concept in 1925 in Statistical Methods for Research Workers. His reasoning is the one still used today: if P is small, either an exceptionally rare event has occurred or the null hypothesis is false. The smaller the p-value, the more uncomfortable it becomes to attribute the result to chance.

Did you know

Fisher chose the 0.05 threshold because it corresponded to about two standard deviations on the normal distribution. He never claimed it was a fundamental cutoff, only that it was a convenient point at which to judge a deviation significant. Decades of journal practice turned the convenience into a rule.

How to calculate a p-value

Every p-value calculation has three ingredients: a test statistic, a reference distribution, and a tail specification.

  • Test statistic: the number that summarizes your data under H₀. Common forms: z, t, χ², F.
  • Reference distribution: what the statistic looks like if H₀ is true. Standard normal for z, Student t for t-tests, chi-square for goodness-of-fit, F for variance ratios.
  • Tail: one-sided (H₁ predicts a direction) or two-sided (H₁ says "different in either direction").
  • The p-value: the integral of the reference distribution past the observed statistic, in the chosen tail(s).

For a two-tailed test the p-value is twice the upper-tail probability of the absolute statistic. For a one-tailed test it is just the relevant tail. The calculator handles all three options: two-tailed, right-tailed, and left-tailed.
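
The three ingredients map directly onto a small function: a statistic, a CDF for the reference distribution, and a tail choice. This is a sketch under the assumption that the reference distribution is symmetric about zero (true for z and t); the name `p_value` and the tail labels are made up for the example.

```python
from statistics import NormalDist
from typing import Callable


def p_value(stat: float, cdf: Callable[[float], float], tail: str = "two") -> float:
    """Tail probability of `stat` under a null distribution with the given CDF.

    tail: "two", "right" (H1 predicts a larger statistic) or
    "left" (H1 predicts a smaller statistic).
    Assumes the null distribution is symmetric about zero for tail="two".
    """
    if tail == "right":
        return 1.0 - cdf(stat)
    if tail == "left":
        return cdf(stat)
    if tail == "two":
        return 2.0 * (1.0 - cdf(abs(stat)))
    raise ValueError(f"unknown tail: {tail!r}")


if __name__ == "__main__":
    normal_cdf = NormalDist().cdf
    for tail in ("two", "right", "left"):
        print(f"z = 1.96, {tail}-tailed: p = {p_value(1.96, normal_cdf, tail):.4f}")
```

With z = 1.96 the right-tailed value is half the two-tailed value, while the left-tailed value is above 0.5: a test aimed at the wrong direction cannot reach significance.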

P-value from a z-test

The z-test uses the standard normal distribution. The two-tailed p-value is p = 2 × (1 − Φ(|z|)), where Φ is the cumulative distribution function. For the canonical critical value z = 1.96, this gives p = 0.05, and that is why 1.96 appears in every statistics textbook.

z-test critical values
  • z = 1.645 ⇒ one-tailed p = 0.05
  • z = 1.96 ⇒ two-tailed p = 0.05
  • z = 2.576 ⇒ two-tailed p = 0.01
  • z = 3.291 ⇒ two-tailed p = 0.001
  • z = 5.000 ⇒ two-tailed p ≈ 5.7 × 10⁻⁷ (5σ, particle physics)

The numerical method for Φ in this calculator is the Hastings approximation, equation 26.2.17 in Abramowitz & Stegun. The maximum error is 7.5 × 10⁻⁸, well below the precision you would ever report in a manuscript.

P-value from a t-test

The t-test uses Student's t-distribution, which has heavier tails than the normal. The shape depends on the degrees of freedom (df). For a one-sample t-test, df = n - 1. For a Welch two-sample t-test on samples of size n₁ and n₂, df is computed by the Welch-Satterthwaite formula. For a paired t-test, df = n - 1 where n is the number of pairs.
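
The paragraph above names the Welch-Satterthwaite formula without stating it. In its usual form, with per-sample variance terms v₁ = s₁²/n₁ and v₂ = s₂²/n₂, the degrees of freedom are (v₁ + v₂)² / (v₁²/(n₁ − 1) + v₂²/(n₂ − 1)). A minimal sketch, with a hypothetical function name and made-up summary statistics:

```python
import math


def welch_t_and_df(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    sd1 and sd2 are sample standard deviations; n1 and n2 are sample sizes.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Illustrative numbers only, not data from the article.
    t, df = welch_t_and_df(10.4, 2.0, 12, 9.1, 3.5, 20)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting df is generally not an integer, and it always lies between the smaller of n₁ − 1 and n₂ − 1 and the pooled value n₁ + n₂ − 2. Enter it into the calculator as is.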

The two-tailed p-value is p = 2 × (1 − F_t(|t|, df)). The calculator computes F_t through the regularized incomplete beta function, the method used in R, Python's scipy.stats, and Press et al.'s Numerical Recipes chapter 6.4. For df = 30, the critical t-value at α = 0.05 (two-tailed) is t = 2.042 — already close to z = 1.96, and the gap shrinks further as df grows.

One-tailed versus two-tailed p-values

Two-tailed is the safe default. The alternative hypothesis is "different from H₀," with no direction specified. A two-tailed test treats positive and negative deviations equally.

One-tailed tests have more statistical power but only when the predicted direction is correct. You must specify the direction before collecting data. If the result lies in the predicted direction, the one-tailed p-value is half the two-tailed value. If it lies in the wrong direction, the one-tailed p-value is greater than 0.5 — the test cannot find significance no matter how extreme the wrong-direction effect is.

Do not switch to one-tailed after the result

Choosing a one-tailed test after looking at the data is the classic p-hacking move: it cuts the reported p-value in half without any new information. Pre-register the test direction (in OSF, AsPredicted, or a journal protocol) and stick with it. The 2016 ASA statement on p-values explicitly warns against this practice.

P-value thresholds and what they mean

Different fields use different significance thresholds because the cost of a false positive differs. Social science accepts α = 0.05 in part because subsequent replications usually weed out false discoveries. Medical research often runs at α = 0.01 because a false-positive treatment claim can be life-threatening. Particle physics demands p < 3 × 10⁻⁷ (the 5σ standard) because it tests millions of channels, so even a strict-looking threshold becomes loose once the multiple-testing burden is included.
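
The multiple-testing argument can be made concrete with a little arithmetic. The sketch below assumes the tests are independent, which is a simplification, and uses illustrative test counts; the function names are not from any library.

```python
import math


def sigma_to_p(n_sigma: float) -> float:
    """One-tailed p-value for a deviation of n_sigma standard deviations."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def family_wise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    p5 = sigma_to_p(5.0)
    print(f"5 sigma, one-tailed p = {p5:.2e} (about 1 in {1 / p5:,.0f})")
    print(f"alpha = 0.05 over 20 tests: {family_wise_error(0.05, 20):.3f}")
    print(f"5 sigma over 1,000,000 tests: {family_wise_error(p5, 1_000_000):.3f}")
```

Under these assumptions, twenty tests at α = 0.05 already give better-than-even odds of at least one false positive, and even the 5σ cutoff leaves a noticeable chance of a false alarm once the number of tests reaches the millions.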

  • Social science: α = 0.05 (one-in-twenty cutoff)
  • Particle physics: α ≈ 3 × 10⁻⁷ (5σ standard; Higgs, 2012)

Common p-value misinterpretations

The 2016 American Statistical Association statement sets out six principles for interpreting p-values. Three of the misreadings it addresses matter for almost every reader:

  • P is not the probability that H₀ is true. P is computed under H₀. To get P(H₀ | data), you need Bayes' theorem and a prior — which p-values do not provide.
  • P does not measure effect size. A tiny effect in a huge sample gives a small p-value. Always report the effect size (Cohen's d, η², odds ratio) alongside.
  • P > 0.05 is not evidence for H₀. It is absence of evidence against H₀, which is not the same thing. Low-power studies miss real effects often.
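
The second point, that p does not measure effect size, is easy to see numerically. The sketch assumes a one-sample z-test with known standard deviation, where z = d × √n for a standardized mean difference d; the sample sizes and effect sizes are invented for illustration.

```python
import math


def z_test_p_from_effect(d: float, n: int) -> float:
    """Two-tailed p-value of a one-sample z-test.

    d is the observed mean difference in units of the (known) population
    standard deviation, n is the sample size, so z = d * sqrt(n).
    """
    z = d * math.sqrt(n)
    # 2 * (1 - Phi(|z|)) written with erfc to stay accurate for small p.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    print(f"tiny effect, huge sample  (d = 0.02, n = 100000): p = {z_test_p_from_effect(0.02, 100_000):.1e}")
    print(f"large effect, tiny sample (d = 0.50, n = 10):     p = {z_test_p_from_effect(0.5, 10):.3f}")
```

The negligible effect comes out overwhelmingly "significant" while the substantial one does not, which is why the effect size belongs next to the p-value in any report.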
Tip

If you find yourself reporting a p-value, also report the confidence interval and the effect size. The CI tells readers the plausible range of the true effect; the effect size tells them whether it matters in practice. A 95% CI that excludes 0 conveys the same significance verdict as p < 0.05, but with much more information.
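
The link between a 95% interval and p < 0.05 can be checked directly for a normal-based (z) interval and a null value of 0. This is a sketch under those assumptions; `z_summary` and the input numbers are hypothetical.

```python
from statistics import NormalDist


def z_summary(estimate: float, std_error: float, confidence: float = 0.95):
    """Return ((ci_low, ci_high), two_tailed_p) for H0: true effect = 0.

    Uses a normal (z) reference distribution for both the interval and the test.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + confidence / 2.0)
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    p = 2.0 * (1.0 - nd.cdf(abs(estimate / std_error)))
    return ci, p


if __name__ == "__main__":
    for estimate in (2.1, 1.8):
        (low, high), p = z_summary(estimate, std_error=1.0)
        print(f"estimate = {estimate}: 95% CI = ({low:.2f}, {high:.2f}), p = {p:.4f}")
```

The interval excludes 0 exactly when p falls below 0.05, but it also shows how large or small the effect could plausibly be, which the p-value alone does not.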

A short history of the p-value

Karl Pearson invented the chi-square p-value in 1900 for testing goodness of fit. Fisher generalized the concept in 1925 with z and t. Jerzy Neyman and Egon Pearson reformulated hypothesis testing in 1933 as a decision-theoretic framework with fixed α and β — the version most statistics textbooks teach. The two camps disagreed about almost everything: Fisher considered p-values as continuous evidence; Neyman and Pearson treated them as bright-line decisions. The hybrid "p < 0.05 cutoff" that dominates published research today is not what either camp actually proposed.

The American Statistical Association published its formal statement on p-values in 2016, followed in 2019 by a special supplement of The American Statistician whose lead editorial was titled "Moving to a World Beyond p < 0.05." The discussion is ongoing. The p-value is not going away, but the binary significant/non-significant verdict is increasingly seen as a relic of a less sophisticated era of statistical practice.

Did you know

The 2012 discovery of the Higgs boson at CERN was reported at 5σ significance: if no new particle existed, a background fluctuation at least that large would occur about once in 3.5 million tries. Both the ATLAS and CMS experiments hit the threshold independently. The conservative 5σ standard exists in particle physics specifically because the number of hypotheses tested in a typical analysis is enormous, and weaker cutoffs would generate constant false discoveries.

FAQ

What is a p-value?
A p-value is the probability of observing a test statistic at least as extreme as the one you got, assuming the null hypothesis is true. It does NOT mean the probability the null hypothesis is true. The American Statistical Association's 2016 statement on p-values says so directly: p-values do not measure the probability that the studied hypothesis is true.

What p-value counts as statistically significant?
By convention, p ≤ 0.05 is the threshold for "statistically significant" in social and biological science. Particle physics uses p ≤ 3 × 10⁻⁷ (the "5σ" standard). The 0.05 threshold was suggested by Ronald Fisher in 1925 as a convenient mark; it is not derived from any deeper principle.

What is the difference between a one-tailed and a two-tailed p-value?
Two-tailed: H₁ says the parameter is different from H₀ in either direction. One-tailed: H₁ says it is greater (or less, but not both). One-tailed p-values are half the two-tailed value when the result lies in the predicted direction. You must pick the tail before collecting data; switching tails after seeing the result is p-hacking.

How do I calculate a p-value from a z-score?
For a two-tailed z-test, p = 2 × (1 − Φ(|z|)), where Φ is the standard normal CDF. For z = 1.96, this gives p ≈ 0.05. The calculator uses the Hastings approximation from Abramowitz & Stegun, accurate to 7.5 × 10⁻⁸.

How do I calculate a p-value from a t-statistic?
For a two-tailed t-test, p = 2 × (1 − F_t(|t|, df)), where F_t is the Student t CDF. The calculator evaluates F_t through the regularized incomplete beta function, the same approach used in R's pt() and Python's scipy.stats.t.cdf().

What does p = 0.04 mean?
It means that, if the null hypothesis were true, data at least as extreme as yours would occur 4% of the time. At α = 0.05 this is "significant" and you reject H₀, but it does NOT mean the alternative hypothesis is 96% likely. The p-value is about the data given H₀, not about H₀ given the data.

What does p = 0.5 mean?
It means the observed data are unremarkable under the null hypothesis: a result at least this extreme would occur half the time if H₀ were true. There is no evidence against H₀. Reporting "p = 0.5" amounts to saying you found nothing, which is perfectly fine to publish, but publication bias against null results means it often does not get printed.

Is p = 0.049 really different from p = 0.051?
In strict 0.05-cutoff thinking, p = 0.049 reaches significance and p = 0.051 does not. In practice the two are nearly identical and the binary cutoff is arbitrary. The ASA 2016 statement recommends reporting exact p-values and the effect size, not just the "sig / non-sig" verdict. Many journals now require effect sizes and confidence intervals alongside p-values.

Why does particle physics require 5σ?
Particle physics tests millions of hypotheses (every event, every channel, every cut). If you test enough hypotheses, some will fall below p < 0.05 by chance alone. The 5σ standard (p ≈ 3 × 10⁻⁷) guards against this multiple-testing problem and is required for any claim of a new particle. The 2012 Higgs boson discovery hit 5σ in two independent ATLAS and CMS analyses.

Should I use a z-test or a t-test?
Use a z-test when you know the population standard deviation, or for very large samples where the sample SD is essentially the population SD. Use a t-test when you estimate the SD from the sample. In practice almost all researchers use the t-test, because knowing the population SD is rare. The two converge as df grows; at df = 100 they differ by about 1.2% on the 0.05 critical value (1.984 versus 1.960).