T-Statistic Calculator

Compute the one-sample t-statistic, degrees of freedom (n-1), and two-tailed p-value from Student's t distribution.


One-sample t-statistic

Student's t distribution · two-tailed p-value · df = n-1

Instructions — T-Statistic Calculator

1. Enter your sample statistics

You need four numbers: the sample mean (x̄), the hypothesised population mean (μ), the sample standard deviation (s), and the sample size (n). The calculator handles the rest.

2. Pick a significance level

0.05 is the default in most scientific fields. 0.01 raises the bar for strict pre-registered work; 0.10 loosens it for exploratory pilots. The choice does not change t or p — only the decision rule.

3. Read t, df, and p

t measures how many standard errors separate your sample mean from the hypothesised value. p is the two-tailed probability of seeing a |t| at least this extreme by chance when the null is true. If p ≤ α, reject the null.

Sample size matters: the t distribution converges to the normal as n grows. Below n = 30 the heavier tails demand a noticeably larger |t| for the same α (2.571 at df = 5 against 1.960 for z at α = 0.05); above n = 100 the α = 0.05 critical values differ from z by less than 0.03.
Two-tailed by default: use a two-tailed p unless your hypothesis explicitly predicts a direction. Most journals require two-tailed by default.

Formulas

The one-sample t-test compares a sample mean against a known or hypothesised population mean when the population standard deviation is unknown. The estimator was published by William Sealy Gosset in 1908 under the pen name “Student” while he worked for the Guinness brewery in Dublin.

The t statistic
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
x̄ is the sample mean, μ is the hypothesised population mean, s is the sample standard deviation, n is the sample size. Numerator measures distance; denominator measures the standard error of the mean.
Degrees of freedom
$$ df = n - 1 $$
One degree of freedom is consumed estimating the sample mean from the data. Smaller df means heavier tails in the t distribution — harder to reject the null on the same evidence.
Standard error of the mean
$$ SE = \frac{s}{\sqrt{n}} $$
The standard error of the sample mean. Halving the noise (s) or quadrupling the sample size (n) each shrinks SE by a factor of 2.
Two-tailed p-value
$$ p = P(|T| \geq |t|) = I_{x}\left(\tfrac{df}{2}, \tfrac{1}{2}\right),\; x = \tfrac{df}{df + t^{2}} $$
Computed from Student's t CDF using the regularized incomplete beta function. SciPy and R build their t-distribution routines on the same incomplete-beta relationship, so results should agree to floating-point precision.
Effect size (Cohen's d)
$$ d = \frac{\bar{x} - \mu}{s} $$
A standardised mean difference. By convention d = 0.2 is small, 0.5 is medium, and 0.8 is large. It reports the size of the effect rather than its significance, which makes it an essential complement to p.
95% confidence interval
$$ CI_{95} = \bar{x} \pm t_{0.025,\,df} \cdot SE $$
If the CI contains μ, you fail to reject the null at α = 0.05. The CI carries the same information as the p-value but communicates it in original units.
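
The formulas above fit together in a few lines. The following is a minimal sketch in Python, not the calculator's actual source; the names `one_sample_t` and `TSummary` are assumptions made for illustration, and the inputs are arbitrary summary statistics.

```python
import math
from typing import NamedTuple


class TSummary(NamedTuple):
    se: float        # standard error of the mean, s / sqrt(n)
    t: float         # t statistic, (mean - mu) / se
    df: int          # degrees of freedom, n - 1
    cohens_d: float  # standardised mean difference, (mean - mu) / s


def one_sample_t(mean: float, mu: float, s: float, n: int) -> TSummary:
    """Hypothetical helper: summary statistics in, t-test quantities out."""
    if n < 2:
        raise ValueError("need at least two observations so that df >= 1")
    if s <= 0:
        raise ValueError("sample standard deviation must be positive")
    se = s / math.sqrt(n)
    return TSummary(se=se, t=(mean - mu) / se, df=n - 1, cohens_d=(mean - mu) / s)


base = one_sample_t(mean=52.0, mu=50.0, s=8.0, n=30)
print(base)

# Quadrupling n or halving s should each halve the standard error.
print(one_sample_t(52.0, 50.0, 8.0, 120).se / base.se)
print(one_sample_t(52.0, 50.0, 4.0, 30).se / base.se)
```

Note that Cohen's d does not depend on n, whereas t grows with √n for a fixed mean difference and s.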

Reference

Two-tailed critical t values
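| df | α = 0.10 | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|---|
| 1 | 6.314 | 12.706 | 63.657 | 636.62 |
| 2 | 2.920 | 4.303 | 9.925 | 31.598 |
| 5 | 2.015 | 2.571 | 4.032 | 6.869 |
| 10 | 1.812 | 2.228 | 3.169 | 4.587 |
| 15 | 1.753 | 2.131 | 2.947 | 4.073 |
| 20 | 1.725 | 2.086 | 2.845 | 3.850 |
| 30 | 1.697 | 2.042 | 2.750 | 3.646 |
| 50 | 1.676 | 2.009 | 2.678 | 3.496 |
| 100 | 1.660 | 1.984 | 2.626 | 3.390 |
| ∞ (z) | 1.645 | 1.960 | 2.576 | 3.291 |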

Interpretation guide

Match observed |t| against the critical column for your df. Larger values reject the null at that α.
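
The lookup described above can be sketched as a small table-driven check. This is an illustration only: the dictionary copies the finite-df rows of the critical-value table on this page, and the helper names `critical_t` and `rejects_null` are assumptions. For a df that is not tabulated, the sketch falls back to the nearest smaller row, which gives a slightly larger and therefore conservative cutoff.

```python
# Two-tailed critical t values copied from the reference table on this page.
CRITICAL_T = {
    1:   {0.10: 6.314, 0.05: 12.706, 0.01: 63.657, 0.001: 636.62},
    2:   {0.10: 2.920, 0.05: 4.303,  0.01: 9.925,  0.001: 31.598},
    5:   {0.10: 2.015, 0.05: 2.571,  0.01: 4.032,  0.001: 6.869},
    10:  {0.10: 1.812, 0.05: 2.228,  0.01: 3.169,  0.001: 4.587},
    15:  {0.10: 1.753, 0.05: 2.131,  0.01: 2.947,  0.001: 4.073},
    20:  {0.10: 1.725, 0.05: 2.086,  0.01: 2.845,  0.001: 3.850},
    30:  {0.10: 1.697, 0.05: 2.042,  0.01: 2.750,  0.001: 3.646},
    50:  {0.10: 1.676, 0.05: 2.009,  0.01: 2.678,  0.001: 3.496},
    100: {0.10: 1.660, 0.05: 1.984,  0.01: 2.626,  0.001: 3.390},
}


def critical_t(df: int, alpha: float = 0.05) -> float:
    """Return the tabulated cutoff, using the largest tabulated df <= df."""
    if df < 1:
        raise ValueError("df must be at least 1")
    row = max(k for k in CRITICAL_T if k <= df)
    try:
        return CRITICAL_T[row][alpha]
    except KeyError:
        raise ValueError(f"alpha={alpha} is not a column of the table") from None


def rejects_null(t: float, df: int, alpha: float = 0.05) -> bool:
    """Two-tailed decision: compare |t| with the tabulated critical value."""
    return abs(t) >= critical_t(df, alpha)


print(rejects_null(2.5, df=10))    # compared against the df = 10 row
print(rejects_null(-2.5, df=10))   # the sign of t is ignored
print(rejects_null(2.5, df=10, alpha=0.01))
```

A table lookup only answers "reject or not" at a handful of α levels; an exact p-value still requires the t CDF.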

Cohen's d effect sizes

| d | Effect |
|---|---|
| 0.10 | Negligible |
| 0.20 | Small |
| 0.50 | Medium |
| 0.80 | Large |
| 1.20 | Very large |

p-value reporting

| p | Convention |
|---|---|
| ≥ 0.05 | Not significant (NS) |
| < 0.05 | Significant (*) |
| < 0.01 | Highly significant (**) |
| < 0.001 | Very highly significant (***) |
| < 0.0001 | Report as < 0.0001 |

APA 7th edition asks authors to report exact p to three decimals (or as < 0.001), the test statistic, degrees of freedom, and an effect size. Stars alone are no longer enough.
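
A report line that carries all four pieces (test statistic, df, p, effect size) can be assembled mechanically. The sketch below is one plausible rendering, not an official APA formatter: the function names are hypothetical, the number formatting follows the notation used on this page, and treating each row of the Cohen's d table as a lower bound for its label is an assumption.

```python
def format_p(p: float) -> str:
    """Exact p to three decimals, or 'p < 0.001' below that threshold."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"


def effect_label(d: float) -> str:
    """Label |d|, reading each table row as a lower bound (an assumption)."""
    size = abs(d)
    if size >= 1.20:
        return "very large"
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


def report(t: float, df: int, p: float, d: float) -> str:
    """Combine statistic, df, p, and effect size into a single line."""
    return f"t({df}) = {t:.2f}, {format_p(p)}, d = {d:.2f} ({effect_label(d)})"


print(report(t=2.50, df=10, p=0.0314, d=0.75))
print(report(t=5.10, df=40, p=0.000008, d=0.80))
```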

Article — T-Statistic Calculator

The t-statistic explained: from Gosset's brewery to modern p-values

The t-statistic is a standardised distance: how many standard errors of the mean separate your sample average from the value the null hypothesis predicts. The one-sample formula is t = (x̄ − μ) / (s/√n) with degrees of freedom n − 1. William Sealy Gosset published the underlying distribution in 1908 in Biometrika under the pen name "Student" while working as a chemist at Guinness in Dublin. The statistic he derived for small-sample inference is still implemented in nearly every statistics package.

The two-tailed p-value answers a single question: how often would I get a t this far from zero by chance alone if the null were true? Small p values point to evidence against the null; large p values mean the data are consistent with it. The test does not prove anything, only weighs evidence.

What is the t-statistic?

The t-statistic compares an observed mean to a benchmark while accounting for sample noise. Take the difference between what you measured and what you expected, then divide by the standard error of the mean (s/√n). The result is unitless. A t of 3 means the sample mean is three standard errors above the hypothesised value.

The t exists because the population standard deviation σ is rarely known. If you knew σ, you would use a z-statistic with the normal distribution. In real research you estimate σ from the sample as s, and that estimation adds noise. Gosset's 1908 paper showed exactly how much: the sampling distribution shifts from normal to a heavier-tailed cousin that depends on sample size through degrees of freedom.

Did you know

Gosset's day job at Guinness involved comparing small batches of barley and yeast where samples might be just five or ten observations. The normal distribution gave wildly optimistic confidence in those tiny samples. His 1908 paper, "The Probable Error of a Mean", quietly fixed brewing science before it became a pillar of statistics.

The one-sample t-statistic formula

The one-sample case is the simplest application: a single sample, a single hypothesised population mean. The formula has four inputs and one output, with degrees of freedom following automatically from sample size.

One-sample t-statistic
t = (x̄ − μ) / (s/√n)
df = n − 1
SE = s/√n
p = two-tailed p from the t CDF

The numerator (x̄ − μ) carries the signal: how far is your sample mean from the value you would expect under the null? The denominator s/√n carries the noise: how precisely is the mean estimated? Big t means signal dominates noise. Small t means the data are consistent with the null.

One subtlety: the t-distribution is symmetric about zero, so the two-tailed p-value ignores the sign of t. It treats t = 2.1 and t = −2.1 identically, because both lie 2.1 standard errors from zero. Direction matters only if you registered a one-tailed hypothesis in advance, which most contemporary editors will scrutinise carefully.

Degrees of freedom and the t-distribution

Degrees of freedom (df) is the count of independent pieces of information available for estimating variability. The one-sample test starts with n observations, spends one degree estimating the sample mean, and leaves n − 1 for estimating s. The t-distribution has one shape parameter, df, that controls how heavy the tails are.

At df = 1 (the Cauchy distribution) the tails are so heavy that neither the mean nor the variance is defined. At df = 30 the gap from normal is small but visible; by df = 100 it is negligible. The 95% two-tailed critical value drops from 12.7 at df = 1 to 2.23 at df = 10, 2.04 at df = 30, and 1.98 at df = 100, approaching the normal value of 1.96.
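
To see the heavier tails directly, one can evaluate the density far from zero for several df values. The sketch below uses the standard Student's t density, which this page does not state explicitly, so treat the formula and the function names as additions for illustration:

$$ f(t \mid df) = \frac{\Gamma\left(\tfrac{df+1}{2}\right)}{\sqrt{df\,\pi}\;\Gamma\left(\tfrac{df}{2}\right)} \left(1 + \frac{t^{2}}{df}\right)^{-\frac{df+1}{2}} $$

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t with df degrees of freedom, evaluated at x."""
    # Work in log space so the gamma functions do not overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x: float) -> float:
    """Standard normal density, the df -> infinity limit of t_pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


# Density three standard errors from zero: it shrinks toward the normal
# value as df grows, which is what "lighter tails" means in practice.
for df in (1, 10, 30, 100):
    print(f"df = {df:>3}: {t_pdf(3.0, df):.5f}")
print(f"normal  : {normal_pdf(3.0):.5f}")
```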

Tiny samples need bigger t

At df = 4 you need |t| ≥ 2.78 to reject at α = 0.05 two-tailed; at df = 100 you only need |t| ≥ 1.98. The same observed effect can be significant with thirty subjects and not significant with five. Power, not just effect size, drives whether you see significance.

Computing the two-tailed p-value

The two-tailed p-value equals the probability that a Student's t random variable with df degrees of freedom falls at least |t| away from zero. It follows from the regularized incomplete beta function: p = I_x(df/2, 1/2) with x = df / (df + t²). No lookup tables are needed once the beta function is available, and the result is exact up to floating-point precision. SciPy's stats.t.sf and R's pt() are built on the same incomplete-beta relationship, although their internal implementations may differ in detail.
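
The incomplete-beta route can be sketched in pure Python. This is an illustrative implementation, not the code used by this calculator, SciPy, or R: it evaluates I_x(a, b) with a textbook continued fraction (modified Lentz method), and the function names, iteration cap, and tolerance are assumptions chosen for readability.

```python
import math


def _beta_cf(a: float, b: float, x: float,
             max_iter: int = 2000, eps: float = 1e-13) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def two_tailed_p(t: float, df: float) -> float:
    """p = I_x(df/2, 1/2) with x = df / (df + t^2)."""
    if df <= 0:
        raise ValueError("df must be positive")
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


print(two_tailed_p(2.228, 10))   # close to 0.05, matching the critical-value table
print(two_tailed_p(-2.228, 10))  # the sign of t does not matter
```

For production work a maintained library routine is the safer choice; the value of the sketch is that every step of the formula is visible.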

The traditional convention divides p-values into bands: p ≥ 0.05 reads as "not significant", p < 0.05 as significant, p < 0.01 as highly significant, p < 0.001 as very highly significant. Modern reporting standards (APA 7th edition, Nature guidelines) require exact p to three decimals where p ≥ 0.001 and "p < 0.001" otherwise. Stars-only reporting has been out of favour for roughly fifteen years.

Tip

Always report an effect size alongside the t and p. A statistically significant t with a Cohen's d of 0.05 is mathematically real and practically trivial. The American Statistical Association's 2016 statement on p-values explicitly warns against significance-only inference.

t-statistic versus z-statistic

The t and z statistics share the same numerator structure: difference between observed and hypothesised value. The split is in the denominator. The z uses the known population standard deviation σ; the t uses the sample standard deviation s, with the extra uncertainty that s is itself an estimate. That uncertainty shows up as the heavier tails of the t-distribution at small df. By n = 30 the two test statistics give almost identical p-values; by n = 100 they are indistinguishable.

z-test: σ known, rare in practice
t-test: s estimated, standard for real data
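
A small simulation makes the cost of estimating σ concrete. The sketch below draws many samples for which the null is true, computes t with the sample standard deviation, and counts how often |t| exceeds a cutoff. The function name, trial count, and seed are arbitrary choices for illustration. With n = 5, the normal cutoff 1.96 should reject noticeably more often than the nominal 5%, while the df = 4 cutoff 2.776 should stay near 5%.

```python
import math
import random


def rejection_rate(n: int, cutoff: float,
                   trials: int = 20_000, seed: int = 0) -> float:
    """Share of null-true normal samples whose |t| exceeds `cutoff`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with the n - 1 denominator, as in the t-test.
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t = mean / math.sqrt(var / n)
        if abs(t) > cutoff:
            hits += 1
    return hits / trials


print("n = 5, z cutoff 1.960:", rejection_rate(5, 1.960))
print("n = 5, t cutoff 2.776:", rejection_rate(5, 2.776))
print("n = 100, z cutoff 1.960:", rejection_rate(100, 1.960, trials=5_000))
```

The first rate is the false-positive inflation you would get from using z tables on a tiny sample; the third shows how little is lost at n = 100.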

A worked t-statistic example

Suppose a coffee roaster claims their beans average 50 mg of caffeine per gram. You sample 30 beans, measure each, and find x̄ = 52 mg/g with s = 8 mg/g. Plugging in: t = (52 − 50) / (8/√30) = 2 / 1.4606 = 1.3693. Degrees of freedom is 30 − 1 = 29.

The two-tailed p-value at t = 1.3693 and df = 29 is approximately 0.1815. With α = 0.05, you fail to reject the null: the data are consistent with the roaster's claim. Cohen's d is (52 − 50) / 8 = 0.25, a small effect. If you had sampled 200 beans instead of 30, the same x̄ and s would give t = 3.54 with df = 199 and p < 0.001. The per-bean evidence is identical; it becomes significant only because there is more of it.
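
The same example can be cross-checked with the 95% confidence interval from the Formulas section. In the sketch below the critical values 2.045 (df = 29) and 1.972 (df = 199) are taken from standard t tables rather than computed, and the function name is hypothetical.

```python
import math


def confidence_interval(mean: float, s: float, n: int, t_crit: float):
    """CI = mean ± t_crit * s / sqrt(n); t_crit must match df = n - 1."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


mu = 50.0  # the roaster's claimed mean

# 30 beans: t_crit = 2.045 is the tabulated two-tailed 5% value for df = 29.
low_30, high_30 = confidence_interval(52.0, 8.0, 30, t_crit=2.045)
print(f"n = 30:  ({low_30:.2f}, {high_30:.2f}) contains mu: {low_30 <= mu <= high_30}")

# 200 beans: t_crit = 1.972 is the tabulated value for df = 199.
low_200, high_200 = confidence_interval(52.0, 8.0, 200, t_crit=1.972)
print(f"n = 200: ({low_200:.2f}, {high_200:.2f}) contains mu: {low_200 <= mu <= high_200}")
```

The interval for 30 beans includes 50, in line with failing to reject; the interval for 200 beans excludes it, in line with p < 0.001.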

Common t-statistic mistakes

The most common error is treating "fail to reject" as proof of the null. The test never proves the null; it only fails to find enough evidence against it. A high p-value with small n may simply mean the test was underpowered to detect the effect, not that the effect does not exist.

Another frequent mistake is mixing population and sample standard deviation. If you have a known σ, use a z-test. If you computed s from the same sample you are testing, use t. Plugging a known population SD into the t formula and reading p from the t distribution still gives a valid but conservative test; it is non-standard and wastes the information that σ is known.

Finally, watch for sample independence. The one-sample t-test assumes observations are independent draws from the population. Paired measurements (before/after on the same person), clustered samples (siblings, classmates), and serially correlated time series violate this and inflate the apparent significance. Use the paired t-test for matched pairs and a mixed-effects model for clustered data.
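
For matched pairs, the paired t-test is the one-sample test applied to the within-pair differences with μ = 0. A minimal sketch, assuming made-up before/after measurements and a hypothetical helper name:

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """One-sample t on the differences (after - before) against mu = 0."""
    if len(before) != len(after):
        raise ValueError("paired data must have equal lengths")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    s_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if s_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (s_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Degrees of freedom are the number of pairs minus one, not the number of measurements minus one.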

FAQ

The t-statistic measures how many standard errors separate your sample mean from the hypothesised population mean. A t of 2.0 means the sample mean is 2 standard errors away from μ — about as far as the 95% boundary for large samples.
The p-value is the probability of observing a t-statistic at least as extreme as the one you got, assuming the null hypothesis is true. If p ≤ α (usually 0.05), reject the null. If p > α, you fail to reject — not the same thing as proving it true.
For a one-sample t-test, df = n − 1. You lose one degree of freedom estimating the sample mean. Smaller df gives the t distribution heavier tails, which means you need larger |t| to reject the null at the same α.
Use t when the population standard deviation is unknown and you estimate it from your sample, which is almost always the case in practice. Use z when σ is known (rare) or when n is very large (the two distributions converge above n ≈ 100).
Two-tailed asks whether the mean differs in either direction (more or less). One-tailed asks whether it differs in one specific direction. This calculator returns two-tailed p, which is the default in most journals. Pre-register a directional hypothesis before using one-tailed.
Rejecting the null means the evidence is unlikely enough under the null that you choose to act as if the alternative is true. It does not prove the alternative, and it carries the standard risk of a Type I error at rate α.
The two-tailed p-value comes from Student's t cumulative distribution function via the regularized incomplete beta function: p = I_x(df/2, 1/2) where x = df / (df + t²). SciPy's stats.t.sf and R's pt() are built on the same incomplete-beta relationship, although their internal implementations may differ in detail.
William Sealy Gosset published the t-distribution in 1908 in the paper “The Probable Error of a Mean” in Biometrika. He used the pen name “Student” because his employer, the Guinness brewery in Dublin, prohibited staff from publishing under their real names after an earlier paper leaked trade secrets.
The one-sample t-test is robust to mild non-normality at n ≥ 15 and very robust at n ≥ 30. Below n = 10, the test loses power and is sensitive to outliers; consider Wilcoxon signed-rank as a non-parametric alternative if the data are not roughly symmetric.