Python: How to Calculate a 95 Percent Confidence Interval

Calculating a 95% confidence interval in Python is a routine part of statistical analysis. This guide explains how to compute it using the scipy.stats module, provides a practical example, and discusses important considerations.

What is a 95% Confidence Interval?

A 95% confidence interval (CI) is a range of values computed from sample data by a procedure that captures the true population parameter in about 95% of repeated samples. For sample means, it's calculated using the sample mean, standard error, and critical t-value from the t-distribution.

Confidence Interval Formula

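For a sample mean x̄, sample standard deviation s, sample size n, standard error SE = s / √n, and critical t-value t (the 97.5th percentile of the t-distribution with n - 1 degrees of freedom):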

CI = x̄ ± t × SE
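The formula can be applied directly, with scipy.stats.t.ppf supplying the critical value t. This is a minimal sketch; the helper name t_confidence_interval is hypothetical, and the data are the sample values from the Python code example in this guide.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(data, confidence=0.95):
    """Hypothetical helper: return (lower, upper) using CI = x̄ ± t × SE."""
    x = np.asarray(data, dtype=float)
    n = x.size
    mean = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)          # SE = s / √n with the sample SD
    # Two-sided interval: (1 - confidence) / 2 in each tail, so q = 0.975 for 95%
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * se
    return mean - margin, mean + margin


data = [23, 25, 28, 22, 27, 24, 26, 29, 21, 25]
lower, upper = t_confidence_interval(data)
print(f"({lower:.2f}, {upper:.2f})")  # → (23.15, 26.85)
```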

The 95% confidence interval means that if we took many samples and calculated a 95% CI for each, about 95% of these intervals would contain the true population mean.
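This repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with an arbitrarily chosen mean of 50 and standard deviation of 10 (illustrative values, not from this guide), draws 10,000 samples of size 10, builds a 95% t-interval for each, and counts how often the interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)    # fixed seed so the run is reproducible
true_mean, true_sd = 50.0, 10.0    # assumed population parameters
n, trials = 10, 10_000

# One sample of size n per row
samples = rng.normal(true_mean, true_sd, size=(trials, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)

lower = means - t_crit * ses
upper = means + t_crit * ses
covered = (lower <= true_mean) & (true_mean <= upper)
coverage = covered.mean()          # fraction of intervals containing true_mean
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should be close to 0.95; it will not be exactly 0.95 because the simulation uses a finite number of samples.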

Python Method Using scipy.stats

The scipy.stats module provides the t.interval() function to calculate confidence intervals. Here's how to use it:

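Note: The t-interval assumes the data come from an approximately normal population, or that the sample is large enough for the central limit theorem to apply. Because the population standard deviation is estimated from the sample, the t-distribution is the appropriate choice at any sample size; it matters most for small samples (a common rule of thumb is n < 30). For larger samples, its critical values are close to the normal ones, so the normal approximation is acceptable.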
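To see why the distinction fades with sample size, the sketch below compares two-sided 95% critical values from the t-distribution at a few degrees of freedom (chosen for illustration) with the normal critical value.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)   # two-sided 95% normal critical value
t_values = {}
for df in (4, 9, 29, 99):
    t_values[df] = stats.t.ppf(0.975, df)   # two-sided 95% t critical value
    print(f"df={df:>2}: t = {t_values[df]:.3f}   vs   z = {z_crit:.3f}")
```

The t values shrink from about 2.78 at df = 4 toward about 1.98 at df = 99, approaching z ≈ 1.96, which is why the normal approximation becomes acceptable for large samples.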

Step-by-Step Code

Import scipy.stats and NumPy
Calculate the sample mean, sample standard deviation, and standard error
Use t.interval() with df = n - 1 to compute the confidence interval

Python Code Example

from scipy import stats
import numpy as np

# Sample data
data = [23, 25, 28, 22, 27, 24, 26, 29, 21, 25]

# Calculate sample mean and standard deviation
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # ddof=1 for sample standard deviation
n = len(data)

# Calculate 95% confidence interval
# (the `confidence` keyword needs SciPy >= 1.9; older releases called it `alpha`)
confidence_interval = stats.t.interval(
    confidence=0.95,
    df=n-1,
    loc=sample_mean,
    scale=sample_std/np.sqrt(n)
)

lower, upper = confidence_interval  # unpack the (lower, upper) tuple
print(f"95% Confidence Interval: ({lower:.2f}, {upper:.2f})")

The output shows the lower and upper bounds of the confidence interval. For this example data (mean 25.0, s ≈ 2.58), the interval is approximately (23.15, 26.85).
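As a cross-check, scipy.stats.sem returns the standard error directly (it uses ddof=1 by default), so the scale argument does not need to be assembled by hand. This sketch passes the confidence level positionally, which works whether the first parameter is named `confidence` or, in older SciPy releases, `alpha`.

```python
import numpy as np
from scipy import stats

data = [23, 25, 28, 22, 27, 24, 26, 29, 21, 25]

se = stats.sem(data)   # equals np.std(data, ddof=1) / np.sqrt(len(data))
lower, upper = stats.t.interval(0.95, len(data) - 1, loc=np.mean(data), scale=se)
print(f"({lower:.2f}, {upper:.2f})")  # → (23.15, 26.85)
```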

Worked Example

Let's calculate a 95% confidence interval for the following test scores: 82, 85, 78, 90, 88, 84, 79, 86, 81, 83.

Step 1: Calculate Sample Statistics

Sample mean (x̄) = 83.6
Sample standard deviation (s) ≈ 3.81
Sample size (n) = 10
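These sample statistics can be computed with NumPy; note that ddof=1 is needed to get the sample (n - 1) standard deviation rather than the population one.

```python
import numpy as np

scores = [82, 85, 78, 90, 88, 84, 79, 86, 81, 83]

sample_mean = np.mean(scores)
sample_std = np.std(scores, ddof=1)   # sample standard deviation (divides by n - 1)
n = len(scores)
print(f"mean = {sample_mean:.2f}, s = {sample_std:.2f}, n = {n}")
# → mean = 83.60, s = 3.81, n = 10
```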

Step 2: Find Critical t-value

For n=10, degrees of freedom (df) = 9. The critical t-value for 95% CI is approximately 2.262.
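Instead of a printed t-table, the critical value can be looked up with the t-distribution's percent-point (inverse CDF) function. For a two-sided 95% interval, 2.5% goes in each tail, so the 97.5th percentile is used.

```python
from scipy import stats

df = 10 - 1                          # degrees of freedom for n = 10
t_crit = stats.t.ppf(0.975, df)      # 97.5th percentile of t with 9 df
print(f"t = {t_crit:.3f}")           # → t = 2.262
```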

Step 3: Calculate Standard Error

SE = s / √n = 3.81 / √10 ≈ 1.20

Step 4: Compute Confidence Interval

CI = x̄ ± t × SE = 83.6 ± 2.262 × 1.20 ≈ 83.6 ± 2.71 ≈ (80.9, 86.3)
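Steps 1 through 4 can be reproduced in a few lines, keeping full precision until the final rounding. This is a sketch of the same hand calculation, not a different method.

```python
import numpy as np
from scipy import stats

scores = [82, 85, 78, 90, 88, 84, 79, 86, 81, 83]
n = len(scores)
mean = np.mean(scores)                         # Step 1: sample mean
se = np.std(scores, ddof=1) / np.sqrt(n)       # Step 3: standard error
t_crit = stats.t.ppf(0.975, df=n - 1)          # Step 2: critical t-value
lower, upper = mean - t_crit * se, mean + t_crit * se   # Step 4
print(f"SE = {se:.2f}, CI = ({lower:.1f}, {upper:.1f})")
# → SE = 1.20, CI = (80.9, 86.3)
```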

In the repeated-sampling sense described above, we're 95% confident the true population mean test score is between 80.9 and 86.3.

Interpreting Results

When you calculate a 95% confidence interval, the 95% describes the long-run behavior of the procedure, not a probability about the one interval you computed. Key points to remember:

The 95% is not the probability that the true value lies inside your particular interval; once computed, the interval either contains it or it does not
If you took many samples and calculated 95% CIs, about 95% would contain the true value
A narrower interval indicates more precise estimation
Wider intervals occur with smaller sample sizes or higher variability
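The effect of sample size on width can be seen by holding the standard deviation fixed and varying n. The value s = 3.8 is an assumption chosen to resemble the worked example; the width is 2 × t × s / √n.

```python
import numpy as np
from scipy import stats

s = 3.8  # assumed sample standard deviation, held fixed across sample sizes
widths = {}
for n in (10, 40, 160):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    widths[n] = 2 * t_crit * s / np.sqrt(n)   # full width = 2 × t × SE
    print(f"n = {n:>3}: 95% CI width ≈ {widths[n]:.2f}")
```

Quadrupling n halves the standard error; at small n the width shrinks by slightly more than half because the critical t-value also decreases.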

Practical Tip: Always report the sample size when sharing confidence intervals, as it affects the precision of your estimate.

Common Mistakes to Avoid

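Using the wrong distribution: When the population standard deviation is estimated from the sample, use the t-distribution, especially for small samples (n < 30). For large samples, the normal distribution gives nearly identical results.
Incorrect degrees of freedom: For a one-sample t-interval for the mean, df = n - 1 (and np.std needs ddof=1 to return the sample standard deviation).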
Misinterpreting the confidence level: A 95% CI doesn't mean there's a 95% chance the true value is in the interval.
Ignoring sample size: Smaller samples will naturally produce wider confidence intervals.
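Two of these mistakes can be compared side by side on the sample data from the Python code example. This sketch contrasts the correct t-interval with (1) a normal critical value on a small sample and (2) np.std with its default ddof=0, which computes the population standard deviation.

```python
import numpy as np
from scipy import stats

data = [23, 25, 28, 22, 27, 24, 26, 29, 21, 25]
n = len(data)
mean = np.mean(data)

# Correct: sample SD (ddof=1) and a t critical value with n - 1 df
se_ok = np.std(data, ddof=1) / np.sqrt(n)
correct = stats.t.interval(0.95, n - 1, loc=mean, scale=se_ok)

# Mistake 1: normal critical value on a small sample
wrong_dist = stats.norm.interval(0.95, loc=mean, scale=se_ok)

# Mistake 2: np.std defaults to ddof=0, which understates the standard error
se_bad = np.std(data) / np.sqrt(n)
wrong_ddof = stats.t.interval(0.95, n - 1, loc=mean, scale=se_bad)

for label, (lo, hi) in [("correct", correct),
                        ("normal instead of t", wrong_dist),
                        ("ddof=0", wrong_ddof)]:
    print(f"{label:>20}: ({lo:.2f}, {hi:.2f})")
```

Both mistakes produce intervals narrower than the correct one, overstating the precision of the estimate.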

FAQ

What does a 95% confidence interval mean?: It means that if we took many samples and calculated a 95% CI for each, about 95% of these intervals would contain the true population parameter.
How do I choose between 90%, 95%, and 99% confidence levels?: Higher confidence levels (99%) give wider intervals, while lower levels (90%) give narrower intervals. Choose based on your desired precision and risk tolerance.
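Can I use this method for proportions?: Not directly. For a sample proportion p̂ = k/n, the usual large-sample (Wald) interval uses the normal distribution with SE = √(p̂(1 - p̂)/n), which you can compute with scipy.stats.norm.interval(). For small samples or proportions near 0 or 1, the Wilson or exact (Clopper-Pearson) intervals from scipy.stats.binomtest(k, n).proportion_ci() are more reliable.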
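What if my data isn't normally distributed?: With larger samples, the t-interval for the mean tolerates moderate non-normality because of the central limit theorem. For small or strongly skewed samples, consider bootstrapping methods or transformations toward normality before calculating confidence intervals.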
How do I report confidence intervals in a paper?: Use the format: "The 95% confidence interval was (lower bound, upper bound) based on a sample size of n."
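The FAQ answers on confidence levels, proportions, and non-normal data can be illustrated with one short sketch. It reuses the sample data from the Python code example, uses hypothetical proportion counts (42 successes in 120 trials) purely for illustration, and implements a simple percentile bootstrap, which is only one of several bootstrap variants (SciPy also offers scipy.stats.bootstrap). scipy.stats.binomtest requires SciPy 1.7 or later.

```python
import numpy as np
from scipy import stats

data = np.array([23, 25, 28, 22, 27, 24, 26, 29, 21, 25], dtype=float)
n = data.size
mean, se = data.mean(), stats.sem(data)   # stats.sem uses ddof=1 by default

# 1) Confidence level: higher levels give wider t-intervals for the same data
level_ci = {}
for level in (0.90, 0.95, 0.99):
    level_ci[level] = stats.t.interval(level, n - 1, loc=mean, scale=se)
    lo, hi = level_ci[level]
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")

# 2) Proportion with hypothetical counts: 42 successes in 120 trials
k, m = 42, 120
p_hat = k / m
wald = stats.norm.interval(0.95, loc=p_hat, scale=np.sqrt(p_hat * (1 - p_hat) / m))
wilson = stats.binomtest(k, m).proportion_ci(confidence_level=0.95, method="wilson")
print(f"Wald:   ({wald[0]:.3f}, {wald[1]:.3f})")
print(f"Wilson: ({wilson.low:.3f}, {wilson.high:.3f})")

# 3) Percentile bootstrap for the mean: resample the data with replacement
rng = np.random.default_rng(0)
boot_means = rng.choice(data, size=(10_000, n), replace=True).mean(axis=1)
boot_lo, boot_hi = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap 95% CI: ({boot_lo:.2f}, {boot_hi:.2f})")
```

Whichever interval you use, report it in the format above along with the method, for example "t-interval" or "percentile bootstrap".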