A statistical measure that quantifies the linear correlation between two variables, ranging from -1 to +1.

Pearson Correlation Coefficient

The Pearson correlation coefficient (r) is a fundamental statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. Developed by Karl Pearson in the early 1900s, it has become one of the most widely used tools in statistical analysis.

Mathematical Definition

The coefficient is calculated using the formula:

r = Σ(x - μx)(y - μy) / (σx × σy)

Where:

μx and μy are the means of variables x and y
σx and σy are the standard deviations
Σ represents the sum

Interpretation

The coefficient produces values between -1 and +1:

+1 indicates perfect positive correlation
0 indicates no linear correlation
-1 indicates perfect negative correlation

The strength of correlation can be generally interpreted as:

|r| > 0.7: Strong correlation
0.3 < |r| < 0.7: Moderate correlation
|r| < 0.3: Weak correlation

Key Properties

Symmetry: The correlation between X and Y equals the correlation between Y and X
Scale invariance: The coefficient is unchanged by linear transformations
Dimensionless: The measure has no units and is comparable across different datasets

Limitations and Considerations

Several important caveats apply when using Pearson's correlation:

It only measures linear relationships
Sensitive to outliers
Requires normal distribution for accurate inference
Does not imply causation

Applications

The coefficient finds widespread use in:

Alternative Measures

When assumptions for Pearson's correlation aren't met, alternatives include:

Statistical Significance

The significance of a correlation coefficient can be tested using:

Best Practices

Always visualize data using scatter plots
Check for assumption violations
Consider sample size requirements
Report confidence intervals when possible
Account for potential confounding variables

The Pearson correlation coefficient remains a cornerstone of modern statistical analysis, providing a standardized way to measure relationships between variables across diverse fields of study.