Pearson Correlation Coefficient
A statistical measure that quantifies the linear correlation between two variables, ranging from -1 to +1.
The Pearson correlation coefficient (r) is a statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. Developed by Karl Pearson in the 1890s, building on ideas introduced by Francis Galton, it has become one of the most widely used tools in statistical analysis.
Mathematical Definition
The coefficient is calculated using the formula:
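r = Σ(x - μx)(y - μy) / (n × σx × σy)
Where:
- n is the number of paired observations
- μx and μy are the means of variables x and y
- σx and σy are the standard deviations of x and y (computed with n in the denominator)
- Σ represents the sum over all n pairs
Equivalently, r is the covariance of x and y divided by the product of their standard deviations.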
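As a minimal sketch of the definition, the Python function below divides the summed cross-products of deviations by the square root of the product of the summed squared deviations, which is algebraically the same as dividing the covariance by the product of the standard deviations. The name pearson_r and the sample data are illustrative assumptions, not part of any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed directly from the definition (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

In practice a tested library routine is usually preferable to a hand-written one; the sketch is meant only to make each term of the formula visible.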
Interpretation
The coefficient produces values between -1 and +1:
- +1 indicates perfect positive correlation
- 0 indicates no linear correlation
- -1 indicates perfect negative correlation
The strength of correlation is often interpreted using rule-of-thumb thresholds, which vary by field:
- |r| ≥ 0.7: Strong correlation
- 0.3 ≤ |r| < 0.7: Moderate correlation
- |r| < 0.3: Weak correlation
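The rule of thumb above can be written as a small helper. This is only a sketch: the function name correlation_strength is hypothetical, and it assumes that values falling exactly on 0.3 or 0.7 belong to the stronger category.

```python
def correlation_strength(r):
    """Label |r| using the 0.3 / 0.7 rule-of-thumb cut-offs (illustrative)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)  # strength ignores the direction of the relationship
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    return "weak"

labels = [correlation_strength(r) for r in (0.85, -0.45, 0.1)]
print(labels)
```

Such labels are conventions rather than properties of the data, so the cut-offs should be adjusted to whatever is customary in the field at hand.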
Key Properties
- Symmetry: The correlation between X and Y equals the correlation between Y and X
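- Invariance to scale and location: The coefficient is unchanged when either variable is shifted by a constant or multiplied by a positive constant; multiplying one variable by a negative constant flips the sign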
- Dimensionless: The measure has no units and is comparable across different datasets
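These properties can be checked numerically. The sketch below uses a compact, hand-written pearson_r helper (an illustrative assumption, not a library function) and made-up data to compare r before and after swapping the variables, rescaling one of them with a positive factor plus an offset, and multiplying one of them by a negative factor.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson's r from the definition (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)  # symmetry: swapping the arguments

# Positive scale plus offset (like Celsius to Fahrenheit) leaves r unchanged.
x_rescaled = [1.8 * v + 32.0 for v in x]
r_rescaled = pearson_r(x_rescaled, y)

# A negative scale factor flips the sign of r but not its magnitude.
x_negated = [-2.0 * v for v in x]
r_negated = pearson_r(x_negated, y)

print(math.isclose(r_xy, r_yx),
      math.isclose(r_xy, r_rescaled),
      math.isclose(r_xy, -r_negated))
```

Because r is built from deviations divided by standard deviations, any unit attached to x or y cancels out, which is why the measure is dimensionless.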
Limitations and Considerations
Several important caveats apply when using Pearson's correlation:
- It only measures linear relationships
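- It is sensitive to outliers, and a single extreme point can change the value substantially
- Standard significance tests and confidence intervals assume approximately bivariate normal data (or a large sample); the coefficient itself can be computed for any paired numeric data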
- Does not imply causation
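Two of these caveats are easy to demonstrate with made-up data. In the sketch below (pearson_r is again an illustrative hand-written helper), y is fully determined by x through a symmetric quadratic yet r comes out at essentially zero, and a single added outlier turns a perfect negative correlation into a strongly positive one.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson's r from the definition (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# 1) A purely nonlinear (quadratic) dependence that r does not detect.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_quad = [v * v for v in x_sym]
r_quadratic = pearson_r(x_sym, y_quad)

# 2) Outlier sensitivity: six points on a falling line, then one extreme point.
x_line = [1, 2, 3, 4, 5, 6]
y_line = [6, 5, 4, 3, 2, 1]
r_clean = pearson_r(x_line, y_line)
r_with_outlier = pearson_r(x_line + [20], y_line + [20])

print(r_quadratic, r_clean, r_with_outlier)
```

Both effects are visible at a glance in a scatter plot, which is one reason plotting the data before reporting r is generally recommended.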
Applications
The coefficient finds widespread use in:
- Scientific research
- Financial analysis
- Machine learning (feature selection)
- Quality control
- Pattern recognition
Alternative Measures
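When the assumptions behind Pearson's correlation are not met, alternatives include:
- Spearman's rank correlation (ρ), which measures monotonic rather than strictly linear association
- Kendall's tau (τ), a rank-based measure built on concordant and discordant pairs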
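One widely used rank-based alternative, Spearman's rank correlation, can be sketched as Pearson's r applied to the ranks of the data, with tied values sharing their average rank. The helper names average_ranks, pearson_r, and spearman_rho below are illustrative assumptions, and the data are made up.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson's r from the definition (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r of the rank-transformed data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Here the rank-based measure reaches its maximum because y always increases with x, while Pearson's r stays below 1 because the increase is not linear.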
Statistical Significance
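The significance of a correlation coefficient can be assessed using:
- A t-test of the null hypothesis of zero correlation, with t = r × √((n - 2) / (1 - r²)) and n - 2 degrees of freedom
- The p-value derived from that t statistic
- Confidence intervals, commonly obtained through the Fisher z-transformation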
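As a hedged sketch of two of these tools, the code below computes the usual t statistic for testing zero correlation, t = r × √((n - 2) / (1 - r²)) with n - 2 degrees of freedom, and an approximate confidence interval based on the Fisher z-transformation. The function names and the example values r = 0.5, n = 30 are illustrative assumptions; both procedures assume approximately bivariate normal data, and turning t into a p-value needs a Student's t distribution, which the Python standard library does not provide.

```python
import math
from statistics import NormalDist

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def fisher_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    if n < 4 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical example values: r = 0.5 observed in a sample of 30 pairs.
t = t_statistic(0.5, 30)
low, high = fisher_confidence_interval(0.5, 30)
print(round(t, 3), round(low, 3), round(high, 3))
```

The interval is not symmetric around r because the transformation back from the z scale compresses values near ±1.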
Best Practices
- Always visualize data using scatter plots
- Check for assumption violations
- Consider sample size requirements
- Report confidence intervals when possible
- Account for potential confounding variables
The Pearson correlation coefficient remains a cornerstone of modern statistical analysis, providing a standardized way to measure relationships between variables across diverse fields of study.