What is a correlation coefficient?

The data depicted in figures 1–4 were simulated from a bivariate normal distribution of 500 observations with means 2 and 3 for the variables x and y respectively. Scatter plots were generated for the correlations 0.2, 0.5, 0.8 and −0.8. The correlation coefficient describes how one variable moves in relation to another. A positive correlation indicates that the two move in the same direction, with a value of 1 denoting a perfect positive correlation. A value of -1 shows a perfect negative, or inverse, correlation, while zero means no linear correlation exists. Scatterplots may be more useful when analyzing more complex data that might have changing relationships.
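A simulation of this kind can be sketched in a few lines. The example below is only an illustration, not the procedure used to produce the original figures: it assumes unit variances for both variables (the text gives only the means), uses an arbitrary random seed, and the function names are hypothetical.

```python
import math
import random

def simulate_bivariate_normal(n, mean_x, mean_y, rho, seed=0):
    """Draw n (x, y) pairs with unit variances and population correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(mean_x + z1)
        # Mixing the two independent normals this way gives corr(x, y) = rho.
        ys.append(mean_y + rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Same settings as in the text: 500 observations, means 2 and 3.
for rho in (0.2, 0.5, 0.8, -0.8):
    xs, ys = simulate_bivariate_normal(500, 2.0, 3.0, rho)
    print(f"target rho = {rho:+.1f}, sample r = {pearson_r(xs, ys):+.3f}")
```

The sample coefficient will differ somewhat from the target value because each data set is a finite random sample.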

I would like to thank Dr. Sarah White, PhD, for her comments throughout the development of this article and Nynke R. Van den Broek, PhD, FRCOG, DFFP, DTM&H, for allowing me to use a subset of her data for illustrations. A confidence interval for a correlation is typically computed on the Fisher-transformed scale; the inverse Fisher transformation brings the interval back to the correlation scale. A p-value is a measure of probability used for hypothesis testing.
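As a rough sketch of how the Fisher transformation is used for an interval, the function below applies the usual large-sample approximation: transform r with atanh, take a standard error of 1/sqrt(n - 3), and map the endpoints back with tanh. The function name, the 95% level and the example values r = 0.5, n = 50 are assumptions chosen for illustration.

```python
import math

def fisher_confidence_interval(r, n, z_crit=1.959964):
    """Approximate CI for a Pearson correlation (default z_crit gives about 95%)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error on the z scale
    lower, upper = z - z_crit * se, z + z_crit * se
    # The inverse transformation brings the interval back to the correlation scale.
    return math.tanh(lower), math.tanh(upper)

low, high = fisher_confidence_interval(0.5, 50)
print(f"r = 0.50, n = 50, approximate 95% CI = ({low:.3f}, {high:.3f})")
```

The resulting interval is not symmetric around r, which is expected because the correlation scale is bounded at -1 and 1.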

Data Science – Statistics Correlation

The Pearson and Spearman correlations (Bonett & Wright, 2000) between the collected bioreactor features and the cardiomyocyte content were calculated. The Pearson correlation measures the strength of the linear relationship between two variables. It takes a value between -1 and 1, with -1 meaning a total negative linear correlation, 0 meaning no linear correlation, and +1 meaning a total positive linear correlation. The Spearman correlation measures the strength of a monotonic relationship between two variables, on the same -1 to +1 scale as the Pearson correlation.
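To make the difference concrete, here is a minimal sketch that computes both coefficients from scratch, treating Spearman's coefficient as the Pearson correlation of the ranks (ties receive average ranks). The data are made up: y is the cube of x, so the relationship is monotonic but not linear. The helper names are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic, but not linear

print(f"Pearson  r   = {pearson_r(x, y):.4f}")
print(f"Spearman rho = {spearman_rho(x, y):.4f}")
```

Spearman's coefficient is 1 here because the ordering of y matches the ordering of x exactly, while Pearson's coefficient falls below 1 because the points do not lie on a straight line.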

The assumptions of the Spearman correlation are that the data must be at least ordinal and that the scores on one variable must be monotonically related to the other variable. The correlation coefficient does not show what proportion of the variation in the dependent variable is attributable to the independent variable. That is shown by the coefficient of determination, also known as R-squared, which in a simple linear regression with one predictor is the correlation coefficient squared.
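The link between the correlation coefficient and R-squared can be checked numerically. The sketch below fits an ordinary least-squares line, computes R-squared as 1 - SS_res / SS_tot, and compares it with r squared. The data (hours of homework and number of exam passes) are invented for illustration, and the equality holds for a straight-line fit with a single predictor and an intercept.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data, not taken from any study.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
passes = [1, 2, 2, 4, 5, 6]

r = pearson_r(hours, passes)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 from fit = {r_squared_from_fit(hours, passes):.4f}")
```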

Derived forms of correlation

Financial spreadsheets and software can calculate the value of correlation quickly. A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health, or does good health lead to good mood, or both? In other words, a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what the causal relationship, if any, might be. Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.

  • A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another.
  • For example, the number of pizza slices eaten by your friends has a strong positive correlation with the amount of soda your friends will drink.
  • In contrast with the correlation value, which must be between −1 and 1, the covariance may assume any numerical value.
  • Furthermore, we compare Kendall’s correlation, Kendall’s coefficient of concordance, and the kappa tests.
  • In other words, since more people like to buy ice cream when it’s hot outdoors, the company’s overall ice cream sales tend to be greater when it’s hotter outside.
  • A few years ago a survey of employees found a strong positive correlation between “Studying an external course” and Sick Days.
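The contrast between covariance and correlation noted in the list above can be seen by changing the units of one variable. In this sketch the correlation is computed as the covariance divided by the product of the two standard deviations; the age and height values are hypothetical.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical ages (years) and heights of five children.
ages = [5, 7, 8, 10, 12]
heights_m = [1.10, 1.25, 1.32, 1.41, 1.55]
heights_cm = [100 * h for h in heights_m]  # same heights, different unit

print("covariance  (m): ", covariance(ages, heights_m))
print("covariance  (cm):", covariance(ages, heights_cm))
print("correlation (m): ", correlation(ages, heights_m))
print("correlation (cm):", correlation(ages, heights_cm))
```

Switching from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, which is why the correlation is easier to compare across data sets.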

Typically, negatively correlated data sets are seen as a line that goes down and to the right on a scatter plot. When you look only at the orderings or ranks, all three relationships are perfect! The left and central plots show the observations where larger x values always correspond to larger y values. The right plot illustrates the opposite case, which is perfect negative rank correlation. Rank correlation compares the ranks or the orderings of the data related to two variables or dataset features.
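One way to see that rank correlation depends only on orderings is Kendall's tau, which counts pairs of observations that are ordered the same way (concordant) or the opposite way (discordant) in both variables. The sketch below implements the simple tau-a variant, in which tied pairs count as neither; the three small data sets are invented to mimic a linear increase, a curved increase and a decrease.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y contribute to neither count.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
linear_up = [2, 4, 6, 8, 10]
curved_up = [1, 4, 9, 16, 25]
going_down = [10, 7, 5, 2, 1]

for label, y in (("linear up", linear_up), ("curved up", curved_up), ("down", going_down)):
    print(f"{label:10s} tau = {kendall_tau_a(x, y):+.1f}")
```

Both increasing data sets have every pair concordant and the decreasing one has every pair discordant, regardless of whether the points lie on a straight line.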

Example: NumPy Correlation Calculation

An illusory correlation is the perception of a relationship between two variables when only a minor relationship, or none at all, actually exists. Illusory correlations can occur both in scientific investigations and in real-world situations. An illusory correlation does not always mean inferring causation; it can also mean inferring a relationship between two variables when one does not exist.

  • You can add some text and conditional formatting to clean up the result.
  • For instance, suppose research revealed a link between the amount of time students spend on their homework (from half an hour to three hours) and the number of G.C.S.E. passes (1 to 6).
  • Maternal age is continuous and usually skewed while parity is ordinal and skewed.
  • If two variables are negatively correlated, a decreasing straight line may be drawn.
  • Correlation is tightly connected to other statistical quantities like the mean, standard deviation, variance, and covariance.
  • If two lists of data have a Pearson correlation of 1 or of −1, this implies that one set of the data is redundant.

The x-axis of the scatterplot represents one of the variables being tested, while the y-axis of the scatter plot represents the other. Test alternative hypotheses for positive, negative, and nonzero correlation between the columns of two matrices. Compare values of the correlation coefficient and p-value in each case.
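The idea of testing positive, negative and nonzero alternatives can be sketched with a permutation test. This is a substitute technique chosen because it needs only the standard library; it is not the method used by any particular statistics package, and it handles one pair of variables rather than the columns of two matrices. The function names, the number of permutations, the seed and the data are all assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, alternative="two-sided", n_perm=2000, seed=0):
    """Estimate a p-value for 'no association' by repeatedly shuffling ys."""
    if alternative not in ("two-sided", "greater", "less"):
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        r = pearson_r(xs, shuffled)
        if alternative == "greater" and r >= observed:
            hits += 1
        elif alternative == "less" and r <= observed:
            hits += 1
        elif alternative == "two-sided" and abs(r) >= abs(observed):
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical, strongly positively related data.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.3, 7.9, 9.2, 9.8, 11.1]

print(f"r = {pearson_r(x, y):.3f}")
for alt in ("greater", "less", "two-sided"):
    print(f"alternative = {alt:9s} p = {permutation_p_value(x, y, alt):.4f}")
```

For data like these, the "greater" and "two-sided" alternatives give small p-values, while the "less" alternative gives a p-value close to 1, since the observed correlation is strongly positive.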


The correlation coefficient between historical returns can indicate whether adding an investment to a portfolio will improve its diversification. Assessments of correlation strength based on the correlation coefficient value vary by application. In physics and chemistry, a correlation coefficient should be lower than -0.9 or higher than 0.9 for the correlation to be considered meaningful, while in the social sciences the threshold could be as lenient as -0.5 or 0.5. Unsystematic risk is specific to a company, industry, or asset class. Investing in different assets can reduce your portfolio’s correlation and reduce your exposure to unsystematic risk. Investment managers, traders, and analysts find it very important to calculate correlation because the risk reduction benefits of diversification rely on this statistic.
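The dependence of diversification on correlation can be illustrated with the standard formula for the variance of a two-asset portfolio, w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2. The sketch below evaluates it for several correlations; the equal weights and the 20% volatilities are hypothetical numbers, not market data.

```python
import math

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a portfolio holding weight w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Equal weights, both assets with 20% volatility (illustrative values).
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = two_asset_volatility(0.5, 0.20, 0.20, rho)
    print(f"rho = {rho:+.1f} -> portfolio volatility = {vol:.4f}")
```

With perfectly correlated assets there is no risk reduction at all, and the portfolio volatility falls as the correlation decreases.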


Two objects that correlate inversely (i.e., one falling when the other rises) would have a Pearson score near −1. Pearson’s correlation test measures relations between two continuous variables. We discuss the application of different types of correlation in this chapter. We also discuss the difference between correlation and concordance.

The Pearson correlation coefficient (r) denotes the linear relationship between two variables x and y, and its value must be between -1 and +1. If Pearson’s r is negative, then the relationship is also negative, and if it is positive, the relationship is positive. Correlation measures the relationship, or association, between two variables by looking at how the variables change with respect to each other. Statistical correlation also corresponds to simultaneous changes between two variables, and it is usually represented by linear relationships. Correlation does not establish causation: it describes how two or more variables are related, not whether they cause changes in one another.

  • A low correlation coefficient indicates a weak linear relationship between the two variables; they may be unrelated, or related in a way that is not linear.
  • The Pearson correlation of a with b is 1 because the values of b are simply double the values of a; hence the values in a and b correlate perfectly with one another.
  • Again, the first row of xy represents one feature, while the second row represents the other.
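The arrays referred to in the list above are not shown in the text, so the following NumPy sketch uses stand-in values: `a` is an arbitrary array, `b` is exactly double `a`, and `xy` stacks two features as rows. The array `c` is a hypothetical second feature added only to give `xy` something other than a perfect correlation. `np.corrcoef` returns the full correlation matrix, so the coefficient for a pair sits off the diagonal.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = 2 * a                                   # b is simply double a
c = np.array([5.0, 3.0, 4.0, 1.0, 2.0])     # hypothetical second feature

# Correlation between two 1-D arrays: take the off-diagonal entry.
r_ab = np.corrcoef(a, b)[0, 1]
print("corr(a, b) =", r_ab)

# A 2-D array is treated row-wise by default: each row is one feature.
xy = np.array([a, c])
corr_matrix = np.corrcoef(xy)
print(corr_matrix)
```

The diagonal of the matrix is always 1 because each feature correlates perfectly with itself.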

For example, a trader might use historical correlations to predict whether a company’s shares will rise or fall in response to a change in interest rates or commodity prices. Similarly, a portfolio manager might aim to reduce their risk by ensuring that the individual assets within their portfolio are not overly correlated with one another. Put option contracts become more profitable when the underlying stock price decreases. In other words, as the stock price increases, the put option prices go down, which is a direct and high-magnitude negative correlation. Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other.
