How to Create a Correlation Matrix using Pandas

datatofish.com/correlation-matrix-pandas

In this short guide, I’ll show you how to create a Correlation Matrix using Pandas. I’ll also
review the steps to display the matrix using Seaborn and Matplotlib.

To start, here is a template that you can apply in order to create a correlation matrix using
pandas:

df.corr()

Step 1: Collect the Data


For example, I collected the following data about 3 variables:

A    B    C
45   38   10
37   31   15
42   26   17
35   28   21
39   33   12
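If the data lives in a CSV file rather than being typed in by hand, it could be loaded along these lines. This is only a sketch: the CSV text and the name df_from_csv are made up for illustration, and io.StringIO stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content holding the same three variables as the table above.
csv_text = """A,B,C
45,38,10
37,31,15
42,26,17
35,28,21
39,33,12
"""

# read_csv accepts a file path or any file-like object;
# StringIO is used here only to keep the example self-contained.
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

With a real file, the StringIO object would be replaced by the path to that file.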

Step 2: Create a DataFrame using Pandas


Next, create a DataFrame in order to capture the above dataset in Python:

import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

Once you run the code, you’ll get the following DataFrame:

    A   B   C
0  45  38  10
1  37  31  15
2  42  26  17
3  35  28  21
4  39  33  12

Step 3: Create a Correlation Matrix using Pandas


Now, create a correlation matrix using this template:

df.corr()
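By default, df.corr() computes the Pearson correlation coefficient. The method parameter lets you choose a different measure, such as Spearman rank correlation. The sketch below rebuilds the example data from this guide and compares the two; the variable names pearson, pearson_explicit and spearman are chosen here for illustration only.

```python
import pandas as pd

# Same example data as used throughout this guide.
df = pd.DataFrame({'A': [45, 37, 42, 35, 39],
                   'B': [38, 31, 26, 28, 33],
                   'C': [10, 15, 17, 21, 12]})

# Pearson is the default, so these two calls return the same matrix.
pearson = df.corr()
pearson_explicit = df.corr(method='pearson')

# Spearman correlates the ranks of the values rather than the values themselves.
spearman = df.corr(method='spearman')

print(pearson.round(3))
print(spearman.round(3))
```

Pearson measures linear association, while Spearman only looks at the ordering of the values, so the two matrices can differ noticeably on a small dataset like this one.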

This is the complete Python code that you can use to create the correlation matrix for our
example:

import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data, columns=['A', 'B', 'C'])

corrMatrix = df.corr()
print(corrMatrix)

Run the code in Python, and you’ll get the following matrix:

          A         B         C
A  1.000000  0.518457 -0.701886
B  0.518457  1.000000 -0.860941
C -0.701886 -0.860941  1.000000
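To see what a single entry of the matrix represents, you can recompute one coefficient by hand. The sketch below applies the standard Pearson formula to columns A and B using NumPy and compares the result with the value from df.corr(). The names da, db, r_manual and r_pandas are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same example data as used throughout this guide.
df = pd.DataFrame({'A': [45, 37, 42, 35, 39],
                   'B': [38, 31, 26, 28, 33],
                   'C': [10, 15, 17, 21, 12]})

a = df['A'].to_numpy(dtype=float)
b = df['B'].to_numpy(dtype=float)

# Deviations of each value from its column mean.
da = a - a.mean()
db = b - b.mean()

# Pearson r = sum(da * db) / sqrt(sum(da^2) * sum(db^2))
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

# The same coefficient as reported by pandas.
r_pandas = df.corr().loc['A', 'B']

print(round(r_manual, 6), round(r_pandas, 6))
```

The two numbers should agree up to floating-point rounding. The diagonal of the matrix is always 1 because each variable is perfectly correlated with itself.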

Step 4 (optional): Get a Visual Representation of the Correlation Matrix using Seaborn and Matplotlib


You can use the seaborn and matplotlib packages in order to get a visual representation of
the correlation matrix.

First import the seaborn and matplotlib packages:

import seaborn as sn
import matplotlib.pyplot as plt

Then, add the following syntax at the bottom of the code:

sn.heatmap(corrMatrix, annot=True)
plt.show()

So the complete Python code would look like this:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data, columns=['A', 'B', 'C'])

corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()

Run the code, and you’ll get the correlation matrix displayed as a heatmap, with each cell annotated with its correlation coefficient.
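A few optional heatmap arguments can make the chart easier to read. The sketch below fixes the color scale to the full correlation range of -1 to 1, uses a diverging colormap, and rounds the annotations to two decimals. The title text, the output filename correlation_heatmap.png and the choice of the "coolwarm" colormap are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

# Same example data as used throughout this guide.
df = pd.DataFrame({'A': [45, 37, 42, 35, 39],
                   'B': [38, 31, 26, 28, 33],
                   'C': [10, 15, 17, 21, 12]})
corrMatrix = df.corr()

fig, ax = plt.subplots()

# vmin/vmax pin the color scale to [-1, 1], so 0 sits at the middle of the colormap;
# fmt controls how the annotated coefficients are formatted.
sn.heatmap(corrMatrix, annot=True, fmt=".2f",
           vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation matrix")

# Hypothetical output filename; the figure is written to the current directory.
fig.savefig("correlation_heatmap.png")
```

To view the chart in a window instead of saving it, drop the matplotlib.use("Agg") line and call plt.show() at the end, as in the code above.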
