Steps:
Step 1: Import the relevant Python libraries for the analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from scipy import stats as st
import math
Step 2: Load the train and test dataset and clean the dataset
dirty_training_set = pd.read_csv('train.csv')
dirty_test_set = pd.read_csv('test.csv')
training_set = dirty_training_set.dropna()
test_set = dirty_test_set.dropna()
# len() gives the number of rows; DataFrame.size would count cells (rows * columns)
print("Rows before clean: ", len(dirty_training_set), "\n")
print("Rows after clean: ", len(training_set), "\n")
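# Assumption: the CSV has columns named 'x' and 'y'; adjust to the actual headers.
x_training_set = training_set['x'].to_numpy()
y_training_set = training_set['y'].to_numpy()
# Histograms of both variables, side by side
plt.subplot(1, 2, 1)
plt.title('X training set')
plt.hist(x_training_set)
plt.subplot(1, 2, 2)
plt.title('Y training set')
plt.hist(y_training_set)
plt.show()
# Box plots of both variables, side by side
plt.subplot(1, 2, 1)
plt.title('X training set')
plt.boxplot(x_training_set)
plt.subplot(1, 2, 2)
plt.title('Y training set')
plt.boxplot(y_training_set)
plt.show()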
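As a small illustration of the cleaning step, the sketch below applies dropna to a made-up two-column frame and compares the row count with DataFrame.size, which counts cells rather than rows. The values and the column names x and y are assumptions for illustration only, not the tutorial's data.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with one missing value; not the tutorial's data.
frame = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

# dropna() with default arguments drops every row containing at least one NaN.
cleaned = frame.dropna()

rows_before = len(frame)    # number of rows before cleaning
rows_after = len(cleaned)   # number of rows after cleaning
cells_before = frame.size   # rows * columns, not the row count

print("rows before:", rows_before)
print("rows after:", rows_after)
print("cells before:", cells_before)
```

When reporting how many rows the cleaning removed, len(frame) or frame.shape[0] is the quantity to print.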
Conclusion:
As expected, the fit is very good.
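A claim of a good fit can be checked numerically. The following is a minimal sketch, not the tutorial's own code: it fits scikit-learn's LinearRegression to synthetic data and reports the coefficient of determination R^2. The generated x and y values, the true slope of 2, the intercept of 1, and the noise level are all assumptions standing in for the cleaned training set.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for the cleaned training data (not the tutorial's CSV).
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = x.reshape(-1, 1)

model = linear_model.LinearRegression()
model.fit(X, y)

slope = float(model.coef_[0])
intercept = float(model.intercept_)
# score() returns R^2; here it is evaluated on the same data used for fitting.
r_squared = model.score(X, y)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.4f}")
```

An R^2 close to 1 on held-out data (for example, the cleaned test set) is stronger evidence of a good fit than the same score on the training data alone.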