General Instructions:
1. Provide appropriate comments in your code.
2. Perform all tasks programmatically using Python libraries.
‘XYZ’ hardware service center specializes in servicing CPUs. The center maintains details of the CPUs
it has serviced, which are available in “machine_data.csv”. The dataset has 209 rows
and 9 columns. [Source of raw dataset: UCI repository] The details of the columns are as follows:
Based on this data, ‘XYZ’ hardware service center would like to build a predictive model that predicts the
performance score of new CPUs.
As a data science expert, you are expected to build the best model for the given scenario.
Problem statement:
3. Select ‘score’ as the target variable to be predicted and the remaining features as predictors.
4. Split the data into training and testing sets in an 80:20 ratio.
5. As part of model building, perform the following activities:
a. Based on the training data, build a Linear Regression model.
b. Find the train and test scores for the built model.
c. Calculate the adjusted R-Squared values on both the train and the test data.
6. Calculate the VIF values for all the features considered while building the model using the train data.
7. Based on the model built, predict the performance score of a new test sample (a new CPU instance),
which is given below:
(Note: In the above hardware instance, the vendor value '14' is the label encoded value for vendor
'harris'.)
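
As a rough illustration of tasks 4 to 7, the sketch below runs the same sequence of steps (80:20 split, least-squares linear regression, R-squared and adjusted R-squared on both sets, VIF on the training predictors, and a single-row prediction) using NumPy only. Everything in it is an assumption made for illustration: the data are synthetic stand-ins rather than “machine_data.csv”, the three predictors and the new row are made up, and the helper names (fit_ols, predict, r2_score, adjusted_r2, vif) are hypothetical. In practice the same steps are commonly done with scikit-learn's train_test_split and LinearRegression and with statsmodels' variance_inflation_factor.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a 2-D predictor array."""
    return coef[0] + X @ coef[1:]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    values = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        coef = fit_ols(others, X[:, j])
        values.append(1.0 / (1.0 - r2_score(X[:, j], predict(coef, others))))
    return np.array(values)

# Synthetic stand-in data (NOT machine_data.csv): 209 rows, 3 made-up predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(209, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=209)  # deliberately collinear with column 0
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=209)

# Task 4: 80:20 split using a shuffled index.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train, test = idx[:n_train], idx[n_train:]

# Task 5a: fit on the training data only.
coef = fit_ols(X[train], y[train])

# Task 5b: train and test scores (R^2).
train_r2 = r2_score(y[train], predict(coef, X[train]))
test_r2 = r2_score(y[test], predict(coef, X[test]))

# Task 5c: adjusted R^2 on both sets, using each set's own sample count.
p = X.shape[1]
train_adj_r2 = adjusted_r2(train_r2, len(train), p)
test_adj_r2 = adjusted_r2(test_r2, len(test), p)

# Task 6: VIF for each predictor, computed on the training data.
vif_values = vif(X[train])

# Task 7: predict for one new instance (hypothetical values, same column order as X).
new_row = np.array([[0.5, -1.0, 0.4]])
new_score = predict(coef, new_row)[0]

print(f"train R2={train_r2:.3f}  adj={train_adj_r2:.3f}")
print(f"test  R2={test_r2:.3f}  adj={test_adj_r2:.3f}")
print("VIF:", np.round(vif_values, 2))
print("predicted score for new row:", round(float(new_score), 3))
```

Two design points are worth noting. The adjusted R-squared uses the sample count of the set being scored, so the train and test values are penalized differently. The vif helper includes an intercept in each auxiliary regression; statsmodels' variance_inflation_factor does not add one itself, so results match only if a constant column is added to the design matrix first.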