DATA ANALYTICS & BUSINESS INTELLIGENCE

Multiple Linear Regression

For this tutorial, you will need to download the Toyota Corolla CSV file from the Week 5 section on
Canvas and save it to a convenient location.

Open R and launch Rattle.

1. Load the Toyota Corolla data set.


2. Ensure that Id is recognised as an Ident variable, that Price is the Target variable and that all the
others are Input variables. Ignore Cylinders (all the cars have the same number).
3. Untick Partition and click on Execute.
Click on Explore. Select Summary as the Type and select Summary in the list of options. Click
on Execute.

Scroll through the output you obtained. How many different models are there? How many different
colours?
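As a rough illustration of what the Summary output reports for categorical variables, the sketch below counts the distinct levels of two columns. The rows are made up, and the column names `Model` and `Color` are assumptions about the file layout, not taken from the actual data set.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in rows; the real CSV has its own columns and values.
SAMPLE_CSV = """Id,Model,Color,Price
1,Corolla A,Blue,13500
2,Corolla B,Silver,13750
3,Corolla A,Blue,13950
4,Corolla C,Red,14950
5,Corolla B,Blue,13750
"""

def count_levels(rows, column):
    """Return a Counter mapping each distinct value in `column` to its frequency."""
    return Counter(row[column] for row in rows)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
model_counts = count_levels(rows, "Model")
colour_counts = count_levels(rows, "Color")

print("distinct models:", len(model_counts))
print("distinct colours:", len(colour_counts))
print("most common colour:", colour_counts.most_common(1)[0])
```

The number of keys in each counter answers "how many different values are there?", and `most_common(1)` gives the most frequent category.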
Now select Distributions as the Type. Here you will be able to produce visual representations of
the variables. Select one type for each variable. Histograms or boxplots are most useful for numeric
variables. Bar plots are most useful for categorical variables. Ensure that Advanced Graphics is
unticked under Settings and do NOT install the package “doBy”. Click on Execute.

Describe the location, spread and shape of the distributions of each of the numeric variables. What
is the most common category for each of the categorical variables? Why do you think we selected
Histogram for Met_Color and Automatic?
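To make "location, spread and shape" concrete, here is a minimal sketch that computes one measure of each for a single numeric variable. The prices are invented for illustration. Skewness is computed with the simple population moment formula, which is only one of several conventions.

```python
import statistics

def describe(values):
    """Summarise location, spread and shape of one numeric variable."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    # Moment-based skewness: positive means a longer right tail.
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    return {"mean": mean, "median": median, "sd": sd,
            "iqr": q3 - q1, "skewness": skew}

# Hypothetical prices with one expensive car pulling the right tail out.
prices = [9000, 9500, 10000, 10500, 11000, 12000, 20000]
summary = describe(prices)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A mean noticeably above the median, together with positive skewness, is the numeric counterpart of a histogram with a long right tail.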
Now select Correlation as the Type. Ensure that Pearson is selected in the drop-down menu, and
select Ordered. This orders the variables by the strength of their correlation. Ensure Advanced
Graphics is unticked under Settings. Click on Execute. You will be prompted to install the package
“ellipse”; click Yes.

The correlations between numeric variables are printed out, and a visual representation of the
correlation matrix is also produced. Which three numeric variables are most highly correlated?
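The Pearson correlation behind this output can be sketched in a few lines. The example below computes it for every pair of variables and sorts the pairs by absolute size, which is roughly the idea behind an ordered correlation display. The variable names `Age` and `Doors` and all the values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values; the real data set has many more rows and columns.
data = {
    "Price": [10.0, 12.0, 14.0, 16.0, 18.0],
    "Age": [60, 50, 45, 30, 20],
    "Doors": [3, 5, 3, 5, 3],
}

# Correlation for every pair of variables, strongest (by absolute value) first.
pairs = [(a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)]
ranked = sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
for a, b, r in ranked:
    print(f"{a:>5} vs {b:<5} r = {r:+.3f}")
```

Sorting by absolute value matters because a strong negative correlation is just as useful for prediction as a strong positive one.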

Note: If you decide to transform any variables, you can do so using the Transform tab and should
do so before running the model. Rescale and Cleanup are the most useful options.
Select the Model tab and select Linear as the Type. Click on Execute.

What a disaster! With so many input variables, and a separate coefficient for almost every level of
each categorical variable, a very complicated model has been fitted.

Go back to the Data tab and set every variable to Ignore except for the three that you found to be
most highly correlated. Click on Execute. Return to the Model tab and have another go.

What is the predictive model for Price that Rattle produces? Write down the equation.
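The equation has the form Price = b0 + b1*x1 + b2*x2 + ..., where b0 is the intercept and each remaining coefficient multiplies one input variable. As a minimal sketch of how such coefficients can be estimated, the code below fits ordinary least squares by solving the normal equations. The predictors `x1` and `x2` and the response values are invented, and were generated from a known equation so that the result is easy to check. This illustrates the technique only and is not a description of Rattle's internals.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(columns, y):
    """Least-squares fit; returns [intercept, b1, b2, ...] for the given predictor columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 10 + 2*x1 - 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [6, 11, 4, 9, 2]

coef = fit_linear([x1, x2], y)
print("y = {:.2f} {:+.2f}*x1 {:+.2f}*x2".format(*coef))
```

With real data the points will not lie exactly on a plane, so the fitted coefficients are the values that minimise the sum of squared differences between observed and predicted prices.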

When you have a well-behaved predictive model, select the Evaluate tab. Under Type, tick Pr v
Ob (predicted versus observed). Click Execute. What do you see?
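A plot of predicted against observed values can be complemented by a single number. The sketch below computes R-squared, the proportion of variation in the observed values that the predictions account for. Both lists are hypothetical. If the predictions were perfect, every point would lie on the diagonal and R-squared would be 1.

```python
def r_squared(observed, predicted):
    """1 - SSE/SST: share of the observed variation captured by the predictions."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total
    return 1 - sse / sst

# Hypothetical observed prices (in thousands) and model predictions.
observed = [10, 12, 14, 16, 18]
predicted = [11, 12, 13, 16, 18]

print(f"R-squared: {r_squared(observed, predicted):.3f}")
for o, p in zip(observed, predicted):
    print(f"observed {o:>3}  predicted {p:>3}  residual {o - p:+d}")
```

Predicting the mean for every car gives an R-squared of 0, so values close to 1 indicate that the model does much better than that baseline.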
