
Modeling Procedures

1. Load the data using a ‘File’ node in the ‘Data’ group.


1) Choose the file named ‘Cover_Type.xlsx’ and then click the ‘Reload’ button on the
right. (It may take about 30 seconds to load the data.)
2) Choose the sheet named ‘Raw Data’. (It may take about 20 seconds to switch
sheets.)
3) Change the Role of the attribute ‘Cover_type’ from ‘feature’ to ‘target’ by
double-clicking it.
4) Click ‘Apply’ and close the window.

2. Visualize the data using the ‘Distributions’ node in the ‘Visualize’ group.

3. Deal with missing values through the ‘Impute’ node in the ‘Data’ group.
1) Use ‘Average / Most Frequent’ for all numeric variables.
2) Use ‘Random Values’ for all categorical variables.
3) Create a new ‘Data Table’ named ‘Raw Data’ to view the imputed data.
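
For readers who want to see what the two imputation choices in step 3 do, here is a minimal sketch in plain Python. It is not Orange's implementation: it assumes missing values are stored as None, fills numeric columns with the column average, and fills categorical columns with a value drawn at random from the observed ones. The function name ‘impute’ and the column names ‘Elevation’ and ‘Soil’ are hypothetical and used only for illustration.

```python
import random
from statistics import mean


def impute(rows, numeric_cols, categorical_cols, seed=0):
    """Return a copy of rows with None values filled in.

    Numeric columns get the column average; categorical columns get a
    value sampled at random from the observed (non-missing) values.
    This is an illustrative sketch, not Orange's own code.
    """
    rng = random.Random(seed)
    filled = [dict(row) for row in rows]  # leave the input untouched

    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        col_mean = mean(observed)
        for r in filled:
            if r[col] is None:
                r[col] = col_mean

    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        for r in filled:
            if r[col] is None:
                r[col] = rng.choice(observed)

    return filled


# Hypothetical toy rows; the real data comes from the Excel sheet.
rows = [
    {"Elevation": 2500, "Soil": "A"},
    {"Elevation": None, "Soil": "B"},
    {"Elevation": 3100, "Soil": None},
]
imputed = impute(rows, ["Elevation"], ["Soil"])
```

In this toy example the missing ‘Elevation’ becomes the average of the two observed values, and the missing ‘Soil’ becomes either ‘A’ or ‘B’.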

4. Split the data into a ‘train’ set and a ‘test’ set (60:40) using the ‘Data Sampler’
node in the ‘Data’ group.
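
The 60:40 split in step 4 can be pictured with the short sketch below. It assumes a simple random split without stratification; the function name ‘split_train_test’, the seed, and the toy data are hypothetical, and the ‘Data Sampler’ node may select rows differently depending on its settings.

```python
import random


def split_train_test(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of rows and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows stand in for the imputed data set.
data = list(range(10))
train, test = split_train_test(data)
```

With ten rows, six go to the ‘train’ set and four to the ‘test’ set, and every row ends up in exactly one of the two.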

5. Build a decision tree model from the ‘train’ set created in step 4 using the ‘Tree’
node in the ‘Model’ group. You can inspect the tree model by adding the ‘Tree Viewer’
node in the ‘Visualize’ group.

6. Make predictions using the ‘Predictions’ node in the ‘Evaluate’ group. Connect the
tree model built in step 5, and use the ‘test’ set as the test data.

7. Check the prediction results by double-clicking the ‘Predictions’ node. Save the
prediction results in ‘.xlsx’ format using the ‘Save Data’ node.

8. Compare models using the ‘Test and Score’ node in the ‘Evaluate’ group. Use this
node to compare the results of the ‘SVM’ and ‘Logistic Regression’ models.
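
Classification accuracy (CA) is one of the scores that ‘Test and Score’ reports, and the sketch below shows how it can be used to compare two models. It assumes each model's predictions are already available as a list; the labels and predictions are made up for illustration and are not results from the assignment data.

```python
def classification_accuracy(actual, predicted):
    """Fraction of predictions that match the actual class labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels and predictions, for illustration only.
actual = ["Spruce", "Pine", "Pine", "Spruce", "Pine"]
predictions = {
    "SVM": ["Spruce", "Pine", "Spruce", "Spruce", "Pine"],
    "Logistic Regression": ["Pine", "Pine", "Pine", "Spruce", "Spruce"],
}

scores = {
    name: classification_accuracy(actual, preds)
    for name, preds in predictions.items()
}
best = max(scores, key=scores.get)  # model with the highest accuracy
```

In practice a single score rarely settles the comparison, so it is worth reading the other columns in the ‘Test and Score’ table as well.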

9. Save the whole workflow on the canvas as a flow (.ows) file. You can reuse this flow
next time.

10. Upload the flow (.ows) file, the Excel (.xlsx) file, and screenshots of ‘Predictions’,
‘Tree Viewer’ and ‘Test and Score’ to Moodle as your assignment submission.
